The corpus consists of essays written by native Spanish students who are learning English in the Department of English at the Universidad Autónoma de Madrid. Students in the first and third years of the degree take Academic Writing courses, and these students were asked whether the essays they submitted for marking could be included in the corpus.
- Collection of essays and metadata: Students submitted their essays, either by email or on disk. Additionally, they gave the teacher two information sheets:
  - Release forms/Essay Profile: For each essay submitted, the student provided an "essay profile" form, detailing the resources they used to write the essay. The form also includes a section where the student grants permission for the essay to be used for research purposes. An example of the form is available here.
  - Learner Profile: Each student filled in a "learner profile" form, which solicits information regarding age, gender, language background, English language proficiency, etc. An example of the form is available here.
- Oxford Quick Placement Test: Each student took the Oxford Quick Placement Test at a time close to the writing of the essays. The first-year students took the test within a month of writing the essays, while the third-year students took the test within the same
- Data Entry: The learner and essay profile forms were entered into an Excel spreadsheet by Ivan Teomiro.
- Text Normalisation: Paul Rollinson normalised each submitted text in accordance with the process used in the ICLE corpus. All personal data, titles, footnotes, endnotes, graphics, maps and bibliographies were stripped out, and quotations and references were replaced with <Q> and <R> respectively.
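
The normalisation described above was carried out by hand, following the ICLE conventions. Purely as an illustration of what replacing quotations and references with <Q> and <R> could look like, here is a minimal Python sketch. The two regular expressions and the function name `normalise` are assumptions made for this example, not the procedure actually used for the corpus, and they would miss many real cases (block quotations, footnote-style references, and so on).

```python
import re

# Hypothetical patterns for illustration only; the corpus itself was
# normalised manually, not with these expressions.
# A quotation: text between straight or curly double quotes.
QUOTE = re.compile(r'"[^"]*"|“[^”]*”')
# A reference: a parenthetical author-year citation such as
# (Smith 2004) or (Smith 2004: 12).
REFERENCE = re.compile(r'\([A-Z][^()]*\b\d{4}[^()]*\)')


def normalise(text: str) -> str:
    """Replace quotations with <Q> and parenthetical references with <R>."""
    text = QUOTE.sub("<Q>", text)
    text = REFERENCE.sub("<R>", text)
    return text


sample = 'Smith claims that "writing is thinking" (Smith 2004: 12).'
print(normalise(sample))
```

With this sample sentence, the quoted phrase becomes <Q> and the citation becomes <R>, while the rest of the learner's own wording is left untouched.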
Data collection began in October 2005. As of October 2008, the corpus consists of approximately 752 essays, containing around 750,000 words. The essays are stored in electronic format and range in length from 500 to 2,000 words.