Data Collection

The corpus consists of essays written by native Spanish-speaking students of English in the Department of English at the Universidad Autonoma de Madrid. Students in the first and third years of the degree take Academic Writing courses, and these students were asked whether the essays they submitted for marking could be included in the corpus.

  1. Collection of essays and metadata: Students submitted their essays either by email or on disk. They also gave the teacher two information sheets: a learner profile form and an essay profile form.
  2. Oxford Quick Placement Test: Each student took the Oxford Quick Placement Test around the time they wrote their essays. First-year students took the test within a month of writing the essays, while third-year students took it within the same semester.

  3. Data Entry: Ivan Teomiro entered the data from the learner and essay profile forms into an Excel spreadsheet.
  4. Text Normalisation: Paul Rollinson normalised each submitted text following the procedure used for the ICLE (International Corpus of Learner English). All personal data, titles, footnotes, endnotes, graphics, maps and bibliographies were removed, and quotations and references were replaced with the tags <Q> and <R> respectively.
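The tag-substitution part of the normalisation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for the corpus: the description does not say how quotations and references were identified, so the regular expressions, the assumed bibliography headings ("Bibliography", "References", "Works Cited") and the function name `normalise` are all hypothetical. Removing personal data, titles, footnotes and graphics is not covered.

```python
import re

# Hypothetical patterns: the corpus description does not say how quotations
# and references were detected, so these regexes are illustrative assumptions.
# A quotation is assumed to be text inside straight "..." or curly \u201c...\u201d quotes.
QUOTE_RE = re.compile('"[^"]*"|\u201c[^\u201d]*\u201d')
# A reference is assumed to be a parenthetical citation such as
# (Smith, 2004) or (Smith 2004: 12).
REF_RE = re.compile(r'\([A-Z][A-Za-z\-]+(?: et al\.)?,? \d{4}[a-z]?(?:: ?\d+(?:-\d+)?)?\)')
# A line that opens a bibliography section (assumed heading names).
BIB_HEADING_RE = re.compile(r'^\s*(Bibliography|References|Works Cited)\s*$', re.IGNORECASE)


def normalise(text: str) -> str:
    """Drop a trailing bibliography, then replace quotations and references with tags."""
    kept = []
    for line in text.splitlines():
        if BIB_HEADING_RE.match(line):
            break  # discard the heading and every line after it
        kept.append(line)
    body = "\n".join(kept)
    body = QUOTE_RE.sub("<Q>", body)  # quotations -> <Q>
    body = REF_RE.sub("<R>", body)    # parenthetical references -> <R>
    return body


if __name__ == "__main__":
    essay = ('As Orwell wrote, "All animals are equal" (Orwell, 1945).\n'
             'References\n'
             'Orwell, G. (1945). Animal Farm.')
    print(normalise(essay))
```

Running the example prints `As Orwell wrote, <Q> <R>.` Pattern matching like this only approximates manual normalisation; real learner essays cite sources inconsistently, so automatic substitution would still need a human check.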

Data collection began in October 2005. As of October 2008, the corpus contains 752 essays totalling around 750,000 words. The essays are stored electronically and range in length from 500 to 2,000 words.
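As a rough consistency check on these figures, the mean essay length works out to about 1,000 words, which lies within the stated 500 to 2,000 word range. Because the total word count is approximate, the mean is approximate too:

```latex
\bar{L} \approx \frac{750\,000 \text{ words}}{752 \text{ essays}} \approx 997 \text{ words per essay}
```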