Expert Advisory Group on Language Engineering Standards
Corpus Group / Text Representation subgroup
Contact address
Nancy Ide, chair
The MULTEXT project and the EAGLES subgroup on Text Representation have joined forces to develop a Corpus Encoding Standard (CES) optimally suited for use in language engineering and intended to serve as a widely accepted set of encoding standards for European corpus work. The overall goal is to identify a minimal encoding level that corpora must achieve to be considered standardized, both in descriptive representation (marking of structural and linguistic information) and in general architecture (so as to be maximally suited for use in a text database). The CES also provides encoding conventions for more extensive encoding and for linguistic annotation.
MULTEXT/EAGLES Corpus Encoding Standard: Background and Principles (PostScript)
MULTEXT/EAGLES Corpus Encoding Standard
EAGLES Workshop (Madrid, 18-20 Jan. 1996)
Invitation to Text Representation session
Transparencies: PostScript (ca. 600 KB) or PowerPoint (Mac)