Methods and Practical Issues in Evaluating Alignment Techniques
Philippe Langlais1, Michel Simard2 and Jean Véronis3
This paper describes the work carried out in ARCADE, a two-year cooperative research project supported and coordinated by AUPELF-UREF. The project is concerned with evaluating the alignment of bilingual or multilingual text corpora. Six laboratories took part in the project. They were responsible for (a) producing a bilingual corpus suited to the alignment task and (b) designing and implementing a protocol to evaluate alignment systems objectively. This effort has had two main outcomes. First, a large, rich, standardized bilingual (French-English) corpus is now available. It gathers texts of various kinds that present different degrees of difficulty for the alignment task. Second, significant methodological progress was made, both in evaluation methodology and in the algorithms implemented in the participating systems. The paper also reports the results of a comparative study of six alignment systems, and discusses the evaluation protocol in light of the observed results.
|
|
|
|
1 KTH-CTT, S-10044 Stockholm, Sweden & CERI-LIA, AGROPARC BP 1228, 84911 Avignon Cedex 9, France. Philippe.Langlais@speech.kth.se
2 DIRO-RALI, Université de Montréal (Québec), Canada H3C 3J7. simardm@IRO.UMontreal.CA
3 LPL, Université de Provence & CNRS, 13621 Aix-en-Provence Cedex 1, France. Jean.Veronis@lpl.univ-aix.fr
ARCADE is financed by AUPELF-UREF. This paper reflects many of the discussions among the ARCADE participants. We thank Pierre Isabelle and Michel Simard (RALI), Laurent Romary and Patrice Bonhomme (LORIA), Fathi Debili and Emna Souissi (IRMC), and Susan Armstrong and Pieter Theron (ISSCO) for their collaboration. We also thank Joseph Mariani, coordinator of the ARC, for his constant encouragement and help.