ARCADE     
Word track - Overview
 

Introduction

Evaluation of word alignment between parallel texts poses a lot more problems than the evaluation of sentence alignment. First, the problem is much more difficult in itself, given the differences in word order between languages, the high frequency of multi-token expressions and their potential discontinuity, the difference in part-of-speech and syntactic constructs between the source and its translation, etc. Second, there is no prior experience in evaluation of parallel text alignment at the word level; in particular, unlike the sentence track, there was no evaluation at word level in the first campaign of ARCADE.  

Therefore, the goals, at least in the first round (1998), will necessarily be very modest, involving a very restricted task, and it is clear that the evaluation will be imperfect and unsatisfactory in many ways. However, it seems worth trying, and the main results are likely to be what we will learn in doing the exercise, through the discussion among the participants (and observers).

The task proposed for the first round is that of translation spotting, which is only a subtask of the global word alignment problem.  
 

Translation spotting

Full alignment at word level between a text and its translation is a very difficult problem. The figure below shows an example of such an alignment (of course, some alignment decisions taken in this example can be questioned, as in almost every sentence).
[Figure: Full alignment at word level]
  
Translation spotting can be seen as a simpler sub-problem of full alignment. Given a particular word or expression in the source text, it consists in detecting its translation in the target text. An obvious application is the highlighting of translations for particular words on parallel texts presented on a screen, as in multilingual concordancing: 
 
 
FR: La BERD apporte une contribution supplémentaire et joue un rôle catalyseur en promouvant l'investissement dans les pays où elle opère.
EN: The EBRD brings an additional contribution and plays a catalytic rôle to foster investments in its countries of operations.

FR: Le même numéro de cette revue apporte de nouvelles précisions sur des initiatives prises par des entreprises japonaises pour favoriser la recherche fondamentale dans des laboratoires créés à cette fin aux États-Unis d'Amérique.
EN: The very same issue, however, contains new information regarding initiatives by Japanese companies aimed at promoting basic research in laboratories set up for that purpose in the US.

[Figure: Translation spotting]
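As a rough illustration of the task, a spotted translation can be represented as a pair of token-position sets, one in the source and one in the target. The sketch below is only an assumption about how such a record might look: the `Spot` class, its field names, and the `highlight` helper are hypothetical and are not the ARCADE exchange format. The pairing of "apporte" with "brings" is our reading of the first example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spot:
    """One translation-spotting record (hypothetical layout, not the ARCADE format)."""
    source_tokens: tuple
    target_tokens: tuple
    source_idx: frozenset  # positions of the query word or expression in the source
    target_idx: frozenset  # positions of its translation; may be empty or discontinuous


def highlight(tokens, idx, left="[", right="]"):
    """Return the sentence with the tokens at positions `idx` wrapped in markers."""
    return " ".join(
        f"{left}{tok}{right}" if i in idx else tok for i, tok in enumerate(tokens)
    )


# Naive whitespace tokenization, for illustration only.
src = "La BERD apporte une contribution supplémentaire".split()
tgt = "The EBRD brings an additional contribution".split()
spot = Spot(tuple(src), tuple(tgt), frozenset({2}), frozenset({2}))

print(highlight(spot.source_tokens, spot.source_idx))
print(highlight(spot.target_tokens, spot.target_idx))
```

Using sets of positions rather than a single index leaves room for multi-token and discontinuous translations.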
 

Difficulties

Of course, if a system is capable of spotting the right translation for all words in the source language, it can also perform full alignment. However, the translation spotting task can be made less difficult than full alignment, since the task can be restricted to a set of words, and can entirely avoid particularly difficult cases, such as the translation of grammatical words (prepositions, auxiliary verbs, etc.). 
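The relation between the two tasks can be sketched as follows: full alignment is what results from spotting every source position, while the restricted task only queries a chosen subset. This is a minimal sketch under stated assumptions; the `spot` callable and the hand-written `table` are hypothetical stand-ins for a real system.

```python
def full_alignment(source_tokens, spot):
    """Collect (source_pos, target_pos) links by spotting every source token.

    `spot` is a hypothetical callable: it takes a source position and
    returns the set of target positions it judges to be the translation.
    """
    links = set()
    for i in range(len(source_tokens)):
        for j in spot(i):
            links.add((i, j))
    return links


# Toy spotter backed by a hand-written table (illustration only).
source = ["La", "BERD", "apporte"]
table = {0: {0}, 1: {1}, 2: {2}}  # The / EBRD / brings

# Spotting all positions yields a full alignment.
links = full_alignment(source, lambda i: table.get(i, set()))

# Restricting the task to one test word (position 2) yields only its links.
restricted = full_alignment(source, lambda i: table.get(i, set()) if i == 2 else set())
```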

There are, however, many difficulties that will pose problems for the constitution of a reference corpus and for the evaluation. The straightforward case where one token in the source corresponds to one token in the translation (as illustrated above) is unfortunately not the only one.

For example, a single-token word in the source can correspond to a multi-word unit in the target, for either lexical or grammatical reasons (such as the inclusion of a particle, a change in tense or mood, etc.):
 
FR: Quelles modifications la Commission compte-t-elle apporter en matière fiscale pour introduire un taux réduit incitatif en vue de développer le débroussaillage?
EN: What tax changes does the Commission intend to introduce to encourage clearance of undergrowth?

FR: La Communauté européenne apporte une aide aux réfugiés palestiniens depuis 1972.
EN: The European Community has been providing assistance to Palestinian refugees since 1972.
 
Conversely, in many cases, the source token cannot be translated in isolation, but is part of an expression that is translated as a whole: 
 
FR: Ce rapport apporte la preuve qu'une aussi grande importance soit accordée aux véhicules automobiles.
EN: The objective of the report on cities without cars was to demonstrate the principles and socioeconomic consequences of fundamentally changing the priorities in city transportation.

FR: La Commission pourrait-elle apporter des éclaircissements sur sa position vis-à-vis de la demande d'adhésion à la Communauté économique européenne présentée par le Conseil fédéral suisse, sur la base de l'article 237 modifié du traité de Rome?
EN: Can the Commission clarify its position on the Swiss Federal Council's application to join the European Economic Community, on the basis of amended Article 237 of the Treaty of Rome?

FR: Les services de la Commission ont effectué une mission en avril 1992 dans les régions les plus touchées par la sécheresse, afin de constater la situation sur place et d'examiner avec les autorités portugaises une série de mesures supplémentaires destinées à apporter une réponse adaptée à la gravité de la situation.
EN: In April Commission officials visited the areas worst affected by the drought in order to assess the situation first hand and discuss with the Portuguese authorities a number of additional measures with the aim of tackling the situation with the seriousness which the circumstances demand.

In some cases, the source sentence is paraphrased in such a way that the exact correspondence between the two texts is particularly difficult to find:
 
 
FR: Une réunion, qui s'est tenue à Bruxelles le 29 mai 1991 et à laquelle ont participé plusieurs organisations de Gitans, a permis d'accentuer l'effort pour apporter des éléments concrets de réponse aux préoccupations exprimées par l'honorable parlementaire.
EN: A meeting held in Brussels on 29 May 1991 and attended by representatives from a number of gypsies' organizations went a long way towards meeting the concerns expressed by the Honourable Member.

FR: Cette situation est une source de difficultés pour les entreprises concernées. Quelle solution la Commission compte-t-elle apporter au problème?
EN: This situation is causing difficulties for the firms involved. How does the Commission intend to solve this problem?

FR: La Commission voudrait-elle apporter des éclaircissements sur sa position vis-à-vis de la demande d'adhésion à la Communauté économique européenne présentée par la Finlande sur la base de l'article 237 modifié du traité de Rome?
EN: What view does the Commission take of Finland's application for accession to the European Economic Community under Article 237 of the Treaty of Rome as amended?

One of the beneficial effects of ARCADE could be an inventory of such problems and guidelines for human annotators.
 

Procedure

The word track will take place in several steps, according to a schedule that will be updated as we go along:  
  • Step 1: the raw corpus will be distributed to participants well in advance, in order for them to understand the formats, interface their systems, tune and train them.
  • Step 2: a dry run will take place in order to check the procedures and evaluation programs.
  • Step 3: the test words will be distributed to the participants.
  • Step 4: the participants will return the aligned corpus to the coordinator in the agreed format.
  • Step 5: the proposed alignments will be evaluated and the results returned on the discussion list.
  • Step 6: the results will be discussed on the list and at the SENSEVAL workshop (2-4 September).
  • Step 7: a longer discussion and analysis of results will take place in the fall, with the goal of publishing the results and planning the second round.

Evaluation method

Even more than in the sentence track, evaluation must be approached cautiously, with all the possible disclaimers. The discussion given in the section on evaluation methods for the sentence track applies here as well, and with even greater force.

This entire round can be seen as no more than a rehearsal intended to discuss and elaborate methodologies, evaluation metrics and protocols, refine the manual annotation process, etc.  

Much more discussion is needed before the format of the results and the metrics can be defined, and before the manual annotation of the reference corpus can be completed.
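Purely as a starting point for that discussion, one candidate metric is precision and recall over target token positions, comparing the positions a system proposes for a test word with those marked in the reference. The sketch below is an assumption, not a metric adopted by ARCADE; the function name `prf` and the treatment of empty sets are our own choices, and the reference positions in the example reflect one possible annotation of "has been providing" for "apporte".

```python
def prf(proposed, reference):
    """Precision, recall and F-measure over sets of target token positions.

    Convention assumed here: if both sets are empty (the word has no
    translation and the system proposes none), all three scores are 1.0.
    """
    proposed, reference = set(proposed), set(reference)
    overlap = len(proposed & reference)
    if not proposed and not reference:
        return 1.0, 1.0, 1.0
    precision = overlap / len(proposed) if proposed else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# "The European Community has been providing assistance ..."
# Assumed reference for "apporte": positions 3, 4, 5 ("has been providing").
# A system that spots only "providing" (position 5) is precise but incomplete.
p, r, f = prf({5}, {3, 4, 5})
```

A position-based score of this kind gives partial credit for multi-token translations, which seems useful given the cases listed under Difficulties, but it leaves open how paraphrased sentences should be annotated in the first place.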