Word sense disambiguation (WSD) was recognised as a central (and difficult) problem as early as the very first paper on the computer treatment of language, Weaver's memorandum (Weaver, 1949). Since then, there has been continuous research on WSD in the
context of various sub-fields (machine translation, information retrieval,
content analysis, natural language understanding, etc. -- for a recent
survey, see Ide and Véronis,
1998). An impressive array of methods has been proposed, and occasionally rediscovered, over the years, and various claims of efficiency have been made. However, it is extremely difficult to compare the results, and therefore the methods: the texts, words and sense inventories differ widely across studies, as do the evaluation protocols and metrics. Under
the auspices of ACL-SIGLEX and EURALEX, the SENSEVAL
evaluation exercise is attempting for the first time to run an ARPA-like
competition between WSD systems.
Discussions among the SENSEVAL program committee members highlighted the differences in existing linguistic resources (corpora, dictionaries, etc.) between English and other languages, and led to the decision to organise within SENSEVAL a specific competition for Romance
languages, called ROMANSEVAL. A six-month test campaign is planned in coordination
with the ARCADE
project on multilingual text alignment, whose word track will use the same
corpus and test words. Results will be presented at the SENSEVAL
workshop in September 1998.
Given the short time span and the lack of prior experience in WSD evaluation, the goals must necessarily be modest. The program committee agreed on a very simple task, in which 60 words (20 nouns, 20 adjectives, 20 verbs) will be submitted to the various systems. The systems will return the words tagged according to the senses of a simple, widely available commercial dictionary (such as the Petit Larousse for French). The
systems will therefore have to map their own sense system/ontology to the
senses provided by that dictionary.
For example, a test item presents the word terre with its sense entries from the Petit Larousse; the tagging then assigns each occurrence zero, one or several of these senses.
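To make this mapping step concrete, here is a minimal sketch in Python (all sense labels and sense numbers are hypothetical, invented for illustration; the actual Petit Larousse inventory is not reproduced here) of how a system might project its internal senses for terre onto dictionary sense numbers, returning zero, one or several senses per occurrence:

    # Hypothetical mapping from a system's internal senses for "terre"
    # to Petit Larousse sense numbers (illustrative only; the real
    # inventory and numbering differ).
    SYSTEM_TO_LAROUSSE = {
        "terre#soil":   ["2"],        # one internal sense -> one dictionary sense
        "terre#planet": ["1", "3"],   # one internal sense -> several dictionary senses
        "terre#estate": [],           # no corresponding dictionary sense: abstain
    }

    def tag_occurrence(internal_sense):
        """Return the dictionary sense numbers for one occurrence
        (an empty list means no sense is provided)."""
        return SYSTEM_TO_LAROUSSE.get(internal_sense, [])

    for sense in ("terre#soil", "terre#planet", "terre#estate"):
        print(sense, "->", tag_occurrence(sense) or "no tag")

Allowing both the empty and the multiple case directly reflects the protocol above, in which zero, one or several senses can be provided for each occurrence.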
The exercise will take place
in several steps, according to a schedule that
will be updated as we go along:
Evaluation must be approached cautiously, with all due caveats. Feasibility constraints (time and human resources) partly determine what can be practically
done -- as opposed to what would be theoretically perfect. However, as
in any competition, we must make every effort to ensure fairness and openness
of the evaluation process.
The first point that must be stressed is that the idea of "competition" is only a pretext for collectively doing an interesting piece of scientific work and improving our systems. The final ranking of systems (if any such ranking is possible) is not very important: it is extremely difficult to compare systems with different goals and different resources. This entire six-month period can be seen as no more than a rehearsal intended to discuss and elaborate methodologies, evaluation metrics and protocols, refine the manual annotation process, etc.
The discussion seemed to move toward agreement on several ideas.