The alignment of the paragraphs containing each source word and its
translation is available from the arcftp site (directory word/align) or
from the romanftp site (directory fr/align), in the file parag.zip
(access restricted to participants).
If you use this information in your system, please say so, to allow
a fair evaluation.
The format is as follows:
| Cat | grammatical category of the word |
| NoOccur | occurrence number |
| Lemma | lemma of the occurrence |
| ParSce | number of the source paragraph containing the occurrence |
| ParCib | number of the corresponding target paragraph |
| Char | character offset of the occurrence within the source paragraph |
| Len | character length of the occurrence |
| Occur | exact form of the occurrence |
| Source | source paragraph |
| Target | target paragraph |
The combination of the first two columns (Cat and NoOccur) uniquely
identifies each occurrence.
Example:
| Cat | NoOccur | Lemma | ParSce | ParCib | Char | Len | Occur | Source | Target |
| A | 1 | biologique | 1608 | 1601 | 264 | 11 | biologiques | En outre, dans le cadre… | Moreover, within the framework… |
| A | 2 | biologique | 1645 | 1638 | 682 | 10 | biologique | Située à proximité de… | One of the most beautiful … |
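The record layout above can be read with a few lines of Python. This is only a sketch under stated assumptions: the text does not say how fields are delimited inside parag.zip, so the tab separator is a guess, and the names `Occurrence`, `parse_line`, `find_duplicate_keys` and `offset_base` are hypothetical helpers, not part of the distribution. The text also does not say whether Char counts from 0 or from 1, so `offset_base` simply tests both conventions against the full source paragraph.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Column order as shown in the example table.
FIELDS = ["Cat", "NoOccur", "Lemma", "ParSce", "ParCib",
          "Char", "Len", "Occur", "Source", "Target"]

# The two example rows, joined with tabs (ASSUMED delimiter).
# The paragraphs are truncated here exactly as in the example.
SAMPLE = (
    "A\t1\tbiologique\t1608\t1601\t264\t11\tbiologiques\t"
    "En outre, dans le cadre…\tMoreover, within the framework…\n"
    "A\t2\tbiologique\t1645\t1638\t682\t10\tbiologique\t"
    "Située à proximité de…\tOne of the most beautiful …\n"
)


@dataclass(frozen=True)
class Occurrence:
    cat: str
    no_occur: int
    lemma: str
    par_sce: int
    par_cib: int
    char: int
    length: int
    occur: str
    source: str
    target: str

    @property
    def key(self) -> Tuple[str, int]:
        # The first two columns together identify an occurrence.
        return (self.cat, self.no_occur)


def parse_line(line: str, sep: str = "\t") -> Occurrence:
    """Split one record into its ten fields (separator is an assumption)."""
    parts = line.rstrip("\n").split(sep)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    (cat, no_occur, lemma, par_sce, par_cib,
     char, length, occur, source, target) = parts
    return Occurrence(cat, int(no_occur), lemma, int(par_sce), int(par_cib),
                      int(char), int(length), occur, source, target)


def find_duplicate_keys(records: Iterable[Occurrence]) -> List[Tuple[str, int]]:
    """Return every (Cat, NoOccur) pair that appears more than once."""
    seen = set()
    duplicates = []
    for rec in records:
        if rec.key in seen:
            duplicates.append(rec.key)
        seen.add(rec.key)
    return duplicates


def offset_base(rec: Occurrence) -> Optional[int]:
    """Return 0 or 1 if Char, read with that base, locates Occur in Source.

    Returns None when neither convention matches, for instance when the
    source paragraph is truncated.
    """
    for base in (0, 1):
        start = rec.char - base
        if start >= 0 and rec.source[start:start + rec.length] == rec.occur:
            return base
    return None


if __name__ == "__main__":
    rows = [parse_line(line) for line in SAMPLE.splitlines()]
    for row in rows:
        print(row.key, row.lemma, row.occur, row.par_sce, row.par_cib)
    print("duplicate keys:", find_duplicate_keys(rows))
```

Running `offset_base` over the complete paragraphs from parag.zip is one way to settle the offset convention before relying on Char and Len; on the truncated example rows above it cannot match and returns `None`.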