Aligned corpus
The Multext corpus was aligned with a tool developed by the University of Lancaster. The result conforms to the specifications of the CESANA format.
Files have been aligned to the English version of the JOC files; the result is read from the XTARGETS attribute of each LINK, with the ";" sign as delimiter:
<LINK XTARGETS="english sentence reference ; other languages sentence reference">
<!DOCTYPE CESALIGN PUBLIC "-//CES//DTD cesAlign//EN" []>
<CESALIGN VERSION="1.14">
<LINKLIST>
<LINKGRP><LINK XTARGETS="C1P1S1 ;C1P1S1 ">
<LINK XTARGETS="C1P2S1 ;C1P2S1 ">
<LINK XTARGETS="C1P3S1 ;C1P3S1 ">
<LINK XTARGETS="C1P4S1 ;C1P4S1 ">
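As an illustration of how the XTARGETS values above can be read, here is a minimal Python sketch. It is not part of the Multext distribution: the function name `parse_links` is hypothetical, the LINK tags are matched with a regular expression because they are unclosed SGML rather than well-formed XML, and it is an assumption that each side of the ";" may hold several space-separated sentence references.

```python
import re

# Matches the XTARGETS attribute of a LINK tag (SGML-style, tags left unclosed).
LINK_RE = re.compile(r'<LINK\s+XTARGETS="([^"]*)"', re.IGNORECASE)

def parse_links(text):
    """Return one (english_ids, other_ids) pair per LINK found in text."""
    pairs = []
    for targets in LINK_RE.findall(text):
        # The part before ";" refers to the English file, the part after to the other language.
        english, _, other = targets.partition(";")
        # Assumption: several references on one side are separated by whitespace.
        pairs.append((english.split(), other.split()))
    return pairs

sample = '''<LINKGRP><LINK XTARGETS="C1P1S1 ;C1P1S1 ">
<LINK XTARGETS="C1P2S1 ;C1P2S1 ">'''

for english_ids, other_ids in parse_links(sample):
    print(english_ids, "<->", other_ids)
```

Splitting on whitespace also discards the trailing blanks that appear inside the attribute values of the excerpt.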
This alignment may be visualized in a companion file (extension .al), such as:
------
<S ID="C1P1S1">
Subject: The staffing in the Commission of the European Communities
<S ID="C1P1S1">
Asunto: Situación del personal en la Comisión de la CEE
------
<S ID="C1P2S1">
Can the Commission say:
<S ID="C1P2S1">
Se ruega a la Comisión que indique:
------
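The display layout above can be read back into aligned groups with a short Python sketch. This is an assumption-laden illustration, not a documented tool: the layout (groups delimited by lines of dashes, each sentence introduced by an `<S ID="...">` line, English first) is inferred only from the excerpt shown here, and the function name `parse_al` is hypothetical.

```python
import re

# An <S ID="..."> line introduces one sentence of an aligned group.
S_RE = re.compile(r'<S\s+ID="([^"]+)">')

def parse_al(text):
    """Return a list of groups; each group is a list of (sentence_id, sentence_text)."""
    groups, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line and set(line) == {"-"}:
            # A line made only of dashes closes the current group.
            if current:
                groups.append(current)
            current = []
            continue
        match = S_RE.match(line)
        if match:
            current.append([match.group(1), ""])
        elif line and current:
            # Sentence text may span several lines; join them with single spaces.
            current[-1][1] = (current[-1][1] + " " + line).strip()
    if current:
        groups.append(current)
    return [[tuple(pair) for pair in group] for group in groups]

sample_al = '''------
<S ID="C1P2S1">
Can the Commission say:
<S ID="C1P2S1">
Se ruega a la Comisión que indique:
------'''

for group in parse_al(sample_al):
    print(group)
```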
Note
The alignment was hand-validated on the link files, so the files showing the actual sentences may not exactly reflect the alignment where corrections were made to the link files.
You are invited to send comments and feedback to multext@lpl.univ-aix.fr.