Multext - Document MUL4. Corpora. Version 0.1. Last modified 20 December 1996.





Aligned corpus



Contents

1. Format of alignment result

2. Different states of aligned sentences

3. Samples of aligned files


The alignment of the Multext corpus was performed with a tool developed by the University of Lancaster. The result conforms to the cesAlign specification of the Corpus Encoding Standard (CES).

1. Format of alignment result

Files have been aligned to the English version of the JOC files. The result is read from the XTARGETS attribute of each LINK element, with the ";" sign as delimiter:

<LINK XTARGETS="English sentence reference ; other-language sentence reference">

<!DOCTYPE CESALIGN PUBLIC "-//CES//DTD cesAlign//EN" []>
<CESALIGN VERSION="1.14">
<LINKLIST>
<LINKGRP><LINK XTARGETS="C1P1S1 ;C1P1S1 ">
<LINK XTARGETS="C1P2S1 ;C1P2S1 ">
<LINK XTARGETS="C1P3S1 ;C1P3S1 ">
<LINK XTARGETS="C1P4S1 ;C1P4S1 ">
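
As a minimal sketch, the link list above can be read with a few lines of Python. The function name `parse_links` is hypothetical, and the sketch assumes that each side of the ";" may hold zero or more whitespace-separated sentence IDs (the sample only shows one-to-one links). A regular expression is used instead of an XML parser because the LINK elements in the sample have no end tag, so the file is not well-formed XML.

```python
import re

# Captures the value of the XTARGETS attribute of each LINK element.
# Container tags such as <LINKGRP> and <LINKLIST> do not match, because
# "<LINK" must be followed by whitespace.
LINK_RE = re.compile(r'<LINK\s+XTARGETS="([^"]*)"', re.IGNORECASE)


def parse_links(text):
    """Return one (english_ids, other_ids) pair of ID lists per LINK element."""
    pairs = []
    for xtargets in LINK_RE.findall(text):
        # Split on the first ";": English reference(s) before it,
        # other-language reference(s) after it.
        english, sep, other = xtargets.partition(";")
        if not sep:
            raise ValueError(f"no ';' delimiter in XTARGETS={xtargets!r}")
        # split() also discards the trailing spaces seen in the sample.
        pairs.append((english.split(), other.split()))
    return pairs


sample = """<CESALIGN VERSION="1.14">
<LINKLIST>
<LINKGRP><LINK XTARGETS="C1P1S1 ;C1P1S1 ">
<LINK XTARGETS="C1P2S1 ;C1P2S1 ">
"""

for english_ids, other_ids in parse_links(sample):
    print(english_ids, "->", other_ids)
# prints:
# ['C1P1S1'] -> ['C1P1S1']
# ['C1P2S1'] -> ['C1P2S1']
```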

This format can be visualized in an annex file (extension .al), for example:

------
<S ID="C1P1S1">
Subject: The staffing in the Commission of the European Communities
<S ID="C1P1S1">
Asunto: Situación del personal en la Comisión de la CEE

------
<S ID="C1P2S1">
Can the Commission say:
<S ID="C1P2S1">
Se ruega a la Comisión que indique:

------
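
The .al view is essentially the link pairs rendered against the sentence texts of both languages. The sketch below rebuilds the layout of the sample above from link pairs and two ID-to-sentence dictionaries. The function name `render_al` and the dictionary inputs are hypothetical: this page does not say how the .al files were actually produced.

```python
def render_al(pairs, english_text, other_text):
    """Render (english_ids, other_ids) link pairs as an .al-style view.

    english_text and other_text map sentence IDs to sentence strings
    (hypothetical inputs; the real .al generation is not documented).
    """
    lines = []
    for english_ids, other_ids in pairs:
        lines.append("------")  # block separator, as in the sample
        for sid in english_ids:
            lines.append(f'<S ID="{sid}">')
            lines.append(english_text[sid])
        for sid in other_ids:
            lines.append(f'<S ID="{sid}">')
            lines.append(other_text[sid])
        lines.append("")  # blank line closing each block
    lines.append("------")
    return "\n".join(lines)


pairs = [(["C1P1S1"], ["C1P1S1"]), (["C1P2S1"], ["C1P2S1"])]
english = {
    "C1P1S1": "Subject: The staffing in the Commission of the European Communities",
    "C1P2S1": "Can the Commission say:",
}
spanish = {
    "C1P1S1": "Asunto: Situación del personal en la Comisión de la CEE",
    "C1P2S1": "Se ruega a la Comisión que indique:",
}
print(render_al(pairs, english, spanish))
```

Since the IDs are looked up in one dictionary per language, identical IDs on both sides (as in the sample) do not collide; an ID missing from a dictionary raises a KeyError.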

Note

The alignment results were hand-validated on the link files. Where corrections were made to the link files, the files showing the actual sentences therefore do not exactly reflect the corrected alignment.
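
Because the .al views may not reflect corrections made to the link files, one way to locate stale blocks is to compare each .al block with the validated links. This is a sketch under assumptions: it expects the .al layout shown above (blocks separated by "------" lines, English IDs listed before other-language IDs), takes the validated links as (english_ids, other_ids) pairs, and the function names `al_blocks` and `changed_links` are hypothetical. It uses Python's `difflib.SequenceMatcher` so that a merged or split link is reported where it occurs instead of shifting every later block.

```python
import difflib
import re

# Captures the ID of each <S ID="..."> line in an .al view.
S_ID_RE = re.compile(r'<S\s+ID="([^"]*)"', re.IGNORECASE)


def al_blocks(al_text):
    """Return the ordered sentence IDs of each block between '------' lines."""
    blocks, current = [], []
    for line in al_text.splitlines():
        if line.strip() == "------":
            if current:
                blocks.append(tuple(current))
            current = []
        else:
            current.extend(S_ID_RE.findall(line))
    if current:
        blocks.append(tuple(current))
    return blocks


def changed_links(al_text, pairs):
    """List (tag, shown_blocks, validated_links) for every difference.

    Each validated link is flattened to its English IDs followed by its
    other-language IDs, matching the order of <S ID> lines in a block.
    """
    shown = al_blocks(al_text)
    validated = [tuple(english + other) for english, other in pairs]
    matcher = difflib.SequenceMatcher(a=shown, b=validated, autojunk=False)
    return [(tag, shown[i1:i2], validated[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


al_text = """------
<S ID="C1P1S1">
Subject: The staffing in the Commission of the European Communities
<S ID="C1P1S1">
Asunto: Situación del personal en la Comisión de la CEE

------
<S ID="C1P2S1">
Can the Commission say:
<S ID="C1P2S1">
Se ruega a la Comisión que indique:

------"""

# Hypothetical correction: the second link is turned into a 1:2 link
# with an invented sentence ID "C1P2S2".
validated = [(["C1P1S1"], ["C1P1S1"]), (["C1P2S1"], ["C1P2S1", "C1P2S2"])]
print(changed_links(al_text, validated))
# prints: [('replace', [('C1P2S1', 'C1P2S1')], [('C1P2S1', 'C1P2S1', 'C1P2S2')])]
```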

2. Different states of aligned sentences


You are invited to send comments and feedback to multext@lpl.univ-aix.fr.


Copyright © Centre National de la Recherche Scientifique, 1996.
This page will undergo frequent modification. Therefore, please do not mirror this page.