MULTEXT/EAGLES - Document LSD 2. Part 0. Version 0.5. Last modified 28 April 1996.





GLOSIX Part 0. Overview







Goals

The MULTEXT project and the EAGLES sub-group on Tools have joined efforts to address the need for reusable linguistic software by working toward the establishment of Guidelines for Linguistic Software Development (LSD Guidelines). The document "Considerations for Linguistic Software Reusability" (document MUL/EAG-LSD 1) outlines the general principles upon which the Guidelines are to be based. The present document is a follow-on that makes a first sketch of the Guidelines, and is intended to provide the basis upon which the full specifications will be developed.

The development of the Guidelines will require considerable input from and discussion within the language engineering community. We therefore intend to achieve our goal by a process of stepwise refinement, consisting of a cycle of specification, testing and feedback, and refinement. It is for this reason that we provide here a preliminary sketch which is intended to


What is linguistic software reusability?

Linguistic software reusability comprises several aspects. We discuss the major ones below. They are listed in the order in which they should be implemented, since as we go down the list, each depends and builds upon the previous one.

Usability

Reusability implies usability as a starting point. The most obvious current obstacles to usability are poor documentation, unreliability, and lack of robustness, which are the prime reasons why freely available software is not more widely used. Rectifying these problems is fairly straightforward, and is a first step in working toward reusability.

Portability

Portability concerns the capability for tools developed at one site to be immediately usable at other sites. At present, it is nearly a given that software developed at one site demands considerable tweaking to run at another site, especially if the environments are not perfectly identical. This leads to substantial investments of time and resources, just to get to the point of being able to run software acquired from other sites.

Ideally, we should aim for portability across platforms (Windows, MacOS, Unix), but this is a long-term goal which will require substantial work. In the short term, we can achieve portability between similar environments, e.g., between different versions of Unix.

Compatibility

Compatibility concerns the capability for tools developed independently to inter-operate in the same environment, in order to perform complex tasks. This demands, first, that tools can communicate, that is, that results produced by one tool are usable by another; and, second, that their functionalities are complementary and coherent. It is also essential that tools are designed for compatibility with data and other resources (e.g., lexicons) in common formats.

At present, the proliferation of different implementations of basic linguistic tools such as part-of-speech taggers is not only confusing to the user who may want to apply such tools, but also renders comparison of their results virtually meaningless. Standard methods for software design and development will make comparison of results possible.

Extensibility

Extensibility involves the capability to adapt tools to fit particular needs, to add pieces to existing tools, to replace pieces, etc. One important feature for linguistic purposes is the capacity to use the same tools on different languages. At the moment, most linguistic software exists in the form of integrated systems performing multiple functions, often with little or no access to individually functioning modules. Thus adaptation or extension, whether to add functionality or to accommodate other languages, is virtually impossible.


Principles

The MULTEXT/EAGLES LSD Guidelines will be based on existing or emerging standards. However, there is an enormous proliferation of standards relevant to the full specification of the Guidelines, including areas such as character sets, document encoding, programming languages, operating systems, etc. There exist in some cases multiple standards for the same phenomenon, as well as drafts, discussions among technical groups, etc., since many standards are currently in various stages of the definition process.

The MULTEXT/EAGLES LSD Guidelines are intended to provide a selection among relevant standards that best suit the needs of linguistic software development, in order to define a coherent "open environment" for developers, the GLOSIX Open System Environment. In addition, it will be necessary to fill or at least determine the gaps among existing standards. To develop the MULTEXT/EAGLES LSD Guidelines, it will be necessary first of all to compile a list of the relevant standards, drafts, etc., and then to examine each closely, in order to determine their relations, overlaps, compatibilities, etc. This is a formidable task, which can only be accomplished by taking careful steps toward fuller and fuller specification.

There exist similar integration efforts, such as the IEEE PASC Committee (POSIX) or the X/Open Company. While it will be possible to build upon and adapt from these efforts, the development of the MULTEXT/EAGLES LSD Guidelines differs from them in the following ways:


Scope and limitations

The current document addresses primarily the issues of usability and portability, which are seen to be the first and most basic issues to be considered in working toward full development of the MULTEXT/EAGLES LSD Guidelines. Other important concerns, such as compatibility and extensibility, will be taken up in the continuing work of the Tools sub-group.

The MULTEXT/EAGLES LSD Guidelines will cover the following topics:

Because the work of the Tools sub-group is preliminary, this document is subject to the following limitations:

Acronyms and abbreviations

FAQ
Frequently Asked Questions: a compilation of the most frequently asked questions, regularly posted on USENET newsgroups.
LSD
Linguistic Software Development


Copyright (c) Centre National de la Recherche Scientifique, 1995-1996.