M0015 : EUROWORDNET

The EUROWORDNET DATA consists of the following modules:

A. Available Wordnets

B. LR(1) Common Components

C. LR(2) Language-Specific Components

D. LR(3) Software

E. Prices

F. Technical support

 

  1. Available Wordnets

    Following the announcement of the EuroWordNet databases in the last issue of the ELRA Newsletter (Vol.4 N.2), we are happy to announce that the list of EuroWordNet languages has grown. The following wordnets are now available via ELRA:

    ELRA ref.    Language                              Synsets   Word Meanings   Language Internal Relations   Equivalence Relations
    ELRA-M0015   English Addition to English WordNet     16361           40588                         42140                       0
    ELRA-M0016   Dutch                                   44015           70201                        111639                   53448
    ELRA-M0017   Spanish                                 23370           50526                         55163                   21236
    ELRA-M0018   Italian                                 48529           48499                        117068                   71789
    ELRA-M0019   German                                  15132           20453                         34818                   16347
    ELRA-M0020   French                                  22745           32809                         49494                   22730
    ELRA-M0021   Czech                                   12824           19949                         26259                   12824
    ELRA-M0022   Estonian                                 9317           13839                         16318                    9004



  2. LR(1) Common Components (All Foreground; data of layer 1)

    A. The Inter-Lingual-Index (ILI): a list of records (ILI-records) in the form of synsets, mainly taken from WordNet1.5 or manually created. An ILI-record contains:

    A.1 synset: set of synonymous words or phrases (mostly from WordNet1.5)
    A.2 part-of-speech
    A.3 one or more Top-Concept classifications (optional)
    A.4 one or more Domain labels (optional)
    A.5 a gloss in English (mostly from WordNet1.5)
    A.6 a unique ID linking the synset to its source (mostly WordNet1.5)
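
As a rough illustration of the six fields listed above, an ILI-record could be modelled as follows. This is only a sketch: the class name, field names and all sample values are hypothetical and do not reflect the actual EuroWordNet import format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ILIRecord:
    """Hypothetical in-memory model of one ILI-record (fields A.1 to A.6)."""
    ili_id: str                  # A.6 unique ID linking the synset to its source
    synset: List[str]            # A.1 synonymous words or phrases
    pos: str                     # A.2 part-of-speech
    gloss: str                   # A.5 gloss in English
    top_concepts: List[str] = field(default_factory=list)  # A.3 (optional)
    domains: List[str] = field(default_factory=list)       # A.4 (optional)


# Placeholder values, invented for illustration only.
record = ILIRecord(
    ili_id="ili-0001",
    synset=["dog", "domestic dog"],
    pos="n",
    gloss="illustrative gloss text",
    top_concepts=["Animal"],
)
print(record.synset, record.domains)
```

The two optional fields default to empty lists, which mirrors the "optional" marking of A.3 and A.4.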

    B. Top-Ontology: an ontology of 63 basic semantic classes based on fundamental distinctions. By means of the Top-Ontology all the wordnets can be accessed using a single language-independent classification scheme. Top-Concepts are only assigned to ILI-records.

    C. Domain-ontology: an ontology of subject domains optionally assigned to ILI-records.

    D. A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets. These Base-Concepts form the core of all the wordnets. All the Base-Concepts are classified in terms of the Top-Concepts that apply to them.

    E. WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.



  3. LR(2) Language-Specific Components (Data of layer 2; partly Foreground and partly Background)

    Wordnets produced in the first project (LE2-4003):

    F. Dutch wordnet
    G. English wordnet (additional relations which are missing in WordNet1.5)
    H. Italian wordnet
    I. Spanish wordnet

    After extension of the project (LE4-8328):

    J. German wordnet
    K. French wordnet
    L. Czech wordnet
    M. Estonian wordnet

    The specific wordnets are language-internal structures, minimally containing:

    • set of variants or synonyms making up the synset
    • part-of-speech
    • language-internal relations to other synsets
    • equivalence relations with ILI-records
    • a unique-id linking the synset to its source
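
The minimal contents listed above can be pictured as a small record type. The sketch below is an assumption-laden illustration: the class, its field names, the relation labels and the identifiers are all hypothetical and are not taken from the EuroWordNet import format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Synset:
    """Hypothetical model of one language-specific synset."""
    synset_id: str       # unique id linking the synset to its source
    variants: List[str]  # variants or synonyms making up the synset
    pos: str             # part-of-speech
    # (relation label, target synset id) pairs within the same wordnet
    internal_relations: List[Tuple[str, str]] = field(default_factory=list)
    # (relation label, ILI-record id) pairs pointing into the ILI
    equivalence_relations: List[Tuple[str, str]] = field(default_factory=list)

    def ili_ids(self) -> List[str]:
        """Return the ILI-record ids this synset is linked to."""
        return [ili_id for _, ili_id in self.equivalence_relations]


# Placeholder identifiers and relation labels, for illustration only.
hond = Synset(
    synset_id="nl-0001",
    variants=["hond"],
    pos="n",
    internal_relations=[("has_hyperonym", "nl-0002")],
    equivalence_relations=[("eq_synonym", "ili-0001")],
)
print(hond.ili_ids())  # ['ili-0001']
```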

    Each wordnet is distributed together with LR(1) and includes documentation on both LR(1) and the wordnet itself. All data are distributed as text files in the EuroWordNet import format and as Polaris database files (see LR(3) below). The EuroWordNet viewer (Periscope, see LR(3) below) can be used to access the database version. A Polaris licence is required to modify or extend the database version.

    The wordnets are distributed without:

    • glosses
    • usage labels
    • morpho-syntactic properties
    • examples
    • word-to-word translations

     

  4. LR(3) Software

    The multilingual EUROWORDNET Database (partly Foreground, partly Background) consists of three components:

    • The actual wordnets in Flaim database format: an indexing and compression format of Novell.
    • Polaris (Louw 1997): a wordnet editing tool for creating, editing and exporting wordnets.
    • Periscope (Cuypers and Adriaens 1997): a graphical database viewer for viewing and exporting wordnets.

    The Polaris tool is a re-implementation of the Novell ConceptNet toolkit (Díez-Orzas et al. 1995) adapted to the EuroWordNet architecture. Polaris can import new wordnets or wordnet fragments from ASCII files in the correct import format, and it creates an indexed EUROWORDNET Database. Furthermore, it allows a user to edit and add relations in the wordnets and to formulate queries. The Polaris toolkit makes it possible to visualise the semantic relations as a tree structure that can be edited directly. These trees can be expanded and collapsed by clicking on word meanings and by specifying so-called TABs, which indicate the kind and depth of relations to be shown. Expanded trees or sub-trees can be stored as a set of synsets, which can be manipulated, saved or loaded. Additionally, it is possible to access the ILI or the ontologies, and to switch between the wordnets and ontologies via the ILI. Finally, it contains an interface to project sets of synsets across wordnets.

    The Periscope program is a public viewer that can be used to look at wordnets created by the Polaris tool and to compare them in a graphical interface. Word meanings can be looked up and trees can be expanded. Individual meanings or complete branches can be projected on another wordnet or wordnet structures can be compared via the equivalence relations with the Inter-Lingual-Index. Selected trees can be exported to text files. The Periscope program cannot be used for importing or changing wordnets.
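
The idea of projecting a selection of synsets onto another wordnet through the Inter-Lingual-Index can be sketched in a few lines. This is not the Polaris or Periscope implementation, which is not described here; the function name, the dictionary layout and all identifiers are hypothetical.

```python
from typing import Dict, Set


def project(selection: Set[str],
            source_eq: Dict[str, Set[str]],
            target_eq: Dict[str, Set[str]]) -> Set[str]:
    """Project source synsets onto a target wordnet via shared ILI-records.

    Both *_eq arguments map a synset id to the set of ILI-record ids it is
    linked to by equivalence relations (an assumed, simplified layout).
    """
    # Collect every ILI-record reachable from the selected source synsets.
    ili_ids: Set[str] = set()
    for synset_id in selection:
        ili_ids |= source_eq.get(synset_id, set())
    # Keep the target synsets linked to at least one of those ILI-records.
    return {t for t, ilis in target_eq.items() if ilis & ili_ids}


# Invented identifiers, for illustration only.
dutch_eq = {"nl-1": {"ili-10"}, "nl-2": {"ili-11", "ili-12"}}
spanish_eq = {"es-7": {"ili-10"}, "es-8": {"ili-12"}, "es-9": {"ili-99"}}
print(sorted(project({"nl-1", "nl-2"}, dutch_eq, spanish_eq)))  # ['es-7', 'es-8']
```

Synsets with no equivalence relation (such as all of the English Addition, which lists 0 equivalence relations) would simply project to nothing in this simplified model.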

    N. The Polaris program is partly Background and partly Foreground. It is the property of Vantage Research and can be licensed as a EuroWordNet result from Vantage Research (http://www.vantage.com).

    O. The Periscope viewer is the property of Vantage Research and is Foreground.

     

  5. Prices

    The prices indicated in the tables below are based on the number of synsets in each language wordnet. Members are offered a 50% discount on the public price. Each language wordnet contains a fixed number of synsets and is sold as a whole; its synsets are not divisible.

    There are four types of use:

    VAR-C = Commercial use
    VAR-I = Internal use by a commercial organisation
    VAR-E = Evaluation licence (3-month licence)
    End-User = Research use by an academic institution

    ELRA Member prices (in EURO)

    Language wordnet              Number of synsets      VAR-C      VAR-I     VAR-E   END-USER
    ELRA-M0015 English Addition              16,361    4090.25    2454.15    327.22     163.61
    ELRA-M0016 Dutch                         44,015   11003.75    6602.25    880.30     440.15
    ELRA-M0017 Spanish                       23,370    5842.50    3505.50    467.40     233.70
    ELRA-M0018 Italian                       48,529   12132.25    7279.35    970.58     485.29
    ELRA-M0019 German                        15,132    3783.00    2269.80    302.64     151.32
    ELRA-M0020 French                        22,745    5686.25    3411.75    454.90     227.45
    ELRA-M0021 Czech                         12,824    3206.00    1923.60    256.48     128.24
    ELRA-M0022 Estonian                       9,317    2329.25    1397.55    186.34      93.17



    Non-Member prices (in EURO)

    Language wordnet              Number of synsets      VAR-C      VAR-I     VAR-E   END-USER
    ELRA-M0015 English Addition              16,361    8180.50    4908.30    654.44     327.22
    ELRA-M0016 Dutch                         44,015   22007.50   13204.50   1760.60     880.30
    ELRA-M0017 Spanish                       23,370   11685.00    7011.00    934.80     467.40
    ELRA-M0018 Italian                       48,529   24264.50   14558.70   1941.16     970.58
    ELRA-M0019 German                        15,132    7566.00    4539.60    605.28     302.64
    ELRA-M0020 French                        22,745   11372.50    6823.50    909.80     454.90
    ELRA-M0021 Czech                         12,824    6412.00    3847.20    512.96     256.48
    ELRA-M0022 Estonian                       9,317    4658.50    2795.10    372.68     186.34



    Discount***

    Number of synsets                  Discount
    Above 60,000 cumulated synsets     5%
    Above 100,000 cumulated synsets    10%
    Above 160,000 cumulated synsets    20%

    ***A discount is offered to both members and non-members according to the total (cumulated) number of synsets ordered at one time. The total is calculated by adding up the number of synsets of each language wordnet purchased. For example, if you order the English and Dutch wordnets, the total number of synsets is 16,361 (English) + 44,015 (Dutch) = 60,376. In this case, the corresponding 5% discount is applied.
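
The pricing rules can be worked through in a short script. It is a sketch under several assumptions that are not stated on this page: the per-synset rates are inferred from the non-member table (for instance 8180.50 / 16,361 = 0.50 for VAR-C), the volume discount is applied after the member reduction, and the result is rounded to cents. The function and variable names are hypothetical.

```python
from decimal import Decimal

# Synset counts per language wordnet, copied from the price tables.
SYNSETS = {
    "ELRA-M0015": 16361, "ELRA-M0016": 44015, "ELRA-M0017": 23370,
    "ELRA-M0018": 48529, "ELRA-M0019": 15132, "ELRA-M0020": 22745,
    "ELRA-M0021": 12824, "ELRA-M0022": 9317,
}

# Non-member price per synset in EURO (assumption: inferred from the tables).
RATE = {
    "VAR-C": Decimal("0.50"), "VAR-I": Decimal("0.30"),
    "VAR-E": Decimal("0.04"), "END-USER": Decimal("0.02"),
}

# Volume discount tiers, highest first; thresholds are strict ("above").
TIERS = [(160000, Decimal("0.20")), (100000, Decimal("0.10")),
         (60000, Decimal("0.05"))]


def volume_discount(total_synsets: int) -> Decimal:
    """Return the discount fraction for a cumulated number of synsets."""
    for threshold, discount in TIERS:
        if total_synsets > threshold:
            return discount
    return Decimal("0")


def order_total(refs, use, member=False):
    """Return (cumulated synsets, price in EURO) for one order."""
    total_synsets = sum(SYNSETS[ref] for ref in refs)
    price = total_synsets * RATE[use]
    if member:
        price = price / 2  # members pay 50% of the public price
    price = price * (1 - volume_discount(total_synsets))
    return total_synsets, price.quantize(Decimal("0.01"))


# The example from the text: English + Dutch reaches the 5% tier.
print(order_total(["ELRA-M0015", "ELRA-M0016"], "END-USER"))
```

For the English and Dutch order this gives 60,376 cumulated synsets and, under the assumptions above, a non-member End-User price of 1147.14 EURO (1207.52 less 5%).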

     

  6. Technical support

    Technical support may be provided by members of the consortium. It will be arranged through bilateral agreements between the User and the consortium member responsible for the data acquired by the User. As an indication, the support contract will be on a yearly basis and will cost 10-20 KEURO per year.

    For more information about the EuroWordNet project: http://www.hum.uva.nl/~ewn



    Copyright © 1996-2001 ELRA/ELDA