- Available Wordnets
Following the announcement of the EuroWordNet databases in the last
issue of the ELRA Newsletter (Vol.4 N.2), we are happy to announce
that the list of EuroWordNet languages has grown. The following wordnets
are now available via ELRA:
| ELRA ref. | Language | Synsets | Word Meanings | Language Internal Relations | Equivalence Relations |
| ELRA-M0015 | English Addition to English WordNet | 16361 | 40588 | 42140 | 0 |
| ELRA-M0016 | Dutch | 44015 | 70201 | 111639 | 53448 |
| ELRA-M0017 | Spanish | 23370 | 50526 | 55163 | 21236 |
| ELRA-M0018 | Italian | 48529 | 48499 | 117068 | 71789 |
| ELRA-M0019 | German | 15132 | 20453 | 34818 | 16347 |
| ELRA-M0020 | French | 22745 | 32809 | 49494 | 22730 |
| ELRA-M0021 | Czech | 12824 | 19949 | 26259 | 12824 |
| ELRA-M0022 | Estonian | 9317 | 13839 | 16318 | 9004 |
- LR(1) Common Components (All Foreground - Data of layer 1)
A. The Inter-Lingual-Index, which is a list of records (ILI-records),
in the form of synsets mainly taken from WordNet1.5 or manually created.
An ILI-record contains:
A.1 synset: set of synonymous words or phrases (mostly from WordNet1.5)
A.2 part-of-speech
A.3 one or more Top-Concept classifications (Optional)
A.4 one or more Domain labels (Optional)
A.5 a gloss in English (mostly from WordNet1.5)
A.6 a unique ID linking the synset to its source (mostly WordNet1.5)
B. Top-Ontology: an ontology of 63 basic semantic classes based on
fundamental distinctions. By means of the Top-Ontology all the wordnets
can be accessed using a single language-independent classification-scheme.
Top-Concepts are only assigned to ILI-records.
C. Domain-ontology: an ontology of subject-domains optionally assigned
to ILI-records.
D. A selection of ILI-records, the so-called Base-Concepts, which play
a major role in the different wordnets. These Base-Concepts form the core
of all the wordnets. All the Base-Concepts are classified in terms of the
Top-Concepts that apply to them.
E. WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in
EuroWordNet format.
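As an illustration of the ILI-record layout listed under A.1 to A.6, the fields could be modelled roughly as follows. This is only a sketch: the class name, field names and the sample values (including the ID `ili-0001`, the gloss and the labels) are hypothetical and do not reflect the actual EuroWordNet import format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ILIRecord:
    """Hypothetical in-memory model of one ILI-record (fields A.1 to A.6)."""
    ili_id: str                  # A.6 unique ID linking the synset to its source
    synset: List[str]            # A.1 synonymous words or phrases
    pos: str                     # A.2 part-of-speech
    gloss: str                   # A.5 gloss in English
    top_concepts: List[str] = field(default_factory=list)  # A.3 (optional)
    domains: List[str] = field(default_factory=list)       # A.4 (optional)


# Illustrative record only; none of these values come from the real index.
example = ILIRecord(
    ili_id="ili-0001",
    synset=["bicycle", "bike"],
    pos="n",
    gloss="a two-wheeled vehicle moved by pedals",
    top_concepts=["Artifact"],
)
```

The two optional classifications default to empty lists, mirroring the fact that Top-Concepts and Domain labels are not present on every record.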
- LR(2) Language-Specific Components (Data of layer 2 - partly Foreground
and partly Background)
Wordnets produced in the first project (LE2-4003):
F. Dutch wordnet
G. English wordnet (additional relations which are missing in WordNet1.5)
H. Italian wordnet
I. Spanish wordnet
After extension of the project (LE4-8328):
J. German wordnet
K. French wordnet
L. Czech wordnet
M. Estonian wordnet
The specific wordnets are language-internal structures,
minimally containing:
- set of variants or synonyms making up the synset
- part-of-speech
- language-internal relations to other synsets
- equivalence relations with ILI-records
- a unique-id linking the synset to its source
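The minimal synset contents listed above, and the way equivalence relations tie separate wordnets together through the ILI, can be sketched as follows. Everything here is an assumption made for illustration: the class and function names, the relation labels, the IDs and the toy Dutch and Spanish entries are hypothetical and are not taken from the distributed data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Synset:
    """Hypothetical model of one language-specific synset."""
    synset_id: str                      # unique id linking the synset to its source
    variants: List[str]                 # variants or synonyms making up the synset
    pos: str                            # part-of-speech
    # (relation label, target synset id) within the same wordnet
    internal_relations: List[Tuple[str, str]] = field(default_factory=list)
    # (relation label, ILI-record id)
    equivalence_relations: List[Tuple[str, str]] = field(default_factory=list)


def project(synset: Synset, target_wordnet: Dict[str, Synset]) -> List[Synset]:
    """Return the synsets of another wordnet that share an ILI-record with `synset`."""
    ili_ids = {ili_id for _, ili_id in synset.equivalence_relations}
    return [
        candidate
        for candidate in target_wordnet.values()
        if ili_ids & {ili_id for _, ili_id in candidate.equivalence_relations}
    ]


# Toy data, invented for this example.
dutch_fiets = Synset("nl-1", ["fiets", "rijwiel"], "n",
                     equivalence_relations=[("eq_synonym", "ili-0001")])
spanish_wordnet = {
    "es-1": Synset("es-1", ["bicicleta"], "n",
                   equivalence_relations=[("eq_synonym", "ili-0001")]),
    "es-2": Synset("es-2", ["coche"], "n",
                   equivalence_relations=[("eq_synonym", "ili-0002")]),
}
matches = project(dutch_fiets, spanish_wordnet)
```

In this toy setup, `matches` contains only the Spanish synset linked to the same ILI-record as the Dutch one. No word-to-word translation is stored; the link between languages exists only through the shared ILI-record.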
Each wordnet will be distributed with LR1 and will
include documentation on LR1 and the distributed wordnet. All the
data will be distributed as text files in the EuroWordNet import format
and as Polaris database files (see LR3 below). The EuroWordNet viewer
(Periscope, see LR3 below) can be used to access the database version.
A Polaris licence is required to modify and extend the database version.
The wordnets are distributed without:
- glosses
- usage labels
- morpho-syntactic properties
- examples
- word-to-word translations
- LR(3) Software
The multilingual EUROWORDNET Database (partly Foreground, partly
Background) consists of three components:
- The actual wordnets in Flaim database format, an indexing and
compression format from Novell.
- Polaris (Louw 1997): a wordnet editing tool for creating, editing
and exporting wordnets.
- Periscope (Cuypers and Adriaens 1997): a graphical database viewer
for viewing and exporting wordnets.
The Polaris tool is a re-implementation of the Novell
ConceptNet toolkit (Díez-Orzas et al. 1995) adapted to the EuroWordNet
architecture. Polaris can import new wordnets or wordnet fragments
from ASCII files in the correct import format, and it creates an
indexed EUROWORDNET Database. Furthermore, it allows a user to edit
and add relations in the wordnets and to formulate queries. The Polaris
toolkit makes it possible to visualise the semantic relations as a
tree-structure that can be edited directly. These trees can be expanded
and collapsed by clicking on word-meanings and by specifying so-called
TABs, which indicate the kind and depth of relations to be shown.
Expanded trees or sub-trees can be stored as a set of synsets, which
can be manipulated, saved or loaded. Additionally, it is possible
to access the ILI or the ontologies, and to switch between the wordnets
and ontologies via the ILI. Finally, it contains an interface to project
sets of synsets across wordnets.
The Periscope program is a public viewer that can
be used to look at wordnets created by the Polaris tool and to compare
them in a graphical interface. Word meanings can be looked up and
trees can be expanded. Individual meanings or complete branches can
be projected onto another wordnet, or wordnet structures can be compared
via the equivalence relations with the Inter-Lingual-Index. Selected
trees can be exported to text files. The Periscope program cannot
be used for importing or changing wordnets.
N. The Polaris program is partly Background and partly Foreground.
It is the property of Vantage Research and can be licensed as a
EuroWordNet result from Vantage Research (http://www.vantage.com).
O. The Periscope viewer is the property of Vantage Research and is
Foreground.
- Prices
The prices indicated in the tables below are based
on the number of synsets in each language wordnet. Members are offered
a 50% discount on the public price. Each language wordnet has a fixed
number of synsets and cannot be split up.
There are 4 different types of use:
VAR-C = Commercial use
VAR-I = Internal use by a commercial organisation
VAR-E = Evaluation licence (3-month licence)
End-User = Research use by an academic institution
ELRA Member prices (in EURO)
| Language wordnet | Number of synsets | VAR-C | VAR-I | VAR-E | END-USER |
| ELRA-M0015 English Addition | 16,361 | 4090.25 | 2454.15 | 327.22 | 163.61 |
| ELRA-M0016 Dutch | 44,015 | 11003.75 | 6602.25 | 880.30 | 440.15 |
| ELRA-M0017 Spanish | 23,370 | 5842.50 | 3505.50 | 467.40 | 233.70 |
| ELRA-M0018 Italian | 48,529 | 12132.25 | 7279.35 | 970.58 | 485.29 |
| ELRA-M0019 German | 15,132 | 3783.00 | 2269.80 | 302.64 | 151.32 |
| ELRA-M0020 French | 22,745 | 5686.25 | 3411.75 | 454.90 | 227.45 |
| ELRA-M0021 Czech | 12,824 | 3206.00 | 1923.60 | 256.48 | 128.24 |
| ELRA-M0022 Estonian | 9,317 | 2329.25 | 1397.55 | 186.34 | 93.17 |
Non-Member prices (in EURO)
| Language wordnet | Number of synsets | VAR-C | VAR-I | VAR-E | END-USER |
| ELRA-M0015 English Addition | 16,361 | 8180.50 | 4908.30 | 654.44 | 327.22 |
| ELRA-M0016 Dutch | 44,015 | 22007.50 | 13204.50 | 1760.60 | 880.30 |
| ELRA-M0017 Spanish | 23,370 | 11685.00 | 7011.00 | 934.80 | 467.40 |
| ELRA-M0018 Italian | 48,529 | 24264.50 | 14558.70 | 1941.16 | 970.58 |
| ELRA-M0019 German | 15,132 | 7566.00 | 4539.60 | 605.28 | 302.64 |
| ELRA-M0020 French | 22,745 | 11372.50 | 6823.50 | 909.80 | 454.90 |
| ELRA-M0021 Czech | 12,824 | 6412.00 | 3847.20 | 512.96 | 256.48 |
| ELRA-M0022 Estonian | 9,317 | 4658.50 | 2795.10 | 372.68 | 186.34 |
Discount***
| Number of synsets | Discount |
| Above 60,000 cumulated synsets | 5% |
| Above 100,000 cumulated synsets | 10% |
| Above 160,000 cumulated synsets | 20% |
***A discount is offered to both members and non-members
according to the total (cumulated) number of synsets ordered
at one time. The total is calculated by adding up
the number of synsets for each language wordnet purchased. For example,
if you order the English and Dutch wordnets, the total number of synsets
is 16,361 synsets (English) + 44,015 synsets (Dutch) = 60,376 synsets.
Since this total is above 60,000, the corresponding 5% discount is applied.
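The price and discount rules can be worked through with a short script. This is a sketch under stated assumptions, not an official price calculator: the per-synset member rates are not published on this page but are inferred by dividing the member prices by the synset counts (for example 4090.25 / 16,361 = 0.25 EURO for VAR-C), the non-member price is taken as twice the member price, and applying the discount to the order total and rounding to whole cents are guesses about how an order would be invoiced.

```python
from decimal import Decimal, ROUND_HALF_UP

# Synset counts per language wordnet, as listed in the price tables.
SYNSETS = {
    "ELRA-M0015": 16361,  # English Addition
    "ELRA-M0016": 44015,  # Dutch
    "ELRA-M0017": 23370,  # Spanish
    "ELRA-M0018": 48529,  # Italian
    "ELRA-M0019": 15132,  # German
    "ELRA-M0020": 22745,  # French
    "ELRA-M0021": 12824,  # Czech
    "ELRA-M0022": 9317,   # Estonian
}

# ASSUMPTION: member rates in EURO per synset, inferred from the member table.
MEMBER_RATE = {
    "VAR-C": Decimal("0.25"),
    "VAR-I": Decimal("0.15"),
    "VAR-E": Decimal("0.02"),
    "END-USER": Decimal("0.01"),
}

# Discount tiers from the discount table, highest threshold first.
DISCOUNT_TIERS = [
    (160000, Decimal("0.20")),
    (100000, Decimal("0.10")),
    (60000, Decimal("0.05")),
]


def discount_rate(total_synsets):
    """Return the discount for a cumulated synset total (strictly above a tier)."""
    for threshold, rate in DISCOUNT_TIERS:
        if total_synsets > threshold:
            return rate
    return Decimal("0")


def order_price(refs, use, member):
    """Return (total synsets, price in EURO) for wordnets ordered at one time."""
    total_synsets = sum(SYNSETS[ref] for ref in refs)
    rate = MEMBER_RATE[use] * (1 if member else 2)  # non-members pay double
    gross = rate * total_synsets
    net = gross * (1 - discount_rate(total_synsets))
    # ASSUMPTION: the final amount is rounded to whole cents.
    return total_synsets, net.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


total, price = order_price(["ELRA-M0015", "ELRA-M0016"], "END-USER", member=True)
```

For the English plus Dutch example above, ordered by a member for research use, this gives 60,376 synsets and 603.76 EURO before discount, or 573.57 EURO after the 5% discount under these assumptions.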
- Technical support
Technical support may
be provided by members of the consortium. It will be arranged through
bilateral agreements between the User and the member of the consortium
responsible for the data acquired by the User. As an indication, the support
contract will run on a yearly basis and will cost 10-20
KEURO per year.
For more information about the EuroWordNet project: http://www.hum.uva.nl/~ewn
Copyright © 1996-2001 ELRA/ELDA