EuroWordNet data
To maximize the uniform encoding of the wordnets, we have
classified the Base Concepts using a Top Ontology that was specifically designed
for this purpose. The Top Ontology is based on existing linguistic classifications
and is adapted to represent the diversity of the Base Concepts. It is important
to realize that the Top Concepts represent semantic features that can be applied
either disjunctively or conjunctively. In the latter case it is
possible to get complex clusters of features, such as: Container+Part+Object+Natural,
which could apply to "seed case". See the BC
Clustering Overview Image for some examples of how BCs can be clustered.
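The conjunctive use of Top Concepts can be pictured as a set of features attached to one Base Concept. The following is a minimal sketch, not part of the EuroWordNet distribution: the function names are hypothetical, the "+" separator is taken from the example above, and "Artifact" is used only as an illustrative contrasting label.

```python
from typing import FrozenSet, Iterable


def parse_cluster(label: str) -> FrozenSet[str]:
    """Split a '+'-joined cluster label into a set of Top Concept names."""
    return frozenset(part.strip() for part in label.split("+") if part.strip())


def has_all(cluster: FrozenSet[str], required: Iterable[str]) -> bool:
    """Conjunctive test: every required feature is in the cluster."""
    return set(required) <= cluster


def has_any(cluster: FrozenSet[str], options: Iterable[str]) -> bool:
    """Disjunctive test: at least one of the options is in the cluster."""
    return bool(set(options) & cluster)


# The cluster quoted above for "seed case".
seed_case = parse_cluster("Container+Part+Object+Natural")
```

Treating a cluster as a set makes the order of the features irrelevant, so "Natural+Object" and "Object+Natural" would compare as equal.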
The first level of the Top Ontology is divided into three
types:
-
1stOrderEntity (roughly corresponding to concrete, perceivable
objects and substances)
-
2ndOrderEntity (states, situations and events)
-
3rdOrderEntity (mental entities such as ideas, concepts,
knowledge)
For a further explanation of the Top Ontology and the Base
Concept selection see Deliverable D017d034d036, which is
available as a zipped RTF or zipped
PostScript file. The data is available as:
-
Top Concept Ontology: 64 Top Concepts linked to the Inter-Lingual-Index
-
EuroWordNet import file:
this format can be loaded in the Polaris
tool to create a EuroWordNet database.
-
EuroWordNet database format: EuroWordNet database created
from the previous file.
-
Included in the EuroWordNet database.
-
The common Base Concepts and their classifications in terms
of the Top Ontology. The Base Concepts are specified in terms of WordNet1.5
synsets (identified by their file offset position+pos, e.g. 00123456-n).
The classification in terms of Top Concepts is available in 2 formats:
-
Flat ASCII files for the 1stOrderEntities,
2ndOrderEntities and 3rdOrderEntities:
listing the Base Concepts and the cluster of Top Concepts that applies.
Additional information is provided in the form of a domain label, a gloss,
the hyperonym in WordNet1.5 and one member of the synset (the sense
numbers do not correspond to the sense numbers in the WordNet1.5 database).
Fields are separated by TABs. A fixed number of fields is provided per
line: 9 for 1stOrderEntities and 22 for 2ndOrderEntities and 3rdOrderEntities.
-
Flat ASCII files for the 1stOrderEntities,
2ndOrderEntities and 3rdOrderEntities:
listing the TopConcept combinations that occurred followed by all Base
Concepts that belong to these clusters.
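As a rough illustration of how the first flat-file format could be read, the sketch below splits a TAB-separated line and checks it against the fixed field counts stated above. It also parses a synset identifier of the form offset+pos. The function names and dictionary keys are hypothetical, the order of the fields is deliberately not assumed, and the pattern (eight digits, a hyphen, one lowercase letter) is inferred only from the example 00123456-n.

```python
import re
from typing import List, Tuple

# Assumed shape of a synset identifier, based on the example "00123456-n".
SYNSET_ID = re.compile(r"^(\d{8})-([a-z])$")

# Field counts per line as stated in the text; the keys are illustrative names.
EXPECTED_FIELDS = {
    "1stOrderEntity": 9,
    "2ndOrderEntity": 22,
    "3rdOrderEntity": 22,
}


def parse_synset_id(text: str) -> Tuple[int, str]:
    """Return (file offset, part of speech) for an identifier like '00123456-n'."""
    match = SYNSET_ID.match(text)
    if match is None:
        raise ValueError(f"not a synset identifier: {text!r}")
    return int(match.group(1)), match.group(2)


def split_record(line: str, entity_type: str) -> List[str]:
    """Split one TAB-separated line and verify the fixed number of fields."""
    # Strip only the line terminator so that trailing empty fields are kept.
    fields = line.rstrip("\r\n").split("\t")
    expected = EXPECTED_FIELDS[entity_type]
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, found {len(fields)}")
    return fields
```

Checking the field count on every line is a cheap way to notice a truncated download or a file that was re-saved with spaces instead of TABs.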
In addition to classifying the 1024 Common Base Concepts,
we have also constructed a reduced set of 164 Core Base Concepts
that occur in 3 or more wordnets as important meanings. They can be accessed
separately. They are listed as WordNet1.5 synsets with glosses, their WordNet1.5
hyperonym and the EuroWordNet Top Concepts that have been applied. The links
in the list open either the EuroWordNet Top Ontology or the WordNet1.5
hyponymy tree. Finally, we have reduced the 164 Core Base Concepts
to 71 Base Types. The reduction involved removing unbalanced hyponyms
(when both the hyperonym and hyponym are present but not other co-hyponyms)
and replacing closely related synsets (e.g. act and action)
by a single Type. The Base Types can be seen as a minimal list of fundamental
concepts (semantic primitives or taxonomy tops). For each Base Type we
have provided the mapping to the Core Base Concepts that it represents.
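The first reduction step, removing unbalanced hyponyms, can be sketched as follows. This is one possible reading of the rule, not the procedure actually used in the project: a concept is dropped when its hyperonym is in the set and no co-hyponym under that hyperonym is. The function name and the toy concept names are hypothetical, and the merging of closely related synsets is not covered.

```python
from collections import defaultdict
from typing import Dict, Set


def remove_unbalanced_hyponyms(
    concepts: Set[str], hyperonym_of: Dict[str, str]
) -> Set[str]:
    """Drop hyponyms whose hyperonym is present but whose co-hyponyms are not."""
    # Group the concepts in the set by their hyperonym, when it is also in the set.
    present_hyponyms = defaultdict(set)
    for concept in concepts:
        parent = hyperonym_of.get(concept)
        if parent in concepts:
            present_hyponyms[parent].add(concept)

    # A hyponym is unbalanced when it is the only one present under its hyperonym.
    unbalanced = {
        next(iter(kids))
        for kids in present_hyponyms.values()
        if len(kids) == 1
    }
    return concepts - unbalanced


# Toy data, invented for illustration only.
toy_concepts = {"animal", "dog", "plant", "tree", "flower"}
toy_hyperonyms = {"dog": "animal", "tree": "plant", "flower": "plant"}
reduced = remove_unbalanced_hyponyms(toy_concepts, toy_hyperonyms)
```

In the toy data "dog" is removed because it is the only hyponym of "animal" in the set, while "tree" and "flower" are kept because they balance each other under "plant".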
The top-ontology work is related to other semantic projects.