Bern, 9th March 1999
Genevieve Clavel (SNL); Peter Dale (BL); Magda Heiner-Freiling (DDB); Martin Kunz (DDB); Patrice Landry (SNL); Andrew MacEwan (BL); Max Naudi (BnF); Pat Oddy (BL); Angélique Saget (BnF)
CoBRA+ working group on multilingual subject access
Note: Further tables, appendices, etc. supplement this document. See the links at the foot of this page.
This final report defines the problem of multilingual subject access, summarises the work carried out by the CoBRA+ working group on multilingual subject access from autumn 1997 until February 1999 and its results, identifies and discusses issues to be resolved, and presents a proposal for a prototype to the directors of the institutions concerned. For a summary of results, and the proposal, see ‘CoBRA+ working group on multilingual subject access: proposals for discussion, March 18th 1999’. This report will be distributed to members of the CENL and posted on the GABRIEL website. Genevieve Clavel has compiled it on the basis of the group’s reports, discussions within the group and comments provided by the partners.
For the members of the group and the list of meetings leading up to this study, see Annexe 1.
The question of multilingual access to bibliographic databases affects not only searchers in countries in which several languages are spoken, such as Switzerland, but also all those who search databases containing material in more than one language, which is the case in the majority of scientific or research databases.
The growth of networks means that we can easily access catalogues outside our own immediate circle - in another town, another country, another continent. In doing so we encounter problems concerning not only search interfaces, but also concerning subject access or even author access in another language.
In France for example, each document, independently of the language in which it has been written, is indexed using a French-language subject heading language. Thus, in order to search by subject headings for documents written in English or German, held in the Bibliothèque nationale de France, the researcher from abroad has to master the French language.
In theory, the indexer should be able to analyse a document and assign headings in his/her native language, while the user should be able to search in his/her native language. The language of the document itself should have no influence on the language of the subject heading language used for indexing nor on the language used for searching. (Practically speaking of course, there are restrictions, since there is a limit to the number of languages in which subject headings languages could be maintained and thus in which the user may search.) In the example below, we are concerned with three languages: German, French and English.
If we can imagine a system in which there are equivalents among subject headings in these three languages, the following scenario may be envisaged:
a German-speaking indexer will use German-language subject headings to index all the documents received, regardless of the language in which they are written.
The user may search for these documents by entering subject headings in German, but also in French or in English: thanks to the equivalents that have been established, there is no need to know the other languages or the structure of the other subject heading languages (SHLs).
Ideally, this approach should not be confined to one database, but would allow the different databases to be brought together in a virtual system: an English-speaking user in London should be able to search the database of the Deutsche Bibliothek in Frankfurt using English-language headings, and retrieve documents which have been indexed using the German list of subject headings.
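The scenario described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the group's design: the table layout and the function name `translate_heading` are hypothetical, and the two sample rows reuse equivalents quoted later in this report.

```python
# Hypothetical equivalence table: each row links one concept across SHLs.
# A missing key means that no single-heading equivalent has been established.
EQUIVALENTS = [
    {"LCSH": "Traveling theater", "RAMEAU": "Théâtre ambulant", "SWD": "Wanderbühne"},
    # The report gives a two-authority string, not a single heading, on the SWD side.
    {"LCSH": "Child actors", "RAMEAU": "Enfants acteurs"},
]


def translate_heading(heading, source_shl, target_shl):
    """Return the target-SHL equivalent of a heading, or None if no link exists."""
    for row in EQUIVALENTS:
        if row.get(source_shl) == heading:
            return row.get(target_shl)
    return None


# An English-speaking user searching a catalogue indexed with SWD/RSWK.
print(translate_heading("Traveling theater", "LCSH", "SWD"))  # Wanderbühne
# No single-heading link on the SWD side, so the lookup yields None.
print(translate_heading("Child actors", "LCSH", "SWD"))       # None
```

A search interface built on such a table would pass the translated heading to the target catalogue, and would need a fallback (for example the original heading or a keyword search) when no link exists.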
The problems of creating multilingual thesauri have been widely discussed, though usually confined to a restricted subject field, and a standard exists (ISO 5964) which proposes three approaches to their construction:
If we consider these approaches in the context of subject heading languages in libraries, the following points must be taken into consideration:
It is with this aim in mind that the members of the project group moved closer to the third approach, in a feasibility study on how to offer multilingual subject access using three different subject heading languages (SHL): RAMEAU, SWD/RSWK and LCSH, by establishing links between headings in each language. A similar approach has also been adopted in two major terminology projects, the creation of equivalents between the Art and Architecture Thesaurus and other controlled vocabularies in the same field, and in the Unified Medical Language System (UMLS). It should be emphasised though that while the approach adopted by the group does not correspond exactly to the guidelines in ISO 5964, the solutions proposed by the standard have been studied and used in part in the establishment of the group’s linking methodology.
Although this study was confined to three SHLs, its potential goes well beyond the partner institutions. The three SHLs are applied not only in the partner libraries but are in extensive use in other libraries in France, Germany and Great Britain, as well as in many other English-speaking, French-speaking and German-speaking countries; linking the three SHLs could provide access to millions of documents. In addition, the methodology studied and the approach used were designed from the start to be extendable to other subject heading languages if they proved valid in the restricted context being studied.
The aim of the study was to establish equivalents (or best matches) between RAMEAU, SWD and LCSH by comparing both headings from selected subject areas and the indexing of publications. Its four key elements were:
It was clear from the start that the linking approach would have consequences for future co-operation, the most important being that, should it be possible to establish equivalents among the majority of the headings and should each partner use these links, fully standardised subject access would only be guaranteed in the source language of the catalogue in question - and this access would be different for each institution.
Each document in an institution would not be indexed three times using SWD/RSWK, RAMEAU and LCSH. Instead, an institution would index documents using its own SHL, e.g. for DDB this would be SWD/RSWK, then offer an access based on the other language equivalents from the remaining SHLs i.e. DDB would offer English language and French language access based on the headings found in LCSH and in RAMEAU. However, the juxtaposition of those headings in the other languages would not constitute indexing according to the rules of the other schemes.
While this approach does not correspond to an ‘ideal’ multilingual access as described above, at the same time it has the potential to be more than a simple ‘lead-in’ term and is certainly an improvement on the current situation, in which subject access to these databases is monolingual.
It is important to understand something of the nature of subject indexing and also something about the structure and contents of the three subject heading languages being studied in order to appreciate the problems faced when attempting to use them as a multilingual access tool, as well as to be able to judge the benefits of such an approach.
The three subject heading languages contain a list of headings (a controlled vocabulary of concepts which may be expressed in one or more words), a semantic structure (defined in the authority records for each heading) and a set of rules by which these headings may be combined in a string to describe the contents of a document (the syntax of the language).
are headings in RAMEAU, each with a record in the RAMEAU SHL, while
Acteurs – Formation – France
is a string, constructed according to the RAMEAU syntax rules, that could be assigned to a document: it has no authority record in the SHL.
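The distinction between a heading and a string can be sketched as follows. This is a minimal illustration only: it assumes that each of the three components of the example string is a heading with its own authority record, and the set `AUTHORITY_HEADINGS` and the function names are hypothetical.

```python
# Hypothetical extract of an authority list: headings that have a record.
AUTHORITY_HEADINGS = {"Acteurs", "Formation", "France"}

SEPARATOR = "–"  # separator as printed in the example string


def split_string(indexing_string):
    """Split an indexing string into its component headings."""
    return [part.strip() for part in indexing_string.split(SEPARATOR)]


def has_authority_record(text):
    """True if the text is itself a heading in the (hypothetical) authority list."""
    return text in AUTHORITY_HEADINGS


string = "Acteurs – Formation – France"
components = split_string(string)
print(components)                                         # ['Acteurs', 'Formation', 'France']
print(all(has_authority_record(c) for c in components))  # True
print(has_authority_record(string))                       # False
```

The last line reflects the point made above: the string is built from headings at indexing time and has no record of its own.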
The first tasks carried out by the group were situated at the level of the headings, and aimed to see to what extent links could be made among the three SHLs. Following the creation of multilingual lists of headings, the group then extended the study to take into account the use of these headings within indexing strings, and the consequences for access.
The methodology adopted for the comparison of headings from the different SHLs has been described in detail in the report on Sports (Annexe 2) and only an excerpt of the report will be given here (from ‘Conclusion: summary of methodology and global results’).
The selection criteria for the two fields to be studied (Sports and Theater) were based on a pragmatic approach. Sports was considered to be a relatively uncomplicated field in the sense that it has no major cultural or national bias, while on the other hand it is not too simple, as its terminology is not internationally established (as it is, for example, in some scientific fields). Theater was considered to be a wider and more complex semantic field, enabling the group to test the linking methodology defined and refined in the subject area of Sports. The final methodology, which can be applied to other fields, is defined below:
For details of the results, complete lists of the headings, and further discussion, see the annexed reports on Sports and Theater.
These initial results are very promising, especially if the following points are also taken into consideration:
However, the following difficulty also came to light:
Enfants acteurs = Child actors = Schauspieler + Kind (string composed of two authorities)
However, currently none of the partners systematically create strings in their authority lists (for reasons of list management and also general philosophy), so that if the heading / string matches are removed, the results are less positive. So, either a solution must be found in a structure outside the different systems, or a lower level of equivalence must be accepted, along with the consequences for searching. This point will be considered further in the section on proposals for the future. It is important to consider linking possibilities not only at the theoretical level (headings) but also at the application level (strings) since, as was stated earlier, the ultimate goal of the group is not just to link headings, but to allow users to access documents, typically via subject strings. Indeed, given the nature of heading creation in each SHL (literary warrant) the headings cannot be divorced from their application in indexing. Throughout the work on linking headings in Sports and Theater, it was always possible, and sometimes necessary to refer back to real indexing in bibliographic records in order to check on the meaning of headings, to resolve terminological ambiguities and to confirm some equivalents.
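The heading / string match quoted above (Enfants acteurs = Child actors = Schauspieler + Kind) can be sketched as a link whose target is a tuple of authorities, searched with a Boolean AND. This is an illustration under assumptions, not a structure used by any partner: the record layout, the sample documents and the function name `search` are hypothetical.

```python
# Hypothetical link record: each SHL maps to a tuple of authorities.
# A one-element tuple is a plain heading; a longer tuple stands for a string.
CHILD_ACTORS = {
    "RAMEAU": ("Enfants acteurs",),
    "LCSH": ("Child actors",),
    "SWD": ("Schauspieler", "Kind"),
}

# Hypothetical bibliographic records with the SWD authorities assigned to each.
RECORDS = {
    "doc1": {"Schauspieler", "Kind"},
    "doc2": {"Schauspieler"},
    "doc3": {"Kind", "Theater"},
}


def search(records, authorities):
    """Boolean AND: return ids of records carrying every authority in the tuple."""
    wanted = set(authorities)
    return sorted(doc_id for doc_id, assigned in records.items() if wanted <= assigned)


# A search on "Child actors" expands to both SWD authorities combined with AND.
print(search(RECORDS, CHILD_ACTORS["SWD"]))  # ['doc1']
```

Searching on only one of the two authorities would also return doc2 or doc3, which illustrates why a lower level of equivalence has consequences for searching.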
After the creation of links between headings, it was therefore necessary to go a step further and to analyse the consequences at the level of indexing strings. Initially, it was planned to compare the subject indexing of about 500 recent publications with international imprints. However, given the difficulty of locating publications indexed in all three languages, the group modified the approach:
A comparative analysis was made of the indexing of those titles (in any field) in which original indexing was available in all three SHLs. In addition, a more focused comparison of titles in the subject area of Theater was carried out. This was conducted by asking each library to supply 10 examples of indexing from their current cataloguing in this area which the other libraries could then index using their own system.
There were some inherent difficulties in both approaches. In the first case considered: since the group had only carried out a detailed selection and comparison of headings in the fields of Theater and Sports, there were no general multilingual lists available for the comparison. It was therefore necessary to start by attempting to create links ‘across the board’ for the titles found, without being able to apply the methodology defined in the other subject areas. As a result, some false links may have been made, and others not discovered.
The study confirmed the following points: although the SHLs being studied adhere to the same basic principles, the number of headings and subdivisions which may be combined and the complexity of the strings which may result varies from language to language. The number of strings that may be applied to a document also varies according to the different rules applied. We may say in general that the level of co-ordination and application of rules are closer between RAMEAU and LCSH than between SWD/RSWK and the other two schemes, but not to the extent that they are interchangeable. In addition the number of strings applied to a document may also vary as a result of indexer subjectivity and rule interpretation. Finally, the indexing resulting from the application of the rules within different linguistic and cultural contexts can give rise to variations in the strings assigned to documents: sometimes there is a close similarity
Title = Using and understanding medical statistics
SWD/RSWK Medizinische Statistik
LCSH Medical statistics
RAMEAU Statistique médicale
sometimes apparently less so, since the heading / string equivalents were not studied here
Title = Le baptême dans l’église ancienne
SWD/RSWK Taufe / Frühchristentum / Geschichte 30-325 / Quelle
LCSH Baptism – History – Early church, ca 30-600
RAMEAU Baptême – Histoire – 30-600 (Eglise primitive)
and at times there are extensive differences, though possibly in part because some links had not been identified
Title = Bank guarantees in international trade: the law and practice of independent (first demand) guarantees and standby letters of credit in civil law and common law jurisdictions
SWD/RSWK Bankgarantie / Aussenhandel / Internationales Privatrecht /
LCSH Suretyship and guaranty
LCSH Letters of credit
RAMEAU Garanties à première demande
The group was aware of these different structures and syntaxes at the start of the study, hence the focus of the comparative study, which was to see if equivalents were possible at the SHL level, and then check if those links were reflected usefully in the comparative indexing of actual documents. It was estimated that the overlap in authorities used to index the documents under consideration varied between 29% and 56%.
The second case was concerned only with titles taken from the field of Theater, and so should have presented fewer difficulties. However, the paucity of documents available in the field resulted in the selection of documents which, although in the field of Theater, were outside the scope of the study in several respects: the trilingual selection and comparison of headings was restricted to common nouns, and several of the documents selected required the attribution of personal names, time subdivisions and form subdivisions, none of which were available in a trilingual form. If we restrict the analysis to the 27 titles without these ‘extraneous’ elements, we see that in 23 titles there is at least one equivalence established at the authority level. Trilingual access through linked subject headings is available in many cases, thus validating the trilingual list for Theater:
e.g. Traveling theater = Théâtre ambulant = Wanderbühne
There are also cases of links between a heading in one SHL and more than one heading in another SHL, underlining the findings of the work on the lists:
e.g. Afro-American Theater = Théâtre noir américain = USA + Theater + Schwarze
In addition, the exercise of indexing the titles enabled participants to improve or correct some links which had been established in the trilingual list:
e.g. Puppet theater = Théâtre de marionnettes = Figurentheater (trilingual list)
Puppet theater = Théâtre de marionnettes = Marionettenspiel (derived from indexing)
In general, the indexing comparison shows that while there may be convergence in the headings used (i.e. at the authority level) there is considerably less overlap in the combination of those headings to create strings. This is in part a reflection of indexer subjectivity, but once again underlines the differences in approach defined in the rules and structures of the SHLs themselves. This is an important point for the application of a trilingual list in user searching, and will need to be studied further in the prototype phase.
The group considers that the results of the tests summarised above are sufficiently promising to warrant further investigation of the linking approach, by extending the trilingual lists on the one hand, and by creating a prototype aiming to give a better understanding of the practical application of the linked SHLs, and enabling the following points to be clarified:
It is clear from the initial tests that linking equivalents is a labour-intensive area and that it will take time to cover all fields. It would be useful to identify the 'most-used' headings, which if linked would cover a high percentage of the items indexed, and to work on such sets of headings. The structure of the RAMEAU list and the information system of the Bibliothèque nationale de France enable such information to be established, see for example the list of most-used RAMEAU headings in the theater field produced by BnF (in Theater : results and analysis). At present it is not possible to provide such a list from the SWD file nor from the LCSH file used at the British Library, though in the latter, usage may be deduced from the number of times a heading has been used in the on-line cataloguing system. It would also be productive to study to what degree selection of headings in a field could be automated e.g. using the hierarchical relationships (narrower terms), or by using classification numbers already assigned to headings, according to each SHL classification scheme (for example DDC numbers in the LCSH file used at the BL).
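The idea of working first on the 'most-used' headings can be sketched as follows. This is a minimal illustration with invented data: the items, the headings assigned to them, and the function names `most_used` and `coverage` are all hypothetical, and no partner system is assumed to expose usage data in this form.

```python
from collections import Counter

# Hypothetical usage data: the headings assigned to each indexed item.
ITEM_HEADINGS = {
    "item1": ["Theater"],
    "item2": ["Theater", "Actors"],
    "item3": ["Actors"],
    "item4": ["Puppet theater"],
    "item5": ["Theater"],
}


def most_used(item_headings, n):
    """Return the n headings assigned to the largest number of items."""
    counts = Counter(h for headings in item_headings.values() for h in set(headings))
    return [heading for heading, _ in counts.most_common(n)]


def coverage(item_headings, linked):
    """Share of items that carry at least one linked heading."""
    linked = set(linked)
    hits = sum(1 for headings in item_headings.values() if linked & set(headings))
    return hits / len(item_headings)


top = most_used(ITEM_HEADINGS, 2)
print(top)                           # ['Theater', 'Actors']
print(coverage(ITEM_HEADINGS, top))  # 0.8
```

In this invented sample, linking only the two most-used headings would already give multilingual access to four of the five items.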
It is important to note that RAMEAU headings already contain around 60,000 links to LCSH headings, established by BnF. While these were not established according to the methodology defined by the group, and would need to be verified, they are an essential source of information, and could be used as input for a prototype.
This prototype would of course contain all the results already obtained from the feasibility study i.e. the linked headings (Sports and Theater) and the related bibliographic records in each database. These lists of linked headings could be extended, using the same methodology, to some new headings in other Sports or performing arts areas. But it would be interesting above all to test the above-mentioned "future strategy" in a new field by using the information in an automated process. The following actions are therefore proposed:
The feasibility study aimed at testing the intellectual links among SHLs, and did not plan to investigate the technical aspects surrounding linking and subsequent access to linked headings, or the indexed documents themselves. However, the group discussed the management of a trilingual list and considered the advantages and disadvantages of creating and maintaining a central list against those of managing links within each partner's system.
It was felt initially that if links were maintained in each system, the need for a further layer of management might be eliminated. However, there are several difficulties: first of all if each partner were to maintain links within each SHL, there would be a duplication of effort, and probably complications in trying to agree each link with each partner. In addition, each extra partner introduced would increase the load of creating and maintaining links. Furthermore, in the case of LCSH, the SHL is maintained by the Library of Congress not the British Library and while it is seen as desirable that the LoC participate in the medium to long term, practically speaking the project will need to start independently of this institution. Finally, problems were foreseen in the case of ‘one to many’ links among SHLs (see above, section 3), and subsequent creation of new headings and their corresponding links.
The example below shows the case of the LCSH heading "Jumping", which contains a UF "High jumping", and so may therefore be considered to be equivalent to two different headings in RAMEAU and in SWD
Theoretically, it would also be possible to maintain all the links in one partner's database, but none of the partners who manage an authority list saw this as a feasible solution for their institution.
The group saw greater advantages in the creation of an external system or 'metathesaurus' which would contain for each equivalent a record containing an international identifier, giving the identifier of the heading in the different authority files, and maintaining a link with these. (In the diagram below, Mt = metathesaurus record)
A metathesaurus record would be created for each entry, whether it be monolingual (i.e. a heading without an equivalent in another language), bilingual (a heading with an equivalent in one other SHL), trilingual etc.:
This metathesaurus would enable a flexible management structure: specialists in each institution could have the right to establish equivalents (and the resulting links) at the metathesaurus level, subject to review. The experience of both BnF and DDB in the co-operative management of RAMEAU and SWD in their own countries will be beneficial here.
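A metathesaurus record of the kind described above could be sketched as follows. This is an illustration only and not a proposed format: the class name, the field names and every identifier shown are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetathesaurusRecord:
    """Hypothetical record: one international identifier plus links to local headings."""

    international_id: str
    # Maps an SHL name to the identifier of the heading in that authority file.
    local_ids: dict = field(default_factory=dict)

    def coverage(self):
        """Describe how many SHLs are linked through this record."""
        names = {1: "monolingual", 2: "bilingual", 3: "trilingual"}
        count = len(self.local_ids)
        return names.get(count, f"{count} SHLs")


# All identifiers below are invented for the example.
record = MetathesaurusRecord("MT-0001", {"RAMEAU": "ram-123", "LCSH": "lc-456"})
print(record.coverage())  # bilingual

# A specialist later establishes the SWD equivalent at the metathesaurus level.
record.local_ids["SWD"] = "swd-789"
print(record.coverage())  # trilingual
```

Keeping only identifiers in the record, rather than copies of the headings, leaves each authority file as the single place where a heading is maintained.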
Questions remain concerning the management of see-references and related headings. If they are to take place at a local level, they will remain monolingual, and this will have an impact on searching: the ‘transparent’ parallel approach described above in section 2, will be restricted to preferred terms. If the authority structure is to be duplicated at the metathesaurus level, this has implications for duplicated work and future maintenance.
In addition, the impact of searching headings in the metathesaurus as opposed to searching through strings in the databases needs to be evaluated. It may be that only a Boolean search on headings in databases would be possible, but this needs to be tested in a prototype. This also has implications for the approach described in section 2. It should be noted that none of the automated systems used by the partners currently supports multilingual subject indexing, thesaurus management or searching. If ‘transparent’ multilingual string searching is required, the following points will also need to be studied: presentation of headings in subject indexes (separate indexes, or inter-filed); record display (which headings should be displayed - one language or all); default languages; user interface (visible / invisible switch from one language to another). These questions are especially relevant to SNL since ideally the library wishes to offer users as transparent an approach as possible to multilingual searching.
Whichever data management option is adopted, there are questions of data format and exchange for each partner to be clarified. A UNIMARC authority structure for the metathesaurus could be envisaged, since it is currently the only authority structure to accommodate multilingual links. Conversion programmes to and from UNIMARC from the partner authority formats would be necessary since each partner currently uses a different authority format: BnF uses INTERMARC, DDB uses MAB, BL uses UKMARC and SNL uses USMARC. In addition, the group should study the use of the thesaurus record structure used in the Aquarelle project.
The prototype needs to be tested from the point of view of multilingual searching by the user for bibliographic records within one or more systems, either starting from the metathesaurus and targeting one or more systems, or extending a search in one system through to another in another language via the metathesaurus.
Since users carry out subject searches not only with common nouns, but also with geographic and personal names, the treatment of these and their possible integration needs to be taken into consideration, in co-operation with other work in the same field (e.g. AUTHOR).
In addition, indexer use of the metathesaurus should be investigated to see how it may be used as an aid to indexing, and as an aid to SHL enrichment.
A further point to be considered could be the use of a classification scheme as a mechanism to organise the linking structure.
As mentioned above Aquarelle has used a product that may correspond to the requirements of the group concerning the prototype: it offers an access to heterogeneous databases via a (limited) multilingual thesaurus and Z39.50. The study of the tools used in the project should be a priority for the next stage, particularly in relation to the discussion under the section ‘Prototype’.
The group is aware that the feasibility study raises many questions, which can only be solved by taking the project a stage further i.e. the creation and testing of a prototype. Nonetheless, the group considers that the establishment and use of equivalents in other languages will facilitate access for the user. It will also have advantages for the indexer: the existence of indexing in another language for a document is an aid to indexing for all partners. The creation of links between headings in different languages will increase the ability to make use of work already carried out by a sister institution. Furthermore, the comparison of the lists concerning Sports and Theater resulted in the creation of some headings missing in the lists, and has thus enabled each institution to enrich and improve its own vocabulary, which in turn enriches access for the user and the indexer. In the medium term, the comparison of the lists will improve and enrich each list, and encourage better convergence.
The group therefore encourages the participating institutions to agree to carry out the following:
Comparative analysis of titles indexed using LCSH, RAMEAU and SWD/RSWK
Sports headings methodology
Theater - Results Analysis
The following example tables are downloadable as rtf format files
LCSH headings from selected bibliographic records and RAMEAU / SWD equivalences
RAMEAU headings from selected bibliographic records and LCSH / SWD equivalences
SWD headings from selected bibliographic records and LCSH / RAMEAU equivalences
LCSH Monolingual list (Sports)
TRILINGUAL LIST Track-athletics
Trilingual List (Sports)
Trilingual list: Global results (Theater)
Trilingual list: from RAMEAU list (Theater)
Trilingual list: from LCSH list (Theater)
Trilingual list: from SWD list
THEATER: RAMEAU most-used headings in BN-OPALE online catalogue
Hosted by: Gabriel - Gateway to Europe's National Libraries