Find out how Europeana performs semantic enrichment and how you can enrich your metadata with linked open vocabularies.
AUTOMATIC SEMANTIC ENRICHMENT AT EUROPEANA
Europeana enriches its data providers’ metadata by automatically linking text strings found in the metadata to controlled terms from Linked Open datasets or vocabularies. This process of “augmenting” the source metadata with additional terms is called semantic enrichment.
The enrichment process can be summarised in two main steps:
Matching the metadata of Europeana cultural heritage (CH) objects to external semantic data, which results in links between these objects and resources from external datasets. For example, an object can be automatically enriched with the concept of “Costume” from the DBpedia dataset.
Following the created links to additional data, such as translated labels or broader concepts. In the example given above, this means that the record is supplemented with all the translated labels of the DBpedia concept, as well as with a link to the broader DBpedia concept “Fashion” and all its translated labels.
The target dataset depends on the type of entity. For instance, Europeana enriches places with GeoNames, while person names and concepts are enriched with DBpedia.
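The two steps can be illustrated with a minimal Python sketch. It is not Europeana's implementation: the in-memory `VOCABULARY`, the helper names `match_terms` and `enrich`, the sample record and the non-English labels are all assumptions made for illustration, and the matching is a plain case-insensitive comparison of strings against labels.

```python
# Toy in-memory vocabulary. URIs and labels are illustrative, not live DBpedia data.
VOCABULARY = {
    "http://dbpedia.org/resource/Costume": {
        "labels": {"en": "Costume", "de": "Kostüm", "fr": "Costume"},
        "broader": "http://dbpedia.org/resource/Fashion",
    },
    "http://dbpedia.org/resource/Fashion": {
        "labels": {"en": "Fashion", "de": "Mode", "fr": "Mode"},
        "broader": None,
    },
}


def match_terms(values):
    """Step 1: link metadata strings to concept URIs by case-insensitive label match."""
    index = {}
    for uri, concept in VOCABULARY.items():
        for label in concept["labels"].values():
            index.setdefault(label.casefold(), set()).add(uri)
    links = set()
    for value in values:
        links |= index.get(value.strip().casefold(), set())
    return sorted(links)


def enrich(record):
    """Step 2: follow each link to collect translated labels and the broader concept."""
    enrichments = []
    for uri in match_terms(record.get("subject", [])):
        concept = VOCABULARY[uri]
        broader = concept["broader"]
        enrichments.append({
            "concept": uri,
            "labels": concept["labels"],
            "broader": broader,
            "broader_labels": VOCABULARY[broader]["labels"] if broader else {},
        })
    # Return a copy so the source metadata is left untouched.
    return {**record, "enrichments": enrichments}


# Hypothetical source record with a free-text subject.
record = {"title": "Evening dress", "subject": ["costume"]}
enriched = enrich(record)
```

After the call, `enriched["enrichments"]` holds one entry that carries the concept URI, its labels per language, and the URI and labels of the broader concept, while the original `subject` string is preserved.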
HELP EUROPEANA SEMANTIC ENRICHMENT BY ENRICHING YOUR OWN METADATA
The Europeana Data Model (EDM) supports contextual resources, the so-called ‘semantic layer’, including concepts from ‘value vocabularies’ such as thesauri, authority lists and classifications, coming either from the network of Europeana’s providers or from third-party data sources. Data providers are therefore strongly encouraged to include links to open and multilingual vocabularies in the metadata they send to Europeana, following the EDM recommendations for metadata on contextual resources.
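As a rough illustration of what such a link can look like, the sketch below builds a small RDF/XML fragment in which an object points to a vocabulary concept through `dc:subject`, with the concept described as a `skos:Concept` carrying multilingual labels. The `example.org` URIs, the labels and the helper `build_record` are hypothetical, and the fragment is deliberately incomplete: it omits other parts of an EDM record, so the EDM documentation remains the reference for what is actually required.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
# Register prefixes so the serialised XML uses readable names instead of ns0, ns1...
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(prefix, local):
    """Return the namespace-qualified tag or attribute name ElementTree expects."""
    return "{%s}%s" % (NS[prefix], local)


def build_record(object_uri, concept_uri, labels):
    """Build an object that links to a concept, plus the concept's own description."""
    root = ET.Element(q("rdf", "RDF"))
    cho = ET.SubElement(root, q("edm", "ProvidedCHO"), {q("rdf", "about"): object_uri})
    # The link: a URI reference rather than a plain text string.
    ET.SubElement(cho, q("dc", "subject"), {q("rdf", "resource"): concept_uri})
    concept = ET.SubElement(root, q("skos", "Concept"), {q("rdf", "about"): concept_uri})
    for lang, text in sorted(labels.items()):
        label = ET.SubElement(concept, q("skos", "prefLabel"), {XML_LANG: lang})
        label.text = text
    return root


# Hypothetical URIs and labels, for illustration only.
record = build_record(
    "http://example.org/object/123",
    "http://example.org/vocab/costume",
    {"en": "Costume", "fr": "Costume"},
)
xml_text = ET.tostring(record, encoding="unicode")
```

The point of the design is that the subject is expressed as a URI, so any service that can resolve that URI can retrieve further labels and relations for the concept.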
Europeana has developed a small tool that ‘dereferences’ these URIs, i.e. fetches all the multilingual and semantic data published as Linked Open Data for vocabulary concepts and other contextual resources on third-party services. Europeana currently dereferences several vocabularies from internationally established initiatives or more specific projects, which you can use as well. The vocabulary mappings to EDM and the configuration files used for dereferencing are available on GitHub. If you would like to have your own Linked Open Data vocabulary dereferenced, please mention it to your Europeana contact.
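To make the idea of dereferencing concrete, here is a minimal sketch using only the Python standard library. It is not Europeana's tool: the functions `build_request`, `fetch_rdf` and `extract_concept`, the `example.org` URIs and the sample RDF are assumptions for illustration. It assumes the vocabulary server answers an HTTP request for `application/rdf+xml` and describes its concepts with `skos:prefLabel` and `skos:broader`; real vocabularies may use other properties, which is what the mappings to EDM account for.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SKOS = "{http://www.w3.org/2004/02/skos/core#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_request(uri):
    """Ask the server for RDF/XML through HTTP content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(uri, timeout=10):
    """Dereference the URI and return the raw RDF/XML bytes (needs network access)."""
    with urllib.request.urlopen(build_request(uri), timeout=timeout) as response:
        return response.read()


def extract_concept(rdf_xml, uri):
    """Collect the labels per language and the broader concepts described for `uri`."""
    root = ET.fromstring(rdf_xml)
    result = {"labels": {}, "broader": []}
    for node in root.iter():
        if node.get(RDF + "about") != uri:
            continue
        for label in node.findall(SKOS + "prefLabel"):
            result["labels"][label.get(XML_LANG, "")] = (label.text or "").strip()
        for broader in node.findall(SKOS + "broader"):
            target = broader.get(RDF + "resource")
            if target:
                result["broader"].append(target)
    return result


# Hypothetical response, standing in for what fetch_rdf() might return.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/vocab/costume">
    <skos:prefLabel xml:lang="en">Costume</skos:prefLabel>
    <skos:prefLabel xml:lang="de">Kostüm</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/vocab/fashion"/>
  </skos:Concept>
</rdf:RDF>"""

concept = extract_concept(SAMPLE_RDF, "http://example.org/vocab/costume")
```

Against a live vocabulary, `extract_concept(fetch_rdf(uri), uri)` would replace the sample string, provided the server supports this form of content negotiation.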
SELECTING TARGET DATASETS FOR AUTOMATIC SEMANTIC ENRICHMENT
Selecting the datasets to enrich with is a crucial step in improving the quality of both the enrichment and the overall metadata. We recommend the following steps during the selection:
Analyse the source data: a good knowledge of the source data in terms of topic coverage, gaps and quality issues is necessary before selecting an enrichment target.
Identify the enrichment requirements: before performing an enrichment, the enricher should already have defined the expected results. For instance, an enrichment could be performed to improve the overall quality of a dataset. In this case, the quality issues to be fixed should be identified before performing the enrichment.
Find datasets available on the Web. Several inventories are available to help enrichers find enrichment targets.
Select the enrichment targets. Before selecting a target, the enricher has to evaluate potential targets. We have identified criteria that can be used to evaluate targets against the source data.
Availability and Access. We recommend selecting targets that are available on the Web and compliant with the Linked Data recipes. These targets should be properly documented and usable under an open licence.
Granularity and Coverage. The enricher should select targets that have the same coverage as the source data or that can complement the source data. Coverage of several languages is highly desirable.
Connectivity. We recommend selecting well-connected targets with incoming and outgoing equivalence links to other targets.
Test the selected target on a sample of source data. Once the target is selected, it should be tested on a sample of data before being applied to the whole dataset. A test makes it possible to verify that the target really covers the source data and that it does not introduce semantic ambiguities.
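The final test step can be sketched as a small coverage and ambiguity check. This is one possible way to run such a test, not a prescribed method: the function `evaluate_target`, the `example.org` URIs and the sample values are hypothetical, and matching is again a simple case-insensitive comparison against the target's labels. A value that matches more than one concept is reported as ambiguous.

```python
def evaluate_target(sample_values, target_labels):
    """Report how well a candidate target covers a sample of source values.

    `target_labels` maps each concept URI to the list of its labels.
    """
    index = {}
    for uri, labels in target_labels.items():
        for label in labels:
            index.setdefault(label.casefold(), set()).add(uri)

    # Deduplicate so that repeated values do not distort the ratio.
    values = sorted({v.strip() for v in sample_values if v.strip()})
    matched, ambiguous, unmatched = {}, {}, []
    for value in values:
        hits = index.get(value.casefold(), set())
        if not hits:
            unmatched.append(value)
        elif len(hits) == 1:
            matched[value] = next(iter(hits))
        else:
            ambiguous[value] = sorted(hits)

    total = len(values)
    return {
        "coverage": (len(matched) + len(ambiguous)) / total if total else 0.0,
        "matched": matched,
        "ambiguous": ambiguous,
        "unmatched": unmatched,
    }


# Hypothetical candidate target and sample of source values.
target = {
    "http://example.org/vocab/costume": ["Costume", "Kostüm"],
    "http://example.org/vocab/mercury-planet": ["Mercury"],
    "http://example.org/vocab/mercury-element": ["Mercury", "Quicksilver"],
    "http://example.org/vocab/harpsichord": ["Harpsichord"],
}
report = evaluate_target(["Costume", "Mercury", "Lute", "harpsichord"], target)
```

In this toy sample three of the four values find at least one concept, “Lute” is a coverage gap, and “Mercury” is flagged as ambiguous because two concepts share that label, which is the kind of issue a test on a sample is meant to surface before the whole dataset is enriched.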
The case study “Get your vocabularies in Wikidata...so Europeana and others can get them” describes the practical steps envisioned, based on a recent experiment by Sandra Fauconnier in aligning the MIMO vocabulary with Wikidata.
All texts are CC BY-SA, images and media licensed individually. Europeana Foundation is registered at the Chamber of Commerce under number 27307531, RSIN number is 8186.80.349.