Science cluster
Summary
ONTOLISST combines existing social science ontologies with new algorithms to optimise topic assignment in multilingual contexts. It explores how standard background variables, such as gender or age, can be included in non-thematic metadata. The ultimate goal is to create cost-effective discovery tools in multiple EU languages, improving access to and visibility of social science research for a diverse group of users, from expert audiences to policymakers.
Challenge
Open Science Service
The ONTOLISST project addresses the challenge of how digital European and international social scientific archives construct realities through thematic assignments in their metadata ontologies. Currently, there is a lack of streamlined, multilingual tools to assign topics and concepts across social scientific data from various archives, which impedes data discoverability and accessibility for diverse audiences. The project team's analyses of the ontologies provide a basis for interpreting thematic metadata systems that are often taken for granted, even though they are historically formed products of specific languages and discipline-based practices.
Solution
ONTOLISST proposes to develop the Light Social Science Thesaurus (LiSST), a controlled vocabulary of approximately 100 terms organised into two hierarchical layers. This thesaurus will be used to create a gold standard corpus for automated or semi-automated topic assignment in research data repositories, leveraging annotated samples of social scientific (meta)data in Finnish, English, French, and Hungarian. Additionally, the project investigates how Natural Language Processing (NLP) tools can be employed for efficient topic annotation across languages.
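The two-layer structure described above can be sketched as a small data structure. This is only an illustrative sketch, not the project's implementation: the class names `Term` and `TwoLayerThesaurus`, the method names, and the example terms are all hypothetical placeholders and are not taken from LiSST itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Term:
    label: str                    # preferred label of the term
    parent: Optional[str] = None  # None marks a top-layer term


class TwoLayerThesaurus:
    """Controlled vocabulary restricted to two hierarchical layers."""

    def __init__(self, terms: Iterable[Term]) -> None:
        self._terms: Dict[str, Term] = {}
        for term in terms:
            if term.label in self._terms:
                raise ValueError(f"duplicate term: {term.label}")
            self._terms[term.label] = term
        # Every parent must exist and must itself be a top-layer term.
        for term in self._terms.values():
            if term.parent is None:
                continue
            parent = self._terms.get(term.parent)
            if parent is None:
                raise ValueError(f"unknown parent: {term.parent}")
            if parent.parent is not None:
                raise ValueError(f"{term.label} would sit in a third layer")

    def top_terms(self) -> List[str]:
        """Labels of the first (broadest) layer, sorted alphabetically."""
        return sorted(t.label for t in self._terms.values() if t.parent is None)

    def narrower(self, label: str) -> List[str]:
        """Second-layer labels placed under the given top-layer label."""
        return sorted(t.label for t in self._terms.values() if t.parent == label)

    def unknown_labels(self, labels: Iterable[str]) -> List[str]:
        """Annotation labels that are not part of the vocabulary."""
        return [label for label in labels if label not in self._terms]


# Placeholder terms for illustration only; these are NOT actual LiSST terms.
EXAMPLE = TwoLayerThesaurus([
    Term("Politics"),
    Term("Elections", parent="Politics"),
    Term("Health"),
    Term("Public health", parent="Health"),
])
```

A check such as `EXAMPLE.unknown_labels(...)` is one way an annotation tool could reject topics that fall outside the controlled vocabulary before they enter a gold standard corpus.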
Scientific Impact
- Improving Understanding of Metadata Creation
ONTOLISST explores how scientific traditions and languages shape social science metadata, offering insights into how thematic metadata systems influence research interpretations.
- Enhancing Data Curation Practices
The project simplifies thematic metadata assignment with a multilingual thesaurus and NLP tools, addressing current complexities and improving both manual and automated curation.
- Supporting Semi-Automated Metadata Annotation
By creating a cross-language annotated dataset, ONTOLISST enhances NLP-driven topic assignment, reducing language barriers and improving the discoverability of social science data for both human and machine users.
- Fostering Interoperability across RIs
ONTOLISST supports the standardisation of data curation across domains, enabling smaller research infrastructures (RIs) and non-experts to access and reuse data more efficiently.
- Advancing AI-Driven Curation
The project tests AI tools for metadata curation, promoting cost-effective, open science practices that enhance data accessibility across disciplines and regions.
Principal investigator
Judit Gárdos, the principal investigator of ONTOLISST, is a senior research fellow at the HUN-REN Centre for Social Sciences (CSS) and head of the Research Documentation Centre at CSS.