ONTOLISST project image - photo credits Mariann Kovács

Science cluster:
SSHOC - Social Sciences and Humanities

Summary

ONTOLISST combines existing social science ontologies with new algorithms to optimise topic assignment in multilingual contexts. It explores how standard background variables, such as gender or age, can be included in non-thematic metadata. The ultimate goal is to create cost-effective discovery tools in multiple EU languages, improving access to and visibility of social science research for a diverse group of users, from expert audiences to policymakers.

Research domains:
Social Sciences and Humanities
Partner(s):
HUN-REN Centre for Social Sciences (coordinator), Research Documentation Centre; Fondation Nationale des Sciences Politiques, Center for Socio-political Data; Tampere University, Finnish Social Science Data Archive
Project team member(s):
Alina Danciu, Júlia Egyed-Gergely, Anna Horváth, Jieun Jeong, Éva Kovács, Mari Kleemola, Enikő Meiszterics, Miklós Sebők, Róza Vajda

Challenge

Open Science Service

The ONTOLISST project addresses the challenge of how digital European and international social science archives construct realities through the thematic assignments in their metadata ontologies. There is currently a lack of streamlined, multilingual tools for assigning topics and concepts to social science data from different archives, which limits how easily diverse audiences can discover and access these data. The project team's analyses of the ontologies provide a basis for interpreting thematic metadata systems that are often taken for granted, even though they are products of specific languages and of historically formed, discipline-based practices.

Solution

ONTOLISST proposes to develop the Light Social Science Thesaurus (LiSST), a controlled vocabulary of approximately 100 terms organised into two hierarchical layers. The thesaurus will be used to annotate samples of social science (meta)data in Finnish, English, French, and Hungarian, creating a gold standard corpus for automated or semi-automated topic assignment in research data repositories. Additionally, the project investigates how Natural Language Processing (NLP) tools can be employed for efficient topic annotation across languages.
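To make the idea of a two-layer, multilingual controlled vocabulary more concrete, here is a minimal sketch under stated assumptions. The term identifiers, labels, and the `Term`, `THESAURUS`, and `assign_topics` names are hypothetical and are not actual LiSST content. Languages are keyed by ISO 639-1 codes (`fi`, `en`, `fr`, `hu`). The matching step is a deliberately naive keyword baseline (a word token matches if it starts with a term's label), not the NLP approach the project is investigating. It does show one consequence of the two-layer design: any narrower term that is matched also rolls up to its broader term.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Term:
    """A hypothetical thesaurus concept with one label per language (ISO 639-1 code)."""
    term_id: str
    labels: dict[str, str]
    broader: str | None = None  # parent term id; None for top-layer terms


# Hypothetical two-layer vocabulary: illustrative terms, NOT actual LiSST content.
THESAURUS = {
    "society": Term("society", {"en": "society", "fi": "yhteiskunta", "fr": "société", "hu": "társadalom"}),
    "family": Term("family", {"en": "family", "fi": "perhe", "fr": "famille", "hu": "család"}, broader="society"),
    "migration": Term("migration", {"en": "migration", "fi": "muuttoliike", "fr": "migration", "hu": "migráció"}, broader="society"),
    "politics": Term("politics", {"en": "politics", "fi": "politiikka", "fr": "politique", "hu": "politika"}),
    "elections": Term("elections", {"en": "elections", "fi": "vaalit", "fr": "élections", "hu": "választás"}, broader="politics"),
}


def assign_topics(text: str, lang: str) -> set[str]:
    """Return ids of matched terms plus their broader (top-layer) terms.

    Naive baseline: a term matches if any lowercased word token starts with
    the term's label in `lang` (a crude stand-in for stemming, which helps
    with suffixing languages such as Hungarian or Finnish).
    """
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware for str patterns
    matched: set[str] = set()
    for term in THESAURUS.values():
        label = term.labels.get(lang)
        if label and any(tok.startswith(label.lower()) for tok in tokens):
            matched.add(term.term_id)
            if term.broader:
                matched.add(term.broader)  # roll up to the top layer
    return matched
```

For example, `assign_topics("Családok és migráció Magyarországon", "hu")` yields `{"family", "migration", "society"}`: both narrower terms match, and both roll up to their shared broader term. A real pipeline would replace the prefix match with the lemmatisation or NLP-based classification evaluated against the annotated gold standard corpus.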

Scientific Impact

  • Improving Understanding of Metadata Creation
    ONTOLISST explores how scientific traditions and languages shape social science metadata, offering insights into how thematic metadata systems influence research interpretations.
  • Enhancing Data Curation Practices
    The project simplifies thematic metadata assignment with a multilingual thesaurus and NLP tools, reducing the complexity of current practices and improving both manual and automated curation.
  • Supporting Semi-Automated Metadata Annotation
    By creating a cross-language annotated dataset, ONTOLISST enhances NLP-driven topic assignment, reducing language barriers and improving the discoverability of social science data for both human and machine users.
  • Fostering Interoperability across Research Infrastructures (RIs)
    ONTOLISST supports the standardisation of data curation across domains, enabling smaller RIs and non-experts to access and reuse data more efficiently.
  • Advancing AI-Driven Curation
    The project tests AI tools for metadata curation, promoting cost-effective, open science practices that enhance data accessibility across disciplines and regions.

Keywords
social science ontologies, metadata ontologies, thematic metadata systems, Light Social Science Thesaurus (LiSST), Natural Language Processing
Project start date:
Project duration:
24 months

Principal investigator

Judit Gárdos
HUN-REN Centre for Social Sciences
BIO

Judit Gárdos is a senior research fellow at the HUN-REN Centre for Social Sciences (CSS) and head of the Research Documentation Centre at CSS. She is the principal investigator of ONTOLISST.

QUOTE
"Improving access to and visibility of social science research for a diverse group of users, from expert audiences to policymakers."