ONTOLISST - Thematic ontologies in social science research data

ONTOLISST project image - photo credits Mariann Kovács

Science cluster & challenges

SSHOC - Social Sciences and Humanities

Annotation of data & type of data

Summary

ONTOLISST combines existing social science ontologies with new algorithms to optimise topic assignment in multilingual contexts. It explores how standard background variables, such as gender or age, can be included in non-thematic metadata. The ultimate goal is to create cost-effective discovery tools in multiple EU languages, improving access to and visibility of social science research for a diverse group of users, from expert audiences to policymakers.

Research domains:

Social Sciences and Humanities

Partner(s):

HUN-REN Centre for Social Sciences (coordinator), Research Documentation Centre; Fondation Nationale des Sciences Politiques, Center for Socio-political Data; and Tampere University, Finnish Social Science Data Archive

Project team member(s):

Alina Danciu, Júlia Egyed-Gergely, Anna Horváth, Jieun Jeong, Éva Kovács, Mari Kleemola, Enikő Meiszterics, Miklós Sebők, Róza Vajda

Challenge

Open Science Service

The ONTOLISST project addresses the challenge of how digital European and international social scientific archives construct realities through thematic assignments in their metadata ontologies. Currently, there is a lack of streamlined, multilingual tools to assign topics and concepts across social scientific data from various archives, which impedes data discoverability and accessibility for diverse audiences. The project team's analyses of the ontologies provide a basis for interpreting thematic metadata systems that are often taken for granted, even though they are products of specific languages, historical developments, and discipline-based practices.

Solution

ONTOLISST proposes to develop the Light Social Science Thesaurus (LiSST), a controlled vocabulary of approximately 100 terms organised into two hierarchical layers. This thesaurus will be used to create a gold standard corpus for automated or semi-automated topic assignment in research data repositories, leveraging annotated samples of social scientific (meta)data in Finnish, English, French, and Hungarian. Additionally, the project investigates how Natural Language Processing (NLP) tools can be employed for efficient topic annotation across languages.
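As a rough illustration of the idea, a two-layer controlled vocabulary with multilingual labels could be represented as in the sketch below. The term IDs and labels are hypothetical examples, not actual LiSST terms, and plain substring matching is only a stand-in for the NLP tools the project investigates.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus term with labels keyed by language code."""
    term_id: str
    labels: dict[str, str]
    children: list["Term"] = field(default_factory=list)


# Hypothetical two-layer excerpt; these are NOT actual LiSST terms.
THESAURUS = [
    Term("T1", {"en": "health", "fr": "santé"}, [
        Term("T1.1", {"en": "mental health", "fr": "santé mentale"}),
    ]),
    Term("T2", {"en": "education", "fr": "éducation"}, [
        Term("T2.1", {"en": "higher education", "fr": "enseignement supérieur"}),
    ]),
]


def assign_topics(text: str, lang: str) -> list[str]:
    """Return IDs of terms whose label in `lang` occurs in the text.

    Substring matching is an assumption made for brevity. A matched
    second-layer term also yields its broader top-layer term.
    """
    lowered = text.lower()
    found = []
    for top in THESAURUS:
        # Second-layer terms whose label appears in the text.
        hits = [c.term_id for c in top.children
                if lang in c.labels and c.labels[lang] in lowered]
        # Report the top-layer term if it or any of its children matched.
        if hits or (lang in top.labels and top.labels[lang] in lowered):
            found.append(top.term_id)
        found.extend(hits)
    return found
```

Calling `assign_topics` on a study title together with a language code returns the matching top-layer and second-layer term IDs, which mirrors how annotated samples in several languages could be mapped to one shared vocabulary.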

Scientific Impact

Improving Understanding of Metadata Creation
ONTOLISST explores how scientific traditions and languages shape social science metadata, offering insights into how thematic metadata systems influence research interpretations.
Enhancing Data Curation Practices
The project simplifies thematic metadata assignment with a multilingual thesaurus and NLP tools, addressing current complexities and improving both manual and automated curation.
Supporting Semi-Automated Metadata Annotation
By creating a cross-language annotated dataset, ONTOLISST enhances NLP-driven topic assignment, reducing language barriers and improving the discoverability of social science data for both human and machine users.
Fostering Interoperability across RIs
ONTOLISST supports cross-domain data curation standardisation, enabling smaller RIs and non-experts to access and reuse data more efficiently.
Advancing AI-Driven Curation
The project tests AI tools for metadata curation, promoting cost-effective, open science practices that enhance data accessibility across disciplines and regions.

Publications

Danciu, A., Gárdos, J., & Kleemola, M. (2024, December 3). The ONTOLISST project on DDI metadata, vocabularies and NLP, Presentation at EDDI 2024 Conference. Zenodo, DOI: https://doi.org/10.5281/zenodo.14671273
Kleemola, M. (FSD), Sebők, M. (HUN-REN), & Vajda, R. (HUN-REN) (2025, February 10). The ONTOLISST project and its first results on the use of NLP for automatic topic assignment. Presentation at the ‘Artificial Intelligence and metadata in social science’ webinar. Zenodo, DOI: https://doi.org/10.5281/zenodo.16794477
Bolton, S., Beeken, J., Balkan, L., Saji, A. & Vajda, R. (2025), The European Language Social Science Thesaurus (ELSST): Keeping it FAIR. Presentation at IASSIST 2025 conference, Bristol, UK. Zenodo, DOI: https://doi.org/10.5281/zenodo.15676417
Gárdos, J., Vajda, R. & Venczel, T. (2025, December 2). Semantic Interoperability in Social Science Repositories. Presentation at EDDI 2025 Conference. Zenodo, DOI: https://doi.org/10.5281/zenodo.17802017
Babolcsay, B. (2026, January 8). Using NLP Methods in Social Sciences - Experience and Opportunities Presented through the Lens of the ONTOLISST Project. Zenodo, DOI: https://doi.org/10.5281/zenodo.18183039
Babolcsay, B., Hertrich, C., & Danciu, A. (2026, January 12). The ONTOLISST project. Using NLP for automated topic assignment for SSH research data. Zenodo, DOI: https://doi.org/10.5281/zenodo.18220030
Gárdos, J., Vajda, R. & Venczel, T. (2026, February 18). Standardization vs. Preservation? Supporting Interoperability by Enhancing Thematic Metadata at Social Science Archives, DOI: https://doi.org/10.5281/zenodo.18759736 | https://doi.org/10.2218/ijdc.v20i1.1139

Events

2-6 December, 2024 | Chur, Switzerland - EDDI 2024 Conference
10 February, 2025 | Online - Webinar Artificial Intelligence and metadata in social science | Programme - Video recordings
3-6 June, 2025 | Bristol, UK - IASSIST 2025 Conference | Bolton, S., Beeken, J., Balkan, L., Saji, A. & Vajda, R. (2025). The European Language Social Science Thesaurus (ELSST): Keeping it FAIR. Presentation at IASSIST 2025 conference | DOI: https://doi.org/10.5281/zenodo.15676417
17-20 September, 2025 | Berlin, Germany - DOCAM 2025 Conference, Berlin School of Library and Information Science at Humboldt-Universität zu Berlin | Presentation "Standardizing social science metadata for interoperability, discoverability and accessibility"
25 September, 2025 | Paris, France - Workshop Café de la donnée CDSP organised by ONTOLISST at Sciences Po Paris on generating thematic metadata with AI.
01-05 December, 2025 | Budapest, Hungary - ONTOLISST organised the session ‘Challenges of Interoperability’ at the European DDI Users Conference (EDDI 2025), with two presentations on semantic interoperability and a roundtable discussion led by Róza Vajda (the EDDI conference was organised by ONTOLISST's partner Research Documentation Centre, Centre for Social Sciences, Hungary):
- Danciu, A., Gárdos, J., & Kleemola, M. (2024, December 3). The ONTOLISST project on DDI metadata, vocabularies and NLP. Zenodo | DOI: https://doi.org/10.5281/zenodo.14671273
- Vajda, R., Gárdos, J., & Venczel, T. (2025, December 3). Semantic Interoperability in Social Science Repositories. The 17th Annual European DDI User Conference (EDDI2025), Budapest, Hungary. Zenodo | DOI: https://doi.org/10.5281/zenodo.17802017
10-11 December 2025 | Paris, France - Presentation on the NLP-related results of the project so far by Barbara Babolcsay (poltextLAB), Alina Danciu (CDSP), Chloé Hertrich (CDSP) at the Semaine Data-SHS 2025
16-18 February 2026 | Zagreb, Croatia - 20th International Digital Curation Conference (IDCC) - Conference presentation: Standardisation vs. Preservation? Supporting Interoperability by Enhancing Thematic Metadata at Social Science Archives. DOI: 10.5281/zenodo.18759736.

Keywords

social science ontologies, metadata ontologies, thematic metadata systems, Light Social Science Thesaurus (LiSST), Natural Language Processing

Project start date:

1 December 2024

Project duration:

24 months

Principal investigator

Judit Gárdos

HUN-REN Centre for Social Sciences

BIO

Judit Gárdos is a senior research fellow at the HUN-REN Centre for Social Sciences (CSS) and head of the Research Documentation Centre at CSS. She is the principal investigator of ONTOLISST.

QUOTE

"Improving access to and visibility of social science research for a diverse group of users, from expert audiences to policymakers."

Resources

ONTOLISST Infographics