NFDI4DS | UHH-SEMS - Publication Details

Giovanni Colavizza

ORCID: 0000-0002-9806-084X

Contact & Profiles

OpenAlex ID: A5005227218

Research Areas

scientometrics and bibliometrics research
Natural Language Processing Techniques
Topic Modeling
Biomedical Text Mining and Ontologies
Semantic Web and Ontologies
Wikis in Education and Collaboration
Complex Network Analysis Techniques
Digital Humanities and Scholarship
Advanced Text Analysis Techniques
Digital and Traditional Archives Management
Research Data Management Practices
Data Quality and Management
Misinformation and Its Impacts
Historical Economic and Social Studies
Handwritten Text Recognition Techniques
Library Science and Information Systems
Art History and Market Analysis
Computational and Text Analysis Methods
Web visibility and informetrics
Open Source Software Innovations
COVID-19 diagnosis using AI
Mathematics, Computing, and Information Processing
Aesthetic Perception and Analysis
Evolutionary Game Theory and Cooperation
Academic Publishing and Open Access

University of Amsterdam
2019-2024

University of Bologna
2023-2024

University of Copenhagen
2024

The Alan Turing Institute
2018-2023

Universidade Nova de Lisboa
2023

University of Lisbon
2023

Berlin State Library
2023

Europeana Foundation
2023

University College Dublin
2023

The citation advantage of linking publications to research data

OPENALEX - Publications

Giovanni Colavizza Iain Hrynaszkiewicz Isla Staden Kirstie Whitaker Barbara McGillivray

Efforts to make research results open and reproducible are increasingly reflected by journal policies encouraging or mandating authors to provide data availability statements. As a consequence of this, there has been a strong uptake of data availability statements in recent literature. Nevertheless, it is still unclear what proportion of these statements actually contain well-formed links to data, for example via a URL or permanent identifier, and whether there is an added value in providing such links. We consider 531,889 articles published by PLOS and BMC,...

10.1371/journal.pone.0230416 article EN cc-by PLoS ONE 2020-04-22

Characterizing in-text citations in scientific articles: A large-scale analysis

Kevin W. Boyack Nees Jan van Eck Giovanni Colavizza Ludo Waltman

10.1016/j.joi.2017.11.005 article EN Journal of Informetrics 2017-12-01

Crypto Art: A Decentralized View

Massimo Franceschet Giovanni Colavizza T V G Smith Blake Finucane Martin Lukas Ostachowski and 4 more

Crypto art is limited-edition digital art, cryptographically registered with a token on a blockchain. Tokens represent a transparent, auditable origin and provenance for a piece of art. Blockchain technology allows tokens to be held and securely traded without the involvement of third parties. Crypto art draws its origins from conceptual art, sharing the immaterial and distributive nature of artworks, the tight blending of artworks and currency, and the rejection of conventional art markets and institutions. The authors propose a collection...

10.1162/leon_a_02003 article EN Leonardo 2020-12-04

Assessing the Impact of OCR Quality on Downstream NLP Tasks

Daniel van Strien Kaspar Beelen Mariona Coll Ardanuy Kasra Hosseini Barbara McGillivray and 1 more

A growing volume of heritage data is being digitized and made available as text via optical character recognition (OCR). Scholars and libraries are increasingly using OCR-generated text for retrieval and analysis. However, the process of creating text through OCR introduces varying degrees of error to the text. The impact of these errors on natural language processing (NLP) tasks has only been partially studied. We perform a series of extrinsic assessment tasks: sentence segmentation, named entity recognition, dependency parsing,...

10.5220/0009169004840496 article EN Proceedings of the 12th International Conference on Agents and Artificial Intelligence 2020-01-01

Archives and AI: An Overview of Current Debates and Future Perspectives

Giovanni Colavizza Tobias Blanke Charles Jeurgens Julia Noordegraaf

The digital transformation is turning archives, both old and new, into data. As a consequence, automation in the form of artificial intelligence techniques is increasingly applied both to scale traditional recordkeeping activities and to experiment with novel ways to capture, organise, and access records. We survey recent developments at the intersection of Artificial Intelligence and archival thinking and practice. Our overview of this growing body of literature is organised through the lenses of the Records Continuum model. We find four broad...

10.1145/3479010 article EN Journal on Computing and Cultural Heritage 2021-12-14

A scientometric overview of CORD-19

Giovanni Colavizza Rodrigo Costas Vincent Traag Nees Jan van Eck Thed N. van Leeuwen and 1 more

As the COVID-19 pandemic unfolds, researchers from all disciplines are coming together and contributing their expertise. CORD-19, a dataset of coronavirus publications, has been made available alongside calls to help mine the information it contains and to create tools to search it more effectively. We analyse the delineation of the publications included in CORD-19 from a scientometric perspective. Based on a comparison with the Web of Science database, we find that CORD-19 provides an almost complete coverage of research on coronaviruses. It not only...

10.1371/journal.pone.0244839 article EN cc-by PLoS ONE 2021-01-07

COVID-19 research in Wikipedia

Giovanni Colavizza

Wikipedia is one of the main sources of free knowledge on the Web. During the first few months of the pandemic, over 5,200 new Wikipedia pages on COVID-19 were created, accumulating 400 million page views by mid-June 2020. At the same time, an unprecedented amount of scientific articles on COVID-19 and the ongoing pandemic has been published online. Wikipedia's content is based on reliable sources, such as scientific literature. Given its public function, it is crucial for Wikipedia to rely on representative results, especially in a time of crisis. We assess the coverage...

10.1162/qss_a_00080 article EN cc-by Quantitative Science Studies 2020-07-21

A principled methodology for comparing relatedness measures for clustering publications

Ludo Waltman Kevin W. Boyack Giovanni Colavizza Nees Jan van Eck

There are many different relatedness measures, based for instance on citation relations or textual similarity, that can be used to cluster scientific publications. We propose a principled methodology for evaluating the accuracy of clustering solutions obtained using these measures. We formally show that the proposed methodology has an important consistency property. The empirical analyses we present are based on publications in the fields of cell biology, condensed matter physics, and economics. Using the BM25 text-based relatedness measure as evaluation...

10.1162/qss_a_00035 article EN cc-by Quantitative Science Studies 2020-03-10

A scientometric overview of CORD-19

Giovanni Colavizza Rodrigo Costas Vincent Traag Nees Jan van Eck Thed N. van Leeuwen and 1 more

As the COVID-19 pandemic unfolds, researchers from all disciplines are coming together and contributing their expertise. CORD-19, a dataset of coronavirus publications, has been made available alongside calls to help mine the information it contains and to create tools to search it more effectively. We analyse the delineation of the publications included in CORD-19 from a scientometric perspective. Based on a comparison with the Web of Science database, we find that CORD-19 provides an almost complete coverage of research on coronaviruses. It not...

10.1101/2020.04.20.046144 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2020-04-20

Quantifying Engagement with Citations on Wikipedia

Tiziano Piccardi Miriam Redi Giovanni Colavizza Robert West

Wikipedia is one of the most visited sites on the Web and a common source of information for many users. As an encyclopedia, Wikipedia was not conceived as a source of original information, but as a gateway to secondary sources: according to Wikipedia's guidelines, facts must be backed up by reliable sources that reflect the full spectrum of views on the topic. Although citations lie at the heart of Wikipedia, little is known about how users interact with them. To close this gap, we built client-side instrumentation for logging all interactions with links...

10.1145/3366423.3380300 article EN 2020-04-20

Wikipedia citations: A comprehensive data set of citations with identifiers extracted from English Wikipedia

Harshdeep Singh Robert West Giovanni Colavizza

Wikipedia's content is based on reliable and published sources. To this date, relatively little is known about what sources Wikipedia relies on, in part because extracting citations and identifying cited sources is challenging. To close this gap, we release Wikipedia Citations, a comprehensive data set of citations extracted from Wikipedia. We extracted 29.3 million citations from 6.1 million English Wikipedia articles as of May 2020, and classified them as being to books, journal articles, or Web content. We were thus able to extract 4.0 million citations to scholarly publications with...

10.1162/qss_a_00105 article EN cc-by Quantitative Science Studies 2021-01-01

An analysis of the effects of sharing research data, code, and preprints on citations

Giovanni Colavizza Lauren Cadwallader Marcel LaFlamme Grégory Dozot Stéphane Lecorney and 2 more

Calls to make scientific research more open have gained traction with a range of societal stakeholders. Open Science practices include but are not limited to the early sharing of results via preprints and openly sharing outputs such as data and code to make research more reproducible and extensible. Existing evidence shows that adopting Open Science practices has effects in several domains. In this study, we investigate whether adopting one or more Open Science practices leads to significantly higher citations for an associated publication, which is one form of academic impact. We use a novel dataset...

10.1371/journal.pone.0311493 article EN cc-by PLoS ONE 2024-10-30

Research Data in Scientific Publications: A Cross-Field Analysis

Puyu Yang Giovanni Colavizza

Data sharing is fundamental to scientific progress, enhancing transparency, reproducibility, and innovation across disciplines. Despite its growing significance, the variability of data-sharing practices across research fields remains insufficiently understood, limiting the development of effective policies and infrastructure. This study investigates the evolving landscape of data-sharing practices, specifically focusing on the intentions behind data release, reuse, and referencing. Leveraging the PubMed open dataset, we developed a model...

10.48550/arxiv.2502.01407 preprint EN arXiv (Cornell University) 2025-02-03

Datasheets for Digital Cultural Heritage Datasets

Henk Alkemade Steven Claeyssens Giovanni Colavizza Nuno Freire Jörg Lehmann and 3 more

Sparked by issues of quality and a lack of proper documentation for datasets, the machine learning community has begun developing standardised processes for establishing datasheets for datasets, with the intent to provide context and information on provenance, purposes, composition, the collection process, recommended uses, or societal biases reflected in training datasets. This approach fits well with practices and procedures established in GLAM institutions, such as collections' descriptions. However, digital cultural heritage datasets...

10.5334/johd.124 article EN cc-by Journal of Open Humanities Data 2023-01-01

The Closer the Better: Similarity of Publication Pairs at Different Cocitation Levels

Giovanni Colavizza Kevin W. Boyack Nees Jan van Eck Ludo Waltman

We investigated the similarities of pairs of articles that are cocited at different cocitation levels: journal, article, section, paragraph, sentence, and bracket. Our results indicate that textual similarity, intellectual overlap (shared references), author overlap (shared authors), and proximity in publication time all rise monotonically as the cocitation level gets lower (from journal to bracket). While the main gain in similarity happens when moving from journal to article cocitation, the other level changes also entail an increase, especially from section to paragraph...

10.1002/asi.23981 article EN Journal of the Association for Information Science and Technology 2017-11-20

Clustering citation histories in the Physical Review

Giovanni Colavizza Massimo Franceschet

10.1016/j.joi.2016.07.009 article EN Journal of Informetrics 2016-09-21

Polarization and reliability of news sources in Wikipedia

Puyu Yang Giovanni Colavizza

Purpose: Wikipedia's inclusive editorial policy permits unrestricted participation, enabling individuals to contribute and disseminate their expertise while drawing upon a multitude of external sources. News media outlets constitute nearly one-third of all citations within Wikipedia. However, embracing such a radically open approach also poses the challenge of the potential introduction of biased content or viewpoints into Wikipedia. The authors conduct an investigation into the integrity of knowledge within Wikipedia, focusing on...

10.1108/oir-02-2023-0084 article EN cc-by Online Information Review 2024-01-18

An analysis of the effects of sharing research data, code, and preprints on citations

Giovanni Colavizza Lauren Cadwallader Marcel LaFlamme Grégory Dozot Stéphane Lecorney and 2 more

10.48550/arxiv.2404.16171 preprint EN arXiv (Cornell University) 2024-04-24

Unsilencing colonial archives via automated entity recognition

Mrinalini Luthra Konstantin Todorov Charles Jeurgens Giovanni Colavizza

Purpose: This paper aims to expand the scope and mitigate the biases of extant archival indexes. Design/methodology/approach: The authors use automatic entity recognition on the archives of the Dutch East India Company to extract mentions of underrepresented people. Findings: The authors release an annotated corpus and baselines for a shared task and show that the proposed goal is feasible. Originality/value: Colonial archives are increasingly a focus of attention for historians and the public; broadening access to them is a pressing need for archives.

10.1108/jd-02-2022-0038 article EN Journal of Documentation 2023-01-28

Deep Reference Mining From Scholarly Literature in the Arts and Humanities

Danny Rodrigues Alves Giovanni Colavizza Frédéric Kaplan

We consider the task of reference mining: the detection, extraction and classification of references within the full text of scholarly publications. Reference mining brings forward specific challenges, such as the need to capture the morphology of highly abbreviated words and the dependence among elements in a reference, both following codified styles. This is particularly difficult, and little explored, with respect to literature in the arts and humanities, where references are mostly given in footnotes. We apply a deep learning architecture for reference mining and explore...

10.3389/frma.2018.00021 article EN cc-by Frontiers in Research Metrics and Analytics 2018-07-13

A map of Digital Humanities research across bibliographic data sources

Gianmarco Spinaci Giovanni Colavizza Silvio Peroni

This study presents the results of an experiment we performed to measure the coverage of Digital Humanities (DH) publications in mainstream open and proprietary bibliographic data sources, further highlighting the relations between DH and other disciplines. We created a list of DH journals based on manual curation and bibliometric data. We used that list to identify DH publications in the data sources under consideration. We used ERIH-PLUS to identify Social Sciences and Humanities (SSH) publications. We analysed the citation links they included to understand the relationship between DH and SSH...

10.1093/llc/fqac016 article EN cc-by Digital Scholarship in the Humanities 2022-03-23
