- Semantic Web and Ontologies
- Scientific Computing and Data Management
- Data Quality and Management
- Topic Modeling
- Research Data Management Practices
- Biomedical Text Mining and Ontologies
- Information Retrieval and Search Behavior
- Advanced Database Systems and Queries
- Web Data Mining and Analysis
- Natural Language Processing Techniques
- Library Science and Information Systems
- Digital and Traditional Archives Management
- Data Visualization and Analytics
- Digital Humanities and Scholarship
- Advanced Text Analysis Techniques
- Data Management and Algorithms
- Bioinformatics and Genomic Networks
- Advanced Data Storage Technologies
- Genetics, Bioinformatics, and Biomedical Research
- Image Retrieval and Classification Techniques
- Advanced Image and Video Retrieval Techniques
- Neural Networks and Applications
- Data Mining Algorithms and Applications
- Gene expression and cancer classification
- AI in cancer detection
University of Padua
2016-2025
National Research Institute of Brewing
2021
Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as directing investments in science. Science is increasingly becoming “data-intensive,” where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated data sets. Yet, given a data set, there is no quantitative, consistent, and established way of knowing how it has been used over time, who contributed...
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being. As a result, a growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of the risks and opportunities of automated decision-making for historically disadvantaged populations. Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented. Unfortunately, algorithmic...
The digitalization of clinical workflows and the increasing performance of deep learning algorithms are paving the way towards new methods for tackling cancer diagnosis. However, the availability of medical specialists to annotate digitized images and free-text diagnostic reports does not scale with the need for large datasets required to train robust computer-aided diagnosis methods that can target the high variability of cases and data produced. This work proposes and evaluates an approach to eliminate the need for manual annotations to train computer-aided diagnosis tools in digital...
Topic variance has a greater effect on performances than system variance, but it cannot be controlled by system developers, who can only try to cope with it. On the other hand, system variance is important on its own, since it is what system developers may affect directly by changing system components, and it determines the differences among systems. In this paper, we face the problem of studying system variance in order to better understand how much system components contribute to overall performances. To this end, we propose a methodology based on the General Linear Mixed Model (GLMM) to develop statistical models able to isolate...
Context. As software systems become more integrated into society's infrastructure, the responsibility of software professionals to ensure compliance with various non-functional requirements increases. These requirements include security, safety, privacy, and, increasingly, non-discrimination. Motivation. Fairness in pricing algorithms grants equitable access to basic services without discriminating on the basis of protected attributes. Method. We replicate a previous empirical study that used black box testing to audit pricing algorithms by...
To promote the responsible development and use of data-driven technologies (such as machine learning and artificial intelligence), the principles of trustworthiness, accountability, and fairness should be followed. The quality of the dataset on which these applications rely is crucial to achieve compliance with the required ethical principles. Quantitative approaches to measure data quality are abundant in the literature and among practitioners; however, they are not sufficient to cover all the challenges involved. In this paper, we show that...
If we want to measure the impact of a database, can we use its organization to treat it in the same way as any other publishing agent, such as a journal or an author?
Databases are fundamental to advance biomedical science. However, most of them are populated and updated with a great deal of human effort. Biomedical Relation Extraction (BioRE) aims to shift this burden to machines. Among its different applications, the discovery of Gene-Disease Associations (GDAs) is one of the most relevant BioRE tasks. Nevertheless, few resources have been developed to train models for GDA extraction. Besides, these resources are all limited in size, preventing models from scaling effectively to large amounts of data.
Information retrieval (IR) systems are the prominent means for searching and accessing huge amounts of unstructured information on the web and elsewhere. They are complex systems, constituted by many different components interacting together, and evaluation is crucial to both tune and improve them. Nevertheless, in the current evaluation methodology, there is still no way to determine how much each component contributes to the overall performances and how the components interact together. This hampers the possibility of a deep understanding of IR system behavior and,...
This paper analyzes two state-of-the-art Neural Information Retrieval (NeuIR) models: the Deep Relevance Matching Model (DRMM) and the Neural Vector Space Model (NVSM). Our contributions include: (i) a reproducibility study of supervised and unsupervised NeuIR models, where we present the issues encountered during their reproducibility; (ii) a performance comparison with other lexical and semantic models, showing that traditional lexical models are still highly competitive with DRMM and NVSM; (iii) an application of NVSM on collections from...
The semantic mismatch between query and document terms (i.e., the semantic gap) is a long-standing problem in Information Retrieval (IR). Two main linguistic features related to the semantic gap that can be exploited to improve retrieval are synonymy and polysemy. Recent works integrate knowledge from curated external resources into the learning process of neural language models to reduce the effect of the semantic gap. However, these knowledge-enhanced language models have been used in IR mostly for re-ranking and not directly for document retrieval. We propose Semantic-Aware Neural...
Exa-scale volumes of medical data have been produced for decades. In most cases, the diagnosis is reported in free text, encoding medical knowledge that is still largely unexploited. In order to allow decoding the medical knowledge included in reports, we propose an unsupervised knowledge extraction system combining a rule-based expert system with pre-trained Machine Learning (ML) models, namely the Semantic Knowledge Extractor Tool (SKET). Combining rule-based techniques and pre-trained ML models provides high accuracy results for knowledge extraction. This work demonstrates the viability...
In the last decade, scholarly graphs became fundamental to storing and managing scholarly knowledge in a structured and machine-readable way. Methods and tools for discovery and impact assessment of science rely on such graphs and their quality to serve scientists, policymakers, and publishers. Since research data became very important in scholarly communication, scholarly graphs started including dataset metadata and their relationships to publications. Such graphs are the foundations for Open Science investigations, data-article publishing workflows, discovery, and assessment indicators. However, due...
In this paper we discuss the problem of data citation with a specific focus on Linked Open Data. We outline the main requirements a data citation methodology must fulfill: (i) uniquely identify the cited objects; (ii) provide descriptive metadata; (iii) enable variable granularity citations; and (iv) produce both human- and machine-readable references. We propose a methodology based on named graphs and RDF quad semantics that allows us to create citation meta-graphs respecting the outlined requirements. We also present a compelling use case based on search...