Erik M. van Mulligen

ORCID: 0000-0003-1377-9386
Research Areas
  • Biomedical Text Mining and Ontologies
  • Semantic Web and Ontologies
  • Scientific Computing and Data Management
  • Electronic Health Records Systems
  • Topic Modeling
  • Bioinformatics and Genomic Networks
  • Natural Language Processing Techniques
  • Data Quality and Management
  • Research Data Management Practices
  • Computational Drug Discovery Methods
  • Advanced Text Analysis Techniques
  • Genetics, Bioinformatics, and Biomedical Research
  • Machine Learning in Healthcare
  • Genomics and Rare Diseases
  • Business Process Modeling and Analysis
  • Genomics and Phylogenetic Studies
  • Distributed and Parallel Computing Systems
  • Service-Oriented Architecture and Web Services
  • Pharmacovigilance and Adverse Drug Reactions
  • Biomedical and Engineering Education
  • Gene Expression and Cancer Classification
  • Linguistics and Terminology Studies
  • Advanced Database Systems and Queries
  • Machine Learning in Bioinformatics
  • Web Data Mining and Analysis

Erasmus MC
2015-2024

Erasmus University Rotterdam
2005-2024

Amsterdam University Medical Centers
2022

Vrije Universiteit Amsterdam
2022

University Medical Center
2016

University Hospital and Clinics
2016

Leiden University Medical Center
2009-2016

University of Amsterdam
1991-2014

Nanyang Technological University
2014

ITMO University
2014

There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders, representing academia, industry, funding agencies, and publishers, have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability...

10.1038/sdata.2016.18 article EN cc-by Scientific Data 2016-03-15

From the scientific community, a lot of effort has been spent on the correct identification of gene and protein names in text, while less effort has been spent on the correct identification of chemical names. Dictionary-based term identification has the power to recognize the diverse representation of chemical information in the literature and map the chemicals to their database identifiers. We developed a dictionary for the identification of small molecules and drugs in text, combining information from UMLS, MeSH, ChEBI, DrugBank, KEGG, HMDB, and ChemIDplus. Rule-based term filtering, a manual check of highly frequent terms, and disambiguation rules were applied. We tested...

10.1093/bioinformatics/btp535 article EN Bioinformatics 2009-09-16

Abstract WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from edits create automatic copies of the data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts are calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative...

10.1186/gb-2008-9-5-r89 article EN cc-by Genome Biology 2008-05-28

Background and objective In order for computers to extract useful information from unstructured text, a concept normalization system is needed to link relevant concepts in a text to sources that contain further information about the concept. Popular tools in the biomedical field are dictionary-based. In this study we investigate the usefulness of natural language processing (NLP) as an adjunct to dictionary-based normalization. Methods We compared the performance of two systems, MetaMap and Peregrine, on the Arizona Disease Corpus, with...

10.1136/amiajnl-2012-001173 article EN cc-by-nc-nd Journal of the American Medical Informatics Association 2012-10-06

Abstract Motivation: Full-text documents potentially hold more information than their abstracts, but require more resources for processing. We investigated the added value of full text over abstracts in terms of information content and occurrences of gene symbol-gene name combinations that can resolve gene-symbol ambiguity. Results: We analyzed a set of 3902 biomedical full-text articles. Different keyword measures indicate that information density is highest in abstracts, but that the information coverage in full texts is much greater than in abstracts. Analysis of five different standard...

10.1093/bioinformatics/bth291 article EN Bioinformatics 2004-05-06

The CALBC initiative aims to provide a large-scale biomedical text corpus that contains semantic annotations for named entities of different kinds. The generation of this corpus requires that the annotations from different automatic annotation systems be harmonized. In the first phase, the annotation systems from five participants (EMBL-EBI, EMC Rotterdam, NLM, JULIE Lab Jena, and Linguamatics) were gathered. All annotations were delivered in a common format that included concept identifiers in the boundary assignments and that enabled comparison and alignment of the results. During the harmonization, the results produced...

10.1142/s0219720010004562 article EN Journal of Bioinformatics and Computational Biology 2010-02-01

Abstract Motivation: Knowledge of drug–drug interactions (DDIs) is crucial for health-care professionals to avoid adverse effects when co-administering drugs to patients. As most newly discovered DDIs are made available through scientific publications, automatic DDI extraction is highly relevant. Results: We propose a novel feature-based approach to extract DDIs from text. Our approach consists of three steps. First, we apply text preprocessing to convert input sentences from a given dataset into structured representations....

10.1093/bioinformatics/btu557 article EN Bioinformatics 2014-08-20

Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository...

10.7717/peerj-cs.110 article EN cc-by PeerJ Computer Science 2017-04-24

Many biomedical relation extraction systems are machine-learning based and have to be trained on large annotated corpora that are expensive and cumbersome to construct. We developed a knowledge-based relation extraction system that requires minimal training data, and applied the system for the extraction of adverse drug events from biomedical text. The system consists of a concept recognition module that identifies drugs and adverse effects in sentences, and a knowledge-base module that establishes whether a relation exists between the recognized concepts. The knowledge base was filled with information from the Unified Medical Language...

10.1186/1471-2105-15-64 article EN cc-by BMC Bioinformatics 2014-03-04

The advances in bioinformatics required to annotate human genomic variants and to place them in public data repositories have not kept pace with their discovery. Moreover, a law of diminishing returns has begun to operate both in terms of data publication and submission. Although the continued deposition of such data in the public domain is essential to maximize both their scientific and clinical utility, rewards for data sharing are few, representing a serious practical impediment to data submission. To date, two main strategies have been adopted as a means to encourage the submission of human genomic variant...

10.1002/humu.22144 article EN Human Mutation 2012-06-27

Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly, and the final corpus consists of at most a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation of annotations from automatic text mining solutions, the first version of the Silver Standard Corpus...

10.1186/2041-1480-2-s5-s11 article EN cc-by Journal of Biomedical Semantics 2011-01-01

There is growing interest in whether social media can capture patient-generated information relevant for medicines safety surveillance that cannot be found in traditional sources. The aim of this study was to evaluate the potential contribution of mining social media networks for medicines safety surveillance, using the following associations as case studies: (1) rosiglitazone and cardiovascular events (i.e. stroke and myocardial infarction); and (2) human papilloma virus (HPV) vaccine and infertility. We collected publicly accessible, English-language posts...

10.1007/s40264-015-0333-5 article EN cc-by-nc Drug Safety 2015-08-04

Background: Patients with cancer often have to make complex decisions about treatment, with the options varying in risk profiles and effects on survival and quality of life. Moreover, inefficient care paths make it hard for patients to participate in shared decision-making. Data-driven decision-support tools have the potential to empower patients, support personalized care, improve health outcomes and promote health equity. However, decision-support tools currently seldom consider quality of life or individual preferences, and their use in clinical practice remains limited,...

10.1177/26323524231225249 article EN cc-by-nc Palliative Care and Social Practice 2024-01-01

Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text, based on a number of publicly available databases, and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated which impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at...

10.1186/1758-2946-2-3 article EN cc-by Journal of Cheminformatics 2010-03-23

Abstract Objective To create a multilingual gold-standard corpus for biomedical concept recognition. Materials and methods We selected text units from different parallel corpora (Medline abstract titles, drug labels, patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated...

10.1093/jamia/ocv037 article EN cc-by-nc Journal of the American Medical Informatics Association 2015-05-05

Background Drug-related adverse events remain an important cause of morbidity and mortality and impose a huge burden on healthcare costs. Routinely collected electronic healthcare data give a good snapshot of how drugs are being used in ‘real-world’ settings. Objective To describe a strategy that identifies potentially drug-induced acute myocardial infarction (AMI) from a large international healthcare data network. Methods Post-marketing safety surveillance was conducted in seven population-based healthcare databases in three countries (Denmark,...

10.1371/journal.pone.0072148 article EN cc-by PLoS ONE 2013-08-28

Abstract Compounds that are candidates for drug repurposing can be ranked by leveraging knowledge available in the biomedical literature and databases. This knowledge, spread across a variety of sources, can be integrated within a knowledge graph, which thereby comprehensively describes known relationships between biomedical concepts, such as drugs, diseases, genes, etc. Our work uses the semantic information between drug and disease concepts as features, which are extracted from an existing knowledge graph that integrates 200 different biological knowledge sources. RepoDB,...

10.1038/s41598-019-42806-6 article EN cc-by Scientific Reports 2019-04-18

We describe our approach to the chemical–disease relation (CDR) task in the BioCreative V challenge. The CDR task consists of two subtasks: automatic disease-named entity recognition and normalization (DNER), and extraction of chemical-induced diseases (CIDs) from Medline abstracts. For the DNER subtask, we used our concept recognition tool Peregrine, in combination with several optimization steps. For the CID subtask, our system, which we named RELigator, was trained on a rich feature set, comprising features derived from a graph database containing prior...

10.1093/database/baw046 article EN cc-by Database 2016-01-01