Mercè Crosas

ORCID: 0000-0003-1304-1939
Research Areas
  • Scientific Computing and Data Management
  • Research Data Management Practices
  • Data Quality and Management
  • Privacy-Preserving Technologies in Data
  • Big Data and Business Intelligence
  • Distributed and Parallel Computing Systems
  • Semantic Web and Ontologies
  • Computational and Text Analysis Methods
  • Advanced Data Storage Technologies
  • Ethics in Clinical Research
  • Chemistry and Stereochemistry Studies
  • Data Visualization and Analytics
  • Scientometrics and Bibliometrics Research
  • Data Analysis and Archiving
  • Advanced Proteomics Techniques and Applications
  • Online Learning and Analytics
  • Biomedical Text Mining and Ontologies
  • Astronomical Observations and Instrumentation
  • Persona Design and Applications
  • Genomics and Phylogenetic Studies
  • Digital and Cyber Forensics
  • Ethics and Social Impacts of AI
  • Health, Environment, Cognitive Aging
  • COVID-19 Digital Contact Tracing
  • Diverse Global Research Studies

Harvard University
2014-2023

Quantitative BioSciences
2014-2023

Data Management (Italy)
2023

Universitat Politècnica de Catalunya
2023

Barcelona Supercomputing Center
2023

Harvard University Press
2012-2021

IQ Samhällsbyggnad
2017

Library of Congress
2015

Massachusetts General Hospital
2015

Solar Data Analysis Center
2015

There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders, representing academia, industry, funding agencies, and publishers, have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability...

10.1038/sdata.2016.18 article EN cc-by Scientific Data 2016-03-15

As the coronavirus disease 2019 (COVID-19) epidemic worsens, understanding the effectiveness of public messaging and large-scale social distancing interventions is critical. The research and public health response communities can and should use population mobility data collected by private companies, with appropriate legal, organizational, and computational safeguards in place. When aggregated, these data can help refine interventions by providing near real-time information about changes in patterns of human movement.

10.1126/science.abb8021 article EN Science 2020-03-23

The FAIR principles have been widely cited, endorsed and adopted by a broad range of stakeholders since their publication in 2016. By intention, the 15 FAIR guiding principles do not dictate specific technological implementations, but provide guidance for improving the Findability, Accessibility, Interoperability and Reusability of digital resources. This has likely contributed to the broad adoption of the FAIR principles, because individual stakeholder communities can implement their own solutions. However, it has also resulted in inconsistent...

10.1162/dint_r_00024 article EN Data Intelligence 2019-11-01

An increasing number of researchers support reproducibility by including pointers to and descriptions of datasets, software and methods in their publications. However, scientific articles may be ambiguous, incomplete and difficult to process by automated systems. In this paper we introduce RO-Crate, an open, community-driven, and lightweight approach to packaging research artefacts along with their metadata in a machine readable manner. RO-Crate is based on Schema.org annotations in JSON-LD, aiming to establish best practices...

10.3233/ds-210053 article EN cc-by-nc Data Science 2022-01-04
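
As an illustration of the packaging approach described in the RO-Crate entry above, here is a minimal sketch of a metadata document built from Schema.org types in JSON-LD. It assumes the RO-Crate 1.1 identifiers (the `https://w3id.org/ro/crate/1.1` profile and its context URL); the function name `build_crate_metadata`, the dataset name, and the file names are hypothetical and do not come from the paper.

```python
import json


def build_crate_metadata(name, description, files):
    """Return a minimal RO-Crate-style JSON-LD document as a dict.

    Assumption: RO-Crate 1.1 identifiers are used for illustration only.
    """
    graph = [
        # Metadata descriptor: says which entity is the root dataset.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root dataset: the packaged collection of research artefacts.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # One File entity per packaged artefact.
    graph.extend({"@id": path, "@type": "File"} for path in files)
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": graph,
    }


if __name__ == "__main__":
    # Hypothetical artefacts, chosen only for the example.
    crate = build_crate_metadata(
        "Example replication package",
        "Data and analysis script for an example study.",
        ["data/results.csv", "analysis.R"],
    )
    print(json.dumps(crate, indent=2))
```

Written to a file named `ro-crate-metadata.json` at the top of a directory, a document of this shape is what lets automated systems discover the packaged files and their descriptions.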

This article presents a study on the quality and execution of research code from publicly-available replication datasets at the Harvard Dataverse repository. Research code is typically created by a group of scientists and published together with academic papers to facilitate research transparency and reproducibility. For this study, we define ten questions to address aspects impacting research reproducibility and reuse. First, we retrieve and analyze more than 2000 replication datasets with over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment...

10.1038/s41597-022-01143-6 article EN cc-by Scientific Data 2022-02-21

One way to provide academic credit to investigators who gather data in clinical trials would be to create a designation of “data author.” This Sounding Board article explores this idea.

10.1056/nejmsb1616595 article EN New England Journal of Medicine 2017-03-29

Transparent evaluations of FAIRness are increasingly required by a wide range of stakeholders, from scientists to publishers, funding agencies and policy makers. We propose a scalable, automatable framework to evaluate digital resources that encompasses measurable indicators, open source tools, and participation guidelines, which come together to accommodate domain relevant community-defined FAIR assessments. The components of the framework are: (1) Maturity Indicators – community-authored specifications...

10.1038/s41597-019-0184-5 article EN cc-by Scientific Data 2019-09-20

Reproducibility and reusability of research results is an important concern in scientific communication and science policy. A foundational element of reproducibility and reusability is the open and persistently available presentation of research data. However, many common approaches for primary data publication in use today do not achieve sufficient long-term robustness, openness, accessibility or uniformity. Nor do they permit comprehensive exploitation by modern Web technologies. This has led to several authoritative studies...

10.7717/peerj-cs.1 article EN cc-by PeerJ Computer Science 2015-05-27

The Dataverse Network is an open-source application for publishing, referencing, extracting and analyzing research data. The main goal of the Dataverse Network is to solve the problems of data sharing through building technologies that enable institutions to reduce the burden for researchers and data publishers, and incentivize them to share their data. By installing the Dataverse Network software, an institution is able to host multiple individual virtual archives, called dataverses, for scholars, research groups, or journals, providing a data publication framework that supports author recognition,...

10.1045/january2011-crosas article EN D-Lib Magazine 2011-01-01

This article presents a practical roadmap for scholarly data repositories to implement data citation in accordance with the Joint Declaration of Data Citation Principles, a synopsis and harmonization of the recommendations of major science policy bodies. The roadmap was developed by the Repositories Expert Group, as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH-funded BioCADDIE (https://biocaddie.org) project. The roadmap makes 11 specific recommendations, grouped into three phases...

10.1038/s41597-019-0031-8 article EN cc-by Scientific Data 2019-04-10

We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster...

10.1371/journal.pone.0104798 article EN cc-by PLoS ONE 2014-08-28

The Evolution of Data Citation: From Principles to Implementation

10.29173/iq504 article EN cc-by-nc IASSIST Quarterly 2014-05-26

EDITOR'S SUMMARY While the conventions of bibliographic citation have been long established, the sole focus is on reference to other scholarly works. Access to the data serving as the basis for scholarly work has been limited. Data citation extends important access to material that has been largely unavailable for sharing, verification and reuse. The Joint Declaration of Data Citation Principles, finalized in February 2014, is a formal statement pulling together practices used in the research and publishing arenas for common use. The declaration encompasses eight principles...

10.1002/bult.2015.1720410313 article EN Bulletin of the Association for Information Science and Technology 2015-02-01

The vast majority of social science research uses small (megabyte- or gigabyte-scale) datasets. These fixed-scale datasets are commonly downloaded to the researcher’s computer where the analysis is performed. The data can be shared, archived, and cited with well-established technologies, such as the Dataverse Project, to support the published results. The trend toward big data, including large-scale streaming data, is starting to transform research and has the potential to impact policymaking as well as our understanding of the social, economic,...

10.1177/0002716215570847 article EN The Annals of the American Academy of Political and Social Science 2015-04-09

Widespread sharing of scientific datasets holds great promise for new discoveries and great risks for personal privacy. Dataset handling policies play the critical role of balancing privacy risks and scientific value. We propose an extensible, formal, theoretical model for dataset handling policies. We define binary operators for policy composition and for comparing policy strictness, such that propositions like "this policy is stricter than that policy" can be formally phrased. Using this model, the policies are described in a machine-executable and human-readable way. We further...

10.1109/spw.2016.11 article EN 2016-05-01
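
To make the idea of binary operators over dataset handling policies concrete, here is a small sketch. It is not the paper's model: it assumes, purely for illustration, that a policy assigns an integer strictness level to each named dimension, that composition takes the stricter level per dimension, and that a missing dimension counts as level 0. The class `Policy` and the dimension names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Policy:
    """A policy as a mapping from dimension name to strictness level.

    Assumption: higher integers mean stricter handling requirements.
    """
    levels: Dict[str, int]

    def compose(self, other: "Policy") -> "Policy":
        """Binary composition: keep the stricter level in each dimension."""
        dims = set(self.levels) | set(other.levels)
        return Policy(
            {d: max(self.levels.get(d, 0), other.levels.get(d, 0)) for d in dims}
        )

    def at_least_as_strict_as(self, other: "Policy") -> bool:
        """Binary comparison: true if no dimension is weaker than in `other`."""
        dims = set(self.levels) | set(other.levels)
        return all(self.levels.get(d, 0) >= other.levels.get(d, 0) for d in dims)


if __name__ == "__main__":
    # Hypothetical dimensions and levels, chosen only for the example.
    storage = Policy({"encryption": 2, "access": 1})
    sharing = Policy({"access": 3})
    combined = storage.compose(sharing)
    print(combined.levels)
    print(combined.at_least_as_strict_as(storage))
    print(storage.at_least_as_strict_as(sharing))
```

Under these assumptions the comparison is a partial order: two policies can be incomparable when each is stricter in a different dimension, which is why composition is needed to obtain a policy that satisfies both.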

By encouraging and requiring that authors share their data in order to publish articles, scholarly journals have become an important actor in the movement to improve the openness and reproducibility of research. But how many social science journals encourage or mandate that authors share the data supporting their research findings? How do journal policies vary by discipline? What influences these journals’ decisions to adopt such instructions? And what do those instructions look like? We discuss the results of our analysis of 291 highly-ranked journals publishing...

10.31235/osf.io/9h7ay article EN 2018-03-30

The Research Data Alliance (RDA) is a community-driven organization dedicated to the development and use of technical, social, and community infrastructure promoting data sharing and data-driven exploration. RDA is particularly important for the global academic community, where research infrastructure is often ad hoc, may have a short shelf-life, and is hard to fund. At its launch in 2013, RDA struck a chord. Since then, it has attracted more than 9400 members from 130-plus countries and developed infrastructure used by groups all over the world. One of the founders of RDA, Francine Berman...

10.1162/99608f92.5e126552 article EN cc-by Harvard data science review 2020-01-31

Sharing data and code for reuse has become increasingly important in scientific work over the past decade. However, in practice, shared data and code may be unusable, or published results obtained from them may be irreproducible. Data repository features and services contribute significantly to the quality, longevity, and reusability of datasets. This paper presents a combination of original and secondary data analysis studies focusing on computational reproducibility, data curation, and gamified design elements that can be employed to indicate and improve...

10.3390/data6020015 article EN Data 2021-02-03

Transparent evaluations of FAIRness are increasingly required by a wide range of stakeholders, from scientists to publishers, funding agencies and policy makers. We propose a scalable, automatable framework to evaluate digital resources that encompasses measurable indicators, open source tools, and participation guidelines, which come together to accommodate domain relevant community-defined FAIR assessments. The components of the framework are: (1) Maturity Indicators - community-authored specifications...

10.1101/649202 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2019-05-28

Recent reproducibility case studies have raised concerns showing that much of the deposited research has not been reproducible. One of their conclusions was that the way data repositories store research data and code cannot fully facilitate reproducibility due to the absence of a runtime environment needed for the code execution. New specialized reproducibility tools provide cloud-based computational environments for code encapsulation, thus enabling research portability and reproducibility. However, they do not often enable research discoverability, standardized data citation, or long-term archival...

10.1145/3391800.3398173 article EN 2020-06-16