- Scientific Computing and Data Management
- Research Data Management Practices
- Semantic Web and Ontologies
- Information Retrieval and Search Behavior
- Distributed and Parallel Computing Systems
- Data Quality and Management
- Topic Modeling
- Advanced Data Storage Technologies
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Web Data Mining and Analysis
- Peer-to-Peer Network Technologies
- Explainable Artificial Intelligence (XAI)
- Information and Cyber Security
- Recommender Systems and Techniques
- Music and Audio Processing
- Meta-Analysis and Systematic Reviews
- Smart Agriculture and AI
- Network Security and Intrusion Detection
- Digital Humanities and Scholarship
- Biomedical Text Mining and Ontologies
- Wikis in Education and Collaboration
- Library Science and Information Systems
- Data Management and Algorithms
- Advanced Malware Detection Techniques
University of Illinois Urbana-Champaign
2012-2021
National Center for Supercomputing Applications
2017-2020
University of Notre Dame
2020
University of Illinois System
2013-2016
University of North Carolina at Chapel Hill
2011-2014
The proliferation of discipline‐specific metadata schemes contributes to artificial barriers that can impede interdisciplinary and transdisciplinary research. The authors considered this problem by examining the domains, objectives, and architectures of nine metadata schemes used to document scientific data in the physical, life, and social sciences. They performed a mixed‐methods content analysis guided by Greenberg's metadata objectives, principles, domains, and architectural layout (MODAL) framework, and derived 22 metadata‐related goals from textual...
The Transparent Research Object Vocabulary (TROV) is a key element of the Transparency Certified (TRACE) approach to ensuring research trustworthiness. In contrast with methods that entail repeating computations in part or in full to verify that the descriptions included in a publication are sufficient to reproduce reported results, TRACE depends on a controlled computing environment, termed a TRACE System (TRS), to guarantee that accurate, sufficiently complete, and otherwise trustworthy records are captured when results are first...
The Transportation Energy Resources from Renewable Agriculture Phenotyping Reference Platform (TERRA-REF) provides a data and computation pipeline responsible for collecting, transferring, processing, and distributing large volumes of crop sensing and genomic data from genetically informative germplasm sets. The primary source of these data is a field scanner system built over an experimental field at the University of Arizona's Maricopa Agricultural Center. The scanner uses several different sensors to observe the field at a dense collection frequency with...
This article distills findings from a qualitative study of seven reproducibility initiatives to enumerate nine key decision points for journals seeking to address concerns about the quality and rigor of computational research by expanding the peer review and publication process. We evaluate our guidance in light of the recent National Academies of Sciences, Engineering, and Medicine (NASEM, 2019) report on Reproducibility and Replicability in Science and its recommendation of journal audits. We present 10 recommendations that clarify how to contend with...
ABSTRACT To realize the great potential value of large‐scale digital libraries, we need a fuller understanding of the range of ways in which scholarly communities conduct research, or want to conduct research, within them. Scholars build collections in the course of their work. How can we anticipate and support the various kinds of collection‐building and ‐use, given the diversity of researchers who work with libraries of books? This paper reports selected results of a study of how user groups of the HathiTrust Digital Library create and use collections in research. It aims to contribute...
Abstract Editor's Summary: HIVE (Helping Interdisciplinary Vocabulary Engineering) is an effort to automatically generate metadata for content, drawing descriptor terms from multiple vocabularies encoded as Simple Knowledge Organization Systems (SKOS). The HIVE approach is a response to the challenges of interoperability, cost, and usability of the terminology sets often needed to adequately describe digital resources. By offering access to more than one vocabulary, each with useful descriptors for a broad domain, HIVE enables aggregating...
Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is a contribution to understanding the role of vocabulary structure in the indexing process. The resulting models are evaluated in the context of automatic subject indexing with four...
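The idea of ranking concepts by walking a weighted vocabulary graph can be sketched as follows. This is a minimal illustration, not the authors' method: the thesaurus fragment, edge weights, and restart probability are all hypothetical, and term importance is estimated as visit frequency of a walk with restarts (in the spirit of personalized PageRank).

```python
import random
from collections import Counter

# Hypothetical thesaurus fragment: term -> [(related term, edge weight)].
# Weights might encode relationship strength (e.g., BT/NT vs. RT links).
THESAURUS = {
    "indexing":     [("metadata", 2.0), ("thesauri", 1.0)],
    "metadata":     [("indexing", 2.0), ("vocabularies", 1.0)],
    "thesauri":     [("vocabularies", 2.0), ("indexing", 1.0)],
    "vocabularies": [("thesauri", 2.0), ("metadata", 1.0)],
}

def weighted_random_walk(graph, start, steps=10000, restart=0.15, seed=42):
    """Estimate term importance as visit frequency of a weighted walk
    that restarts at the document's seed concept."""
    rng = random.Random(seed)
    visits = Counter()
    node = start
    for _ in range(steps):
        visits[node] += 1
        if rng.random() < restart or not graph.get(node):
            node = start  # jump back to the seed concept
            continue
        neighbors, weights = zip(*graph[node])
        node = rng.choices(neighbors, weights=weights, k=1)[0]
    total = sum(visits.values())
    return {term: count / total for term, count in visits.items()}

scores = weighted_random_walk(THESAURUS, "indexing")
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because the walk restarts at the seed concept, terms close to it in the graph accumulate more visits, which is what makes structure (and not just term frequency) inform the ranking.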
We present and define a structured digital object, called a "Tale," for the dissemination and publication of computational scientific findings in the scholarly record. The Tale emerges from the NSF-funded Whole Tale project (wholetale.org), which is developing an environment designed to capture the entire pipeline associated with a computational experiment and thereby enable reproducibility. A Tale allows researchers to create a package of the code, data, and information about the workflow necessary to support, review, and recreate the results reported in published research....
The assignment of subject metadata to music is useful for organizing and accessing digital collections. Since manual annotation of large-scale collections is labor-intensive, automatic methods are preferred. Topic modeling algorithms can be used to automatically identify latent topics from appropriate text sources. Candidate text sources such as song lyrics are often too poetic, resulting in lower-quality topics. Users' interpretations provide an alternative source. In this paper, we propose a topic discovery...
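To make the topic-modeling step concrete, here is a self-contained toy latent Dirichlet allocation (LDA) sampler run over a few hypothetical "interpretation" snippets. This is a sketch of the general technique, not the paper's pipeline; the corpus, hyperparameters, and topic count are all illustrative.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, k=2, iters=150, alpha=0.1, beta=0.01, seed=7):
    """Tiny collapsed Gibbs sampler for LDA; returns top 3 words per topic."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    ndk = [[0] * k for _ in docs]               # document-topic counts
    nkw = [defaultdict(int) for _ in range(k)]  # topic-word counts
    nk = [0] * k                                # tokens per topic
    z = []                                      # per-token topic assignments
    for di, doc in enumerate(docs):
        zd = []
        for w in doc:
            t = rng.randrange(k)
            zd.append(t)
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for di, doc in enumerate(docs):
            for wi, w in enumerate(doc):
                t = z[di][wi]
                ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                # full conditional: p(t) ∝ (n_dk + α)(n_kw + β)/(n_k + Vβ)
                p = [(ndk[di][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + V * beta)
                     for j in range(k)]
                t = rng.choices(range(k), weights=p, k=1)[0]
                z[di][wi] = t
                ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return [sorted(nkw[j], key=nkw[j].get, reverse=True)[:3] for j in range(k)]

# Hypothetical user-interpretation snippets for two loose themes.
docs = [
    "love heart kiss love heart".split(),
    "heart love kiss kiss".split(),
    "war protest freedom war".split(),
    "freedom protest war protest".split(),
]
topics = lda_gibbs(docs, k=2)
```

In practice one would run a mature implementation over thousands of documents; the point of the sketch is only that latent topics emerge from word co-occurrence, which is why overly poetic sources yield lower-quality topics.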
Purpose – The purpose of this paper is to examine the effect of the Helping Interdisciplinary Vocabulary Engineering (HIVE) system on the inter-indexer consistency of information professionals when assigning keywords to a scientific abstract. This study examined, first, potential HIVE users; second, the impact HIVE had on consistency; and third, the challenges associated with using HIVE. Design/methodology/approach – A within-subjects quasi-experimental research design was used for this study. Data were collected using a task-scenario...
The Rocker Project provides widely used Docker images for R across different application scenarios. This article surveys downstream projects that build upon the Rocker images and presents the current state of packages for managing and controlling containers. These use cases cover diverse topics such as package development, reproducible research, collaborative work, cloud-based data processing, and production deployment of services. The variety of applications demonstrates the power of Rocker specifically and of containerisation in general. Across these ways...
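A typical reproducible-research use of the Rocker images is to pin an R version and dependencies in a Dockerfile. The sketch below is illustrative only: the image tag, package list, and script name are assumptions, not taken from the article.

```dockerfile
# Minimal sketch: pin the R version by extending a Rocker base image.
FROM rocker/r-ver:4.3.1

# install2.r is a helper script shipped in Rocker images;
# --error makes the build fail if a package does not install.
RUN install2.r --error dplyr ggplot2

# Hypothetical analysis script baked into the image for reproducibility.
COPY analysis.R /home/analysis.R
CMD ["Rscript", "/home/analysis.R"]
```

Building and running this image gives collaborators the same R version and package set regardless of their host setup, which is the core of the reproducible-research use case the article surveys.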
Abstract Searching large collections of digitized books is a relatively new area in information‐seeking and retrieval research, made possible by initiatives such as Google Books and the HathiTrust Digital Library. The availability of full‐text books is transforming how users search for and interact with the information in books, but the characteristics of these changes are unknown. This paper aims to provide insight into searches of a book collection as a first step in a broader research agenda intended to improve book retrieval. To better understand...
The growing size of high-value, sensor-borne or computationally derived scientific datasets is pushing the boundaries of traditional models of data access and discovery. Due to their size, these datasets are often accessible only through the systems on which they were created. Access for exploration and reproducibility is limited: transferring the files and applying the software used to store and generate the original data is often infeasible. There is a trend toward providing access to large-scale research data in-place via container-based analysis environments. This paper...
Many data packaging standards are available to researchers and repository operators, and the choice between using an existing standard or creating a new one is challenging. We introduce the DataONE Data Package standard, which is based on the OAI-ORE Resource Map standard. We describe the functionality it provides and implementation considerations, compare it with other standards, and discuss future extensions, including the ability to capture execution environments via WholeTale "Tales" and alternate serialization formats.
Research has shown that automatic subject indexing is more efficient and consistent than manual indexing; yet many organizations continue to use manual indexing because of the unacceptable quality of automatically produced results. This poster presents the results of an exploratory experiment examining the indexing consistency stemming from a machine-aided approach. The HIVE vocabulary server was used to present concepts to 31 workshop participants. The presentation of terms in sequence reduced indexer burden and contributed to increased...
Entity-centric document filtering is the task of analyzing a time-ordered stream of documents and emitting those that are relevant to a specified set of entities (e.g., people, places, organizations). This task, exemplified by the TREC Knowledge Base Acceleration (KBA) track, has broad applicability in other modern IR settings. In this paper, we present a simple yet effective approach based on learning high-quality Boolean queries that can be applied deterministically during filtering. We call these statements...
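Deterministic filtering with learned Boolean queries can be sketched as follows. The entity names, query clauses, and documents below are hypothetical stand-ins (the learning step that would produce the clauses is omitted); each query is an OR of ANDs over surface-form terms.

```python
import re

# Hypothetical learned Boolean queries: each entity maps to a disjunction
# of conjunctions (an OR of ANDs) over lowercase terms.
QUERIES = {
    "Boris_Berezovsky_(businessman)": [{"berezovsky", "russian"},
                                       {"berezovsky", "oligarch"}],
    "Boris_Berezovsky_(pianist)":     [{"berezovsky", "piano"}],
}

def tokenize(text):
    """Lowercase bag-of-words tokenization."""
    return set(re.findall(r"[a-z]+", text.lower()))

def filter_stream(stream, queries):
    """Emit (entity, doc) pairs whenever any conjunction of an entity's
    query is fully contained in the document's token set."""
    for doc in stream:
        tokens = tokenize(doc)
        for entity, clauses in queries.items():
            if any(clause <= tokens for clause in clauses):
                yield entity, doc

docs = [
    "Russian oligarch Berezovsky found dead",
    "Berezovsky performs a piano concerto",
    "Unrelated market news",
]
matches = list(filter_stream(docs, QUERIES))
```

Because matching is pure set containment, the filter is deterministic and cheap enough to apply to every document in a high-volume stream, which is the property the abstract emphasizes.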
In this paper we describe our experience adopting the Research Object Bundle (RO-Bundle) format with BagIt serialization (BagIt-RO) for the design and implementation of "tales" in the Whole Tale platform. A tale is an executable research object intended for the dissemination of computational scientific findings that captures the information needed to facilitate understanding, transparency, re-execution for review, and reproducibility at the time of publication. We describe the platform requirements that led to the adoption of BagIt-RO, the specifics...
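The BagIt side of this serialization amounts to a conventional directory layout: a `bagit.txt` declaration, a `data/` payload, and a checksum manifest. The sketch below writes only that minimal skeleton, as an illustration of the format rather than of the Whole Tale implementation (real BagIt-RO tales also carry an RO manifest and other tag files not shown here).

```python
import hashlib
from pathlib import Path
from tempfile import TemporaryDirectory

def make_minimal_bag(bag_dir, payload):
    """Write a minimal BagIt bag: bagit.txt, data/ payload files,
    and a manifest-sha256.txt listing a checksum per payload file."""
    bag = Path(bag_dir)
    (bag / "data").mkdir(parents=True, exist_ok=True)
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
    manifest_lines = []
    for name, content in payload.items():
        (bag / "data" / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    (bag / "manifest-sha256.txt").write_text("\n".join(manifest_lines) + "\n")
    return bag

with TemporaryDirectory() as tmp:
    bag = make_minimal_bag(tmp, {"results.csv": b"x,y\n1,2\n"})
    manifest = (bag / "manifest-sha256.txt").read_text()
```

The manifest is what makes the bag verifiable on receipt: a consumer recomputes each checksum and compares it against the recorded value before trusting or re-executing the contents.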
Research Objects have the potential to significantly enhance the reproducibility of scientific research. One important way they can do this is by encapsulating the means for re-executing the computational components of studies, thus supporting a new form of reproducibility enabled by digital computing: exact repeatability. However, Research Objects can also make research more reproducible by supporting transparency, a component orthogonal to re-executability. We describe here our vision for making research objects transparent by providing means for disambiguating claims about reproducibility generally, and repeatability...
The CHEESE project supplements and enhances traditional cybersecurity education with hands-on, practical experience of common security flaws and their solutions. It requires only a web browser, allowing users to develop skills without compromising their own computers or spending hours setting up a complex virtual machine (VM) sandbox environment. In this tutorial we will conduct a hands-on walkthrough of a couple of demonstrations on CHEESE and present an overview of the platform and its community-driven contribution and development process.
This work takes an in-depth look at the factors that affect manual classifications of 'temporally sensitive' information needs. We use qualitative and quantitative techniques to analyze 660 topics from the Text Retrieval Conference (TREC) previously used in the experimental evaluation of temporal retrieval models. Regression analysis is used to identify factors influencing previous classifications. We explore potential problems with the classifications, considering principles and guidelines for future work on...