Laurel Orr

ORCID: 0000-0002-2183-3541
Research Areas
  • Medical Imaging Techniques and Applications
  • Advanced X-ray and CT Imaging
  • Data Management and Algorithms
  • Advanced Database Systems and Queries
  • Topic Modeling
  • Natural Language Processing Techniques
  • Data Quality and Management
  • Medical Image Segmentation Techniques
  • Advanced MRI Techniques and Applications
  • Data Stream Mining Techniques
  • Digital Radiography and Breast Imaging
  • Radiation Dose and Imaging
  • Bayesian Modeling and Causal Inference
  • Graph Theory and Algorithms
  • Web Data Mining and Analysis
  • Adversarial Robustness in Machine Learning
  • Data Visualization and Analytics
  • Privacy-Preserving Technologies in Data
  • Artificial Intelligence in Healthcare and Education
  • Biomedical Text Mining and Ontologies
  • Machine Learning and Data Classification
  • Computational Physics and Python Applications
  • Hydrocarbon Exploration and Reservoir Analysis
  • Digital Image Processing Techniques
  • Geological Modeling and Analysis

Stanford University
2020-2022

Microsoft (United States)
2019-2021

Salesforce (United States)
2021

University of Washington
2014-2020

Microsoft Research (United Kingdom)
2019

Sandia National Laboratories
2013-2014

Sandia National Laboratories California
2012-2014

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, ...

10.48550/arxiv.2108.07258 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Foundation Models (FMs) are models trained on large corpora of data that, at very large scale, can generalize to new tasks without any task-specific finetuning. As these models continue to grow in size, innovations continue to push the boundaries of what these models can do on language and image tasks. This paper aims to understand an underexplored area of FMs: classical data tasks like cleaning and integration. As a proof-of-concept, we cast five data cleaning and integration tasks as prompting tasks and evaluate the performance of FMs on them. We find that FMs achieve SoTA performance on these tasks, even though they were not trained for them. We identify...

10.14778/3574245.3574258 article EN Proceedings of the VLDB Endowment 2022-12-01
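
A rough sketch of the paper's proof-of-concept framing (not the authors' code): a data-integration task such as entity matching becomes a prompting task by serializing two records into text and asking the model for a yes/no answer. The serialize helper, the prompt wording, and the complete() hook below are illustrative assumptions.

```python
def serialize(record: dict) -> str:
    # Flatten a structured row into "col: val" text the LM can read.
    return "; ".join(f"{k}: {v}" for k, v in record.items())

def match_prompt(a: dict, b: dict) -> str:
    return (
        "Are Product A and Product B the same?\n"
        f"Product A: {serialize(a)}\n"
        f"Product B: {serialize(b)}\n"
        "Answer Yes or No:"
    )

def entity_match(a: dict, b: dict, complete) -> bool:
    # `complete` is any text-completion function (e.g., a hosted LM API).
    answer = complete(match_prompt(a, b))
    return answer.strip().lower().startswith("yes")
```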

Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. First, we taxonomize the vast space of potential scenarios (i.e. use cases) and metrics (i.e. desiderata) that are of interest for LMs. Then we select a broad subset based on coverage and feasibility, noting what's missing or underrepresented (e.g. question answering for neglected English...

10.48550/arxiv.2211.09110 preprint EN cc-by arXiv (Cornell University) 2022-01-01
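
HELM's taxonomize-then-measure structure amounts to scoring every model over a scenario-by-metric grid rather than on a single benchmark. A minimal sketch: the seven metric categories are the ones the paper measures, while the scenario names and the grid_runner hook are illustrative assumptions.

```python
# HELM evaluates each model over a (scenario, metric) grid, not one score.
SCENARIOS = ["question_answering", "summarization", "sentiment_analysis"]
METRICS = ["accuracy", "calibration", "robustness", "fairness",
           "bias", "toxicity", "efficiency"]

def evaluate(model, grid_runner):
    # grid_runner(model, scenario, metric) -> float is an assumed harness hook.
    return {(s, m): grid_runner(model, s, m) for s in SCENARIOS for m in METRICS}
```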

Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt that demonstrates how to perform the task, with no additional training. Prompting is a brittle process wherein small modifications to the prompt can cause large variations in the model predictions, and therefore significant effort is dedicated towards designing a painstakingly "perfect prompt" for a task. To mitigate the high degree of effort involved in prompt-design, we instead ask whether producing multiple effective, yet imperfect,...

10.48550/arxiv.2210.02441 preprint EN public-domain arXiv (Cornell University) 2022-01-01
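
The aggregation idea can be sketched in a few lines. Note the simplification: the paper combines prompt outputs with weak supervision that models each prompt's accuracy and dependencies, which the plain majority vote below deliberately omits.

```python
from collections import Counter

def aggregate_prompts(x: str, prompts, complete) -> str:
    # Run several imperfect prompt templates over the same input.
    votes = [complete(p.format(input=x)).strip() for p in prompts]
    # Majority vote stands in for the paper's weak-supervision aggregator.
    return Counter(votes).most_common(1)[0][0]
```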

With the increased generation and availability of big data in different domains, there is an imminent requirement for data analysis tools that are able to 'explain' the trends and anomalies obtained from this data to a range of users with different backgrounds. Wu-Madden (PVLDB 2013) and Roy-Suciu (SIGMOD 2014) recently proposed solutions that can explain interesting or unexpected answers to simple aggregate queries in terms of predicates on attributes. In this paper, we propose a generic framework to support much richer, insightful explanations by...

10.14778/2856318.2856329 article EN Proceedings of the VLDB Endowment 2015-12-01
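
One way to read "explanation by predicates" is as an intervention: score each candidate predicate by how much removing the tuples it selects changes the aggregate answer. A minimal pandas sketch under that reading; the column and predicate names are illustrative, and the actual framework is considerably richer.

```python
import pandas as pd

def explain(df: pd.DataFrame, agg_col: str, candidates):
    # candidates: list of (label, mask_fn) where mask_fn(df) selects tuples.
    baseline = df[agg_col].sum()
    influence = {}
    for label, mask_fn in candidates:
        # Intervention: delete the predicate's tuples, re-run the aggregate.
        influence[label] = baseline - df[~mask_fn(df)][agg_col].sum()
    # Rank predicates by how strongly they move the answer.
    return sorted(influence.items(), key=lambda kv: -abs(kv[1]))

# usage: explain(sales, "revenue", [("region=EU", lambda d: d.region == "EU")])
```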

A challenge for named entity disambiguation (NED), the task of mapping textual mentions to entities in a knowledge base, is how to disambiguate entities that appear rarely in the training data, termed tail entities. Humans use subtle reasoning patterns based on facts, relations, and types to disambiguate unfamiliar entities. Inspired by these patterns, we introduce Bootleg, a self-supervised NED system that is explicitly grounded in reasoning patterns for disambiguation. We define the core reasoning patterns for disambiguation, create a learning procedure to encourage the model to learn the patterns, and show how weak supervision...

10.48550/arxiv.2010.10363 preprint EN other-oa arXiv (Cornell University) 2020-01-01
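
Real Bootleg learns these reasoning patterns end to end with a neural model; the bag-of-signals scoring below is only an illustrative proxy for the type/relation cue the abstract describes.

```python
def disambiguate(context_words: set, candidates):
    # candidates: list of (entity_id, kb_signals) where kb_signals is a set of
    # type and relation names drawn from the knowledge base. Prefer the
    # candidate whose types/relations overlap the words around the mention.
    return max(candidates, key=lambda c: len(c[1] & context_words))[0]

# usage: disambiguate({"president", "speech"},
#                     [("Lincoln_(city)", {"city", "nebraska"}),
#                      ("Abraham_Lincoln", {"president", "person"})])
```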

Using data statistics, we convert predicates on a table into data-induced predicates (diPs) that apply to the joining tables. Doing so substantially speeds up multi-relation queries because the benefits of predicate pushdown can now extend beyond just the tables that have predicates. We use diPs to skip data exclusively during query optimization; i.e., diPs lead to better plans and add no overhead during execution. We study how to construct diPs for complex expressions and how their usefulness varies with the statistics used to construct them and the data distributions. Our results show that building diPs using zone-maps...

10.14778/3368289.3368292 article EN Proceedings of the VLDB Endowment 2019-11-01
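
The mechanism can be sketched with zone maps (per-partition min/max statistics): a predicate on one table induces join-key ranges that prune partitions of the joining table before execution starts. The partition layout and field names below are illustrative assumptions, not the paper's implementation.

```python
def induce_dip(a_zones, may_satisfy):
    # a_zones: per-partition stats for table A; may_satisfy(zone) -> bool
    # decides, from statistics alone, whether the predicate could hold there.
    # Surviving partitions contribute their join-key (min, max) ranges.
    return [z["join_key"] for z in a_zones if may_satisfy(z)]

def prune_b(b_zones, dip_ranges):
    def overlaps(z, r):
        return z[0] <= r[1] and r[0] <= z[1]
    # Keep only B partitions whose join-key zone overlaps a surviving range;
    # everything else is skipped at plan time, with no execution overhead.
    return [z for z in b_zones
            if any(overlaps(z["join_key"], r) for r in dip_ranges)]
```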

Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities. Existing approaches are limited by the coarse-grained structural resources in biomedical knowledge bases as well as the use of training datasets that provide low coverage over uncommon resources. In this work, we address these issues by proposing a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain. We...

10.18653/v1/2021.findings-emnlp.388 preprint EN cc-by 2021-01-01

The industrial machine learning pipeline requires iterating on model features, training and deploying models, and monitoring deployed models at scale. Feature stores were developed to manage and standardize the engineer's workflow in this end-to-end pipeline, focusing on traditional tabular feature data. In recent years, however, model development has shifted towards using self-supervised pretrained embeddings as model features. Managing these embeddings and the downstream systems that use them introduces new challenges with respect...

10.14778/3476311.3476402 article EN Proceedings of the VLDB Endowment 2021-07-01
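
A toy sketch of the management need the paper identifies: embeddings served as versioned features, with a drift check between versions. The store layout and the cosine-distance drift metric are illustrative assumptions, not a real feature-store API.

```python
import numpy as np

store = {}  # (entity_id, version) -> embedding vector

def put(entity_id: str, version: str, vec: np.ndarray):
    store[(entity_id, version)] = vec

def get(entity_id: str, version: str) -> np.ndarray:
    # Downstream models pin a version so upstream retraining cannot
    # silently change their inputs.
    return store[(entity_id, version)]

def drift(entity_id: str, v_old: str, v_new: str) -> float:
    # Cosine distance between versions flags embeddings whose meaning moved.
    a, b = get(entity_id, v_old), get(entity_id, v_new)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```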

We present a probabilistic approach to generate a small, query-able summary of a dataset for interactive data exploration. Departing from traditional summarization techniques, we use the Principle of Maximum Entropy to generate a probabilistic representation of the data that can be used to give approximate query answers. We develop the theoretical framework and formulation of our representation and show how it can answer queries. We then present solving techniques and three critical optimizations to improve preprocessing time and query accuracy. Lastly, we experimentally evaluate our work using a 5 GB dataset of flights...

10.14778/3115404.3115419 article EN Proceedings of the VLDB Endowment 2017-06-01
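
A minimal sketch of the Principle of Maximum Entropy for approximate answers, assuming (unlike the paper's more general formulation) that the only constraints are two one-dimensional marginals: iterative proportional fitting converges to the max-entropy joint, which then answers count queries without touching the raw data.

```python
import numpy as np

def maxent_joint(row_marginal, col_marginal, iters=100):
    # Start uniform (the unconstrained max-entropy distribution), then
    # alternately rescale to match each marginal; the fixed point is the
    # maximum-entropy joint consistent with both constraints.
    p = np.full((len(row_marginal), len(col_marginal)), 1.0)
    p /= p.size
    for _ in range(iters):
        p *= (np.asarray(row_marginal) / p.sum(axis=1))[:, None]
        p *= (np.asarray(col_marginal) / p.sum(axis=0))[None, :]
    return p

# usage: COUNT(a = i AND b = j) is approximated by n_rows * p[i, j]
```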

10.1109/icde60146.2024.00456 article EN 2024 IEEE 40th International Conference on Data Engineering (ICDE) 2024-05-13

While much work has been done on applying GPU technology to computed tomography (CT) reconstruction algorithms, many of these implementations focus on smaller datasets that are better suited for medical applications. This paper proposes an irregular approach to the algorithm design which utilizes the hardware's unique cache structure and employs small x-ray image data prefetches from the host to upload to the GPUs while the devices are operating on large contiguous sub-volumes of the reconstruction. This will improve overall cache hit-rates and thus...

10.1109/sc.companion.2012.42 article EN 2012-11-01

Open world database management systems assume tuples not in the database still exist and are becoming an increasingly important area of research. We present Themis, the first open world database that automatically rebalances arbitrarily biased samples to approximately answer queries as if they were issued over the entire population. We leverage apriori population aggregate information to develop and combine two different approaches for automatic debiasing: sample reweighting and Bayesian network probabilistic modeling. We build a prototype...

10.1145/3318464.3380606 article EN 2020-05-29
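
The sample-reweighting half of the approach can be sketched in a few lines: weight each sampled tuple by population count over sample count within its stratum, so weighted aggregates mimic the full population. Column names are illustrative.

```python
import pandas as pd

def reweight(sample: pd.DataFrame, stratum: str, pop_counts: dict) -> pd.Series:
    # Tuples from under-sampled strata get proportionally larger weights.
    sample_counts = sample[stratum].value_counts()
    return sample[stratum].map(lambda s: pop_counts[s] / sample_counts[s])

# usage: debiased AVG(x) ~= (w * sample.x).sum() / w.sum(), w = reweight(...)
```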

Estimation of the x-ray attenuation properties of an object with respect to the energy emitted from the source is a challenging task for traditional Bremsstrahlung sources. This exploratory work attempts to estimate the attenuation profile over an energy range given a measured profile. Previous work has shown that calculating a single effective attenuation value for a polychromatic source is not accurate due to the non-linearities associated with the image formation process. Instead, we completely characterize the imaging system virtually and utilize an iterative search method/constrained optimization...

10.1117/12.2064693 article EN Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE 2014-09-04
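
The non-linearity the abstract refers to comes from polychromatic image formation; in standard Beer-Lambert notation (mine, not the paper's):

```latex
% Polychromatic formation: the detector integrates over energy, so no single
% effective attenuation value \mu reproduces the measured intensity I.
I \;=\; \int S(E)\, \exp\!\Big( -\int_{\ell} \mu(E, x)\, \mathrm{d}x \Big)\, \mathrm{d}E
```

Here S(E) is the source spectrum weighted by detector response and μ(E, x) is the linear attenuation along the ray ℓ; because the exponential is averaged over E, the mapping from thickness to intensity is non-linear.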

We present the motivation, design, implementation, and preliminary evaluation of a service that enables astronomers to study the growth history of galaxies by following their 'merger trees' in large-scale astrophysical simulations. The service uses the Myria parallel data management system as its back-end and the D3 visualization library within its graphical front-end. We demonstrate the service at the workshop on a ~5TB dataset.

10.1145/2627770.2627774 article EN 2014-06-22

This exploratory work investigates the feasibility of extracting linear attenuation functions with respect to energy from a multi-channel radiograph of an object of interest composed of a homogeneous material, by simulating the entire imaging system combined with a digital phantom and leveraging this information along with the acquired image. This synergistic combination allows for improved estimates of not only the effective energy, but also the spectrum that is coincident with the detector elements. Material composition identification from radiographs...

10.1109/nssmic.2014.7431055 article EN 2014 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC) 2014-11-01

Conventional CPU-based algorithms for Computed Tomography reconstruction lack the computational efficiency necessary to process large, industrial datasets in a reasonable amount of time. Specifically, a single-pass, trillion volumetric pixel (voxel) reconstruction requires months of processing time on a high performance workstation. An optimized, single workstation multi-GPU approach has shown performance increases by 2-3 orders-of-magnitude; however, a future-size, trillion voxel reconstruction can still take an entire day to complete....

10.1109/nssmic.2014.7431130 article EN 2014 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC) 2014-11-01

Although there has been progress in applying GPU-technology to Computed-Tomography reconstruction algorithms, much of the work has concentrated on optimizing performance for smaller, medical-scale datasets. Industrial CT datasets can vary widely in size and number of projections. With new advancements in high resolution cameras, it is entirely possible that the community may soon need to pursue a 100-megapixel detector for certain applications. To reconstruct such a massive dataset, simply adding extra GPUs would not be an...

10.1117/12.2023090 article EN Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE 2013-09-26

This work will present the utilization of the massively multi-threaded environment of graphics processors (GPUs) to improve the computation time needed to reconstruct large computed tomography (CT) datasets, and the arising challenges for system implementation. Intelligent GPU algorithm design differs greatly from traditional CPU algorithm design. Although a brute force port of an algorithm to a GPU kernel may yield non-trivial performance gains, further measurable gains can be achieved by designing the algorithm with consideration given...

10.1117/12.2029995 article EN Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE 2013-09-26
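
For context on the kernel this line of papers accelerates, here is the voxel-driven backprojection loop in plain numpy, simplified to 2-D parallel-beam geometry (the papers target filtered cone-beam volumes on GPUs):

```python
import numpy as np

def backproject(sinogram: np.ndarray, angles: np.ndarray, size: int):
    # sinogram: (n_angles, n_detectors). Each voxel accumulates the detector
    # sample its ray passes through, for every projection angle.
    img = np.zeros((size, size))
    xs = np.arange(size) - size / 2.0
    X, Y = np.meshgrid(xs, xs)
    n_det = sinogram.shape[1]
    for proj, theta in zip(sinogram, angles):
        t = X * np.cos(theta) + Y * np.sin(theta) + n_det / 2.0
        idx = np.clip(t.astype(int), 0, n_det - 1)
        img += proj[idx]   # gather step: the memory pattern GPUs must tame
    return img / len(angles)
```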

Karan Goel, Laurel Orr, Nazneen Fatema Rajani, Jesse Vig, Christopher Ré. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers. 2021.

10.18653/v1/2021.naacl-industry.26 article EN cc-by 2021-01-01

Foundation Models (FMs) are models trained on large corpora of data that, at very large scale, can generalize to new tasks without any task-specific finetuning. As these models continue to grow in size, innovations continue to push the boundaries of what these models can do on language and image tasks. This paper aims to understand an underexplored area of FMs: classical data tasks like cleaning and integration. As a proof-of-concept, we cast five data cleaning and integration tasks as prompting tasks and evaluate the performance of FMs on them. We find that FMs achieve SoTA performance on these tasks, even though they were not trained for them. We identify...

10.48550/arxiv.2205.09911 preprint EN public-domain arXiv (Cornell University) 2022-01-01

This paper will investigate energy-efficiency for various real-world industrial computed-tomography reconstruction algorithms, covering both CPU- and GPU-based implementations. This work shows that the energy required for a given reconstruction is based on performance and problem size. There are many ways to describe energy efficiency, thus this work uses multiple metrics including performance-per-watt, energy-delay product, and energy consumption. We found that irregular approaches realized tremendous savings in energy consumption when compared to CPU...

10.1117/12.2060721 article EN Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE 2014-09-04
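
The metrics named in the abstract are simple to state; a small helper makes the definitions concrete (units and the voxel-throughput workload measure are illustrative):

```python
def efficiency_metrics(runtime_s: float, avg_power_w: float, gvoxels: float):
    # Energy in joules, throughput-per-watt, and energy-delay product (EDP);
    # EDP penalizes implementations that save energy only by running longer.
    energy_j = avg_power_w * runtime_s
    return {
        "energy_J": energy_j,
        "perf_per_watt_Gvox_per_J": gvoxels / runtime_s / avg_power_w,
        "energy_delay_product_Js": energy_j * runtime_s,
    }
```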