NFDI4DS | UHH-SEMS - Publication Details

David Talby

ORCID: 0000-0003-2782-5478


About

Contact & Profiles

OpenAlex: A5008312989

Research Areas

Biomedical Text Mining and Ontologies
Topic Modeling
Software System Performance and Reliability
Machine Learning in Healthcare
Software Engineering Techniques and Practices
Artificial Intelligence in Healthcare and Education
Parallel Computing and Optimization Techniques
Distributed and Parallel Computing Systems
Natural Language Processing Techniques
Software Engineering Research
Cloud Computing and Resource Management
Artificial Intelligence in Healthcare
Advanced Software Engineering Methodologies
Interconnection Networks and Systems
Software Testing and Debugging Techniques
Semantic Web and Ontologies
Pharmacovigilance and Adverse Drug Reactions
Service-Oriented Architecture and Web Services
Advanced Database Systems and Queries
Food Security and Health in Diverse Populations
Information Technology Governance and Strategy
Misinformation and Its Impacts
Model-Driven Software Engineering Techniques
Advanced Computational Techniques and Applications
Privacy-Preserving Technologies in Data

John Snow (United States)
2020-2024

Hebrew University of Jerusalem
1999-2009

United States Air Force
2006

Federated benchmarking of medical artificial intelligence with MedPerf

OPENALEX - Publications

Alexandros Karargyris, Renato Umeton, Micah Sheller, Alejandro Aristizabal, Johnu George, and 69 more

Medical artificial intelligence (AI) has tremendous potential to advance healthcare by supporting and contributing to the evidence-based practice of medicine, personalizing patient treatment, reducing costs, and improving both healthcare provider and patient experience. Unlocking this potential requires systematic, quantitative evaluation of the performance of medical AI models on large-scale, heterogeneous data capturing diverse patient populations. Here, to meet this need, we introduce MedPerf, an open platform for benchmarking AI models in the medical domain....

10.1038/s42256-023-00652-2 article EN cc-by Nature Machine Intelligence 2023-07-17

Supporting priorities and improving utilization of the IBM SP scheduler using slack-based backfilling

OPENALEX - Publications

David Talby, Dror G. Feitelson

Distributed memory parallel systems such as the IBM SP2 execute jobs using variable partitioning. Scheduling jobs in FCFS order leads to severe fragmentation and utilization loss, which led to the development of backfilling schedulers such as EASY. This paper presents a backfilling scheduler that improves EASY in two ways: it supports both user-selected and administrative priorities, and it guarantees a bounded wait time for all jobs. The scheduler gives each waiting job a slack, which determines how long it may have to wait before running: 'important' and 'heavy'...

10.1109/ipps.1999.760525 article EN 2003-01-20

Spark NLP: Natural Language Understanding at Scale

OPENALEX - Publications

Veysel Kocaman, David Talby

Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192+ languages. It supports nearly all the NLP tasks and modules, which can be used seamlessly in a cluster. Downloaded 2.7 million times and experiencing 9x growth since January 2020, Spark NLP is used by 54% of healthcare organizations as the world's most widely used NLP library in the enterprise.

10.1016/j.simpa.2021.100058 article EN Software Impacts 2021-02-06

Accurate Clinical and Biomedical Named Entity Recognition at Scale

OPENALEX - Publications

Veysel Kocaman, David Talby

We introduce an agile, production-grade clinical and biomedical Named Entity Recognition (NER) algorithm based on a modified BiLSTM-CNN-Char DL architecture built on top of Apache Spark. Our NER implementation establishes new state-of-the-art accuracy on 7 of 8 well-known benchmarks and 3 concept extraction challenges: 2010 i2b2/VA concept extraction, 2014 n2c2 de-identification, and 2018 n2c2 medication extraction. Moreover, models trained using this implementation outperform commercial solutions such as AWS Medical Comprehend and Google...

10.1016/j.simpa.2022.100373 article EN Software Impacts 2022-07-19

LangTest: A comprehensive evaluation library for custom LLM and NLP models

OPENALEX - Publications

Arshaan Nazir, T. Chakravarthy, David Cecchini, Rakshit Khajuria, Prikshit Sharma, and 3 more

The use of natural language processing (NLP) models, including the more recent large language models (LLM), in real-world applications has achieved considerable success in recent years. To measure the performance of these systems, traditional metrics such as accuracy, precision, recall, and f1-score are used. Although it is important to measure performance in those terms, it often requires a holistic evaluation that considers other aspects such as robustness, bias, toxicity, fairness, safety, efficiency, clinical relevance, security, representation,...

10.1016/j.simpa.2024.100619 article EN Software Impacts 2024-02-10

Agile software testing in a large-scale project

OPENALEX - Publications

David Talby, Arie Keren, Orit Hazzan, Yael Dubinsky

Agile software development in general, and Extreme Programming (XP) in particular, promote radical changes in how organizations traditionally work. We present and analyze new data from a real, large-scale agile project to develop a business-critical enterprise information system for the Israeli Air Force (IAF). Our results offer evidence that agile testing practices actually work, dramatically improving quality and productivity. We describe the organization's successful guidelines in four key areas: test design activity...

10.1109/ms.2006.93 article EN IEEE Software 2006-07-01

CLEVER: Clinical Large Language Model Evaluation by Expert Review (Preprint)

OPENALEX - Publications

Veysel Kocaman, Mustafa Aytuğ Kaya, Andrei Marian Feier, David Talby

BACKGROUND: The proliferation of both general-purpose and healthcare-specific Large Language Models (LLMs) has intensified the challenge of effectively evaluating and comparing them. Data contamination plagues the validity of public benchmarks; self-preference distorts LLM-as-a-judge approaches; and there is a gap between the tasks used to test models and those in clinical practice. OBJECTIVE: In response, we propose CLEVER: A methodology for blind, randomized, preference-based...

10.2196/preprints.72153 preprint EN 2025-02-04

Governance of an agile software project

OPENALEX - Publications

David Talby, Yael Dubinsky

Effective governance of agile software teams is challenging but required to enable wide adoption of agile methodologies, in particular for large-scale projects. In this paper we apply a full lifecycle model to agile projects, focused on the iteration level. The concept is demonstrated via a case study of a large-scale, enterprise-critical project that implemented agile practices. We analyze three events, including the metrics that triggered each event, the decisions taken, and the followup to ensure resolution. We conclude that iterations can be naturally...

10.1109/sdg.2009.5071336 article EN 2009-05-01

Agile metrics at the Israeli Air Force

OPENALEX - Publications

Yael Dubinsky, David Talby, Orit Hazzan, Arie Keren

It is a significant challenge to implement and research agile software development methods in organizations such as the army. Since the army differs from industry and academia, data gathered in the army and its continuous analysis may enrich the community's knowledge about agile methods. This work describes research, conducted during an entire release, about one team at the Israeli Air Force that works according to Extreme Programming. The establishment of this investigation in the first release is part of a long-term process, which started last year,...

10.1109/adc.2005.8 article EN 2006-03-30

Improving and Stabilizing Parallel Computer Performance Using Adaptive Backfilling

OPENALEX - Publications

David Talby, Dror G. Feitelson

The scheduler is a key component in determining the overall performance of a parallel computer, and as we show here, the schedulers in wide use today exhibit large unexplained gaps in performance during their operation. Also, different scheduling algorithms often vary in the gaps they show, suggesting that choosing the correct scheduler for each time frame can improve performance. We present two adaptive schedulers that achieve this: one chooses by recent past performance, the other by the average degree of parallelism, which is shown to be correlated with algorithmic...

10.1109/ipdps.2005.252 article EN 2005-04-19

Reflections on Reflection in Agile Software Development

OPENALEX - Publications

David Talby, Orit Hazzan, Yael Dubinsky, Arie Keren

This paper analyzes the reflections of an agile team developing a large-scale project in an industry setting. The team uses an iteration summary meeting practice, which includes four elements: the customer's summary, a formal presentation of the system, a review of the metrics, and reflection. The technique for the entire practice, and for the reflection element in particular, is described, and empirical evidence is given to show that it is assessed as highly effective, achieving its intended goals and increasing satisfaction. Further, the proposed practice supports...

10.1109/agile.2006.45 article EN 2006-08-08

What is worth learning from parallel workloads?

OPENALEX - Publications

Julia Zilber, Ofer Amit, David Talby

Learning useful and predictable features from past workloads and exploiting them well is a major source of improvement in many operating system problems. We review known parallel workload features and argue that the correct approach for future on-line algorithm design is user- and session-based modeling, instead of analyzing jobs directly as is done today. We then provide statistically sound answers to two basic questions: Which user and session features are central enough to be potentially useful, answered using...

10.1145/1088149.1088200 article EN 2005-06-20

A Co-Plot analysis of logs and models of parallel workloads

OPENALEX - Publications

David Talby, Dror G. Feitelson, Adi Raveh

We present a multivariate analysis technique called Co-Plot that is especially suitable for few samples of many variables. Co-Plot embeds the multidimensional samples in two dimensions, in a way that allows key variables to be identified, and relations between both variables and observations to be analyzed together. When applied to workloads on parallel supercomputers, we find two stable perpendicular axes of highly correlated variables, one representing individual job attributes and the other multijob attributes. The different workloads, on the other hand, are...

10.1145/1243991.1243993 article EN ACM Transactions on Modeling and Computer Simulation 2007-07-01

Automated De-Identification of Arabic Medical Records

OPENALEX - Publications

Veysel Kocaman, Youssef Mellah, Hasham Ul Haq, David Talby

As Electronic Health Records (EHR) become ubiquitous in healthcare systems worldwide, including in Arabic-speaking countries, the dual imperative of safeguarding patient privacy and leveraging data for research and quality improvement grows. This paper presents a first-of-its-kind automated de-identification pipeline for medical text specifically tailored to the Arabic language. It includes accurate Named Entity Recognition (NER) for identifying personal information; obfuscation models to replace sensitive entities...

10.18653/v1/2023.arabicnlp-1.4 article EN cc-by 2023-01-01

Understanding COVID-19 News Coverage using Medical NLP

OPENALEX - Publications

Ali Emre Varol, Veysel Kocaman, Hasham Ul Haq, David Talby

Being a global pandemic, the COVID-19 outbreak received global media attention. In this study, we analyze news publications from CNN and The Guardian, two of the world's most influential media organizations. The dataset includes more than 36,000 articles, analyzed using the clinical and biomedical Natural Language Processing (NLP) models from the Spark NLP for Healthcare library, which enables a deeper analysis of medical concepts than previously achieved. The analysis covers key entities and phrases, observed biases, and change over time in news coverage by...

10.48550/arxiv.2203.10338 preprint EN cc-by arXiv (Cornell University) 2022-01-01

A Process-Complete Automatic Acceptance Testing Framework

OPENALEX - Publications

David Talby, Ori Nakar, N. Shmueli, E. Margolin, Arie Keren

We present a new framework for automated software acceptance tests. The framework is novel in supporting the entire lifecycle and all QA activities, including test maintenance over multiple versions, interaction with programmers and business analysts, traceability to specifications, multi-user test cases, and more. This enables a significant increase in productivity and product quality. We compare our framework to other available tools, products, and frameworks, and present several patterns and anti-patterns for implementing a successful acceptance testing solution.

10.1109/swste.2005.2 article EN 2005-05-24

Deeper Clinical Document Understanding Using Relation Extraction

OPENALEX - Publications

Hasham Ul Haq, Veysel Kocaman, David Talby

The surging amount of biomedical literature & digital clinical records presents a growing need for text mining techniques that can not only identify but also semantically relate entities in unstructured data. In this paper we propose a framework comprising Named Entity Recognition (NER) and Relation Extraction (RE) models, which expands on previous work in three main ways. First, we introduce two new RE model architectures: an accuracy-optimized one based on BioBERT and a speed-optimized one utilizing crafted...

10.48550/arxiv.2112.13259 preprint EN cc-by arXiv (Cornell University) 2021-01-01

SYSTEM ANALYSIS AND DESIGN IN A LARGE-SCALE SOFTWARE PROJECT: THE CASE OF TRANSITION TO AGILE DEVELOPMENT

OPENALEX - Publications

Yael Dubinsky, Orit Hazzan, David Talby, Arie Keren

Agile software development methods mainly aim at increasing software quality by fostering customer collaboration and performing exhaustive testing. The introduction of Extreme Programming (XP), the most common agile method, into an organization is accompanied by conceptual and organizational changes. These changes range from daily-life changes (e.g., sitting together and maintaining an informative project environment) and continue at the management level (e.g., meeting and listening to the customer during the whole process, and the concept of the whole team, which means that...

10.5220/0002451900110018 article EN cc-by-nc-nd 2006-01-01

Improving Clinical Document Understanding on COVID-19 Research with Spark NLP

OPENALEX - Publications

Veysel Kocaman, David Talby

Following the global COVID-19 pandemic, the number of scientific papers studying the virus has grown massively, leading to increased interest in automated literature review. We present a clinical text mining system that improves on previous efforts in three ways. First, it can recognize over 100 different entity types, including social determinants of health, anatomy, risk factors, and adverse events, in addition to other commonly used biomedical entities. Second, the processing pipeline includes assertion status...

10.48550/arxiv.2012.04005 preprint EN cc-by arXiv (Cornell University) 2020-01-01

Factors associated with social determinants of health mentions in PubMed clinical case reports from 1975 to 2022: A natural language processing analysis

OPENALEX - Publications

Julio Bonis, Veysel Kocaman, David Talby

Social determinants of health (SDoH) significantly influence health outcomes, accounting for nearly 40% of such outcomes globally. These determinants, pivotal in understanding health disparities, are insufficiently documented in clinical settings and academic narratives. To address this gap, we examined case reports from PubMed (1975–2022) to identify mentions of six specific SDoH, employing a pre-trained named-entity recognition (NER) model from Spark natural language processing (NLP). Multivariate logistic...

10.36922/aih.2737 article EN cc-by Deleted Journal 2024-04-17
