James Wexler

ORCID: 0009-0006-8105-6998
Research Areas
  • Explainable Artificial Intelligence (XAI)
  • Topic Modeling
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Artificial Intelligence in Healthcare and Education
  • Machine Learning and Data Classification
  • Machine Learning in Healthcare
  • Adversarial Robustness in Machine Learning
  • Software Engineering Research
  • Ethics and Social Impacts of AI
  • AI in Service Interactions
  • Artificial Intelligence in Law
  • Legal Education and Practice Innovations
  • Electronic Health Records Systems
  • Music Technology and Sound Studies
  • Judicial and Constitutional Studies
  • Music and Audio Processing
  • Neuroscience and Music Perception
  • Data Quality and Management
  • Diabetes Treatment and Management
  • Scientific Computing and Data Management
  • Machine Learning in Materials Science
  • Big Data and Business Intelligence
  • Healthcare Technology and Patient Monitoring
  • Data Stream Mining Techniques

Google (United States)
2017-2024

East Stroudsburg University
2023

Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of the information in each patient's record. We propose a representation of patients' entire raw records based on the Fast Healthcare Interoperability Resources (FHIR) format....

10.1038/s41746-018-0029-1 article EN cc-by npj Digital Medicine 2018-05-04
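The core representational idea — feed the whole raw record as a time-ordered sequence of FHIR-style resources instead of hand-curated predictor variables — can be sketched in a few lines. The resource contents below are fabricated for illustration; the real FHIR schema is far richer:

```python
# Hypothetical sketch: a patient's raw record as FHIR-style resources,
# sorted into the single chronological event sequence a model consumes.
record = [
    {"resourceType": "Observation", "code": "glucose", "value": 180, "time": "2018-01-02T08:00"},
    {"resourceType": "MedicationRequest", "code": "insulin", "time": "2018-01-02T09:30"},
    {"resourceType": "Encounter", "class": "inpatient", "time": "2018-01-01T22:15"},
]

# ISO-8601 timestamps sort lexicographically, so a plain sort orders events in time.
timeline = sorted(record, key=lambda r: r["time"])
print([r["resourceType"] for r in timeline])  # ['Encounter', 'Observation', 'MedicationRequest']
```

No feature curation happens here: every resource survives into the model's input, which is the point of the representation.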

The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that...

10.48550/arxiv.1711.11279 preprint EN other-oa arXiv (Cornell University) 2017-01-01
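The CAV construction can be sketched with off-the-shelf tools: train a linear classifier separating the layer activations of concept examples from those of random counterexamples, take the normal to its decision boundary as the CAV, then score how often directional derivatives along it are positive. A minimal sketch on synthetic activations — the data and the stand-in gradients are fabricated for illustration, not taken from any real network:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Fabricated layer activations: 50 concept examples vs. 50 random counterexamples.
concept_acts = rng.normal(loc=1.0, size=(50, 8))
random_acts = rng.normal(loc=-1.0, size=(50, 8))

# 1. Fit a linear classifier that separates concept from random activations.
X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 50 + [0] * 50)
clf = LogisticRegression().fit(X, y)

# 2. The CAV is the unit normal to the classifier's decision boundary.
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])

# 3. TCAV score: fraction of class examples whose logit gradient (w.r.t. the
#    layer activations) points in the concept direction. Real gradients come
#    from the network; these are fabricated stand-ins.
example_grads = rng.normal(size=(100, 8))
tcav_score = float((example_grads @ cav > 0).mean())
```

With real activations, a `tcav_score` far from 0.5 indicates the concept systematically influences the class's predictions.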

A key challenge in developing and deploying Machine Learning (ML) systems is understanding their performance across a wide range of inputs. To address this challenge, we created the What-If Tool, an open-source application that allows practitioners to probe, visualize, and analyze ML systems, with minimal coding. The What-If Tool lets practitioners test performance in hypothetical situations, analyze the importance of different data features, and visualize model behavior across multiple models and subsets of input data. It also lets practitioners measure systems according to multiple fairness metrics....

10.1109/tvcg.2019.2934619 article EN cc-by IEEE Transactions on Visualization and Computer Graphics 2019-01-01
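The core "what-if" interaction — edit one feature of a datapoint, re-run the model, compare the outcomes — can be sketched in a few lines. The model and feature names here are invented stand-ins, not anything from the tool itself:

```python
# Fabricated stand-in model: approve (1) when income exceeds twice the debt.
def model(example):
    return 1 if example["income"] > 2 * example["debt"] else 0

original = {"income": 50, "debt": 30}     # denied: 50 <= 60
counterfactual = dict(original, debt=20)  # edit a single feature
print(model(original), model(counterfactual))  # 0 1
```

The flipped prediction is exactly the kind of counterfactual insight the tool surfaces interactively, without requiring the practitioner to write this probing code by hand.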

We present a design study of the TensorFlow Graph Visualizer, part of the TensorFlow machine intelligence platform. This tool helps users understand complex machine learning architectures by visualizing their underlying dataflow graphs. The tool works by applying a series of graph transformations that enable standard layout techniques to produce a legible interactive diagram. To declutter the graph, we decouple non-critical nodes from the layout. To provide an overview, we build a clustered graph using the hierarchical structure annotated in the source code....

10.1109/tvcg.2017.2744878 article EN IEEE Transactions on Visualization and Computer Graphics 2017-08-29

Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make decisions. Most of the current explanation methods provide explanations through feature importance scores, which identify the features that are important for each individual input. However, how to systematically summarize and interpret such per-sample scores is itself challenging. In this work, we propose principles and desiderata for concept-based explanation, which goes beyond per-sample...

10.48550/arxiv.1902.03129 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Ian Tenney, James Wexler, Jasmijn Bastings, Tolga Bolukbasi, Andy Coenen, Sebastian Gehrmann, Ellen Jiang, Mahima Pushkarna, Carey Radebaugh, Emily Reif, Ann Yuan. The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2020.

10.18653/v1/2020.emnlp-demos.15 article EN cc-by 2020-01-01

Although recent developments in generative AI have greatly enhanced the capabilities of conversational agents such as Google's Bard or OpenAI's ChatGPT, it is unclear whether usage of these tools aids users across various contexts. To better understand how access to conversational AI affects productivity and trust, we conducted a mixed-methods, task-based user study, observing 76 software engineers (N=76) as they completed a programming exam with and without access to Bard. Effects on performance, efficiency, satisfaction, and trust vary...

10.1145/3640543.3645198 article EN 2024-03-18

Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs into changes to the prompt or model. In this work, we explore how to enable users to interactively refine model outputs through their feedback, by helping them convert their feedback into a set of principles (i.e. a constitution) that dictate the model's behavior. From a formative study, we (1) found...

10.1145/3640543.3645144 article EN 2024-03-18

To make music composition more approachable, we designed the first AI-powered Google Doodle, the Bach Doodle, where users can create their own melody and have it harmonized by a machine learning model, Coconet (Huang et al., 2017), in the style of Bach. For users to input melodies, we designed a simplified sheet-music-based interface. To support an interactive experience at scale, we re-implemented Coconet in TensorFlow.js (Smilkov et al., 2019) to run in the browser and reduced its runtime from 40s to 2s by adopting dilated depth-wise separable convolutions and fusing...

10.48550/arxiv.1907.06637 preprint EN other-oa arXiv (Cornell University) 2019-01-01
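A back-of-the-envelope view of why depthwise-separable convolutions help: they replace the c_in × c_out coupling of a standard k×k convolution with a per-channel spatial filter plus a 1×1 pointwise mix, and dilation widens the receptive field at no extra parameter cost. A sketch of the parameter arithmetic — the channel sizes are illustrative, not the actual Coconet configuration:

```python
def conv_params(c_in, c_out, k):
    # Standard 2-D convolution: one k x k filter per (input, output) channel pair.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise stage: one k x k spatial filter per input channel (dilation
    # enlarges its receptive field with zero additional parameters), followed
    # by a pointwise 1x1 convolution that mixes channels.
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 128, 128, 3
standard = conv_params(c_in, c_out, k)                   # 147456
separable = depthwise_separable_params(c_in, c_out, k)   # 17536
print(standard, separable, round(standard / separable, 1))  # 147456 17536 8.4
```

The ~8x parameter reduction translates into proportionally less compute per layer, which is the kind of saving that makes in-browser inference interactive.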

Automatic side-by-side evaluation has emerged as a promising approach to evaluating the quality of responses from large language models (LLMs). However, analyzing the results of this evaluation approach raises scalability and interpretability challenges. In this paper, we present LLM Comparator, a novel visual analytics tool for interactively analyzing results from automatic side-by-side evaluation. The tool supports interactive workflows for users to understand when and why a model performs better or worse than a baseline model, and how the responses from the two models are qualitatively different. We...

10.1145/3613905.3650755 article EN 2024-05-02

Making sense of unstructured text datasets is perennially difficult, yet increasingly relevant with Large Language Models. Data practitioners often rely on dataset summaries, especially distributions of various derived features. Some features, like toxicity or topics, are relevant to many datasets, but many interesting features are domain specific: instruments and genres for a music dataset, or diseases and symptoms for a medical dataset. Accordingly, data practitioners often run custom analyses for each dataset, which is cumbersome and difficult, or use unsupervised methods....

10.1145/3613905.3650798 article EN 2024-05-11

Automatic side-by-side evaluation has emerged as a promising approach to evaluating the quality of responses from large language models (LLMs). However, analyzing the results of this evaluation approach raises scalability and interpretability challenges. In this paper, we present LLM Comparator, a novel visual analytics tool for interactively analyzing results from automatic side-by-side evaluation. The tool supports interactive workflows for users to understand when and why a model performs better or worse than a baseline model, and how the responses from the two models are qualitatively different. We...

10.48550/arxiv.2402.10524 preprint EN arXiv (Cornell University) 2024-02-16

Evaluating large language models (LLMs) presents unique challenges. While automatic side-by-side evaluation, also known as LLM-as-a-judge, has become a promising solution, model developers and researchers face difficulties with scalability and interpretability when analyzing these evaluation outcomes. To address these challenges, we introduce LLM Comparator, a new visual analytics tool designed for side-by-side evaluations of LLMs. This tool provides analytical workflows that help users understand why one model outperforms another, or...

10.1109/tvcg.2024.3456354 article EN cc-by IEEE Transactions on Visualization and Computer Graphics 2024-01-01
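The aggregation such a tool visualizes starts from simple counts: per-prompt judge verdicts rolled up into overall and per-slice win rates. A toy sketch with fabricated verdicts (the field names and categories are invented for illustration):

```python
from collections import Counter

# Hypothetical per-prompt verdicts from an automatic side-by-side
# ("LLM-as-a-judge") evaluation of model A vs. baseline B.
verdicts = [
    {"category": "coding",  "winner": "A"},
    {"category": "coding",  "winner": "A"},
    {"category": "coding",  "winner": "tie"},
    {"category": "writing", "winner": "B"},
    {"category": "writing", "winner": "A"},
    {"category": "writing", "winner": "B"},
]

def win_rates(rows):
    # Fraction of prompts won by each side (plus ties).
    counts = Counter(r["winner"] for r in rows)
    return {k: counts.get(k, 0) / len(rows) for k in ("A", "B", "tie")}

# Overall headline number, then the per-category slicing that explains
# *when* one model beats the other.
overall = win_rates(verdicts)
by_cat = {
    c: win_rates([r for r in verdicts if r["category"] == c])
    for c in {r["category"] for r in verdicts}
}
print(overall["A"], by_cat["coding"]["A"])  # 0.5 0.6666666666666666
```

A flat 0.5 overall can hide the sliced picture here: A dominates coding while losing writing, which is precisely the kind of qualitative difference the tool is built to surface.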

Making decisions about what clinical tasks to prepare for is multi-factored, and especially challenging in intensive care environments where resources must be balanced with patient needs. Electronic health records (EHRs) are a rich data source, but are task-agnostic and can be difficult to use as summarizations of patient needs for a specific task, such as "could this patient need a ventilator tomorrow?" In this paper, we introduce ClinicalVis, an open-source EHR visualization-based prototype system for task-focused design evaluation...

10.48550/arxiv.1810.05798 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Interpretability techniques aim to provide the rationale behind a model's decision, typically by explaining either an individual prediction (local explanation, e.g. "why is this patient diagnosed with this condition?") or a class of predictions (global explanation, e.g. "why are these patients diagnosed with this condition in general?"). While there are many methods focused on one or the other, few frameworks can provide both local and global explanations in a consistent manner. In this work, we combine two powerful existing techniques, one local (Integrated Gradients, IG) and one global (Testing...

10.48550/arxiv.2106.08641 preprint EN cc-by arXiv (Cornell University) 2021-01-01
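Integrated Gradients itself is compact enough to sketch: average the gradient along the straight-line path from a baseline to the input, then scale by the input difference. Below, a toy differentiable "model" with a hand-written gradient stands in for a real network; the completeness axiom (attributions sum to f(x) − f(baseline)) falls out exactly:

```python
import numpy as np

W = np.array([1.0, 2.0, 3.0])

def f(x):
    # Toy differentiable model: weighted sum of squares.
    return float(np.sum(W * x ** 2))

def grad_f(x):
    # Analytic gradient of f; a real implementation would use autodiff.
    return 2 * W * x

def integrated_gradients(x, baseline, steps=200):
    # Riemann-sum (midpoint rule) approximation of the path integral of the
    # gradient from baseline to x, scaled elementwise by (x - baseline).
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)
attr = integrated_gradients(x, baseline)
print(attr, attr.sum(), f(x) - f(baseline))
```

For this quadratic model the midpoint rule is exact, so the attributions land on [1, 2, 3] and their sum matches f(x) − f(baseline) = 6.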

Models are interpretable when machine learning (ML) practitioners can readily understand the reasoning behind their predictions. Ironically, little is known about ML practitioners' experience of discovering and adopting novel interpretability techniques in production settings. In a qualitative study with 18 practitioners at a large technology company working with text data, we found that despite their varied tasks, practitioners experienced nearly identical challenges related to interpretability methods and model analysis workflows. These challenges stem from...

10.1145/3563657.3596046 article EN 2023-07-10

Making sense of unstructured text datasets is perennially difficult, yet increasingly relevant with Large Language Models. Data workers often rely on dataset summaries, especially distributions of various derived features. Some features, like toxicity or topics, are relevant to many datasets, but many interesting features are domain specific: instruments and genres for a music dataset, or diseases and symptoms for a medical dataset. Accordingly, data workers often run custom analyses for each dataset, which is cumbersome and difficult. We present...

10.48550/arxiv.2402.14880 preprint EN arXiv (Cornell University) 2024-02-21

Although recent developments in generative AI have greatly enhanced the capabilities of conversational agents such as Google's Gemini (formerly Bard) or OpenAI's ChatGPT, it is unclear whether usage of these tools aids users across various contexts. To better understand how access to conversational AI affects productivity and trust, we conducted a mixed-methods, task-based user study, observing 76 software engineers (N=76) as they completed a programming exam with and without access to Bard. Effects on performance, efficiency,...

10.1145/3640543.3645198 preprint EN arXiv (Cornell University) 2024-02-28

Large language models (LLMs) are highly capable at a variety of tasks given the right prompt, but writing one is still a difficult and tedious process. In this work, we introduce ConstitutionalExperts, a method for learning a prompt consisting of constitutional principles (i.e. rules), given a training dataset. Unlike prior methods that optimize the prompt as a single entity, our method incrementally improves the prompt by surgically editing individual principles. We also show we can improve overall performance by learning unique prompts for different...

10.48550/arxiv.2403.04894 preprint EN arXiv (Cornell University) 2024-03-07
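The incremental-editing idea can be sketched as a simple hill-climb over a list of rules: try one surgical edit at a time and keep it only if a score on training data improves. Everything here (the scorer, the rules, the candidate edits) is a fabricated stand-in for the LLM-driven pieces of the actual method:

```python
def optimize_prompt(principles, candidate_edits, score):
    """Greedy principle editing: accept a single-rule change only if it
    improves the training-set score."""
    best, best_score = list(principles), score(principles)
    for index, new_rule in candidate_edits:
        trial = list(best)
        trial[index] = new_rule  # surgical edit: change exactly one principle
        trial_score = score(trial)
        if trial_score > best_score:
            best, best_score = trial, trial_score
    return best, best_score

# Toy scorer: how many target behaviors the rule set mentions. The real
# method would score the prompt's predictions against labeled data.
TARGETS = {"cite sources", "refuse medical advice", "be concise"}
def score(rules):
    return sum(any(t in r for r in rules) for t in TARGETS)

start = ["always be concise", "never guess"]
edits = [(1, "refuse medical advice and cite sources")]
best, best_score = optimize_prompt(start, edits, score)
print(best_score)  # 3
```

Editing one principle at a time keeps the rest of the prompt stable, which is what distinguishes this style of optimization from rewriting the prompt wholesale.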

As large language models (LLMs) grow increasingly adept at processing unstructured text data, they offer new opportunities to enhance data curation workflows. This paper explores the evolution of LLM adoption among data practitioners at a technology company, evaluating the impact of LLMs on data curation tasks through participants' perceptions, integration strategies, and reported usage scenarios. Through a series of surveys, interviews, and user studies, we provide a timely snapshot of how organizations are navigating a pivotal moment...

10.48550/arxiv.2412.16089 preprint EN arXiv (Cornell University) 2024-12-20