Weixin Liang

ORCID: 0000-0001-9924-693X
Research Areas
  • Topic Modeling
  • Artificial Intelligence in Healthcare and Education
  • Explainable Artificial Intelligence (XAI)
  • Speech and Dialogue Systems
  • Natural Language Processing Techniques
  • Machine Learning and Algorithms
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Adversarial Robustness in Machine Learning
  • SARS-CoV-2 detection and testing
  • Sentiment Analysis and Opinion Mining
  • Ethics and Social Impacts of AI
  • Scientific Computing and Data Management
  • Algorithms and Data Compression
  • Advanced Graph Neural Networks
  • Health Systems, Economic Evaluations, Quality of Life
  • Privacy-Preserving Technologies in Data
  • Digital Rights Management and Security
  • Academic Publishing and Open Access
  • Technology Assessment and Management
  • Music and Audio Processing
  • Nonmelanoma Skin Cancer Studies
  • Artificial Intelligence in Healthcare
  • Model Reduction and Neural Networks
  • Cancer Genomics and Diagnostics

Harbin Medical University
2024

Stanford University
2019-2024

Columbia University
2021

Zhejiang University
2018-2020

Translational Genomics Research Institute
2018

GPT detectors frequently misclassify non-native English writing as AI-generated, raising concerns about fairness and robustness. Addressing the biases in these detectors is crucial to prevent the marginalization of non-native English speakers in evaluative or educational settings and to create a more equitable digital landscape.

10.1016/j.patter.2023.100779 article EN cc-by-nc-nd Patterns 2023-07-01

There are now over 500 medical artificial intelligence (AI) devices approved by the U.S. Food and Drug Administration. However, little is known about where and how often these devices are actually used after regulatory approval. In this article, we systematically quantify the adoption and usage of medical AI in the United States by tracking Current Procedural Terminology (CPT) codes explicitly created for medical AI. CPT codes are widely used for documenting billing and payment for medical procedures, providing a measure of device utilization across different clinical...

10.1056/aioa2300030 article EN NEJM AI 2023-11-09

We present an approach for estimating the fraction of text in a large corpus that is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood approach leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM use at the corpus level. We apply this approach to a case study of scientific peer review at AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of the text submitted as...

10.48550/arxiv.2403.07183 preprint EN arXiv (Cornell University) 2024-03-11
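The corpus-level estimate described above can be sketched as a one-parameter mixture MLE: each document's likelihood is a convex combination of its likelihood under a human-written reference model and an AI-generated reference model, and we maximize over the mixture weight. A minimal illustration under those assumptions (the function name, grid search, and input format are hypothetical, not the paper's implementation):

```python
import math

def estimate_fraction(ll_human, ll_ai, grid=1000):
    """Grid-search MLE for alpha in the mixture
    p(doc) = alpha * p_ai(doc) + (1 - alpha) * p_human(doc),
    given per-document log-likelihoods under each reference model.
    Illustrative sketch only."""
    best_alpha, best_ll = 0.0, float("-inf")
    for i in range(grid + 1):
        a = i / grid
        total = 0.0
        for lh, la in zip(ll_human, ll_ai):
            # log(a * e^la + (1 - a) * e^lh), computed stably
            m = max(lh, la)
            total += m + math.log(a * math.exp(la - m) + (1 - a) * math.exp(lh - m))
        if total > best_ll:
            best_alpha, best_ll = a, total
    return best_alpha
```

With well-separated reference models, the estimated weight tracks the true fraction of AI-modified documents in the corpus.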

Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what extent this tool might affect global scientific practices. However, we lack a precise measure of the proportion...

10.48550/arxiv.2404.01268 preprint EN arXiv (Cornell University) 2024-04-01

Expert feedback lays the foundation of rigorous research. However, the rapid growth of scholarly production and intricate knowledge specialization challenge conventional scientific feedback mechanisms. High-quality peer reviews are increasingly difficult to obtain. Researchers who are more junior or from under-resourced settings have an especially hard time getting timely feedback. With the breakthrough of large language models (LLMs) such as GPT-4, there is growing interest in using LLMs to generate feedback on research...

10.48550/arxiv.2310.01783 preprint EN other-oa arXiv (Cornell University) 2023-01-01

A major bottleneck in training an end-to-end task-oriented dialog system is the lack of data. To utilize limited training data more efficiently, we propose the Modular Supervision Network (MOSS), an encoder-decoder framework that could incorporate supervision from various intermediate dialog modules, including natural language understanding, dialog state tracking, dialog policy learning, and natural language generation. With only 60% of the training data, MOSS-all (i.e., MOSS with all four modules) outperforms state-of-the-art models on CamRest676. Moreover,...

10.1609/aaai.v34i05.6349 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Recent advancements in deep learning techniques facilitate intelligent-query support in diverse applications, such as content-based image retrieval and audio texturing. Unlike conventional key-based queries, these intelligent queries lack efficient indexing and require complex compute operations for feature matching. To achieve high-performance querying against massive datasets, modern computing systems employ GPUs in conjunction with solid-state drives (SSDs) for fast data access and parallel processing....

10.1145/3352460.3358320 article EN 2019-10-11

Training a supervised neural network classifier typically requires many annotated training samples. Collecting and annotating a large number of data points are costly and sometimes even infeasible. The traditional annotation process uses a low-bandwidth human-machine communication interface: classification labels, each of which only provides a few bits of information. We propose Active Learning with Contrastive Explanations (ALICE), an expert-in-the-loop framework that utilizes contrastive natural language...

10.18653/v1/2020.emnlp-main.355 article EN cc-by 2020-01-01

Images are more than a collection of objects or attributes: they represent a web of relationships among interconnected objects. Scene graphs have emerged as a new modality providing a structured graphical representation of images. A scene graph encodes objects as nodes connected via pairwise relations as edges. To support question answering on scene graphs, we propose GraphVQA, a language-guided graph neural network framework that translates and executes a natural language question as multiple iterations of message passing among graph nodes. We explore the design space...

10.18653/v1/2021.maiworkshop-1.12 article EN cc-by 2021-01-01

Open-domain dialog system evaluation is one of the most important challenges in dialog research. Existing automatic metrics, such as BLEU, are mostly reference-based: they calculate the difference between the generated response and a limited number of available references. Likert-score-based self-reported user ratings are widely adopted by social conversational systems, such as Amazon Alexa Prize chatbots. However, self-reported user rating suffers from bias and variance among different users. To alleviate this problem, we formulate dialog evaluation as a comparison task. We...

10.18653/v1/2020.acl-main.126 article EN cc-by 2020-01-01
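Comparison-based evaluation of the kind described above can be illustrated with a Bradley-Terry model fitted to pairwise preference counts: each system gets a latent score, and the probability that system i beats system j is p_i / (p_i + p_j). This is a generic sketch of pairwise-comparison scoring, not the paper's evaluation model; the function name and MM-style update are assumptions:

```python
def bradley_terry(wins, n_systems, iters=200):
    """Fit Bradley-Terry scores from pairwise comparison outcomes.
    wins[i][j] = number of times system i was preferred over system j.
    Uses the standard minorization-maximization update. Sketch only."""
    p = [1.0] * n_systems
    for _ in range(iters):
        new_p = []
        for i in range(n_systems):
            num = sum(wins[i][j] for j in range(n_systems) if j != i)
            den = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                      for j in range(n_systems) if j != i)
            new_p.append(num / den if den > 0 else p[i])
        s = sum(new_p)
        p = [x * n_systems / s for x in new_p]  # normalize scores
    return p
```

A system preferred in 80% of comparisons ends up with a score four times that of its opponent, so rankings aggregate many noisy pairwise judgments into a single scale.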

State Space Models (SSMs) have emerged as efficient alternatives to Transformers for sequential modeling, but their inability to leverage modality-specific features limits their performance in multi-modal pretraining. Here, we propose Mixture-of-Mamba, a novel SSM architecture that introduces modality-aware sparsity through modality-specific parameterization of the Mamba block. Building on Mixture-of-Transformers (W. Liang et al., arXiv:2411.04996, 2024), we extend its benefits to SSMs while preserving computational efficiency....

10.48550/arxiv.2501.16295 preprint EN arXiv (Cornell University) 2025-01-27

ML libraries, often written in architecture-specific programming languages (ASPLs) that target domain-specific architectures, are key to efficient systems. However, writing these high-performance libraries is challenging because it requires expert knowledge of both the algorithms and the ASPL. Large language models (LLMs), on the other hand, have shown general coding capabilities. However, challenges remain when using LLMs to generate ASPL code: 1) this task is complicated even for experienced human programmers and 2) there...

10.48550/arxiv.2502.02534 preprint EN arXiv (Cornell University) 2025-02-04

Training a Generative Adversarial Network (GAN) for a new domain from scratch requires an enormous amount of training data and days of training time. To this end, we propose DAWSON, a Domain Adaptive Few-Shot Generation Framework for GANs based on meta-learning. A major challenge of applying meta-learning to GANs is to obtain gradients for the generator from evaluating it on development sets, due to the likelihood-free nature of GANs. To address this challenge, we propose an alternative GAN training procedure that naturally combines the two-step training procedures of GANs and meta-learning algorithms. DAWSON...

10.48550/arxiv.2001.00576 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Recent advances in deep learning have made possible the use of large, deep neural networks with tens of millions of parameters. The sheer size of these networks imposes a challenging computational burden during inference. Existing work focuses primarily on accelerating each forward pass of a network. Inspired by the group testing strategy for efficient disease testing, we propose a method which accelerates inference by testing a group of samples in one forward pass. Groups of samples that test negative are ruled out. If a group tests positive, samples in that group are then retested adaptively. A key challenge is to modify...

10.1109/isit45174.2021.9518038 article EN 2022 IEEE International Symposium on Information Theory (ISIT) 2021-07-12
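The adaptive scheme described above (screen pooled groups, rule out negatives wholesale, split and retest positives) can be sketched in a few lines. Here `is_positive_group` stands in for a single forward pass over a pooled batch; the function names and splitting strategy are illustrative assumptions, not the paper's algorithm:

```python
def group_test(samples, is_positive_group, group_size=4):
    """Adaptive group testing: screen samples in pooled groups and
    recursively split any group that tests positive. Returns the
    positive samples and the number of (pooled) model calls used."""
    positives, calls = [], 0
    # Initial partition into fixed-size groups
    stack = [samples[i:i + group_size] for i in range(0, len(samples), group_size)]
    while stack:
        group = stack.pop()
        calls += 1
        if not is_positive_group(group):
            continue  # whole group ruled out with a single pooled pass
        if len(group) == 1:
            positives.append(group[0])  # isolated a positive sample
        else:
            mid = len(group) // 2
            stack.extend([group[:mid], group[mid:]])  # split and retest
    return positives, calls
```

When positives are rare, the number of pooled passes is far below one pass per sample, which is where the speedup comes from.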

This article systematically investigates technology licensing by Stanford University. We analyzed all inventions marketed by Stanford's Office of Technology Licensing (OTL) between 1970 and 2020: 4,512 inventions from 6,557 inventors. We quantified how the innovation landscape at Stanford changed over time and examined factors that correlate with commercial success. We found that the most profitable inventions are predominantly licensed by inventors' own startups, that inventions have involved larger teams over time, and that the proportion of female inventors has tripled in the past 25...

10.1016/j.patter.2022.100584 article EN cc-by-nc-nd Patterns 2022-09-01

Advances in machine learning are closely tied to the creation of datasets. While data documentation is widely recognized as essential to the reliability, reproducibility, and transparency of ML, we lack a systematic empirical understanding of current dataset documentation practices. To shed light on this question, here we take Hugging Face, one of the largest platforms for sharing and collaborating on ML models and datasets, as a prominent case study. By analyzing all 7,433 dataset documentation on Hugging Face, our investigation provides an overview of the ecosystem and insights into...

10.48550/arxiv.2401.13822 preprint EN other-oa arXiv (Cornell University) 2024-01-01

Weixin Liang, Kai-Hui Liang, and Zhou Yu. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.283 article EN cc-by 2021-01-01

The predominant approach to visual question answering (VQA) relies on encoding the image and question with a "black-box" neural encoder and decoding a single token as the answer, like "yes" or "no". Despite this approach's strong quantitative results, it struggles to come up with intuitive, human-readable forms of justification for the prediction process. To address this insufficiency, we reformulate VQA as a full answer generation task, which requires the model to justify its predictions in natural language. We propose LRTA [Look, Read, Think,...

10.48550/arxiv.2011.10731 preprint EN other-oa arXiv (Cornell University) 2020-01-01

The rapid proliferation of AI models has underscored the importance of thorough documentation, as it enables users to understand, trust, and effectively utilize these models in various applications. Although developers are encouraged to produce model cards, it is not clear how much information, or what kind of information, these cards contain. In this study, we conduct a comprehensive analysis of 32,111 model card documentations on Hugging Face, a leading platform for distributing and deploying models. Our investigation sheds light on the prevailing model card...

10.48550/arxiv.2402.05160 preprint EN arXiv (Cornell University) 2024-02-07