Noah A. Smith

ORCID: 0000-0002-2387-9789
Research Areas
  • Multimodal Machine Learning Applications
  • Topic Modeling
  • Natural Language Processing Techniques
  • Error Correcting Code Techniques
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Service-Oriented Architecture and Web Services
  • Visual Attention and Saliency Detection
  • Bayesian Modeling and Causal Inference
  • Algorithms and Data Compression
  • Semantic Web and Ontologies
  • Model-Driven Software Engineering Techniques
  • Speech and dialogue systems
  • Swearing, Euphemism, Multilingualism
  • ICT Impact and Policies
  • Gene Regulatory Network Analysis
  • Traffic Prediction and Management Techniques
  • Explainable Artificial Intelligence (XAI)
  • Advanced Software Engineering Methodologies
  • Business Process Modeling and Analysis
  • Music Technology and Sound Studies
  • Digital Rights Management and Security
  • Image Retrieval and Classification Techniques
  • Machine Learning and Algorithms
  • Decision-Making and Behavioral Economics

University of Washington
2020

Carnegie Mellon University
2011-2015

The computations required for deep learning research have been doubling every few months, resulting in an estimated 300,000x increase from 2012 to 2018 [2]. These computations have a surprisingly large carbon footprint [38]. Ironically, deep learning was inspired by the human brain, which is remarkably energy efficient. Moreover, the financial cost of the computations can make it difficult for academics, students, and researchers, in particular those from emerging economies, to engage in deep learning research. This position paper advocates a practical solution by making...

10.48550/arxiv.1907.10597 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights. We present the first study focused on generating natural language rationales across several complex visual reasoning tasks: visual commonsense reasoning, visual-textual entailment, and visual question answering. The key challenge of accurate rationalization is comprehensive image understanding at all levels: not just their...

10.18653/v1/2020.findings-emnlp.253 article EN 2020-01-01

This letter analyses call detail records of 16 million live calls over Internet-protocol-based telecommunications networks. The objective is to examine the dependency between average call duration and quality as perceived by the user. Surprisingly, the analysis suggests that this connection is non-monotonic. This contradicts the common assumption that higher quality leads to longer calls. In light of this new finding, the use of call duration as an indicator for (aggregated) user experience must be reconsidered. The results also impact the modeling of user behavior. Based on...

10.1109/lwc.2018.2806442 article EN IEEE Wireless Communications Letters 2018-02-15
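The non-monotonicity claim above amounts to binning calls by duration and observing that mean perceived quality rises and then falls. A toy sketch of that kind of check follows; the function names, the fixed-width binning, and the sample records are assumptions for illustration, not the letter's actual methodology.

```python
from statistics import mean

def mean_quality_by_duration(records, bin_width=60):
    """Group (duration_seconds, quality) records into duration bins and
    return the mean quality per bin, ordered by increasing duration."""
    bins = {}
    for duration, quality in records:
        bins.setdefault(duration // bin_width, []).append(quality)
    return [mean(qs) for _, qs in sorted(bins.items())]

def is_monotonic(values):
    """True if the sequence is entirely non-decreasing or non-increasing."""
    nondec = all(a <= b for a, b in zip(values, values[1:]))
    noninc = all(a >= b for a, b in zip(values, values[1:]))
    return nondec or noninc

# Toy records in which quality first rises, then falls, with duration.
records = [(30, 3.0), (45, 3.2), (90, 4.1), (100, 4.0), (150, 3.1)]
profile = mean_quality_by_duration(records)  # e.g. [3.1, 4.05, 3.1]
```

On such a profile, `is_monotonic(profile)` returns `False`, which is the shape of finding the letter reports at much larger scale.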

We establish THumB, a rubric-based human evaluation protocol for image captioning models. Our scoring rubrics and their definitions are carefully developed based on machine- and human-generated captions on the MSCOCO dataset. Each caption is evaluated along two main dimensions in a tradeoff (precision and recall) as well as other aspects that measure text quality (fluency, conciseness, inclusive language). Our evaluations demonstrate several critical problems of current practice. Human-generated captions show...

10.48550/arxiv.2111.08940 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Despite thousands of researchers, engineers, and artists actively working on improving text-to-image generation models, systems often fail to produce images that accurately align with the text inputs. We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA). Specifically, given a text input, we automatically generate several question-answer pairs using a language model. We calculate faithfulness by...

10.48550/arxiv.2303.11897 preprint EN cc-by arXiv (Cornell University) 2023-01-01
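The scoring step of a TIFA-style pipeline reduces to checking how many of the LM-generated questions a VQA model answers as expected when asked about the image. A minimal sketch, assuming hypothetical QA pairs and a stub standing in for a real VQA model:

```python
def tifa_score(qa_pairs, vqa_answer):
    """TIFA-style faithfulness: the fraction of (question, expected answer)
    pairs for which the VQA model's answer about the generated image matches.
    `vqa_answer` is any callable mapping a question string to an answer."""
    correct = sum(
        1 for question, expected in qa_pairs
        if vqa_answer(question).strip().lower() == expected.strip().lower()
    )
    return correct / len(qa_pairs)

# Hypothetical QA pairs an LM might derive from the prompt
# "a red bicycle leaning against a tree".
qa_pairs = [
    ("Is there a bicycle?", "yes"),
    ("What color is the bicycle?", "red"),
    ("Is the bicycle leaning against a tree?", "yes"),
]

# Stub in place of a real VQA model; here it gets the color wrong.
fake_vqa = {"Is there a bicycle?": "yes",
            "What color is the bicycle?": "blue",
            "Is the bicycle leaning against a tree?": "yes"}.get

score = tifa_score(qa_pairs, fake_vqa)  # 2 of 3 answers match -> 2/3
```

Swapping the stub for an actual VQA model and generating the QA pairs with an LM gives the automatic metric the abstract describes.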

Transformer-based NLP models are powerful but have high computational costs that limit deployment scenarios. Finetuned encoder-decoder models are popular in specialized domains and can outperform larger, more generalized decoder-only models, such as GPT-4. We introduce a new configuration for encoder-decoder models that improves efficiency on structured output and question-answering tasks where multiple outputs are required for a single input. Our method, prompt-in-decoder (PiD), encodes the input once and decodes in parallel, boosting both...

10.48550/arxiv.2403.13112 preprint EN arXiv (Cornell University) 2024-03-19
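The efficiency idea — encode the shared input once, then run one decode per sub-task prompt against the cached encoder states — can be sketched with stand-in components. All class and function names here are hypothetical illustrations of the control flow, not the paper's implementation:

```python
class CountingEncoder:
    """Stand-in for a transformer encoder; counts forward passes so we
    can verify the shared input is encoded only once."""
    def __init__(self):
        self.calls = 0

    def encode(self, document):
        self.calls += 1
        return f"states({document})"  # placeholder for hidden states

def decode(encoder_states, prompt):
    """Stand-in decoder: a real PiD decoder would cross-attend to the
    shared encoder states while decoding each sub-task prompt."""
    return f"answer[{prompt} | {encoder_states}]"

def prompt_in_decoder(encoder, document, prompts):
    """PiD-style inference sketch: one encoder pass over the input,
    then one decode per prompt against the cached states. (A real
    implementation batches these decodes so they run in parallel.)"""
    states = encoder.encode(document)
    return [decode(states, p) for p in prompts]

enc = CountingEncoder()
outputs = prompt_in_decoder(
    enc, "patient note ...",
    ["diagnosis?", "medications?", "follow-up?"])
```

The contrast is with prompt-in-encoder baselines, which would re-encode the (long) input once per sub-task; here `enc.calls` stays at 1 regardless of how many prompts are decoded.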

This article explores recent advancements and innovative trends in urban mobility and transportation engineering. It delves into the resilience of transportation networks, automated machine learning applications, congestion prediction, sentiment analysis of electric vehicle discussions, urban air mobility, blockchain technology, query-based moment retrieval in videos, and person re-identification. By reviewing key contributions from various authors, including A. Bauranov, S. Parks, H. Chen, and others, this paper highlights...

10.31219/osf.io/645t9 preprint EN 2024-06-27

Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material. These models can memorize and generate content similar to their training data, posing potential concerns. Therefore, model creators are motivated to develop mitigation methods that prevent generating protected content. We term this procedure copyright takedowns for LMs, noting the conceptual similarity to (but legal distinction from) the DMCA takedown process. This paper introduces the first evaluation of...

10.48550/arxiv.2406.18664 preprint EN arXiv (Cornell University) 2024-06-26

Despite their wide adoption, the biases and unintended behaviors of language models remain poorly understood. In this paper, we identify and characterize a phenomenon never discussed before, which we call semantic leakage, where models leak irrelevant information from the prompt into the generation in unexpected ways. We propose an evaluation setting to detect semantic leakage both by humans and automatically, curate a diverse test suite for diagnosing this behavior, and measure significant semantic leakage in 13 flagship models. We also show that models exhibit...

10.48550/arxiv.2408.06518 preprint EN arXiv (Cornell University) 2024-08-12
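One crude automatic signal for the leakage described above is to compare how often concept-associated words appear in generations for prompts that plant an irrelevant concept versus matched control prompts. This is a toy sketch under that assumption; the example prompts, word lists, and metric are illustrative, not the paper's evaluation protocol:

```python
def leak_rate(generations, concept_words):
    """Fraction of generations mentioning any word associated with the
    (irrelevant) concept planted in the prompt."""
    concept_words = {w.lower() for w in concept_words}
    hits = sum(
        1 for g in generations
        if concept_words & {tok.strip(".,'\"").lower() for tok in g.split()}
    )
    return hits / len(generations)

# Hypothetical setup in the spirit of the paper: the test prompt mentions
# coffee irrelevantly before asking the model to name a favorite song.
test_gens = ["My favorite song is 'Espresso'.",
             "I love the song 'Latte Love'.",
             "My favorite song is 'Bohemian Rhapsody'."]
control_gens = ["My favorite song is 'Imagine'.",
                "I like 'Yesterday'.",
                "My favorite is 'Hey Jude'."]
coffee_words = {"espresso", "latte", "coffee", "mocha"}

# Leakage shows up as a higher concept rate under the planted prompt.
leak = leak_rate(test_gens, coffee_words) - leak_rate(control_gens, coffee_words)
```

A positive gap between the two rates is the leakage signature; real evaluations would average over many concepts and prompts.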

Today's most advanced multimodal models remain proprietary. The strongest open-weight models rely heavily on synthetic data from proprietary VLMs to achieve good performance, effectively distilling these closed models into open ones. As a result, the community is still missing foundational knowledge about how to build performant VLMs from scratch. We present Molmo, a new family of VLMs that are state-of-the-art in their class of openness. Our key innovation is a novel, highly detailed image caption dataset collected entirely from human...

10.48550/arxiv.2409.17146 preprint EN arXiv (Cornell University) 2024-09-25

Conventional algorithms for training language models (LMs) with human feedback rely on preferences that are assumed to account for an "average" user, disregarding subjectivity and finer-grained variations. Recent studies have raised concerns that aggregating such diverse and often contradictory feedback to finetune models results in generic models that generate outputs not preferred by many user groups, as they tend to average out styles and norms. To address this issue, we draw inspiration from recommendation systems and propose ComPO, a method...

10.48550/arxiv.2410.16027 preprint EN arXiv (Cornell University) 2024-10-21

Language model post-training is applied to refine behaviors and unlock new skills across a wide range of recent language models, but open recipes for applying these techniques lag behind proprietary ones. The underlying training data and recipes are simultaneously the most important pieces of the puzzle and the portion with the least transparency. To bridge this gap, we introduce Tülu 3, a family of fully-open state-of-the-art post-trained models, alongside its data, code, and recipes, serving as a comprehensive guide to modern post-training techniques....

10.48550/arxiv.2411.15124 preprint EN arXiv (Cornell University) 2024-11-22

When the Precision Bass (guitar) hit the market in 1951, it was one of those moments of genuine surprise, because no one had seen anything like it before.

10.1049/et.2016.1033 article EN Engineering & Technology 2016-11-01

We present a neural network architecture to predict a point in color space from the sequence of characters in a color's name. Using large-scale color--name pairs obtained from an online design forum, we evaluate our model on a "color Turing test" and find that, given a name, the colors predicted by our model are preferred by annotators to colors created by humans. Our datasets and demo system are available at colorlab.us.

10.48550/arxiv.1609.08777 preprint EN other-oa arXiv (Cornell University) 2016-01-01
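The architecture shape described above — characters in, a point in color space out — can be sketched with an untrained toy model: per-character embeddings, mean pooling, and a projection squashed into [0, 1] per RGB channel. The embedding dimension, pooling choice, and random weights are assumptions for illustration; the paper's actual model (and its training) differs.

```python
import random
from math import exp

random.seed(0)
EMB_DIM = 8

def char_embedding(ch, table={}):
    """Lazily-initialized random embedding per character. A trained model
    would learn these jointly with the rest of the network."""
    if ch not in table:
        table[ch] = [random.uniform(-1, 1) for _ in range(EMB_DIM)]
    return table[ch]

# Random projection from the pooled embedding to the 3 color channels.
W = [[random.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(3)]

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def name_to_rgb(name):
    """Mean-pool character embeddings, project to 3 dims, squash to [0, 1]."""
    embs = [char_embedding(ch) for ch in name.lower()]
    pooled = [sum(col) / len(embs) for col in zip(*embs)]
    return [sigmoid(sum(w * p for w, p in zip(row, pooled))) for row in W]

rgb = name_to_rgb("sunset orange")  # three channel values in [0, 1]
```

With learned parameters in place of the random ones, the same pipeline maps arbitrary character strings — including unseen names — to colors, which is what enables the "color Turing test" evaluation.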