- Topic Modeling
- Natural Language Processing Techniques
- Speech and Dialogue Systems
- Multimodal Machine Learning Applications
- South Asian Studies and Conflicts
- Domain Adaptation and Few-Shot Learning
- Speech Recognition and Synthesis
- Indian and Buddhist Studies
- Eurasian Exchange Networks
- Adversarial Robustness in Machine Learning
- Anthropological Studies and Insights
- Text Readability and Simplification
- Historical Geography and Cartography
- Language and Cultural Evolution
- Explainable Artificial Intelligence (XAI)
- Computational and Text Analysis Methods
- Machine Learning and Algorithms
- Software Engineering Research
- Neural Networks and Applications
- Neural Dynamics and Brain Function
- Robotics and Automated Systems
- Model-Driven Software Engineering Techniques
- Ancient Near East History
- Land Rights and Reforms
- Anomaly Detection Techniques and Applications
Stanford University (2019-2024)
RIKEN Center for Advanced Intelligence Project (2023)
Mongolia International University (2023)
Bar-Ilan University (2021)
University of Helsinki (2021)
Tel Aviv University (2021)
Technical University of Darmstadt (2021)
University of Copenhagen (2021)
Edinburgh Napier University (2021)
Universitat Pompeu Fabra (2021)
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) to their technical principles (e.g., model architectures, training procedures, data, systems, ...)
John Hewitt, Christopher D. Manning. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
John Hewitt, Percy Liang. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
This paper explores the knowledge of linguistic structure learned by large artificial neural networks, trained via self-supervision, whereby the model simply tries to predict a masked word in a given context. Human language communication is via sequences of words, but language understanding requires constructing rich hierarchical structures that are never observed explicitly. The mechanisms for this have been a prime mystery of human language acquisition, while engineering work has mainly proceeded by supervised learning on...
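As a rough illustration of the masked-word objective described above, here is a minimal sketch using the Hugging Face transformers library; the model name and example sentence are illustrative choices, not drawn from the paper.

```python
# Minimal sketch: predict a masked word from context with a pretrained
# masked language model (self-supervised objective described above).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

text = "The chef who ran to the store [MASK] out of food."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the [MASK] position and take the highest-scoring vocabulary item.
mask_idx = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_idx].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```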
Abstract While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context. We analyze performance of on two tasks that require identifying relevant information in their input contexts: multi-document question answering and key-value retrieval. find can degrade significantly when changing position information, indicating current do not robustly make contexts. In particular, we observe often highest occurs at...
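A sketch of how a key-value retrieval probe of this kind can be constructed; the prompt format, sizes, and helper function below are hypothetical rather than the paper's exact setup.

```python
# Hypothetical construction of a key-value retrieval prompt: a JSON
# dictionary of random UUID keys and values, with the queried pair
# placed at a controllable position in the context.
import json
import random
import uuid

def make_kv_prompt(n_pairs: int, gold_position: int):
    """Build a prompt asking a model to retrieve one value by key."""
    pairs = [(str(uuid.uuid4()), str(uuid.uuid4())) for _ in range(n_pairs)]
    gold_key, gold_value = pairs[0]
    random.shuffle(pairs)
    # Move the gold pair to the desired position in the context.
    pairs.remove((gold_key, gold_value))
    pairs.insert(gold_position, (gold_key, gold_value))
    context = json.dumps(dict(pairs), indent=0)
    prompt = f'{context}\n\nWhat is the value for key "{gold_key}"?'
    return prompt, gold_key, gold_value

prompt, key, value = make_kv_prompt(n_pairs=75, gold_position=37)
```

Sweeping `gold_position` across the context is what exposes position-dependent degradation.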
Recent work has found evidence that Multilingual BERT (mBERT), a transformer-based multilingual masked language model, is capable of zero-shot cross-lingual transfer, suggesting that some aspects of its representations are shared cross-lingually. To better understand this overlap, we extend recent work on finding syntactic trees in neural networks' internal representations to the multilingual setting. We show that subspaces of mBERT representations recover syntactic tree distances in languages other than English, and that these subspaces are approximately shared across languages. Motivated by...
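The underlying structural-probe idea (from the English-only work this paper extends) can be sketched as follows: learn a linear map B such that squared L2 distances between transformed word vectors approximate distances between words in the parse tree. Dimensions, data, and training details below are illustrative stand-ins.

```python
# Sketch of a structural probe: a learned linear map whose induced
# squared L2 distances approximate parse-tree distances between words.
import torch

hidden_dim, probe_rank = 768, 128
B = torch.nn.Parameter(torch.randn(probe_rank, hidden_dim) * 0.01)
optimizer = torch.optim.Adam([B], lr=1e-3)

def predicted_distances(H: torch.Tensor) -> torch.Tensor:
    """H: (seq_len, hidden_dim) word vectors -> (seq_len, seq_len) distances."""
    T = H @ B.T                           # project into the probe subspace
    diff = T.unsqueeze(0) - T.unsqueeze(1)
    return (diff ** 2).sum(dim=-1)        # squared L2 distance per word pair

def probe_loss(H: torch.Tensor, tree_dist: torch.Tensor) -> torch.Tensor:
    """L1 loss between predicted and gold parse-tree distances."""
    return (predicted_distances(H) - tree_dist).abs().mean()

H = torch.randn(12, hidden_dim)                    # stand-in for mBERT vectors
tree_dist = torch.randint(0, 6, (12, 12)).float()  # stand-in gold distances
loss = probe_loss(H, tree_dist)
loss.backward()
optimizer.step()
```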
John Hewitt, Daphne Ippolito, Brendan Callahan, Reno Kriz, Derry Tanti Wijaya, Chris Callison-Burch. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018.
Benjamin Newman, Kai-Siang Ang, Julia Gong, John Hewitt. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
This position paper argues that, in order to understand AI, we cannot rely on our existing vocabulary of human words. Instead, we should strive to develop neologisms: new words that represent precise human concepts that we want to teach machines, or machine concepts that we need to learn. We start from the premise that humans and machines have differing concepts. This means interpretability can be framed as a communication problem: humans must be able to reference and control machine concepts, and communicate human concepts to machines. Creating a shared human-machine language through...
This paper describes XNMT, the eXtensible Neural Machine Translation toolkit. XNMT distinguishes itself from other open-source NMT toolkits by its focus on modular code design, with the purpose of enabling fast iteration in research and replicable, reliable results. In this paper we describe the design of XNMT and its experiment configuration system, and demonstrate its utility on the tasks of machine translation, speech recognition, and multi-tasked machine translation/parsing. XNMT is available open-source at https://github.com/neulab/xnmt
With the advent of conversational assistants, like Amazon Alexa, Google Now, etc., dialogue systems are gaining a lot of traction, especially in the industrial setting. These systems typically consist of a Spoken Language Understanding component which, in turn, consists of two tasks - Intent Classification (IC) and Slot Labeling (SL). Generally, these two tasks are modeled together jointly to achieve the best performance. However, this joint modeling adds to model obfuscation. In this work, we first design a framework for modularization of joint IC-SL...
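A minimal sketch of what joint IC-SL modeling looks like: a shared encoder with an utterance-level intent head and a token-level slot head. The architecture below is a generic illustration, not the paper's system.

```python
# Generic joint intent-classification (IC) and slot-labeling (SL) model:
# one shared encoder, two task heads, trained with a summed loss.
import torch
import torch.nn as nn

class JointICSL(nn.Module):
    def __init__(self, vocab_size, hidden, n_intents, n_slot_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True,
                               bidirectional=True)
        self.intent_head = nn.Linear(2 * hidden, n_intents)  # utterance level
        self.slot_head = nn.Linear(2 * hidden, n_slot_tags)  # token level

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))
        intent_logits = self.intent_head(states.mean(dim=1))  # pooled
        slot_logits = self.slot_head(states)                  # per token
        return intent_logits, slot_logits

model = JointICSL(vocab_size=10_000, hidden=128, n_intents=7, n_slot_tags=15)
intent_logits, slot_logits = model(torch.randint(0, 10_000, (2, 12)))
# Joint training typically sums an intent loss and a slot loss.
```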
Recurrent neural networks empirically generate natural language with high syntactic fidelity. However, their success is not well-understood theoretically. We provide theoretical insight into this success, proving in a finite-precision setting that RNNs can efficiently generate bounded hierarchical languages that reflect the scaffolding of natural language syntax. We introduce Dyck-(k,m), the language of well-nested brackets (of k types) with m-bounded nesting depth, reflecting the bounded memory needs and long-distance dependencies of natural language syntax. The best known results...
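Dyck-(k, m) is concrete enough to sample directly; below is an illustrative generator (the branching probabilities are arbitrary choices, not from the paper).

```python
# Sampler for Dyck-(k, m): well-nested brackets of k types with nesting
# depth bounded by m (assumed m >= 1).
import random

def sample_dyck(k: int, m: int, max_len: int = 40) -> str:
    brackets = [(f"<{i}", f"{i}>") for i in range(k)]  # k bracket types
    stack, out = [], []
    while len(out) < max_len:
        can_open = len(stack) < m       # enforce the depth bound
        can_close = bool(stack)
        if can_open and (not can_close or random.random() < 0.5):
            opener, closer = random.choice(brackets)
            stack.append(closer)
            out.append(opener)
        else:
            out.append(stack.pop())
        if not stack and random.random() < 0.2:
            break
    out.extend(reversed(stack))         # close any still-open brackets
    return " ".join(out)

print(sample_dyck(k=2, m=3))
```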
Probing experiments investigate the extent to which neural representations make properties (like part-of-speech) predictable. One suggests that a representation encodes a property if probing that representation produces higher accuracy than probing a baseline representation like non-contextual word embeddings. Instead of using baselines as a point of comparison, we're interested in measuring the information that is contained in the representation but not in the baseline. For example, current methods can detect when a representation is more useful than the word identity (a baseline) for predicting part-of-speech;...
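The intuition can be sketched as follows: probe the baseline alone, then the baseline concatenated with the contextual representation, and look at the accuracy gap. This is only the intuition; the paper's formal treatment differs, and the data below is synthetic.

```python
# Sketch: compare a probe on a baseline alone against a probe on the
# baseline concatenated with a contextual representation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d_base, d_repr = 2000, 50, 100
baseline = rng.normal(size=(n, d_base))    # e.g., non-contextual embeddings
contextual = rng.normal(size=(n, d_repr))  # e.g., a model's hidden states
labels = rng.integers(0, 5, size=n)        # e.g., part-of-speech tags

def probe_accuracy(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

acc_baseline = probe_accuracy(baseline, labels)
acc_joint = probe_accuracy(np.hstack([baseline, contextual]), labels)
# The gap (acc_joint - acc_baseline) reflects information usable beyond
# the baseline; with random data here it should be near zero.
print(acc_baseline, acc_joint)
```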
Derry Tanti Wijaya, Brendan Callahan, John Hewitt, Jie Gao, Xiao Ling, Marianna Apidianaki, Chris Callison-Burch. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017.
Extrapolation to unseen sequence lengths is a challenge for neural generative models of language. In this work, we characterize the effect on length extrapolation of a modeling decision often overlooked: predicting the end of the generative process through the use of a special end-of-sequence (EOS) vocabulary item. We study an oracle setting (forcing models to generate to the correct sequence length at test time) to compare the length-extrapolative behavior of networks trained to predict EOS (+EOS) with networks not trained to (-EOS). We find that -EOS substantially outperforms +EOS, for example...
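The oracle setting is straightforward to emulate in a greedy decoder: ban the EOS token and stop only at the gold length. The sketch below assumes a hypothetical autoregressive `model` with a Hugging Face-style logits output and a known `eos_id`.

```python
# Sketch of oracle-length decoding: mask out EOS so a +EOS-trained model
# cannot terminate early, and stop exactly at the gold length.
import torch

def decode_to_oracle_length(model, input_ids, oracle_len, eos_id):
    """Greedy decoding that bans EOS until the oracle length is reached."""
    generated = input_ids
    for _ in range(oracle_len):
        logits = model(generated).logits[:, -1, :]  # next-token scores
        logits[:, eos_id] = float("-inf")           # forbid stopping early
        next_token = logits.argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_token], dim=-1)
    return generated
```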
No one can fully appreciate the great value of this work to all students of ethnology until they realize the historical importance of an accurate classification of the characteristic differences which divide the social strata, known as castes, living in a country occupying the geographical position of Bengal. Bengal is practically the Deltas of the Ganges and the Brahmaputra, and of the Western rivers, which rise in the Vindhyan range, called by Hindu geographers the Sukti mountains, and flow down thence to the Bay. It has always been one of the highways by which Southern tribes moved northward...
Probes, supervised models trained to predict properties (like parts-of-speech) from representations (like those of ELMo), have achieved high accuracy on a range of linguistic tasks. But does this mean that the representations encode linguistic structure, or just that the probe has learned the linguistic task? In this paper, we propose control tasks, which associate word types with random outputs, to complement linguistic tasks. By construction, these tasks can only be learned by the probe itself. So a good probe (one that reflects the representation) should be selective, achieving high linguistic task accuracy and low control task accuracy. The...
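Constructing a control task is simple; the sketch below assigns each word type a fixed random tag (the tag-inventory size is an illustrative stand-in for, e.g., a part-of-speech tag count).

```python
# Sketch of a control task: map every word type to a fixed random output,
# so only probe memorization (not the representation) can solve it.
import random

def make_control_task(vocab, n_tags: int = 45, seed: int = 0):
    """Assign each word type a fixed random tag."""
    rng = random.Random(seed)
    return {word: rng.randrange(n_tags) for word in vocab}

vocab = {"the", "cat", "sat", "on", "mat"}
control = make_control_task(vocab)
sentence = ["the", "cat", "sat", "on", "the", "mat"]
control_labels = [control[w] for w in sentence]
# Selectivity = (linguistic task accuracy) - (control task accuracy);
# a selective probe scores high on the former and low on the latter.
```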
Modeling derivational morphology to generate words with particular semantics is useful in many text generation tasks, such as machine translation or abstractive question answering. In this work, we tackle the task of derived word generation. That is, we attempt to generate the word "runner" for "someone who runs." We identify two key problems in generating derived words from roots and transformations. We contribute a novel aggregation model that learns transformations both as orthographic functions using sequence-to-sequence models...
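One way such a sequence-to-sequence setup can be framed, sketched with a hypothetical tag format: the source sequence is the root's characters plus a transformation token, and the target is the derived word's characters.

```python
# Hypothetical framing of derived word generation for a character-level
# sequence-to-sequence model; the "<AGENT>" tag format is illustrative.
def make_seq2seq_example(root: str, transformation: str, derived: str):
    source = list(root) + [f"<{transformation}>"]  # characters + tag token
    target = list(derived)
    return source, target

src, tgt = make_seq2seq_example("run", "AGENT", "runner")
# src = ['r', 'u', 'n', '<AGENT>'], tgt = ['r', 'u', 'n', 'n', 'e', 'r']
```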
In the previous papers of this series I have tried to trace in outline a truthful sketch of the general course of early Indian History. The evidence consulted and set forth has led me to believe that the government, social institutions, and fundamental principles of religion of the country all originated among tribes, for the most part of the Dravidian race, who came into India from the Euphrates valley. In dealing with the races which successively or simultaneously ruled India, their origins, the races to which they belonged, and the religious beliefs they held, I have also adduced...
A major challenge in both neuroscience and machine learning is the development of useful tools for understanding complex information processing systems. One such tool is probes, i.e., supervised models that relate features of interest to activation patterns arising in biological or artificial neural networks. Neuroscience has paved the way in using such probes through numerous studies conducted in recent decades. In this work, we draw insights from neuroscience to help guide probing research in machine learning. We highlight two important design choices...
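In machine learning terms, a probe in this sense can be as simple as a linear classifier from activations to the feature of interest; the sketch below uses synthetic data purely for illustration.

```python
# Minimal probe: a supervised model predicting a feature of interest
# from (synthetic) network activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
activations = rng.normal(size=(500, 64))       # stand-in activation patterns
feature = (activations[:, 0] > 0).astype(int)  # feature of interest

probe = LogisticRegression(max_iter=1000).fit(activations, feature)
print(probe.score(activations, feature))  # high score: feature is decodable
```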