- Topic Modeling
- Natural Language Processing Techniques
- Speech and dialogue systems
- Computer Graphics and Visualization Techniques
- Advanced Vision and Imaging
- Speech Recognition and Synthesis
- 3D Shape Modeling and Analysis
- Multimodal Machine Learning Applications
- Image Processing and 3D Reconstruction
- AI in Service Interactions
- Advanced Text Analysis Techniques
- Medical Image Segmentation Techniques
- Neural Networks and Applications
- Advanced Image and Video Retrieval Techniques
- Context-Aware Activity Recognition Systems
- Digital Media Forensic Detection
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Text Readability and Simplification
- Generative Adversarial Networks and Image Synthesis
- Machine Learning in Healthcare
- Authorship Attribution and Profiling
- Hate Speech and Cyberbullying Detection
- Robotics and Sensor-Based Localization
International Institute of Information Technology, Hyderabad
2024
Art Institute of Portland
2024
Indian Institute of Technology Hyderabad
2022-2024
Google (United States)
2023
University of California, Santa Barbara
2023
University of Rochester
2023
Georgia Institute of Technology
2023
Stanford University
2023
MultiWOZ 2.0 (Budzianowski et al., 2018) is a recently released multi-domain dialogue dataset spanning 7 distinct domains and containing over 10,000 dialogues. Though immensely useful and one of the largest resources of its kind to date, MultiWOZ 2.0 has a few shortcomings. Firstly, there is substantial noise in the dialogue state annotations and dialogue utterances, which negatively impacts the performance of state-tracking models. Secondly, follow-up work (Lee et al., 2019) has augmented the original dataset with user dialogue acts. This leads to multiple co-existent versions of the same...
Radiance Fields (RFs) are popular for representing casually captured scenes for new view synthesis and several applications beyond it. Mixed reality on personal spaces needs understanding and manipulating scenes represented as RFs, with semantic segmentation of objects as an important step. Prior efforts show promise but don't scale to complex objects with diverse appearance. We present the ISRF method to interactively segment objects with fine structure. Nearest neighbor feature matching using distilled features identifies high-confidence...
Nanoscience plays a pivotal role in mitigating lethal pollutants, contributing to environmental rejuvenation. The rising demand for magnetic nanomaterials is driven by their extensive applications in drug delivery, biosensors, remediation, magnetic resonance imaging (MRI), catalysis, and cell separation. Various synthesis methods, including solvothermal, co-precipitation, thermal decomposition, hydrothermal, and microemulsion processes, have been developed to prepare these materials. This study highlights the...
Rahul Goel, Waleed Ammar, Aditya Gupta, Siddharth Vashishtha, Motoki Sano, Faiz Surani, Max Chang, HyunJeong Choe, David Greene, Chuan He, Rattima Nitisaroj, Anna Trukhina, Shachi Paul, Pararth Shah, Rushin Shah, Zhou Yu. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.
Conversational agents are exploding in popularity. However, much work remains in the area of non-goal-oriented conversations, despite significant growth in research interest over recent years. To advance the state of the art in conversational AI, Amazon launched the Alexa Prize, a 2.5-million dollar university competition where sixteen selected university teams built conversational agents to deliver the best social conversational experience. The Alexa Prize provided the academic community with the unique opportunity to perform research with a live system used by millions of users. The subjectivity...
As open-ended human-chatbot interaction becomes commonplace, sensitive content detection gains importance. In this work, we propose a two-stage semi-supervised approach to bootstrap large-scale data for automatic sensitive language detection from publicly available web resources. We explore various data selection methods including 1) using a blacklist to rank online discussion forums by the level of their sensitiveness, followed by randomly sampling utterances, and 2) training a weakly supervised model in conjunction with...
Traditional Radiance Field (RF) representations capture details of a specific scene and must be trained afresh on each scene. Semantic feature fields have been added to RFs to facilitate several segmentation tasks. Generalised RF representations learn the principles of view interpolation. A generalised RF can render new views of an unknown and untrained scene, given a few views. We present a way to distil feature fields into the GNT representation. Our GSN representation generates views of unseen scenes on the fly along with consistent, per-pixel semantic...
Extending semantic parsers to code-switched input has been a challenging problem, primarily due to a lack of supervised training data. In this work, we introduce CST5, a new data augmentation technique that finetunes a T5 model using a small seed set ($\approx$100 utterances) to generate code-switched utterances from English utterances. We show that CST5 generates high quality code-switched data, both intrinsically (per human evaluation) and extrinsically by comparing baseline models which are trained without data augmentation to models which are trained with augmented data. Empirically...
Retraining modern deep learning systems can lead to variations in model performance even when trained using the same data and hyper-parameters, simply by using different random seeds. This phenomenon is known as churn or jitter. This issue is often exacerbated in real world settings, where noise may be introduced in the data collection process. In this work we tackle the problem of stable retraining with a novel focus on structured prediction for conversational semantic parsing. We first quantify churn by introducing metrics...
Large-scale capturing of real-world scenes as 3D point clouds (e.g., using LIDAR scanning) generates billions of points that are challenging to visualize. High storage requirements prevent the quick and easy inspection of captured datasets on user-grade hardware. The fastest real-time rendering methods are limited by the available GPU memory and render only around 1 billion points interactively. We show that we can achieve state-of-the-art in both while simultaneously supporting datasets that surpass the capabilities of other methods....
3D Gaussian Splatting (3DGS) has transformed novel-view synthesis with its fast, interpretable, and high-fidelity rendering. However, its resource requirements limit its usability. Especially on constrained devices, training performance degrades quickly and often cannot complete due to excessive memory consumption of the model. The method converges with an indefinite number of Gaussians, many of them redundant, making rendering unnecessarily slow and preventing its usage in downstream tasks that expect fixed-size inputs. To...
Research interest in task-oriented dialogs has increased as systems such as Google Assistant, Alexa and Siri have become ubiquitous in everyday life. However, the impact of academic research in this area has been limited by the lack of datasets that realistically capture the wide array of user pain points. To enable research on some of the more challenging aspects of parsing realistic conversations, we introduce PRESTO, a public dataset of over 550K contextual multilingual conversations between humans and virtual assistants. PRESTO contains...
William Held, Christopher Hidey, Fei Liu, Eric Zhu, Rahul Goel, Diyi Yang, Rushin Shah. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.
Stylized view generation of scenes captured casually using a camera has received much attention recently. The geometry and appearance of the scene are typically captured as neural point sets or radiance fields in previous work. An image stylization method is used to stylize the appearance by training its network jointly or iteratively with the structure capture network. The state-of-the-art SNeRF [29] trains the NeRF and the stylization network in an alternating manner. These methods have high training time and require joint optimization. In this work, we present StyleTRF,...
Machine learning approaches for building task-oriented dialogue systems require large conversational datasets with labels to train on. We are interested in building task-oriented dialogue systems from human-human conversations, which may be available in ample amounts in existing customer care center logs or can be collected from crowd workers. Annotating these datasets can be prohibitively expensive. Recently, multiple annotated human-machine dialogue datasets have been released; however, their annotation schema varies across different collections, even for well-defined categories...
Deep learning models have become the state of the art for natural language processing (NLP) tasks; however, deploying these models in production systems poses significant memory constraints. Existing compression methods are either lossy or introduce latency. We propose a compression method that leverages low rank matrix factorization during training to compress the word embedding layer, which represents the size bottleneck for most NLP models. Our models are trained, compressed and then further re-trained on the downstream task to recover...
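To make the low rank idea in the abstract above concrete, here is a minimal sketch that replaces a vocabulary-by-dimension embedding matrix with two smaller factors. It uses a one-off truncated SVD with NumPy, which is an assumption made for illustration; the abstract describes factorization during training followed by re-training, and that procedure is not reproduced here. The function names and matrix sizes are hypothetical.

```python
import numpy as np


def factorize_embedding(embedding, rank):
    """Approximate a (vocab, dim) matrix as a @ b using a truncated SVD.

    a has shape (vocab, rank) and b has shape (rank, dim).
    """
    u, s, vt = np.linalg.svd(embedding, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # scale the kept left vectors by their singular values
    b = vt[:rank, :]
    return a, b


def compression_ratio(vocab, dim, rank):
    """Parameter count of the full matrix divided by that of the two factors."""
    return (vocab * dim) / (rank * (vocab + dim))


# Hypothetical sizes: 1000 words, 64 dimensions. The matrix is built to have
# exact rank 8, so a rank-8 factorization reconstructs it up to rounding error.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 64))

a, b = factorize_embedding(embedding, rank=8)
reconstruction_error = np.abs(a @ b - embedding).max()
ratio = compression_ratio(1000, 64, 8)
```

Real embedding matrices are not exactly low rank, so truncation loses information; the abstract's re-training step on the downstream task is what is meant to recover it.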
Retraining modern deep learning systems can lead to variations in model performance even when trained using the same data and hyper-parameters, simply by using different random seeds. We call this phenomenon model jitter. This issue is often exacerbated in production settings, where models are retrained on noisy data. In this work we tackle the problem of stable retraining with a focus on conversational semantic parsers. We first quantify the model jitter problem by introducing the model agreement metric and showing the variation with dataset noise and dataset sizes. We then...
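One simple way to picture an agreement metric for jitter is the fraction of examples on which two retrained models produce the same parse, averaged over all pairs of runs. The sketch below assumes exactly that definition; the abstract does not spell out its metric, so this is an illustration and not the paper's formula. The function name and the toy parses are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(runs):
    """Fraction of (run pair, example) combinations with identical predictions.

    runs is a list of prediction lists, one per random seed, all aligned to
    the same evaluation examples.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to measure agreement")
    n_examples = len(runs[0])
    if any(len(run) != n_examples for run in runs):
        raise ValueError("all runs must cover the same examples")

    matches = 0
    total = 0
    for first, second in combinations(runs, 2):
        matches += sum(p == q for p, q in zip(first, second))
        total += n_examples
    return matches / total


# Toy parses from three hypothetical seeds on four utterances.
seed_0 = ["play(song)", "call(mom)", "alarm(7am)", "weather()"]
seed_1 = ["play(song)", "call(mom)", "alarm(7am)", "news()"]
seed_2 = ["play(song)", "text(mom)", "alarm(7am)", "weather()"]

agreement = pairwise_agreement([seed_0, seed_1, seed_2])
```

Under this definition a value of 1.0 means every retrained model gives identical outputs, and lower values indicate more jitter.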
Semantic parsing (SP) is a core component of modern virtual assistants like Google Assistant and Amazon Alexa. While sequence-to-sequence-based auto-regressive (AR) approaches are common for conversational semantic parsing, recent studies employ non-autoregressive (NAR) decoders to reduce inference latency while maintaining competitive parsing quality. However, a major drawback of NAR decoders is the difficulty of generating top-k (i.e., k-best) outputs with approaches such as beam search. To address this challenge, we propose a novel...
Facial images disclose many hidden personal traits such as age, gender, race, health, emotion, and psychology. Understanding these traits will help to classify people by different attributes. In this paper, we present a novel method for classifying images using a pretrained transformer model. We apply the pretrained transformer to the binary classification of facial images into criminal and non-criminal classes. The pretrained GPT-2 transformer is trained to generate text and then fine-tuned to classify facial images. During the finetuning process with images, most of the layers of GPT-2 are frozen during...
Understanding tables is an important aspect of natural language understanding. Existing models for table understanding require linearization of the table structure, where row or column order is encoded as an unwanted bias. Such spurious biases make the model vulnerable to row and column order perturbations. Additionally, prior work has not thoroughly modeled the table structures or table-text alignments, hindering the table-text understanding ability. In this work, we propose a robust and structurally aware table-text encoding architecture, TableFormer, where tabular structural biases are...
Radiance Fields (RFs) have shown great potential to represent scenes from casually captured discrete views. Compositing parts or the whole of multiple scenes could greatly interest several XR applications. Prior works can generate new views of such scenes by tracing each scene in parallel. This increases the render times and memory requirements with the number of components. In this work, we provide a method to create a single, compact, fused RF representation for a composited scene using multiple RFs. The fused RF has the same render times and memory utilizations as a single...
Typical spoken language understanding systems provide narrow semantic parses using a domain-specific ontology. The parses contain intents and slots that are directly consumed by downstream domain applications. In this work we discuss expanding such systems to handle compound entities and intents by introducing a domain-agnostic shallow parser that handles linguistic coordination. We show that our model for parsing coordination learns domain-independent and slot-independent features and is able to segment conjunct boundaries of many different...
Accurate prediction of conversation topics can be a valuable signal for creating coherent and engaging dialog systems. In this work, we focus on context-aware topic classification methods for identifying topics in free-form human-chatbot dialogs. We extend previous work on neural topic classification and unsupervised topic keyword detection by incorporating conversational context and dialog act features. On annotated data, we show that incorporating context and dialog acts leads to relative gains in topic classification accuracy by 35% and in unsupervised keyword detection recall by 11% for conversational interactions where topics frequently span multiple utterances....