- Topic Modeling
- Natural Language Processing Techniques
- Speech and dialogue systems
- Speech Recognition and Synthesis
- Text Readability and Simplification
- Caching and Content Delivery
- Machine Learning in Healthcare
- Multimodal Machine Learning Applications
- QR Code Applications and Technologies
- IPv6, Mobility, Handover, Networks, Security
- Handwritten Text Recognition Techniques
META Health
2024
Amazon (United States)
2021
We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first-person point-of-view images, the output of which is used to augment the input to a Multimodal Large Language Model (MM-LLM). While building Lumos, we encountered numerous challenges related to STR quality, overall latency, and model inference. In this paper, we delve into those challenges and discuss the architecture, design choices,...
Automatic Speech Recognition (ASR) robustness toward slot entities is critical in e-commerce voice assistants that involve monetary transactions and purchases. Along with effective domain adaptation, it is intuitive that cross-utterance contextual cues play an important role in disambiguating specific content words from speech. In this paper, we investigate various techniques to improve contextualization and word adaptation of a Transformer-XL neural language model (NLM) used to rescore ASR N-best hypotheses....
Neural Language Models (NLMs), when trained and evaluated with context spanning multiple utterances, have been shown to consistently outperform both conventional n-gram language models and NLMs that use limited context. In this paper, we investigate various techniques to incorporate turn-based history into recurrent (LSTM) and Transformer-XL NLMs. For NLMs, we explore a carry-over mechanism and feature augmentation, where other forms of contextual information such as bot response and system dialogue acts classified...
In recent years, Federated Learning (FL) has shown significant advancements in its ability to perform various natural language processing (NLP) tasks. This work focuses on applying personalized FL for on-device modeling. Due to limitations of memory and latency, these models cannot support the complexity of sub-word tokenization or beam search decoding, resulting in the decision to deploy a closed-vocabulary model. However, such models are unable to handle out-of-vocabulary (OOV) words belonging to specific users. To address...
QR codes have become ubiquitous in daily life, enabling rapid information exchange. With the increasing adoption of smart wearable devices, there is a need for efficient and frictionless QR code reading capabilities from egocentric points of view. However, adapting existing phone-based readers to egocentric images poses significant challenges. Code reading from egocentric images brings unique challenges such as wide field of view, distortion, and lack of visual feedback compared to phones, where users can adjust position and framing....
Current research in zero-shot translation is plagued by several issues such as high compute requirements, increased training time, and off-target translations. Proposed remedies often come at the cost of additional data or compute requirements. Pivot-based neural machine translation is preferred over a single-encoder model for most settings despite its evaluation time. In this work, we overcome these shortcomings by taking advantage of transliteration and linguistic similarity. We build a single encoder-decoder system...