- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Autism Spectrum Disorder Research
- Topic Modeling
- Video Surveillance and Tracking Methods
- Speech and dialogue systems
- Natural Language Processing Techniques
- Human Pose and Action Recognition
- Anomaly Detection Techniques and Applications
- Advanced Vision and Imaging
- Advanced Image and Video Retrieval Techniques
- Child Development and Digital Technology
- Computer Graphics and Visualization Techniques
- COVID-19 diagnosis using AI
- Virology and Viral Diseases
- Adversarial Robustness in Machine Learning
- Genetics and Neurodevelopmental Disorders
- Infant Health and Development
- Remote Sensing and LiDAR Applications
- Automated Road and Building Extraction
University of California, Davis
2018-2022
Columbia University
2021
Amazon (Germany)
2020
Group Sense (China)
2019
The ability to learn richer network representations generally boosts the performance of deep learning models. To improve representation learning in convolutional neural networks, we present a multi-branch architecture, which applies channel-wise attention across different network branches to leverage the complementary strengths of both feature-map attention and multi-path representation. Our proposed Split-Attention module provides a simple and modular computation block that can serve as a drop-in replacement for the popular...
It is well known that feature-map attention and multi-path representation are important for visual recognition. In this paper, we present a modularized architecture, which applies channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations. Our design results in a simple and unified computation block, which can be parameterized using only a few variables. Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency...
We present a novel deep neural network architecture for end-to-end scene flow estimation that directly operates on large-scale 3D point clouds. Inspired by Bilateral Convolutional Layers (BCL), we propose DownBCL, UpBCL, and CorrBCL operations that restore structural information from unstructured point clouds and fuse information from two consecutive point clouds. Operating on discrete and sparse permutohedral lattice points, our architectural design is parsimonious in computational cost. Our model can efficiently process a pair of point cloud frames...
Video action recognition is one of the representative tasks for video understanding. Over the last decade, we have witnessed great advancements in video action recognition thanks to the emergence of deep learning. But we have also encountered new challenges, including modeling long-range temporal information in videos, high computation costs, and incomparable results due to variances in datasets and evaluation protocols. In this paper, we provide a comprehensive survey of over 200 existing papers on deep learning for video action recognition. We first introduce 17 video action recognition datasets that...
Starting from the seminal work of Fully Convolutional Networks (FCN), there has been significant progress on semantic segmentation. However, deep learning models often require large amounts of pixelwise annotations to train accurate and robust models. Given the prohibitively expensive annotation cost of segmentation masks, we introduce a self-training framework in this paper to leverage pseudo labels generated from unlabeled data. In order to handle the data imbalance problem of semantic segmentation, we propose a centroid sampling...
We present a new algorithm to train a robust neural network against adversarial attacks. Our algorithm is motivated by the following two ideas. First, although recent work has demonstrated that fusing randomness can improve the robustness of neural networks (Liu 2017), we noticed that adding noise blindly to all the layers is not the optimal way to incorporate randomness. Instead, we model randomness under the framework of the Bayesian Neural Network (BNN) to formally learn the posterior distribution of models in a scalable way. Second, we formulate the mini-max problem in BNN to learn the best...
The success of deep neural networks for semantic segmentation heavily relies on large-scale and well-labeled datasets, which are hard to collect in practice. Synthetic data offers an alternative way to obtain ground-truth labels for free. However, models directly trained on synthetic data often struggle to generalize to real images. In this paper, we consider transfer learning for semantic segmentation that aims to mitigate the gap between abundant synthetic data (source domain) and limited real data (target domain). Unlike previous approaches that either learn mappings to the target...
Deep learning usually achieves the best results with complete supervision. In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models. In this paper, we show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm. We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data. Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top...
Semantic segmentation is a challenging problem due to difficulties in modeling context in complex scenes and class confusions along boundaries. Most literature focuses on either context modeling or boundary refinement, which is less generalizable in open-world scenarios. In this work, we advocate a unified framework (UN-EPT) to segment objects by considering both context information and boundary artifacts. We first adapt a sparse sampling strategy to incorporate the transformer-based attention mechanism for efficient context modeling. In addition, a separate...
Early diagnosis of Autism Spectrum Disorder (ASD) is crucial for the best outcomes of interventions. In this paper, we present a machine learning (ML) approach to ASD diagnosis based on identifying specific behaviors of interest from videos of infants aged 6 through 36 months. The behaviors of interest include directed gaze towards faces or objects of interest, positive affect, and vocalization. The dataset consists of 2000 videos of 3-minute duration, with these behaviors manually coded by expert raters. Moreover, the dataset has statistical features including frequency...
Signs of autism spectrum disorder (ASD) emerge in the first year of life in many children, but diagnosis is typically made much later, at an average age of 4 years in the United States. Early intervention is highly effective for young children with ASD, but is typically reserved for children with a formal diagnosis, making accurate identification as early as possible imperative. A screening tool that could identify ASD risk during infancy offers the opportunity for intervention before the full set of symptoms is present. In this paper, we propose two machine learning methods,...
Jing Gu, Qingyang Wu, Chongruo Wu, Weiyan Shi, Zhou Yu. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2021.
The recent success of large pre-trained language models such as BERT and GPT-2 has suggested the effectiveness of incorporating language priors in downstream dialog generation tasks. However, the performance of pre-trained models on the dialog task is not as optimal as expected. In this paper, we propose a Pre-trained Role Alternating Language model (PRAL), designed specifically for task-oriented conversational systems. We adopt the approach of Wu et al. (2019), which models the two speakers separately. We also design several techniques, such as start position randomization,...
Video segmentation is essential for advancing robotics and autonomous driving, particularly in open-world settings where continuous perception and object association across video frames are critical. While the Segment Anything Model (SAM) has excelled in static image segmentation, extending its capabilities to video segmentation poses significant challenges. We tackle two major hurdles: a) SAM's embedding limitations in associating objects across frames, and b) granularity inconsistencies in object segmentation. To this end, we introduce...
Crossword puzzles are popular word games that require not only a large vocabulary, but also a broad knowledge of topics. Answering each clue is a natural language task on its own, as many clues contain nuances, puns, or counter-intuitive word definitions. Additionally, it can be extremely difficult to ascertain definitive answers without the constraints of the crossword grid itself. This task is challenging for both humans and computers. We describe here a new crossword solving system, Cruciform. We employ a group of components, which...