- Advanced Data Storage Technologies
- Parallel Computing and Optimization Techniques
- Algorithms and Data Compression
- Distributed and Parallel Computing Systems
- Music and Audio Processing
- Advanced Database Systems and Queries
- Speech and Audio Processing
- Multimodal Machine Learning Applications
- Advanced Vision and Imaging
- Language, Metaphor, and Cognition
- Natural Language Processing Techniques
- Subtitles and Audiovisual Media
- Multisensory perception and integration
- Cloud Computing and Resource Management
- Distributed systems and fault tolerance
- Face recognition and analysis
- Image and Signal Denoising Methods
- Video Analysis and Summarization
- Digital Games and Media
- Internet Traffic Analysis and Secure E-voting
- Network Security and Intrusion Detection
- Human Pose and Action Recognition
- Web Data Mining and Analysis
- Indoor and Outdoor Localization Technologies
- Simulation Techniques and Applications
META Health
2022-2024
Peking University
2022-2024
University of Illinois Chicago
2021
University of Chicago
2016-2021
Florida International University
2014
Clarkson University
2013
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards, with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume...
Virtual machine (VM) live migration is a critical feature for managing virtualized environments, enabling dynamic load balancing, consolidation for power management, preparation for planned maintenance, and other management features. However, not all virtual machine live migration is created equal. Variants include memory migration, which relies on shared backend storage between the source and destination of the migration, and storage migration, which migrates storage state as well as memory state. We have developed an automated testing framework that measures important performance...
Augmented reality devices have the potential to enhance human perception and enable other assistive functionalities in complex conversational environments. Effectively capturing the audio-visual context necessary for understanding these social interactions first requires detecting and localizing the voice activities of the device wearer and the surrounding people. These tasks are challenging due to their egocentric nature: the wearer's head motion may cause blur, surrounding people may appear at difficult viewing angles, and there may be...
Video summarization has recently attracted increasing attention in computer vision communities. However, the scarcity of annotated data has been a key obstacle in this task. To address it, this work explores a new solution for video summarization by transferring samples from a correlated task (i.e., moment localization) equipped with abundant training data. Our main insight is that the annotated moments also indicate the semantic highlights of a video, essentially similar to a video summary. Approximately, the summary can be treated as a sparse,...
Modern data-intensive applications often generate large amounts of low-precision float data with a limited range of values. Despite the prevalence of such data, there is a lack of an effective solution to ingest, store, and analyze bounded, low-precision, numeric data. To address this gap, we propose Buff, a new compression technique that uses decomposed columnar storage and encoding methods to provide effective compression, fast ingestion, and high-speed in-situ adaptive query operators with SIMD support.
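The idea of decomposed storage for bounded, low-precision floats can be illustrated with a minimal Python sketch. It is not Buff's actual format: the function names (`encode_column`, `decode_column`, `filter_greater_than`), the decimal fixed-point scaling, and the byte-wise layout are assumptions chosen for illustration, and the code is scalar rather than SIMD. Values are scaled to integers, offset by the column minimum, and split into byte-wide sub-columns so that a range predicate can often be decided from the most significant bytes alone.

```python
def encode_column(values, decimals):
    """Scale bounded floats to integers and split them into byte-wide sub-columns.

    Hypothetical layout for illustration; assumes every value has at most
    `decimals` digits after the decimal point.
    """
    scale = 10 ** decimals
    ints = [round(v * scale) for v in values]
    base = min(ints)
    offsets = [i - base for i in ints]
    # Number of bytes needed for the largest offset (at least one).
    width = max(1, (max(offsets).bit_length() + 7) // 8)
    # Sub-column 0 holds the most significant byte of every value.
    subcols = [
        bytes((o >> (8 * (width - 1 - k))) & 0xFF for o in offsets)
        for k in range(width)
    ]
    return {"base": base, "scale": scale, "subcols": subcols}


def decode_column(col):
    """Reassemble the original values from the byte-wide sub-columns."""
    subcols = col["subcols"]
    out = []
    for row in range(len(subcols[0])):
        offset = 0
        for sub in subcols:
            offset = (offset << 8) | sub[row]
        out.append((offset + col["base"]) / col["scale"])
    return out


def filter_greater_than(col, threshold):
    """Return row indices with value > threshold without decoding full values.

    Rows are compared byte by byte, most significant first; a row leaves the
    candidate set as soon as one byte decides the comparison.
    """
    subcols = col["subcols"]
    width = len(subcols)
    n = len(subcols[0])
    t = round(threshold * col["scale"]) - col["base"]
    if t < 0:
        return list(range(n))      # threshold below the column minimum
    if t >= 256 ** width:
        return []                  # threshold above every representable value
    t_bytes = t.to_bytes(width, "big")
    hits, undecided = [], list(range(n))
    for k in range(width):
        still_equal = []
        for row in undecided:
            if subcols[k][row] > t_bytes[k]:
                hits.append(row)
            elif subcols[k][row] == t_bytes[k]:
                still_equal.append(row)
        undecided = still_equal
        if not undecided:
            break                  # lower-order bytes never need to be read
    return sorted(hits)
```

In this sketch, a query touches the lower-order sub-columns only for rows that tie with the threshold on every higher-order byte, which is the property that makes in-situ filtering cheaper than decoding every value first.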
Columnar databases rely on specialized encoding schemes to reduce storage requirements. These encodings also enable efficient in-situ data processing. Nevertheless, many existing columnar databases are encoding-oblivious. When storing the data, these systems rely on a global understanding of the dataset or the data types to derive simple rules for encoding selection. Such rule-based selection leads to unsatisfactory performance. Specifically, when performing queries, the systems always decode data into memory, ignoring the possibility of optimizing access...
Augmented Reality (AR) as a platform has the potential to facilitate the reduction of the cocktail party effect. Future AR headsets could potentially leverage information from an array of sensors spanning many different modalities. Training and testing signal processing and machine learning algorithms on tasks such as beam-forming and speech enhancement require high-quality representative data. To the best of the author's knowledge, as of publication there are no available datasets that contain synchronized egocentric...
The ability to accurately determine the geographic location of an arbitrary IP address has potential in many applications. Previous methods based on observing the relationship between network delay and physical distance are inaccurate. Methods based on similarity are more accurate, but inefficient because they need information about a large number of landmark nodes near the destination to be collected and maintained. We propose a method that can overcome both problems. Our method maintains a stable collection of observers that covers the target area....
We propose PIDS, Pattern Inference Decomposed Storage, an innovative storage method for decomposing string attributes in columnar stores. Using an unsupervised approach, PIDS identifies common patterns in string attributes from relational databases, and uses the discovered pattern to split each attribute into sub-attributes. First, by storing and encoding each sub-attribute individually, PIDS can achieve a compression ratio comparable to Snappy and Gzip. Second, by decomposing the attribute, PIDS can push down many query operators to sub-attributes, thereby...
Dictionary encoding, or domain encoding, is an important form of compression that uses a bijective mapping to replace attributes from a large domain (e.g., strings) with a finite domain (e.g., 32-bit integers). This encoding both reduces data storage and allows for more efficient query execution. Traditional dictionary encoding only supports equality queries, while range queries require that encoded values are decoded before evaluating the predicates. An order-preserving dictionary allows range queries without decoding by ensuring that the encoded keys follow the same order as the values in the dictionary. While this...
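The order-preserving property described above can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not the method of the paper: the helper names (`build_dictionary`, `encode`, `range_query`) are hypothetical, and the dictionary is simply a sorted list of distinct values, so a code is the value's rank.

```python
from bisect import bisect_left, bisect_right


def build_dictionary(values):
    """Sorted list of distinct values; a value's index is its code."""
    return sorted(set(values))


def encode(dictionary, values):
    """Replace each value with its rank in the sorted dictionary."""
    return [bisect_left(dictionary, v) for v in values]


def range_query(dictionary, codes, low, high):
    """Row indices whose value lies in [low, high], evaluated on codes only.

    The bounds are translated to codes once; the column itself is never
    decoded, because code order matches value order.
    """
    lo_code = bisect_left(dictionary, low)
    hi_code = bisect_right(dictionary, high) - 1
    return [i for i, c in enumerate(codes) if lo_code <= c <= hi_code]
```

Because codes are ranks, the bounds need not be present in the dictionary: `bisect_left` and `bisect_right` map any bound to the nearest code boundary. The cost of this simple design is that inserting a new value can shift existing codes, which a static sorted list does not handle.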
In columnar databases, data is generally stored in an encoded format to save storage space and reduce I/O. Popular encoding schemes include dictionary encoding, delta encoding, run-length encoding, and bit-packed encoding. In many open-source columnar formats, performing queries on encoded data requires the data to be first decoded into memory, which is time-consuming. In this paper, we design several novel SIMD-based algorithms to speed up query execution on encoded data. Our algorithms use SIMD to vectorize and skip unnecessary decoding for higher efficiency, achieving a throughput of...
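The notion of skipping unnecessary decoding can be illustrated for run-length encoding with a short Python sketch. It is a scalar illustration under stated assumptions, not the SIMD algorithms of the paper: the function names (`rle_encode`, `count_equal`, `positions_equal`) and the `(value, run_length)` pair layout are hypothetical.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


def count_equal(runs, target):
    """Count rows equal to target by inspecting one value per run."""
    return sum(length for value, length in runs if value == target)


def positions_equal(runs, target):
    """Half-open row ranges [start, end) whose value equals target."""
    ranges, pos = [], 0
    for value, length in runs:
        if value == target:
            ranges.append((pos, pos + length))
        pos += length
    return ranges
```

Both queries do work proportional to the number of runs rather than the number of rows, and no run is ever expanded back into individual values.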
Scalable Simulation Framework (SSF), a parallel simulation application programming interface (API) for large-scale discrete-event models, has been widely adopted in many areas. This paper presents a simplified and yet more streamlined implementation, called MiniSSF. MiniSSF maintains the core design concept of SSF, while removing some of the complex but rarely used features, for the sake of efficiency. It also introduces several new features that can greatly simplify model development efforts and/or improve...