- Algorithms and Data Compression
- Cellular Automata and Applications
- DNA and Biological Computing
- Advanced Data Compression Techniques
- Genomics and Phylogenetic Studies
- Advanced Image Processing Techniques
- Advanced Data Storage Technologies
- Video Coding and Compression Technologies
- Generative Adversarial Networks and Image Synthesis
- Advanced Vision and Imaging
- Advanced Image and Video Retrieval Techniques
- Markov Chains and Monte Carlo Methods
- Image Enhancement Techniques
- Time Series Analysis and Forecasting
- Music and Audio Processing
- Image and Video Quality Assessment
- Advanced biosensing and bioanalysis techniques
- RNA and protein synthesis mechanisms
- Wireless Communication Security Techniques
- Topic Modeling
- Visual Attention and Saliency Detection
- Error Correcting Code Techniques
- Image Retrieval and Classification Techniques
- Video Surveillance and Tracking Methods
- Parallel Computing and Optimization Techniques
Stanford University
2016-2023
West Africa Vocational Education
2021-2022
University of California, Berkeley
2019
Tata Institute of Fundamental Research
2019
Ford Motor Company (United States)
2019
Arizona State University
2019
While learned video codecs have demonstrated great promise, they have yet to achieve sufficient efficiency for practical deployment. In this work, we propose several novel ideas for video compression which allow improved performance in the low-latency mode (I- and P-frames only) along with a considerable increase in computational efficiency. In this setting, for natural videos our approach compares favorably across the entire R-D curve under the metrics PSNR, MS-SSIM and VMAF against all mainstream standards (H.264, H.265, AV1) and ML...
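For reference, PSNR, one of the quality metrics listed above, is just a log-scaled mean squared error between the original and reconstructed frames. A minimal sketch (the 8-bit peak value of 255 and the toy frames are illustrative assumptions, not details from the paper):

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two frames of equal shape."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy usage: a random frame and a slightly noisy reconstruction.
frame = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
noisy = np.clip(frame + np.random.normal(0, 2, frame.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(frame, noisy):.2f} dB")
```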
We consider the problem of neural semantic parsing, which translates natural language questions into executable SQL queries. We introduce a new mechanism, execution guidance, to leverage the semantics of SQL. It detects and excludes faulty programs during the decoding procedure by conditioning on the partially generated program. The mechanism can be used with any autoregressive generative model, which we demonstrate on four state-of-the-art recurrent or template-based semantic parsing models. We show that execution guidance universally improves...
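As a rough illustration of the execution-guidance idea (a sketch, not the authors' implementation), partial candidates in a beam can be probed against the database and pruned when they fail to execute; the SQLite-based helpers and their names below are hypothetical:

```python
import sqlite3

def executable(partial_sql: str, db_path: str) -> bool:
    """Return True if the (partial) query runs without error on the database.

    Hypothetical helper: a real execution-guided decoder would only probe
    candidates at stages where the partial program forms a complete sub-query.
    """
    try:
        with sqlite3.connect(db_path) as conn:
            conn.execute(partial_sql).fetchmany(1)
        return True
    except sqlite3.Error:
        return False

def prune_beam(candidates: list[tuple[str, float]], db_path: str, beam_size: int):
    """Keep the highest-scoring candidates whose SQL executes successfully."""
    guided = [(sql, score) for sql, score in candidates if executable(sql, db_path)]
    guided.sort(key=lambda item: item[1], reverse=True)
    return guided[:beam_size]
```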
Sequential data is being generated at an unprecedented pace in various forms, including text and genomic data. This creates the need for efficient compression mechanisms to enable better storage, transmission and processing of such data. To solve this problem, many existing compressors attempt to learn models for the data and perform prediction-based compression. Since neural networks are known as universal function approximators with the capability to learn arbitrarily complex mappings, and in practice show excellent performance in prediction...
Motivation: High-Throughput Sequencing technologies produce huge amounts of data in the form of short genomic reads, associated quality values and read identifiers. Because of the significant structure present in these FASTQ datasets, general-purpose compressors are unable to completely exploit much of the inherent redundancy. Although there has been a lot of work on designing FASTQ compressors, most of them lack support for one or more crucial properties, such as support for variable length reads, scalability to high coverage...
We present a sequence-to-action parsing approach for the natural language to SQL task that incrementally fills the slots of a SQL query with feasible actions from a pre-defined inventory. To account for the fact that typically there are multiple correct SQL queries with the same or very similar semantics, we draw inspiration from syntactic parsing techniques and propose to train our models with non-deterministic oracles. We evaluate the models on the WikiSQL dataset and achieve an execution accuracy of 83.7% on the test set, a 2.1% absolute improvement over models trained with traditional...
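A minimal sketch of what training with a non-deterministic oracle can look like: at each decoding step the oracle accepts every action that still leads to some correct query, and the training target is the acceptable action the model currently scores highest. The toy loss below (PyTorch, hypothetical names) illustrates that choice rule only, not the paper's parser:

```python
import torch
import torch.nn.functional as F

def oracle_step_loss(logits: torch.Tensor, acceptable: torch.Tensor) -> torch.Tensor:
    """Cross-entropy against the best acceptable action at one decoding step.

    logits:      (num_actions,) scores from the parser at this step.
    acceptable:  (num_actions,) boolean mask of actions the oracle allows,
                 i.e. actions consistent with at least one correct SQL query.
    """
    masked = logits.masked_fill(~acceptable, float("-inf"))
    target = masked.argmax()  # non-deterministic oracle: follow the model's own preference
    return F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))

# Toy usage with 5 possible actions, 2 of which the oracle accepts at this step.
logits = torch.randn(5, requires_grad=True)
acceptable = torch.tensor([False, True, False, True, False])
loss = oracle_step_loss(logits, acceptable)
loss.backward()
```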
With the amount of data being stored increasing rapidly, there is significant interest in exploring alternative storage technologies. In this context, DNA-based storage systems can offer significantly higher densities (petabytes/gram) and durability (thousands of years) than current technologies. Specifically, DNA has been found to be stable over extended periods of time, as demonstrated by the analysis of organisms long since extinct. Recent advances in sequencing and synthesis pipelines have made DNA a promising candidate for...
For reliable transmission across a noisy communication channel, classical results from information theory show that it is asymptotically optimal to separate out the source and channel coding processes. However, this decomposition can fall short in the finite bit-length regime, as it requires non-trivial tuning of hand-crafted codes and assumes infinite computational power for decoding. In this work, we propose to jointly learn the encoding and decoding processes using a new discrete variational autoencoder model. By...
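To make the joint source-channel idea concrete, the toy autoencoder below maps inputs to discrete bits, flips them with a binary symmetric channel and reconstructs the input, using a straight-through estimator for the discrete sampling step. It is a sketch under assumed settings (MSE loss, MLP encoder/decoder, crossover probability 0.1), not the paper's model:

```python
import torch
import torch.nn as nn

class JointCoder(nn.Module):
    """Toy joint source-channel autoencoder (illustrative sketch)."""
    def __init__(self, dim: int = 16, code_bits: int = 32, p_flip: float = 0.1):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, code_bits))
        self.decoder = nn.Sequential(nn.Linear(code_bits, 64), nn.ReLU(), nn.Linear(64, dim))
        self.p_flip = p_flip

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.encoder(x))
        hard = torch.bernoulli(probs)
        bits = hard + probs - probs.detach()                # straight-through gradient
        noise = (torch.rand_like(bits) < self.p_flip).float()
        received = bits * (1 - noise) + (1 - bits) * noise  # binary symmetric channel flips
        return self.decoder(received)

# Toy end-to-end training loop on random data.
model = JointCoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    x = torch.randn(32, 16)
    loss = ((model(x) - x) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```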
As magnetization and semiconductor based storage technologies approach their limits, bio-molecules, such as DNA, have been identified as promising media for future storage systems, due to their high density (petabytes/gram) and long-term durability (thousands of years). Furthermore, nanopore DNA sequencing enables high-throughput sequencing using devices as small as a USB thumb drive and thus is ideally suited for portable applications. Due to the insertion/deletion error rates associated with base-called reads, current approaches rely heavily on...
We consider lossless compression based on statistical data modeling followed by prediction-based encoding, where an accurate model for the input leads to substantial improvements in compression. We propose DZip, a general-purpose compressor for sequential data that exploits the well-known modeling capabilities of neural networks (NNs) for prediction, followed by arithmetic coding. DZip uses a novel hybrid architecture based on adaptive and semi-adaptive training. Unlike most NN-based compressors, DZip does not require additional training data and is...
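To illustrate the prediction-plus-arithmetic-coding framework that DZip builds on, the sketch below replaces the neural network with a simple adaptive order-1 count model and reports the ideal codelength Σ −log2 p(symbol | context), which an arithmetic coder realizes to within a few bits; this is an illustration of the principle, not DZip itself:

```python
import math
from collections import defaultdict

def ideal_codelength_bits(data: bytes) -> float:
    """Ideal codelength of `data` under an adaptive order-1 model with add-one smoothing."""
    counts = defaultdict(lambda: defaultdict(int))  # counts[context][symbol]
    totals = defaultdict(int)                       # totals[context]
    bits = 0.0
    context = -1                                    # sentinel context for the first symbol
    for symbol in data:
        prob = (counts[context][symbol] + 1) / (totals[context] + 256)
        bits += -math.log2(prob)                    # arithmetic coding spends about -log2(p) bits
        counts[context][symbol] += 1                # update the model exactly as the decoder would
        totals[context] += 1
        context = symbol
    return bits

text = b"abracadabra abracadabra abracadabra"
print(f"{ideal_codelength_bits(text) / 8:.1f} bytes vs {len(text)} bytes raw")
```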
Motivation: New Generation Sequencing (NGS) technologies for genome sequencing produce large amounts of short genomic reads per experiment, which are highly redundant and compressible. However, general-purpose compressors are unable to exploit this redundancy due to the special structure present in the data. Results: We present a new algorithm for compressing reads both with and without preserving the read order. In both cases, it achieves 1.4×–2× compression gain over state-of-the-art tools on datasets containing as many as 3...
Time series data compression is emerging as an important problem with the growth in IoT devices and sensors. Due to the presence of noise in these datasets, lossy compression can often provide significant gains without impacting the performance of downstream applications. In this work, we propose an error-bounded lossy compressor, LFZip, for multivariate floating-point time series that provides guaranteed reconstruction up to a user-specified maximum absolute error. The compressor is based on the prediction-quantization-entropy coder framework...
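A minimal sketch of the prediction-quantization step with a guaranteed maximum absolute error, using a trivial previous-value predictor in place of LFZip's learned predictors (names and the random-walk test signal are illustrative assumptions):

```python
import numpy as np

def lossy_encode(x: np.ndarray, eps: float) -> np.ndarray:
    """Quantize prediction residuals so every reconstructed sample is within eps of x."""
    indices = np.empty(len(x), dtype=np.int64)
    prev = 0.0                                     # previous reconstructed value as the prediction
    for i, value in enumerate(x):
        residual = value - prev
        q = int(np.round(residual / (2 * eps)))    # uniform quantizer with step 2*eps
        indices[i] = q                             # these integers would go to an entropy coder
        prev = prev + q * 2 * eps                  # track the decoder-side reconstruction
    return indices

def lossy_decode(indices: np.ndarray, eps: float) -> np.ndarray:
    out = np.empty(len(indices))
    prev = 0.0
    for i, q in enumerate(indices):
        prev = prev + q * 2 * eps
        out[i] = prev
    return out

x = np.cumsum(np.random.randn(1000))               # a random-walk time series
eps = 0.05
x_hat = lossy_decode(lossy_encode(x, eps), eps)
assert np.max(np.abs(x - x_hat)) <= eps + 1e-9     # guaranteed maximum absolute error
```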
The storage of data in DNA typically involves encoding and synthesizing the data into short oligonucleotides, followed by reading with a sequencing instrument. Major challenges include the molecular consumption of synthesized DNA, basecalling errors, and limitations in scaling up read operations for individual data elements. Addressing these challenges, we describe a system called MDRAM (Magnetic DNA-based Random Access Memory) that enables repetitive and efficient readouts of targeted files with nanopore-based sequencing. By...
Motivation: The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as UK10K and the Million Veteran Project, with the number of sequenced genomes ranging on the order of 10 K to 1 M. Due to the large redundancies among the genomic sequences of individuals from the same species, most medical research deals with the variants compared to a reference sequence, rather than with the complete sequences. Consequently, millions of genomes represented in terms of variants are stored in databases. These databases are constantly...
In new applications of data compression, it is desired to have random access to any block of the compressed dataset (without the need to decompress the entire sequence and thus access all the stored bits in memory). In this work, we analyze the problem of universal compression with random access. Building on the work of Mazumdar, Chandar and Wornell (2015), we discuss a systematic scheme to achieve performance close to optimal in the finite blocklength regime. We first analyze the performance for i.i.d. sources. Using the gained intuition, we show the existence of such schemes for the more general class of Markov sources.
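The practical flavor of random access can be seen by compressing fixed-size blocks independently and keeping an index of their offsets, so that a single block is decompressible without touching the rest of the stream. This zlib-based sketch only illustrates the access pattern, not the near-optimal scheme analyzed in the paper:

```python
import zlib

BLOCK = 4096  # block size in bytes; smaller blocks give finer random access but worse compression

def compress_with_index(data: bytes):
    """Compress each block independently and record (offset, length) for random access."""
    blob, index = bytearray(), []
    for start in range(0, len(data), BLOCK):
        chunk = zlib.compress(data[start:start + BLOCK])
        index.append((len(blob), len(chunk)))
        blob.extend(chunk)
    return bytes(blob), index

def read_block(blob: bytes, index, block_id: int) -> bytes:
    """Decompress a single block without decompressing the whole dataset."""
    offset, length = index[block_id]
    return zlib.decompress(blob[offset:offset + length])

data = b"the quick brown fox jumps over the lazy dog " * 2000
blob, index = compress_with_index(data)
assert read_block(blob, index, 3) == data[3 * BLOCK:4 * BLOCK]
```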
For any Markov source, there exist universal codes whose normalized codelength approaches the Shannon limit asymptotically as the number of samples goes to infinity. This paper investigates how fast the gap between the normalized codelength of the "best" compressor and the Shannon limit (i.e. the compression redundancy) vanishes non-asymptotically in terms of the alphabet size and mixing time of the source. We show that, for sources whose relaxation time is at least 1 + (2+c)/√k, where k is the size of the state space (and c > 0 is a constant), the phase transition in the number of samples required to achieve vanishing redundancy...
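For readers unfamiliar with the terminology, the compression redundancy discussed here is presumably the standard worst-case (minimax) quantity, which can be written as:

```latex
% Worst-case expected redundancy over the class \mathcal{M}_k of Markov sources with
% k states, minimized over uniquely decodable codes C with length function \ell_C:
\[
  R_n(k) \;=\; \min_{C}\;\max_{P \in \mathcal{M}_k}\;
  \frac{1}{n}\Bigl( \mathbb{E}_P\bigl[\ell_C(X_1^n)\bigr] - H_P(X_1^n) \Bigr).
\]
% "Vanishing redundancy" then means R_n(k) -> 0 as the number of samples n grows.
```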
Nanopore sequencing provides a real-time and portable solution to genomic sequencing, enabling better assembly, structural variant discovery and modified base detection than second generation technologies. The sequencing process generates a huge amount of data in the form of raw signal contained in fast5 files, which must be compressed to enable efficient storage and transfer. Since the raw signal is inherently noisy, lossy compression has the potential to significantly reduce space requirements without adversely impacting the performance...
COVID-19 has made video communication one of the most important modes of information exchange. While extensive research has been conducted on the optimization of the streaming pipeline, in particular the development of novel codecs, further improvement in quality and latency is required, especially under poor network conditions. This paper proposes an alternative to the conventional codec through the implementation of a keypoint-centric encoder relying on the transmission of keypoint information from within the video feed, as shown in Figure 1. The decoder uses...
The deletion channel is known to be a notoriously difficult channel to design error-correction codes for. In spite of this difficulty, there are some beautiful code constructions which give intuition about the channel and what good codes look like. In this tutorial we will take a look at them. This document is a transcript of my talk at the coding theory reading group on interesting works on the deletion channel. It is not intended as an exhaustive survey of the channel, but more as a look at important and cute ideas in this area. For a comprehensive survey, we refer the reader to the cited sources and surveys.
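One classic construction that such a tutorial typically covers is the Varshamov-Tenengolts (VT) code, which corrects any single deletion. The sketch below implements Levenshtein's decoding rule for it (my own illustrative code, not text from the talk):

```python
def vt_decode(y, n, a=0):
    """Recover the length-n codeword of VT_a(n) from y, the codeword with one bit deleted.

    VT_a(n) is the set of binary strings x of length n with sum_i i*x_i = a (mod n+1).
    """
    w = sum(y)
    deficiency = (a - sum((i + 1) * b for i, b in enumerate(y))) % (n + 1)
    if deficiency <= w:
        # A 0 was deleted: reinsert it so that exactly `deficiency` ones lie to its right.
        for p in range(len(y) + 1):
            if sum(y[p:]) == deficiency:
                return y[:p] + [0] + y[p:]
    else:
        # A 1 was deleted: reinsert it so that deficiency - w - 1 zeros lie to its left.
        zeros_left = deficiency - w - 1
        for p in range(len(y) + 1):
            if p - sum(y[:p]) == zeros_left:
                return y[:p] + [1] + y[p:]

# Demo: x is in VT_0(8) since 2 + 7 = 9 ≡ 0 (mod 9); every single deletion is corrected.
x = [0, 1, 0, 0, 0, 0, 1, 0]
for i in range(len(x)):
    assert vt_decode(x[:i] + x[i + 1:], n=len(x)) == x
```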
Lossy image compression has been studied extensively in the context of typical loss functions such as RMSE, MS-SSIM, etc. However, it is not well understood what loss function might be most appropriate for human perception. Furthermore, the availability of massive public image datasets appears to have hardly been exploited in image compression. In this work, we perform compression experiments in which one human describes images to another, using publicly available images and text instructions. These reconstructions are rated by human scorers on Amazon...