- Statistical Methods and Inference
- Bayesian Methods and Mixture Models
- Statistics Education and Methodologies
- Speech and Audio Processing
- Statistical Methods and Bayesian Inference
- Data Analysis with R
- Computational Physics and Python Applications
- Gaussian Processes and Bayesian Inference
- Speech Recognition and Synthesis
- Blind Source Separation Techniques
- Sparse and Compressive Sensing Techniques
- Imbalanced Data Classification Techniques
- Image and Signal Denoising Methods
- Heavy metals in environment
- Obesity, Physical Activity, Diet
- Smoking Behavior and Cessation
- Music and Audio Processing
- COVID-19 Clinical Research Studies
- Innovations in Educational Methods
- Meta-analysis and systematic reviews
- Advanced Statistical Process Monitoring
- PARP inhibition in cancer therapy
- Photoacoustic and Ultrasonic Imaging
- Time Series Analysis and Forecasting
- Vitamin C and Antioxidants Research
Stanford University
2013-2025
California Polytechnic State University
2015-2024
Cal Poly Corporation
2017-2022
Automotive Fuel Cell Cooperation (Canada)
2017
University of California, Berkeley
2016
Intel (United States)
2014
We develop a general approach to valid inference after model selection. At the core of our framework is a result that characterizes the distribution of a post-selection estimator conditioned on the selection event. We specialize the approach to model selection by the lasso to form confidence intervals for the selected coefficients and to test whether all relevant variables have been included in the model.
To perform inference after model selection, we propose controlling the selective type I error; i.e., the error rate of a test given that it was performed. By doing so, we recover long-run frequency properties among selected hypotheses analogous to those that apply in the classical (non-adaptive) context. Our proposal is closely related to data splitting and has a similar intuitive justification, but is more powerful. Exploiting the theory of Lehmann and Scheffé (1955), we derive most powerful unbiased tests and confidence intervals...
Non-negative matrix factorization (NMF) is a popular method for learning interpretable features from non-negative data, such as counts or magnitudes. Different cost functions are used with NMF in different applications. We develop an algorithm, based on the alternating direction method of multipliers, that tackles NMF problems whose cost function is a beta-divergence, a broad class of divergence functions. We derive simple, closed-form updates for the most commonly used beta-divergences. We demonstrate experimentally that this algorithm has...
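As a rough illustration of what NMF computes, here is a minimal NumPy sketch. It is not the ADMM-based algorithm described in the abstract: it swaps in the classic multiplicative updates for the squared Euclidean cost (the beta-divergence with beta = 2). The function name `nmf_multiplicative`, the toy matrix `V`, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Approximate a non-negative matrix V as W @ H with W, H >= 0.

    Uses multiplicative updates for the squared Euclidean cost
    (NOT the ADMM method from the abstract above).
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Random positive initialization keeps every update well defined.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Toy data (hypothetical): a non-negative matrix with exact rank 3.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))
W, H, errors = nmf_multiplicative(V, rank=3)
print(f"reconstruction error: first iteration {errors[0]:.4f}, last {errors[-1]:.4f}")
```

Other beta-divergences (for example Kullback-Leibler or Itakura-Saito) change the update rules; the abstract's contribution concerns handling that family with ADMM, which this sketch does not attempt.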
Background: Evidence of racial/ethnic inequalities in tobacco outlet density is limited by: (1) reliance on studies from single counties or states, (2) limited attention to spatial dependence, and (3) an unclear theory-based relationship between neighbourhood composition and tobacco outlet density. Methods: In 97 counties from the contiguous USA, we calculated the 2012 density of likely tobacco outlets (N=90 407), defined as tobacco outlets per 1000 population in census tracts (n=17 667). We used 2 spatial regression techniques: a spatial errors approach in GeoDa software and fitting a covariance...
Supervised and semi-supervised source separation algorithms based on non-negative matrix factorization have been shown to be quite effective. However, they require isolated training examples of one or more sources, which are often difficult to obtain. This limits the practical applicability of these algorithms. We examine the problem of efficiently utilizing general training data in the absence of specific training examples. Specifically, we propose a method to learn a universal speech model from a general corpus of speech and show how to use this model to separate...
Retail marketing surveillance research highlights concerns about lower priced cigarettes in neighborhoods with a higher proportion of racial/ethnic minorities but focuses almost exclusively on premium brands. To remedy this gap in the literature, the current study examines neighborhood variation in prices for the cheapest cigarettes and a popular brand of cigarillos in a large statewide sample of licensed tobacco retailers in a low-tax state. All 61 local health departments in California trained data collectors to conduct observations...
Voice activity detection (VAD) in the presence of heavy, nonstationary noise is a challenging problem that has attracted attention in recent years. Most modern VAD systems require training on highly specialized data: either labeled mixtures of speech and noise that are matched to the application, or, at the very least, noise data similar to that encountered in the application. Because obtaining labeled data can be a laborious task in practical applications, it is desirable for a voice activity detector to be able to perform well in the presence of any type of noise without the need for matched training data. In this paper, we...
Purpose – The purpose of this paper is to provide an example of Lean Six Sigma (LSS) application in research and development (R&D) organizations to eliminate waste and improve systems based on available data, which in turn improves the innovative environment. Manufacturing R&D involves designing and testing concepts and taking them into high-volume manufacturing. The infrastructure associated with such experimental manufacturing lines and the ability to evaluate the result under statistical process control configuration...
Feedback has a powerful influence on learning, but it is also expensive to provide. In large classes, it may even be impossible for instructors to provide individualized feedback. Peer assessment is one way to provide personalized feedback that scales to large classes. Besides these obvious logistical benefits, it has been conjectured that students also learn from the practice of peer assessment. However, this has never been conclusively demonstrated. Using an online educational platform that we developed, we conducted an in-class matched-set, randomized...
Bandwidth extension is the problem of recovering missing bandwidth in audio signals that have been band-passed, typically for compression purposes. One approach that has been shown to be successful for bandwidth extension is non-negative matrix factorization (NMF). The disadvantage of NMF is that it is non-convex and intractable to solve in general. However, in bandwidth extension, only the reconstruction is needed and not the explicit factors. We formulate bandwidth extension as a convex optimization problem, propose a simple algorithm, and demonstrate the effectiveness of this approach on practical examples.
The problem of recovering a signal from the magnitude of its short-time Fourier transform (STFT) is a longstanding one in audio signal processing. Existing approaches rely on heuristics that often perform poorly because of the nonconvexity of the problem. We introduce a formulation of the problem that lends itself to a tractable convex program. We observe that our method yields better reconstructions than the standard Griffin-Lim algorithm. We provide an algorithm and discuss practical implementation details, including how the method can be scaled up to larger examples.
Introduction: US college students smoke hookah and vape nicotine at higher rates than other young adults. Density and/or proximity of hookah lounges and vape shops near colleges has been described, but this study is the first to test whether tobacco retailers spatially cluster near college campuses. Aims and Methods: We created and linked spatial shapefiles for community colleges and 4-year colleges in California with lists of hookah lounges, vape shops, and licensed tobacco retailers. We simulated 100 datasets, placing retailers randomly in census tracts in proportion to population...
Simulation is an effective tool for analyzing probability models as well as for facilitating understanding of concepts in probability and statistics. Unfortunately, implementing a simulation from scratch often requires users to think about programming issues that are not relevant to the simulation itself. We have developed a Python package called Symbulate (https://github.com/dlsun/symbulate) which provides a user friendly framework for conducting simulations involving probability models. The syntax of Symbulate reflects the "language of probability" and makes it...
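To make the "from scratch" point concrete, here is a minimal sketch of a hand-written simulation in plain NumPy. It deliberately does not use the Symbulate API; the dice example, the variable names, and the number of repetitions are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical example: estimate P(sum of two fair dice == 7) by simulation.
rng = np.random.default_rng(0)
n_reps = 100_000

# Programming details the user has to manage by hand:
# array shapes, the half-open range of integers(), and the reduction axis.
rolls = rng.integers(1, 7, size=(n_reps, 2))  # values 1..6; upper bound is exclusive
sums = rolls.sum(axis=1)

estimate = np.mean(sums == 7)
exact = 6 / 36  # six of the 36 equally likely outcomes sum to 7
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

Choices such as the exclusive upper bound and the reduction axis are exactly the kind of programming issue that is unrelated to the probability model itself.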
Background: Coronavirus Disease 2019 (COVID-19) has no known specific treatments. However, there might be in vitro and early clinical data, as well as evidence from Severe Acute Respiratory Syndrome and Middle Eastern Respiratory Syndrome, that could inform clinicians and researchers. This systematic review aims to create priorities for future research of drugs repurposed for COVID-19. Methods: This systematic review will include in vitro, animal, and clinical studies evaluating the efficacy of a list of 34 specific compounds and four groups of drugs identified in a previous scoping review. Studies...
One of the most attractive features of R is its linear modeling capabilities. We describe a Python package, salmon, that brings the best of R's linear modeling functionality to Python in a Pythonic way, by providing composable objects for specifying and fitting linear models. This object-oriented design also enables other features that enhance ease-of-use, such as automatic visualizations and intelligent model building.
Background: Coronavirus disease 2019 (COVID-19) has no confirmed specific treatments. However, there might be in vitro and early clinical data, as well as evidence from severe acute respiratory syndrome and Middle Eastern respiratory syndrome, that could inform clinicians and researchers. This systematic review aims to create priorities for future research of drugs repurposed for COVID-19. Methods: This systematic review will include in vitro, animal, and clinical studies evaluating the efficacy of a list of 34 specific compounds and 4 groups of drugs identified in a previous scoping review....
Lead (Pb) is one of the most common heavy metal urban soil contaminants with well-known toxicity to humans. This incubation study (2–159 d) compared the ability of bone meal (BM), potassium hydrogen phosphate (KP), and triple superphosphate (TSP), at phosphorus:lead (P:Pb) molar ratios of 7.5:1, 15:1, and 22.5:1, to reduce bioaccessible Pb in soil contaminated by Pb-based paint relative to a control soil to which no P amendment was added. Soil pH and Mehlich 3 were measured as a function of time and of the amount and type of amendment. XAS assessed...
We demonstrate how data fission, a method for creating synthetic replicates from single observations, can be applied to empirical Bayes estimation. This extends recent work on empirical Bayes estimation with multiple replicates to the classical single-replicate setting. The key insight is that after data fission, empirical Bayes estimation can be cast as a general regression problem.
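The abstract does not say which fission construction is used. The sketch below assumes the standard Gaussian case with known noise level: for X ~ N(mu, sigma^2) and independent Z ~ N(0, sigma^2), the pieces X + tau*Z and X - Z/tau are independent, each centered at mu. The function name `gaussian_fission`, the parameter `tau`, and the simulated values are hypothetical and used for illustration only.

```python
import numpy as np

def gaussian_fission(x, sigma, tau=1.0, rng=None):
    """Split each Gaussian observation into two independent synthetic replicates.

    Assumes x ~ N(mu, sigma^2) with sigma known (an assumption, not from the abstract).
    Returns f = x + tau*z and g = x - z/tau, where z ~ N(0, sigma^2) is fresh noise.
    Cov(f, g) = sigma^2 - sigma^2 = 0, so the jointly Gaussian pair is independent.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.normal(0.0, sigma, size=np.shape(x))
    return x + tau * z, x - z / tau

# Simulated single-replicate data (hypothetical values).
rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.5
x = rng.normal(mu, sigma, size=100_000)

f, g = gaussian_fission(x, sigma, tau=1.0, rng=rng)
corr = np.corrcoef(f, g)[0, 1]
print(f"empirical correlation between the two pieces: {corr:.4f}")
```

With two independent pieces per observation, one piece can play the role of the covariate and the other the response, which is one way to read the abstract's remark that estimation becomes a regression problem.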