- Time Series Analysis and Forecasting
- Forecasting Techniques and Applications
- Stock Market Forecasting Methods
- Sparse and Compressive Sensing Techniques
- Advanced Bandit Algorithms Research
- Energy Load and Power Forecasting
- Functional Brain Connectivity Studies
- Reinforcement Learning in Robotics
- Topic Modeling
- Face and Expression Recognition
- Stochastic Gradient Optimization Techniques
- Statistical Methods and Inference
- Speech Recognition and Synthesis
- Cancer-Related Molecular Mechanisms Research
- Natural Language Processing Techniques
- Explainable Artificial Intelligence (XAI)
- Machine Learning and ELM
- Neural Networks and Applications
- Bayesian Methods and Mixture Models
- Distributed and Parallel Computing Systems
- Metabolomics and Mass Spectrometry Studies
- Domain Adaptation and Few-Shot Learning
- Gaussian Processes and Bayesian Inference
- Robotics and Sensor-Based Localization
- Data Stream Mining Techniques
Stanford University
2017-2020
Nune Eye Hospital
2014
Many important problems can be modeled as a system of interconnected entities, where each entity is recording time-dependent observations or measurements. In order to spot trends, detect anomalies, and interpret the temporal dynamics of such data, it is essential to understand the relationships between the different entities and how these relationships evolve over time. In this paper, we introduce the time-varying graphical lasso (TVGL), a method of inferring time-varying networks from raw time series data. We cast the problem in terms of estimating a sparse...
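As a rough sketch of the underlying estimation task (not the paper's TVGL algorithm, which additionally penalizes changes between consecutive estimates), the following Python snippet infers a sparse network per time window with an off-the-shelf graphical lasso; the synthetic data, window size, and regularization value are illustrative assumptions.

```python
# Hypothetical sliding-window sparse inverse covariance estimation.
# TVGL adds a temporal penalty linking adjacent windows, omitted here.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
T, n_entities, window = 500, 5, 100
X = rng.standard_normal((T, n_entities))      # stand-in for raw time series

networks = []
for start in range(0, T - window + 1, window):
    model = GraphicalLasso(alpha=0.1).fit(X[start:start + window])
    # Nonzero off-diagonal entries of the precision matrix are inferred edges.
    networks.append(np.abs(model.precision_) > 1e-4)
```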
Large Language Models (LLMs) have demonstrated exceptional performance in natural language processing tasks, yet their massive size makes serving them inefficient and costly. Semi-structured pruning has emerged as an effective method for model acceleration, but existing approaches are suboptimal because they focus on local, layer-wise optimizations using heuristic rules, failing to leverage global feedback. We present ProxSparse, a learning-based framework for mask selection enabled by...
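For context, the snippet below implements the kind of local, magnitude-based heuristic for 2:4 semi-structured sparsity that the abstract argues is suboptimal; it is not ProxSparse itself, whose learned mask selection is not reproduced here.

```python
# A local 2:4 pruning heuristic: keep the 2 largest-magnitude weights in each
# group of 4. ProxSparse instead learns masks with global feedback (not shown).
import numpy as np

def mask_2_of_4(weights):
    groups = weights.reshape(-1, 4)
    order = np.argsort(np.abs(groups), axis=1)            # ascending by |w|
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, order[:, :2], False, axis=1)  # drop the 2 smallest
    return mask.reshape(weights.shape)

W = np.random.randn(8, 8)
W_pruned = W * mask_2_of_4(W)   # 50% sparsity in a hardware-friendly pattern
```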
Supervised fine-tuning is a standard method for adapting pre-trained large language models (LLMs) to downstream tasks. Quantization has recently been studied as a post-training technique for efficient LLM deployment. To obtain quantized fine-tuned LLMs, conventional pipelines would first fine-tune the models, followed by post-training quantization. This often yields suboptimal performance, as it fails to leverage the synergy between fine-tuning and quantization. To effectively realize low-bit quantization of weights, activations, and KV caches in...
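To make the "quantize after fine-tuning" baseline concrete, here is a minimal symmetric round-to-nearest int8 weight quantizer; the tensor shape and per-tensor scaling scheme are illustrative assumptions, not the paper's method.

```python
# Minimal post-training quantization: symmetric per-tensor int8 round-to-nearest.
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0           # map the largest weight to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```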
This paper proposes an adaptive metric selection strategy, called the diagonal Barzilai-Borwein (DBB) stepsize, for the popular Variable Metric Proximal Gradient (VM-PG) algorithm [1], [2]. The proposed approach better captures the local geometry of the problem while keeping the per-step computation cost similar to the widely used scalar Barzilai-Borwein (BB) stepsize. We provide a theoretical convergence analysis of VM-PG using the DBB stepsize. Finally, our empirical results show a ~10-40% improvement in convergence times compared to BB for different machine...
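The sketch below shows the general shape of a diagonal BB stepsize inside a proximal gradient loop for a lasso problem; the safeguards (clipping, fallback step) and problem data are assumptions, and the paper's exact update rule may differ.

```python
# Proximal gradient for the lasso with an element-wise (diagonal) BB stepsize.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20)) / np.sqrt(50)
b, lam = rng.standard_normal(50), 0.1
grad = lambda x: A.T @ (A @ x - b)
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x_prev = np.zeros(20)
g_prev = grad(x_prev)
x = soft(x_prev - 1e-2 * g_prev, 1e-2 * lam)   # one plain proximal step
for _ in range(200):
    g = grad(x)
    s, y = x - x_prev, g - g_prev
    d = np.full(20, 1e-2)                      # fallback stepsize (assumption)
    pos = s * y > 1e-12
    d[pos] = np.clip((s * s)[pos] / (s * y)[pos], 1e-3, 1.0)  # diagonal BB
    x_prev, g_prev = x, g
    x = soft(x - d * g, d * lam)               # metric-scaled proximal step
```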
Transformer-based models have gained large popularity and demonstrated promising results in long-term time-series forecasting in recent years. In addition to learning attention in the time domain, recent works also explore learning attention in frequency domains (e.g., the Fourier and wavelet domains), given that seasonal patterns can be better captured in these domains. In this work, we seek to understand the relationships between attention in these different domains. Theoretically, we show that they are equivalent under linear conditions (i.e., linear kernel attention scores). Empirically, we analyze how...
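A quick numerical check of the intuition behind such an equivalence: with an orthonormal Fourier transform, inner products, and hence linear (dot-product) attention scores, are identical in the time and frequency domains by Parseval's theorem. The snippet is illustrative and not taken from the paper.

```python
# Linear attention scores agree in time and frequency domains (Parseval).
import numpy as np

rng = np.random.default_rng(0)
q, k = rng.standard_normal(64), rng.standard_normal(64)
Q, K = np.fft.fft(q, norm="ortho"), np.fft.fft(k, norm="ortho")

score_time = q @ k
score_freq = np.vdot(Q, K).real   # conjugate inner product in frequency space
print(np.allclose(score_time, score_freq))   # True
```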
Powerful foundation models, including large language models (LLMs) with Transformer architectures, have ushered in a new era of Generative AI across various industries. Industry and the research community have witnessed a large number of applications based on those foundation models. Such applications include question answering, customer service, image and video generation, and code completion, among others. However, as the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high...
Foundation models such as ChatGPT and GPT-4 have garnered significant interest from both academia and industry due to their emergent capabilities, such as few-shot prompting, multi-step reasoning, instruction following, and model calibration. Such capabilities were previously only attainable with specially designed models, such as those using knowledge graphs, but can now be achieved on a much larger scale with foundation models. As the capabilities of foundation models have increased, so too have their sizes, at a rate faster than Moore's law. For example, the BERT large...
Recently, deep neural networks have gained increasing popularity in the field of time series forecasting. A primary reason for their success is their ability to effectively capture complex temporal dynamics across multiple related series. The advantages of these deep forecasters only start to emerge in the presence of a sufficient amount of data. This poses a challenge for typical forecasting problems in practice, where there is a limited number of time series or observations per series, or both. To cope with this data scarcity issue, we propose a novel...
Quantile regression is an effective technique to quantify uncertainty, fit challenging underlying distributions, and often provide full probabilistic predictions through joint learning over multiple quantile levels. A common drawback of these joint quantile regressions, however, is *quantile crossing*, which violates the desirable monotone property of the conditional quantile function. In this work, we propose the Incremental (Spline) Quantile Functions I(S)QF, a flexible and efficient distribution-free quantile estimation framework that...
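One standard way to rule out quantile crossing, sketched below under illustrative assumptions, is to predict a base quantile plus non-negative increments so the estimates are monotone by construction; I(S)QF's spline parameterization is richer than this toy version.

```python
# Monotone quantile estimates: base value plus cumulative softplus increments.
import numpy as np

def monotone_quantiles(raw_params):
    base, raw_inc = raw_params[0], raw_params[1:]
    increments = np.log1p(np.exp(raw_inc))                 # softplus, >= 0
    return base + np.concatenate([[0.0], np.cumsum(increments)])

q = monotone_quantiles(np.random.randn(5))   # e.g., levels 0.1 ... 0.9
assert np.all(np.diff(q) >= 0)               # no quantile crossing
```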
To determine if the short-term effects of intravitreal anti-vascular endothelial growth factor or steroid injection are correlated with fluid turbidity, as detected by spectral-domain optical coherence tomography (SD-OCT), in diabetic macular edema (DME) patients. A total of 583 medical records were reviewed and 104 cases were enrolled. Sixty eyes received a single intravitreal bevacizumab injection (IVB) on the first attack of DME and 44 received intravitreal triamcinolone acetonide treatment (IVTA). Intraretinal fluid turbidity in DME patients was estimated at the initial...
We propose the Multivariate Quantile Function Forecaster (MQF$^2$), a global probabilistic forecasting method constructed using a multivariate quantile function, and investigate its application to multi-horizon forecasting. Prior approaches are either autoregressive, implicitly capturing the dependency structure across time but exhibiting error accumulation with increasing forecast horizons, or sequence-to-sequence models, which do not exhibit error accumulation but typically do not model the dependency structure across time steps. MQF$^2$...
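To illustrate the idea of forecasting through a multivariate quantile function, the toy below maps one vector of uniform quantile levels jointly to a sample path, so dependencies across horizons are captured in a single shot; the Gaussian transport map is a stand-in assumption, not MQF$^2$'s learned convex-potential map.

```python
# Toy multivariate quantile function: joint quantile levels -> one sample path.
import numpy as np
from scipy.stats import norm

horizon = 4
idx = np.arange(horizon)
cov = 0.8 ** np.abs(np.subtract.outer(idx, idx))   # AR(1)-style dependence
L = np.linalg.cholesky(cov)

alpha = np.random.default_rng(0).uniform(size=horizon)  # joint quantile levels
sample_path = L @ norm.ppf(alpha)                       # one forecast sample
```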
Probabilistic time series forecasting has played a critical role in decision-making processes due to its capability to quantify uncertainties. Deep forecasting models, however, could be prone to input perturbations, and the notion of such perturbations, together with that of robustness, has not even been completely established in the regime of probabilistic forecasting. In this work, we propose a framework for robust probabilistic time series forecasting. First, we generalize the concept of adversarial input perturbations, based on which we formulate robustness in terms of bounded Wasserstein deviation. Then we extend...
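As a concrete (and purely illustrative) reading of that robustness notion, the snippet below perturbs the input history and measures how far the forecast distribution moves in Wasserstein distance; the stand-in forecaster and perturbation size are assumptions.

```python
# Measure forecast-distribution shift under a small input perturbation.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
history = rng.standard_normal(100)

def forecast_samples(x, n=2000):
    # Stand-in probabilistic forecaster: Gaussian around the recent mean.
    return rng.normal(loc=x[-10:].mean(), scale=1.0, size=n)

delta = 0.1 * rng.standard_normal(100)   # bounded input perturbation
dev = wasserstein_distance(forecast_samples(history),
                           forecast_samples(history + delta))
print("Wasserstein deviation of the forecast:", dev)
```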
Fine-tuning language models (LMs) has demonstrated success in a wide array of downstream tasks. However, as LMs are scaled up, the memory requirements for backpropagation become prohibitively high. Zeroth-order (ZO) optimization methods can leverage memory-efficient forward passes to estimate gradients. More recently, MeZO, an adaptation of ZO-SGD, has been shown to consistently outperform zero-shot and in-context learning when combined with suitable task prompts. In this work, we couple ZO methods with variance...
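For readers unfamiliar with ZO methods, here is a minimal two-point (SPSA-style) gradient estimator of the kind MeZO builds on: two forward passes with a shared random perturbation and no backpropagation. The toy objective and stepsizes are assumptions, and the paper's variance-reduction coupling is not reproduced.

```python
# Two-point zeroth-order gradient estimate: only forward evaluations are used.
import numpy as np

def zo_grad(loss, theta, eps=1e-3, seed=0):
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    proj = (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps)
    return proj * z          # estimates the gradient along a random direction

loss = lambda th: np.sum((th - 1.0) ** 2)   # toy objective
theta = np.zeros(10)
for step in range(500):
    theta -= 0.05 * zo_grad(loss, theta, seed=step)
print(loss(theta))   # close to 0
```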
Training large models is plagued by intense compute cost and limited hardware memory. A practical solution is low-precision representation, but this is troubled by loss in numerical accuracy and unstable training, rendering the model less useful. We argue that low-precision floating points can perform well provided the error is properly compensated at critical locations in the training process. We propose Collage, which utilizes multi-component float representation to accurately perform operations with errors accounted for. To understand the impact of imprecision on training, we propose a simple and novel...
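The error-compensation idea can be seen in miniature in Kahan summation, shown below: a second float carries the rounding error that a low-precision accumulator would otherwise drop. This is only an analogy under simple assumptions; Collage's multi-component float is more general.

```python
# Kahan (compensated) summation in float32: the 'comp' term stores lost bits.
import numpy as np

def kahan_sum(values):
    total = np.float32(0.0)
    comp = np.float32(0.0)
    for v in values:
        y = v - comp                  # apply the stored correction
        t = total + y
        comp = (t - total) - y        # recapture the rounding error
        total = t
    return total

xs = np.full(100_000, 0.01, dtype=np.float32)
naive = np.float32(0.0)
for v in xs:
    naive += v
print(float(naive), float(kahan_sum(xs)))   # compensated sum is more accurate
```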
Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient, but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of conditional posteriors, one for each stage of the reverse diffusion process, which are estimated in closed form. Our approximations are motivated by posterior sampling with a Gaussian prior, and inherit its simplicity...
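For reference, the Gaussian-prior baseline the abstract starts from can be written as exact Thompson sampling in a linear bandit, as in the illustrative sketch below; the dimensions, noise level, and arm generation are assumptions, and the diffusion-prior chain of conditional posteriors is not reproduced.

```python
# Thompson sampling with an exact Gaussian posterior in a linear bandit.
import numpy as np

rng = np.random.default_rng(0)
d, sigma2 = 5, 0.25
theta_star = rng.standard_normal(d)       # unknown true parameter
Lam = np.eye(d)                           # posterior precision (prior: N(0, I))
b = np.zeros(d)                           # running (1/sigma^2) * sum of x * y

for t in range(500):
    arms = rng.standard_normal((10, d))                      # candidate contexts
    mu = np.linalg.solve(Lam, b)
    theta = rng.multivariate_normal(mu, np.linalg.inv(Lam))  # posterior sample
    a = arms[np.argmax(arms @ theta)]                        # act greedily on it
    reward = a @ theta_star + rng.normal(scale=np.sqrt(sigma2))
    Lam += np.outer(a, a) / sigma2                           # posterior update
    b += a * reward / sigma2
```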
How to best develop foundational models for time series forecasting remains an important open question. Tokenization is a crucial consideration in this effort: what is an effective discrete vocabulary for a real-valued sequential input? To address this question, we develop WaveToken, a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies. Our method first scales and decomposes the input time series, then thresholds and quantizes the wavelet coefficients, and finally pre-trains...
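A hypothetical version of the scale/decompose/threshold/quantize pipeline is sketched below with PyWavelets; the wavelet family, decomposition level, thresholding rule, and vocabulary size are all assumptions rather than WaveToken's actual choices.

```python
# Turn a real-valued series into discrete tokens via wavelet coefficients.
import numpy as np
import pywt

def wavelet_tokens(series, wavelet="haar", level=3, n_bins=256, thresh=0.05):
    x = (series - series.mean()) / (series.std() + 1e-8)   # scale
    coeffs = np.concatenate(pywt.wavedec(x, wavelet, level=level))  # decompose
    coeffs[np.abs(coeffs) < thresh] = 0.0                  # threshold
    bins = np.linspace(coeffs.min(), coeffs.max(), n_bins - 1)
    return np.digitize(coeffs, bins)                       # quantize to tokens

tokens = wavelet_tokens(np.sin(np.linspace(0, 8 * np.pi, 256)))
print(tokens[:10])
```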
The world is not static: this causes real-world time series to change over time through external, and potentially disruptive, events such as macroeconomic cycles or the COVID-19 pandemic. We present an adaptive sampling strategy that selects the part of the time series history that is relevant for forecasting. We achieve this by learning a discrete distribution over time steps via Bayesian optimization. We instantiate this idea as a two-step method: pre-training with uniform sampling, then training a lightweight architecture with adaptive sampling. We show in synthetic experiments...
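The sampling mechanism itself can be pictured as below: a discrete distribution over history time steps decides which past observations are used. The fixed logits here are a stand-in assumption for the learned, Bayesian-optimized distribution.

```python
# Sample training time steps from a discrete distribution over the history.
import numpy as np

rng = np.random.default_rng(0)
history = rng.standard_normal(200)              # one time series
logits = np.linspace(-2.0, 2.0, 200)            # stand-in: favor recent steps
probs = np.exp(logits) / np.exp(logits).sum()   # softmax over time steps

idx = np.sort(rng.choice(200, size=32, replace=False, p=probs))
training_window = history[idx]                  # subset fed to the forecaster
```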
Modern recommender systems usually include separate recommendation carousels, such as 'trending now', to list trending items and further boost their popularity, thereby attracting active users. Though widely useful, such carousels typically generate item lists based on simple heuristics, e.g., the number of interactions within a time interval, and therefore still leave much room for improvement. This paper aims to systematically study this under-explored but important problem from the new perspective of time series...
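A toy contrast between the count heuristic and a time-series view, under illustrative assumptions (synthetic counts, a plain linear trend per item), might look like this; the paper's actual model is not reproduced.

```python
# Rank items by forecasted next-day interactions instead of raw counts.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(lam=20, size=(100, 14)).astype(float)  # items x days
t = np.arange(14)

slopes = np.array([np.polyfit(t, c, deg=1)[0] for c in counts])
next_day = counts[:, -1] + slopes          # one-step-ahead linear forecast
trending = np.argsort(-next_day)[:10]      # top-10 'trending now' item ids
```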
No abstract available.