Youngsuk Park

ORCID: 0000-0002-0970-9214
About
Contact & Profiles
Research Areas
  • Time Series Analysis and Forecasting
  • Forecasting Techniques and Applications
  • Stock Market Forecasting Methods
  • Sparse and Compressive Sensing Techniques
  • Advanced Bandit Algorithms Research
  • Energy Load and Power Forecasting
  • Functional Brain Connectivity Studies
  • Reinforcement Learning in Robotics
  • Topic Modeling
  • Face and Expression Recognition
  • Stochastic Gradient Optimization Techniques
  • Statistical Methods and Inference
  • Speech Recognition and Synthesis
  • Cancer-related molecular mechanisms research
  • Natural Language Processing Techniques
  • Explainable Artificial Intelligence (XAI)
  • Machine Learning and ELM
  • Neural Networks and Applications
  • Bayesian Methods and Mixture Models
  • Distributed and Parallel Computing Systems
  • Metabolomics and Mass Spectrometry Studies
  • Domain Adaptation and Few-Shot Learning
  • Gaussian Processes and Bayesian Inference
  • Robotics and Sensor-Based Localization
  • Data Stream Mining Techniques

Stanford University
2017-2020

Nune Eye Hospital
2014

Many important problems can be modeled as a system of interconnected entities, where each entity is recording time-dependent observations or measurements. In order to spot trends, detect anomalies, and interpret the temporal dynamics of such data, it is essential to understand the relationships between the different entities and how these relationships evolve over time. In this paper, we introduce the time-varying graphical lasso (TVGL), a method for inferring time-varying networks from raw time series data. We cast the problem in terms of estimating sparse...
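
The abstract is truncated before the optimization problem; based on the TVGL formulation referenced here, the estimation problem is roughly of the following form (a sketch in standard graphical-lasso notation, where the S_t are empirical covariances, the od,1 norm is the off-diagonal l1 norm, and psi penalizes temporal changes):

```latex
\min_{\Theta_1,\dots,\Theta_T \succ 0}\;
\sum_{t=1}^{T}\Big(\operatorname{tr}(S_t\Theta_t)-\log\det\Theta_t
  +\lambda\,\lVert\Theta_t\rVert_{\mathrm{od},1}\Big)
  \;+\;\beta\sum_{t=2}^{T}\psi(\Theta_t-\Theta_{t-1})
```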

10.1145/3097983.3098037 article EN 2017-08-04

Large Language Models (LLMs) have demonstrated exceptional performance in natural language processing tasks, yet their massive size makes serving them inefficient and costly. Semi-structured pruning has emerged as an effective method for model acceleration, but existing approaches are suboptimal because they focus on local, layer-wise optimizations using heuristic rules, failing to leverage global feedback. We present ProxSparse, a learning-based framework for mask selection enabled by...
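
For context, semi-structured (N:M) sparsity keeps exactly N nonzero weights in every group of M. Below is a minimal numpy sketch of the local magnitude heuristic the abstract argues against (ProxSparse instead learns the mask with global feedback); the function name and the 2:4 pattern are illustrative assumptions.

```python
import numpy as np

def nm_magnitude_mask(w, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m (N:M sparsity).
    This is the local, layer-wise heuristic baseline; ProxSparse learns the mask instead."""
    groups = w.reshape(-1, m)                              # block the weights into groups of m
    keep = np.argsort(-np.abs(groups), axis=1)[:, :n]      # indices of the n largest entries
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return mask.reshape(w.shape)

weights = np.random.default_rng(0).normal(size=16)
mask = nm_magnitude_mask(weights)
print(mask.reshape(-1, 4).sum(axis=1))                     # exactly 2 weights kept per group of 4
```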

10.48550/arxiv.2502.00258 preprint EN arXiv (Cornell University) 2025-01-31

Supervised fine-tuning is a standard method for adapting pre-trained large language models (LLMs) to downstream tasks. Quantization has recently been studied as a post-training technique for efficient LLM deployment. To obtain quantized fine-tuned LLMs, conventional pipelines first fine-tune the models and then apply quantization. This often yields suboptimal performance because it fails to leverage the synergy between the two steps and to effectively realize low-bit quantization of weights, activations, and KV caches. In this work, we...
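
As a rough illustration of what realizing low-bit quantization during fine-tuning involves, the sketch below shows a generic uniform quantize-dequantize step of the kind used in quantization-aware training; it is a textbook building block under my own naming, not this paper's specific scheme.

```python
import numpy as np

def fake_quant(x, bits=4):
    """Uniform symmetric quantize-dequantize (a generic QAT building block)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed integers
    scale = np.max(np.abs(x)) / qmax + 1e-12        # per-tensor scale
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

w = np.random.default_rng(0).normal(size=8)
print(w)
print(fake_quant(w, bits=4))                        # low-bit proxy of w used inside the training loop
```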

10.48550/arxiv.2502.09003 preprint EN arXiv (Cornell University) 2025-02-13

This paper proposes an adaptive metric selection strategy, called the diagonal Barzilai-Borwein (DBB) stepsize, for the popular Variable Metric Proximal Gradient (VM-PG) algorithm [1], [2]. The proposed approach better captures the local geometry of the problem while keeping the per-step computation cost similar to that of the widely used scalar Barzilai-Borwein (BB) stepsize. We provide a theoretical convergence analysis of VM-PG using the DBB stepsize. Finally, our empirical results show a ~10 - 40 % improvement in convergence times compared to the scalar BB stepsize across different machine...
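
For reference, with s_k = x_k - x_{k-1} and y_k the corresponding gradient difference, the scalar BB stepsize and one common diagonal analogue look roughly as follows (a sketch only; the paper's exact safeguarded DBB update may differ):

```latex
s_k = x_k - x_{k-1},\qquad y_k = \nabla f(x_k) - \nabla f(x_{k-1}),\qquad
\alpha_k^{\mathrm{BB}} = \frac{s_k^{\top}s_k}{s_k^{\top}y_k}
\quad\text{(scalar BB)},\qquad
[D_k]_{ii} \approx \frac{[y_k]_i}{[s_k]_i}
\quad\text{(diagonal analogue, suitably safeguarded)}
```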

10.1109/icassp40776.2020.9054193 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

Transformer-based models have gained great popularity and demonstrated promising results in long-term time-series forecasting in recent years. In addition to learning attention in the time domain, recent works also explore attention in frequency domains (e.g., the Fourier and wavelet domains), given that seasonal patterns can be better captured in these domains. In this work, we seek to understand the relationships between attention models in the different domains. Theoretically, we show that they are equivalent under linear conditions (i.e., a linear kernel for the attention scores). Empirically, we analyze how...
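
A toy numerical check of the kind of equivalence alluded to here: linear (dot-product) attention scores are unchanged when queries and keys are mapped through an orthonormal transform such as the unitary DFT, by Parseval's theorem. This is only an illustration of the linear case, not the paper's full analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))                       # queries (time domain)
K = rng.normal(size=(5, 8))                       # keys (time domain)

scores_time = Q @ K.T                             # linear attention scores in the time domain
Qf = np.fft.fft(Q, axis=1, norm="ortho")          # unitary DFT along the feature axis
Kf = np.fft.fft(K, axis=1, norm="ortho")
scores_freq = np.real(Qf @ np.conj(Kf).T)         # same scores computed in the frequency domain

print(np.allclose(scores_time, scores_freq))      # True: inner products preserved (Parseval)
```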

10.48550/arxiv.2212.08151 preprint EN cc-by-nc-nd arXiv (Cornell University) 2022-01-01

Powerful foundation models, including large language models (LLMs) with Transformer architectures, have ushered in a new era of Generative AI across various industries. Industry and the research community have witnessed a large number of applications based on those models, including question answering, customer services, image and video generation, and code completion, among others. However, as the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high...

10.1145/3637528.3671465 article EN cc-by Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2024-08-24

Foundation models such as ChatGPT and GPT-4 have garnered significant interest from both academia and industry due to their emergent capabilities, such as few-shot prompting, multi-step reasoning, instruction following, and model calibration. Such capabilities were previously only attainable with specially designed models, such as those using knowledge graphs, but can now be achieved on a much larger scale with foundation models. As the capabilities of foundation models have increased, so too have their sizes, at a rate faster than Moore's law. For example, the BERT large...

10.1145/3580305.3599573 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2023-08-04

Recently, deep neural networks have gained increasing popularity in the field of time series forecasting. A primary reason for their success is their ability to effectively capture complex temporal dynamics across multiple related series. The advantages of these deep forecasters only start to emerge in the presence of a sufficient amount of data. This poses a challenge for typical forecasting problems in practice, where there is a limited number of time series or observations per series, or both. To cope with this data scarcity issue, we propose a novel...

10.48550/arxiv.2102.06828 preprint EN cc-by-nc-nd arXiv (Cornell University) 2021-01-01

Quantile regression is an effective technique to quantify uncertainty, fit challenging underlying distributions, and often provide full probabilistic predictions through joint learning over multiple quantile levels. A common drawback of these regressions, however, is quantile crossing, which violates the desirable monotone property of the conditional quantile function. In this work, we propose the Incremental (Spline) Quantile Functions I(S)QF, a flexible and efficient distribution-free quantile estimation framework that...
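
The "incremental" idea behind avoiding quantile crossing can be illustrated in a few lines of numpy: predict a base quantile plus non-negative increments, so the quantile estimates are monotone in the quantile level by construction. This is a generic sketch; I(S)QF's spline parameterization is more elaborate.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def monotone_quantiles(raw):
    """Map unconstrained network outputs to non-crossing quantile predictions.
    raw[..., 0] is the lowest quantile; the remaining outputs become non-negative
    increments, so predictions cannot cross."""
    base = raw[..., :1]
    increments = softplus(raw[..., 1:])
    return np.concatenate([base, base + np.cumsum(increments, axis=-1)], axis=-1)

raw_outputs = np.random.default_rng(0).normal(size=(2, 5))   # 5 quantile levels, 2 samples
q = monotone_quantiles(raw_outputs)
print(np.all(np.diff(q, axis=-1) >= 0))                      # True: no quantile crossing
```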

10.48550/arxiv.2111.06581 preprint EN cc-by-nc-nd arXiv (Cornell University) 2021-01-01

To determine whether the short-term effects of intravitreal anti-vascular endothelial growth factor or steroid injection are correlated with fluid turbidity, as detected by spectral domain optical coherence tomography (SD-OCT), in diabetic macular edema (DME) patients. A total of 583 medical records were reviewed and 104 cases were enrolled. Sixty eyes received a single intravitreal bevacizumab injection (IVB) on the first attack of DME, and 44 eyes received intravitreal triamcinolone acetonide treatment (IVTA). Intraretinal fluid turbidity in DME patients was estimated with initial...

10.3341/kjo.2014.28.4.298 article EN cc-by-nc Korean Journal of Ophthalmology 2014-01-01

We propose the Multivariate Quantile Function Forecaster (MQF$^2$), a global probabilistic forecasting method constructed using a multivariate quantile function, and investigate its application to multi-horizon forecasting. Prior approaches are either autoregressive, implicitly capturing the dependency structure across time but exhibiting error accumulation with increasing forecast horizons, or sequence-to-sequence models, which do not exhibit error accumulation but typically model the time steps independently. MQF$^2$...

10.48550/arxiv.2202.11316 preprint EN cc-by arXiv (Cornell University) 2022-01-01

10.1007/s11590-019-01520-y article EN Optimization Letters 2020-01-04

Probabilistic time series forecasting has played a critical role in decision-making processes due to its capability to quantify uncertainties. Deep forecasting models, however, could be prone to input perturbations, and the notion of such perturbations, together with that of robustness, has not even been completely established in the regime of probabilistic forecasting. In this work, we propose a framework for robust probabilistic time series forecasting. First, we generalize the concept of adversarial input perturbations, based on which we formulate robustness in terms of bounded Wasserstein deviation. Then we extend...
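
A rough formalization of the robustness notion described here, in my own shorthand (perturbation budget eta, tolerance epsilon, Wasserstein distance W between predictive distributions), might read:

```latex
\sup_{\lVert\delta\rVert\le\eta}\;
W\big(\hat{P}(\,\cdot \mid x+\delta),\;\hat{P}(\,\cdot \mid x)\big)\;\le\;\varepsilon
```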

10.48550/arxiv.2202.11910 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Fine-tuning language models (LMs) has demonstrated success in a wide array of downstream tasks. However, as LMs are scaled up, the memory requirements for backpropagation become prohibitively high. Zeroth-order (ZO) optimization methods can leverage memory-efficient forward passes to estimate gradients. More recently, MeZO, an adaptation of ZO-SGD, has been shown to consistently outperform zero-shot and in-context learning when combined with suitable task prompts. In this work, we couple ZO methods with variance...
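
For context, the memory-efficient gradient estimate that ZO-SGD/MeZO rely on needs only two forward passes and no backpropagation. Below is a minimal numpy sketch of that generic two-point (SPSA-style) estimator, not the variance-reduced variant this paper develops.

```python
import numpy as np

def zo_gradient_estimate(loss, theta, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate: two forward evaluations, no backprop."""
    rng = rng or np.random.default_rng()
    z = rng.normal(size=theta.shape)                           # random perturbation direction
    g = (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps)
    return g * z                                               # directional gradient estimate

rng = np.random.default_rng(0)
theta = np.zeros(4)
loss = lambda w: float(np.sum((w - 1.0) ** 2))                 # toy objective
for _ in range(2000):
    theta = theta - 0.02 * zo_gradient_estimate(loss, theta, rng=rng)
print(np.round(theta, 2))                                      # close to the minimizer [1 1 1 1]
```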

10.48550/arxiv.2404.08080 preprint EN arXiv (Cornell University) 2024-04-11

Training large models is plagued by intense compute cost and limited hardware memory. A practical solution is low-precision representation, but it is troubled by loss in numerical accuracy and unstable training, rendering the model less useful. We argue that low-precision floating points can perform well provided the error is properly compensated at critical locations in the training process. We propose Collage, which utilizes multi-component float representation to accurately perform operations with numerical errors accounted for. To understand the impact of imprecision on training, we propose a simple novel...
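
The core idea of compensating floating-point error with an extra low-order component can be seen in classic Kahan summation, sketched below entirely in float32; multi-component float formats generalize this trick, though Collage's exact scheme is more involved.

```python
import numpy as np

def kahan_sum_f32(values):
    """Compensated (Kahan) summation in float32: 'comp' carries the rounding error
    lost by each low-precision add, the basic idea behind multi-component floats."""
    total = np.float32(0.0)
    comp = np.float32(0.0)
    for v in values:
        y = np.float32(v - comp)
        t = np.float32(total + y)
        comp = np.float32((t - total) - y)   # recover the bits lost in the add
        total = t
    return total

xs = np.full(1_000_000, np.float32(1e-4), dtype=np.float32)

naive = np.float32(0.0)
for v in xs:
    naive = np.float32(naive + v)            # plain float32 accumulation

print(naive, kahan_sum_f32(xs))              # naive sum drifts from 100.0; compensated sum does not
```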

10.48550/arxiv.2405.03637 preprint EN arXiv (Cornell University) 2024-05-06

Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient, but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of approximate conditional posteriors, one for each stage of the reverse diffusion process, which are estimated in closed form. Our approximations are motivated by posterior sampling with a Gaussian prior, and inherit its simplicity...
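
For context, the Gaussian-prior baseline mentioned here is exact Thompson sampling for a linear contextual bandit; a minimal numpy sketch is below. The paper's diffusion-prior algorithm replaces this single Gaussian posterior with a chain of conditional posteriors, one per reverse-diffusion stage.

```python
import numpy as np

def gaussian_thompson_step(X, y, arms, sigma2=1.0, prior_var=1.0, rng=None):
    """One Thompson-sampling step with a Gaussian prior: exact posterior over the
    linear parameter, sample it, act greedily with respect to the sample."""
    rng = rng or np.random.default_rng()
    d = X.shape[1]
    precision = np.eye(d) / prior_var + X.T @ X / sigma2   # posterior precision
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / sigma2                        # posterior mean
    theta = rng.multivariate_normal(mean, cov)             # posterior sample
    return int(np.argmax(arms @ theta))                    # index of the chosen arm

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                               # past contexts
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)   # past rewards
arms = rng.normal(size=(4, 3))                             # one feature vector per arm
print(gaussian_thompson_step(X, y, arms, rng=rng))
```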

10.48550/arxiv.2410.03919 preprint EN arXiv (Cornell University) 2024-10-04

How to best develop foundational models for time series forecasting remains an important open question. Tokenization is a crucial consideration in this effort: what is an effective discrete vocabulary for a real-valued sequential input? To address this question, we develop WaveToken, a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies. Our method first scales and decomposes the input time series, then thresholds and quantizes the wavelet coefficients, and finally pre-trains...
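
A toy numpy-only sketch of the scale / decompose / threshold / quantize pipeline described here, using a single-level Haar wavelet; WaveToken's actual tokenizer and vocabulary construction differ.

```python
import numpy as np

def haar_tokenize(x, bits=4, thresh=0.05):
    """Toy wavelet tokenizer: scale the series, take a single-level Haar transform,
    threshold small coefficients, and uniformly quantize the rest into integer tokens."""
    x = x / (np.mean(np.abs(x)) + 1e-8)                        # scale the input series
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)                  # low-frequency coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)                  # time-localized high frequencies
    coeffs = np.concatenate([approx, detail])
    coeffs = np.where(np.abs(coeffs) < thresh, 0.0, coeffs)    # threshold small coefficients
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(coeffs)) / qmax + 1e-12
    return np.round(coeffs / scale).astype(int)                # integer token ids

series = np.sin(np.linspace(0, 6 * np.pi, 64)) + 0.1 * np.random.default_rng(0).normal(size=64)
print(haar_tokenize(series)[:16])
```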

10.48550/arxiv.2412.05244 preprint EN arXiv (Cornell University) 2024-12-06

The world is not static: this causes real-world time series to change over time through external, and potentially disruptive, events such as macroeconomic cycles or the COVID-19 pandemic. We present an adaptive sampling strategy that selects the part of the time series history that is relevant for forecasting. We achieve this by learning a discrete distribution over time steps with Bayesian optimization. We instantiate the idea with a two-step method that is pre-trained with uniform sampling and then trains a lightweight architecture with adaptive sampling. We show with synthetic experiments...
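
A toy illustration of the sampling idea: draw training time steps from a discrete distribution over the history instead of uniformly. Here the weights are hand-set to favor a recent regime; in the paper they are learned with Bayesian optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
history_len = 200
# Hypothetical weights: after a disruptive event at t=150, only the recent regime is relevant.
logits = np.where(np.arange(history_len) >= 150, 2.0, 0.0)
probs = np.exp(logits) / np.exp(logits).sum()            # discrete distribution over time steps

batch = rng.choice(history_len, size=32, p=probs)        # sample time steps used for training
print(np.mean(batch >= 150))                             # most sampled steps fall in the recent regime
```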

10.48550/arxiv.2302.11870 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Modern recommender systems usually include separate recommendation carousels, such as 'trending now', to list trending items and further boost their popularity, thereby attracting active users. Though widely useful, such carousels typically generate item lists based on simple heuristics, e.g., the number of interactions within a time interval, and therefore still leave much room for improvement. This paper aims to systematically study this under-explored but important problem from the new perspective of time series...
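
For context, the simple heuristic referred to here just counts interactions per item in a recent window; a pandas sketch of that baseline is below (the paper instead treats these counts as time series to forecast).

```python
import pandas as pd

# Toy interaction log; the baseline 'trending now' list is a simple count over a recent window.
log = pd.DataFrame({
    "item": ["a", "b", "a", "c", "a", "b"],
    "ts":   pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02",
                            "2024-01-02", "2024-01-03", "2024-01-03"]),
})
window_start = log["ts"].max() - pd.Timedelta(days=1)
trending = (log[log["ts"] >= window_start]
            .groupby("item").size()
            .sort_values(ascending=False))
print(trending.head(10))   # heuristic 'trending now' ranking
```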

10.1145/3604915.3608810 article EN cc-by 2023-09-14