Joonseok Lee

ORCID: 0000-0002-0786-8086
Research Areas
  • Recommender Systems and Techniques
  • Multimodal Machine Learning Applications
  • Generative Adversarial Networks and Image Synthesis
  • Human Pose and Action Recognition
  • Video Analysis and Summarization
  • Advanced Vision and Imaging
  • Topic Modeling
  • Domain Adaptation and Few-Shot Learning
  • Advanced Graph Neural Networks
  • Music and Audio Processing
  • Text and Document Classification Technologies
  • Advanced Image and Video Retrieval Techniques
  • Image Retrieval and Classification Techniques
  • Advanced Bandit Algorithms Research
  • Music Technology and Sound Studies
  • Interactive and Immersive Displays
  • Advanced Neural Network Applications
  • Computer Graphics and Visualization Techniques
  • Anomaly Detection Techniques and Applications
  • Advanced Image Processing Techniques
  • Web Data Mining and Analysis
  • Advanced Text Analysis Techniques
  • Multimedia Communication and Technology
  • Diverse Musicological Studies
  • Medical Image Segmentation Techniques

Google (United States)
2016-2025

University of Toronto
2025

Seoul National University
2022-2024

Samsung SDS (South Korea)
2022

Samsung (South Korea)
2022

National University
2021-2022

Korea Post
2022

Pohang University of Science and Technology
2022

Gwangju Institute of Science and Technology
2021

Alphabet (United States)
2019

Many recent advancements in Computer Vision are attributed to large datasets. Open-source software packages for Machine Learning and inexpensive commodity hardware have reduced the barrier of entry for exploring novel approaches at scale. It is possible to train models over millions of examples within a few days. Although large-scale datasets exist for image understanding, such as ImageNet, there are no comparable size video classification datasets. In this paper, we introduce YouTube-8M, the largest multi-label video classification dataset,...

10.48550/arxiv.1609.08675 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Although essential to revealing biased performance, well intentioned attempts at algorithmic auditing can have effects that may harm the very populations these measures are meant to protect. This concern is even more salient while auditing biometric systems such as facial recognition, where the data is sensitive and the technology is often used in ethically questionable manners. We demonstrate a set of five ethical concerns in the particular case of auditing commercial facial processing technology, highlighting additional design considerations...

10.1145/3375627.3375820 article EN 2020-02-05

Personalized recommendation systems are used in a wide variety of applications such as electronic commerce, social networks, web search, and more. Collaborative filtering approaches to recommendation systems typically assume that the rating matrix (e.g., movie ratings by viewers) is low-rank. In this paper, we examine an alternative approach in which the rating matrix is locally low-rank. Concretely, we assume that the rating matrix is low-rank within certain neighborhoods of the metric space defined by (user, item) pairs. We combine a recent approach for local low-rank approximation based on the Frobenius norm with...

10.1145/2566486.2567970 article EN 2014-04-07

Tracking and predicting extreme events in large-scale spatio-temporal climate data are long standing challenges in climate science. In this paper, we propose Convolutional LSTM (ConvLSTM)-based models to track and predict hurricane trajectories from large-scale climate data; namely, the pixel-level history of tropical cyclones. To address the tracking problem, we model time-sequential density maps of hurricane trajectories, enabling the model to capture not only the temporal dynamics but also the spatial distribution of the trajectories. Furthermore, we introduce a new...

10.1109/wacv.2019.00192 article EN 2019-01-01

For cold-start recommendation, it is important to rapidly profile new users and generate a good initial set of recommendations through an interview process --- users should be queried adaptively in a sequential fashion, and multiple items should be offered for opinion solicitation at each trial. In this work, we propose a novel algorithm that learns to conduct the interview process guided by a decision tree with multiple questions at each split. The splits, represented as sparse weight vectors, are learned through an L_1-constrained optimization framework. The users are directed...

10.1145/2433396.2433451 article EN 2013-02-04

The goal of video understanding is to develop algorithms that enable machines to understand videos at the level of human experts. Researchers have tackled various domains including video classification, search, personalized recommendation, and more. However, there is a research gap in combining these domains in one unified learning framework. Towards that, we propose a deep network that embeds videos, using their audio-visual content, onto a metric space which preserves video-to-video relationships. Then, we use the trained embedding network to tackle...

10.1145/3219819.3219856 article EN 2018-07-19

The task of predicting future actions from a video is crucial for a real-world agent interacting with others. When anticipating actions in the distant future, we humans typically consider long-term relations over the whole sequence of actions, i.e., not only observed actions in the past but also potential actions in the future. In a similar spirit, we propose an end-to-end attention model for action anticipation, dubbed Future Transformer (FUTR), that leverages global attention over all input frames and output tokens to predict a minutes-long sequence of future actions. Unlike...

10.1109/cvpr52688.2022.00306 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Although essential to revealing biased performance, well intentioned attempts at algorithmic auditing can have effects that may harm the very populations these measures are meant to protect. This concern is even more salient while auditing biometric systems such as facial recognition, where the data is sensitive and the technology is often used in ethically questionable manners. We demonstrate a set of five ethical concerns in the particular case of auditing commercial facial processing technology, highlighting additional design...

10.48550/arxiv.2001.00964 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Music tagging and content-based retrieval systems have traditionally been constructed using pre-defined ontologies covering a rigid set of music attributes or text queries. This paper presents MuLan: a first attempt at a new generation of acoustic models that link music audio directly to unconstrained natural language music descriptions. MuLan takes the form of a two-tower, joint audio-text embedding model trained using 44 million music recordings (370K hours) and weakly-associated, free-form text annotations. Through its compatibility...

10.48550/arxiv.2208.12415 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Traditional recommendation systems using collaborative filtering (CF) approaches work relatively well when the candidate videos are sufficiently popular. With the increase of user-created videos, however, recommending fresh videos gets more and more important, but pure CF-based systems may not perform well in such a cold-start situation. In this paper, we model recommendation as a video content-based similarity learning problem, and learn deep video embeddings trained to predict video relationships identified by a co-watch-based system, using only visual and audial...

10.1109/iccvw.2017.121 article EN 2017-10-01

Graph Convolutional Networks (GCNs) have shown significant improvements in semi-supervised learning on graph-structured data. Concurrently, unsupervised learning of graph embeddings has benefited from the information contained in random walks. In this paper, we propose a model: Network of GCNs (N-GCN), which marries these two lines of work. At its core, N-GCN trains multiple instances of GCNs over node pairs discovered at different distances in random walks, and learns a combination of the instance outputs which optimizes the classification...

10.48550/arxiv.1802.08888 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Zero-shot learning offers an efficient solution for a machine learning model to treat unseen categories, avoiding exhaustive data collection. Zero-shot Sketch-based Image Retrieval (ZS-SBIR) simulates real-world scenarios where it is hard and costly to collect paired sketch-photo samples. We propose a novel framework that indirectly aligns sketches and photos by contrasting them through texts, removing the necessity of access to sketch-photo pairs. With an explicit modality encoding learned from data, our approach disentangles...

10.1109/wacv57701.2024.00555 article EN 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03

10.1109/icassp49660.2025.10887608 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Session-based recommendation aims at predicting the next item given a sequence of previous items consumed in the session, e.g., on e-commerce or multimedia streaming services. Specifically, session data exhibits some unique characteristics, i.e., session consistency and sequential dependency over items within the session, repeated item consumption, and session timeliness. In this paper, we propose simple-yet-effective linear models for considering the holistic aspects of the sessions. The comprehensive nature of our models helps improve the quality of session-based...

10.1145/3442381.3450005 preprint EN 2021-04-19

X-ray computed tomography (CT) is one of the most common imaging techniques used to diagnose various diseases in the medical field. Its high contrast sensitivity and spatial resolution allow the physician to observe details of body parts such as bones, soft tissue, blood vessels, etc. As it involves potentially harmful radiation exposure to patients and surgeons, however, reconstructing a 3D CT volume from perpendicular 2D X-ray images is considered a promising alternative, thanks to its lower risk and better accessibility. This...

10.1109/icassp49357.2023.10096296 preprint EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Video-to-music generation demands both a temporally localized high-quality listening experience and globally aligned video-acoustic signatures. While recent music generation models excel at the former through advanced audio codecs, the exploration of video-acoustic signatures has been confined to specific visual scenarios. In contrast, our research confronts the challenge of learning globally aligned signatures between video and music directly from paired music and videos, without explicitly modeling domain-specific rhythmic or semantic relationships. We propose V2Meow,...

10.1609/aaai.v38i5.28299 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

10.1145/3626772.3657801 article EN Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2024-07-10