Jen-Yu Liu

ORCID: 0000-0003-1299-6688
Research Areas
  • Music and Audio Processing
  • Speech and Audio Processing
  • Music Technology and Sound Studies
  • Speech Recognition and Synthesis
  • Animal Vocal Communication and Behavior
  • Neuroscience and Music Perception
  • Mechanical Engineering and Vibrations Research
  • Robotic Mechanisms and Dynamics
  • Marine animal studies overview
  • Human Pose and Action Recognition
  • Iterative Learning Control Systems
  • Generative Adversarial Networks and Image Synthesis
  • Engineering Technology and Methodologies
  • Emotion and Mood Recognition
  • Metallurgy and Material Forming
  • Manufacturing Process and Optimization
  • Gear and Bearing Dynamics Analysis
  • Advanced machining processes and optimization
  • Soil Mechanics and Vehicle Dynamics
  • Topological and Geometric Data Analysis
  • Cell Image Analysis Techniques
  • Sensorless Control of Electric Motors
  • Privacy-Preserving Technologies in Data
  • Recommender Systems and Techniques
  • Mechanics and Biomechanics Studies

Affiliations

Research Center for Information Technology Innovation, Academia Sinica
2012-2019

National Taiwan University
2016-2018

Academia Sinica
2012-2016

Center for Information Technology
2013-2014

National University of Formosa
2009

National Formosa University
2001-2009

National Yunlin University of Science and Technology
1993

Yangon University of Economics
1991

Publications

To apply neural sequence models such as the Transformers to music generation tasks, one has to represent a piece of music by a sequence of tokens drawn from a finite, pre-defined vocabulary. Such a vocabulary usually involves tokens of various types. For example, to describe a musical note, one needs separate tokens to indicate the note's pitch, duration, velocity (dynamics), and placement (onset time) along the time grid. While the different types of tokens may possess different properties, existing models usually treat them equally, in the same way as modeling the words in natural languages. In this...

10.1609/aaai.v35i1.16091 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18
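The abstract above describes representing each musical note with several token types (pitch, duration, velocity, onset), each drawn from its own vocabulary. A minimal sketch of such a multi-type token layout is shown below; the value ranges and quantization steps are illustrative assumptions, not the paper's actual vocabulary.

```python
from dataclasses import dataclass

# Hypothetical per-type vocabularies; a real system would derive these from the corpus.
PITCH_VOCAB    = list(range(21, 109))            # MIDI pitches A0..C8
DURATION_VOCAB = [i / 16 for i in range(1, 65)]  # note lengths in beats, 1/16-beat steps
VELOCITY_VOCAB = list(range(0, 128, 4))          # quantized MIDI velocities
ONSET_VOCAB    = [i / 16 for i in range(0, 64)]  # position within a 4-beat bar

@dataclass
class Note:
    pitch: int        # MIDI pitch number
    duration: float   # in beats
    velocity: int     # MIDI velocity
    onset: float      # position on the time grid, in beats

def note_to_tokens(note: Note) -> dict:
    """Map one note to one token index per type (a compound of type-specific tokens)."""
    nearest_velocity = min(VELOCITY_VOCAB, key=lambda v: abs(v - note.velocity))
    return {
        "pitch":    PITCH_VOCAB.index(note.pitch),
        "duration": DURATION_VOCAB.index(note.duration),
        "velocity": VELOCITY_VOCAB.index(nearest_velocity),
        "onset":    ONSET_VOCAB.index(note.onset),
    }

# Example: middle C, one beat long, medium-loud, starting on beat 2 of the bar.
print(note_to_tokens(Note(pitch=60, duration=1.0, velocity=64, onset=1.0)))
```

Each note thus becomes a small group of indices, one per type, rather than a single flat token, which is the representational choice the abstract contrasts with word-style modeling.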

The use of machine learning (ML) based techniques has become increasingly popular in the field of bioacoustics over the last years. Fundamental requirements for the successful application of ML are curated, agreed upon, high-quality datasets and benchmark tasks to be learned on a given dataset. However, the field so far lacks such public benchmarks which cover multiple species, measure performance in a controlled and standardized way, and allow newly proposed techniques to be benchmarked against existing ones. Here, we propose BEANS (the BEnchmark...

10.1109/icassp49357.2023.10096686 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

A scientific understanding of emotion experience requires information on the contexts in which the emotion is induced. Moreover, as one of the primary functions of music is to regulate the listener's mood, an individual's short-term music preference may reveal the emotional state of that individual. In light of these observations, this paper presents the first study that exploits an online repository of social data to investigate the connections between a blogger's emotional state, the user context manifested in the blog articles, and the content of the music titles the blogger attached to the post. A number...

10.1109/tmm.2013.2265078 article EN IEEE Transactions on Multimedia 2013-05-29

Audio event detection aims at discovering the elements inside an audio clip. In addition to labeling the clips with the events, we want to find out the temporal locations of these events. However, creating clearly annotated training data can be time-consuming. Therefore, we provide a model based on convolutional neural networks that relies only on weakly-supervised data for training. These data can be directly obtained from online platforms, such as Freesound, with clip-level labels assigned by the uploaders. The structure of our model is...

10.1109/icassp.2017.7952264 article EN 2017-03-01
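A minimal PyTorch sketch of the weak-supervision idea described above: a convolutional network predicts event probabilities per frame, and the frame-wise predictions are pooled to a clip-level prediction that can be trained against uploader-assigned clip labels. The layer sizes and the max-pooling choice are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class WeaklySupervisedEventNet(nn.Module):
    """Frame-level event detector trained only with clip-level labels."""
    def __init__(self, n_mels: int = 64, n_events: int = 10):
        super().__init__()
        # 1D convolutions over time, treating mel bins as input channels
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(128, n_events, kernel_size=1),
        )

    def forward(self, mel):                          # mel: (batch, n_mels, time)
        frame_probs = torch.sigmoid(self.conv(mel))  # (batch, n_events, time)
        clip_probs = frame_probs.max(dim=2).values   # pool over time -> clip level
        return frame_probs, clip_probs

model = WeaklySupervisedEventNet()
mel = torch.randn(8, 64, 500)                        # a batch of 8 clips, 500 frames each
weak_labels = torch.randint(0, 2, (8, 10)).float()   # clip-level labels from uploaders
frame_probs, clip_probs = model(mel)
loss = nn.functional.binary_cross_entropy(clip_probs, weak_labels)
loss.backward()
```

At inference time the frame-wise probabilities give the temporal locations of events even though no frame-level labels were used in training.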

Stacked dilated convolutions used in WaveNet have been shown effective for generating high-quality audio. By replacing pooling/striding with dilation in the convolution layers, they can preserve high-resolution information and still reach distant locations. Producing high-resolution predictions is also crucial in music source separation, whose goal is to separate different sound sources while maintaining the quality of the separated sounds. Therefore, in this paper, we use stacked dilated convolutions as the backbone for source separation. Although a wider context...

10.24963/ijcai.2019/655 article EN 2019-07-28
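As a reference point for the backbone described above, here is a small sketch of stacked dilated 1D convolutions with exponentially growing dilation: the receptive field grows with depth while the temporal resolution is preserved. The channel counts and depth are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DilatedStack(nn.Module):
    """Stacked dilated convolutions: receptive field grows exponentially
    with depth while the temporal resolution stays unchanged."""
    def __init__(self, channels: int = 64, n_layers: int = 6):
        super().__init__()
        layers = []
        for i in range(n_layers):
            dilation = 2 ** i                        # 1, 2, 4, 8, ...
            layers += [
                nn.Conv1d(channels, channels, kernel_size=3,
                          dilation=dilation, padding=dilation),  # output keeps input length
                nn.ReLU(),
            ]
        self.stack = nn.Sequential(*layers)

    def forward(self, x):              # x: (batch, channels, time)
        return self.stack(x)

x = torch.randn(2, 64, 1024)
y = DilatedStack()(x)
print(y.shape)                         # torch.Size([2, 64, 1024]) -- resolution preserved
```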

There has been increasing attention on learning feature representations from the complex, high-dimensional audio data applied in various music information retrieval (MIR) problems. Unsupervised feature learning techniques, such as sparse coding and deep belief networks, have been utilized to represent music as a term-document structure comprising elementary audio codewords. Despite the widespread use of such bag-of-frames (BoF) models, few attempts have been made to systematically compare different component settings. Moreover, whether techniques...

10.1109/tmm.2014.2311016 article EN IEEE Transactions on Multimedia 2014-03-11
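For readers unfamiliar with the bag-of-frames model the abstract discusses, a minimal sketch follows: frame-level features are quantized against a learned codebook, and each clip is summarized by a histogram of codeword counts. The codebook size and the use of k-means are illustrative assumptions, standing in for the various component settings the paper compares.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_codebook(frame_features: np.ndarray, n_codewords: int = 256) -> KMeans:
    """frame_features: (n_frames_total, n_dims) features pooled over the training set."""
    return KMeans(n_clusters=n_codewords, n_init=4, random_state=0).fit(frame_features)

def bag_of_frames(clip_frames: np.ndarray, codebook: KMeans) -> np.ndarray:
    """Encode one clip as a normalized histogram over audio codewords."""
    codes = codebook.predict(clip_frames)                     # nearest codeword per frame
    hist = np.bincount(codes, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy usage with random 'MFCC-like' frames.
rng = np.random.default_rng(0)
training_frames = rng.normal(size=(5000, 20))
codebook = learn_codebook(training_frames, n_codewords=64)
clip = rng.normal(size=(300, 20))
print(bag_of_frames(clip, codebook).shape)                    # (64,)
```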

In music auto-tagging, people develop models to automatically label a music clip with attributes such as instruments, styles, or other acoustic properties. Many of these tags are actually descriptors of local events in the clip, rather than a holistic description of the whole clip. Localizing such tags in time can potentially innovate the way we retrieve and interact with music, but little work has been done to date due to the scarcity of labeled data with time granularity specific enough for the frame level. Most data for training a learning-based model for auto-tagging are labeled at the clip level,...

10.1145/2964284.2964292 article EN Proceedings of the 30th ACM International Conference on Multimedia 2016-09-29

Convolutional neural networks with skip connections have shown good performance in music source separation. In this work, we propose a denoising Auto-encoder with Recurrent skip Connections (ARC). We use 1D convolution along the temporal axis of the time-frequency feature map in all layers of the fully-convolutional network. The 1D convolution makes it possible to apply recurrent layers to the intermediate outputs of the convolution layers. In addition, we also propose an enhancement network and a residual regression method to further improve the separation result. With the recurrent connections,...

10.1109/icmla.2018.00123 article EN 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) 2018-12-01
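A minimal sketch of the idea described in the abstract above: because the 1D convolutions run along the time axis of the time-frequency map, an intermediate output has shape (batch, channels, time), and a recurrent layer can be applied to it directly. Layer sizes are assumed for illustration; this is not the ARC model itself.

```python
import torch
import torch.nn as nn

class ConvWithRecurrentConnection(nn.Module):
    """One encoder stage: temporal 1D convolution whose intermediate output
    is also processed by a GRU and added back (a recurrent skip connection)."""
    def __init__(self, n_bins: int = 513, channels: int = 256):
        super().__init__()
        self.conv = nn.Conv1d(n_bins, channels, kernel_size=3, padding=1)
        self.gru = nn.GRU(channels, channels, batch_first=True)

    def forward(self, spec):                      # spec: (batch, freq_bins, time)
        h = torch.relu(self.conv(spec))           # (batch, channels, time)
        h_seq, _ = self.gru(h.transpose(1, 2))    # GRU over the time dimension
        return h + h_seq.transpose(1, 2)          # recurrent output added back

spec = torch.abs(torch.randn(4, 513, 128))        # magnitude spectrogram stand-in
out = ConvWithRecurrentConnection()(spec)
print(out.shape)                                   # torch.Size([4, 256, 128])
```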

10.1109/icassp49660.2025.10889313 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Being able to predict whether a song can be a hit has important applications in the music industry. Although it is true that the popularity of a song is greatly affected by external factors such as social and commercial influences, to what degree audio features computed from musical signals (which we regard as internal factors) can predict popularity is an interesting research question on its own. Motivated by the recent success of deep learning techniques, we attempt to extend previous work on hit song prediction by jointly learning the audio features and prediction models using deep learning. Specifically,...

10.1109/icassp.2017.7952230 article EN 2017-03-01

In a recent paper, we have presented a generative adversarial network (GAN)-based model for unconditional generation of the mel-spectrograms of singing voices. As the generator is designed to take a variable-length sequence of noise vectors as input, it can generate mel-spectrograms of variable length. However, our previous listening test shows that the quality of the generated audio leaves room for improvement. The present paper extends and expands that work in the following aspects. First, we employ a hierarchical architecture in the generator to induce some structure...

10.21437/interspeech.2020-1137 article EN Interspeech 2020 2020-10-25
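A minimal sketch of the property the abstract highlights: a generator that maps a variable-length sequence of noise vectors to a mel-spectrogram of matching length. The recurrent architecture and dimensions here are illustrative assumptions, not the paper's hierarchical GAN generator.

```python
import torch
import torch.nn as nn

class NoiseToMelGenerator(nn.Module):
    """Maps a (batch, time, noise_dim) noise sequence to (batch, time, n_mels),
    so the output length follows the input length."""
    def __init__(self, noise_dim: int = 20, n_mels: int = 80, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(noise_dim, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, n_mels)

    def forward(self, z):               # z: (batch, time, noise_dim)
        h, _ = self.rnn(z)
        return self.proj(h)             # (batch, time, n_mels)

gen = NoiseToMelGenerator()
short = gen(torch.randn(1, 100, 20))    # 100-frame mel-spectrogram
long = gen(torch.randn(1, 400, 20))     # 400-frame output from the same model
print(short.shape, long.shape)
```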

Recent years have witnessed an increased interest in the application of persistent homology, a topological tool for data analysis, to machine learning problems. Persistent homology is known for its ability to numerically characterize the shapes of spaces induced by features or functions. On the other hand, deep neural networks have been shown effective in various tasks. To our best knowledge, however, existing neural network models seldom exploit such shape information. In this paper, we investigate a way to use persistent homology in the framework of deep neural networks....

10.48550/arxiv.1608.07373 preprint EN other-oa arXiv (Cornell University) 2016-01-01
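As a concrete illustration of the shape information persistent homology provides, the sketch below computes 0-dimensional persistence pairs of the sublevel sets of a 1D signal (for example, a spectral frame or an audio feature curve); the resulting (birth, death) pairs could then be summarized into features for a network. This is a generic textbook construction using a union-find and the elder rule, not the model proposed in the paper.

```python
import numpy as np

def zero_dim_persistence(signal):
    """(birth, death) pairs of connected components in the sublevel-set
    filtration of a 1D signal."""
    n = len(signal)
    order = np.argsort(signal, kind="stable")    # add samples from lowest to highest
    parent = [-1] * n                            # -1: sample not yet in the filtration
    birth = [0.0] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i], birth[i] = i, signal[i]
        for j in (i - 1, i + 1):                 # merge with already-added neighbours
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # elder rule: the younger component dies when the two merge
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                pairs.append((birth[young], signal[i]))
                parent[young] = old
    pairs.append((signal[order[0]], np.inf))     # the oldest component never dies
    return [(b, d) for b, d in pairs if d > b]   # drop zero-persistence pairs

signal = np.array([3.0, 1.0, 2.5, 0.5, 2.0, 1.5, 3.5])
print(zero_dim_persistence(signal))   # local minima paired with the maxima that merge them
```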

This paper derives the surface geometry and machine tool settings for a double threaded variable pitch lead screw with four cylindrical meshing elements. A 4-axis machining center with a rotary milling head attachment is adopted for the manufacturing of the profiles of such screws. Based on the developed equations for the required machine tool settings, the authors develop a computer program with solid modeling to simulate such a mechanism before and after machining. The result of this work is necessary for the task of computer-aided design of transmission mechanisms...

10.1115/1.2919216 article EN Journal of Mechanical Design 1993-09-01

Nowadays, we often leave our personal information on the Internet without noticing it. People could learn things about you from this information. It has been reported that it is possible to infer personal traits from web browsing records or blog articles. As music streaming services become increasingly popular, the listening history of a person can be acquired easily. This paper investigates the possibility for a computer to automatically infer traits such as gender and age from listening history. Specifically, we consider three types...

10.1145/2390848.2390856 article EN 2012-11-02

This paper proposes a context-aware approach that recommends music to a user based on the user's emotional state, as predicted from the article the user writes. We analyze the association between user-generated text and music by using a real-world dataset with user, text, and music tripartite information collected from the social blogging website LiveJournal. The audio representation covers various perceptual dimensions of music listening, including danceability, loudness, mode, and tempo; the text representation consists of bag-of-words and three-dimensional affective states within an...

10.1145/2502081.2502170 article EN 2013-10-21

Generative models for singing voice have been mostly concerned with the task of "singing voice synthesis," i.e., producing waveforms given musical scores and text lyrics. In this work, we explore a novel yet challenging alternative: singing voice generation without pre-assigned scores and lyrics, in both training and inference time. In particular, we outline three such generation schemes and propose a pipeline to tackle these new tasks. Moreover, we implement the pipeline using generative adversarial networks and evaluate it both objectively and subjectively.

10.48550/arxiv.1912.11747 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Music videos are one of the most popular types of content on video streaming services, and instrument playing is among the most common scenes in such videos. In order to understand instrument-playing videos, it is important to know what instruments are played, when they are played, and where the playing actions occur in the scene. While audio-based instrument recognition has been widely studied, the visual aspect of music instrument playing remains largely unaddressed in the literature. One of the main obstacles is the difficulty of collecting annotated data of the action locations for training-based methods. To address this...

10.1109/tmm.2018.2871418 article EN IEEE Transactions on Multimedia 2018-09-20

Can we make a famous rap singer like Eminem sing whatever our favorite song is? Singing style transfer attempts to make this possible, by replacing the vocal of a song from the source singer with that of a target singer. This paper presents a method that learns from unpaired data for singing style transfer using generative adversarial networks.

10.48550/arxiv.1807.02254 preprint EN other-oa arXiv (Cornell University) 2018-01-01

This paper proposes a music recommendation approach based on various similarity information via Factorization Machines (FM). We introduce the idea of similarity, which has been widely studied in the field of information retrieval, and incorporate multiple feature similarities into the FM framework, including content-based and context-based similarities. The similarity information not only captures similar patterns from the referred objects, but also enhances the convergence speed and accuracy of FM. In addition, in order to avoid noise within the large set of features, we...

10.1109/wi-iat.2013.10 article EN 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) 2013-11-01
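For reference, the second-order Factorization Machine model the abstract builds on predicts y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, where the pairwise term can be computed in linear time. A small numpy sketch of that prediction with randomly initialized parameters follows; it illustrates FM itself, not the similarity features proposed in the paper.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine prediction.

    x:  (n_features,) feature vector
    w0: global bias
    w:  (n_features,) linear weights
    V:  (n_features, k) latent factors; <V[i], V[j]> models the i-j interaction
    """
    linear = w0 + w @ x
    # O(k * n) trick:  sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
    vx = V.T @ x                                       # (k,)
    pairwise = 0.5 * np.sum(vx ** 2 - (V ** 2).T @ (x ** 2))
    return linear + pairwise

rng = np.random.default_rng(0)
n_features, k = 1000, 8                                # e.g. one-hot user, item, context
x = np.zeros(n_features); x[[3, 42, 700]] = 1.0        # a sparse interaction vector
w0 = 0.0
w = rng.normal(0, 0.01, n_features)
V = rng.normal(0, 0.01, (n_features, k))
print(fm_predict(x, w0, w, V))
```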

Recent years have witnessed a growing interest in modeling user behaviors in multimedia research, emphasizing the need to consider human factors such as preference, activity, and emotion in system development and evaluation. Following this research line, we present in this paper the LiveJournal two-million post (LJ2M) dataset to foster research on user-centered music information retrieval. The new dataset is characterized by the great diversity of real-life listening contexts in which people interact with music. It contains blog articles from the social...

10.1109/icme.2014.6890172 article EN 2014 IEEE International Conference on Multimedia and Expo (ICME) 2014-07-01

10.1016/s0890-6955(03)00157-3 article EN International Journal of Machine Tools and Manufacture 2003-07-22

10.1016/j.mechmachtheory.2006.01.007 article EN Mechanism and Machine Theory 2006-03-14

To apply neural sequence models such as the Transformers to music generation tasks, one has to represent a piece of music by a sequence of tokens drawn from a finite, pre-defined vocabulary. Such a vocabulary usually involves tokens of various types. For example, to describe a musical note, one needs separate tokens to indicate the note's pitch, duration, velocity (dynamics), and placement (onset time) along the time grid. While the different types of tokens may possess different properties, existing models usually treat them equally, in the same way as modeling the words in natural languages. In this...

10.48550/arxiv.2101.02402 preprint EN cc-by arXiv (Cornell University) 2021-01-01