Yichi Zhang

ORCID: 0000-0003-0465-6341
Research Areas
  • Speech and Audio Processing
  • Music and Audio Processing
  • Speech Recognition and Synthesis
  • Advanced Malware Detection Techniques
  • Network Security and Intrusion Detection
  • Music Technology and Sound Studies
  • Software Engineering Research
  • Hearing Loss and Rehabilitation
  • Acoustic Wave Phenomena Research
  • Advanced Adaptive Filtering Techniques
  • Spam and Phishing Detection
  • Spectroscopy and Chemometric Analyses
  • Software Testing and Debugging Techniques
  • Video Analysis and Summarization
  • Digital and Cyber Forensics
  • Complex Systems and Time Series Analysis
  • Time Series Analysis and Forecasting
  • Advanced Steganography and Watermarking Techniques

Apple (United Kingdom)
2023

Hearing4all
2022-2023

University of Rochester
2015-2020

In this paper, we present a Sequence-to-Sequence Attentional Siamese Neural Network (Seq2Seq-ASNN) that leverages temporal alignment information for end-to-end speaker verification. In prior works of discriminative neural networks, utterance-level evaluation/enrollment representations are usually calculated. Our proposed model, utilizing a sequence-to-sequence (Seq2Seq) attention mechanism, maps the frame-level evaluation representation into the enrollment feature domain and further generates an...

10.1109/icassp.2019.8682676 article EN ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-17

Cochlear implants (CIs) have proven to be successful at restoring the sensation of hearing in people who suffer from profound sensorineural hearing loss. CI users generally achieve good speech understanding in quiet acoustic conditions. However, their ability to understand speech degrades drastically when background interfering noise is present. To address this problem, current systems are delivered with front-end enhancement modules that can aid the listener in noisy environments. However, these only perform well under certain...

10.1109/tbme.2023.3262677 article EN cc-by IEEE Transactions on Biomedical Engineering 2023-03-28

Conventional methods for finding audio in databases typically search text labels, rather than the audio itself. This can be problematic as labels may be missing, irrelevant to the audio content, or not known by users. Query by vocal imitation lets users query using vocal imitations instead. To do so, appropriate feature representations and effective similarity measures of imitations and original sounds must be developed. In this paper, we build upon our preliminary work to propose Siamese style convolutional neural networks to learn a unified...

10.1109/taslp.2018.2868428 article EN publisher-specific-oa IEEE/ACM Transactions on Audio Speech and Language Processing 2018-09-03

Malware has become one of the most serious threats to computer information systems. In this paper, we describe HERO (Hybrid security extension of binary translation), a novel framework that exploits static and dynamic binary translation features to detect a broad spectrum of malware and prevent its execution. By operating directly on binary code without any assumption on the availability of source code, HERO is appropriate for translating low-level code to a high-level proper representation, obtaining the CFG (Control Flow Graph) and other Control...

10.1109/icicisys.2010.5658586 article EN 2010-10-01

The proliferation of new malware in recent years has presented a serious security threat to our society. Research shows that variants of some known ones take up a large amount of malware, so one of the challenges of detection is how to find similarities between malware and its variants. Since API (Application Programming Interface) functions are used extensively to achieve the function of a program, it is difficult for different versions to conceal similarity on the functional flow level, making the use of their API-calling sequences an essential method....

10.1109/icise.2009.494 article EN 2009-01-01

Vocal imitation is widely used in human interactions. In this paper, we propose a novel human-computer interaction system called IMISOUND that listens to a vocal imitation and retrieves similar sounds from a sound library. This system allows users to search sounds even if they do not remember their semantic labels or the sounds do not have these labels (e.g., synthesized sound effects). IMISOUND employs a Stacked Auto-Encoder (SAE) to extract features from both the vocal imitation (query) and the sounds in the library (candidates). The SAE is pre-trained using training imitations of sounds to automatically learn more...

10.1109/icassp.2016.7472081 article EN 2016-03-01

Searching sounds by text labels is often difficult, as text labels cannot always provide sufficient information for the sound content. Previously we proposed an unsupervised system called IMISOUND for sound search by vocal imitation. In this paper, we further propose a Convolutional Semi-Siamese Network (CSN) called IMINET. IMINET uses two towers of Convolutional Neural Networks (CNN) to extract features from vocal imitations and sound recordings, respectively. It then adopts a fully connected network to predict the similarity between vocal imitations and sound recordings. We propose three...

10.1109/waspaa.2017.8170044 article EN 2017-10-01

Vocal imitation is widely used in human communication. In this paper, we propose an approach to automatically recognize the concept of a vocal imitation, and then retrieve sounds of this concept. Because different acoustic aspects (e.g., pitch, loudness, timbre) are emphasized in imitating different sounds, a key challenge in vocal imitation recognition is to extract appropriate features. Hand-crafted features may not work well for a large variety of imitations. Instead, we use a stacked auto-encoder to learn features from a set of vocal imitations in an unsupervised way....

10.1109/mlsp.2015.7324316 article EN 2015-09-01

Designing systems that allow users to search sounds through vocal imitation augments the current text-based search engines and advances human-computer interaction. Previously we proposed a Siamese style convolutional network called IMINET for sound search by vocal imitation, which jointly addresses feature extraction by a Convolutional Neural Network (CNN) and similarity calculation by a Fully Connected Network (FCN), and is currently the state of the art. However, how such an architecture works is still a mystery. In this paper, we try to answer this question....

10.1109/icassp.2018.8461729 article EN 2018-04-01

10.17743/jaes.2016.0013 article EN Journal of the Audio Engineering Society 2016-08-11

The current commercial anti-virus software detects a virus only after the virus has appeared and caused damage. Motivated by the inference technique for detecting viruses and a recent successful classification method, we explore a system (Radux: Reverse Analysis for Detecting Unsafe eXecutables) for automatically detecting malicious code using a collected dataset of benign and malicious code. Our system rests on fuzzy inference based on behavior hidden in the code. Decompilation is applied to characterize the behavioral and structural properties of binary code, which creates more...

10.1109/isdea.2010.314 article EN 2010-10-01

Speaker diarization (detecting who-spoke-when using relative identity labels) and speaker recognition (detecting speakers using absolute identity labels without timing) are different but related tasks that often need to be completed simultaneously in many scenarios. Traditional methods, however, address them independently. In this paper, we propose a method to jointly diarize and recognize speakers from a collection of conversations. This method benefits from the sparsity and temporal smoothness of speakers within a conversation and large-scale timbre modeling across...

10.1109/icassp.2018.8461666 article EN 2018-04-01

Traditional search through collections of audio recordings compares a text-based query to text metadata associated with each audio file and does not address the actual content of the audio. Text descriptions do not describe all aspects of the audio content in detail. Query by vocal imitation (QBV) is a kind of query by example that lets users imitate the audio they seek, providing an alternative search method to traditional text search. Prior work proposed several neural networks, such as TL-IMINET, for QBV; however, previous systems have not been deployed in an actual search engine nor...

10.1145/3343413.3377963 article EN 2020-03-12

Malware is rapidly becoming a major security issue. In order to avoid being analyzed statically, malware resorts to various obfuscation techniques to hide its malicious behaviors. The technique based on the exception return of a subroutine is one of these techniques. Current disassemblers cannot deal with malware that uses this technique. This paper presents a static disassembly algorithm based on a virtual stack for handling the exception return. The test results show that the algorithm is effective.

10.1109/ifita.2009.137 article EN International Forum on Information Technology and Applications 2009-05-01

This work investigates pretrained audio representations for few-shot Sound Event Detection. We specifically address the task of few-shot detection of novel acoustic sequences, or sound events with semantically meaningful temporal structure, without assuming access to non-target audio. We develop procedures for pretraining suitable representations, and methods which transfer them to our few-shot learning scenario. Our experiments evaluate the general purpose utility of our pretrained representations on AudioSet, and the utility of the proposed few-shot methods via tasks constructed from real-world...

10.1109/icassp49357.2023.10095265 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

This work investigates pretrained audio representations for few-shot Sound Event Detection. We specifically address the task of few-shot detection of novel acoustic sequences, or sound events with semantically meaningful temporal structure, without assuming access to non-target audio. We develop procedures for pretraining suitable representations, and methods which transfer them to our few-shot learning scenario. Our experiments evaluate the general purpose utility of our pretrained representations on AudioSet, and the utility of the proposed few-shot methods via tasks constructed from real-world...

10.48550/arxiv.2305.02382 preprint EN cc-by arXiv (Cornell University) 2023-01-01

In multivariate time series systems, key insights can be obtained by discovering lead-lag relationships inherent in the data, which refer to the dependence between two time series shifted in time relative to one another, and which can be leveraged for the purposes of control, forecasting or clustering. We develop a clustering-driven methodology for robust detection of lead-lag relationships in lagged multi-factor models. Within our framework, the envisioned pipeline takes as input a set of time series, and creates an enlarged universe of extracted subsequence time series from each input time series, via a sliding window...

10.48550/arxiv.2305.06704 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Due to the underground economy stimulation, malware creators are writing malicious codes at an alarming rate. At the same time, novel resistance techniques are commonly available, leading to a huge number of malware variants. Behavior-based detection is a promising solution to this serious problem. In this paper we propose to fuse program behaviors to identify malware. This approach uses Bayesian training to get the malicious degree of each behavior, and adopts the D-S synthesis rule to detect viruses. Our experimental evaluation shows that our prototype system...

10.1109/iscid.2012.30 article EN 2012-10-01

Searching sounds by text labels is often difficult, as text descriptions cannot describe the audio content in detail. Query by vocal imitation bridges such a gap and provides a novel way to sound search. Several algorithms for sound search by vocal imitation have been proposed and evaluated in a simulation environment; however, they have not been deployed into a real search engine nor evaluated by real users. This pilot work conducts a subjective study to compare these two approaches to sound search, and tries to answer the question of which approach works better for what kinds of sounds. To do so, we...

10.48550/arxiv.1907.08661 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Cochlear implants (CIs) have proven to be successful at restoring the sensation of hearing in people who suffer from profound sensorineural hearing loss. CI users generally achieve good speech understanding in quiet acoustic conditions. However, their ability to understand speech degrades drastically when background interfering noise is present. To address this problem, current systems are delivered with front-end enhancement modules that can aid the listener in noisy environments. However, these only perform well...

10.1101/2022.11.11.516123 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2022-11-13