Chao Zhang

ORCID: 0000-0002-7730-5131
Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Music and Audio Processing
  • Natural Language Processing Techniques
  • Speech and dialogue systems
  • Topic Modeling
  • Sentiment Analysis and Opinion Mining
  • Text and Document Classification Technologies
  • Security and Verification in Computing
  • Emotion and Mood Recognition
  • Catalytic C–H Functionalization Methods
  • Advanced Malware Detection Techniques
  • Blockchain Technology Applications and Security
  • Analytical chemistry methods development
  • Neural Networks and Applications
  • Anomaly Detection Techniques and Applications
  • Misinformation and Its Impacts
  • Opinion Dynamics and Social Influence
  • Blind Source Separation Techniques
  • Analytical Chemistry and Sensors
  • Mass Spectrometry Techniques and Applications
  • Oxidative Organic Chemistry Reactions
  • Vanadium and Halogenation Chemistry
  • Face and Expression Recognition
  • Multimodal Machine Learning Applications

Tsinghua University
2013-2024

University of Cambridge
2018-2024

Jingdong (China)
2020-2023

Google (United States)
2023

Bridge University
2022

Georgia Institute of Technology
2020

Donghua University
2017

China Institute of Atomic Energy
2002

Deep learning methods have revolutionized speech recognition, image recognition, and natural language processing since 2010. Each of these tasks involves a single modality in their input signals. However, many applications in the artificial intelligence field involve multiple modalities. Therefore, it is of broad interest to study the more difficult and complex problem of modeling across multiple modalities. In this paper, we provide a technical review of available models for multimodal intelligence. The main focus is the combination of vision and natural language modalities,...

10.1109/jstsp.2020.2987728 article EN IEEE Journal of Selected Topics in Signal Processing 2020-03-01

This paper describes the study of the atomization of nanoparticles by inductively coupled plasma mass spectrometry (ICPMS) and develops a novel nonisotopic immunoassay by coupling a sandwich-type immunoreaction to ICPMS. The goat-anti-rabbit immunoglobulin G (IgG) labeled with colloidal gold served as an analyte in ICPMS for the indirect measurement of rabbit-anti-human IgG. Matrix effect studies showed that the signal was not sensitive to the organic matrix. A relatively good correlation (r² = 0.9528) between the proposed...

10.1021/ac0103468 article EN Analytical Chemistry 2001-11-30

Binary code similarity detection (BCSD) has many applications, including patch analysis, plagiarism detection, malware detection, and vulnerability search. Existing solutions usually perform comparisons over specific syntactic features extracted from binary code, based on expert knowledge. They have either high performance overheads or low accuracy. Moreover, few are suitable for detecting similarities between cross-version binaries, which may not only diverge in structure but also differ slightly in semantics.

10.1145/3238147.3238199 article EN 2018-08-20

A room-temperature Pd(II)-catalyzed regioselective chlorination reaction has been developed for a facile one-pot synthesis of a broad range of 2-chlorophenols. The reaction demonstrates excellent regioselectivity and reactivity for C–H chlorination. This represents one of the rare examples of mild C–H functionalization at ambient temperature.

10.1039/c3cc47431c article EN Chemical Communications 2013-11-26

An efficient Co(II)-catalyzed intermolecular annulation of N-(quinolin-8-yl)benzamide with allenes for the synthesis of novel isoquinolin-1(2H)-one scaffolds has been developed.

10.1039/c6qo00567e article EN Organic Chemistry Frontiers 2016-11-08

In this paper, a novel two-branch neural network model structure is proposed for multimodal emotion recognition, which consists of a time synchronous branch (TSB) and a time asynchronous branch (TAB). To capture correlations between each word and its acoustic realisation, the TSB combines speech and text modalities at each input window frame and then uses pooling across time to form a single embedding vector. The TAB, by contrast, provides cross-utterance information by integrating sentence embeddings from a number of context utterances...

10.1109/icassp39728.2021.9414880 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Language models (LMs) pre-trained on massive amounts of text, in particular bidirectional encoder representations from Transformers (BERT), generative pre-training (GPT), and GPT-2, have become a key technology for many natural language processing tasks. In this paper, we present results using fine-tuned GPT, GPT-2, and their combination for automatic speech recognition (ASR). Unlike the unidirectional LMs GPT and GPT-2, BERT is bidirectional, so the direct product of its output probabilities is no longer a valid prior probability. A conversion...

10.1109/asru51503.2021.9688232 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021-12-13

The impressive capability and versatility of large language models (LLMs) have aroused increasing attention in automatic speech recognition (ASR), with several pioneering studies attempting to build integrated ASR models by connecting a speech encoder with an LLM. This paper presents a comparative study of three commonly used structures as connectors, including fully connected layers, multi-head cross-attention, and Q-Former. Speech encoders from the Whisper model series as well as Vicuna LLMs of different sizes were studied....

10.1109/icassp48485.2024.10445874 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

A novel approach to access ortho-iodinated phenols using cyclic hypervalent iodine reagents through palladium(II)-catalyzed C–H activation has been developed via weak coordination. The reaction showed excellent regioselectivity, reactivity and good functional group tolerance. A unique mechanism was proposed.

10.1039/c5cc02533h article EN Chemical Communications 2015-01-01

In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem. Compared to traditional unsupervised clustering algorithms, DNC learns clustering patterns from training data without requiring an explicit definition of a similarity measure. An implementation of DNC based on the Transformer architecture is shown to be effective on a speaker diarisation task using the challenging AMI dataset. Since AMI contains only 147 complete...

10.1109/slt48900.2021.9383617 article EN 2021 IEEE Spoken Language Technology Workshop (SLT) 2021-01-19

Control-flow integrity (CFI) is a promising technique to mitigate control-flow hijacking attacks. In the past decade, dozens of CFI mechanisms have been proposed by researchers. Despite the claims made by the researchers themselves, the security promises of these mechanisms have not been carefully evaluated, and thus are questionable.

10.1145/3372297.3417867 article EN Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security 2020-10-30

A new highly fluorescent β-diketone−europium chelate was synthesized and employed as a tracer to develop a time-resolved fluoroimmunoassay (TRFIA) for the detection of serum total thyroxine (T4). The tetradentate β-diketone chelator, 1,10-bis(thiophene-2'-yl)-4,4,5,5,6,6,7,7-octafluorodecane-1,3,8,10-tetraone (BTOT), is structurally composed of two units of thenoyltrifluoroacetone (TTA) derivatives but expressed fluorescence that was greatly enhanced, compared to the original TTA molecules, in the presence of excess...

10.1021/ac025727f article EN Analytical Chemistry 2002-10-22

Memory safety is a key security property that stops memory corruption vulnerabilities. Different types of enforcement solutions have been proposed and adopted by sanitizers or mitigations to catch and stop such bugs, at the development or deployment phase. However, existing solutions either provide partial memory safety or incur overwhelmingly high performance overheads.

10.1145/3548606.3560598 article EN Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security 2022-11-07

In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem. Compared to traditional unsupervised clustering algorithms, DNC learns clustering patterns from training data without requiring an explicit definition of a similarity measure. An implementation of DNC based on the Transformer architecture is shown to be effective on a speaker diarisation task using the challenging AMI dataset. Since AMI contains only 147 complete...

10.48550/arxiv.1910.09703 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Aspect-based sentiment analysis is a substantial step towards text understanding which benefits numerous applications. Since most existing algorithms require a large amount of labeled data or external language resources, applying them on a new domain is usually expensive and time-consuming. We aim to build an aspect-based sentiment analysis model from an unlabeled corpus with minimal guidance from users, i.e., only a small set of seed words for each aspect class and each sentiment class. We employ an autoencoder structure with attention to learn two dictionary...

10.1145/3397271.3401179 article EN 2020-07-25

Contextual knowledge is important for real-world automatic speech recognition (ASR) applications. In this paper, a novel tree-constrained pointer generator (TCPGen) component is proposed that incorporates such knowledge as a list of biasing words into both attention-based encoder-decoder and transducer end-to-end ASR models in a neural-symbolic way. TCPGen structures the biasing words into an efficient prefix tree to serve as its symbolic input and creates a neural shortcut between the tree and the final ASR output distribution to facilitate recognising...

10.1109/asru51503.2021.9687915 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021-12-13

Hearing is arguably an essential ability of artificial intelligence (AI) agents in the physical world, which refers to the perception and understanding of general auditory information consisting of at least three types of sounds: speech, audio events, and music. In this paper, we propose SALMONN, a speech audio language music open neural network, built by integrating a pre-trained text-based large language model (LLM) with speech and audio encoders into a single multimodal model. SALMONN enables the LLM to directly process and understand general audio inputs and achieve...

10.48550/arxiv.2310.13289 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Incorporating biasing words obtained through contextual knowledge is paramount in automatic speech recognition (ASR) applications. This paper proposes an innovative method for achieving end-to-end contextual ASR using graph neural network (GNN) encodings based on the tree-constrained pointer generator method. GNN node encodings facilitate lookahead for future word pieces in the process of ASR decoding at each tree node by incorporating information about all word pieces on the tree branches rooted from it. This results in a more precise prediction of the generation...

10.1109/taslp.2024.3389645 article EN cc-by IEEE/ACM Transactions on Audio Speech and Language Processing 2024-01-01

We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models based on Transformer-related architecture with auto-regressive decoding (e.g., Whisper, Canary). Specifically, we propose a novel indicator that empirically integrates step-wise information during decoding to assess the token-level...

10.48550/arxiv.2405.14161 preprint EN arXiv (Cornell University) 2024-05-23

To further expand the range of analytes that can be detected by using ICP-MS coupled with bioanalytical methods, we have employed a new separation system based on highly active surface streptavidin and a biotinylated monoclonal antibody (McAb) in a competitive immunoassay followed by ICP-MS detection. Specifically, we demonstrated its application for the determination of total thyroxine (T4) in human serum using Eu3+ as a label. In this method, streptavidin immobilized to pre-coated bovine serum albumin (BSA)-biotin microwells showed...

10.1039/b205623b article EN Journal of Analytical Atomic Spectrometry 2002-08-27

It is important to test convolutional neural networks (CNNs) to identify defects (e.g. error-inducing inputs) before deploying them in security-sensitive scenarios. Although existing white-box testing methods can effectively test CNN models with high neuron coverage, they are not applicable to privacy-sensitive scenarios where full knowledge of the target CNN models is lacking. In this work, we propose a novel Black-box Efficient Testing (BET) method for CNN models. The core insight of BET is that CNNs are generally prone to be affected...

10.1145/3533767.3534386 article EN 2022-07-15

The traditional hybrid deep neural network (DNN)–hidden Markov model (HMM) system and the attention-based encoder–decoder (AED) model are both commonly used automatic speech recognition (ASR) approaches with distinct characteristics and advantages. While hybrid systems are per-frame-based and highly modularised to leverage external phonetic and linguistic knowledge, AED models operate on a per-label basis and jointly learn the acoustic and language information using a single model in an end-to-end trainable fashion. In this paper, we...

10.1016/j.specom.2022.12.002 article EN cc-by Speech Communication 2022-12-25