- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Natural Language Processing Techniques
- Speech and dialogue systems
- Topic Modeling
- Sentiment Analysis and Opinion Mining
- Text and Document Classification Technologies
- Security and Verification in Computing
- Emotion and Mood Recognition
- Catalytic C–H Functionalization Methods
- Advanced Malware Detection Techniques
- Blockchain Technology Applications and Security
- Analytical chemistry methods development
- Neural Networks and Applications
- Anomaly Detection Techniques and Applications
- Misinformation and Its Impacts
- Opinion Dynamics and Social Influence
- Blind Source Separation Techniques
- Analytical Chemistry and Sensors
- Mass Spectrometry Techniques and Applications
- Oxidative Organic Chemistry Reactions
- Vanadium and Halogenation Chemistry
- Face and Expression Recognition
- Multimodal Machine Learning Applications
Tsinghua University
2013-2024
University of Cambridge
2018-2024
Jingdong (China)
2020-2023
Google (United States)
2023
Bridge University
2022
Georgia Institute of Technology
2020
Donghua University
2017
China Institute of Atomic Energy
2002
Deep learning methods have revolutionized speech recognition, image recognition, and natural language processing since 2010. Each of these tasks involves a single modality in its input signals. However, many applications in the artificial intelligence field involve multiple modalities. Therefore, it is of broad interest to study the more difficult and complex problem of modeling and learning across multiple modalities. In this paper, we provide a technical review of available models and learning methods for multimodal intelligence. The main focus is the combination of vision and natural language modalities,...
This paper describes the study of the atomization of nanoparticles by inductively coupled plasma mass spectrometry (ICPMS) and develops a novel nonisotopic immunoassay by coupling a sandwich-type immunoreaction to ICPMS. The goat-anti-rabbit immunoglobulin G (IgG) labeled with colloidal gold served as an analyte in ICPMS for the indirect measurement of rabbit-anti-human IgG. Matrix effect studies showed that the signal was not sensitive to the organic matrix. A relatively good correlation (r² = 0.9528) between the proposed...
Binary code similarity detection (BCSD) has many applications, including patch analysis, plagiarism detection, malware detection, and vulnerability search. Existing solutions usually perform comparisons over specific syntactic features extracted from binary code, based on expert knowledge. They have either high performance overheads or low detection accuracy. Moreover, few solutions are suitable for detecting similarities between cross-version binaries, which may diverge not only in syntactic structures but also slightly in semantics.
A room-temperature Pd(II)-catalyzed regioselective chlorination reaction has been developed for a facile one-pot synthesis of a broad range of 2-chlorophenols. The reaction demonstrates excellent regioselectivity and reactivity for C–H chlorination. This reaction represents one of the rare examples of mild C–H functionalization at ambient temperature.
An efficient Co(II)-catalyzed intermolecular annulation of N-(quinolin-8-yl)benzamide with allenes for the synthesis of novel isoquinolin-1(2H)-one scaffolds has been developed.
In this paper, a novel two-branch neural network model structure is proposed for multimodal emotion recognition, which consists of a time synchronous branch (TSB) and a time asynchronous branch (TAB). To capture correlations between each word and its acoustic realisation, the TSB combines speech and text modalities at each input window frame and then uses pooling across time to form a single embedding vector. The TAB, by contrast, provides cross-utterance information by integrating sentence text embeddings from a number of context utterances...
Language models (LMs) pre-trained on massive amounts of text, in particular bidirectional encoder representations from Transformers (BERT), generative pre-training (GPT), and GPT-2, have become a key technology for many natural language processing tasks. In this paper, we present results using fine-tuned GPT, GPT-2, and their combination for automatic speech recognition (ASR). Unlike the unidirectional LMs GPT and GPT-2, BERT is bidirectional, so the direct product of its output probabilities is no longer a valid language prior probability. A conversion...
The impressive capability and versatility of large language models (LLMs) have aroused increasing attention in automatic speech recognition (ASR), with several pioneering studies attempting to build integrated ASR models by connecting a speech encoder with an LLM. This paper presents a comparative study of three commonly used structures as connectors, including fully connected layers, multi-head cross-attention, and Q-Former. Speech encoders from the Whisper model series as well as LLMs from the Vicuna model series with different model sizes were studied....
A novel approach to access ortho iodinated phenols using cyclic hypervalent iodine reagents through palladium(II) catalyzed C–H activation has been developed using weak coordination. The reaction showed excellent regioselectivity, reactivity and good functional group tolerance. A unique mechanism was proposed.
In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem. Compared to traditional unsupervised clustering algorithms, DNC learns clustering patterns from training data without requiring an explicit definition of a similarity measure. An implementation of DNC based on the Transformer architecture is shown to be effective on a speaker diarisation task using the challenging AMI dataset. Since AMI contains only 147 complete...
Control-flow integrity (CFI) is a promising technique to mitigate control-flow hijacking attacks. In the past decade, dozens of CFI mechanisms have been proposed by researchers. Despite the claims made by the researchers themselves, the security promises of these mechanisms have not been carefully evaluated, and thus are questionable.
A new highly fluorescent β-diketone−europium chelate was synthesized and employed as a tracer to develop a time-resolved fluoroimmunoassay (TRFIA) for the detection of serum total thyroxine (T4). The tetradentate β-diketone chelator, 1,10-bis(thiophene-2'-yl)-4,4,5,5,6,6,7,7-octafluorodecane-1,3,8,10-tetraone (BTOT), was structurally composed of two units of thenoyltrifluoroacetone (TTA) derivatives but expressed fluorescence that was greatly enhanced, compared to the original TTA molecules, in the presence of excess...
Memory safety is a key security property that stops memory corruption vulnerabilities. Different types of memory safety enforcement solutions have been proposed and adopted by sanitizers or mitigations to catch and stop such bugs, at the development or deployment phase. However, existing solutions either provide partial memory safety or have overwhelmingly high performance overheads.
Aspect-based sentiment analysis is a substantial step towards text understanding which benefits numerous applications. Since most existing algorithms require a large amount of labeled data or external language resources, applying them on a new domain is usually expensive and time-consuming. We aim to build an aspect-based sentiment analysis model from an unlabeled corpus with minimal guidance from users, i.e., only a small set of seed words for each aspect class and each sentiment class. We employ an autoencoder structure with attention to learn two dictionary...
Contextual knowledge is important for real-world automatic speech recognition (ASR) applications. In this paper, a novel tree-constrained pointer generator (TCPGen) component is proposed that incorporates such knowledge as a list of biasing words into both attention-based encoder-decoder and transducer end-to-end ASR models in a neural-symbolic way. TCPGen structures the biasing words into an efficient prefix tree to serve as its symbolic input and creates a neural shortcut between the tree and the final ASR output distribution to facilitate recognising...
Hearing is arguably an essential ability of artificial intelligence (AI) agents in the physical world, which refers to the perception and understanding of general auditory information consisting of at least three types of sounds: speech, audio events, and music. In this paper, we propose SALMONN, a speech audio language music open neural network, built by integrating a pre-trained text-based large language model (LLM) with speech and audio encoders into a single multimodal model. SALMONN enables the LLM to directly process and understand general audio inputs and achieve...
Incorporating biasing words obtained through contextual knowledge is paramount in automatic speech recognition (ASR) applications. This paper proposes an innovative method for achieving end-to-end contextual ASR using graph neural network (GNN) encodings based on the tree-constrained pointer generator method. GNN node encodings facilitate lookahead for future word pieces in the process of ASR decoding at each tree node by incorporating information about all word pieces on the tree branches rooted from it. This results in a more precise prediction of the generation...
We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent foundation models based on Transformer-related architecture with auto-regressive decoding (e.g., Whisper, Canary). Specifically, we propose a novel indicator that empirically integrates step-wise information during decoding to assess the token-level...
To further expand the range of analytes that can be detected by using ICP-MS coupled with bioanalytical methods, we have employed a new separation system based on highly active surface streptavidin and biotinylated monoclonal antibody (McAb) in a competitive immunoassay followed by ICP-MS detection. Specifically, we demonstrated its application for the determination of total thyroxine (T4) in human serum with Eu³⁺ as a label. In this method, streptavidin immobilized to pre-coated bovine serum albumin (BSA)-biotin microwells showed...
It is important to test convolutional neural networks (CNNs) to identify defects (e.g. error-inducing inputs) before deploying them in security-sensitive scenarios. Although existing white-box testing methods can effectively test CNN models with high neuron coverage, they are not applicable to privacy-sensitive scenarios where full knowledge of target CNN models is lacking. In this work, we propose a novel Black-box Efficient Testing (BET) method for CNN models. The core insight of BET is that CNNs are generally prone to be affected...
The traditional hybrid deep neural network (DNN)–hidden Markov model (HMM) system and the attention-based encoder–decoder (AED) model are both commonly used automatic speech recognition (ASR) approaches with distinct characteristics and advantages. While hybrid systems are per-frame-based and highly modularised to leverage external phonetic and linguistic knowledge, AED models operate on a per-label basis and jointly learn the acoustic and language information using a single model in an end-to-end trainable fashion. In this paper, we...