- Topic Modeling
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Generative Adversarial Networks and Image Synthesis
- Speech Recognition and Synthesis
- Bayesian Methods and Mixture Models
- Gaussian Processes and Bayesian Inference
- Markov Chains and Monte Carlo Methods
- Evolutionary Algorithms and Applications
- Machine Learning and Algorithms
- Anomaly Detection Techniques and Applications
- Image and Signal Denoising Methods
- Metaheuristic Optimization Algorithms Research
- Advanced Database Systems and Queries
- Complex Systems and Time Series Analysis
- Model Reduction and Neural Networks
- Advanced Text Analysis Techniques
- Advanced Image Processing Techniques
- Data Mining Algorithms and Applications
- Biomedical Text Mining and Ontologies
- Sparse and Compressive Sensing Techniques
- AI and Big Data Applications
- Stock Market Forecasting Methods
- Internet Traffic Analysis and Secure E-voting
- Random Matrices and Applications
University of International Business and Economics
2021-2023
Alibaba Group (United States)
2018-2023
Alibaba Group (China)
2021-2022
Alibaba Group (Cayman Islands)
2018-2022
North China Institute of Aerospace Engineering
2021
Tongji University
2021
Shenzhen Institutes of Advanced Technology
2020
China Southern Power Grid (China)
2020
Duke University
2015-2019
Harbin University of Science and Technology
2019
The Generative Adversarial Network (GAN) has achieved great success in generating realistic (real-valued) synthetic data. However, convergence issues and difficulties dealing with discrete data hinder the applicability of GAN to text. We propose a framework for generating realistic text via adversarial training. We employ a long short-term memory network as the generator and a convolutional network as the discriminator. Instead of using the standard objective of GAN, we propose matching the high-dimensional latent feature distributions of real and synthetic sentences via a kernelized...
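The truncated sentence points at a kernelized discrepancy between feature distributions. Assuming an MMD-style objective (maximum mean discrepancy with an RBF kernel), a minimal sketch of the matching term; the feature vectors here are toy stand-ins for the discriminator's latent features:

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel between two feature vectors (lists of floats)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def mmd2(real_feats, fake_feats, sigma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between the
    latent-feature distributions of real and synthetic sentences."""
    def mean_k(X, Y):
        return sum(gaussian_kernel(x, y, sigma) for x in X for y in Y) / (len(X) * len(Y))
    return mean_k(real_feats, real_feats) + mean_k(fake_feats, fake_feats) \
        - 2.0 * mean_k(real_feats, fake_feats)

# Identical feature sets give zero discrepancy; shifted ones do not.
same = mmd2([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]])
shifted = mmd2([[0.0, 0.0], [1.0, 1.0]], [[5.0, 5.0], [6.0, 6.0]])
```

Minimizing such a discrepancy pushes the generator's feature distribution toward that of real sentences without relying on the unstable standard GAN objective.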
For many biological applications, exploration of the massive parametric space of a mechanism-based model can impose a prohibitive computational demand. To overcome this limitation, we present a framework to improve computational efficiency by orders of magnitude. The key concept is to train a neural network using a limited number of simulations generated by a mechanistic model. This number is small enough that the simulations can be completed in a short time frame but large enough to enable reliable training. The trained neural network is then used to explore a much larger parametric space. We...
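The workflow described above (a limited number of expensive simulations, a cheap learned surrogate, then large-scale screening) can be sketched end to end. The "mechanistic model" below is a hypothetical stand-in function, and the tiny one-hidden-layer network is only illustrative:

```python
import math, random

random.seed(0)

def mechanistic_model(p):
    """Stand-in for an expensive simulation: a hypothetical response curve."""
    return math.sin(3.0 * p) + 0.5 * p

# Step 1: a *limited* number of expensive simulations.
train_p = [i / 19 for i in range(20)]
train_y = [mechanistic_model(p) for p in train_p]

# Step 2: train a tiny one-hidden-layer tanh network on those simulations.
H = 16
w1 = [random.uniform(-1, 1) for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0

def forward(p):
    h = [math.tanh(w1[j] * p + b1[j]) for j in range(H)]
    return sum(w2[j] * h[j] for j in range(H)) + b2, h

lr = 0.05
for epoch in range(3000):
    for p, y in zip(train_p, train_y):
        pred, h = forward(p)
        err = pred - y                       # gradient of 0.5 * err**2
        for j in range(H):
            gh = err * w2[j] * (1 - h[j] ** 2)
            w2[j] -= lr * err * h[j]
            w1[j] -= lr * gh * p
            b1[j] -= lr * gh
        b2 -= lr * err

# Step 3: the cheap surrogate screens a much larger parameter space.
big_space = [i / 1999 for i in range(2000)]
preds = [forward(p)[0] for p in big_space]
max_err = max(abs(forward(p)[0] - mechanistic_model(p)) for p in train_p)
```

The screening step costs only forward passes, which is where the orders-of-magnitude speedup over running the mechanistic model everywhere would come from.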
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results \cite{lample2018phrase} with only large monolingual corpora in each language. However, the uncertainty of associating target with source sentences makes UNMT theoretically an ill-posed problem. This work investigates the possibility of utilizing images for disambiguation to improve the performance of UNMT. Our assumption is intuitively based on the invariant property of the image, i.e., the description of the same visual content by different...
The performance of machine translation (MT) systems is usually evaluated by the metric BLEU when golden references are provided. However, in the case of model inference or production deployment, golden references are expensive to obtain, as they require human annotation with bilingual expertise. In order to address the issue of quality estimation (QE) without reference, we propose a general framework for automatic evaluation of translation output for the QE task of the Conference on Statistical Machine Translation (WMT). We first build a conditional target language...
Recent advances in sequence modeling have highlighted the strengths of the transformer architecture, especially in achieving state-of-the-art machine translation results. However, depending on the up-stream systems, e.g., speech recognition or word segmentation, the input to a translation system can vary greatly. The goal of this work is to extend the attention mechanism of the transformer to naturally consume a lattice in addition to the traditional sequential input. We first propose a general lattice transformer for speech translation, where the input is the output of automatic speech recognition (ASR), which contains multiple...
We present a deep generative model for learning to predict classes not seen at training time. Unlike most existing methods for this problem, which represent each class as a point (via a semantic embedding), we represent each seen/unseen class using a class-specific latent-space distribution, conditioned on class attributes. We use these latent-space distributions as a prior for a supervised variational autoencoder (VAE), which also facilitates learning highly discriminative feature representations for the inputs. The entire framework is learned end-to-end using only...
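The class-specific prior can be illustrated with the closed-form KL term that such a VAE would use: the encoder's posterior is pulled toward a prior whose parameters come from the class attributes. The linear attribute-to-prior map below is a hypothetical stand-in, not the paper's architecture:

```python
import math

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) for diagonal Gaussians, summed over latent dimensions."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl

def class_prior(attributes, W, b):
    """Hypothetical linear map from class attributes to the mean of a
    class-specific latent prior (unit variance kept for simplicity)."""
    mu = [sum(w * a for w, a in zip(row, attributes)) + bi
          for row, bi in zip(W, b)]
    return mu, [0.0] * len(mu)

# A posterior close to its class prior pays a smaller KL penalty.
mu_p, logvar_p = class_prior([1.0, 0.0], [[0.5, -0.2], [0.1, 0.3]], [0.0, 0.0])
near = kl_diag_gaussians(mu_p, [0.0, 0.0], mu_p, logvar_p)
far = kl_diag_gaussians([5.0, 5.0], [0.0, 0.0], mu_p, logvar_p)
```

Because unseen classes also have attributes, the same map yields a prior for them at test time, which is what enables zero-shot prediction.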
Variational inference (VI) provides fast approximations of a Bayesian posterior in part because it formulates posterior approximation as an optimization problem: to find the closest distribution to the exact posterior over some family of distributions. For practical reasons, the family of distributions in VI is usually constrained so that it does not include the exact posterior, even as a limit point. Thus, no matter how long VI is run, the resulting approximation will not approach the exact posterior. We propose instead to consider a more flexible approximating family consisting of all possible finite...
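The gap described above can be made concrete numerically: a single Gaussian can never drive the KL divergence to a bimodal posterior to zero, while a mixture family can. A small quadrature check under a toy bimodal target (the specific densities are illustrative):

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def target(x):
    """Toy bimodal 'posterior': even mixture of N(-2,1) and N(2,1)."""
    return 0.5 * normal_pdf(x, -2.0, 1.0) + 0.5 * normal_pdf(x, 2.0, 1.0)

def kl_q_p(q, p, lo=-12.0, hi=12.0, steps=4800):
    """KL(q || p) by simple midpoint quadrature on a grid."""
    dx = (hi - lo) / steps
    kl = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        qx = q(x)
        if qx > 1e-300:
            kl += qx * math.log(qx / p(x)) * dx
    return kl

def single(x):
    """Best moment-matched single Gaussian: mean 0, variance 1 + 2**2 = 5."""
    return normal_pdf(x, 0.0, math.sqrt(5.0))

def mixture(x):
    """A two-component member of the mixture family recovers the target."""
    return 0.5 * normal_pdf(x, -2.0, 1.0) + 0.5 * normal_pdf(x, 2.0, 1.0)

kl_single = kl_q_p(single, target)
kl_mixture = kl_q_p(mixture, target)
```

The mixture family's KL is exactly zero here, whereas the single-Gaussian family is bounded away from zero, which is the motivation for enlarging the approximating family.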
Distant supervised relation extraction has been successfully applied to large corpora with thousands of relations. However, the inevitable wrong labeling problem of distant supervision hurts the performance of relation extraction. In this paper, we propose a neural noise converter to alleviate the impact of noisy data, and a conditional optimal selector to make proper predictions. Our noise converter learns a structured transition matrix at the logit level and captures the noise property of the dataset. The conditional optimal selector, on the other hand, helps make the prediction decision for an entity...
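The transition-matrix idea can be sketched as follows: the model's distribution over *true* relation labels is mapped through a learned row-stochastic matrix to a distribution over the *observed* noisy labels. For simplicity the sketch applies the matrix on the probability level, whereas the paper's converter operates at the logit level; the toy matrix is illustrative:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def noisy_distribution(clean_logits, T):
    """Map the model's clean-label distribution through a transition
    matrix T (rows: true label, columns: observed noisy label)."""
    p = softmax(clean_logits)
    K = len(T[0])
    return [sum(p[i] * T[i][j] for i in range(len(p))) for j in range(K)]

# Toy 3-class matrix: true label 0 is mislabeled as 2 thirty percent
# of the time, mimicking a systematic distant-supervision error.
T = [[0.7, 0.0, 0.3],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
q = noisy_distribution([4.0, 0.0, 0.0], T)
```

Training against the noisy labels through `T` lets the underlying classifier stay calibrated on clean labels, because the systematic mislabeling is absorbed by the matrix.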
Many document-level neural machine translation (NMT) systems have explored the utility of context-aware architectures, usually requiring an increasing number of parameters and computational complexity. However, little attention has been paid to the baseline model. In this paper, we research extensively the pros and cons of the standard transformer in document-level translation, and find that the auto-regressive property can simultaneously bring both the advantage of consistency and the disadvantage of error accumulation. Therefore, we propose a surprisingly simple...
Recent advancements in large language models (LLMs) have substantially enhanced their mathematical reasoning abilities. However, these models still struggle with complex problems that require multiple reasoning steps, frequently leading to logical or numerical errors. While numerical mistakes can largely be addressed by integrating a code interpreter, identifying logical errors within intermediate reasoning steps is more challenging. Moreover, manually annotating these steps for training is not only expensive but also demands specialized expertise....
Learning in deep models using Bayesian methods has generated significant attention recently. This is largely because of the feasibility of modern Bayesian methods to yield scalable learning and inference, while maintaining a measure of uncertainty in the model parameters. Stochastic gradient MCMC algorithms (SG-MCMC) are a family of diffusion-based sampling methods for large-scale Bayesian learning. In SG-MCMC, multivariate stochastic gradient thermostats (mSGNHT) augment each parameter of interest with a momentum and a thermostat variable to maintain stationary...
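A minimal one-dimensional stochastic-gradient Nosé-Hoover thermostat sketch, assuming the standard SGNHT update (mSGNHT generalizes this by giving every parameter its own momentum and thermostat). The Gaussian target and the injected gradient noise are toy choices:

```python
import math, random

random.seed(1)

def grad_U(theta):
    """Gradient of the negative log-density of a standard Gaussian target,
    with added noise mimicking a stochastic minibatch gradient."""
    return theta + random.gauss(0.0, 0.5)

h, A = 0.01, 1.0            # step size and diffusion strength
theta, p, xi = 0.0, 0.0, A  # parameter, momentum, thermostat
samples = []
for t in range(60000):
    p += -xi * p * h - grad_U(theta) * h \
         + math.sqrt(2.0 * A * h) * random.gauss(0.0, 1.0)
    theta += p * h
    xi += (p * p - 1.0) * h  # thermostat drives kinetic temperature to 1
    if t > 10000:            # discard burn-in
        samples.append(theta)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The thermostat variable `xi` adaptively absorbs the unknown gradient noise, which is why the sampler still targets the correct stationary distribution.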
The Hierarchical Graph-Coupled Hidden Markov Model (hGCHMM) is a useful tool for tracking and predicting the spread of contagious diseases, such as influenza, by leveraging social contact data collected from individual wearable devices. However, existing inference algorithms depend on the assumption that infection rates are small in probability, typically close to 0. The purpose of this paper is to build a unified learning framework for latent state estimation in the hGCHMM, regardless of the infection rate and transition function. We derive...
The purpose of this study is to leverage modern technology (mobile or web apps) to enrich epidemiology data and infer the dynamics of disease transmission. We develop hierarchical Graph-Coupled Hidden Markov Models (hGCHMMs) to simultaneously track the spread of infection in a small cell phone community and capture person-specific infection parameters by leveraging a link prior that incorporates additional covariates. In this paper we investigate two link functions, the beta-exponential prior and the sigmoid link, both of which allow the development of principled...
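The role of a link function here is to turn person-specific covariates or contact counts into an infection probability. A rough illustration, assuming a plain sigmoid link and an exponential-style contact link; the weights and the exact functional forms are illustrative stand-ins, not the paper's beta-exponential construction:

```python
import math

def sigmoid_link(covariates, weights, bias):
    """Person-specific infection probability from covariates via a
    sigmoid link (weights and bias are illustrative)."""
    z = bias + sum(w * c for w, c in zip(weights, covariates))
    return 1.0 / (1.0 + math.exp(-z))

def exponential_contact_link(contacts, rate):
    """Infection probability after `contacts` infectious contacts under a
    Reed-Frost-style exponential link: 1 - exp(-rate * contacts)."""
    return 1.0 - math.exp(-rate * contacts)

# More risk-raising covariates (or more contacts) raise the probability.
p_low = sigmoid_link([0.0, 1.0], [1.2, -0.4], -2.0)
p_high = sigmoid_link([3.0, 0.0], [1.2, -0.4], -2.0)
```

Both links keep the output in (0, 1), which is what lets them plug into the hidden Markov transition probabilities while remaining interpretable per person.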
We propose a new method that uses deep learning techniques to accelerate the popular alternating direction method of multipliers (ADMM) solution for inverse problems. The ADMM updates consist of a proximity operator, a least squares regression that includes a big matrix inversion, and an explicit solution for updating the dual variables. Typically, inner loops are required to solve the first two sub-minimization problems due to the intractability of the prior and the matrix inversion. To avoid such drawbacks or limitations, we propose an inner-loop free update rule with...
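The three ADMM updates named above can be shown on a scalar lasso-style problem where each step happens to have a closed form (so no inner loops arise); the paper targets the harder setting where the proximity operator and the matrix inversion are intractable. A minimal sketch:

```python
def soft_threshold(v, t):
    """Proximity operator of t * |.| — the z-update for an l1 prior."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def admm_lasso_1d(b, lam, rho=1.0, iters=100):
    """Scaled ADMM for min_x 0.5*(x-b)^2 + lam*|x|, split as x = z."""
    x = z = u = 0.0
    for _ in range(iters):
        x = (b + rho * (z - u)) / (1.0 + rho)  # least-squares update
        z = soft_threshold(x + u, lam / rho)   # proximity-operator update
        u = u + x - z                          # explicit dual update
    return z

# The closed-form solution of this problem is soft_threshold(b, lam),
# so the ADMM iterates should converge to it.
approx = admm_lasso_1d(3.0, 1.0)
exact = soft_threshold(3.0, 1.0)
```

Replacing the first two updates with learned networks, as the abstract proposes, removes the need for inner solvers when those steps lack such closed forms.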
We propose a second-order (Hessian or Hessian-free) optimization method for variational inference inspired by Gaussian backpropagation, and argue that quasi-Newton methods can be developed as well. This is accomplished by generalizing the gradient computation in stochastic backpropagation via the reparametrization trick with lower complexity. As an illustrative example, we apply this approach to the problems of Bayesian logistic regression and variational auto-encoder (VAE). Additionally, we compute bounds on the estimator...
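The reparametrization trick underlying the gradient computation is easy to demonstrate on a toy objective E_{z~N(mu, sigma^2)}[z^2], whose exact gradient in mu is 2*mu (the second derivative, relevant to the Hessian-based view, is the constant 2). A Monte Carlo sketch:

```python
import random

random.seed(0)

def reparam_grad_mu(mu, sigma, n=200000):
    """Monte Carlo gradient of E_{z~N(mu, sigma^2)}[z^2] w.r.t. mu using
    the reparametrization z = mu + sigma * eps, eps ~ N(0, 1), so that
    d(z^2)/d(mu) = 2*z can be averaged over samples."""
    total = 0.0
    for _ in range(n):
        z = mu + sigma * random.gauss(0.0, 1.0)
        total += 2.0 * z
    return total / n

est = reparam_grad_mu(1.5, 0.8)  # analytic gradient is 2 * 1.5 = 3.0
```

Differentiating the reparametrized sample path once more yields the Hessian estimates the abstract refers to; here that second derivative is exact and constant, which makes the toy case a clean sanity check.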
Massage therapy (MT) is a useful complementary and alternative therapy widely used in treating low back pain (LBP), including lumbar disc herniation (LDH). However, few studies have revealed quantitative entropy-based features of electroencephalography (EEG) for MT effectiveness in LDH patients. This study investigated the immediate effects of Chinese massage on four EEG rhythms, using eight entropy measures (approximate entropy (ApEn), sample entropy (SampEn), wavelet entropy (WaveEn), Hilbert-Huang Transform marginal spectrum...
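One of the listed measures, sample entropy (SampEn), can be sketched compactly: it is the negative log of the conditional probability that subsequences matching for m points (within tolerance r) continue to match for m+1 points. The tolerance and series lengths below are illustrative, not the study's EEG settings:

```python
import math
import random

def sample_entropy(series, m=2, r=0.2):
    """Minimal SampEn: lower values indicate a more regular signal."""
    def count_matches(k):
        templates = [series[i:i + k] for i in range(len(series) - k + 1)]
        n = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    n += 1
        return n
    B = count_matches(m)      # template matches of length m
    A = count_matches(m + 1)  # of those, matches that extend to m+1 points
    return -math.log(A / B) if A > 0 and B > 0 else float("inf")

random.seed(2)
regular = [float(i % 2) for i in range(60)]   # perfectly alternating signal
noisy = [random.random() for _ in range(60)]  # pseudo-random signal
se_regular = sample_entropy(regular)
se_noisy = sample_entropy(noisy)
```

A regular signal scores near zero while an irregular one scores higher, which is why such measures can quantify changes in EEG rhythm complexity before and after an intervention.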
Cross-Lingual Summarization (CLS) is a task that extracts important information from a source document and summarizes it into a summary in another language. It is challenging as it requires a system to understand, summarize, and translate at the same time, making it highly related to Monolingual Summarization (MS) and Machine Translation (MT). In practice, the training resources for machine translation are far more abundant than those for cross-lingual and monolingual summarization. Thus, incorporating the machine translation corpus into CLS training would be beneficial for its performance. However, present work only...
A triple of speech translation data comprises speech, transcription, and translation. In the end-to-end paradigm, a text machine translation (MT) model usually plays the role of a teacher model for the speech translation (ST) model via knowledge distillation. Parameter sharing with the MT model is often adopted to construct the ST model architecture; however, the two modalities are independently fed and trained with different losses. This situation does not match ST's properties across the two modalities and also limits the upper bound of the performance. Inspired by works on video Transformer, we propose a simple...
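The teacher-student relation mentioned above is commonly realized as word-level knowledge distillation: the ST student is trained against the MT teacher's softened output distribution. Whether the paper uses exactly this formulation is not stated in the truncated abstract; the temperature and logits below are illustrative:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution."""
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student against the softened teacher
    distribution for one target-vocabulary position."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

# The loss is smallest when the student's logits match the teacher's.
aligned = kd_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
off = kd_loss([-1.0, 0.5, 2.0], [2.0, 0.5, -1.0])
```

Summed over target positions, this distillation term transfers the MT model's translation knowledge into the speech-input student.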