Kai Fan

ORCID: 0000-0002-8256-0807
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Generative Adversarial Networks and Image Synthesis
  • Speech Recognition and Synthesis
  • Bayesian Methods and Mixture Models
  • Gaussian Processes and Bayesian Inference
  • Markov Chains and Monte Carlo Methods
  • Evolutionary Algorithms and Applications
  • Machine Learning and Algorithms
  • Anomaly Detection Techniques and Applications
  • Image and Signal Denoising Methods
  • Metaheuristic Optimization Algorithms Research
  • Advanced Database Systems and Queries
  • Complex Systems and Time Series Analysis
  • Model Reduction and Neural Networks
  • Advanced Text Analysis Techniques
  • Advanced Image Processing Techniques
  • Data Mining Algorithms and Applications
  • Biomedical Text Mining and Ontologies
  • Sparse and Compressive Sensing Techniques
  • AI and Big Data Applications
  • Stock Market Forecasting Methods
  • Internet Traffic Analysis and Secure E-voting
  • Random Matrices and Applications

University of International Business and Economics
2021-2023

Alibaba Group (United States)
2018-2023

Alibaba Group (China)
2021-2022

Alibaba Group (Cayman Islands)
2018-2022

North China Institute of Aerospace Engineering
2021

Tongji University
2021

Shenzhen Institutes of Advanced Technology
2020

China Southern Power Grid (China)
2020

Duke University
2015-2019

Harbin University of Science and Technology
2019

The Generative Adversarial Network (GAN) has achieved great success in generating realistic (real-valued) synthetic data. However, convergence issues and difficulties dealing with discrete data hinder the applicability of GAN to text. We propose a framework for generating realistic text via adversarial training. We employ a long short-term memory network as the generator and a convolutional network as the discriminator. Instead of using the standard objective of GAN, we propose matching the high-dimensional latent feature distributions of real and synthetic sentences via a kernelized...

10.48550/arxiv.1706.03850 preprint EN other-oa arXiv (Cornell University) 2017-01-01
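The feature-matching objective sketched in the abstract above can be illustrated with a kernel discrepancy between discriminator features of real and generated sentences. The following is a minimal numpy sketch of a squared MMD with an RBF kernel; the feature matrices, the kernel choice, and the bandwidth are my own illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel matrix between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(real_feats, fake_feats, sigma=1.0):
    # Squared maximum mean discrepancy between two feature samples;
    # a generator trained under this objective drives it toward zero.
    kxx = rbf_kernel(real_feats, real_feats, sigma).mean()
    kyy = rbf_kernel(fake_feats, fake_feats, sigma).mean()
    kxy = rbf_kernel(real_feats, fake_feats, sigma).mean()
    return kxx + kyy - 2 * kxy

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(64, 16))  # stand-in for CNN features of real sentences
fake = rng.normal(0.5, 1.0, size=(64, 16))  # stand-in for features of generated sentences
print(mmd2(real, fake))  # positive when the two feature distributions differ
```

In the paper's setting the features would come from the convolutional discriminator rather than being sampled directly.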

Abstract For many biological applications, exploration of the massive parametric space of a mechanism-based model can impose a prohibitive computational demand. To overcome this limitation, we present a framework to improve computational efficiency by orders of magnitude. The key concept is to train a neural network using a limited number of simulations generated by a mechanistic model. This number is small enough that the simulations can be completed in a short time frame but large enough to enable reliable training. The trained neural network is then used to explore a much larger parametric space. We...

10.1038/s41467-019-12342-y article EN cc-by Nature Communications 2019-09-25
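The surrogate idea above can be sketched in a few lines: simulate a small training set with the expensive model, fit a cheap emulator, then screen a far larger parameter space with the emulator alone. In this toy sketch the "mechanistic model" is a hypothetical one-parameter kinetics function, and a polynomial least-squares fit stands in for the neural network the paper uses.

```python
import numpy as np

rng = np.random.default_rng(1)

def mechanistic_model(k):
    # Hypothetical expensive simulation: a toy saturating-kinetics response.
    return k / (1.0 + k) ** 2

# Step 1: a small, affordable batch of real simulations.
k_train = rng.uniform(0.1, 10.0, size=200)
y_train = mechanistic_model(k_train)

# Step 2: fit a cheap surrogate (polynomial least squares in log-parameter,
# standing in for the neural network used in the paper).
X = np.vander(np.log(k_train), N=8)
coef, *_ = np.linalg.lstsq(X, y_train, rcond=None)

# Step 3: screen a much larger parameter space with the surrogate only.
k_big = rng.uniform(0.1, 10.0, size=100_000)
y_pred = np.vander(np.log(k_big), N=8) @ coef
```

The economics are the point: 200 expensive simulations buy 100,000 cheap surrogate evaluations.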

Unsupervised neural machine translation (UNMT) has recently achieved remarkable results \cite{lample2018phrase} with only large monolingual corpora in each language. However, the uncertainty of associating target with source sentences makes UNMT theoretically an ill-posed problem. This work investigates the possibility of utilizing images for disambiguation to improve the performance of UNMT. Our assumption is intuitively based on the invariant property of the image, i.e., the description of the same visual content by different...

10.1109/cvpr.2019.01073 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

The performance of machine translation (MT) systems is usually evaluated by the metric BLEU when golden references are provided. However, in the case of model inference or production deployment, golden references are expensive to obtain, requiring human annotation with bilingual expertise. In order to address the issue of quality estimation (QE) without reference, we propose a general framework for automatic evaluation of MT output for the QE task of the Conference on Statistical Machine Translation (WMT). We first build a conditional target language...

10.1609/aaai.v33i01.33016367 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

Recent advances in sequence modeling have highlighted the strengths of the transformer architecture, especially in achieving state-of-the-art machine translation results. However, depending on up-stream systems, e.g., speech recognition or word segmentation, the input to the translation system can vary greatly. The goal of this work is to extend the attention mechanism to naturally consume a lattice in addition to the traditional sequential input. We first propose a general framework for speech translation, where the input is the output of automatic speech recognition (ASR), which contains multiple...

10.18653/v1/p19-1649 preprint EN cc-by 2019-01-01

We present a deep generative model for learning to predict classes not seen at training time. Unlike most existing methods for this problem, which represent each class as a point (via a semantic embedding), we represent each seen/unseen class using a class-specific latent-space distribution, conditioned on class attributes. We use these latent-space distributions as a prior for a supervised variational autoencoder (VAE), which also facilitates learning highly discriminative feature representations of the inputs. The entire framework is learned end-to-end using only...

10.48550/arxiv.1711.05820 preprint EN other-oa arXiv (Cornell University) 2017-01-01
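One way to write the class-conditional latent prior described above, in notation of my own choosing rather than the paper's:

```latex
% Class-specific latent-space prior conditioned on the attribute vector a_c
% of class c; \mu_\psi and \sigma^2_\psi are learned networks (my notation).
p(\mathbf{z} \mid c) \;=\;
  \mathcal{N}\!\left(\mathbf{z};\; \mu_\psi(\mathbf{a}_c),\;
  \mathrm{diag}\!\left(\sigma^2_\psi(\mathbf{a}_c)\right)\right)
```

Because the prior is a function of attributes rather than a lookup over seen classes, it can be evaluated for unseen classes at test time.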

Variational inference (VI) provides fast approximations of a Bayesian posterior in part because it formulates posterior approximation as an optimization problem: to find the closest distribution to the exact posterior over some family of distributions. For practical reasons, the family of distributions in VI is usually constrained so that it does not include the exact posterior, even as a limit point. Thus, no matter how long VI is run, the resulting approximation will not approach the exact posterior. We propose instead to consider a more flexible approximating family consisting of all possible finite...

10.48550/arxiv.1611.05559 preprint EN other-oa arXiv (Cornell University) 2016-01-01
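The "all possible finite..." family the abstract is describing appears to be finite mixtures over a simpler base family; written out (notation mine):

```latex
% Approximating family of all finite mixtures over a base family Q_0.
\mathcal{Q} \;=\; \Big\{\, q \;:\; q(\theta) = \sum_{i=1}^{T} w_i\, q_i(\theta),\;\;
  T \in \mathbb{N},\;\; w_i \ge 0,\;\; \sum_{i=1}^{T} w_i = 1,\;\; q_i \in \mathcal{Q}_0 \,\Big\}
```

Since finite mixtures are dense in a much larger class of distributions, a boosting-style procedure that greedily adds one component at a time can, in principle, approach the exact posterior rather than stopping at the best member of a fixed family.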

Distant supervised relation extraction has been successfully applied to large corpora with thousands of relations. However, the inevitable wrong labeling problem introduced by distant supervision will hurt the performance of relation extraction. In this paper, we propose a method with a neural noise converter to alleviate the impact of noisy data, and a conditional optimal selector to make proper predictions. Our noise converter learns a structured transition matrix on the logit level and captures the property of the distant supervised dataset. The conditional optimal selector, on the other hand, helps make the prediction decision on an entity...

10.1609/aaai.v33i01.33017273 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17
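The transition-matrix idea above can be sketched as follows: the model's clean-label distribution is mapped through a row-stochastic matrix to a distribution over the noisy labels that distant supervision actually provides. The matrix values and the four-label setup here are purely illustrative; in the paper the matrix is learned, and the conversion operates at the logit level.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical predicted distribution over 4 relation labels for one sentence.
logits = np.array([2.0, 0.5, -1.0, 0.1])
p_true = softmax(logits)

# Row-stochastic noise model: T[i, j] = P(noisy label j | true label i).
# Learned in the paper; fixed here for illustration.
T = np.array([
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.80, 0.05, 0.05],
    [0.05, 0.05, 0.85, 0.05],
    [0.02, 0.08, 0.10, 0.80],
])

# Distribution over the *observed* (possibly noisy) labels used for training;
# the training loss is computed against p_noisy, while p_true is kept for inference.
p_noisy = p_true @ T
```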

Many document-level neural machine translation (NMT) systems have explored the utility of context-aware architecture, usually requiring an increasing number of parameters and computational complexity. However, little attention is paid to the baseline model. In this paper, we research extensively the pros and cons of the standard transformer in document-level translation, and find that the auto-regressive property can simultaneously bring both the advantage of consistency and the disadvantage of error accumulation. Therefore, we propose a surprisingly simple...

10.18653/v1/2020.emnlp-main.81 article EN cc-by 2020-01-01

Recent advancements in large language models (LLMs) have substantially enhanced their mathematical reasoning abilities. However, these models still struggle with complex problems that require multiple reasoning steps, frequently leading to logical or numerical errors. While calculation mistakes can largely be addressed by integrating a code interpreter, identifying logical errors within intermediate reasoning steps is more challenging. Moreover, manually annotating these steps for training is not only expensive but also demands specialized expertise....

10.48550/arxiv.2405.03553 preprint EN arXiv (Cornell University) 2024-05-06

Learning in deep models using Bayesian methods has generated significant attention recently. This is largely because of the feasibility of modern Bayesian methods to yield scalable learning and inference, while maintaining a measure of uncertainty in the model parameters. Stochastic gradient MCMC algorithms (SG-MCMC) are a family of diffusion-based sampling methods for large-scale Bayesian learning. In SG-MCMC, multivariate stochastic gradient Nosé-Hoover thermostats (mSGNHT) augment each parameter of interest with a momentum and a thermostat variable to maintain stationary...

10.1609/aaai.v30i1.10199 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2016-02-21
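For reference, the per-parameter thermostat updates in standard mSGNHT take roughly the following form (following Ding et al.'s SGNHT with elementwise thermostats; notation mine, step size $h$, stochastic gradient $\nabla\tilde{U}$, diffusion constant $D$):

```latex
\theta_{t+1} = \theta_t + p_t\, h
\qquad
p_{t+1} = p_t - \xi_t \odot p_t\, h - \nabla \tilde{U}(\theta_t)\, h
          + \sqrt{2 D h}\, \zeta_t, \quad \zeta_t \sim \mathcal{N}(0, I)
\qquad
\xi_{t+1} = \xi_t + \left(p_t \odot p_t - 1\right) h
```

The thermostat $\xi$ adaptively absorbs the unknown stochastic-gradient noise: whenever the kinetic energy per parameter drifts from its target value of 1, the friction adjusts to restore it.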

The Hierarchical Graph-Coupled Hidden Markov Model (hGCHMM) is a useful tool for tracking and predicting the spread of contagious diseases, such as influenza, by leveraging social contact data collected from individual wearable devices. However, existing inference algorithms depend on the assumption that infection rates are small in probability, typically close to 0. The purpose of this paper is to build a unified learning framework for latent state estimation in the hGCHMM, regardless of the infection rate and transition function. We derive...

10.1609/aaai.v30i1.9894 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2016-03-05

The purpose of this study is to leverage modern technology (mobile or web apps) to enrich epidemiology data and infer the transmission of disease. We develop hierarchical Graph-Coupled Hidden Markov Models (hGCHMMs) to simultaneously track the spread of infection in a small cell phone community and capture person-specific infection parameters by leveraging a link prior that incorporates additional covariates. In this paper we investigate two link functions, the beta-exponential link and the sigmoid link, both of which allow the development of a principled...

10.1145/2783258.2783326 article EN 2015-08-07

We propose a new method that uses deep learning techniques to accelerate the popular alternating direction method of multipliers (ADMM) solution for inverse problems. The ADMM updates consist of a proximity operator, a least squares regression that includes a big matrix inversion, and an explicit solution for updating the dual variables. Typically, inner loops are required to solve the first two sub-minimization problems, due to the intractability of the prior and the matrix inversion. To avoid such drawbacks or limitations, we propose an inner-loop free update rule with...

10.48550/arxiv.1709.01841 preprint EN other-oa arXiv (Cornell University) 2017-01-01
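The three ADMM steps the abstract refers to, written for a generic regularized inverse problem $\min_x \tfrac{1}{2}\lVert y - Ax \rVert^2 + \lambda R(x)$ with splitting $x = z$ (my notation, not the paper's):

```latex
x^{k+1} = \left(A^{\top}A + \rho I\right)^{-1}
          \left(A^{\top} y + \rho\,(z^{k} - u^{k})\right)
          \quad \text{(least squares; big matrix inversion)}
\qquad
z^{k+1} = \operatorname{prox}_{(\lambda/\rho) R}\!\left(x^{k+1} + u^{k}\right)
          \quad \text{(proximity operator)}
\qquad
u^{k+1} = u^{k} + x^{k+1} - z^{k+1}
          \quad \text{(explicit dual update)}
```

The first two steps are the ones that normally require inner iterative solvers; replacing them with learned, inner-loop-free updates is the acceleration being proposed.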

We propose a second-order (Hessian or Hessian-free) based optimization method for variational inference inspired by Gaussian backpropagation, and argue that quasi-Newton optimization can be developed as well. This is accomplished by generalizing the gradient computation in stochastic backpropagation via the reparametrization trick with lower complexity. As an illustrative example, we apply this approach to the problems of Bayesian logistic regression and variational auto-encoder (VAE). Additionally, we compute bounds on the estimator...

10.48550/arxiv.1509.02866 preprint EN other-oa arXiv (Cornell University) 2015-01-01
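The reparametrization trick the abstract builds on can be shown in a few lines of numpy: writing $z = \mu + \sigma\varepsilon$ with $\varepsilon \sim \mathcal{N}(0,1)$ turns gradients of $\mathbb{E}_q[f(z)]$ into ordinary Monte Carlo averages of pathwise derivatives. The integrand $f(z) = z^2$ is my own toy choice, picked because $\mathbb{E}[f] = \mu^2 + \sigma^2$ gives exact gradients to compare against.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(z):   # illustrative integrand; E[f] = mu^2 + sigma^2 for z ~ N(mu, sigma^2)
    return z ** 2

def df(z):  # its derivative, used for the pathwise gradient
    return 2 * z

mu, sigma = 1.0, 0.5
eps = rng.standard_normal(200_000)
z = mu + sigma * eps  # reparametrized sample: z ~ N(mu, sigma^2)

# Pathwise Monte Carlo gradient estimates:
grad_mu = df(z).mean()             # estimates d/dmu    E[f] = 2*mu    = 2.0
grad_sigma = (df(z) * eps).mean()  # estimates d/dsigma E[f] = 2*sigma = 1.0
```

The paper's contribution extends this first-order machinery to second-order (Hessian-based) information; the sketch above only shows the underlying gradient trick.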

Massage therapy (MT) is a useful complementary and alternative therapy widely used in treating low back pain (LBP), including lumbar disc herniation (LDH). However, few studies have revealed quantitative entropy-based features of electroencephalography (EEG) for MT effectiveness in LDH patients. This study investigated the immediate effects of Chinese massage on four EEG rhythms, using eight entropy measures (approximate entropy (ApEn), sample entropy (SampEn), wavelet entropy (WaveEn), Hilbert-Huang Transform marginal spectrum...

10.1109/access.2020.2964050 article EN cc-by IEEE Access 2020-01-01
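Of the entropy measures listed above, sample entropy (SampEn) is easy to state concretely: count template matches of length m and m+1 under a Chebyshev tolerance, then take the negative log of their ratio. This is a generic textbook implementation (with the common r = 0.2·std convention), not the paper's exact feature pipeline.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    # Sample entropy SampEn(m, r) of a 1-D series; r is a fraction of
    # the series' standard deviation (a common convention).
    x = np.asarray(x, dtype=float)
    tol = r * x.std()

    def count_matches(k):
        # All length-k templates, compared with the Chebyshev distance.
        templates = np.array([x[i:i + k] for i in range(len(x) - k + 1)])
        d = np.abs(templates[:, None, :] - templates[None, :, :]).max(-1)
        n = len(templates)
        # Ordered pairs (i != j) within tolerance, excluding self-matches.
        return (d <= tol).sum() - n

    a = count_matches(m + 1)
    b = count_matches(m)
    return -np.log(a / b)

rng = np.random.default_rng(3)
regular = np.sin(np.linspace(0, 20 * np.pi, 500))  # highly regular signal
noise = rng.standard_normal(500)                   # irregular signal
print(sample_entropy(regular), sample_entropy(noise))
```

A regular signal scores low SampEn and white noise scores high, which is why such features can separate physiological states in EEG.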

Cross-Lingual Summarization (CLS) is a task that extracts important information from a source document and summarizes it into a summary in another language. It is challenging because it requires a system to understand, summarize, and translate at the same time, making it highly related to Monolingual Summarization (MS) and Machine Translation (MT). In practice, the training resources for machine translation are far richer than those for cross-lingual and monolingual summarization. Thus, incorporating a machine translation corpus into CLS training would be beneficial for its performance. However, present work only...

10.1145/3477495.3532071 article EN Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2022-07-06

A triple of speech translation data comprises speech, transcription, and translation. In the end-to-end paradigm, text machine translation (MT) usually plays the role of a teacher model for speech translation (ST) via knowledge distillation. Parameter sharing with the teacher is often adopted to construct the ST model architecture; however, the two modalities are independently fed and trained with different losses. This situation does not match ST’s properties across the two modalities and also limits the upper bound of the performance. Inspired by works on video Transformer, we propose a simple...

10.18653/v1/2023.acl-short.153 article EN cc-by 2023-01-01