Biao Zhang

ORCID: 0000-0002-4865-7090
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Multimodal Machine Learning Applications
  • Speech Recognition and Synthesis
  • Text Readability and Simplification
  • Handwritten Text Recognition Techniques
  • Hand Gesture Recognition Systems
  • Advanced Text Analysis Techniques
  • Neural Networks and Applications
  • Machine Learning and Algorithms
  • Advanced Clustering Algorithms Research
  • Human Pose and Action Recognition
  • Educational Technology and Pedagogy
  • Hearing Impairment and Communication
  • Domain Adaptation and Few-Shot Learning
  • E-commerce and Technology Innovations
  • Cancer-related molecular mechanisms research
  • Time Series Analysis and Forecasting
  • Text and Document Classification Technologies
  • Translation Studies and Practices
  • Network Security and Intrusion Detection
  • Subtitles and Audiovisual Media
  • Explainable Artificial Intelligence (XAI)
  • Bayesian Methods and Mixture Models
  • Algorithms and Data Compression

Soochow University
2016-2024

Nanjing Normal University
2024

University of Edinburgh
2019-2023

Hefei University of Technology
2007-2023

China West Normal University
2023

Anhui University of Science and Technology
2020-2022

Harbin University of Science and Technology
2021-2022

Tencent (China)
2021

Xiamen University
2015-2020

Beijing Advanced Sciences and Innovation Center
2018

Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. In this paper, we explore ways to improve them. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and overcome this bottleneck via language-specific components and deepening NMT architectures. We identify the off-target translation issue (i.e. translating into a wrong target language) as...

10.18653/v1/2020.acl-main.148 preprint EN cc-by 2020-01-01

It has been shown that the performance of neural machine translation (NMT) drops starkly in low-resource conditions, underperforming phrase-based statistical machine translation (PBSMT) and requiring large amounts of auxiliary data to achieve competitive results. In this paper, we re-assess the validity of these results, arguing that they are the result of a lack of system adaptation to low-resource settings. We discuss some pitfalls to be aware of when training low-resource NMT systems, and recent techniques that have shown to be especially helpful in low-resource settings, resulting in a set of best practices...

10.18653/v1/p19-1021 preprint EN cc-by 2019-01-01

Research on prompting has shown excellent performance with little or even no supervised training across many tasks. However, prompting for machine translation is still under-explored in the literature. We fill this gap by offering a systematic study on prompting strategies for translation, examining various factors in prompt template and demonstration example selection. We further explore the use of monolingual data and the feasibility of cross-lingual, cross-domain, and sentence-to-document transfer learning in prompting. Extensive experiments...

10.48550/arxiv.2301.07069 preprint EN other-oa arXiv (Cornell University) 2023-01-01
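
As a concrete illustration of the prompting setup studied in the entry above, a minimal Python sketch follows; the template wording, language pair, and demonstration examples are assumptions for illustration, not the configurations examined in the paper.

# Illustrative few-shot translation prompt; template and demonstration
# selection are assumptions, not the paper's exact setup.
def build_prompt(demos, source, src_lang="German", tgt_lang="English"):
    blocks = [f"{src_lang}: {src}\n{tgt_lang}: {tgt}" for src, tgt in demos]
    blocks.append(f"{src_lang}: {source}\n{tgt_lang}:")
    return "\n\n".join(blocks)

demos = [("Guten Morgen.", "Good morning."),
         ("Wie geht es dir?", "How are you?")]
print(build_prompt(demos, "Das Wetter ist heute schoen."))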

With parallelizable attention networks, the neural Transformer is very fast to train. However, due to the auto-regressive architecture and self-attention in the decoder, the decoding procedure becomes slow. To alleviate this issue, we propose an average attention network as an alternative to the self-attention network in the decoder of the neural Transformer. The average attention network consists of two layers, with an average layer that models dependencies on previous positions and a gating layer stacked over the average layer to enhance the expressiveness of the proposed attention network. We apply this network on the decoder part of the neural Transformer to replace the original target-side self-attention model. With masking tricks...

10.18653/v1/p18-1166 article EN cc-by Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018-01-01
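
The average attention idea in the entry above can be sketched as follows, assuming a NumPy-style formulation with a cumulative average over previous positions, a position-wise feed-forward layer, and input/forget gating; biases, residual connections and the exact parameterization in the paper are simplified or omitted.

import numpy as np

def average_attention(Y, W_ff, W_i, W_f):
    # Y: (seq_len, d) target-side inputs at positions 1..seq_len.
    # Cumulative average over previous (and current) positions replaces
    # target-side self-attention: g_j = (1/j) * sum_{k<=j} y_k.
    csum = np.cumsum(Y, axis=0)
    positions = np.arange(1, Y.shape[0] + 1)[:, None]
    G = csum / positions
    # Position-wise feed-forward over the averaged context (simplified).
    H = np.tanh(G @ W_ff)
    # Gating layer mixing the original input with the averaged context.
    YH = np.concatenate([Y, H], axis=-1)
    i_gate = 1.0 / (1.0 + np.exp(-(YH @ W_i)))
    f_gate = 1.0 / (1.0 + np.exp(-(YH @ W_f)))
    return i_gate * Y + f_gate * H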

Deepening neural models has been proven very successful in improving the model's capacity when solving complex learning tasks, such as the machine translation task. Previous efforts on deep neural machine translation mainly focus on the encoder and the decoder, while little on the attention mechanism. However, the attention mechanism is of vital importance to induce the correspondence between different languages, where shallow neural networks are relatively insufficient, especially when the encoder and decoder are deep. In this paper, we propose a deep attention model (DeepAtt). Based on the low-level...

10.1109/tpami.2018.2876404 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2018-10-16
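
The abstract above only hints at the mechanism, so the sketch below shows a generic stack of attention layers in which each layer's query is refined by the context returned by the layer below; this is an illustrative reading, not DeepAtt's exact formulation.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def stacked_attention(H, s, Ws):
    # H: (src_len, d) source annotations; s: (d,) decoder query state.
    # Ws: list of (d, d) matrices, one per attention layer.
    query = s
    context = np.zeros_like(s)
    for W in Ws:
        scores = H @ (W @ query)          # (src_len,)
        weights = softmax(scores)
        context = weights @ H             # (d,)
        query = np.tanh(query + context)  # refined query for the next layer
    return context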

Neural machine translation (NMT) heavily relies on context vectors generated by an attention network to predict target words. In practice, we observe that the context vectors for different target words are quite similar to one another, and translations with such nondiscriminatory context vectors tend to be degenerative. We ascribe this similarity to the invariant source representations that lack dynamics across decoding steps. In this article, we propose a novel gated recurrent unit (GRU)-gated attention model (GAtt) for NMT. By updating the source representations with the previous decoder state via a GRU,...

10.1109/tnnls.2019.2957276 article EN IEEE Transactions on Neural Networks and Learning Systems 2020-01-01
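
A loose sketch of the GRU-gated attention idea described above, assuming the source annotations are refreshed with the previous decoder state through a GRU cell before attention weights are computed; the actual GAtt parameterization may differ.

import torch
import torch.nn as nn

class GRUGatedAttention(nn.Module):
    # Refresh source annotations with the previous decoder state so that
    # context vectors can vary across decoding steps (illustrative sketch).
    def __init__(self, d):
        super().__init__()
        self.gru = nn.GRUCell(input_size=d, hidden_size=d)
        self.score = nn.Linear(d, 1)

    def forward(self, annotations, prev_state):
        # annotations: (src_len, d); prev_state: (d,)
        src_len, d = annotations.shape
        query = prev_state.unsqueeze(0).expand(src_len, d)
        refreshed = self.gru(query, annotations)              # (src_len, d)
        weights = torch.softmax(self.score(refreshed).squeeze(-1), dim=0)
        context = (weights.unsqueeze(-1) * refreshed).sum(dim=0)
        return context, weights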

Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. In this paper, we explore ways to improve them. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and overcome this bottleneck via language-specific components and deepening NMT architectures. We identify the off-target translation issue (i.e. translating into a wrong target language) as...

10.48550/arxiv.2004.11867 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Biao Zhang, Ivan Titov, Rico Sennrich. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

10.18653/v1/d19-1083 article EN cc-by 2019-01-01

Partially inspired by successful applications of variational recurrent neural networks, we propose a novel variational recurrent neural machine translation (VRNMT) model in this paper. Different from the variational NMT, VRNMT introduces a series of latent random variables to model the translation procedure of a sentence in a generative way, instead of a single latent variable. Specifically, the latent random variables are included into the hidden states of the NMT decoder with elements from the variational autoencoder. In this way, these latent variables are recurrently generated, which enables them to further capture strong and complex dependencies among the output...

10.1609/aaai.v32i1.11985 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2018-04-27

Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability in handling re-centering and re-scaling of both inputs and weight matrix. However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, e.g. RNN in particular. In this paper, we hypothesize that re-centering invariance in LayerNorm is dispensable and propose root mean square layer...

10.48550/arxiv.1910.07467 preprint EN other-oa arXiv (Cornell University) 2019-01-01
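
A minimal sketch of the RMSNorm computation proposed above: drop LayerNorm's mean-centering and rescale by the root mean square only, keeping a learnable gain (the optional bias is omitted here).

import numpy as np

def rms_norm(x, gain, eps=1e-8):
    # x: (..., d). Normalize by the root mean square instead of subtracting
    # the mean and dividing by the standard deviation as LayerNorm does.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * gain

x = np.array([1.0, 2.0, 3.0, 4.0])
print(rms_norm(x, gain=np.ones(4)))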

Neural machine translation (NMT) heavily relies on its encoder to capture the underlying meaning of a source sentence so as to generate a faithful translation. However, most NMT encoders are built upon either unidirectional or bidirectional recurrent neural networks, which either do not deal with future context or simply concatenate the history and future contexts to form context-dependent word representations, implicitly assuming the independence of the two types of contextual information. In this paper, we propose a novel context-aware...

10.1109/taslp.2017.2751420 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2017-09-11

While large language models (LLMs) often adopt finetuning to unlock their capabilities for downstream applications, our understanding on the inductive biases (especially the scaling properties) of different finetuning methods is still limited. To fill this gap, we conduct systematic experiments studying whether and how different scaling factors, including LLM model size, pretraining data size, and new finetuning parameter size, affect the finetuning performance. We consider two types of finetuning -- full-model tuning (FMT) and parameter-efficient tuning (PET, including prompt tuning and LoRA), and explore their scaling behaviors in...

10.48550/arxiv.2402.17193 preprint EN arXiv (Cornell University) 2024-02-26
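
LoRA, one of the parameter-efficient tuning methods compared in the entry above, can be sketched as a frozen linear layer plus a trainable low-rank update; the rank, scaling factor, and choice of adapted weights below are illustrative, not the paper's configuration.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen base weight plus a trainable low-rank update (alpha / r) * B @ A.
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)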

Models of neural machine translation are often from a discriminative family of encoder-decoders that learn a conditional distribution of a target sentence given a source sentence. In this paper, we propose a variational model to learn this conditional distribution for neural machine translation: a variational encoder-decoder model that can be trained end-to-end. Different from the vanilla encoder-decoder model that generates target translations from hidden representations of source sentences alone, the variational model introduces a continuous latent variable to explicitly model underlying semantics of source sentences and guide the generation of target translations. In order to perform...

10.48550/arxiv.1605.07869 preprint EN other-oa arXiv (Cornell University) 2016-01-01
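
The variational encoder-decoder described above is trained by maximizing an evidence lower bound of the standard conditional-VAE form; the notation below is generic and the paper's exact factorization may differ slightly.

\log p_\theta(\mathbf{y} \mid \mathbf{x}) \;\ge\; \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x}, \mathbf{y})}\!\left[\log p_\theta(\mathbf{y} \mid \mathbf{x}, \mathbf{z})\right] \;-\; \mathrm{KL}\!\left(q_\phi(\mathbf{z} \mid \mathbf{x}, \mathbf{y}) \,\|\, p_\theta(\mathbf{z} \mid \mathbf{x})\right)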

Although future context is widely regarded as useful for word prediction in machine translation, it is quite difficult in practice to incorporate it into neural machine translation. In this paper, we propose a future-aware knowledge distillation framework (FKD) to address this issue. In the FKD framework, we learn to distill knowledge from a backward language model (teacher) into future vectors (student) during the training phase. The future vector at each position is computed by a bridge network and optimized towards the corresponding hidden state of the backward language model via a knowledge distillation mechanism. We further propose an...

10.1109/taslp.2019.2946480 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2019-10-09
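
A rough sketch of the distillation signal described above, assuming the future vector produced by the bridge network is regressed onto the backward language model's hidden state with a mean-squared error; the paper's exact bridge architecture and loss may differ.

import torch
import torch.nn as nn

bridge = nn.Linear(512, 512)    # illustrative bridge network
mse = nn.MSELoss()

def future_distillation_loss(decoder_states, backward_lm_states):
    # decoder_states, backward_lm_states: (seq_len, 512)
    # Predict a future vector at each position and push it towards the
    # teacher's (backward LM) hidden state; the teacher is not updated.
    future_vectors = bridge(decoder_states)
    return mse(future_vectors, backward_lm_states.detach())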

End-to-end (E2E) speech-to-text translation (ST) often depends on pretraining its encoder and/or decoder using source transcripts via speech recognition or text translation tasks, without which translation performance drops substantially. However, transcripts are not always available, and how significant such pretraining is for E2E ST has rarely been studied in the literature. In this paper, we revisit this question and explore the extent to which the quality of E2E ST trained on speech-translation pairs alone can be improved. We reexamine several techniques proven...

10.48550/arxiv.2206.04571 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Despite recent successes with neural models for sign language translation (SLT), translation quality still lags behind spoken languages because of the data scarcity and the modality gap between sign video and text. To address both problems, we investigate strategies for cross-modality representation sharing for SLT. We propose SLTUNET, a simple unified neural model designed to support multiple SLT-related tasks jointly, such as sign-to-gloss, gloss-to-text and sign-to-text translation. Jointly modeling different tasks endows SLTUNET...

10.48550/arxiv.2305.01778 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Neural machine translation (NMT) heavily relies on an attention network to produce a context vector for each target word prediction. In practice, we find that the context vectors for different target words are quite similar to one another and therefore insufficient in discriminatively predicting target words. The reason for this might be that the context vectors produced by the vanilla attention network are just a weighted sum of source representations that are invariant to decoder states. In this paper, we propose a novel GRU-gated attention model (GAtt) for NMT which enhances the degree of discrimination by enabling...

10.48550/arxiv.1704.08430 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Exploiting semantic interactions between the source and target linguistic items at different levels of granularity is crucial for generating compact vector representations of bilingual phrases. To achieve this, we propose alignment-supervised bidimensional attention-based recursive autoencoders (ABattRAE) in this paper. ABattRAE first individually employs two recursive autoencoders to recover the hierarchical tree structures of a bilingual phrase, and treats the subphrase covered by each node on the tree as a linguistic item. Unlike previous methods, ABattRAE introduces...

10.1109/tcyb.2018.2868982 article EN IEEE Transactions on Cybernetics 2018-09-27

Marine mammals, especially cetaceans, have evolved a very special form of sleep characterized by unihemispheric slow-wave sleep (USWS) and a negligible amount or complete absence of rapid-eye-movement sleep; however, the underlying genetic mechanisms remain unclear. Here, we detected unique, significant selection signatures in basic helix-loop-helix ARNT like 2 (BMAL2; also called ARNTL2), a key circadian regulator, in marine mammal lineages, and identified two nonsynonymous amino acid substitutions...

10.1093/sleep/zsae018 article EN cc-by-nc SLEEP 2024-01-30

Building a scalable and real-time recommendation system is vital for many businesses driven by time-sensitive customer feedback, such as short-video ranking or online ads. Despite the ubiquitous adoption of production-scale deep learning frameworks like TensorFlow or PyTorch, these general-purpose frameworks fall short of business demands in recommendation scenarios for various reasons: on one hand, tweaking systems based on static parameters and dense computations for recommendation with dynamic and sparse features is detrimental to model quality; on the other hand, such frameworks are...

10.48550/arxiv.2209.07663 preprint EN cc-by arXiv (Cornell University) 2022-01-01

10.18653/v1/d15-1146 article EN Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing 2015-01-01

Implicit discourse relation recognition is a crucial component for automatic discourse-level analysis and natural language understanding. Previous studies exploit discriminative models that are built on either powerful manual features or deep discourse representations. In this paper, instead, we explore generative models and propose a variational neural discourse relation recognizer. We refer to this model as VarNDRR. VarNDRR establishes a directed probabilistic model with a latent continuous variable that generates both a discourse and the relation between the two arguments of...

10.18653/v1/d16-1037 article EN cc-by Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016-01-01

Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability in handling re-centering and re-scaling of both inputs and weight matrix. However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, e.g. RNN in particular. In this paper, we hypothesize that re-centering invariance in LayerNorm is dispensable and propose root mean square layer...

10.5167/uzh-177483 article EN Neural Information Processing Systems 2019-12-14

Biao Zhang, Ivan Titov, Barry Haddow, Rico Sennrich. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.200 article EN cc-by 2021-01-01