Jian Yang

ORCID: 0000-0003-1983-012X
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Anomaly Detection Techniques and Applications
  • Advanced Graph Neural Networks
  • Network Security and Intrusion Detection
  • Reservoir Engineering and Simulation Methods
  • Advanced Computational Techniques and Applications
  • Library Science and Administration
  • Hydraulic Fracturing and Reservoir Analysis
  • Speech Recognition and Synthesis
  • Educational Technology and Pedagogy
  • Data Quality and Management
  • Oil and Gas Production Techniques
  • Speech and dialogue systems
  • Language, Linguistics, Cultural Analysis
  • Neural Networks and Applications
  • Advanced Text Analysis Techniques
  • Domain Adaptation and Few-Shot Learning
  • Text Readability and Simplification
  • Subtitles and Audiovisual Media
  • Linguistic Studies and Language Acquisition
  • Text and Document Classification Technologies
  • Adversarial Robustness in Machine Learning
  • Complex Network Analysis Techniques

Nanjing University of Science and Technology
2018-2025

Research Institute of Petroleum Exploration and Development
2020-2025

Beihang University
2011-2024

Dalian Maritime University
2014-2024

Second Affiliated Hospital of Chengdu University of Traditional Chinese Medicine
2024

Chengdu University of Traditional Chinese Medicine
2024

Chengdu Fifth People's Hospital
2024

Hebei University of Engineering
2024

First People's Hospital of Chongqing
2023

Beijing University of Civil Engineering and Architecture
2023

Language model pre-training has achieved success in many natural language processing tasks. Existing methods for cross-lingual pre-training adopt a Translation Language Model to predict masked words given the concatenation of a source sentence and its target equivalent. In this work, we introduce a novel cross-lingual pre-training method, called Alternating Language Modeling (ALM). It code-switches sentences of different languages rather than simply concatenating them, hoping to capture the rich cross-lingual context of words and phrases. More specifically, we randomly substitute phrases...

10.1609/aaai.v34i05.6480 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03
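A minimal sketch of the code-switching idea described in the ALM abstract above: given phrase-level alignments between a source sentence and its translation, randomly substitute some source phrases with their target-language counterparts to build a mixed-language sequence for pre-training. The helper name and data layout are assumptions for illustration, not the authors' implementation.

```python
import random

def code_switch(src_tokens, phrase_alignments, switch_prob=0.5, seed=None):
    """phrase_alignments: list of ((start, end), target_phrase_tokens) pairs,
    where (start, end) is a half-open span over src_tokens."""
    rng = random.Random(seed)
    mixed, cursor = [], 0
    for (start, end), tgt_phrase in sorted(phrase_alignments):
        mixed.extend(src_tokens[cursor:start])      # keep unaligned source words
        if rng.random() < switch_prob:
            mixed.extend(tgt_phrase)                # substitute the target phrase
        else:
            mixed.extend(src_tokens[start:end])     # keep the source phrase
        cursor = end
    mixed.extend(src_tokens[cursor:])
    return mixed

# Example: an English source with one aligned German phrase.
print(code_switch(["the", "cat", "sat", "on", "the", "mat"],
                  [((0, 2), ["die", "Katze"])], switch_prob=1.0))
# ['die', 'Katze', 'sat', 'on', 'the', 'mat']
```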

Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora, especially for non-English data. While prior efforts mainly focus on data-driven transfer methods, a significant aspect that has not been fully explored is aligning both semantic and token-level representations across diverse languages. In this paper, we propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (MCL-NER). Specifically,...

10.1609/aaai.v38i17.29843 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24
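As one plausible way to make the representation alignment in the MCL-NER abstract concrete, the sketch below shows an InfoNCE-style contrastive loss that pulls paired source- and target-language representations together within a batch. The function name, shapes, and temperature are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def info_nce(src_repr, tgt_repr, temperature=0.07):
    """src_repr, tgt_repr: (batch, dim) paired representations of two views.
    Each source vector is attracted to its paired target vector and repelled
    from all other target vectors in the batch."""
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.t() / temperature                 # (batch, batch) similarities
    labels = torch.arange(src.size(0), device=src.device)
    return F.cross_entropy(logits, labels)

# Usage idea: combine a sentence-level (semantic) view and a token-level view.
# loss = info_nce(src_sent_emb, tgt_sent_emb) + info_nce(src_tok_emb, tgt_tok_emb)
```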

Recognition of multi-function radar (MFR) work modes in an input pulse sequence is a fundamental task for interpreting the functions and behaviour of an MFR. Three major challenges must be addressed: (i) the received pulse stream may contain an unknown number of segments from multiple work mode classes; (ii) the intra-mode and inter-mode knowledge of a modern MFR is too flexible and complicated to be represented and learned through traditional hand-crafted features and learning models; (iii) the variable duration of each enclosed segment makes...

10.1049/iet-rsn.2020.0060 article EN IET Radar Sonar & Navigation 2020-04-30

Our previous findings confirmed the high enrichment of Bacteroides fragilis (BF) in fecal samples from patients with colorectal cancer (CRC). The intestinal mucosal barrier is the organism's first defense against commensal flora and pathogens and is closely associated with the occurrence and development of CRC. Therefore, this study aimed to investigate the molecular mechanisms through which BF mediates barrier injury and CRC progression. SW480 cells and a Caco2 barrier model were treated with entero-toxigenic B. fragilis (ETBF), its enterotoxin (B. fragilis toxin, BFT),...

10.1080/15384101.2024.2309005 article EN cc-by-nc-nd Cell Cycle 2024-01-02

Log anomaly detection is a key component in the field of artificial intelligence for IT operations (AIOps). Considering that log data come from variant domains, retraining the whole network for unknown domains is inefficient in real industrial scenarios. However, previous deep models merely focused on extracting the semantics of log sequences within the same domain, leading to poor generalization on multi-domain logs. To alleviate this issue, we propose a unified Transformer-based framework (LogFormer) to improve the generalization ability across different domains, where...

10.1609/aaai.v38i1.27764 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24
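To make the multi-domain setup in the LogFormer abstract concrete, here is a rough skeleton (not the authors' implementation) of a shared Transformer encoder over log-sequence embeddings with a binary anomaly head; the layer sizes and pooling choice are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class SharedLogEncoder(nn.Module):
    def __init__(self, emb_dim=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(emb_dim, 2)       # normal vs anomalous sequence

    def forward(self, log_emb):                 # log_emb: (batch, seq_len, emb_dim)
        hidden = self.encoder(log_emb)          # encoder shared across log domains
        return self.head(hidden.mean(dim=1))    # pool the sequence, then classify

# The shared encoder weights can be reused on a new log domain, tuning only a small
# number of parameters (e.g. the classification head) instead of retraining everything.
```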

Aspect-based sentiment analysis (ABSA) is a fine-grained task whose main goal is to identify the polarity of an aspect in a sentence. A sentence may contain many different aspects, each of which may have its own polarity. Based on current research in this area, ABSA can be divided into two subtasks: aspect-category sentiment analysis (ACSA) and aspect-term sentiment analysis (ATSA). In the past, the more commonly used method was to adopt a time-serial algorithm such as Long Short-Term Memory (LSTM) or a Recurrent Neural Network (RNN), which usually needs training and has...

10.1109/icsess49938.2020.9237640 article EN 2020-10-16

Chain-of-thought (CoT) has emerged as a powerful technique to elicit reasoning in large language models and improve a variety of downstream tasks. CoT mainly demonstrates excellent performance in English, but its usage in low-resource languages is constrained due to poor language generalization. To bridge the gap among different languages, we propose a cross-lingual instruction fine-tuning framework (xCOT) to transfer knowledge from high-resource languages to low-resource languages. Specifically, the multilingual instruction training data (xCOT-INSTRUCT)...

10.48550/arxiv.2401.07037 preprint EN other-oa arXiv (Cornell University) 2024-01-01
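A toy illustration of cross-lingual chain-of-thought instruction data of the kind the xCOT abstract describes: a question in one language paired with English step-by-step reasoning, so CoT ability can transfer across languages. The helper name, prompt wording, and pairing scheme are my assumptions, not the paper's exact recipe.

```python
def build_xcot_example(question, reasoning_en, answer, lang="Chinese"):
    """Build one (prompt, target) instruction pair for cross-lingual CoT tuning."""
    prompt = (
        f"Question ({lang}): {question}\n"
        "Let's think step by step in English, then give the answer."
    )
    target = f"{reasoning_en}\nAnswer: {answer}"
    return {"prompt": prompt, "target": target}

print(build_xcot_example(
    "小明有3个苹果，又买了2个，现在有几个？",
    "Xiao Ming starts with 3 apples and buys 2 more, so 3 + 2 = 5.",
    "5"))
```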

This study aimed to analyze the relationship between the sleep quality of healthcare professionals and the incidence of overweight and obesity, exploring its potential impact on the onset of obesity in order to provide a scientific basis for formulating effective health intervention measures.

10.3389/fpubh.2024.1390643 article EN cc-by Frontiers in Public Health 2024-05-30

Rotated object detection has made significant progress in optical remote sensing. However, advancements in the Synthetic Aperture Radar (SAR) field lag behind, primarily due to the absence of a large-scale dataset. Annotating such a dataset is inefficient and costly. A promising solution is to employ a weakly supervised model (e.g., trained with available horizontal boxes only) to generate pseudo-rotated boxes for reference before manual calibration. Unfortunately, existing weakly supervised models exhibit limited accuracy...

10.48550/arxiv.2501.04440 preprint EN arXiv (Cornell University) 2025-01-08

Depth completion endeavors to reconstruct a dense depth map from sparse depth measurements, leveraging the information provided by the corresponding color image. Existing approaches mostly hinge on single-scale propagation strategies that iteratively ameliorate initial coarse depth estimates through pixel-level message passing. Despite their commendable outcomes, these techniques are frequently hampered by computational inefficiencies and a limited grasp of scene context. To circumvent these challenges, we introduce...

10.48550/arxiv.2502.07289 preprint EN arXiv (Cornell University) 2025-02-11

Abstract The injected gas override and gas cap coning cause severe gas breakthrough issues in horizontal wells and have an adverse impact on sweep efficiency and stable production. Statistics show that over 20% of producers in a gas-drive reservoir in the Middle East experience fast production decline because the gas-oil ratio (GOR) rises up to 4~5 Mscf/bbl, and some wells are even shut down due to violation of the management guideline. Quantitative identification of operational effects is key to the optimization of gas injection strategies. A...

10.2523/iptc-25031-ms article EN International Petroleum Technology Conference 2025-02-17

Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these fields, particularly in light industry, agriculture, and service-oriented disciplines, remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that...

10.48550/arxiv.2502.14739 preprint EN arXiv (Cornell University) 2025-02-20

Accurately predicting protein structure, from sequences to 3D structures, is of great significance in biological research. To tackle this issue, a representative deep big model, RoseTTAFold, has been proposed with promising success. Here, a light-weight graph network, named LightRoseTTA, is reported to achieve accurate and highly efficient structure prediction for proteins. Notably, LightRoseTTA possesses three highlights: i) highly accurate structure prediction for proteins, being competitive with RoseTTAFold on multiple...

10.1002/advs.202309051 article EN cc-by Advanced Science 2025-03-25

Abstract: The available literature on China English seems to focus mostly on attitudes toward English, the use of English, or the EFL industry in this country. Lexical borrowing as part of nativization has rarely been investigated. This paper presents a data-based analysis of 59 borrowed lexical items found in 84 articles from two newspapers in China, including both loanwords and loan translations. On the whole, these items do not seem to be in widespread use. Additionally, the findings show that they tend to be culture-specific items, nonce...

10.1111/j.0883-2919.2005.00424.x article EN World Englishes 2005-11-22

Just how many millions are there? China's huge English-knowing population of 200–350 million is often cited as evidence of the language being nativized in the world's most populous country. We may note, however, that the words user and learner are used interchangeably in reference to its speakers of English. When, however, the focus is on the nativization of English in China, a country among Kachru's 'Expanding Circle' Englishes, it is imperative to distinguish between users and learners of the language. Kachru points out that institutionalized varieties...

10.1017/s0266078406002021 article EN English Today 2006-04-01

Although neural machine translation (NMT) has achieved significant progress in recent years, most previous NMT models only depend on the source text to generate the translation. Inspired by the success of template-based and syntax-based approaches in other fields, we propose to use templates extracted from tree structures as soft target templates to guide the translation procedure. In order to learn the syntactic structure of target sentences, we adopt the constituency-based parse tree to generate candidate templates. We incorporate the template information into the encoder-decoder...

10.18653/v1/2020.acl-main.531 article EN cc-by 2020-01-01
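One hedged way to picture a "soft template" derived from a constituency parse, as in the abstract above: prune subtrees below a chosen depth and keep their nonterminal labels as slots. The pruning rule below is my assumption for illustration; it uses nltk.Tree.

```python
from nltk import Tree

def extract_template(tree, max_depth=2, depth=0):
    """Replace subtrees deeper than max_depth with their nonterminal labels."""
    if depth >= max_depth or not isinstance(tree, Tree):
        return tree.label() if isinstance(tree, Tree) else tree
    parts = [extract_template(child, max_depth, depth + 1) for child in tree]
    return " ".join(parts)

parse = Tree.fromstring(
    "(S (NP (DT the) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))")
print(extract_template(parse, max_depth=2))   # 'DT NN VBD PP', a coarse slotted template
```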

Recurrent neural networks (RNNs) have achieved state-of-the-art performance in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model will become very big (e.g., possibly beyond the memory capacity of a GPU device) and its training will become inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use a 2-Component (2C) shared embedding for word representations. We allocate every word into a table, where each row...

10.48550/arxiv.1610.09893 preprint EN other-oa arXiv (Cornell University) 2016-01-01
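A minimal sketch of the 2-Component shared-embedding idea summarized above: place a vocabulary of size V in a roughly sqrt(V) x sqrt(V) table, keep one vector per row and one per column, and represent each word by its row and column vectors, so only about 2*sqrt(V) vectors are stored instead of V. Composing the two vectors by concatenation below is an assumption for illustration.

```python
import math
import torch
import torch.nn as nn

class TwoComponentEmbedding(nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.table_size = math.ceil(math.sqrt(vocab_size))
        self.row_emb = nn.Embedding(self.table_size, dim)   # one vector per table row
        self.col_emb = nn.Embedding(self.table_size, dim)   # one vector per table column

    def forward(self, word_ids):                    # word_ids: (batch, seq_len)
        rows = word_ids // self.table_size          # row index in the word table
        cols = word_ids % self.table_size           # column index in the word table
        return torch.cat([self.row_emb(rows), self.col_emb(cols)], dim=-1)

emb = TwoComponentEmbedding(vocab_size=10_000, dim=128)
print(emb(torch.tensor([[3, 4187, 9999]])).shape)   # torch.Size([1, 3, 256])
```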

Pre-trained language models learn informative word representations on a large-scale text corpus through self-supervised learning, which has achieved promising performance in many fields of natural language processing (NLP) after fine-tuning. These models, however, suffer from poor robustness and a lack of interpretability. We refer to pre-trained language models with knowledge injection as knowledge-enhanced pre-trained language models (KEPLMs). They demonstrate deep understanding and logical reasoning and introduce interpretability. In this survey, we provide a comprehensive overview...

10.48550/arxiv.2110.00269 preprint EN other-oa arXiv (Cornell University) 2021-01-01

The Transformer structure, stacked by a sequence of encoder and decoder network layers, achieves significant development in neural machine translation. However, the vanilla Transformer mainly exploits the top-layer representation, assuming that the lower layers provide trivial or redundant information, thus ignoring the bottom-layer features that are potentially valuable. In this work, we propose the Group-Transformer model (GTrans), which flexibly divides the multi-layer representations of both the encoder and decoder into different groups and then fuses these group...

10.1109/taslp.2022.3221040 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2022-11-10
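A simplified sketch in the spirit of the GTrans abstract: split the stack of layer outputs into groups, pool within each group, and take a learned weighted sum of the group features instead of using only the top layer. The shapes and fusion rule are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GroupFusion(nn.Module):
    def __init__(self, num_layers, num_groups):
        super().__init__()
        assert num_layers % num_groups == 0
        self.num_groups = num_groups
        self.group_weights = nn.Parameter(torch.ones(num_groups))

    def forward(self, layer_outputs):               # list of (batch, seq, dim) tensors
        stacked = torch.stack(layer_outputs)        # (layers, batch, seq, dim)
        groups = stacked.chunk(self.num_groups, dim=0)
        group_feats = torch.stack([g.mean(dim=0) for g in groups])
        weights = torch.softmax(self.group_weights, dim=0).view(-1, 1, 1, 1)
        return (weights * group_feats).sum(dim=0)   # fused (batch, seq, dim) representation

fusion = GroupFusion(num_layers=6, num_groups=3)
layers = [torch.randn(2, 5, 16) for _ in range(6)]
print(fusion(layers).shape)                         # torch.Size([2, 5, 16])
```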