- Enhanced Oil Recovery Techniques
- Topic Modeling
- Reservoir Engineering and Simulation Methods
- Natural Language Processing Techniques
- Hydrocarbon exploration and reservoir analysis
- Hydraulic Fracturing and Reservoir Analysis
- Petroleum Processing and Analysis
- T-cell and Retrovirus Studies
- Multimodal Machine Learning Applications
- Statistical Methods and Inference
- Circular RNAs in diseases
- Cancer-related molecular mechanisms research
- Oil and Gas Production Techniques
- Polymer Foaming and Composites
- Chalcogenide Semiconductor Thin Films
- Quantum Dots Synthesis And Properties
- Visual Attention and Saliency Detection
- Advanced Image and Video Retrieval Techniques
- Injection Molding Process and Properties
- Text Readability and Simplification
- biodegradable polymer synthesis and properties
- MicroRNA in disease regulation
- Advanced Graph Neural Networks
- Advanced Statistical Methods and Models
- Complex Network Analysis Techniques
China University of Petroleum, East China (2016-2025)
Qingdao University of Science and Technology (2023-2025)
Southeast University (2022-2025)
Jilin University (2023-2025)
State Key Laboratory of Superhard Materials (2023-2025)
Luye Pharma (China) (2025)
Zaozhuang University (2015-2024)
Sun Yat-sen University Cancer Center (2020-2024)
Sun Yat-sen University (2020-2024)
Shandong Institute of Business and Technology (2017-2024)
Many NLP tasks such as tagging and machine reading comprehension are faced with the severe data imbalance issue: negative examples significantly outnumber positive examples, and the huge number of easy-negative examples overwhelms the training. The most commonly used cross entropy (CE) criterion is actually an accuracy-oriented objective, and thus creates a discrepancy between training and test: at training time, each instance contributes equally to the objective function, while at test time the F1 score concerns more about positive examples. In...
Zijun Sun, Xiaoya Li, Xiaofei Sun, Yuxian Meng, Xiang Ao, Qing He, Fei Wu, Jiwei Li. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
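The abstract above points out the mismatch between accuracy-oriented cross entropy and the F1 metric under class imbalance. A soft Dice/F1-style loss is one standard way to close that gap; the snippet below is a minimal sketch (binary token labels, PyTorch), not necessarily the exact objective proposed in the paper.

```python
import torch

def soft_dice_loss(logits, labels, smooth=1.0):
    """Soft Dice loss for binary token classification.

    logits: (N,) raw scores for the positive class
    labels: (N,) 0/1 gold labels
    A Dice-style objective weights positives the way F1 does,
    instead of counting every token equally as cross entropy does.
    """
    probs = torch.sigmoid(logits)
    labels = labels.float()
    intersection = (probs * labels).sum()
    denom = probs.sum() + labels.sum()
    dice = (2.0 * intersection + smooth) / (denom + smooth)
    return 1.0 - dice

# Toy example: 2 positive tokens among 8.
logits = torch.tensor([2.0, -1.5, -2.0, 0.5, -3.0, -1.0, -2.5, -0.5])
labels = torch.tensor([1, 0, 0, 1, 0, 0, 0, 0])
print(soft_dice_loss(logits, labels))
```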
In this work, we propose BertGCN, a model that combines large-scale pretraining and transductive learning for text classification. BertGCN constructs a heterogeneous graph over the dataset and represents documents as nodes using BERT representations. By jointly training the BERT and GCN modules within BertGCN, the proposed model is able to leverage the advantages of both worlds: large-scale pretraining, which takes advantage of the massive amount of raw data, and transductive learning, which jointly learns representations for training data and unlabeled test data by propagating label influence through...
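As a rough illustration of the idea described above (a sketch, not the authors' code), the snippet below treats each document's BERT embedding as a node feature and applies one graph-convolution step over a document graph; the adjacency matrix and the random features standing in for BERT outputs are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder node features: in BertGCN these would be BERT document
# representations (e.g. the [CLS] vector of each document).
num_docs, hidden, num_classes = 5, 16, 3
X = rng.normal(size=(num_docs, hidden))

# Placeholder document graph (symmetric adjacency with self-loops).
A = np.eye(num_docs)
A[0, 1] = A[1, 0] = 1.0
A[2, 3] = A[3, 2] = 1.0

# Symmetric normalization: D^{-1/2} A D^{-1/2}.
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt

# One GCN layer: label information propagates between connected documents,
# which is how unlabeled test documents pick up signal transductively.
W = rng.normal(size=(hidden, num_classes))
logits = A_hat @ X @ W
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(probs.shape)  # (5, 3)
```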
Despite the fact that large-scale Language Models (LLMs) have achieved SOTA performances on a variety of NLP tasks, their performance on NER is still significantly below supervised baselines. This is due to the gap between the two tasks: NER is a sequence labeling task in nature, while LLMs are text-generation models. In this paper, we propose GPT-NER to resolve this issue. GPT-NER bridges the gap by transforming the sequence labeling task into a generation task that can be easily adapted by LLMs, e.g., the task of finding location entities in the input text "Columbus is a city" is transformed to generate...
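The @@...## marking implied by the example in the abstract suggests a simple round trip: wrap entity surface forms in special tokens when building demonstrations, then recover spans from generated text. The sketch below is a hypothetical illustration of that transformation; the prompt wording and helper names are not from the paper.

```python
import re

def mark_entities(text, entities):
    """Wrap each entity surface form with @@ ... ##
    ('Columbus is a city' -> '@@Columbus## is a city')."""
    for surface in entities:
        text = text.replace(surface, f"@@{surface}##")
    return text

def extract_entities(generated):
    """Recover the marked spans from a generated sequence."""
    return re.findall(r"@@(.+?)##", generated)

# Building a (hypothetical) demonstration for location extraction.
demo_input = "Columbus is a city"
demo_output = mark_entities(demo_input, ["Columbus"])
prompt = (
    "Task: mark location entities with @@ and ##.\n"
    f"Input: {demo_input}\nOutput: {demo_output}\n"
    "Input: I moved from Paris to Berlin last year.\nOutput:"
)

# Pretend this string came back from the LLM.
generated = "I moved from @@Paris## to @@Berlin## last year."
print(extract_entities(generated))  # ['Paris', 'Berlin']
```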
Despite the remarkable success of large-scale Language Models (LLMs) such as GPT-3, their performances still significantly underperform fine-tuned models in the task of text classification. This is due to (1) the lack of reasoning ability in addressing complex linguistic phenomena (e.g., intensification, contrast, irony, etc.); (2) the limited number of tokens allowed in in-context learning. In this paper, we introduce Clue And Reasoning Prompting (CARP). CARP adopts a progressive reasoning strategy tailored to the complex linguistic phenomena involved...
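The abstract is truncated before the details, but the progressive strategy it names is usually described as: first collect superficial clues, then reason over them, then decide the label. The template below is a paraphrased, hypothetical rendering of that idea, not the paper's exact prompts.

```python
CARP_TEMPLATE = """\
This is a sentiment classification task (positive / negative).

Text: {text}

Step 1 - Clues: list superficial clues in the text
(keywords, tone, contrast, irony, references).
Step 2 - Reasoning: based on the clues, reason about the overall
sentiment, paying attention to intensification, contrast and irony.
Step 3 - Answer: output exactly one label: positive or negative.
"""

def build_carp_prompt(text: str) -> str:
    return CARP_TEMPLATE.format(text=text)

print(build_carp_prompt("The plot was 'brilliant'... if you enjoy napping."))
```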
This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of (instruction, output) pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology...
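To make the (instruction, output) setup concrete, the sketch below formats one such pair into a supervised training string. The prompt layout is a generic, assumed convention; the survey covers many variants.

```python
# One (instruction, output) pair as used in supervised instruction tuning.
example = {
    "instruction": "Summarize the following sentence in five words or fewer.",
    "input": "The committee postponed the vote until next Thursday afternoon.",
    "output": "Vote postponed until next Thursday.",
}

def format_example(ex):
    """Concatenate instruction, optional input and target output into the
    text an LLM would be fine-tuned on (the loss is usually applied to the
    response part only)."""
    prompt = f"### Instruction:\n{ex['instruction']}\n"
    if ex.get("input"):
        prompt += f"### Input:\n{ex['input']}\n"
    prompt += "### Response:\n"
    return prompt, ex["output"]

prompt, target = format_example(example)
print(prompt + target)
```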
Blending poly(lactic acid) (PLA) with polyhydroxybutyrate-valerate (PHBV) presents a practical approach to producing fully biobased blends with tailored material properties and improved foam morphologies. This study investigated the effects of PLA/PHBV blend composition on the morphology, as well as the thermal and mechanical properties, of both solid and microcellular injection molded components. Nitrogen (N2) in the supercritical state was used as the physical blowing agent for the injection molding experiments. Thermal analysis results...
Epidemiological studies suggest that insulin resistance accelerates progression of age-based cognitive impairment, which neuroimaging has linked to brain glucose hypometabolism. As cellular inputs, ketones increase the Gibbs free energy change for ATP by 27% compared to glucose. Here we test whether dietary changes are capable of modulating sustained functional communication between brain regions (network stability) by changing their predominant fuel from glucose to ketones. We first established network stability as a...
Segmenting a chunk of text into words is usually the first step of processing Chinese text, but its necessity has rarely been explored. In this paper, we ask the fundamental question of whether Chinese word segmentation (CWS) is necessary for deep learning-based Chinese Natural Language Processing. We benchmark neural word-based models, which rely on word segmentation, against char-based models, which do not, on four end-to-end NLP tasks: language modeling, machine translation, sentence matching/paraphrase and text classification. Through direct...
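The comparison hinges on whether the model consumes segmented words or raw characters. The snippet below contrasts the two input views; jieba is used only as an example segmenter and is not prescribed by the paper.

```python
# pip install jieba  (example segmenter; any CWS tool could be used)
import jieba

sentence = "南京市长江大桥"  # a classic segmentation-ambiguity example

# Word-based view: the model's vocabulary is over segmented words.
word_tokens = list(jieba.cut(sentence))

# Char-based view: no segmentation step at all.
char_tokens = list(sentence)

print(word_tokens)  # e.g. ['南京市', '长江大桥'] (segmenter dependent)
print(char_tokens)  # ['南', '京', '市', '长', '江', '大', '桥']
```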
It is intuitive that NLP tasks for logographic languages like Chinese should benefit from the use of the glyph information in those languages. However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found. In this paper, we address this gap by presenting Glyce, glyph-vectors for Chinese character representations. We make three major innovations: (1) we use historical Chinese scripts (e.g., bronzeware script, seal...
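A minimal way to picture "glyph vectors" is: render each character as a small bitmap and run it through a tiny CNN to get a vector that can be combined with the usual character embedding. The sketch below uses a random bitmap as a stand-in for a rendered glyph; the layer sizes are illustrative, and the font-rendering step and Glyce's specific CNN design are omitted.

```python
import torch
import torch.nn as nn

class GlyphEncoder(nn.Module):
    """Tiny CNN mapping a character glyph bitmap to a glyph vector."""

    def __init__(self, out_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(32, out_dim)

    def forward(self, glyph):          # glyph: (batch, 1, 24, 24)
        feats = self.conv(glyph).flatten(1)
        return self.proj(feats)        # (batch, out_dim)

# Stand-in for a 24x24 rendering of a character in some script.
glyph = torch.rand(1, 1, 24, 24)
vec = GlyphEncoder()(glyph)
print(vec.shape)  # torch.Size([1, 64])
```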
Intensity normalization is an important preprocessing step in brain magnetic resonance image (MRI) analysis. During MR acquisition, different scanners or parameters may be used for scanning different subjects, or the same subject at different times, which can result in large intensity variations. This variation will greatly undermine the performance of subsequent MRI processing and population analysis, such as registration, segmentation, and tissue volume measurement. In this work, we proposed a new histogram-based method to...
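The abstract is cut off before the method details, so the snippet below shows a generic landmark-based histogram normalization (piecewise-linear mapping of intensity percentiles onto a standard scale) rather than the authors' specific algorithm.

```python
import numpy as np

def landmark_normalize(image, landmarks=(1, 25, 50, 75, 99),
                       standard_scale=(0, 25, 50, 75, 100)):
    """Map chosen intensity percentiles of `image` onto a fixed standard
    scale with piecewise-linear interpolation."""
    src = np.percentile(image, landmarks)
    return np.interp(image, src, standard_scale)

# Two fake "scans" of the same anatomy acquired with different scanners.
rng = np.random.default_rng(0)
scan_a = rng.normal(100, 20, size=(64, 64))
scan_b = rng.normal(300, 60, size=(64, 64))

norm_a = landmark_normalize(scan_a)
norm_b = landmark_normalize(scan_b)
print(norm_a.mean().round(1), norm_b.mean().round(1))  # now on a common scale
```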
This paper investigates the problem of network embedding, which aims at learning low-dimensional vector representations of nodes in networks. Most existing embedding methods rely solely on the network structure, i.e., the linkage relationships between nodes, but ignore the rich content information associated with the nodes, which is common in real-world networks and beneficial to describing the characteristics of a node. In this paper, we propose content-enhanced network embedding (CENE), which is capable of jointly leveraging the structure and the content information. Our approach...
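As a hedged illustration of "jointly leveraging structure and content" (not the CENE architecture itself), the sketch below scores nodes with two loss terms that share the same node vectors: one from the link structure and one from a bag-of-words content encoding.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_nodes, vocab = 8, 4, 10

Z = rng.normal(scale=0.1, size=(num_nodes, dim))   # node embeddings
W = rng.normal(scale=0.1, size=(vocab, dim))       # word embeddings

edges = [(0, 1), (1, 2), (2, 3)]                   # link structure
node_words = {0: [1, 4], 1: [1, 2], 2: [5, 6], 3: [6, 9]}  # node content

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_loss(Z, W):
    # Structure term: connected nodes should have similar embeddings.
    s = -sum(np.log(sigmoid(Z[u] @ Z[v])) for u, v in edges)
    # Content term: a node should also match the mean vector of its words.
    c = -sum(np.log(sigmoid(Z[n] @ W[ws].mean(axis=0)))
             for n, ws in node_words.items())
    return s + c

print(joint_loss(Z, W))  # optimize this jointly w.r.t. Z and W
```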
With the rapid growth in the use of liquid crystal displays (LCDs) and increasing concerns for environmental protection as well as the conservation of scarce metals such as indium, recycling indium from waste LCDs is becoming a hot issue in current society. In this study, an exploration of the leaching process and its mechanism was carried out with full consideration of both potential theory and experiments. The optimal parameters were controlled at <75 μm sample size, 180 min retention time, 50 °C temperature, H2SO4 as the leaching agent, and 100 g/L initial...
In this work, we propose BertGCN, a model that combines large-scale pretraining and transductive learning for text classification. BertGCN constructs a heterogeneous graph over the dataset and represents documents as nodes using BERT representations. By jointly training the BERT and GCN modules within BertGCN, the proposed model is able to leverage the advantages of both worlds: large-scale pretraining, which takes advantage of the massive amount of raw data, and transductive learning, which jointly learns representations for training data and unlabeled test data by propagating label influence through graph convolution...
Though nearest neighbor Machine Translation (kNN-MT) (CITATION) has proved to introduce significant performance boosts over standard neural MT systems, it is prohibitively slow since it uses the entire reference corpus as the datastore for the nearest neighbor search. This means each step for each beam in the beam search has to search over the entire reference corpus. kNN-MT is thus two orders of magnitude slower than vanilla MT models, making it hard to apply to real-world applications, especially online services. In this work, we propose Fast kNN-MT to address this issue. Fast kNN-MT constructs a significantly smaller...
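To make the retrieval bottleneck concrete, the snippet below shows the basic kNN-MT step that must run for every beam at every decoding step: query a datastore of (decoder-state, target-token) pairs, turn distances into a retrieval distribution, and interpolate it with the base model's distribution. A brute-force numpy search stands in for the real datastore; Fast kNN-MT's contribution (a much smaller, filtered datastore) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, datastore_size, k = 6, 16, 1000, 4

# Datastore: one (key = decoder hidden state, value = next target token)
# entry per target token position in the reference corpus.
keys = rng.normal(size=(datastore_size, dim))
values = rng.integers(0, vocab, size=datastore_size)

def knn_mt_distribution(query, base_probs, temperature=10.0, lam=0.5):
    # Brute-force nearest-neighbor search over the whole datastore,
    # exactly the step that makes vanilla kNN-MT slow.
    dists = np.sum((keys - query) ** 2, axis=1)
    nn = np.argsort(dists)[:k]
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()
    knn_probs = np.zeros(vocab)
    for w, tok in zip(weights, values[nn]):
        knn_probs[tok] += w
    # Interpolate the retrieval distribution with the base MT model.
    return lam * knn_probs + (1 - lam) * base_probs

query = rng.normal(size=dim)                 # current decoder hidden state
base_probs = np.full(vocab, 1.0 / vocab)     # stand-in for the NMT softmax
print(knn_mt_distribution(query, base_probs).round(3))
```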
Existing methods to measure sentence similarity are faced with two challenges: (1) labeled datasets are usually limited in size, making them insufficient to train supervised neural models; and (2) there is a training-test gap for unsupervised language modeling (LM) based models when computing semantic scores between sentences, since sentence-level semantics are not explicitly modeled at training. This results in inferior performance on this task. In this work, we propose a new framework to address these issues. The...
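The training-test gap mentioned here arises when sentence vectors are produced by pooling token representations from an LM that was never trained to compare sentences. The sketch below shows that common baseline (mean-pooled embeddings plus cosine similarity) with random vectors standing in for LM outputs; it illustrates the setup the paper criticizes, not the proposed framework, which is truncated here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_pooled(token_vectors):
    """Sentence vector = mean of its token vectors (a common LM baseline)."""
    return token_vectors.mean(axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Stand-ins for contextual token embeddings of two sentences
# (e.g. last-layer states of a pretrained LM); shapes are (num_tokens, hidden).
sent_a = rng.normal(size=(7, 32))
sent_b = rng.normal(size=(9, 32))

score = cosine(mean_pooled(sent_a), mean_pooled(sent_b))
print(round(score, 3))  # the unsupervised similarity score
```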