- Topic Modeling
- Natural Language Processing Techniques
- Web Data Mining and Analysis
- Text and Document Classification Technologies
- Academic integrity and plagiarism
- Spam and Phishing Detection
- Information Retrieval and Search Behavior
- Network Security and Intrusion Detection
- Advanced Text Analysis Techniques
- Authorship Attribution and Profiling
- Imbalanced Data Classification Techniques
- Semantic Web and Ontologies
- Recommender Systems and Techniques
- Handwritten Text Recognition Techniques
- Insect-Plant Interactions and Control
- Rough Sets and Fuzzy Logic
- Complex Network Analysis Techniques
- Advanced Computational Techniques and Applications
- Data Quality and Management
- Artificial Intelligence in Law
- Insect Resistance and Genetics
- Insect and Pesticide Research
- Data Management and Algorithms
- Advanced Algorithms and Applications
- Online Learning and Analytics
Foshan University
2019-2023
Heilongjiang Institute of Technology
2010-2019
State Key Laboratory of Digital Publishing Technology
2018-2019
Chinese Academy of Agricultural Sciences
2016-2018
Institute of Plant Protection
2016-2018
Harbin Institute of Technology
2004-2017
Shandong Provincial Hospital
2017
Shandong University
2017
Shandong Agricultural University
2016
Harbin Engineering University
2015
The cotton bollworm Helicoverpa armigera is a worldwide insect pest with the ability to develop resistance to many insecticides. Indoxacarb, a sodium channel blocker, is an important insecticide that is used to control H. armigera. Cross-resistance, metabolic mechanisms and life history traits were established for an indoxacarb-selected (IND-SEL) population of H. armigera. After 11 generations of selection, susceptibility to indoxacarb was decreased by 4.43-fold, and the estimated realized heritability (h2) was only 0.072....
This paper presents a new discriminative model for information retrieval (IR), referred to as the linear discriminant model (LDM), which provides a flexible framework to incorporate arbitrary features. LDM is different from most existing models in that it takes into account a variety of linguistic features that are derived from the component models of the HMM that is widely used in language modeling approaches to IR. Therefore, LDM is a means of melding discriminative and generative models for IR. We present two algorithms for parameter learning in LDM. One is to optimize the average precision (AP)...
P-tuning has demonstrated that anchor tokens are beneficial for improving the performance of downstream tasks. However, selecting anchor tokens manually may result in subjective or suboptimal results. In this paper, we present aCat to automatically select anchor tokens. Following the framework of the soft-hard prompt paradigm, aCat achieves automatic template construction. Experiments conducted on natural language understanding benchmarks demonstrate the effectiveness of our proposed method. On seven...
Detailed comparison is one important sub-task of external plagiarism detection. A seed heuristic between two documents is often used in this task. The vector space model (VSM) and the Jaccard coefficient are commonly used seed heuristics. VSM can produce high recall performance; the Jaccard coefficient can produce high precision performance. In this paper, we propose a hybrid similarity measure on the basis of the fitting function of the optimal dividing line between plagiarism and non-plagiarism, where the measure integrates VSM and the Jaccard coefficient into a unified one. Our method can make full use of the advantages of VSM and the Jaccard coefficient, and it can extract more reasonable...
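The two seed heuristics named in the abstract above can be illustrated with a minimal Python sketch. It assumes whitespace-tokenized text fragments and raw term counts for the VSM side; the function names are hypothetical, and the paper's hybrid measure (the fitted dividing line) is not reproduced here because the abstract does not give it.

```python
import math
from collections import Counter


def cosine_similarity(a_tokens, b_tokens):
    """VSM-style similarity: cosine between raw term-count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    # Only terms present in both fragments contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a_tokens, b_tokens):
    """Set overlap: |A intersect B| / |A union B| over distinct tokens."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    suspicious = "the quick brown fox jumps".split()
    source = "the quick red fox sleeps".split()
    print(cosine_similarity(suspicious, source))
    print(jaccard_coefficient(suspicious, source))
```

Both functions return a score in [0, 1] for a candidate fragment pair; how the two scores are combined is the part the paper itself contributes.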
Wheat is a major food source throughout the world. However, biological factors like pests and weeds can lead to lower crop yield. Most crop protection nowadays involves pesticide and herbicide application. This is commonly conducted with a knapsack sprayer in China, which is inefficient and highly labor intensive. Unmanned aerial vehicles (UAVs) are a recently developed spraying technology. Using a UAV makes application more flexible and standardized, with an efficiency 60 times that of a knapsack sprayer. However, weed management using UAVs is still a challenge...
BLEU is one of the most popular metrics for automatic evaluation of machine translation quality. Focusing on its ignorance of the different effects of various units upon translation quality, this paper extends BLEU by assigning proper weights to words and n-grams within the BLEU framework. The linear regression method is adopted to capture human perception of translation quality via word types and n-gram length. Compared with other linguistic-rich metrics based on learning, the proposed approach is simple and largely preserves BLEU's advantage of language independence. Experimental...
Providing effective methods for the identification of high-obfuscation plagiarism seeds presents a significant research problem in the field of plagiarism detection. The conventional detection methods are based on a single type of feature to capture plagiarism seeds. But for high-obfuscation plagiarism detection, these are not sufficient for identifying seeds effectively because of the varied methods used in plagiarism. This paper proposes a multi-features fusion method for high-obfuscation plagiarism seeds identification. The method exploits a Logical Regression model to integrate lexicon features, syntax features, semantics features and structure features which...
In view of the fact that propagation path topology cannot effectively deal with a complex social network that consists of hundreds of millions of users, more researchers choose to use machine learning methods to complete retweet prediction. Those classification methods judge whether a message will be retweeted or not. This paper argues that retweet prediction should be a regression analysis problem, not just a classification problem. Through collecting user characteristics on Twitter and selecting some features which have an important impact...
The logistic regression model has achieved success in spam filtering. But it is disadvantaged by the equal adjustment of the weights of features that appear in both spam messages and ham ones during the training period. This paper presents an improved logistic regression model which reduces the impact of features appearing in both spam messages and ham ones. Byte-level n-grams are employed to extract features from messages, and TONE (train on or near error) is adopted, which has proved effective in state-of-the-art spam filtering systems. The official runs of the CEAS (Conference on Email and Anti-Spam) spam-filter Challenge...
This paper describes the technology and an experiment of subcategorization acquisition for Chinese verbs. The SCF hypotheses are generated by means of linguistic heuristic information and filtered via statistical methods. Evaluation on 20 multi-pattern verbs shows that our approach achieved precision and recall similar to those of former research. Besides, a simple application of the acquired lexicon to a PCFG parser indicates great potential in the fields of NLP.
The goal of predicting query potential for personalization is to determine which queries can benefit from personalization. In this paper, we investigate which kind of strategy is better for this task: classification or regression. We quantify the potential benefits of personalizing search results using two implicit click-based measures: Click entropy and Potential@N. Meanwhile, queries are characterized by query features and history features. Then we build a C-SVM classification model and an epsilon-SVM regression model respectively according to these two measures. The experimental...
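As an illustration of the first click-based measure mentioned above, the following Python sketch computes click entropy under the commonly used definition: the Shannon entropy (base 2) of the distribution of clicked URLs for one query. This definition is an assumption, since the abstract does not spell out its formula, and the function name and log format are hypothetical.

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """Shannon entropy (base 2) of the clicked-URL distribution for one query.

    `clicked_urls` is a list with one entry per recorded click on a result
    of the query (hypothetical log format). Low entropy means users agree
    on the same results; high entropy means clicks are spread out, which is
    usually read as more room for personalization.
    """
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    navigational = ["site.example/home"] * 10
    ambiguous = ["a.example", "b.example", "c.example", "d.example"]
    print(click_entropy(navigational))
    print(click_entropy(ambiguous))
```

A score like this can serve either as a regression target or, after thresholding, as a class label, which is the choice the abstract investigates.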
Logistic average misclassification percentage (lam%) is a key measure of spam filtering performance. This paper demonstrates that a spam filter can achieve a perfect 0.00% in lam%, the minimal value in theory, by simply setting a biased threshold during classifier modeling. At the same time, the overall classification performance reaches only a low accuracy. The result suggests that the role of lam% in spam filtering evaluation should be re-examined.
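The effect described above can be seen in a short Python sketch. It assumes the usual TREC spam track definition, lam% = logit^-1((logit(hm%) + logit(sm%)) / 2), where hm% is the ham misclassification rate and sm% is the spam misclassification rate. The function names and the example rates are hypothetical and are not taken from the paper.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def lam(ham_misclassification, spam_misclassification):
    """Logistic average misclassification rate, as a fraction in (0, 1).

    Both inputs are rates strictly between 0 and 1; multiply the result
    by 100 to express it as lam%.
    """
    for rate in (ham_misclassification, spam_misclassification):
        if not 0.0 < rate < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1")
    mean_logit = (logit(ham_misclassification) + logit(spam_misclassification)) / 2.0
    return 1.0 / (1.0 + math.exp(-mean_logit))


if __name__ == "__main__":
    # Balanced filter: both error rates are 10%, so lam is 10% as well.
    print(100 * lam(0.10, 0.10))
    # Heavily biased threshold (hypothetical rates): almost no ham is
    # misclassified but 90% of spam gets through, yet lam% is tiny.
    print(100 * lam(1e-9, 0.90))
```

Because the average is taken in log-odds space, pushing either error rate toward zero drags lam% toward zero regardless of how large the other rate is, which is consistent with the concern raised in the abstract.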
This paper addresses the issue of text matching for plagiarism detection. This task aims at identifying the matching segments in a pair consisting of a suspicious document and its source document. So far, heuristic-based methods have mainly been utilized to resolve this problem. But the heuristics rely on experts' experience and fail to integrate more features to detect high-obfuscation matches. In this paper, a statistical machine learning approach, named the Ranking-based Text Matching Approach for Plagiarism Detection, is proposed to deal with these issues. The...
The task of real-time microblog filtering is to decide if the subsequently posted tweets are relevant to a given query representing special information needs. Filters based on a retrieval model or a text classification model are the main solutions for this task. To best exploit the strengths of the two models, a hybrid filtering model that uses the retrieval model as prior knowledge to rectify the classification hyperplane is proposed. The hybrid model incorporates a language model and a logistic regression model. Evaluated on the Text REtrieval Conference (TREC) 2012 microblog track dataset, experimental results show that the proposed...
Bilingual sentence pairs are a key resource for statistical machine translation. Currently, most aligned corpora are between English and French or German, and there are few specialized datasets for English and Chinese. So our aim is to create a large-scale, high-precision corpus of English-Chinese aligned sentences. A length-based method is used to align bilingual paragraphs which were extracted from CNKI (China National Knowledge Infrastructure). CNKI, one of the largest academic websites, contains a huge number of Chinese-English paragraphs. Our...
The identification of high-obfuscation plagiarism seeds is one of the most difficult problems to be solved in plagiarism detection. A single feature type cannot identify such seeds effectively because of the varied methods used in plagiarism. In this paper, a multi-features fusion method based on a Logical Regression model is proposed for high-obfuscation plagiarism seeds identification. This method combines lexicon features, syntax features, semantics features and structure features extracted from suspicious text fragment pairs. Experiments show that the method is feasible and effective.