- Advanced Text Analysis Techniques
- Topic Modeling
- Biomedical Text Mining and Ontologies
- scientometrics and bibliometrics research
- Semantic Web and Ontologies
- Scientific Computing and Data Management
- Data Quality and Management
- Complex Network Analysis Techniques
- Knowledge Management and Sharing
- Information Retrieval and Search Behavior
- Natural Language Processing Techniques
- Artificial Intelligence in Healthcare and Education
- Web Data Mining and Analysis
- Advanced Computational Techniques and Applications
- COVID-19 and healthcare impacts
- Ethics in Clinical Research
- Data Visualization and Analytics
- COVID-19 epidemiological studies
- AI in Service Interactions
- Computational and Text Analysis Methods
- Bioinformatics and Genomic Networks
- Speech and dialogue systems
- Software Engineering Research
- Explainable Artificial Intelligence (XAI)
- Multimodal Machine Learning Applications
Wuhan University
2012-2024
Demonstration ordering, which is an important strategy for in-context learning (ICL), can significantly affect the performance of large language models (LLMs). However, most current approaches to ordering require additional knowledge and similarity calculation. We advocate few-shot in-context curriculum learning (ICCL), a simple but effective demonstration ordering method for ICL, which implies gradually increasing the complexity of prompt demonstrations during the inference process. Then we design three experiments to discuss the effectiveness...
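The ordering idea described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the excerpt does not say how complexity is measured, so the word-count proxy below is hypothetical, as are the function names `order_by_complexity` and `build_prompt` and the `Input:`/`Output:` prompt template.

```python
def word_count(demo):
    """Hypothetical complexity proxy: number of words in the demonstration input."""
    return len(demo[0].split())


def order_by_complexity(demos, complexity=word_count):
    """Return (input, output) demonstrations sorted from simplest to most complex."""
    return sorted(demos, key=complexity)


def build_prompt(demos, query, complexity=word_count):
    """Build a few-shot prompt whose demonstrations grow in complexity."""
    blocks = [
        f"Input: {x}\nOutput: {y}"
        for x, y in order_by_complexity(demos, complexity)
    ]
    blocks.append(f"Input: {query}\nOutput:")  # the query always comes last
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [
        ("the film was slow but the ending was worth it", "positive"),
        ("great film", "positive"),
        ("not my favourite", "negative"),
    ]
    print(build_prompt(demos, "a dull and overlong sequel"))
```

Any other scoring function (for example, a human-assigned difficulty label) can be passed as `complexity` without changing the rest of the sketch.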
Each section header of an article has its distinct communicative function. Citations from different sections may differ in citing motivation. In this paper, we grouped section headers with similar functions as a structural function and defined the distribution of citations for a paper as its citation structure. We aim to explore the relationship between citation structure and the future impact of a publication and to disclose the relative importance among structural functions. Specifically, we proposed two counting methods and a life cycle identification...
Informal knowledge constantly transitions into formal domain knowledge in the dynamic knowledge base. This article focuses on an integrative understanding of the role transition from the perspective of codification. The transition process is characterized by several dynamics involving a variety of bibliometric entities, such as authors, keywords, institutions, and venues. We thereby designed a series of temporal and cumulative indicators to respectively explore the transition possibility (whether new knowledge could be transitioned into formal knowledge) and the transition pace (how long it...
The unprecedented COVID-19 outbreak at the end of 2019 has produced a worldwide health crisis. Scientific research, especially international research collaboration, is crucial to deal successfully with the epidemic. This article aims to review the response modes, and the collaboration characteristics, of the academic community in similar public health events in the past. Based on relevant studies of four major public health emergencies in the past, these emergencies were regarded as ‘new knowledge’ in the field. By using knowledge diffusion indicators, such as breadth and speed...
Purpose: Our study proposes a bootstrapping-based method to automatically extract data-usage statements from academic texts. Design/methodology/approach: The method for extraction starts with seed entities and iteratively learns patterns from unlabeled text. In each iteration, new patterns are constructed and added to the pattern list based on their calculated score. Three seed-selection strategies are also proposed in this paper. Findings: The performance of the method is verified by means of experiments on real data collected from computer...
An author identifier (ID) is essential for many downstream tasks, such as co-author network and scientist mobility analysis. Although PubMed is a widely used database, its author IDs are not officially provided by the National Institutes of Health (NIH), which restricts some identifier-based research and systems. This study exploited three open bibliographic databases, Aminer, Microsoft Academic Graph (MAG), and Semantic Scholar (S2), to associate author IDs with PubMed. For this purpose, paper linking was performed in order to mine links...
Purpose: This paper aims to identify data set entities in scientific literature. To address poor recognition caused by a lack of training corpora in existing studies, a distant supervised learning-based approach is proposed to identify data set entities automatically from large-scale scientific literature in an open domain. Design/methodology/approach: Firstly, the authors use a dictionary combined with a bootstrapping strategy to create a labelled corpus to apply supervised learning. Secondly, a bidirectional encoder representation from transformers (BERT)-based neural...
TextRank is a variant of PageRank typically used in graphs that represent documents, where vertices denote terms and edges denote relations between terms. Quite often the relation between terms is simple term co-occurrence within a fixed window of k terms. The output of TextRank when applied iteratively is a score for each vertex, i.e. a term weight, that can be used for information retrieval (IR) just like conventional term frequency based term weights.
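The procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the unweighted, undirected graph, the damping factor of 0.85, the fixed iteration count, and the function names `cooccurrence_graph` and `textrank` are all assumptions made for the example.

```python
from collections import defaultdict


def cooccurrence_graph(terms, k=2):
    """Link two distinct terms with an undirected edge when they
    co-occur within a sliding window of k consecutive terms."""
    graph = defaultdict(set)
    for i, term in enumerate(terms):
        # The k - 1 terms that follow position i share a window with it.
        for other in terms[i + 1 : i + k]:
            if other != term:
                graph[term].add(other)
                graph[other].add(term)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Iteratively score each vertex, PageRank style; the damping
    value and iteration count are assumed defaults."""
    scores = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # Each neighbour u passes an equal share of its score to v.
        scores = {
            v: (1 - damping)
            + damping * sum(scores[u] / len(graph[u]) for u in graph[v])
            for v in graph
        }
    return scores


if __name__ == "__main__":
    text = "graph based ranking scores terms in a term graph"
    weights = textrank(cooccurrence_graph(text.split(), k=3))
    for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{weight:.3f}")
```

The resulting per-term scores could then stand in for term frequency in a retrieval weighting scheme, which is the use the abstract points to.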
This brief communication finds a clear and universal inequality of authors’ reference reuse behaviour. We observe that a few references are reused many times in an author’s oeuvre, while most of his or her references only occur in the reference list a quite limited number of times. A power law distribution depicts such inequality. We particularly utilise the value, [Formula: see text], to characterise the nuanced difference in inequalities. A pilot study based upon Microsoft Academic Graph (MAG) shows that [Formula: see text] tends to be normally distributed,...
This study quantifies and analyzes the individual-level abilities of scientists in utilizing either an exploration or exploitation strategy. Specifically, we present a Research Strategy Q model, which untangles the coupling effect of scientists’ research ability (Qα) and strategy ability (Eαπ) on research performance. Qα indicates the fundamental ability to publish high-quality papers, while Eαπ indicates proficiency in terms of research strategies. Five strategies proposed by our previous study are employed. We generate synthetic data and collect...
Topic analysis aims to study topic evolution and trends in order to help researchers understand the process of knowledge creation. This paper develops a novel framework, which we use to demonstrate, forecast, and explain topic evolution from the perspective of the geometrical motion of topic embeddings generated by pretrained language models. Our dataset comprises approximately 15 million papers in the computer science field, with 7,000 “fields of study” used to represent topics. First, we demonstrated that over 80% of the topics had undergone obvious...