Kanyao Han

ORCID: 0000-0003-2100-8637
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Semantic Web and Ontologies
  • Advanced Text Analysis Techniques
  • Computational and Text Analysis Methods
  • Biomedical Text Mining and Ontologies
  • Sentiment Analysis and Opinion Mining
  • Data Quality and Management
  • Wikis in Education and Collaboration

University of Illinois Urbana-Champaign
2020-2025

Prior analyses and assessments of the impact of scientific research have mainly relied on analyzing its scope within academia and its influence within scholarly circles. However, by not considering the broader societal, economic, and policy implications of research projects, these studies overlook the ways in which scientific discoveries contribute to technological innovation, public health improvements, environmental sustainability, and other areas of real-world application. We expand upon this prior work by developing and validating...

10.21203/rs.3.rs-5543205/v1 preprint EN cc-by Research Square (Research Square) 2025-01-23

Automated text categorization methods are of broad relevance for domain experts since they free researchers and practitioners from manual labeling, save their resources (e.g., time, labor), and enrich the data with information helpful for studying substantive questions. Despite a variety of newly developed methods that require substantial amounts of annotated data, little is known about how to build models when (a) labeling texts with categories requires expertise and/or in-depth reading, (b) only a few documents...

10.1002/asi.24714 article EN cc-by Journal of the Association for Information Science and Technology 2022-10-10

10.18653/v1/2024.emnlp-main.350 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01

Prompt-based fine-tuning has become an essential method for eliciting information encoded in pre-trained language models for a variety of tasks, including text classification. For multi-class classification tasks, prompt-based fine-tuning under low-resource scenarios has resulted in performance levels comparable to those of fully fine-tuned methods. Previous studies have used crafted prompt templates and verbalizers, which map from the label terms space to the class space, to solve the classification problem as a masked language modeling task. However, cross-domain...

10.48550/arxiv.2410.01946 preprint EN arXiv (Cornell University) 2024-10-02

Hierarchical domain-specific classification schemas (or subject heading vocabularies) are often used to identify, classify, and disambiguate concepts that occur in scholarly articles. In this work, we develop, apply, and evaluate a human-in-the-loop workflow that first extracts an initial category tree from crowd-sourced Wikipedia data, and then combines community detection, machine learning, and hand-crafted heuristics or rules to prune the tree. This work resulted in WikiCSSH, a large-scale, hierarchically...

10.48550/arxiv.2109.04945 preprint EN cc-by arXiv (Cornell University) 2021-01-01