NFDI4DS | UHH-SEMS - Publication Details

Kanyao Han

ORCID: 0000-0003-2100-8637

About

Contact & Profiles

A5048934151

Research Areas

Topic Modeling
Natural Language Processing Techniques
Semantic Web and Ontologies
Advanced Text Analysis Techniques
Computational and Text Analysis Methods
Biomedical Text Mining and Ontologies
Sentiment Analysis and Opinion Mining
Data Quality and Management
Wikis in Education and Collaboration

University of Illinois Urbana-Champaign
2020-2025

Impact Classification within and beyond Academia: Domain-Robust Annotation and the Capacity of Large Language Models

OPENALEX - Publications

María Becker Kanyao Han Rezvaneh Rezapour Jana Diesner Andreas Witt

Prior analyses and assessments of the impact of scientific research have mainly relied on analyzing its scope within academia and its influence within scholarly circles. However, by not considering the broader societal, economic, and policy implications of projects, these studies overlook the ways in which discoveries contribute to technological innovation, public health improvements, environmental sustainability, and other areas of real-world application. We expand upon this prior work by developing and validating...

10.21203/rs.3.rs-5543205/v1 preprint EN cc-by Research Square (Research Square) 2025-01-23

An expert‐in‐the‐loop method for domain‐specific document categorization based on small training data

Kanyao Han Rezvaneh Rezapour Katia Nakamura Dikshya Devkota Daniel C. Miller and 1 more

Automated text categorization methods are of broad relevance for domain experts since they free researchers and practitioners from manual labeling, save their resources (e.g., time, labor), and enrich the data with information helpful to study substantive questions. Despite a variety of newly developed methods that require substantial amounts of annotated data, little is known about how to build models when (a) labeling texts with categories requires expertise and/or in‐depth reading, (b) only a few documents...

10.1002/asi.24714 article EN cc-by Journal of the Association for Information Science and Technology 2022-10-10

Two-Stage Graph-Augmented Summarization of Scientific Documents

Rezvaneh Rezapour Yubin Ge Kanyao Han Ray Jeong Jana Diesner

10.18653/v1/2024.nlp4science-1.5 article EN 2024-01-01

SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics

Zhiwen You Kanyao Han Haotian Zhu Bertram Ludaescher Jana Diesner

10.18653/v1/2024.emnlp-main.350 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01

SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics

Zhiwen You Kanyao Han Haotian Zhu Bertram Ludäscher Jana Diesner

Prompt-based fine-tuning has become an essential method for eliciting information encoded in pre-trained language models for a variety of tasks, including text classification. For multi-class classification tasks, prompt-based fine-tuning under low-resource scenarios has resulted in performance levels comparable to those of fully fine-tuning methods. Previous studies have used crafted prompt templates and verbalizers, mapping from the label terms space to the class space, to solve the classification problem as a masked language modeling task. However, cross-domain...

10.48550/arxiv.2410.01946 preprint EN arXiv (Cornell University) 2024-10-02

WikiCSSH - Computer Science Subject Headings from Wikipedia

Kanyao Han Pingjing Yang Shubhanshu Mishra Jana Diesner

10.13012/b2idb-0424970_v1 article EN 2020-01-01

WikiCSSH: Extracting and Evaluating Computer Science Subject Headings from Wikipedia

Kanyao Han Pingjing Yang Shubhanshu Mishra Jana Diesner

Hierarchical domain-specific classification schemas (or subject heading vocabularies) are often used to identify, classify, and disambiguate concepts that occur in scholarly articles. In this work, we develop, apply, and evaluate a human-in-the-loop workflow that first extracts an initial category tree from crowd-sourced Wikipedia data, then combines community detection, machine learning, and hand-crafted heuristics or rules to prune the tree. This work resulted in WikiCSSH, a large-scale, hierarchically...

10.48550/arxiv.2109.04945 preprint EN cc-by arXiv (Cornell University) 2021-01-01
