NFDI4DS | UHH-SEMS - Publication Details

Shu‐Kai Hsieh

ORCID: 0000-0001-9674-1249

About

Contact & Profiles

A5059604503

Research Areas

Natural Language Processing Techniques
Topic Modeling
Advanced Text Analysis Techniques
Semantic Web and Ontologies
Language, Metaphor, and Cognition
Syntax, Semantics, Linguistic Variation
Speech and dialogue systems
Sentiment Analysis and Opinion Mining
Biomedical Text Mining and Ontologies
Second Language Acquisition and Learning
Lexicography and Language Studies
Translation Studies and Practices
Authorship Attribution and Profiling
Language and cultural evolution
Language, Discourse, Communication Strategies
Text and Document Classification Technologies
Categorization, perception, and language
Text Readability and Simplification
Linguistics, Language Diversity, and Identity
Swearing, Euphemism, Multilingualism
Neurobiology of Language and Bilingualism
Multimodal Machine Learning Applications
linguistics and terminology studies
Reading and Literacy Development
Wikis in Education and Collaboration

Fu Jen Catholic University
2025

Institute of Linguistics, Academia Sinica
2013-2024

National Taiwan University
2014-2024

National Taiwan Normal University
2007-2019

Academia Sinica
2006-2018

National Tsing Hua University
2018

Hong Kong Polytechnic University
2018

Centre National de la Recherche Scientifique
2018

Delft University of Technology
2014-2015

National Ilan University
2007

Reasoning Over the Glyphs: Evaluation of LLM's Decipherment of Rare Scripts

OPENALEX - Publications

Ying-Chun Shih, Zhiwei Lin, Shu‐Kai Hsieh

We explore the capabilities of LVLMs and LLMs in deciphering rare scripts not encoded in Unicode. We introduce a novel approach to construct a multimodal dataset of linguistic puzzles involving such scripts, utilizing a tokenization method for language glyphs. Our methods include the Picture Method for LVLMs and the Description Method for LLMs, enabling these models to tackle these challenges. We conduct experiments using prominent models, GPT-4o, Gemini, and Claude 3.5 Sonnet, on linguistic puzzles. Our findings reveal the strengths and limitations of current AI methods in decipherment,...

10.48550/arxiv.2501.17785 preprint EN arXiv (Cornell University) 2025-01-29

Probing Large Language Models in Reasoning and Translating Complex Linguistic Puzzles

Zhiwei Lin, Ying-Chun Shih, Shu‐Kai Hsieh

This paper investigates the utilization of Large Language Models (LLMs) for solving complex linguistic puzzles, a domain requiring advanced reasoning and adept translation capabilities akin to human cognitive processes. We explore specific prompting techniques designed to enhance the ability of LLMs to reason and elucidate their decision-making pathways, with a focus on Input-Output Prompting (IO), Chain-of-Thought Prompting (CoT), and Solo Performance Prompting (SPP). Utilizing datasets from the Puzzling Machine Competition and various...

10.48550/arxiv.2502.00817 preprint EN arXiv (Cornell University) 2025-02-02

Rethinking Chinese word segmentation

Chu‐Ren Huang, Petr Šimon, Shu‐Kai Hsieh, Laurent Prévot

This paper addresses two remaining challenges in Chinese word segmentation. The challenge in HLT is to find a robust segmentation method that requires no prior lexical knowledge and no extensive training to adapt to new types of data. The challenge in modelling human cognition and acquisition is to segment words efficiently without using knowledge of wordhood. We propose a radical method of word segmentation to meet both challenges. The most critical concept we introduce is the classification of a string of character-boundaries (CB's) into either word-boundaries (WB's)...

10.3115/1557769.1557791 article EN 2007-01-01
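
The abstract above frames segmentation as labelling each boundary between adjacent characters as a word boundary or not. The sketch below illustrates only that representation, not the paper's classifier: it converts a segmented sentence into one binary label per character boundary and back. The function names and the 0/1 encoding are hypothetical choices made for this illustration, and the example sentence is invented.

```python
from typing import List


def words_to_boundaries(words: List[str]) -> List[int]:
    """Return one label per boundary between adjacent characters.

    1 marks a word boundary, 0 marks a boundary inside a word.
    Assumes every word in `words` is non-empty (illustrative encoding only).
    """
    labels: List[int] = []
    for word in words:
        # Boundaries between characters of the same word are not word boundaries.
        labels.extend([0] * (len(word) - 1))
        # The boundary after the last character of a word is a word boundary.
        labels.append(1)
    # The position after the final character is not between two characters.
    return labels[:-1]


def boundaries_to_words(text: str, labels: List[int]) -> List[str]:
    """Rebuild the word list from the text and its boundary labels."""
    if len(labels) != max(len(text) - 1, 0):
        raise ValueError("expected one label per boundary between characters")
    words: List[str] = []
    start = 0
    for i, label in enumerate(labels):
        if label == 1:
            # Boundary i sits between text[i] and text[i + 1].
            words.append(text[start:i + 1])
            start = i + 1
    if text:
        words.append(text[start:])
    return words


if __name__ == "__main__":
    segmented = ["我們", "喜歡", "語言學"]
    labels = words_to_boundaries(segmented)
    print(labels)
    print(boundaries_to_words("".join(segmented), labels))
```

With this encoding, a sentence of n characters yields n - 1 labels, so any binary classifier over character boundaries can be plugged in where the labels are produced.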

Wiktionary and NLP

Enrique Navarro, Franck Sajous, Bruno Gaume, Laurent Prévot, Shu‐Kai Hsieh, and 3 more

Wiktionary, a satellite of the Wikipedia initiative, can be seen as a potential resource for Natural Language Processing. However, it needs to be processed before it can be used efficiently as an NLP resource. After describing the relevant aspects of Wiktionary for our purposes, we focus on its structural properties. Then, we describe how we extracted synonymy networks from this resource. We provide an in-depth study of these synonymy networks and compare them to those from traditional resources. Finally, we describe two methods for semi-automatically improving this network by...

10.3115/1699765.1699768 article EN 2009-01-01

Assessing Text Readability Using Hierarchical Lexical Relations Retrieved from WordNet

Shu-Yen Lin, Cheng-chao Su, Yu-Da Lai, Li-Chin Yang, Shu‐Kai Hsieh

Although some traditional readability formulas have shown high predictive validity in the r=0.8 range and above (Chall & Dale, 1995), they are generally not based on genuine linguistic processing factors, but on statistical correlations (Crossley et al., 2008). Improvement of readability assessment should focus on finding variables that truly represent the comprehensibility of text as well as the indices that accurately measure the correlations. In this study, we explore the hierarchical relations between lexical items based on the conceptual...

10.30019/ijclclp.200903.0003 article EN 2009-03-01

Exploring interoperability of language resources: the case of cross-lingual semi-automatic enrichment of wordnets

Claudia Soria, Monica Monachini, Francesca Bertagna, Nicoletta Calzolari, Chu‐Ren Huang, and 3 more

10.1007/s10579-009-9082-3 article EN Language Resources and Evaluation 2009-02-10

Religion, cognition, and emotion: What can automated text analysis tell us about culture?

Louise Sundararajan, Rachel Sing‐Kiat Ting, Shu‐Kai Hsieh, Seong‐Hyeon Kim

As cultural conflicts are intensifying locally and internationally in the aftermath of the COVID-19 pandemic, fine-tuned investigation of culture/religion, especially that of marginalized populations, holds the potential to reduce disparity and suffering in the global village. This study used 3 textual analysis programs (Topic Modeling, C-LIWC, and SSWC-Chinese) to shed light on the differences in cognition and emotion between two communities with radically different religious beliefs (Bimo and Christianity) among the Yi ethnic minority...

10.1037/hum0000201 article EN The Humanistic Psychologist 2020-11-12

CWN-LMF

Lung‐Hao Lee, Shu‐Kai Hsieh, Chu‐Ren Huang

Lexical Markup Framework (LMF, ISO-24613) is the ISO standard which provides a common standardized framework for the construction of natural language processing lexicons. LMF facilitates data exchange among computational linguistic resources, and also promises a convenient uniformity for future application. This study describes the design and implementation of the WordNet-LMF used to represent lexical semantics in Chinese WordNet. The compiled CWN-LMF will be released to the community for linguistic research.

10.3115/1690299.1690317 article EN 2009-01-01

Neuro-Cognitive Differences in Semantic Processing Between Native Speakers and Proficient Learners of Mandarin Chinese

Marco Lai, Shu‐Kai Hsieh, Chia‐Lin Lee, Lily I-wen Su, Te-Hsin Liu, and 3 more

The present study aimed to investigate the neural mechanism underlying semantic processing in Mandarin Chinese adult learners, focusing on learners who were Indo-European language speakers with advanced levels of proficiency in Chinese. We used the functional magnetic resonance imaging technique and a semantic judgment task to test 24 learners (L2 group) and 26 native speakers (L1 group) as a control group. In the task, participants were asked to indicate whether two-character pairs were related in meaning. Compared to the L1 group, the L2 group had greater activation...

10.3389/fpsyg.2021.781304 article EN cc-by Frontiers in Psychology 2021-11-18
