Shu‐Kai Hsieh

ORCID: 0000-0001-9674-1249
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Advanced Text Analysis Techniques
  • Semantic Web and Ontologies
  • Language, Metaphor, and Cognition
  • Syntax, Semantics, Linguistic Variation
  • Speech and dialogue systems
  • Sentiment Analysis and Opinion Mining
  • Biomedical Text Mining and Ontologies
  • Second Language Acquisition and Learning
  • Lexicography and Language Studies
  • Translation Studies and Practices
  • Authorship Attribution and Profiling
  • Language and cultural evolution
  • Language, Discourse, Communication Strategies
  • Text and Document Classification Technologies
  • Categorization, perception, and language
  • Text Readability and Simplification
  • Linguistics, Language Diversity, and Identity
  • Swearing, Euphemism, Multilingualism
  • Neurobiology of Language and Bilingualism
  • Multimodal Machine Learning Applications
  • Linguistics and Terminology Studies
  • Reading and Literacy Development
  • Wikis in Education and Collaboration

Fu Jen Catholic University
2025

Institute of Linguistics, Academia Sinica
2013-2024

National Taiwan University
2014-2024

National Taiwan Normal University
2007-2019

Academia Sinica
2006-2018

National Tsing Hua University
2018

Hong Kong Polytechnic University
2018

Centre National de la Recherche Scientifique
2018

Delft University of Technology
2014-2015

National Ilan University
2007

We explore the capabilities of LVLMs and LLMs in deciphering rare scripts that are not encoded in Unicode. We introduce a novel approach to construct a multimodal dataset of linguistic puzzles involving such scripts, utilizing a tokenization method for language glyphs. Our methods include the Picture Method for LVLMs and the Description Method for LLMs, enabling these models to tackle these challenges. We conduct experiments using prominent models, including GPT-4o, Gemini, and Claude 3.5 Sonnet, on the puzzles. The findings reveal the strengths and limitations of current AI in decipherment,...

10.48550/arxiv.2501.17785 preprint EN arXiv (Cornell University) 2025-01-29
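A minimal sketch of the kind of pipeline the abstract describes, assuming glyphs of an unencoded script are mapped to stable placeholder tokens so a text-only LLM can reason over the puzzle; the names used here (GlyphTokenizer, build_text_prompt) are illustrative and are not the paper's actual code.

```python
# Illustrative sketch: assign placeholder tokens to glyphs of a script that
# has no Unicode code points, so an LLM can reason over the puzzle as text
# while an LVLM would instead receive the original glyph images.
from dataclasses import dataclass, field

@dataclass
class GlyphTokenizer:
    """Maps each glyph (identified by file name here) to a stable token."""
    table: dict = field(default_factory=dict)

    def token_for(self, glyph_id: str) -> str:
        if glyph_id not in self.table:
            self.table[glyph_id] = f"<GLYPH_{len(self.table):03d}>"
        return self.table[glyph_id]

def build_text_prompt(glyph_sequence, glosses, tokenizer):
    """Description-style prompt: glyphs become placeholder tokens, and known
    gloss pairs are listed as evidence for the model to reason over."""
    tokens = " ".join(tokenizer.token_for(g) for g in glyph_sequence)
    evidence = "\n".join(f"{tokenizer.token_for(g)} = {m}" for g, m in glosses)
    return (
        "You are given words in an undeciphered script.\n"
        f"Known correspondences:\n{evidence}\n"
        f"Translate the sequence: {tokens}"
    )

tok = GlyphTokenizer()
print(build_text_prompt(
    ["glyph_a.png", "glyph_b.png"],
    [("glyph_a.png", "water"), ("glyph_b.png", "mountain")],
    tok,
))
```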

This paper investigates the utilization of Large Language Models (LLMs) for solving complex linguistic puzzles, a domain requiring advanced reasoning and adept translation capabilities akin to human cognitive processes. We explore specific prompting techniques designed to enhance the ability of LLMs to reason and to elucidate their decision-making pathways, with a focus on Input-Output Prompting (IO), Chain-of-Thought Prompting (CoT), and Solo Performance Prompting (SPP). Utilizing datasets from the Puzzling Machine Competition and various...

10.48550/arxiv.2502.00817 preprint EN arXiv (Cornell University) 2025-02-02
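A small sketch contrasting the three prompting styles named above as plain string templates; the wording is illustrative and does not reproduce the prompts used in the paper.

```python
# Sketch of the three prompting styles named in the abstract, written as
# plain string templates over a toy puzzle statement.
PUZZLE = "Given the parallel sentences above, translate 'X' into English."

def io_prompt(puzzle: str) -> str:
    # Input-Output Prompting: ask for the answer directly.
    return f"{puzzle}\nAnswer:"

def cot_prompt(puzzle: str) -> str:
    # Chain-of-Thought: ask the model to lay out intermediate reasoning first.
    return f"{puzzle}\nLet's reason step by step before giving the final answer."

def spp_prompt(puzzle: str) -> str:
    # Solo Performance Prompting: one model simulates several personas
    # (e.g. a field linguist and a translator) that discuss, then answer.
    return (
        f"{puzzle}\n"
        "Simulate a linguist and a translator discussing the puzzle, "
        "then state the agreed final answer."
    )

for build in (io_prompt, cot_prompt, spp_prompt):
    print(build(PUZZLE), end="\n\n")
```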

This paper addresses two remaining challenges in Chinese word segmentation. The challenge for HLT is to find a robust segmentation method that requires no prior lexical knowledge and no extensive training to adapt to new types of data. The challenge for modelling human cognition and acquisition is to segment words efficiently without using knowledge of wordhood. We propose a radical method to meet both challenges. The most critical concept we introduce is the classification of a string of character-boundaries (CB's) into either word-boundaries (WB's)...

10.3115/1557769.1557791 article EN 2007-01-01
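The boundary-classification idea can be sketched as follows: treat every gap between adjacent characters as a binary decision and recover words from those decisions. The toy decision rule below (a hypothetical bigram table) stands in for whatever statistical evidence the paper actually uses.

```python
# Minimal sketch of the character-boundary view of segmentation: every gap
# between two characters is classified as a word boundary (WB) or not.
def segment(chars, is_word_boundary):
    """Turn a character list plus per-gap WB decisions into words."""
    words, current = [], [chars[0]]
    for i in range(1, len(chars)):
        if is_word_boundary(chars[i - 1], chars[i]):
            words.append("".join(current))
            current = [chars[i]]
        else:
            current.append(chars[i])
    words.append("".join(current))
    return words

# Hypothetical decision rule: a gap is a WB if the surrounding bigram was
# never observed as a within-word bigram.
SEEN_BIGRAMS = {"今天", "天氣", "很好"}
wb = lambda a, b: a + b not in SEEN_BIGRAMS

print(segment(list("今天天氣很好"), wb))  # ['今天', '天氣', '很好'] under this toy rule
```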

Wiktionary, a satellite of the Wikipedia initiative, can be seen as a potential resource for Natural Language Processing. It requires, however, to be processed before being used efficiently as an NLP resource. After describing the aspects of Wiktionary relevant to our purposes, we focus on its structural properties. Then, we describe how we extracted synonymy networks from this resource. We provide an in-depth study of these networks and compare them to those derived from traditional resources. Finally, we propose two methods for semi-automatically improving the network by...

10.3115/1699765.1699768 article EN 2009-01-01
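A hedged sketch of the synonymy-network construction step, assuming (word, synonym) pairs have already been extracted from dictionary entries; networkx is used here for convenience, and the example pairs are invented rather than taken from Wiktionary.

```python
# Build a synonymy network as an undirected graph and inspect a couple of
# structural properties of the kind studied in the paper.
import networkx as nx

pairs = [
    ("big", "large"), ("large", "huge"), ("big", "huge"),
    ("small", "little"), ("little", "tiny"),
]

G = nx.Graph()
G.add_edges_from(pairs)

print("nodes:", G.number_of_nodes(), "edges:", G.number_of_edges())
print("connected components:", nx.number_connected_components(G))
print("average clustering:", nx.average_clustering(G))
```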

Although some traditional readability formulas have shown high predictive validity in the r = 0.8 range and above (Chall & Dale, 1995), they are generally not based on genuine linguistic processing factors, but on statistical correlations (Crossley et al., 2008). Improvement of readability assessment should focus on finding variables that truly represent the comprehensibility of a text, as well as indices that accurately measure the correlations. In this study, we explore the hierarchical relations between lexical items and conceptual...

10.30019/ijclclp.200903.0003 article EN 2009-03-01
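To make the cited r = 0.8 figure concrete, the snippet below correlates hypothetical formula scores with expert-assigned grade levels; the numbers are invented and serve only to show what predictive validity expressed as a Pearson correlation looks like.

```python
# Correlate readability-formula predictions with reference grade levels.
from statistics import correlation  # Pearson's r; available in Python 3.10+

predicted = [2.1, 3.4, 4.0, 5.2, 6.8, 7.9]   # invented formula scores
reference = [2,   3,   5,   5,   6,   8]     # invented expert grade levels

r = correlation(predicted, reference)
print(f"Pearson r = {r:.2f}")
```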

As cultural conflicts are intensifying locally and internationally in the aftermath of the COVID-19 pandemic, fine-tuned investigation of culture and religion, especially that of marginalized populations, holds potential to reduce disparity and suffering in the global village. This study used three textual analysis programs (Topic Modeling, C-LIWC, and SSWC-Chinese) to shed light on differences in cognition and emotion between two communities with radically different religious beliefs (Bimo and Christianity) among the Yi ethnic minority...

10.1037/hum0000201 article EN The Humanistic Psychologist 2020-11-12
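A minimal sketch of the topic-modeling component on a toy corpus, using scikit-learn's LDA; the study's actual corpora and the C-LIWC and SSWC-Chinese lexicon-based analyses are not reproduced here, and the documents below are invented.

```python
# Fit a two-topic LDA model on a toy corpus and print the top terms per topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "ritual ancestor spirit offering mountain",
    "church prayer hymn scripture sunday",
    "harvest field rain mountain village",
    "faith prayer scripture community church",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-4:][::-1]]
    print(f"topic {k}: {top}")
```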

Lexical Markup Framework (LMF, ISO-24613) is the ISO standard which provides a common standardized framework for the construction of natural language processing lexicons. LMF facilitates data exchange among computational linguistic resources and also promises convenient uniformity for future applications. This study describes the design and implementation of WordNet-LMF, used to represent lexical semantics in the Chinese WordNet. The compiled CWN-LMF will be released to the community for linguistic research.

10.3115/1690299.1690317 article EN 2009-01-01
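A minimal sketch of serializing one lexical entry in an LMF-style XML layout with Python's standard library; the element names follow common LMF conventions (LexicalResource, Lexicon, LexicalEntry, Lemma, Sense), but the exact schema used for CWN-LMF may differ, and the example entry is invented.

```python
# Emit a tiny LMF-style XML fragment for one Chinese lexical entry.
import xml.etree.ElementTree as ET

resource = ET.Element("LexicalResource")
lexicon = ET.SubElement(resource, "Lexicon",
                        {"language": "zh", "label": "Chinese WordNet (example)"})

entry = ET.SubElement(lexicon, "LexicalEntry", {"id": "cwn_example_1"})
ET.SubElement(entry, "Lemma", {"writtenForm": "打"})
ET.SubElement(entry, "Sense", {"id": "cwn_example_1_s1", "synset": "syn_0001"})

print(ET.tostring(resource, encoding="unicode"))
```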

The present study aimed to investigate the neural mechanism underlying semantic processing in adult learners of Mandarin Chinese, focusing on learners who were Indo-European language speakers with advanced levels of proficiency in Chinese. We used the functional magnetic resonance imaging technique and a semantic judgment task to test 24 learners (L2 group) and 26 native speakers (L1 group) as the control group. In the task, participants were asked to indicate whether two-character word pairs were related in meaning. Compared with the L1 group, the L2 group had greater activation...

10.3389/fpsyg.2021.781304 article EN cc-by Frontiers in Psychology 2021-11-18