- Natural Language Processing Techniques
- Topic Modeling
- Advanced Text Analysis Techniques
- Semantic Web and Ontologies
- Language, Metaphor, and Cognition
- Syntax, Semantics, Linguistic Variation
- Speech and dialogue systems
- Sentiment Analysis and Opinion Mining
- Biomedical Text Mining and Ontologies
- Second Language Acquisition and Learning
- Lexicography and Language Studies
- Translation Studies and Practices
- Authorship Attribution and Profiling
- Language and cultural evolution
- Language, Discourse, Communication Strategies
- Text and Document Classification Technologies
- Categorization, perception, and language
- Text Readability and Simplification
- Linguistics, Language Diversity, and Identity
- Swearing, Euphemism, Multilingualism
- Neurobiology of Language and Bilingualism
- Multimodal Machine Learning Applications
- Linguistics and terminology studies
- Reading and Literacy Development
- Wikis in Education and Collaboration
Fu Jen Catholic University
2025
Institute of Linguistics, Academia Sinica
2013-2024
National Taiwan University
2014-2024
National Taiwan Normal University
2007-2019
Academia Sinica
2006-2018
National Tsing Hua University
2018
Hong Kong Polytechnic University
2018
Centre National de la Recherche Scientifique
2018
Delft University of Technology
2014-2015
National Ilan University
2007
We explore the capabilities of LVLMs and LLMs in deciphering rare scripts not encoded in Unicode. We introduce a novel approach to construct a multimodal dataset of linguistic puzzles involving such scripts, utilizing a tokenization method for language glyphs. Our methods include the Picture Method for LVLMs and the Description Method for LLMs, enabling these models to tackle these challenges. We conduct experiments using prominent models, GPT-4o, Gemini, and Claude 3.5 Sonnet, on linguistic puzzles. Our findings reveal the strengths and limitations of current AI decipherment,...
This paper investigates the utilization of Large Language Models (LLMs) for solving complex linguistic puzzles, a domain requiring advanced reasoning and adept translation capabilities akin to human cognitive processes. We explore specific prompting techniques designed to enhance the ability of LLMs to reason and elucidate their decision-making pathways, with a focus on Input-Output Prompting (IO), Chain-of-Thought Prompting (CoT), and Solo Performance Prompting (SPP). Utilizing datasets from the Puzzling Machine Competition and various...
This paper addresses two remaining challenges in Chinese word segmentation. The challenge in HLT is to find a robust segmentation method that requires no prior lexical knowledge and no extensive training to adapt to new types of data. The challenge in modelling human cognition and acquisition is to segment words efficiently without using knowledge of wordhood. We propose a radical method to meet both challenges. The most critical concept we introduce is the classification of a string of character-boundaries (CB's) into either word-boundaries (WB's)...
Wiktionary, a satellite of the Wikipedia initiative, can be seen as a potential resource for Natural Language Processing. It must, however, be processed before it can be used efficiently as an NLP resource. After describing the relevant aspects of Wiktionary for our purposes, we focus on its structural properties. Then, we describe how we extracted synonymy networks from this resource. We provide an in-depth study of these synonymy networks and compare them to those from traditional resources. Finally, we describe two methods for semi-automatically improving this network by...
Although some traditional readability formulas have shown high predictive validity in the r=0.8 range and above (Chall & Dale, 1995), they are generally not based on genuine linguistic processing factors, but on statistical correlations (Crossley et al., 2008). Improvement of readability assessment should focus on finding variables that truly represent the comprehensibility of text as well as indices that accurately measure the correlations. In this study, we explore the hierarchical relations between lexical items conceptual...
As cultural conflicts are intensifying locally and internationally in the aftermath of the COVID-19 pandemic, fine-tuned investigation of culture/religion, especially that of marginalized populations, holds the potential to reduce disparity and suffering in the global village. This study used 3 textual analysis programs (Topic Modeling, C-LIWC, and SSWC-Chinese) to shed light on the differences in cognition and emotion between two communities with radically different religious beliefs (Bimo and Christianity) among the Yi ethnic minority...
Lexical Markup Framework (LMF, ISO-24613) is the ISO standard which provides a common standardized framework for the construction of natural language processing lexicons. LMF facilitates data exchange among computational linguistic resources, and also promises a convenient uniformity for future application. This study describes the design and implementation of the WordNet-LMF used to represent lexical semantics in Chinese WordNet. The compiled CWN-LMF will be released to the community for linguistic research.
The present study aimed to investigate the neural mechanism underlying semantic processing in Mandarin Chinese adult learners, focusing on learners who were Indo-European language speakers with advanced levels of proficiency in Chinese. We used the functional magnetic resonance imaging technique and a semantic judgment task to test 24 learners (L2 group) and 26 native speakers (L1 group) as a control group. In the task, participants were asked to indicate whether two-character pairs were related in meaning. Compared to the L1 group, the L2 group had greater activation...