NFDI4DS | UHH-SEMS - Publication Details

Yuan Sun

ORCID: 0000-0003-0565-9659

OpenAlex ID: A5101731298

Research Areas

Natural Language Processing Techniques
Topic Modeling
Advanced Text Analysis Techniques
Advanced Computational Techniques and Applications
Educational Technology and Assessment
Semantic Web and Ontologies
Web Data Mining and Analysis
Leaf Properties and Growth Measurement
Text and Document Classification Technologies
Advanced Neural Network Applications
Multimodal Machine Learning Applications
Remote Sensing in Agriculture
Data Quality and Management
Advanced Graph Neural Networks
Data Management and Algorithms
China's Ethnic Minorities and Relations
Face recognition and analysis
AI in cancer detection
Mobile Agent-Based Network Management
Neural Networks and Applications
Remote Sensing and LiDAR Applications
Remote Sensing and Land Use
Digital Imaging for Blood Diseases
Petri Nets in System Modeling
Speech Recognition and Synthesis

Minzu University of China
2014-2025

People’s Hospital of Rizhao
2023

Tongji Zhejiang College
2019

Tsinghua University
2003

Convolutional neural network‐based multi‐label classification of PCB defects

OPENALEX - Publications

Zhang Li Yongqing Jin Xuesong Yang Xia Li Xiaodong Duan and 2 more

Due to the rapid development of printed circuit board (PCB) design technology, the inspection of PCB surface defects has become an increasingly critical issue. The classification of PCB defects facilitates the identification of the root causes of defects. As PCB defects may be intensive, the actual classification problem should not be considered as a binary or multi-category problem. This type of problem is called multi-label classification. Recently, as one of the deep learning frameworks, the convolutional neural network (CNN) has made a major breakthrough in many areas of image processing, especially...

10.1049/joe.2018.8279 article EN cc-by The Journal of Engineering 2018-08-18
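The abstract treats PCB defect classification as a multi-label problem rather than a multi-class one, because a single board image can show several defect types at once. The sketch below illustrates that difference under a few assumptions: the CNN is taken to output one logit per defect type, and the defect names, logits, and 0.5 threshold are all hypothetical. Each label gets its own sigmoid and its own binary cross-entropy term, so labels do not compete in a single softmax. This is a generic formulation, not necessarily the paper's exact loss.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits: Sequence[float], label_names: Sequence[str],
                   threshold: float = 0.5) -> List[str]:
    """Return every label whose independent sigmoid probability reaches the threshold."""
    return [name for name, z in zip(label_names, logits) if sigmoid(z) >= threshold]


def multilabel_bce(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over labels; each label is its own yes/no problem."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)


# Hypothetical defect classes and network outputs for one PCB image.
labels = ["missing_hole", "open_circuit", "short", "spur"]
logits = [2.1, -1.3, 0.4, -3.0]
print(predict_labels(logits, labels))  # ['missing_hole', 'short']
print(multilabel_bce(logits, [1, 0, 1, 0]))
```

Because the decisions are independent, an image can receive zero, one, or several labels. A softmax with argmax would always pick exactly one.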

Tibetan Question Generation Based on Key Sentence and Knowledge Graph

Yan Zhuang Yuan Sun Yijie Li Sisi Liu Xiaobing Zhao

Question generation aims to generate questions according to the given context and answer, and it has made significant progress in both Chinese and English. However, research on Tibetan question generation is still in its early stages, with key challenges including the omission of crucial keywords, which renders questions unanswerable. Existing large-scale models, such as GPT or BERT, do not provide robust support for low-resource languages. To solve this problem, this paper proposes Tibetan question generation based on key sentences and a knowledge graph. The generator...

10.1145/3725531 article EN ACM Transactions on Asian and Low-Resource Language Information Processing 2025-03-25

Lenet-5 Convolution Neural Network with Mish Activation Function and Fixed Memory Step Gradient Descent Method

Zhihao Zhang Zan Yang Yuan Sun Yang-Fan Wu Yidan Xing

The convolutional neural network is the most important algorithm in the field of deep learning. The traditional convolutional neural network usually uses Sigmoid or ReLU as the activation function, but Sigmoid saturates on both sides and ReLU has a dead zone, which can easily cause vanishing or exploding gradients. In this paper, the Mish activation function is introduced into the LeNet-5 convolutional neural network to overcome the shortcomings of these activation functions. At the same time, the fixed memory step gradient descent method is used to replace the optimization part, which improves the global convergence of the algorithm.

10.1109/iccwamtip47768.2019.9067661 article EN 2019-12-01
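Mish is defined as mish(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x). Unlike ReLU, it is smooth and passes small negative values, so its gradient is not zero for negative inputs. That is the property the abstract relies on.

The sketch below compares the two functions and their finite-difference gradients. It does not reproduce the LeNet-5 model or the fixed memory step gradient descent method, whose details are not given here.

```python
import math
from typing import Callable


def softplus(x: float) -> float:
    # Numerically stable ln(1 + e^x): avoids overflow of exp(x) for large x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    return max(0.0, x)


def numeric_grad(f: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    # Central finite difference, only for illustration.
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    print(f"x={x:5.1f}  relu={relu(x):.4f} (grad {numeric_grad(relu, x):.4f})  "
          f"mish={mish(x):.4f} (grad {numeric_grad(mish, x):.4f})")
```

For negative inputs, ReLU's gradient is exactly zero while Mish's is small but nonzero. This is the "dead zone" difference the abstract mentions.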

TiBERT: Tibetan Pre-trained Language Model

Sisi Liu Junjie Deng Yuan Sun Xiaobing Zhao

The pre-trained language model is trained on large-scale unlabeled text and can achieve state-of-the-art results in many different downstream tasks. However, current pre-trained language models are mainly concentrated in the Chinese and English fields. For low-resource languages such as Tibetan, there is a lack of a monolingual pre-trained language model. To promote the development of Tibetan natural language processing tasks, this paper collects training data from websites and constructs a vocabulary that covers 99.95% of the words in the corpus by using SentencePiece. Then, we train a monolingual pre-trained language model named TiBERT...

10.1109/smc53654.2022.9945074 article EN 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2022-10-09
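The 99.95% figure is a coverage statistic: the share of word occurrences in the corpus that the vocabulary can represent. The sketch below only illustrates how such a figure can be measured, and how a frequency-ranked vocabulary could be grown until it reaches a target. It is a plain whole-word count, not SentencePiece's subword training, and the function names are hypothetical. SentencePiece itself has a related `character_coverage` training option, which measures coverage over characters rather than words.

```python
from collections import Counter
from typing import Iterable, Set


def vocab_coverage(tokens: Iterable[str], vocab: Set[str]) -> float:
    """Fraction of token occurrences (not distinct types) that appear in vocab."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(c for tok, c in counts.items() if tok in vocab)
    return covered / total


def build_vocab_for_coverage(tokens: Iterable[str], target: float = 0.9995) -> Set[str]:
    """Add tokens from most to least frequent until the target coverage is reached."""
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab: Set[str] = set()
    covered = 0
    for tok, c in counts.most_common():
        if total == 0 or covered / total >= target:
            break
        vocab.add(tok)
        covered += c
    return vocab


toks = "a a a a a a a a b c".split()   # toy corpus: 10 occurrences, 3 types
print(vocab_coverage(toks, {"a"}))     # 0.8
print(len(build_vocab_for_coverage(toks, target=0.85)))  # 2
```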

QuGAN: Quasi Generative Adversarial Network for Tibetan Question Answering Corpus Generation

Yuan Sun Chaofan Chen Tianci Xia Xiaobing Zhao

In recent years, the large-scale open Chinese and English question answering (QA) corpora have provided important support for the application of deep learning in QA systems. However, for low-resource languages such as Tibetan, it is difficult to construct satisfactory QA systems owing to the lack of Tibetan QA corpora. To solve this problem, this paper proposes a Tibetan QA corpus generation model called QuGAN. This model combines Quasi-Recurrent Neural Networks and Reinforcement Learning. The generator used in the Generative Adversarial...

10.1109/access.2019.2934581 article EN cc-by IEEE Access 2019-01-01

Improved Distant Supervised Model in Tibetan Relation Extraction Using ELMo and Attention

Yuan Sun Wang Li-ke Chaofan Chen Tianci Xia Xiaobing Zhao

The task of relation extraction is to classify the relations between two entities in a sentence. Distant supervision can automatically align texts with a knowledge base without labeled training data. For low-resource language relation extraction, such as Tibetan, the main problem is the lack of labeled training data. In this paper, we propose an improved distant supervised model based on the Piecewise Convolutional Neural Network (PCNN) to expand the Tibetan corpus. We add a self-attention mechanism and a soft-label method to decrease wrong labels, and use...

10.1109/access.2019.2955977 article EN cc-by IEEE Access 2019-01-01
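In PCNN, the convolution layer's output is split into three segments at the positions of the two entities. Each segment is then max-pooled separately, so the pooled vector keeps coarse information about where features occur relative to the entities.

The sketch below shows that pooling step in plain Python. It assumes features are given as one vector per token and that each entity is a single token index. The self-attention and soft-label parts of the paper are not shown. Filling empty segments with zeros is an arbitrary choice made for the sketch.

```python
from typing import List

Vector = List[float]


def _max_pool(segment: List[Vector], dim: int) -> Vector:
    # Element-wise max over the tokens of one segment; zeros if the segment is empty.
    if not segment:
        return [0.0] * dim
    return [max(tok[d] for tok in segment) for d in range(dim)]


def piecewise_max_pool(features: List[Vector], e1: int, e2: int) -> Vector:
    """Split at entity positions e1 < e2 into [0, e1], (e1, e2], (e2, end] and pool each.

    Returns the concatenation of the three pooled vectors (length 3 * dim).
    """
    if not 0 <= e1 < e2 < len(features):
        raise ValueError("expected 0 <= e1 < e2 < len(features)")
    dim = len(features[0])
    segments = [features[: e1 + 1], features[e1 + 1 : e2 + 1], features[e2 + 1 :]]
    pooled: Vector = []
    for seg in segments:
        pooled.extend(_max_pool(seg, dim))
    return pooled


# Five tokens with 2-dimensional convolution features; entities at tokens 1 and 3.
feats = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 5.0], [4.0, -1.0]]
print(piecewise_max_pool(feats, 1, 3))  # [1.0, 2.0, 3.0, 5.0, 4.0, -1.0]
```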

Design of a Tibetan Automatic Word Segmentation Scheme

Yuan Sun Zhijuan Wang Xiaobing Zhao Guosheng Yang

This paper proposes a Tibetan automatic word segmentation approach that takes advantage of case-auxiliary words and continuous features. Meanwhile, we also conduct further investigation on chunk-based words. Finally, an experiment is performed to verify the algorithm proposed in this paper, and the results show that the method is effective.

10.1109/iciecs.2009.5366542 article EN International Conference on Information Engineering and Computer Science 2009-12-01

A Hybrid Network Model for Tibetan Question Answering

Yuan Sun Tianci Xia

Currently, research on question answering (QA) with deep learning methods is a hotspot in natural language processing. In addition, most of the research is mainly focused on English or Chinese, since there are large-scale open corpora such as WikiQA and DoubanQA. However, how to use deep learning for QA in low-resource languages like Tibetan becomes a challenge. In this paper, we propose a hybrid network model for Tibetan QA, which combines a convolutional neural network and long short-term memory (LSTM) to extract effective features from small-scale corpora...

10.1109/access.2019.2911320 article EN cc-by-nc-nd IEEE Access 2019-01-01

A Method of English Test Knowledge Graph Construction

Yuan Sun Jiayi Tang Zhen Zhu

English is one of the key subjects of basic education in many countries, and more and more students tend to learn online. This paper takes middle school English test texts as the research object and proposes a method of English test knowledge graph construction. Through acquiring data, preprocessing the corpus, and designing feature vectors, this paper extracts knowledge points from tests based on an SVM model and constructs an English test knowledge graph. It is important for the standardization, automation, and systematization of online learning.

10.4236/jcc.2021.99007 article EN Journal of Computer and Communications 2021-01-01

MiLMo: Minority Multilingual Pre-Trained Language Model

Junjie Deng Hanru Shi Xinhe Yu Wugedele Bao Yuan Sun and 1 more

Pre-trained language models are trained on large-scale unsupervised data, and they can fine-tune the model on only small-scale labeled datasets and achieve good results. Multilingual pre-trained language models can be trained on multiple languages, and the model can understand multiple languages at the same time. At present, research on pre-trained models mainly focuses on rich-resource languages, while there is relatively little research on low-resource languages such as minority languages, and public multilingual pre-trained language models do not work well for minority languages. Therefore, this paper constructs a multilingual pre-trained model named MiLMo that performs better on minority language tasks,...

10.1109/smc53992.2023.10393961 article EN 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2023-10-01

Knowledge Points Extraction of Junior High School English Exercises Based on SVM Method

Wang Li-ke Yuan Sun Zhen Zhu

In the process of learning English, students need to do a lot of exercises to improve their English performance. Knowledge points are important to students, yet extracting them from exercises automatically is difficult, and this extraction is the foundation of knowledge graph construction for English learning. In this paper, we use SVM to realize knowledge point extraction from junior high school English exercises. Firstly, the paper obtains large amounts of question data by analyzing electronic documents, and uses NLP tools for word segmentation, POS tagging, and named entity recognition. Secondly, based on the SVM model,...

10.1145/3241748.3241768 article EN 2018-01-01

Reassessing Woody-to-total Area Ratio in Leaf Area Index Measurement: Refinement and Novel Methodology

Yunping Chen Yuanlei Cheng Lin Sun Xingfa Gu Yuan Sun

10.2139/ssrn.4744739 preprint EN 2024-01-01

Quantifying Error Factors in Hemispherical Photography Leaf Area Index Measurement: A Comprehensive Analysis of Camera and Environmental Influences

Yunping Chen Lin Sun Zhentao Gao Yuanlei Cheng Yuan Sun and 1 more

10.2139/ssrn.4747982 preprint EN 2024-01-01

DynamicAvatars: Accurate Dynamic Facial Avatars Reconstruction and Precise Editing with Diffusion Models

Yang‐Yang Qian Yuan Sun Yu Guo

Generating and editing dynamic 3D head avatars are crucial tasks in virtual reality and film production. However, existing methods often suffer from facial distortions, inaccurate movements, and limited fine-grained editing capabilities. To address these challenges, we present DynamicAvatars, a model that generates photorealistic, moving 3D head avatars from video clips and parameters associated with positions and expressions. Our approach enables precise editing through a novel prompt-based editing model, which integrates user-provided prompts with guiding...

10.48550/arxiv.2411.15732 preprint EN arXiv (Cornell University) 2024-11-24

TIFD: Tibetan Instruction-Following Dataset for Large Language Models Supervised Fine-Tuning

Wenhao Zhuang Dawa Cairen Yuan Sun

In addressing challenges within the field of Natural Language Processing (NLP), supervised fine-tuning is an efficient technique that allows pre-trained Large Language Models to adapt to specific tasks. This is especially crucial for low-resource languages such as Tibetan, where the demand for high-quality datasets is particularly pronounced. This paper introduces the Tibetan Instruction-Following Dataset (TIFD), comprising 11,535 JSON objects, each with four attributes: a unique identifier, instructions, input, and output...

10.3724/2096-7004.di.2024.0010 article EN Data Intelligence 2024-12-01
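The abstract describes each TIFD entry as a JSON object with four attributes. The sketch below checks records of that shape using Python's standard `json` module. The key names `id`, `instruction`, `input`, and `output` are assumptions for illustration and may differ from the released dataset. The placeholder strings stand in for real Tibetan text.

```python
import json
from typing import Any, List

# Hypothetical key names: the abstract lists the attributes but not their exact keys.
REQUIRED_KEYS = ("id", "instruction", "input", "output")


def check_records(raw: str) -> List[str]:
    """Parse a JSON array of records and return a list of human-readable problems."""
    records: List[Any] = json.loads(raw)
    problems: List[str] = []
    seen_ids = set()
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        missing = [k for k in REQUIRED_KEYS if k not in rec]
        if missing:
            problems.append(f"record {i}: missing {', '.join(missing)}")
        if "id" in rec:
            if rec["id"] in seen_ids:
                problems.append(f"record {i}: duplicate id {rec['id']!r}")
            seen_ids.add(rec["id"])
    return problems


sample = json.dumps([
    {"id": 1, "instruction": "<instruction text>", "input": "", "output": "<answer text>"},
    {"id": 1, "instruction": "<instruction text>", "output": "<answer text>"},
], ensure_ascii=False)
print(check_records(sample))  # ['record 1: missing input', 'record 1: duplicate id 1']
```

An empty `input` string is accepted on purpose. In instruction-tuning data, many instructions need no separate input.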

AlpaCream: an Effective Method of Data Selection on Alpaca

Yijie Li Yuan Sun

10.1109/smc54092.2024.10831719 article EN 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2024-10-06

Research on cross-language text similarity calculation

Yuan Sun Qian Zhao

Cross-language text similarity calculation is a critical and fundamental problem in natural language processing. It is widely used in cross-language research such as information retrieval. In this paper, we use the LDA (Latent Dirichlet Allocation) model to calculate the similarity of Tibetan and Chinese texts at the topic level. Through modelling and forecasting, the texts are mapped into the feature space of topics. This method reduced the dimensions of the text vectors and improved the speed and efficiency of the computation.

10.1109/iceiec.2015.7284573 article EN 2015-05-01
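Once LDA maps each text to a distribution over K shared topics, comparing a Tibetan text with a Chinese text reduces to comparing two K-dimensional vectors instead of two vocabulary-sized ones. That is where the dimensionality and speed gains come from.

The abstract does not name the similarity measure. The sketch below assumes cosine similarity and also shows Hellinger distance, a common alternative for probability vectors. The topic proportions are made-up values, and training a cross-lingual LDA model is out of scope.

```python
import math
from typing import Sequence


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """0 for identical distributions, 1 for distributions with disjoint support."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


# Hypothetical topic proportions over K = 4 shared topics.
theta_bo = [0.70, 0.20, 0.05, 0.05]  # a Tibetan document
theta_zh = [0.65, 0.25, 0.05, 0.05]  # a Chinese document
print(round(cosine_similarity(theta_bo, theta_zh), 3))
print(round(hellinger_distance(theta_bo, theta_zh), 3))
```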

Person attributes extraction in profiles based on SVM and pattern

Zhen Zhu Yuan Sun

This paper is an exploration to find a way to get person attributes from profiles. Considering that there exists a large volume of unstructured data in profiles, it is very difficult to gain the attributes in a short time. So, we use a method combining patterns and SVM to extract the attributes. Firstly, we collect many raw profiles from websites with our configurable crawler. Secondly, we use statistical methods to do pre-processing work, including lexical analysis and name recognition. Thirdly, we build patterns which can model the attributes, and we also generalize the patterns into features. Finally,...

10.1109/icsess.2015.7339037 article EN 2015-09-01
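The abstract combines hand-built patterns with an SVM and generalizes the patterns into features. The sketch below illustrates only the pattern half, assuming English-language profile text and two hypothetical regular-expression patterns. Each pattern both extracts a candidate attribute value and contributes a binary feature that a classifier such as an SVM could consume. The patterns, attribute names, and helper functions are illustrative only; the classifier itself is not shown.

```python
import re
from typing import Dict, List

# Hypothetical patterns; the paper's actual patterns are not given in the abstract.
PATTERNS = {
    "birth_year": re.compile(r"\bborn in (\d{4})\b"),
    "affiliation": re.compile(r"\bworks? at ([A-Z][\w&. ]+?)(?=[,.]|$)"),
}


def extract_attributes(sentence: str) -> Dict[str, str]:
    """Return attribute values captured by any matching pattern."""
    found: Dict[str, str] = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(sentence)
        if m:
            found[name] = m.group(1).strip()
    return found


def pattern_features(sentence: str) -> List[int]:
    """One binary feature per pattern, in PATTERNS order, e.g. as input to an SVM."""
    return [1 if p.search(sentence) else 0 for p in PATTERNS.values()]


s = "She was born in 1975 and works at Example University."
print(extract_attributes(s))  # {'birth_year': '1975', 'affiliation': 'Example University'}
print(pattern_features(s))    # [1, 1]
```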

Research on Some Key Technologies of Tibetan Automatic Word Segmentation

Yuan Sun Xiaodong Yan Xiaobing Zhao Guosheng Yang

This paper researches some key technologies of Tibetan automatic word segmentation. We propose a segmentation approach that takes advantage of case-auxiliary words and continuous features. Meanwhile, a resolution method for overlapping ambiguity in Tibetan word segmentation is proposed, based on forward-backward scanning for identification and an improved maximum probability algorithm for resolution. Finally, an experiment is conducted, and the results show that the algorithm is effective.

10.1109/icinis.2011.43 article EN 2011-11-01
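Forward-backward scanning is a standard way to detect overlapping ambiguity. The same syllable sequence is segmented once with forward maximum matching and once with backward maximum matching, and the sequence is flagged when the two results differ.

The sketch below makes several assumptions. The text is taken to be already split into syllables (for Tibetan, at the tsheg ་), and the lexicon of multi-syllable words is hypothetical. Latin placeholders stand in for Tibetan syllables. The paper's case-auxiliary features and its improved maximum probability resolution step are not reproduced.

```python
from typing import List, Sequence, Set, Tuple

Word = Tuple[str, ...]


def forward_max_match(syllables: Sequence[str], lexicon: Set[Word], max_len: int) -> List[Word]:
    """Greedy left-to-right segmentation, preferring the longest lexicon entry."""
    max_len = max(1, max_len)
    words: List[Word] = []
    i = 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            cand = tuple(syllables[i:i + n])
            if n == 1 or cand in lexicon:  # a single syllable is always accepted
                words.append(cand)
                i += n
                break
    return words


def backward_max_match(syllables: Sequence[str], lexicon: Set[Word], max_len: int) -> List[Word]:
    """Greedy right-to-left segmentation, preferring the longest lexicon entry."""
    max_len = max(1, max_len)
    words: List[Word] = []
    j = len(syllables)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            cand = tuple(syllables[j - n:j])
            if n == 1 or cand in lexicon:
                words.insert(0, cand)
                j -= n
                break
    return words


def overlap_ambiguous(syllables: Sequence[str], lexicon: Set[Word], max_len: int) -> bool:
    """Flag the span when forward and backward scans disagree."""
    return forward_max_match(syllables, lexicon, max_len) != backward_max_match(syllables, lexicon, max_len)


lexicon = {("a", "b"), ("b", "c")}  # "ab" and "bc" overlap on syllable "b"
syl = ["a", "b", "c"]
print(forward_max_match(syl, lexicon, 2))   # [('a', 'b'), ('c',)]
print(backward_max_match(syl, lexicon, 2))  # [('a',), ('b', 'c')]
print(overlap_ambiguous(syl, lexicon, 2))   # True
```

Only the detection step is shown here. Choosing between the two candidate segmentations is where a probability model, such as the paper's improved maximum probability algorithm, would come in.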

Tibetan-Chinese cross language named entity extraction based on comparable corpus and naturally annotated resources

Yuan Sun Wenbin Guo Xiaobing Zhao

Tibetan-Chinese named entity extraction can effectively improve the performance of cross-language question answering systems, information retrieval, machine translation, and other research. Given that there is no practical Tibetan named entity recognition system or model, this paper proposes a method to extract Tibetan-Chinese named entities based on a comparable corpus and naturally annotated resources from the web. The main work is as follows: (1) comparable corpus construction. (2) Combining sentence length, word matching, and boundary term features, using...

10.1109/cidm.2014.7008680 article EN 2014-12-01

Tibetan-Chinese Named Entity Extraction Based on Comparable Corpus

Yuan Sun Qian Zhao

Tibetan-Chinese named entity extraction is the foundation of cross-language information processing and provides a basis for machine translation and information retrieval research. In this paper, we use the multi-language links of Wikipedia to obtain a comparable corpus, and combine sentence length, word matching, and boundary words to get parallel sentences. Then we extract named entities from the corpus in three ways: (1) extracting natural labeling information; (2) acquiring Tibetan entries and Chinese entries; (3) using sequence intersection...

10.4028/www.scientific.net/amm.571-572.1202 article EN Applied Mechanics and Materials 2014-06-10
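The two abstracts above align Tibetan and Chinese sentences in a comparable corpus by combining several weak signals. The sketch below covers two of them, a length-ratio score and a bilingual-dictionary match score, merged by a weighted sum. The weights, expected length ratio, threshold, and placeholder dictionary are all hypothetical. The boundary-word feature is omitted because the abstracts do not specify how it is computed.

```python
from typing import Dict, List, Optional, Set


def length_score(src_len: int, tgt_len: int, ratio: float = 1.0) -> float:
    """1.0 when tgt_len equals ratio * src_len, falling toward 0 as they diverge."""
    expected = ratio * src_len
    if expected <= 0 or tgt_len <= 0:
        return 0.0
    return min(expected, tgt_len) / max(expected, tgt_len)


def match_score(src: List[str], tgt: List[str], lexicon: Dict[str, Set[str]]) -> float:
    """Share of source words with at least one dictionary translation in the target."""
    if not src:
        return 0.0
    tgt_set = set(tgt)
    hits = sum(1 for w in src if lexicon.get(w, set()) & tgt_set)
    return hits / len(src)


def pair_score(src: List[str], tgt: List[str], lexicon: Dict[str, Set[str]],
               w_len: float = 0.3, w_match: float = 0.7) -> float:
    # Hypothetical weights; a real system would tune these on aligned data.
    return w_len * length_score(len(src), len(tgt)) + w_match * match_score(src, tgt, lexicon)


def best_candidate(src: List[str], candidates: List[List[str]],
                   lexicon: Dict[str, Set[str]], threshold: float = 0.5) -> Optional[int]:
    """Index of the highest-scoring target sentence, or None if none reaches the threshold."""
    best_i: Optional[int] = None
    best_s = threshold
    for i, cand in enumerate(candidates):
        s = pair_score(src, cand, lexicon)
        if s >= best_s:
            best_i, best_s = i, s
    return best_i


# Placeholder tokens stand in for Tibetan (bo*) and Chinese (zh*) words.
lexicon = {"bo1": {"zh1"}, "bo2": {"zh2"}}
src = ["bo1", "bo2"]
candidates = [["zh9", "zh8", "zh7", "zh6"], ["zh1", "zh2"]]
print(best_candidate(src, candidates, lexicon))  # 1
```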
