Yuan Sun

ORCID: 0000-0003-0565-9659
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Advanced Text Analysis Techniques
  • Advanced Computational Techniques and Applications
  • Educational Technology and Assessment
  • Semantic Web and Ontologies
  • Web Data Mining and Analysis
  • Leaf Properties and Growth Measurement
  • Text and Document Classification Technologies
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Remote Sensing in Agriculture
  • Data Quality and Management
  • Advanced Graph Neural Networks
  • Data Management and Algorithms
  • China's Ethnic Minorities and Relations
  • Face recognition and analysis
  • AI in cancer detection
  • Mobile Agent-Based Network Management
  • Neural Networks and Applications
  • Remote Sensing and LiDAR Applications
  • Remote Sensing and Land Use
  • Digital Imaging for Blood Diseases
  • Petri Nets in System Modeling
  • Speech Recognition and Synthesis

Minzu University of China
2014-2025

People’s Hospital of Rizhao
2023

Tongji Zhejiang College
2019

Tsinghua University
2003

Due to the rapid development of printed circuit board (PCB) design technology, the inspection of PCB surface defects has become an increasingly critical issue. The classification of PCB defects facilitates identification of the defects' root causes. As defects may be intensive, actual PCB defect classification should not be considered as a binary or multi-category problem. This type of problem is called multi-label classification. Recently, one of the deep learning frameworks, the convolutional neural network (CNN), has achieved a major breakthrough in many areas of image processing, especially...

10.1049/joe.2018.8279 article EN cc-by The Journal of Engineering 2018-08-18

Question generation aims to generate questions according to the given context and answer, and it has made significant progress in both the Chinese and English languages. However, research on Tibetan question generation is still in the early stages, with key challenges including the omission of crucial keywords, which renders the generated questions unanswerable. Existing large-scale models, such as GPT or BERT, do not provide robust support for low-resource languages. To solve this problem, this paper proposes an approach to Tibetan question generation based on sentences and a knowledge graph. The generator...

10.1145/3725531 article EN ACM Transactions on Asian and Low-Resource Language Information Processing 2025-03-25

The convolutional neural network is the most important algorithm in the field of deep learning. The traditional convolutional neural network usually uses Sigmoid or ReLU as the activation function, but the two sides of Sigmoid are saturated and ReLU has a dead zone, which can easily cause gradient disappearance or explosion. In this paper, the Mish function is introduced into the LeNet-5 convolutional neural network, which overcomes the shortcomings of these activation functions. At the same time, the fixed memory step gradient descent method is used to replace the optimization part, which improves the global convergence of the algorithm.

10.1109/iccwamtip47768.2019.9067661 article EN 2019-12-01
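
The Mish activation named in the abstract above is commonly defined as x * tanh(softplus(x)). The following is a minimal sketch in plain Python, assuming that standard definition. The function names are illustrative, and nothing here reproduces the paper's LeNet-5 setup or its fixed memory step optimizer.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation under the common definition x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    """ReLU, shown only for comparison with Mish on negative inputs."""
    return max(x, 0.0)


if __name__ == "__main__":
    for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={value:+.1f}  relu={relu(value):.4f}  mish={mish(value):.4f}")
```

For a negative input such as -1.0, ReLU returns 0 while Mish returns a small negative value (about -0.30), so the function stays smooth and keeps a non-zero gradient on part of the negative axis.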

The pre-trained language model is trained on large-scale unlabeled text and can achieve state-of-the-art results in many different downstream tasks. However, current pre-trained language models are mainly concentrated in the Chinese and English fields. For low-resource languages such as Tibetan, there is a lack of a monolingual pre-trained model. To promote the development of Tibetan natural language processing tasks, this paper collects training data from Tibetan websites and constructs a vocabulary that can cover 99.95% of the words in the corpus by using SentencePiece. Then, we train the Tibetan monolingual pre-trained language model named TiBERT...

10.1109/smc53654.2022.9945074 article EN 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2022-10-09

In recent years, large-scale open Chinese and English question answering (QA) corpora have provided important support for the application of deep learning in QA systems. However, for low-resource languages such as Tibetan, it is difficult to construct satisfactory QA systems, owing to the lack of large-scale Tibetan QA corpora. To solve this problem, this paper proposes a QA corpus generation model, called QuGAN. This model combines Quasi-Recurrent Neural Networks and Reinforcement Learning. The Quasi-Recurrent Neural Networks model is used as the generator of the Generative Adversarial...

10.1109/access.2019.2934581 article EN cc-by IEEE Access 2019-01-01

The task of relation extraction is to classify the relations between two entities in a sentence. Distant supervision can automatically align texts based on a Knowledge Base without labeled training data. For low-resource language relation extraction, such as in Tibetan, the main problem is the lack of labeled training data. In this paper, we propose an improved distant supervised relation extraction model based on the Piecewise Convolutional Neural Network (PCNN) to expand the Tibetan corpus. We add a self-attention mechanism and a soft-label method to decrease wrong labels, and use...

10.1109/access.2019.2955977 article EN cc-by IEEE Access 2019-01-01

This paper proposes a Tibetan automatic word segmentation approach that takes advantage of case-auxiliary words and the continuous feature. Meanwhile, we also conduct further investigation on chunk segmentation based on case-auxiliary words. Finally, an experiment is performed to verify the algorithm proposed in this paper, and the results prove that the method is effective.

10.1109/iciecs.2009.5366542 article EN International Conference on Information Engineering and Computer Science 2009-12-01

Currently, research on question answering (QA) with deep learning methods is a hotspot in natural language processing. In addition, most of the research has mainly focused on English or Chinese, since there are large-scale open QA corpora such as WikiQA and DoubanQA. However, how to use deep learning for QA in low-resource languages like Tibetan becomes a challenge. In this paper, we propose a hybrid network model for Tibetan QA, which combines a convolutional neural network and long short-term memory (LSTM) to extract effective features from small-scale corpora....

10.1109/access.2019.2911320 article EN cc-by-nc-nd IEEE Access 2019-01-01

English is one of the key subjects of basic education in many countries, and more and more students tend to learn English online. This paper takes middle school English test texts as the research object and proposes a method of English test knowledge graph construction. Through acquiring data, preprocessing the corpus, and designing feature vectors, this paper extracts knowledge points from English tests based on an SVM model and constructs an English knowledge graph. It is important for the standardization, automation, and systematization of online English learning.

10.4236/jcc.2021.99007 article EN Journal of Computer and Communications 2021-01-01

Pre-trained language models are trained on large-scale unsupervised data, and they can be fine-tuned on only small-scale labeled datasets and achieve good results. Multilingual pre-trained language models can be trained on multiple languages and understand multiple languages at the same time. At present, research on pre-trained models mainly focuses on languages with rich resources, while there is relatively little research on low-resource languages such as minority languages, and public multilingual pre-trained language models do not work well for minority languages. Therefore, this paper constructs a multilingual pre-trained language model named MiLMo that performs better on minority language tasks,...

10.1109/smc53992.2023.10393961 article EN 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2023-10-01

In the process of learning English, students need to do a lot of exercises to improve their English performance. The knowledge points of the exercises are important to students, yet extracting knowledge points from exercises automatically is difficult, and this extraction is the foundation of knowledge graph construction for English learning. In this paper, we use SVM to realize the extraction of knowledge points from junior high school English exercises. Firstly, this paper obtains large amounts of question data through analyzing electronic documents, and uses NLP tools for word segmentation, POS tagging, and named entity recognition. Secondly, based on the SVM model,...

10.1145/3241748.3241768 article EN 2018-01-01

10.2139/ssrn.4744739 preprint EN 2024-01-01

10.2139/ssrn.4747982 preprint EN 2024-01-01

Generating and editing dynamic 3D head avatars are crucial tasks in virtual reality and film production. However, existing methods often suffer from facial distortions, inaccurate head movements, and limited fine-grained editing capabilities. To address these challenges, we present DynamicAvatars, a dynamic model that generates photorealistic, moving 3D head avatars from video clips and parameters associated with facial positions and expressions. Our approach enables precise editing through a novel prompt-based editing model, which integrates user-provided prompts with guiding...

10.48550/arxiv.2411.15732 preprint EN arXiv (Cornell University) 2024-11-24

In addressing challenges within the field of Natural Language Processing (NLP), supervised fine-tuning is an efficient technique that allows pre-trained Large Language Models to adapt to specific tasks. This is especially crucial for low-resource languages such as Tibetan, where the demand for high-quality datasets is particularly pronounced. This paper introduces the Tibetan Instruction-Following Dataset (TIFD), comprising 11,535 JSON objects, each with four attributes: a unique identifier, instructions, input, and output....

10.3724/2096-7004.di.2024.0010 article EN Data Intelligence 2024-12-01
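
The abstract above describes each TIFD entry as a JSON object with four attributes. The following sketch shows how such a record could be parsed and checked. The key names (`id`, `instruction`, `input`, `output`) and the sample values are assumptions made for illustration; the abstract does not give the actual field names, and the sample is not taken from the dataset.

```python
import json

# Hypothetical key names: the abstract lists the four attributes
# but not how they are spelled in the released files.
REQUIRED_KEYS = ("id", "instruction", "input", "output")


def parse_record(line: str) -> dict:
    """Parse one JSON object and verify that the four assumed attributes exist."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"record is missing attributes: {missing}")
    return record


# Placeholder record written for this sketch, not real TIFD content.
example_line = json.dumps(
    {
        "id": 1,
        "instruction": "Translate the input sentence into Tibetan.",
        "input": "Hello.",
        "output": "<placeholder output>",
    },
    ensure_ascii=False,
)

record = parse_record(example_line)
print(sorted(record.keys()))
```

Checking the attributes up front means a malformed entry fails at load time rather than in the middle of a fine-tuning run.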

10.1109/smc54092.2024.10831719 article EN 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2024-10-06

Cross-language text similarity calculation is a critical and fundamental problem in natural language processing. It is widely used in cross-language research, such as cross-language information retrieval. In this paper, we use the LDA (Latent Dirichlet Allocation) model to calculate the similarities of Tibetan and Chinese texts at the topic level. Through topic modelling and forecasting, the texts are mapped to the feature space of topics. This method reduces the dimensions of the text vector and improves the speed and efficiency of computation.

10.1109/iceiec.2015.7284573 article EN 2015-05-01
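
Once texts are mapped to a shared topic space, as the abstract above describes, comparing two documents reduces to comparing two short topic-probability vectors. The sketch below uses cosine similarity for that comparison. The choice of cosine similarity and the topic vectors are assumptions for illustration; the abstract does not state which measure the paper uses, and the LDA training step itself is not shown.

```python
import math
from typing import Sequence


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity between two topic-distribution vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


# Made-up distributions over three topics, standing in for LDA output.
tibetan_doc_topics = [0.7, 0.2, 0.1]
chinese_doc_topics = [0.6, 0.3, 0.1]
unrelated_doc_topics = [0.05, 0.05, 0.9]

similar = cosine_similarity(tibetan_doc_topics, chinese_doc_topics)
dissimilar = cosine_similarity(tibetan_doc_topics, unrelated_doc_topics)
print(f"related pair: {similar:.3f}, unrelated pair: {dissimilar:.3f}")
```

Because each vector has one entry per topic rather than one per vocabulary word, the comparison stays cheap even for large vocabularies, which is the dimensionality reduction the abstract refers to.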

This paper is an exploration to find a way to get the person attributes in profiles. Considering that there exists a large volume of unstructured data, it is very difficult to gain the attributes in a short time. So, we use a method combining patterns and SVM to extract the attributes. Firstly, we collect many raw profiles from websites with our configurable crawler. Secondly, we use statistical methods to do pre-processing work, including lexical analysis and name recognition. Thirdly, we build patterns, which can model the attributes. We also generalize the patterns into features. Finally,...

10.1109/icsess.2015.7339037 article EN 2015-09-01

This paper researches some key technologies of Tibetan automatic word segmentation. We propose a Tibetan automatic word segmentation approach that takes advantage of case-auxiliary words and the continuous feature. Meanwhile, a resolution method for overlapping ambiguity in Tibetan word segmentation is proposed, based on forward-backward scanning identification and an improved maximum probability algorithm. Finally, an experiment is conducted, and the results prove that the algorithm is effective.

10.1109/icinis.2011.43 article EN 2011-11-01

Tibetan-Chinese named entity extraction can effectively improve the performance of cross-language question answering systems, information retrieval, machine translation, and other research. Given that there is no practical Tibetan named entity recognition system or model, this paper proposes a method to extract Tibetan-Chinese named entities based on a comparable corpus and naturally annotated resources from the web. The main work is as follows: (1) Tibetan-Chinese comparable corpus construction. (2) Combining sentence length, word matching, and boundary term features, using...

10.1109/cidm.2014.7008680 article EN 2014-12-01

Tibetan-Chinese named entity extraction is the foundation of Tibetan-Chinese cross-language information processing, and provides a basis for machine translation and cross-language information retrieval research. In this paper, we use the multi-language links of Wikipedia to obtain a Tibetan-Chinese comparable corpus, and combine sentence length, word matching, and entity boundary words to get parallel sentences. Then we extract Tibetan-Chinese named entities from the corpus in three ways: (1) Extracting from natural labeling information. (2) Acquiring Tibetan entries and Chinese entries. (3) Using sequence intersection...

10.4028/www.scientific.net/amm.571-572.1202 article EN Applied Mechanics and Materials 2014-06-10