- Crystallization and Solubility Studies
- X-ray Diffraction in Crystallography
- Digital Games and Media
- Vehicle License Plate Recognition
- Digital Communication and Language
- Misinformation and Its Impacts
- Advanced Adaptive Filtering Techniques
- Music and Audio Processing
- Artificial Intelligence in Healthcare and Education
- Speech Recognition and Synthesis
- Nuclear Physics and Applications
- Scientific Computing and Data Management
- Advanced Steganography and Watermarking Techniques
- Advanced Data Storage Technologies
- Speech and Audio Processing
- Handwritten Text Recognition Techniques
- Internet Traffic Analysis and Secure E-voting
- Biomedical Text Mining and Ontologies
- Parallel Computing and Optimization Techniques
- Natural Language Processing Techniques
- Generative Adversarial Networks and Image Synthesis
- Image Processing and 3D Reconstruction
Inner Mongolia University of Technology
2024
Beijing Computing Center
2015
The rapid proliferation of large language models (LLMs) has created an urgent need for reliable methods to detect whether a text is generated by such models. In this paper, we propose SimMark, a post-hoc watermarking algorithm that makes LLMs' outputs traceable without requiring access to the model's internal logits, enabling compatibility with a wide range of LLMs, including API-only models. By leveraging the similarity of semantic sentence embeddings and rejection sampling to impose detectable statistical patterns...
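To make the pairing of embedding similarity with rejection sampling more concrete, here is a toy sketch in Python. It is not SimMark itself: the abstract above is truncated, so the bag-of-words `embed` function, the similarity interval `[low, high]`, and the helper names `pick_sentence` and `in_interval_rate` are all illustrative assumptions. A real system would presumably use a learned sentence encoder and candidates sampled from an LLM.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy stand-in for a semantic sentence embedding (assumption):
    a bag-of-words count vector."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_sentence(previous, candidates, low, high):
    """Rejection sampling: return the first candidate whose similarity to
    the previous sentence falls inside [low, high], or None if every
    candidate is rejected."""
    prev_vec = embed(previous)
    for cand in candidates:
        if low <= cosine(prev_vec, embed(cand)) <= high:
            return cand
    return None


def in_interval_rate(sentences, low, high):
    """Detection-side statistic: fraction of consecutive sentence pairs
    whose similarity lies inside [low, high]."""
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    hits = sum(low <= cosine(embed(a), embed(b)) <= high for a, b in pairs)
    return hits / len(pairs)
```

In this sketch the generator keeps only sentences that land in the chosen interval, so a detector that knows the interval can check whether an unusually high share of consecutive pairs falls inside it. Nothing here touches model logits, which is the property the abstract emphasizes.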
Writing is an important carrier of cultural inheritance, and the digitization of handwritten texts is an effective means to protect national culture. Compared with Chinese and English handwriting recognition, research on Mongolian handwriting recognition started relatively late and has achieved few results due to the characteristics of the script itself and the lack of a corpus. First, according to the characteristics of Mongolian characters, the random erasing data augmentation algorithm was modified, and a dual data augmentation (DDA) algorithm was proposed by combining the improved algorithm with horizontal wave transformation (HWT) to augment...
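The two augmentation steps named above, random erasing and a horizontal wave transformation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's DDA algorithm: the image is a plain list of grayscale rows, the wave is a per-row sinusoidal shift, and the function names, the `fill` value, and all parameters are hypothetical. The paper's modification to random erasing is not reproduced here.

```python
import math
import random


def random_erase(image, max_h, max_w, fill=255, rng=random):
    """Overwrite one random rectangle (at most max_h x max_w) with `fill`.
    Returns a new image; the input is left unchanged."""
    rows, cols = len(image), len(image[0])
    h = rng.randint(1, min(max_h, rows))
    w = rng.randint(1, min(max_w, cols))
    top = rng.randint(0, rows - h)
    left = rng.randint(0, cols - w)
    out = [row[:] for row in image]
    for r in range(top, top + h):
        for c in range(left, left + w):
            out[r][c] = fill
    return out


def horizontal_wave(image, amplitude, period, fill=255):
    """Shift each row sideways by a sinusoidal offset that depends on the
    row index (assumed form of the wave); vacated pixels become `fill`."""
    cols = len(image[0])
    out = []
    for r, row in enumerate(image):
        shift = round(amplitude * math.sin(2 * math.pi * r / period))
        new_row = [fill] * cols
        for c, value in enumerate(row):
            if 0 <= c + shift < cols:
                new_row[c + shift] = value
        out.append(new_row)
    return out


def dual_augment(image, rng=random):
    """Apply both steps in sequence (hypothetical composition order)."""
    erased = random_erase(image, max_h=2, max_w=2, rng=rng)
    return horizontal_wave(erased, amplitude=1, period=4)
```

Passing a seeded `random.Random` instance as `rng` makes the erased region reproducible, which helps when comparing augmented samples across training runs.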
The core challenge of speech synthesis technology is how to convert text information into an audible audio form that meets the needs of users. In recent years, the quality of speech synthesis based on end-to-end models has been significantly improved. However, due to the characteristics of the Mongolian language and the lack of an audio corpus, the Mongolian speech synthesis model has achieved few results, and there are still some problems with performance and synthesis quality. First, the phoneme information was further improved and a Bang-based pre-training model was constructed to reduce the error rate of Mongolian phonetic synthesized words. Second,...
Memes, combining text and images, frequently use metaphors to convey persuasive messages, shaping public opinion. Motivated by this, our team engaged in SemEval-2024 Task 4, a hierarchical multi-label classification task designed to identify rhetorical and psychological persuasion techniques embedded within memes. To tackle this problem, we introduced a caption generation step to assess the modality gap and the impact of additional semantic information from images, which improved our result. Our best model utilizes GPT-4...
The cluster of the IHEP computing center is a middle-sized computing system which provides 10 thousand CPU cores, 5 PB of disk storage, and 40 GB/s of IO throughput. Its 1000+ users come from a variety of HEP experiments. In such a system, job classification is an indispensable task. Although an experienced administrator can classify a job by its pattern, it is impractical to classify millions of jobs manually. We present how to solve this problem with deep neural networks in a supervised learning way. Firstly, we built a training data set of 320K...