B.-H Wang

ORCID: 0009-0007-9896-8042
Research Areas
  • Natural Language Processing Techniques
  • Speech and dialogue systems
  • Advanced Data Processing Techniques
  • Advanced Computational Techniques and Applications
  • Music and Audio Processing
  • Fault Detection and Control Systems
  • Handwritten Text Recognition Techniques
  • Video Analysis and Summarization

Institute of Electrical and Electronics Engineers
2023

University of Memphis
2023

Antea Group (France)
2023

Engineering Systems (United States)
2023

Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during fine-tuning. However, the limited scale and diversity of current image-text interleaved data restrict the development of such models. In this paper, we introduce OmniCorpus, a 10 billion-scale dataset. Using an...

10.48550/arxiv.2406.08418 preprint EN arXiv (Cornell University) 2024-06-12

Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of these models for SEA languages. Evaluating them is challenging due to the scarcity of high-quality datasets, compounded by the dominance of English training data, raising concerns about potential misrepresentation. To address these...

10.48550/arxiv.2406.10118 preprint EN arXiv (Cornell University) 2024-06-14