- Speech Recognition and Synthesis
- Service-Oriented Architecture and Web Services
- Software System Performance and Reliability
- Multimodal Machine Learning Applications
- Caching and Content Delivery
- Business Process Modeling and Analysis
- Neural Networks and Applications
- Recommender Systems and Techniques
- Graph Theory and Algorithms
- Music and Audio Processing
- Software Engineering Techniques and Practices
- Domain Adaptation and Few-Shot Learning
- Advanced Database Systems and Queries
- Web Data Mining and Analysis
- Data Quality and Management
- Software Engineering Research
- Speech and Audio Processing
- Human Pose and Action Recognition
- Advanced Software Engineering Methodologies
- Video Analysis and Summarization
- Knowledge Management and Sharing
- Speech and Dialogue Systems
- Data Stream Mining Techniques
Zhejiang Gongshang University
2020-2024
Zhejiang University
2023
Multi-media communications facilitate global interaction among people. However, although researchers have explored cross-lingual translation techniques such as machine translation and audio speech translation to overcome language barriers, there is still a shortage of studies on visual speech. This lack of research is mainly due to the absence of datasets containing visual speech and translated text pairs. In this paper, we present AVMuST-TED, the first dataset for Audio-Visual Multilingual Speech Translation, derived from TED talks. Nonetheless, not...
Current video captioning efforts mostly focus on describing a single video, while the need for captioning videos in groups has increased considerably. In this study, we propose a new task, video group captioning, which aims to infer the desired content among a group of target videos and describe it with another group of related reference videos. This task requires the model to effectively summarize the target videos while accurately describing the content that distinguishes them from the reference videos, and it becomes more difficult as the video length increases. To solve this problem, 1) First, an efficient relational approximation...
At present, Mashup development has attracted much attention in the field of software engineering. The focus of this article is the use of existing open APIs to meet the needs of Mashup developers. Therefore, how to select the most appropriate open API for a specific user requirement is a crucial problem to be solved. We propose a Hybrid Open API Selection Approach for Mashup development (HyOASAM), which consists of two basic approaches: one is a user-story-driven open API discovery approach, and the other is a multidimensional-information-matrix- (MIM-) based open API recommendation approach. The...
The conventional pipeline of multimodal learning consists of three stages: encoding, fusion, and decoding. Most existing methods for the missing-modality condition focus on the first stage and aim to learn a modality-invariant representation or to reconstruct the missing features. However, these methods rely on strong assumptions (i.e., all pre-defined modalities are available for each input sample during training and the number of modalities is fixed). To solve this problem, we propose a simple yet effective method called Interaction Augmented...