- Topic Modeling
- Advanced Neural Network Applications
- Advanced Data Storage Technologies
- RNA and protein synthesis mechanisms
- Natural Language Processing Techniques
- Machine Learning and Data Classification
- Advanced Data Compression Techniques
- Neural Networks and Applications
- Hydraulic and Pneumatic Systems
- Speech Recognition and Synthesis
- Multimodal Machine Learning Applications
- Image and Signal Denoising Methods
- Human Pose and Action Recognition
- RNA Research and Splicing
- RNA modifications and cancer
- Industrial Technology and Control Systems
- Ferroelectric and Negative Capacitance Devices
- Advanced Image and Video Retrieval Techniques
- Cloud Computing and Resource Management
- Power Systems and Technologies
Binghamton University
2023-2024
Tsinghua University
2024
Institut de Biologie systémique et synthétique
2024
Large Language Models (LLMs), including the LLaMA model, have exhibited their efficacy across various general-domain natural language processing (NLP) tasks. However, their performance on high-performance computing (HPC) domain tasks has been less than optimal due to the specialized expertise required to interpret model responses. In response to this challenge, we propose HPC-GPT, a novel LLaMA-based model that undergoes supervised fine-tuning using generated QA (Question-Answer) instances for the HPC domain. To evaluate its...
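The abstract describes supervised fine-tuning of a LLaMA-based model on generated QA instances. As a minimal sketch of that general pattern, the snippet below fine-tunes a causal language model on question/answer pairs with Hugging Face `transformers`; the base model name, the `hpc_qa.jsonl` file, and the hyperparameters are placeholders, not the paper's actual setup.

```python
# Minimal sketch of supervised fine-tuning a causal LM on QA pairs.
# Assumes a hypothetical hpc_qa.jsonl with {"question": ..., "answer": ...} records;
# model name and hyperparameters are placeholders, not the paper's configuration.
import json
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Format each QA instance as a single prompt/response string.
records = [json.loads(line) for line in open("hpc_qa.jsonl")]
texts = [f"Question: {r['question']}\nAnswer: {r['answer']}" for r in records]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = Dataset.from_dict({"text": texts}).map(
    tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hpc-gpt-sft", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-5),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```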
Cis-acting mRNA elements play a key role in the regulation of mRNA stability and translation efficiency. Revealing the interactions of these elements and their impact plays a crucial role in understanding this process, which supports the development of mRNA-based medicines or vaccines. Deep neural networks (DNN) can learn complex cis-regulatory codes from RNA sequences. However, extracting these codes efficiently from a DNN remains a significant challenge. Here, we propose a method based on our toolkit NeuronMotif and motif mutagenesis, which not only...
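NeuronMotif and the paper's motif-mutagenesis procedure are their own contribution; purely as a rough illustration of the general idea of in-silico mutagenesis, the sketch below scores the effect of every single-base substitution on a model's output. The `predict` function is a stand-in for a trained sequence DNN and is an assumption, not part of NeuronMotif.

```python
# Illustrative in-silico mutagenesis: substitute each base at each position
# and record the change in a model's predicted score. `predict` is a
# placeholder for a trained sequence DNN, not taken from NeuronMotif.
import numpy as np

BASES = "ACGU"

def predict(seq: str) -> float:
    # Placeholder scoring function; a real DNN would go here.
    return sum(1.0 for b in seq if b == "G") / len(seq)

def mutagenesis_map(seq: str) -> np.ndarray:
    """Return a (len(seq), 4) matrix of score deltas for each substitution."""
    base_score = predict(seq)
    deltas = np.zeros((len(seq), len(BASES)))
    for i, original in enumerate(seq):
        for j, b in enumerate(BASES):
            if b == original:
                continue
            mutant = seq[:i] + b + seq[i + 1:]
            deltas[i, j] = predict(mutant) - base_score
    return deltas

print(mutagenesis_map("AUGGCCAUUG").round(2))
```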
Large transformer models have recently achieved great success across various domains. With a growing number of model parameters, large-scale training today typically involves model sharding, data parallelism, and model parallelism. Thus, the throughput of large-scale training depends heavily on network bandwidth, since the combination of sharding with multiple parallelism strategies incurs communication costs. However, prior characterizations on high-bandwidth DGX machines that use TFLOPS as the metric may not reflect the performance of a system with lower...
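Since the abstract contrasts TFLOPS-based characterizations with actual training throughput, a common back-of-the-envelope check is to convert measured tokens per second into achieved TFLOPS per device using the rough 6·N FLOPs-per-token rule of thumb for transformer training (forward plus backward pass). The sketch below does that conversion; the model size and throughput numbers are illustrative, not results from the paper.

```python
# Rough conversion from training throughput to achieved TFLOPS per device,
# using the common ~6 * parameters FLOPs-per-token estimate for forward
# plus backward passes. All numbers are illustrative.
def achieved_tflops(params: float, tokens_per_sec: float, num_devices: int) -> float:
    flops_per_token = 6.0 * params
    total_flops_per_sec = flops_per_token * tokens_per_sec
    return total_flops_per_sec / num_devices / 1e12

# Example: a 7B-parameter model processing 20k tokens/s across 8 accelerators.
print(f"{achieved_tflops(7e9, 20_000, 8):.1f} TFLOPS per device")
```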
Heterogeneous hardware like the Gaudi processor has been developed to enhance computations, especially the matrix operations in Transformer-based large language models (LLMs) and generative AI tasks. However, our analysis indicates that Transformers are not fully optimized on such emerging hardware, primarily due to inadequate optimizations in non-matrix computational kernels such as Softmax and in heterogeneous resource utilization, particularly when processing long sequences. To address these issues, we propose an...
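The Softmax kernel singled out here is a non-matrix, memory-bound operation. As a minimal reference for what such a kernel computes, the sketch below implements the standard numerically stable row-wise softmax in NumPy; it is a generic version for illustration, not the paper's optimized Gaudi kernel.

```python
# Reference row-wise softmax with max-subtraction for numerical stability;
# this is the operation a fused non-matrix kernel would implement, not the
# paper's Gaudi-specific optimization.
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    shifted = scores - scores.max(axis=-1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

attn_scores = np.random.randn(4, 128, 128)  # (heads, query_len, key_len)
probs = softmax(attn_scores)
assert np.allclose(probs.sum(axis=-1), 1.0)
```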
Transformer models have achieved remarkable success in various machine learning tasks but suffer from high computational complexity and resource requirements. The quadratic complexity of the self-attention mechanism further exacerbates these challenges when dealing with long sequences and large datasets. Specialized AI hardware accelerators, such as the Habana GAUDI architecture, offer a promising solution to tackle these issues. GAUDI features a Matrix Multiplication Engine (MME) and a cluster of fully programmable Tensor...
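To make the quadratic cost concrete: naive self-attention materializes an n by n score matrix per head, so compute and memory grow with the square of the sequence length. The small sketch below shows this for a single head; shapes and sizes are illustrative only and unrelated to the paper's GAUDI implementation.

```python
# Naive single-head self-attention; the (n, n) score matrix is the source of
# the quadratic compute and memory cost discussed above. Shapes are illustrative.
import numpy as np

def naive_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)          # shape (n, n): quadratic in sequence length
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                     # shape (n, d)

for n in (1024, 2048, 4096):
    q = k = v = np.random.randn(n, 64).astype(np.float32)
    out = naive_attention(q, k, v)
    score_bytes = n * n * 4                # fp32 score matrix
    print(f"n={n}: score matrix ~{score_bytes / 1e6:.0f} MB, output {out.shape}")
```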