Zhen Xie

ORCID: 0000-0003-3516-2192
Research Areas
  • Topic Modeling
  • Advanced Neural Network Applications
  • Advanced Data Storage Technologies
  • RNA and protein synthesis mechanisms
  • Natural Language Processing Techniques
  • Machine Learning and Data Classification
  • Advanced Data Compression Techniques
  • Neural Networks and Applications
  • Hydraulic and Pneumatic Systems
  • Speech Recognition and Synthesis
  • Multimodal Machine Learning Applications
  • Image and Signal Denoising Methods
  • Human Pose and Action Recognition
  • RNA Research and Splicing
  • RNA modifications and cancer
  • Industrial Technology and Control Systems
  • Ferroelectric and Negative Capacitance Devices
  • Advanced Image and Video Retrieval Techniques
  • Cloud Computing and Resource Management
  • Power Systems and Technologies

Binghamton University
2023-2024

Tsinghua University
2024

Institut de Biologie systémique et synthétique
2024

Large Language Models (LLMs), including the LLaMA model, have exhibited their efficacy across various general-domain natural language processing (NLP) tasks. However, their performance in high-performance computing (HPC) domain tasks has been less than optimal due to the specialized expertise required to interpret model responses. In response to this challenge, we propose HPC-GPT, a novel LLaMA-based model that undergoes supervised fine-tuning using generated QA (Question-Answer) instances for the HPC domain. To evaluate its...

10.1145/3624062.3624172 preprint EN cc-by 2023-11-10
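
The record above describes supervised fine-tuning of a LLaMA-based model on generated HPC QA instances. Below is a minimal, hypothetical sketch of how such QA pairs might be rendered into fine-tuning text; the prompt template and field names are illustrative assumptions, not the format used by HPC-GPT itself.

# Minimal sketch: turning generated HPC QA pairs into supervised fine-tuning text.
# The template and field names below are assumptions for illustration only.

qa_instances = [
    {
        "question": "What does the OpenMP 'collapse' clause do?",
        "answer": "It merges the iteration spaces of nested loops so the combined "
                  "iterations can be scheduled across threads as a single loop.",
    },
]

PROMPT_TEMPLATE = (
    "Below is a question about high-performance computing.\n"
    "### Question:\n{question}\n"
    "### Answer:\n{answer}"
)

def build_training_texts(instances):
    """Render each QA instance into one fine-tuning example string."""
    return [PROMPT_TEMPLATE.format(**item) for item in instances]

if __name__ == "__main__":
    for text in build_training_texts(qa_instances):
        print(text)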

Cis-acting mRNA elements play a key role in the regulation of mRNA stability and translation efficiency. Revealing the interactions of these elements and their impact plays a crucial role in understanding the regulation process, which supports the development of mRNA-based medicine or vaccines. Deep neural networks (DNN) can learn complex cis-regulatory codes from RNA sequences. However, extracting these cis-regulatory codes efficiently from a DNN remains a significant challenge. Here, we propose a method based on our toolkit NeuronMotif and motif mutagenesis, which not only...

10.1093/bioinformatics/btae262 article EN cc-by Bioinformatics 2024-04-12
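
The entry above relies on motif mutagenesis to extract cis-regulatory codes from a trained DNN. The sketch below illustrates the general idea of in-silico saturation mutagenesis (substitute each base, re-score, and record the score change) with a toy scoring function standing in for a trained model; it is an assumption for illustration, not the NeuronMotif implementation.

# In-silico motif mutagenesis sketch: each position of an RNA sequence is
# substituted with every alternative base and the change in a model score is
# recorded. The toy `score` function below is a stand-in for a trained DNN
# predictor and is purely illustrative.

BASES = "ACGU"

def score(seq: str) -> float:
    # Toy surrogate model: reward occurrences of a fictitious "GGAC" element.
    return seq.count("GGAC") + 0.01 * seq.count("C")

def mutagenesis_map(seq: str):
    """Return per-position, per-base score deltas relative to the reference."""
    ref = score(seq)
    deltas = []
    for i, base in enumerate(seq):
        row = {}
        for alt in BASES:
            if alt == base:
                row[alt] = 0.0
            else:
                mutated = seq[:i] + alt + seq[i + 1:]
                row[alt] = score(mutated) - ref
        deltas.append(row)
    return deltas

if __name__ == "__main__":
    for i, row in enumerate(mutagenesis_map("AGGACUCC")):
        print(i, row)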

Large transformer models have recently achieved great success across various domains. With a growing number of model parameters, large-model training today typically involves model sharding, data parallelism, and model parallelism. Thus, the throughput of large-scale training depends heavily on the network bandwidth, since the combination of model sharding and multiple parallelism strategies incurs communication costs. However, prior characterizations on high-bandwidth DGX machines that use TFLOPS as the metric may not reflect the performance of a system with lower...

10.1145/3639034 article EN Proceedings of the ACM on Measurement and Analysis of Computing Systems 2024-02-16
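
The paper above argues that TFLOPS-centric characterizations can overstate achievable throughput when network bandwidth is limited. Below is a back-of-envelope sketch of that trade-off; the model size, link speed, token counts, and peak compute are illustrative assumptions, not measurements from the paper.

# Rough estimate of compute time vs. gradient all-reduce time per training step.
# All numbers are assumptions chosen only to show the shape of the trade-off.

params = 7e9                      # model parameters (assumed)
bytes_per_grad = 2                # fp16 gradients
tokens_per_step = 16_384          # tokens processed per GPU per step (assumed)
step_flops = 6 * params * tokens_per_step   # ~6 FLOPs per parameter per token
peak_flops = 300e12               # advertised peak compute per GPU (assumed)
link_bytes_per_s = 100e9 / 8      # 100 Gb/s link expressed in bytes/s (assumed)

# Ring all-reduce moves roughly 2*(n-1)/n of the gradient bytes per GPU.
n_gpus = 8
allreduce_bytes = 2 * (n_gpus - 1) / n_gpus * params * bytes_per_grad

compute_time = step_flops / peak_flops
comm_time = allreduce_bytes / link_bytes_per_s

print(f"compute ~{compute_time:.2f}s, gradient all-reduce ~{comm_time:.2f}s per step")
# When comm_time rivals compute_time, peak TFLOPS alone overstates throughput.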

Large transformer models have recently achieved great success across various domains. With a growing number of model parameters, large-model training today typically involves model sharding, data parallelism, and model parallelism. Thus, the throughput of large-scale training depends heavily on the network bandwidth, since the combination of model sharding and multiple parallelism strategies incurs communication costs. However, prior characterizations on high-bandwidth DGX machines that use TFLOPS as the metric may not reflect the performance of a system with lower...

10.1145/3652963.3655087 article EN 2024-06-01

Large transformer models have recently achieved great success across various domains. With a growing number of model parameters, large-model training today typically involves model sharding, data parallelism, and model parallelism. Thus, the throughput of large-scale training depends heavily on the network bandwidth, since the combination of model sharding and multiple parallelism strategies incurs communication costs. However, prior characterizations on high-bandwidth DGX machines that use TFLOPS as the metric may not reflect the performance of a system with lower...

10.1145/3673660.3655087 article EN ACM SIGMETRICS Performance Evaluation Review 2024-06-11

Heterogeneous hardware like the Gaudi processor has been developed to enhance computations, especially matrix operations for Transformer-based large language models (LLMs) in generative AI tasks. However, our analysis indicates that Transformers are not fully optimized on such emerging hardware, primarily due to inadequate optimizations in non-matrix computational kernels like Softmax and in heterogeneous resource utilization, particularly when processing long sequences. To address these issues, we propose an...

10.48550/arxiv.2412.19829 preprint EN arXiv (Cornell University) 2024-12-19
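
The entry above points at non-matrix kernels such as Softmax as a bottleneck for long sequences. The sketch below shows a generic chunked ("online") softmax in NumPy, the kind of elementwise max/exp/sum work that does not map onto a matrix engine; it is a textbook illustration, not the kernel proposed in the paper.

# Numerically stable softmax over a long row, computed in fixed-size chunks.
# Softmax is dominated by elementwise passes (max, exp, sum) rather than
# matrix multiplies, which is why it can stall matrix-engine-centric hardware.

import numpy as np

def online_softmax(x: np.ndarray, chunk: int = 4096) -> np.ndarray:
    running_max = -np.inf
    running_sum = 0.0
    # First pass: maintain a running max and a rescaled running sum.
    for start in range(0, x.size, chunk):
        block = x[start:start + chunk]
        block_max = block.max()
        new_max = max(running_max, block_max)
        running_sum = running_sum * np.exp(running_max - new_max) \
                      + np.exp(block - new_max).sum()
        running_max = new_max
    # Second pass: normalize with the final max and sum.
    return np.exp(x - running_max) / running_sum

if __name__ == "__main__":
    x = np.random.randn(100_000).astype(np.float32)
    ref = np.exp(x - x.max()) / np.exp(x - x.max()).sum()
    print(np.allclose(online_softmax(x), ref, atol=1e-6))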

Transformer models have achieved remarkable success in various machine learning tasks but suffer from high computational complexity and resource requirements. The quadratic complexity of the self-attention mechanism further exacerbates these challenges when dealing with long sequences and large datasets. Specialized AI hardware accelerators, such as the Habana GAUDI architecture, offer a promising solution to tackle these issues. GAUDI features a Matrix Multiplication Engine (MME) and a cluster of fully programmable Tensor...

10.1145/3624062.3624257 article EN 2023-11-10
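
The abstract above refers to the quadratic cost of self-attention and GAUDI's split between a matrix engine and programmable tensor cores. For reference, here is a textbook single-head attention in NumPy that materializes the seq_len x seq_len score matrix (the two matmuls suit a matrix engine, the softmax does not); this is illustrative only and not code from the paper or a GAUDI kernel.

# Single-head scaled dot-product attention, showing the O(seq_len^2) score matrix.

import numpy as np

def self_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """q, k, v: (seq_len, d). Returns (seq_len, d)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # (seq_len, seq_len) matmul
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: non-matrix work
    return weights @ v                               # second quadratic matmul

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 1024, 64
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    print(self_attention(q, k, v).shape)   # (1024, 64); scores hold 1024*1024 floats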