Heyang Qin

ORCID: 0000-0003-0994-502X
Research Areas
  • Advanced Neural Network Applications
  • Age of Information Optimization
  • IoT and Edge/Fog Computing
  • Air Quality Monitoring and Forecasting
  • Insurance, Mortality, Demography, Risk Management
  • Stochastic Gradient Optimization Techniques
  • Explainable Artificial Intelligence (XAI)
  • Machine Learning and ELM
  • Fire effects on ecosystems
  • Ferroelectric and Negative Capacitance Devices
  • Adversarial Robustness in Machine Learning
  • Topic Modeling
  • Computational Physics and Python Applications
  • Smart Parking Systems Research
  • Machine Learning and Data Classification
  • Parallel Computing and Optimization Techniques
  • Fire dynamics and safety research
  • Fire Detection and Safety Systems
  • Natural Language Processing Techniques

University of Nevada, Reno
2019-2022

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data. The model is also further...

10.48550/arxiv.2404.14219 preprint EN arXiv (Cornell University) 2024-04-22
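
A minimal sketch of running a small model of this kind locally with Hugging Face transformers; the checkpoint id, prompt, and generation settings below are illustrative assumptions, not details taken from the abstract above.

```python
# Sketch: loading and querying a small instruct model with transformers.
# "microsoft/Phi-3-mini-4k-instruct" is an assumed checkpoint name for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Explain why a 3.8B-parameter model can run on a phone."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```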

Deep-learning (DL)-based object detection algorithms can greatly benefit the community at large in fighting fires, advancing climate intelligence, and reducing health complications caused by hazardous smoke particles. Existing DL-based techniques, which are mostly based on convolutional networks, have proven to be effective in wildfire detection. However, there is still room for improvement. First, existing methods tend to have some commercial aspects, with limited publicly available data and models. In...

10.3390/rs14163979 article EN cc-by Remote Sensing 2022-08-16

Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPU clusters due to its ease of use, efficiency, and good scalability. However, when training on low-bandwidth clusters, or at a scale that forces the batch size per GPU to be small, ZeRO's effective throughput is limited because of the high communication volume from gathering weights in the forward pass, the backward pass, and averaging gradients. This paper introduces three communication volume reduction techniques, which we collectively refer to as...

10.48550/arxiv.2306.10209 preprint EN cc-by arXiv (Cornell University) 2023-01-01
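
To make the bottleneck described above concrete, here is a back-of-the-envelope sketch of per-step ZeRO-3 communication volume. The accounting (forward all-gather, backward all-gather, and gradient reduce-scatter each move roughly the full model size) is the usual rough estimate, and the "reduced" figures are assumptions based on quantizing or localizing those three collectives, not the paper's exact analysis.

```python
# Rough per-step communication accounting for plain ZeRO-3, and an assumed
# reduction from quantized / node-local variants of the three collectives.

def zero3_comm_volume_gb(num_params: float, bytes_per_elem: int = 2) -> float:
    """Approximate data moved per training step by plain ZeRO-3 (in GB)."""
    m = num_params * bytes_per_elem
    forward_allgather = m      # gather fp16 weights before the forward pass
    backward_allgather = m     # gather fp16 weights again before the backward pass
    grad_reducescatter = m     # average fp16 gradients across data-parallel ranks
    return (forward_allgather + backward_allgather + grad_reducescatter) / 1e9

if __name__ == "__main__":
    params = 7e9  # e.g., a 7B-parameter model (illustrative)
    baseline = zero3_comm_volume_gb(params)
    # Assumed per-parameter bytes that still cross slow inter-node links after the
    # three reductions: int8 weight gather (1 B), node-local secondary weight
    # partition for backward (0 B), int4 gradient reduce-scatter (0.5 B).
    reduced = (params * 1 + params * 0 + params * 0.5) / 1e9
    print(f"ZeRO-3 baseline: ~{baseline:.1f} GB/step, reduced: ~{reduced:.1f} GB/step")
```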

ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes...

10.48550/arxiv.2308.01320 preprint EN other-oa arXiv (Cornell University) 2023-01-01
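
The RLHF pipeline referenced above typically ends with a PPO stage whose per-token reward combines a reward-model score with a KL penalty toward the supervised fine-tuned policy. The sketch below shows that standard formulation; the function name, shapes, and coefficient are assumptions for illustration, not DeepSpeed-Chat internals.

```python
import torch

def rlhf_rewards(logprobs_policy: torch.Tensor,
                 logprobs_ref: torch.Tensor,
                 reward_score: torch.Tensor,
                 kl_coef: float = 0.1) -> torch.Tensor:
    """Per-token rewards: KL penalty everywhere, sequence-level RM score on the last token.

    logprobs_policy / logprobs_ref: (batch, seq_len) log-probs of the generated tokens
    reward_score: (batch,) scalar score from the reward model
    """
    kl = logprobs_policy - logprobs_ref     # per-token KL estimate
    rewards = -kl_coef * kl                 # discourage drifting from the SFT model
    rewards[:, -1] += reward_score          # add the reward-model score at the end
    return rewards

# Usage with random tensors standing in for real model outputs.
b, t = 2, 8
r = rlhf_rewards(torch.randn(b, t), torch.randn(b, t), torch.randn(b))
print(r.shape)  # torch.Size([2, 8])
```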

The deployment and scaling of large language models (LLMs) have become critical as they permeate various applications, demanding high-throughput and low-latency serving systems. Existing frameworks struggle to balance these requirements, especially for workloads with long prompts. This paper introduces DeepSpeed-FastGen, a system that employs Dynamic SplitFuse, a novel prompt and generation composition strategy, to deliver up to 2.3x higher effective throughput, 2x lower latency on average, and up to 3.7x...

10.48550/arxiv.2401.08671 preprint EN cc-by arXiv (Cornell University) 2024-01-01
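
A toy sketch of the general idea behind a SplitFuse-style scheduler: each forward pass is filled to a fixed token budget by mixing one-token decode steps from running sequences with chunks of pending long prompts. The names, data structures, and budget value are illustrative assumptions, not DeepSpeed-FastGen internals.

```python
from collections import deque

TOKEN_BUDGET = 512  # assumed per-iteration token budget

def build_batch(decoding: deque, pending_prompts: deque):
    """Return (seq_id, num_tokens) pairs whose token counts sum to <= TOKEN_BUDGET."""
    batch, used = [], 0
    # Each running sequence contributes exactly one decode token.
    for seq_id in list(decoding):
        if used >= TOKEN_BUDGET:
            break
        batch.append((seq_id, 1))
        used += 1
    # Fill the remaining budget with chunks of waiting prompts.
    while pending_prompts and used < TOKEN_BUDGET:
        seq_id, remaining = pending_prompts[0]
        take = min(remaining, TOKEN_BUDGET - used)
        batch.append((seq_id, take))
        used += take
        if take == remaining:              # prompt fully prefilled -> starts decoding
            pending_prompts.popleft()
            decoding.append(seq_id)
        else:
            pending_prompts[0] = (seq_id, remaining - take)
    return batch

decoding = deque(["a", "b"])                  # sequences already generating
pending = deque([("c", 900), ("d", 100)])     # prompts whose prefill is unfinished
print(build_batch(decoding, pending))
```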

The success of machine learning (ML) has prospered Machine-Learning-as-a-Service (MLaaS) - deploying trained ML models in the cloud to provide low-latency inference services at scale. To meet the latency Service-Level-Objective (SLO), judicious parallelization at both the request and operation levels is utterly important. However, existing ML systems (e.g., Tensorflow) and serving platforms (e.g., SageMaker) are SLO-agnostic and rely on users to manually configure the parallelism. To provide low-latency ML serving, this paper proposes a swift scheduling...

10.1145/3295500.3356164 article EN 2019-11-07
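
For context on the manual, SLO-agnostic configuration the paper contrasts itself with: in stock TensorFlow, operation-level parallelism is chosen by hand through the threading settings shown below. The thread counts here are arbitrary examples, not recommendations from the paper.

```python
import tensorflow as tf

# Must be set before any TensorFlow op executes.
tf.config.threading.set_intra_op_parallelism_threads(4)  # threads inside one op
tf.config.threading.set_inter_op_parallelism_threads(2)  # ops that may run concurrently

x = tf.random.normal([256, 256])
y = tf.linalg.matmul(x, x)   # executes under the parallelism chosen above
print(y.shape)
```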

In this paper, we analyze the impact of information freshness on supervised learning based forecasting. In these applications, a neural network is trained to predict a time-varying target (e.g., solar power) based on multiple correlated features (e.g., temperature, humidity, and cloud coverage). The features are collected from different data sources and are subject to heterogeneous ages. By using an information-theoretic approach, we prove that the minimum training loss is a function of the ages of the features, where the function is not always monotonic. However, if...

10.1109/infocomwkshps51825.2021.9484640 article EN IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) 2021-05-10
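
A tiny numerical illustration of the setting above: a target is predicted from a feature that arrives with some age, and the achievable loss depends on that age. The AR(1) feature process and linear predictor are assumptions chosen for simplicity; in this toy case the loss grows with age, though the abstract notes the relationship need not be monotonic in general.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 100_000, 0.95
x = np.zeros(n)
for t in range(1, n):                        # AR(1) feature process
    x[t] = rho * x[t - 1] + rng.normal(scale=np.sqrt(1 - rho**2))
y = x + 0.1 * rng.normal(size=n)             # target = fresh feature + noise

for age in [0, 1, 5, 20]:
    x_stale = np.roll(x, age)                # learner only sees the feature `age` steps late
    x_stale, y_t = x_stale[age:], y[age:]
    w = np.dot(x_stale, y_t) / np.dot(x_stale, x_stale)   # best linear predictor
    mse = np.mean((y_t - w * x_stale) ** 2)
    print(f"age={age:2d}  min MSE ~ {mse:.3f}")
```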

Machine learning (ML) has been embedded in many Internet of Things (IoT) applications (e.g., smart home and autonomous driving). Yet it is often infeasible to deploy ML models on IoT devices due to resource limitations. Thus, deploying trained ML models in the cloud and providing inference services becomes a plausible solution. To provide low-latency serving for massive devices, a natural and promising approach is to use parallelism in computation. However, existing ML systems (e.g., Tensorflow) and ML-serving platforms (e.g., SageMaker) are...

10.1109/jiot.2020.2965103 article EN publisher-specific-oa IEEE Internet of Things Journal 2020-01-09
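
A minimal sketch of the request-level parallelism mentioned above: many device requests are dispatched to a pool of workers so they do not queue behind one another. The fake predict function and pool size are illustrative assumptions, not the paper's serving system.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def predict(request_id: int) -> str:
    time.sleep(0.01)              # stand-in for one model inference
    return f"request {request_id}: ok"

requests = range(32)              # e.g., a burst of requests from IoT devices
with ThreadPoolExecutor(max_workers=8) as pool:   # request-level parallelism degree
    for result in pool.map(predict, requests):
        print(result)
```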

In this paper, we analyze the impact of information freshness on supervised learning based forecasting. In these applications, a neural network is trained to predict a time-varying target (e.g., solar power) based on multiple correlated features (e.g., temperature, humidity, and cloud coverage). The features are collected from different data sources and are subject to heterogeneous ages. By using an information-theoretic approach, we prove that the minimum training loss is a function of the ages of the features, where the function is not always monotonic. However, if...

10.48550/arxiv.2103.00092 preprint EN other-oa arXiv (Cornell University) 2021-01-01