Linnan Wang

ORCID: 0000-0001-6114-7098
Research Areas
  • Advanced Neural Network Applications
  • Machine Learning and Data Classification
  • Stochastic Gradient Optimization Techniques
  • Domain Adaptation and Few-Shot Learning
  • Machine Learning and Algorithms
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Parallel Computing and Optimization Techniques
  • Ferroelectric and Negative Capacitance Devices
  • Privacy-Preserving Technologies in Data
  • Neural Networks and Applications
  • Advanced Bandit Algorithms Research
  • Advanced Data Storage Technologies
  • Advanced Thermodynamics and Statistical Mechanics
  • Image Processing and 3D Reconstruction
  • Data Management and Algorithms
  • Advanced Database Systems and Queries
  • Sparse and Compressive Sensing Techniques
  • Data Visualization and Analytics
  • Distributed systems and fault tolerance
  • Model Reduction and Neural Networks
  • Reinforcement Learning in Robotics
  • Interconnection Networks and Systems
  • Adversarial Robustness in Machine Learning

Brown University
2017-2021

Meta (Israel)
2021

John Brown University
2017-2020

Going deeper and wider in neural architectures improves accuracy, while the limited GPU DRAM places an undesired restriction on the network design domain. Deep Learning (DL) practitioners either need to change to less desired network architectures, or nontrivially dissect a network across multiple GPUs. These distract DL practitioners from concentrating on their original machine learning tasks. We present SuperNeurons: a dynamic GPU memory scheduling runtime that enables training far beyond the GPU DRAM capacity. SuperNeurons features 3 optimizations,...

10.1145/3178487.3178491 preprint EN 2018-02-06
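
The scheduling idea can be pictured as a liveness-driven offload/prefetch loop over per-layer activations. The sketch below is only an illustration of that idea under assumed layer sizes and a hypothetical `gpu_budget`; it is not the SuperNeurons runtime, which additionally overlaps transfers with compute streams and supports recomputation.

```python
# Minimal sketch (hypothetical sizes/APIs): evict activations whose next use is
# furthest in the future when the working set exceeds the GPU memory budget,
# and prefetch them back right before the backward step that needs them.

def schedule(layer_mem, gpu_budget):
    """layer_mem: list of per-layer activation sizes (MB), in forward order."""
    resident = {}          # layer index -> size currently kept on GPU
    events = []            # (step, action, layer)
    for i, size in enumerate(layer_mem):
        # Backward visits layers in reverse, so lower-indexed activations are
        # needed furthest in the future; offload those first until the new fits.
        while sum(resident.values()) + size > gpu_budget and resident:
            victim = min(resident)
            events.append((i, "offload_to_host", victim))
            del resident[victim]
        resident[i] = size
        events.append((i, "compute_forward", i))
    # Backward pass: prefetch offloaded activations just in time.
    for i in reversed(range(len(layer_mem))):
        if i not in resident:
            events.append((i, "prefetch_to_gpu", i))
            resident[i] = layer_mem[i]
        events.append((i, "compute_backward", i))
        del resident[i]                      # activation freed after its backward
    return events

if __name__ == "__main__":
    for ev in schedule(layer_mem=[512, 256, 1024, 768, 256], gpu_budget=1536):
        print(ev)
```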

Neural Architecture Search (NAS) has shown great success in automating the design of neural networks, but the prohibitive amount of computation behind current NAS methods requires further investigation into improving sample efficiency and network evaluation cost in order to get better results in a shorter time. In this paper, we present a novel scalable Monte Carlo Tree Search (MCTS) based agent, named AlphaX, to tackle these two aspects. AlphaX improves the search by adaptively balancing exploration and exploitation at the state level,...

10.1609/aaai.v34i06.6554 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03
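
Balancing exploration and exploitation at the state level is commonly done with a UCB-style selection rule over tree nodes. The snippet below is a generic MCTS/UCB1 selection sketch with made-up candidate actions and rewards, not the AlphaX agent itself.

```python
import math
import random

# Generic UCB1 node selection as used in MCTS-style search (illustrative only).
class Node:
    def __init__(self):
        self.children = {}   # action -> Node
        self.visits = 0
        self.value_sum = 0.0

    def ucb_score(self, child, c=1.414):
        if child.visits == 0:
            return float("inf")                      # force at least one visit
        exploit = child.value_sum / child.visits     # mean reward so far
        explore = c * math.sqrt(math.log(self.visits) / child.visits)
        return exploit + explore

    def select_child(self):
        return max(self.children.items(), key=lambda kv: self.ucb_score(kv[1]))

if __name__ == "__main__":
    root = Node()
    root.children = {a: Node() for a in ["conv3x3", "conv5x5", "skip"]}
    # Simulate noisy rollouts to show visit counts shifting toward better actions.
    for _ in range(100):
        action, child = root.select_child()
        reward = {"conv3x3": 0.7, "conv5x5": 0.6, "skip": 0.4}[action] + random.gauss(0, 0.05)
        child.visits += 1
        child.value_sum += reward
        root.visits += 1
    print({a: n.visits for a, n in root.children.items()})
```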

Neural Architecture Search (NAS) has emerged as a promising technique for automatic neural network design. However, existing MCTS-based NAS approaches often utilize a manually designed action space, which is not directly related to the performance metric to be optimized (e.g., accuracy), leading to sample-inefficient explorations of architectures. To improve sample efficiency, this paper proposes Latent Action Neural Architecture Search (LaNAS), which learns actions to recursively partition the search space into good or bad regions that...

10.1109/tpami.2021.3071343 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-01-01
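
One way to picture learned space partitioning is to fit a linear predictor of performance and split samples into above- and below-median halves, recursing on each side. The code below is an illustrative sketch on synthetic data, not the LaNAS implementation; the depth limit and encoding are arbitrary.

```python
import numpy as np

# Illustrative recursive good/bad partitioning in the spirit of learned latent
# actions: fit a linear model of performance, then split samples by whether the
# prediction is above or below the median.
def partition(X, y, depth=0, max_depth=3):
    if depth == max_depth or len(y) < 4:
        return {"samples": len(y), "mean_perf": float(np.mean(y))}
    A = np.hstack([X, np.ones((len(y), 1))])      # add bias column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)     # linear "latent action"
    scores = A @ w
    good = scores >= np.median(scores)
    return {
        "boundary": w,
        "good": partition(X[good], y[good], depth + 1, max_depth),
        "bad":  partition(X[~good], y[~good], depth + 1, max_depth),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(256, 5))                # toy encoded architectures
    y = X @ np.array([0.5, -0.2, 0.8, 0.1, 0.3]) + rng.normal(0, 0.05, 256)
    tree = partition(X, y)
    print(tree["good"]["good"]["good"])           # the most promising leaf region
```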

With the unprecedented development of compute capability and extension of memory bandwidth on modern GPUs, parallel communication and synchronization soon become a major concern for continuous performance scaling. This is especially the case for emerging big-data applications. Instead of relying on a few heavily-loaded CTAs that may expose opportunities for intra-CTA data reuse, current technology and design trends suggest the potential of allocating more lightweighted CTAs to process individual tasks independently, as the overheads...

10.1145/3205289.3205294 article EN 2018-06-12

High-dimensional black-box optimization has broad applications but remains a challenging problem to solve. Given a set of samples $\{\mathbf{x}_i, y_i\}$, building a global model (like Bayesian Optimization (BO)) suffers from the curse of dimensionality in the high-dimensional search space, while a greedy search may lead to sub-optimality. By recursively splitting the search space into regions with high/low function values, recent works like LaNAS show good performance in Neural Architecture Search (NAS), reducing the sample complexity...

10.48550/arxiv.2007.00708 preprint EN other-oa arXiv (Cornell University) 2020-01-01
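
A toy version of region-based search: keep the better half of the evaluated samples, take their bounding box, and draw the next candidates only from that box. This is a deliberately simplified illustration on a synthetic Ackley function, not the algorithm proposed in the paper.

```python
import numpy as np

# Restricting search to a learned promising region (simplified illustration).
def ackley(x):
    # Common synthetic black-box test function (minimization).
    a, b, c = 20.0, 0.2, 2 * np.pi
    return (-a * np.exp(-b * np.sqrt(np.mean(x**2, -1)))
            - np.exp(np.mean(np.cos(c * x), -1)) + a + np.e)

rng = np.random.default_rng(1)
dim, lb, ub = 10, -5.0, 5.0
X = rng.uniform(lb, ub, size=(64, dim))
y = ackley(X)

for it in range(5):
    good = X[y <= np.median(y)]                 # better half of the samples
    box_lo, box_hi = good.min(0), good.max(0)   # promising region (axis-aligned box)
    cand = rng.uniform(box_lo, box_hi, size=(64, dim))
    X, y = np.vstack([X, cand]), np.concatenate([y, ackley(cand)])
    print(f"iter {it}: best value so far = {y.min():.3f}")
```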

Efficient evaluation of a network architecture drawn from a large search space remains a key challenge in Neural Architecture Search (NAS). Vanilla NAS evaluates each architecture by training it from scratch, which gives the true performance but is extremely time-consuming. Recently, one-shot NAS substantially reduces the computation cost by training only one supernetwork, a.k.a. supernet, to approximate the performance of every architecture via weight-sharing. However, the performance estimation can be very inaccurate due to the co-adaption among operations. In this paper, we propose...

10.48550/arxiv.2006.06863 preprint EN other-oa arXiv (Cornell University) 2020-01-01
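
Weight-sharing evaluation can be pictured as routing inputs through one chosen operation per layer of a single over-parameterized supernet. The PyTorch sketch below is a generic one-shot illustration with assumed candidate operations, not the few-shot method proposed here.

```python
import torch
import torch.nn as nn

# Minimal one-shot weight-sharing sketch: each searchable layer holds candidate
# operations; a sampled architecture activates one candidate per layer and
# reuses the shared weights for evaluation.
class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),                      # skip-connection candidate
        ])

    def forward(self, x, choice):
        return self.ops[choice](x)              # only the chosen op runs

class SuperNet(nn.Module):
    def __init__(self, channels=16, layers=4):
        super().__init__()
        self.layers = nn.ModuleList(MixedOp(channels) for _ in range(layers))
        self.head = nn.Linear(channels, 10)

    def forward(self, x, arch):
        for layer, choice in zip(self.layers, arch):
            x = layer(x, choice)
        return self.head(x.mean(dim=(2, 3)))    # global average pool + classifier

if __name__ == "__main__":
    net = SuperNet()
    x = torch.randn(2, 16, 8, 8)
    arch = [0, 2, 1, 0]                         # one sampled sub-network
    print(net(x, arch).shape)                   # shared weights evaluate this child
```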

The performance and efficiency of distributed training of Deep Neural Networks (DNN) highly depend on the gradient averaging among participating processes, a step bound by communication costs. There are two major approaches to reduce the overhead: overlap communications with computations (lossless), or compress the communicated gradients (lossy). The lossless solution works well for linear neural architectures, e.g. VGG, AlexNet, but more recent networks such as ResNet and Inception limit the opportunity for such overlapping. Therefore, approaches that reduce the amount of data...

10.1145/3369583.3392681 article EN 2020-06-22
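
Lossy gradient compression is commonly realized with top-k sparsification plus error feedback, which cuts the volume each process must exchange. The snippet below is a generic illustration of that compression primitive, not the scheme proposed in the paper; the ratio and sizes are made up.

```python
import numpy as np

# Top-k gradient sparsification with error feedback (illustrative).
def topk_compress(grad, residual, k_ratio=0.01):
    g = grad + residual                          # add back what was dropped last step
    k = max(1, int(k_ratio * g.size))
    idx = np.argpartition(np.abs(g), -k)[-k:]    # indices of the k largest magnitudes
    values = g[idx]
    new_residual = g.copy()
    new_residual[idx] = 0.0                      # remember what was not sent
    return (idx, values), new_residual

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.normal(size=1_000_000).astype(np.float32)
    residual = np.zeros_like(grad)
    (idx, values), residual = topk_compress(grad, residual)
    sent = idx.nbytes + values.nbytes
    print(f"sent {sent} bytes instead of {grad.nbytes} "
          f"({100 * sent / grad.nbytes:.2f}%)")
```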

The increase in scale and heterogeneity of high-performance computing (HPC) systems predisposes the performance of Message Passing Interface (MPI) collective communications to be susceptible to noise and to a complex mix of hardware capabilities. The designs of state-of-the-art MPI collectives heavily rely on synchronizations; these magnify noise across the participating processes, resulting in significant slowdown. Therefore, such a design philosophy must be reconsidered to efficiently and robustly run on large-scale heterogeneous...

10.1145/3208040.3208054 article EN 2018-06-11

We consider the problem of how to reduce the cost of the communication that is required for the parallel training of a neural network. The state-of-the-art method, Bulk Synchronous Parallel Stochastic Gradient Descent (BSP-SGD), requires many collective communication operations, like broadcasts of parameters or reductions for partial gradient aggregations, whose large messages quickly dominate the overall execution time and limit scalability. To address this problem, we develop a new technique referred to as Linear Pipelining (LP). It...

10.1145/3126686.3126749 article EN 2017-10-23
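
Linear Pipelining splits a large broadcast message into blocks and streams them along a chain of nodes so that every link stays busy once the pipeline fills. The sketch below only models the block schedule to show the effect; the bandwidth, block size, and function name are made up, and no real communication is performed.

```python
# Simulated block schedule for a chain (linear-pipeline) broadcast. Illustrative
# only: made-up bandwidth and block size, not the CUDA/MPI implementation.
def pipeline_broadcast_time(message_mb, nodes, block_mb, link_gbps=10.0):
    blocks = max(1, round(message_mb / block_mb))
    per_block = (block_mb * 8) / (link_gbps * 1000)     # seconds per block per hop
    hops = nodes - 1
    # The last block leaves the root after (blocks - 1) sends, then crosses `hops` links.
    return (blocks - 1 + hops) * per_block

if __name__ == "__main__":
    msg, nodes = 256.0, 8                                # 256 MB model, 8 GPUs
    naive = pipeline_broadcast_time(msg, nodes, block_mb=msg)   # one huge block
    piped = pipeline_broadcast_time(msg, nodes, block_mb=4.0)   # 4 MB blocks
    print(f"store-and-forward chain: {naive:.3f}s, pipelined chain: {piped:.3f}s")
```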

We consider the problem of how to reduce the cost of the communication that is required for the parallel training of a neural network. The state-of-the-art method, Bulk Synchronous Parallel Stochastic Gradient Descent (BSP-SGD), requires many collective communication operations, like broadcasts of parameters or reductions for sub-gradient aggregations, whose large messages quickly dominate the overall execution time and limit scalability. To address this problem, we develop a new technique referred to as Linear Pipelining (LP). It is tuned...

10.48550/arxiv.1611.04255 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Deploying deep learning models requires taking into consideration neural network metrics such as model size, inference latency, and #FLOPs, aside from accuracy. This results in designers leveraging multi-objective optimization to design effective networks under multiple criteria. However, applying multi-objective optimization to neural architecture search (NAS) is nontrivial because NAS tasks usually have a huge search space, along with a non-negligible searching cost. This requires effective multi-objective search algorithms to alleviate the GPU costs. In this work, we implement...

10.48550/arxiv.2406.00291 preprint EN arXiv (Cornell University) 2024-05-31
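
The basic primitive of multi-objective search is Pareto dominance over metrics like the ones listed above. The snippet below computes a Pareto front over a few hypothetical architectures with made-up (error, latency, FLOPs) numbers; it is not the search algorithm from the paper.

```python
# Pareto (non-dominated) filtering over architecture metrics; lower is better
# for every metric. Candidate names and numbers are hypothetical.
def dominates(a, b):
    """True if a is no worse than b on all metrics and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    return {name: m for name, m in candidates.items()
            if not any(dominates(other, m)
                       for o, other in candidates.items() if o != name)}

if __name__ == "__main__":
    # (1 - accuracy, latency in ms, millions of FLOPs)
    candidates = {
        "arch_a": (0.060, 12.0, 300),
        "arch_b": (0.055, 20.0, 450),
        "arch_c": (0.070, 10.0, 250),
        "arch_d": (0.065, 13.0, 320),   # dominated by arch_a on every metric
    }
    print(sorted(pareto_front(candidates)))
```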

Path planning, the problem of efficiently discovering high-reward trajectories, often requires optimizing a high-dimensional and multimodal reward function. Popular approaches like CEM and CMA-ES greedily focus on promising regions of the search space and may get trapped in local maxima. DOO and VOOT balance exploration and exploitation, but use space partitioning strategies that are independent of the reward function to be optimized. Recently, LaMCTS empirically learns to partition the search space in a reward-sensitive manner for black-box optimization. In this...

10.48550/arxiv.2106.10544 preprint EN other-oa arXiv (Cornell University) 2021-01-01
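
CEM, one of the baselines named above, refits a Gaussian sampler to its elite samples each round, which is also why it can commit to a single mode of a multimodal reward. The sketch below runs vanilla CEM on a toy two-bump reward; it is a generic baseline illustration, not the planner proposed in the paper.

```python
import numpy as np

# Vanilla Cross-Entropy Method (CEM) on a toy multimodal reward: the sampling
# Gaussian is refit to the elites each round, so the search tends to lock onto
# one local mode.
def reward(x):
    # Two Gaussian bumps; the one at (3, 3) is slightly higher.
    return (np.exp(-np.sum((x - 3.0) ** 2, -1)) +
            0.8 * np.exp(-np.sum((x + 3.0) ** 2, -1)))

rng = np.random.default_rng(0)
mean, std = np.zeros(2), np.full(2, 4.0)
for it in range(20):
    samples = rng.normal(mean, std, size=(200, 2))
    elites = samples[np.argsort(reward(samples))[-20:]]   # top 10% by reward
    mean, std = elites.mean(0), elites.std(0) + 1e-3      # refit the sampler
print("converged mean:", np.round(mean, 2), "reward:", round(float(reward(mean)), 3))
```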