Shengyang Sun

ORCID: 0000-0003-3286-0585
Research Areas
  • Gaussian Processes and Bayesian Inference
  • Advanced Memory and Neural Computing
  • CCD and CMOS Imaging Sensors
  • Machine Learning and Algorithms
  • Neural dynamics and brain function
  • Machine Learning and Data Classification
  • Domain Adaptation and Few-Shot Learning
  • Transportation Planning and Optimization
  • Transportation and Mobility Innovations
  • Statistical Methods and Inference
  • Energy, Environment, and Transportation Policies
  • Neuroscience and Neural Engineering
  • Neural Networks and Applications
  • Stochastic Gradient Optimization Techniques
  • Time Series Analysis and Forecasting
  • Generative Adversarial Networks and Image Synthesis
  • Ferroelectric and Negative Capacitance Devices
  • Anomaly Detection Techniques and Applications
  • Photoreceptor and optogenetics research
  • Model Reduction and Neural Networks
  • Vehicle emissions and performance
  • Bayesian Modeling and Causal Inference
  • Quantum Computing Algorithms and Architecture
  • Control Systems and Identification
  • Electric Vehicles and Infrastructure

Zhejiang University
2024

National University of Defense Technology
2018-2021

University of Toronto
2017-2021

Deutsche Gesellschaft für Internationale Zusammenarbeit
2020

Vector Institute
2019

Tsinghua University
2016-2017

China University of Petroleum, Beijing
2015

The low-resolution analog-to-digital converter (ADC) is a promising solution to significantly reduce the power consumption of radio frequency circuits in massive multiple-input multiple-output (MIMO) systems. In this letter, we investigate the uplink spectral efficiency (SE) of massive MIMO systems with low-resolution ADCs over Rician fading channels, where both perfect and imperfect channel state information are considered. By modeling the quantization noise as an additive noise, we derive tractable exact approximations...

10.1109/lcomm.2016.2535132 article EN IEEE Communications Letters 2016-02-26
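
The "quantization noise as an additive noise" treatment in the abstract above is commonly formalized via the additive quantization noise model (AQNM); the following is a sketch of its standard form, with notation chosen here rather than taken from the letter:

```latex
% AQNM: quantizer output = scaled input + uncorrelated quantization noise
y_q = \alpha\, y + n_q, \qquad \alpha = 1 - \rho,
```

where $\rho$ is the inverse signal-to-quantization-noise ratio determined by the ADC resolution, and $n_q$ is quantization noise uncorrelated with $y$, with covariance $R_{n_q} = \alpha(1-\alpha)\,\mathrm{diag}(R_y)$. This linearization is what makes exact SE approximations tractable.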

Variational Bayesian neural networks (BNNs) perform variational inference over weights, but it is difficult to specify meaningful priors and approximate posteriors in a high-dimensional weight space. We introduce functional variational BNNs (fBNNs), which maximize an Evidence Lower BOund (ELBO) defined directly on stochastic processes, i.e. distributions over functions. We prove that the KL divergence between stochastic processes equals the supremum of marginal KL divergences over all finite sets of inputs. Based on this, we present a practical training...

10.48550/arxiv.1903.05779 preprint EN other-oa arXiv (Cornell University) 2019-01-01
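
The KL result stated in the abstract above can be written out explicitly; this is a transcription of the claim, with notation chosen here:

```latex
\mathrm{KL}\big(P \,\|\, Q\big)
  \;=\; \sup_{n \in \mathbb{N},\; \mathbf{X} \in \mathcal{X}^n}
        \mathrm{KL}\big(P_{\mathbf{X}} \,\|\, Q_{\mathbf{X}}\big),
```

where $P_{\mathbf{X}}$ and $Q_{\mathbf{X}}$ denote the finite-dimensional marginals of the two stochastic processes at the inputs $\mathbf{X}$. Because the supremum runs over finite input sets, the functional ELBO can be estimated by sampling finite "measurement sets" of inputs.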

Variational Bayesian neural nets combine the flexibility of deep learning with Bayesian uncertainty estimation. Unfortunately, there is a tradeoff between cheap but simple variational families (e.g. fully factorized) or expensive and complicated inference procedures. We show that natural gradient ascent with adaptive weight noise implicitly fits a variational posterior to maximize the evidence lower bound (ELBO). This insight allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noisy...

10.48550/arxiv.1712.02390 preprint EN other-oa arXiv (Cornell University) 2017-01-01

In this paper we introduce ZhuSuan, a Python probabilistic programming library for Bayesian deep learning, which conjoins the complementary advantages of Bayesian methods and deep learning. ZhuSuan is built upon TensorFlow. Unlike existing deep learning libraries, which are mainly designed for deterministic neural networks and supervised tasks, ZhuSuan is featured for its deep root into Bayesian inference, thus supporting various kinds of probabilistic models, including both traditional hierarchical Bayesian models and recent deep generative models. We use running examples to illustrate...

10.48550/arxiv.1709.05870 preprint EN other-oa arXiv (Cornell University) 2017-01-01

We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and their outputs. These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe the community can benefit from these models in various research studies...

10.48550/arxiv.2406.11704 preprint EN arXiv (Cornell University) 2024-06-17

Recent progress in variational inference has paid much attention to the flexibility of variational posteriors. One promising direction is to use implicit distributions, i.e., distributions without tractable densities, as the variational posterior. However, existing methods on implicit posteriors still face challenges of noisy estimation and computational infeasibility when applied to models with high-dimensional latent variables. In this paper, we present a new approach named Kernel Implicit Variational Inference that addresses these...

10.48550/arxiv.1705.10119 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Recently there have been increasing interests in learning and inference with implicit distributions (i.e., distributions without tractable densities). To this end, we develop a gradient estimator for implicit distributions based on Stein's identity and the spectral decomposition of kernel operators, where the eigenfunctions are approximated by the Nyström method. Unlike previous works that only provide estimates at the sample points, our approach directly estimates the gradient function, thus allowing a simple and principled out-of-sample extension. We provide theoretical results...

10.48550/arxiv.1806.02925 preprint EN other-oa arXiv (Cornell University) 2018-01-01
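
The estimator described above can be sketched numerically. The following is a minimal NumPy illustration of the spectral/Nyström construction (RBF kernel, median-heuristic bandwidth, top-J eigenpairs); it is an assumption-laden sketch, not the authors' reference implementation:

```python
import numpy as np

def rbf_kernel(x, y, sigma):
    # x: (n, d), y: (m, d) -> Gram matrix (n, m)
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def ssge(samples, query, n_eigen=6, sigma=None):
    """Spectral Stein-style estimate of the score grad log q(x) at `query`,
    from samples of q. Kernel choice and bandwidth are illustrative."""
    M, d = samples.shape
    if sigma is None:
        # median heuristic for the kernel bandwidth
        dists = np.sqrt(((samples[:, None] - samples[None, :]) ** 2).sum(-1))
        sigma = np.median(dists[dists > 0])
    K = rbf_kernel(samples, samples, sigma)
    eigvals, eigvecs = np.linalg.eigh(K)                   # ascending order
    eigvals = eigvals[::-1][:n_eigen]
    eigvecs = eigvecs[:, ::-1][:, :n_eigen]
    # Nystrom out-of-sample eigenfunctions:
    #   psi_j(x) = sqrt(M) / lambda_j * sum_m k(x, x_m) U_{mj}
    Kq = rbf_kernel(query, samples, sigma)                 # (nq, M)
    psi = np.sqrt(M) * (Kq @ eigvecs) / eigvals            # (nq, J)
    # beta_j = -(1/M) sum_m grad_x psi_j(x)|_{x=x_m}  (via Stein's identity)
    diff = samples[:, None, :] - samples[None, :, :]       # (M, M, d)
    gradK = -K[:, :, None] * diff / sigma ** 2             # grad of k wrt 1st arg
    grad_psi = np.sqrt(M) * np.einsum('nmd,mj->njd', gradK, eigvecs) \
               / eigvals[None, :, None]
    beta = -grad_psi.mean(0)                               # (J, d)
    return psi @ beta                                      # (nq, d)
```

On samples from a standard 1-D Gaussian, the estimate should approximate the true score, which is $-x$, including at query points outside the sample set.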

The generalization properties of Gaussian processes depend heavily on the choice of kernel, and this choice remains a dark art. We present the Neural Kernel Network (NKN), a flexible family of kernels represented by a neural network. The NKN architecture is based on the composition rules for kernels, so that each unit of the network corresponds to a valid kernel. It can compactly approximate compositional kernel structures such as those used by the Automatic Statistician (Lloyd et al., 2014), but because it is differentiable, it can be trained end-to-end...

10.48550/arxiv.1806.04326 preprint EN other-oa arXiv (Cornell University) 2018-01-01
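
The composition rules the abstract refers to are the classical closure properties of positive-semidefinite kernels: a nonnegative combination of valid kernels is valid, and so is their product. A small numerical check of one "NKN-style" unit (names and kernel choices here are illustrative, not the NKN code):

```python
import numpy as np

def rbf(x, y, ls=1.0):
    # squared-exponential kernel on 1-D inputs
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / (2 * ls ** 2))

def linear(x, y):
    # linear (dot-product) kernel on 1-D inputs
    return x[:, None] * y[None, :]

def composed(x, y):
    # one unit: a nonnegative sum of kernels feeding a product of kernels
    return (0.7 * rbf(x, y) + 0.3 * linear(x, y)) * rbf(x, y, ls=2.0)

x = np.linspace(-2, 2, 25)
K = composed(x, x)
# jitter absorbs floating-point round-off in the eigensolve
eigs = np.linalg.eigvalsh(K + 1e-10 * np.eye(len(x)))
```

Because every unit applies only PSD-preserving operations, the Gram matrix of the composed kernel stays positive semidefinite, so the whole network is itself a valid kernel.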

10.1109/icme57554.2024.10688202 article EN 2024 IEEE International Conference on Multimedia and Expo (ICME) 2024-07-15

10.1016/s1570-6672(08)60060-4 article EN Journal of Transportation Systems Engineering and Information Technology 2009-04-01

Multiply-accumulate calculation using a memristor crossbar array is an important method to realize neuromorphic computing. However, the fabrication technology is still immature, and it is difficult to fabricate large-scale arrays with high yield, which restricts the development of memristor-based neuromorphic computing technology. Therefore, cascading small-scale arrays to achieve the computational ability of large-scale arrays is of great significance for promoting this application. To address this issue, we present a cascaded...

10.1109/access.2019.2915787 article EN cc-by-nc-nd IEEE Access 2019-01-01
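
The cascading idea above can be sketched in software: partition a large weight matrix into small tiles, run each tile's multiply-accumulate on its own (simulated) crossbar, and accumulate the partial sums. Tile size and partitioning below are illustrative assumptions, not the paper's design:

```python
import numpy as np

def crossbar_mac(G, v):
    # one small crossbar tile: output currents = conductances @ input voltages
    return G @ v

def cascaded_mac(W, x, tile=4):
    """Emulate a large multiply-accumulate with small crossbar tiles."""
    m, n = W.shape
    out = np.zeros(m)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            # each block runs on its own tile; partial sums are accumulated
            out[i:i + tile] += crossbar_mac(W[i:i + tile, j:j + tile],
                                            x[j:j + tile])
    return out
```

Up to floating-point summation order, the tiled result matches the full matrix-vector product, which is why cascaded small arrays can stand in for one large array.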

This paper proposes Full-Parallel Convolutional Neural Networks (FP-CNN) for specific target recognition, which utilize analog memristive array circuits to carry out vector-matrix multiplication and generate multiple output feature maps in a single processing cycle. Compared with the ReLU and Tanh functions, we innovatively adopt the absolute activation function to reduce the network scale dramatically, and can achieve a 99% recognition accuracy rate with only three layers. Furthermore, we propose a performance...

10.1587/elex.16.20181034 article EN IEICE Electronics Express 2019-01-01

With the rapid development of the VLSI industry, research on intelligent applications moves towards IoT and edge computing, while the power consumption and area cost of deep neural networks usually exceed the hardware limitations of edge devices. In this paper, we propose a low-power network architecture to address this problem. We simplify the currently popular convolutional network structure, utilize a memristor crossbar to store weights and execute convolution operations in parallel, and present spiking networks. At the same time, the proposed...

10.1109/ijcnn.2018.8489441 article EN 2018 International Joint Conference on Neural Networks (IJCNN) 2018-07-01

As one of the most promising methods for next-generation neuromorphic systems, memristor-based spiking neural networks (SNNs) show great advantages in terms of power efficiency, integration density, and biological plausibility. However, because of the nondifferentiability of discrete spikes, it is difficult to train SNNs with gradient-descent error backpropagation online. In this article, we propose an improved training algorithm for a multilayer memristive SNN (MSNN)...

10.1109/tcds.2021.3049487 article EN IEEE Transactions on Cognitive and Developmental Systems 2021-01-07
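
The nondifferentiability noted above is often handled with a surrogate gradient: the forward pass keeps the discrete spike, while the backward pass replaces the Heaviside derivative with a smooth stand-in. A generic sketch of that technique (the article's exact algorithm is not reproduced; the rectangular surrogate and its width are assumptions):

```python
import numpy as np

def spike(v, threshold=1.0):
    # forward pass: nondifferentiable Heaviside step at the firing threshold
    return (v >= threshold).astype(float)

def surrogate_grad(v, threshold=1.0, width=0.5):
    # backward pass: rectangular window standing in for the Heaviside derivative
    return (np.abs(v - threshold) < width).astype(float) / (2 * width)
```

With such a surrogate, the error signal can propagate through spiking layers even though the true derivative of the spike is zero almost everywhere.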

This paper proposes a method that renders the weights of a neural network with quaternary synapses to map onto memristive devices with only four memristance levels. We show this method is capable of operating with negligible loss in classification accuracy when the memristors utilized can store at least four unique values. Compared with other state-of-the-art methods, the presented method can achieve 98.65% accuracy under 0.60M parameters. Systematic error analysis shows it can still reach over 95% accuracy under certain conditions of memristor crossbar array yield, 100 µV op-amp...

10.1587/elex.16.20190004 article EN IEICE Electronics Express 2019-01-01
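
The core mapping idea, snapping real-valued weights onto four storable levels, can be sketched as a nearest-level quantizer. The level values below are illustrative assumptions; the paper's exact mapping is not reproduced:

```python
import numpy as np

def quantize_quaternary(w, levels=(-1.0, -0.33, 0.33, 1.0)):
    """Map each real-valued weight to the nearest of four memristance levels."""
    levels = np.asarray(levels)
    idx = np.abs(w[..., None] - levels).argmin(-1)   # nearest-level index
    return levels[idx]
```

After quantization every weight takes one of exactly four values, so each synapse can be stored in a single four-level memristive device.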

The vector neural network (VNN) is one of the most important methods to process interval data. However, the VNN, which contains a great number of multiply-accumulate (MAC) operations, often adopts a pure numerical calculation method, and is thus difficult to miniaturize for embedded applications. In this paper, we propose a memristor-based vector-type backpropagation (MVTBP) architecture that utilizes memristive arrays to accelerate the MAC operations. Owing to the unique brain-like synaptic characteristics of memristive devices, e.g., ...

10.1088/1674-1056/ab65b5 article EN Chinese Physics B 2019-12-27
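
The interval-data MAC that a VNN performs can be sketched with basic interval arithmetic: for each weight, which endpoint of the input interval contributes to the lower or upper output bound depends on the weight's sign. This is a generic sketch of that arithmetic, not the MVTBP circuit:

```python
import numpy as np

def interval_mac(W, lo, hi):
    """Multiply-accumulate on interval-valued inputs [lo, hi].

    Positive weights pair lo with the lower output bound and hi with the
    upper; negative weights swap the endpoints.
    """
    Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
    out_lo = Wp @ lo + Wn @ hi
    out_hi = Wp @ hi + Wn @ lo
    return out_lo, out_hi

# Usage: a 2x2 weight matrix applied to the intervals [0, 1] and [-1, 1]
W = np.array([[1.0, -2.0], [0.5, 0.5]])
lo, hi = np.array([0.0, -1.0]), np.array([1.0, 1.0])
```

For this example, the first output is $x_0 - 2x_1 \in [-2, 3]$ and the second is $0.5x_0 + 0.5x_1 \in [-0.5, 1]$, which the function reproduces.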

The developments of Rademacher complexity and PAC-Bayesian theory have been largely independent. One exception is the PAC-Bayes theorem of Kakade, Sridharan, and Tewari (2008), which is established via Rademacher complexity by viewing Gibbs classifiers as linear operators. The goal of this paper is to extend this bridge between Rademacher complexity and state-of-the-art PAC-Bayesian theory. We first demonstrate that one can match the fast rate of Catoni's PAC-Bayes bounds (Catoni, 2007) using shifted Rademacher processes (Wegkamp, 2003; Lecué and Mitchell, 2012; Zhivotovskiy and Hanneke, 2018). We then derive a...

10.48550/arxiv.1908.07585 preprint EN other-oa arXiv (Cornell University) 2019-01-01