- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Interconnection Networks and Systems
- VLSI and FPGA Design Techniques
- Low-power high-performance VLSI design
- Viral Infections and Vectors
- Viral Infections and Outbreaks Research
- VLSI and Analog Circuit Testing
- Interferon and immune responses
- Radiation Effects in Electronics
- Viral gastroenteritis research and epidemiology
- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
- Machine Learning and Algorithms
- Educational Technology and Assessment
- Advanced Sensor and Control Systems
- Numerical Methods and Algorithms
- Formal Methods in Verification
- Neuroscience and Neural Engineering
- Advanced Data Storage Technologies
- Protein Degradation and Inhibitors
- Advanced Bandit Algorithms Research
- Power Systems and Technologies
Xilinx (United States)
2019-2022
University of Utah
2020
Villanova University
2020
Ningbo University
2020
Cornell University
2014-2019
Washington University in St. Louis
2012-2016
Changchun Institute of Technology
2010
The year 2011 marked an important transition for FPGA high-level synthesis (HLS), as it went from prototyping to deployment. A decade later, in this article, we assess the progress of the deployment of HLS technology and highlight the successes in several application domains, including deep learning, video transcoding, graph processing, and genome sequencing. We also discuss the challenges faced by today’s HLS technology and the opportunities for further research and development, especially in the areas of achieving high clock frequency, coping with...
During viral RNA synthesis, Ebola virus (EBOV) nucleoprotein (NP) alternates between an RNA-template-bound form and a template-free form to provide the polymerase access to the template. In addition, newly synthesized NP must be prevented from indiscriminately binding to noncognate RNAs. Here, we investigate the molecular bases for these critical processes. We identify an intrinsically disordered peptide derived from EBOV VP35 (NPBP, residues 20-48) that binds NP with high affinity and specificity, inhibits NP oligomerization,...
Modern high-level synthesis (HLS) tools greatly reduce the turn-around time of designing and implementing complex FPGA-based accelerators. They also expose various optimization opportunities, which cannot be easily explored at the register-transfer level. With the increasing adoption of the HLS design methodology and continued advances in optimization, there is a growing need for realistic benchmarks to (1) facilitate comparisons between tools, (2) evaluate and stress-test new techniques, and (3) establish meaningful...
Filoviruses, marburgvirus (MARV) and ebolavirus (EBOV), are causative agents of highly lethal hemorrhagic fever in humans. MARV and EBOV share a common genome organization but show important differences in replication complex formation, cell entry, host tropism, transcriptional regulation, and immune evasion. Multifunctional filoviral viral protein (VP) 35 proteins inhibit innate immune responses. Recent studies suggest that double-stranded (ds)RNA sequestration is a potential mechanism that allows VP35 to...
Viral protein 35 (VP35), encoded by filoviruses, is a multifunctional dsRNA binding protein that plays important roles in viral replication, innate immune evasion, and pathogenesis. The multifunctional nature of these proteins also presents opportunities to develop countermeasures that target distinct functional regions. However, validation and the establishment of therapeutic approaches toward such proteins, particularly for nonenzymatic targets, are often challenging. Our previous work on filoviral VP35 defined conserved...
Suppression of innate immune responses during filoviral infection contributes to disease severity. Ebola (EBOV) and Marburg (MARV) viruses each encode a VP35 protein that suppresses RIG-I-like receptor signaling and interferon-α/β (IFN-α/β) production by several mechanisms, including direct binding to double-stranded RNA (dsRNA). Here, we demonstrate that, in cell culture, MARV infection results in a greater upregulation of IFN as compared to EBOV infection. This correlates with differences in the efficiencies with which VP35s...
Rapidly emerging workloads require rapidly developed chips. The Celerity 16-nm open-source SoC was implemented in nine months using an architectural trifecta to minimize development time: a general-purpose tier comprised of Linux-capable RISC-V cores, a massively parallel tier comprised of a tiled manycore array that can be scaled to arbitrary sizes, and a specialization tier that uses high-level synthesis (HLS) to create an algorithmic neural-network accelerator. These tiers are tied together with an efficient heterogeneous remote...
Mainstream FPGA CAD tools provide an extensive collection of optimization options that have a significant impact on the quality of the final design. These options together create an enormous and complex design space that cannot effectively be explored by human effort alone. Instead, we propose to search this parameter space using autotuning, which is a popular approach in the compiler domain. Specifically, we study the effectiveness of applying the multi-armed bandit (MAB) technique to automatically tune the options for a complete compilation flow from RTL...
The current pipelining approach in high-level synthesis (HLS) achieves high performance for applications with regular and statically analyzable memory access patterns. However, it cannot effectively handle infrequent data-dependent structural and data hazards because they are conservatively assumed to always occur in the synthesized pipeline. To enable high-throughput pipelining of irregular loops, we study the problem of augmenting HLS with application-specific dynamic hazard resolution, and examine its implications on...
Modern high-level synthesis (HLS) tools commonly employ pipelining to achieve efficient loop acceleration by overlapping the execution of successive loop iterations. However, existing HLS techniques provide inadequate support for pipelining irregular loop nests that contain dynamic-bound inner loops, where unrolling is either very expensive or not even applicable. To overcome this major limitation, we propose ElasticFlow, a novel architectural approach capable of dynamically distributing loops to an array of processing...
Hardware specialization is an increasingly common technique to enable improved performance and energy efficiency in spite of the diminished benefits of technology scaling. This paper proposes a new approach called explicit loop specialization (XLOOPS) based on the idea of elegantly encoding inter-iteration loop dependence patterns in the instruction set. XLOOPS supports a variety of inter-iteration data- and control-dependence patterns for both single and nested loops. The XLOOPS hardware/software abstraction requires only lightweight changes to a general-purpose compiler...
Modern FPGA synthesis tools typically apply a predetermined sequence of logic optimizations on the input logic network before carrying out technology mapping. While "known recipes" of logic transformations often lead to improved mapping results, there remains a nontrivial gap between the quality metrics driving the pre-mapping logic optimizations and those targeted by the actual technology mapping. Needless to mention, such miscorrelations would eventually result in suboptimal results. In this paper, we propose PIMap, which couples logic transformations and technology mapping under an iterative improvement...
Approximate logic synthesis generates inexact implementations of logic functions in exchange for better design qualities such as area, timing, and power consumption. However, the error behavior of approximate circuits (e.g., error rate or error magnitude) depends heavily on the specific approximation technique as well as the input vectors, hindering end users from confidently adopting approximate designs. In this paper, we propose a statistically certified approximate logic synthesis framework using techniques from stochastic optimization, and integrate it into a state-of-the-art...
Despite the increasing adoption of high-level synthesis (HLS) for its design productivity advantage, success in achieving high quality-of-results out-of-the-box is often hindered by the inexactness of the common HLS optimizations. In particular, while scheduling forms the algorithmic core of HLS technology, current scheduling algorithms rely heavily on fundamentally inexact heuristics that make ad hoc local decisions and cannot accurately and globally optimize over a rich set of constraints. To tackle this challenge, we propose...
Loop pipelining is an important optimization in high-level synthesis (HLS) because it allows successive loop iterations to be overlapped during execution. While the current HLS pipelining approach achieves high performance for loops with regular and statically analyzable program patterns, it remains challenging to pipeline loops with irregular memory accesses, dependence, and unbalanced workload. The lack of support for dynamic behaviors results in conservatively synthesized pipelines that sacrifice performance for maintaining the presumed regularity. In...
Existing high-level synthesis (HLS) tools are mostly effective on algorithm-dominated programs that only use primitive data structures such as fixed-size arrays and queues. However, many widely used data structures such as priority queues, heaps, and trees feature complex member methods with data-dependent work and irregular memory access patterns. These methods can be inlined to their call sites, but this does not address the aforementioned issues and may further complicate conventional HLS optimizations, resulting in a...
Modern high-level synthesis (HLS) tools commonly employ pipelining to achieve efficient loop acceleration by overlapping the execution of successive loop iterations. While existing HLS pipelining techniques obtain good performance with low complexity for regular loop nests, they provide inadequate support for effectively synthesizing irregular loop nests. For loop nests with dynamic-bound inner loops, current techniques require unrolling, which is either very expensive in resources or even inapplicable due to the dynamic bounds. To address this major...
Speculative adders divide addition into subgroups and execute them in parallel for higher execution speed and energy efficiency, but at the risk of generating incorrect results. In this paper, we propose a lightweight correlation-aware speculative addition (CASA) method, which exploits the correlation between input data and carry-in values observed in real-life benchmarks to improve the accuracy of speculative adders. Experimental results show that applying the CASA method leads to a significant reduction in error rate with only marginal...
Modern FPGA synthesis tools typically apply a predetermined sequence of logic optimizations on the input logic network before carrying out technology mapping. While “known recipes” of logic transformations often lead to improved mapping results, there remains a nontrivial gap between the quality metrics driving the pre-mapping logic optimizations and those targeted by the actual technology mapping. Needless to mention, such miscorrelations would eventually result in suboptimal results. In this article, we propose PIMap, which couples logic transformations and technology mapping under an iterative...
The increasing popularity of compute acceleration for emerging domains such as artificial intelligence and computer vision has led to the growing need for domain-specific accelerators, often implemented as specialized processors that execute a set of domain-optimized instructions. The ability to rapidly explore (1) the various possibilities of the customized instruction set, and (2) its corresponding micro-architectural features is critical to achieve the best quality-of-results (QoRs). However, this is frequently hindered by manual...
We present the design and analysis of a novel analog reconfigurable substrate that enables fast and efficient computation of maximum flow on directed graphs. The substrate is composed of memristors and standard circuit components, where the on/off states of the crossbar switches encode the graph topology. We show that, upon convergence, the steady-state voltages in the circuit capture the solution to the maximum flow problem. We also provide techniques to minimize the impacts of variability and non-ideal components on the solution quality, enabling practical implementation of the proposed substrate. Performance...