- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- VLSI and Analog Circuit Testing
- Interconnection Networks and Systems
- VLSI and FPGA Design Techniques
- Low-power high-performance VLSI design
- Advanced Memory and Neural Computing
- Advanced Data Storage Technologies
- Ferroelectric and Negative Capacitance Devices
- Advanced Vision and Imaging
- Scientific Computing and Data Management
- Video Coding and Compression Technologies
- Error Correcting Code Techniques
- Advanced Wireless Communication Techniques
- CCD and CMOS Imaging Sensors
- Genomics and Phylogenetic Studies
- Real-Time Systems Scheduling
- Advanced Image and Video Retrieval Techniques
- Wireless Communication Security Techniques
- Image and Video Quality Assessment
- Cardiovascular Function and Risk Factors
- Smart Agriculture and AI
- Wireless Signal Modulation Classification
University of Arizona
2015-2024
Politecnico di Milano
2022
University of Bremen
2022
University of California, Santa Barbara
2022
University of Patras
2022
Bridge University
2022
Özyeğin University
2022
Centre National de la Recherche Scientifique
2017
CY Cergy Paris Université
2017
École Nationale Supérieure de l'Électronique et de ses Applications
2017
The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote interactions between biology and computer science research, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting...
In bioinformatics, alignments are commonly performed in genome and protein sequence analysis for gene identification and for finding evolutionary similarities. There are several approaches for such analysis, each varying in accuracy and computational complexity. Smith-Waterman (SW) is by far the best algorithm in terms of its similarity scoring. However, the execution time of this algorithm on general-purpose processor-based systems makes it impractical for use by life scientists. In this paper we take Smith-Waterman as a case study to explore the architectural features of Graphics...
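For readers unfamiliar with Smith-Waterman, the following minimal Python sketch fills the local-alignment dynamic-programming matrix and reports the best score. It assumes a linear gap penalty and illustrative scores (match = 2, mismatch = -1, gap = -1); these values are not taken from the paper, and traceback is omitted.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, (i, j) cell where it ends)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1];
    # row 0 and column 0 stay 0.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # gap in b
            left = H[i][j - 1] + gap  # gap in a
            # Clamping at 0 lets an alignment restart anywhere, which makes it local.
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos
```

For example, `smith_waterman("TTACGTT", "ACG")` returns `(6, (5, 3))`. The two nested loops make the cost proportional to `len(a) * len(b)`, which is what makes long sequences slow on a general-purpose processor.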
This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high-performance, efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM’s Cell Broadband Engine (Cell BE), in the design and implementation of the widely used Smith-Waterman pairwise sequence alignment algorithm, with general-purpose processors as a base reference...
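The excerpt does not say how each platform maps Smith-Waterman. A common way to expose its parallelism, sketched below under that assumption, is to sweep the matrix by anti-diagonals: every cell with i + j = d depends only on diagonals d - 1 and d - 2, so all cells of one diagonal can be computed concurrently (for example by GPU threads or by the processing elements of an FPGA systolic array). This sequential Python version only demonstrates the ordering; the scoring values are illustrative.

```python
def sw_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman best score, computed one anti-diagonal at a time."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    # Diagonal d contains the cells (i, j) with i + j == d, 1 <= i <= n, 1 <= j <= m.
    for d in range(2, n + m + 1):
        i_lo, i_hi = max(1, d - m), min(n, d - 1)
        # Cells in this inner loop are mutually independent: on parallel
        # hardware they could all be updated in the same step.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

For sequences of lengths n and m the outer loop runs n + m - 1 times, and each iteration holds at most min(n, m) independent cells, so with enough parallel units the sequential depth drops from n * m cell updates to n + m - 1 steps.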
Heterogeneous systems-on-chip (SoCs) are highly favorable computing platforms due to their superior performance and energy efficiency potential compared to homogeneous architectures. They can be further tailored to a specific domain of applications by incorporating processing elements (PEs) that accelerate frequently used kernels in these applications. However, this potential is contingent upon optimizing the SoC for the target domain and utilizing its resources effectively at runtime. To this end, system-level design -...
In this work, we present a Compiler-integrated, Extensible Domain Specific System on Chip Runtime (CEDR) ecosystem to facilitate research toward addressing the challenges of architecture, system software, and application development, with distinct plug-and-play integration points in a unified compile-time and runtime workflow. We demonstrate the utility of CEDR on the Xilinx Zynq MPSoC-ZCU102 for evaluating the performance of pre-silicon hardware in the trade space of SoC configuration, scheduling policy, and workload complexity...
Applications targeting FPGA-integrated systems impose strict energy, channel width, and delay constraints. We introduce the first many-objective clustering technique, MO-Pack, which targets these performance metrics concurrently. Detailed comparisons with state-of-the-art clustering strategies, namely P-T-VPack (energy), T-VPack, iRAC, and T-RPack (timing and routability), show that MO-Pack achieves its goals without increasing logic area.
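MO-Pack's cost function is not given in this excerpt. As a reference point, this sketch implements the greedy, attraction-based clustering style of T-VPack, one of the baselines named above: a block's attraction to the cluster being built is alpha * criticality + (1 - alpha) * shared_nets / G. The values alpha = 0.75 and G = 10, seeding each cluster with the most critical free block, and the `Block` type are assumptions for illustration; a many-objective packer would add further terms (for example for energy or routability) to this score.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set

@dataclass
class Block:
    name: str
    nets: FrozenSet[str]   # nets this logic block connects to
    criticality: float     # timing criticality in [0, 1]

def attraction(block: Block, cluster_nets: Set[str], alpha: float = 0.75, G: float = 10.0) -> float:
    """Weighted sum of a timing term and a normalized shared-net term."""
    shared = len(block.nets & cluster_nets)
    return alpha * block.criticality + (1 - alpha) * shared / G

def pack(blocks: List[Block], capacity: int = 4, alpha: float = 0.75, G: float = 10.0) -> List[List[str]]:
    """Greedy clustering: seed with the most critical free block, then keep absorbing
    the most attracted free block until the cluster holds `capacity` blocks."""
    free = sorted(blocks, key=lambda b: b.criticality, reverse=True)
    clusters = []
    while free:
        seed = free.pop(0)
        members, nets = [seed], set(seed.nets)
        while free and len(members) < capacity:
            nxt = max(free, key=lambda b: attraction(b, nets, alpha, G))
            free.remove(nxt)
            members.append(nxt)
            nets |= nxt.nets
        clusters.append([b.name for b in members])
    return clusters
```

Changing `alpha` shifts the balance between objectives: with three blocks and `capacity=2`, the timing-weighted default groups the two most critical blocks, while `alpha=0.0` groups the blocks that share a net instead.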
Neuromorphic architectures have been introduced as platforms for energy-efficient spiking neural network execution. The massive parallelism offered by these architectures has also triggered interest from non-machine-learning application domains. In order to lift the barriers to entry for hardware designers and developers, we present RANC: a Reconfigurable Architecture for Neuromorphic Computing, an open-source, highly flexible ecosystem that enables rapid experimentation with neuromorphic architectures in both software via C++ simulation and hardware via FPGA...
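RANC's neuron and core models are not described in this excerpt. The sketch below shows one tick of a generic digital spiking core, assuming input axons connected to neurons through a weight crossbar, integer integrate-and-fire neurons with a linear leak clamped at zero, and a reset to zero after a spike. `core_step` and its parameters are hypothetical.

```python
from typing import List

def core_step(axon_spikes: List[int], weights: List[List[int]], potentials: List[int],
              threshold: int = 5, leak: int = 1) -> List[int]:
    """Advance every neuron of one core by one tick; return the output spikes.

    weights[a][n] is the synaptic weight from input axon a to neuron n (0 = no synapse).
    `potentials` is updated in place.
    """
    out = []
    for n in range(len(potentials)):
        # Integrate: add the weights of all axons that spiked this tick.
        v = potentials[n] + sum(weights[a][n] for a, s in enumerate(axon_spikes) if s)
        # Leak: subtract a constant, never going below zero (an assumption).
        v = max(v - leak, 0)
        if v >= threshold:  # fire and reset
            out.append(1)
            v = 0
        else:
            out.append(0)
        potentials[n] = v
    return out
```

Every neuron's update within a tick is independent of the others, which is the kind of parallelism that maps naturally onto FPGA fabric.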
Performance-, power-, and energy-aware scheduling techniques play an essential role in optimally utilizing the processing elements (PEs) of heterogeneous systems. List schedulers, a class of low-complexity static schedulers, have commonly been used in static execution scenarios. However, list schedulers are not suitable for runtime decision making, particularly when multiple concurrent applications are interleaved dynamically. For such cases, the task execution times and the expectation of idle PEs assumed by list schedulers lead to inefficient system...
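To make the contrast with static list scheduling concrete, here is a minimal earliest-finish-time (EFT) assignment made at runtime. Each ready task is placed on the PE that would finish it first given that PE's current availability, rather than on a slot fixed in advance. This is a generic heuristic, not necessarily the technique the paper proposes; `PE`, `eft_assign`, and the execution-time estimates are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class PE:
    name: str
    kind: str                   # PE type, e.g. "cpu" or "fft" (hypothetical labels)
    available_at: float = 0.0   # time at which the PE finishes its queued work

def eft_assign(costs: Dict[str, float], pes: List[PE], now: float) -> Tuple[str, float]:
    """Assign one ready task to the PE with the earliest finish time.

    costs maps a PE kind to the task's estimated execution time on that kind;
    kinds that cannot run the task are simply absent.
    """
    best_pe, best_finish = None, float("inf")
    for pe in pes:
        if pe.kind not in costs:
            continue
        # A PE can start only when it is free and the task has arrived.
        finish = max(now, pe.available_at) + costs[pe.kind]
        if finish < best_finish:
            best_pe, best_finish = pe, finish
    if best_pe is None:
        raise ValueError("no PE can execute this task")
    best_pe.available_at = best_finish  # reserve the PE for this task
    return best_pe.name, best_finish
```

Because the decision reads `available_at` at the moment a task arrives, tasks from applications that show up unpredictably are placed according to the actual PE state rather than an idle-PE assumption.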
The power consumption of data centers and cloud systems increased almost threefold between 2007 and 2012. Over-provisioning techniques are typically used to meet peak workloads. In this paper we present an autonomic performance management method that dynamically matches application requirements with "just-enough" system resources at runtime, leading to a significant reduction in power consumption while maintaining the quality of service of applications. Our solution offers the following capabilities: 1) real-time monitoring...
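A minimal sketch of the "just-enough" idea, under a deliberately simple performance model that is an assumption and not the paper's: load is split evenly across n servers, each behaving like an M/M/1 queue with mean response time 1 / (mu - lambda / n), and the controller picks the smallest n that meets a latency target. In an autonomic loop, this decision would be re-evaluated whenever the monitored arrival rate changes.

```python
def just_enough_servers(arrival_rate: float, service_rate: float,
                        target_latency: float, max_servers: int) -> int:
    """Smallest server count whose predicted mean response time meets the target.

    Hypothetical model: requests are split evenly and each server is an M/M/1
    queue, so the mean response time is 1 / (service_rate - arrival_rate / n).
    """
    for n in range(1, max_servers + 1):
        per_server = arrival_rate / n
        if per_server >= service_rate:
            continue  # this server count cannot keep up: the queue grows without bound
        if 1.0 / (service_rate - per_server) <= target_latency:
            return n
    return max_servers  # target unreachable within the limit: provision everything
```

For example, with 90 requests/s, 50 requests/s per server, and a 0.06 s target, two servers give a predicted 0.2 s while three give 0.05 s, so three servers are provisioned instead of a fixed peak-sized pool.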
The Gallager B (GaB) algorithm, among the hard-decision class of low-density parity-check (LDPC) decoding algorithms, is an ideal candidate for designing high-throughput decoder hardware. However, GaB suffers from poor error-correction performance. We introduce a probabilistic GaB (PGaB) algorithm that randomly disturbs the decisions made during decoding iterations with a probability value determined based on experimental studies. We propose a heuristic that switches to PGaB after a certain number of iterations and show that our approach reduces the average...
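The following Python sketch decodes with Gallager B on a binary symmetric channel and includes a hook for random perturbation. The perturbation rule (invert each variable-to-check message with probability `p_flip`) and the switch-over after `switch_iter` iterations are assumptions standing in for PGaB, whose exact rule and probability are not given in this excerpt; with `p_flip=0` the function is plain GaB.

```python
import random

def gab_decode(H, y, b=2, max_iter=50, p_flip=0.0, switch_iter=0, seed=None):
    """Gallager B hard-decision decoding.

    H: parity-check matrix as a list of 0/1 rows; y: received hard bits.
    b: number of disagreeing check messages needed to flip an outgoing message.
    p_flip / switch_iter: hypothetical PGaB-style hook that inverts each
    variable-to-check message with probability p_flip from iteration switch_iter on.
    Returns (decoded bits, all checks satisfied?, iterations used).
    """
    rng = random.Random(seed)
    m, n = len(H), len(H[0])
    checks = [[v for v in range(n) if H[c][v]] for c in range(m)]
    var_checks = [[c for c in range(m) if H[c][v]] for v in range(n)]

    def satisfied(bits):
        return all(sum(bits[v] for v in checks[c]) % 2 == 0 for c in range(m))

    v2c = {(c, v): y[v] for c in range(m) for v in checks[c]}
    x = list(y)
    for it in range(max_iter):
        if satisfied(x):
            return x, True, it
        # Check-node update: parity (XOR) of all other incoming messages.
        c2v = {}
        for c in range(m):
            parity = sum(v2c[(c, v)] for v in checks[c]) % 2
            for v in checks[c]:
                c2v[(c, v)] = parity ^ v2c[(c, v)]
        for v in range(n):
            # Variable-node update: send the channel bit unless at least b of the
            # other checks disagree with it, in which case send its complement.
            for c in var_checks[v]:
                disagree = sum(1 for c2 in var_checks[v] if c2 != c and c2v[(c2, v)] != y[v])
                msg = 1 - y[v] if disagree >= b else y[v]
                if it >= switch_iter and rng.random() < p_flip:
                    msg ^= 1  # hypothetical random perturbation
                v2c[(c, v)] = msg
            # Hard decision: strict majority over the channel bit and all check messages.
            votes = sum(1 for c in var_checks[v] if c2v[(c, v)] != y[v])
            x[v] = 1 - y[v] if 2 * votes > len(var_checks[v]) + 1 else y[v]
    return x, satisfied(x), max_iter
```

On the toy code `H = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]` (codewords 000 and 111), `gab_decode(H, [1, 0, 0], b=1)` corrects the single error and returns `([0, 0, 0], True, 1)`.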
In this work, we propose a portable, Linux-based emulation framework to provide an ecosystem for hardware-software co-design of Domain-specific SoCs (DSSoCs) and enable their rapid evaluation during the pre-silicon design phase. This framework holistically targets three key challenges of DSSoC design: accelerator integration, resource management, and application development. We address these challenges via a flexible and lightweight user-space runtime environment that enables easy integration of new accelerators, scheduling...
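One way a user-space runtime can make accelerator integration plug-and-play is a registry that maps each kernel to implementations per PE type and falls back to the CPU when no accelerator version exists. The `Runtime` class below is a hypothetical sketch of that idea, not the framework's API.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

class Runtime:
    """Toy user-space runtime (hypothetical): each kernel may have one implementation
    per PE type, and calls are dispatched to the first preferred type available."""

    def __init__(self) -> None:
        self._impls: Dict[str, Dict[str, Callable[[Any], Any]]] = {}

    def register(self, kernel: str, pe_type: str, fn: Callable[[Any], Any]) -> None:
        # Integration point: adding an accelerator means registering one function.
        self._impls.setdefault(kernel, {})[pe_type] = fn

    def run(self, kernel: str, data: Any,
            preferred: Sequence[str] = ("accel", "cpu")) -> Tuple[str, Any]:
        impls = self._impls.get(kernel, {})
        for pe_type in preferred:
            if pe_type in impls:
                return pe_type, impls[pe_type](data)
        raise KeyError(f"no implementation registered for kernel {kernel!r}")
```

In an emulation setting, the `"accel"` entry could wrap a call into an FPGA-based accelerator or a software model of it, while application code keeps calling `run` unchanged.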
Contextual Contrast Limited Adaptive Histogram Equalization (C-CLAHE) is an effective method for addressing the noise amplification effect of adaptive histogram equalization (AHE) and for enhancing the visibility of local details in an image. Even though C-CLAHE has a smaller memory footprint than CLAHE, the complexity of its interpolation process increases the computation demand dramatically. Therefore, FPGA-based implementations have been limited to CLAHE only. In this study we introduce three key modifications to C-CLAHE,...
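The core of CLAHE, on which C-CLAHE builds, is a clip-limited histogram equalization computed per tile. The sketch below builds one tile's gray-level mapping table, assuming integer gray levels, a single clipping pass, and uniform redistribution of the clipped excess; the interpolation between neighboring tiles and the contextual modifications of C-CLAHE are omitted.

```python
from typing import List, Sequence

def clipped_equalization_lut(pixels: Sequence[int], levels: int = 256,
                             clip_limit: int = 4) -> List[int]:
    """Mapping table (input level -> output level) for one tile."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Clip every bin at clip_limit and collect the excess counts.
    excess = 0
    for i in range(levels):
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit
    # Redistribute the excess evenly; the remainder goes to the lowest bins.
    share, rem = divmod(excess, levels)
    for i in range(levels):
        hist[i] += share + (1 if i < rem else 0)
    # The cumulative distribution, scaled to [0, levels - 1], is the mapping.
    total = len(pixels)
    lut, cdf = [], 0
    for count in hist:
        cdf += count
        lut.append((levels - 1) * cdf // total)
    return lut
```

For the tile `[0, 0, 0, 0, 0, 0, 1, 3]` with 4 levels, plain equalization (a very large clip limit) maps level 0 to 2, while `clip_limit=3` maps it to 1: clipping limits how far the dominant level is stretched, which is what suppresses noise amplification in flat regions.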
Task scheduling for large-scale computing systems is a challenging problem. From the users' perspective, the main concern is the performance of submitted tasks, whereas for cloud service providers, reducing cost while providing the required service is critical. Therefore, there is a need for task scheduling mechanisms that balance these requirements while being energy efficient. We present a time-dependent Value of Service (VoS) metric that takes into consideration the task arrival time when evaluating the value of completing a task within its deadline and energy consumption constraint. We consider...
Advances in VLSI technology have led to the fabrication of chips with the number of transistors reaching the billion mark, projected to reach 10 billion in the near future. Affordable fault-tolerant solutions that are transparent to applications and have minimal hardware overhead at the micro-architecture level are necessary to mitigate component-level errors in emerging system-on-chip (SoC) platforms. This paper addresses built-in self-testing with detection, isolation, and recovery capabilities that offer 100% system availability. We reduce the complexity of testing with a two-phase...
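Built-in self-test schemes commonly generate test patterns with a linear-feedback shift register (LFSR) and compare a compacted response signature against a known-good one. The sketch below shows those two ingredients and flags the units whose signature differs. It is generic BIST, not the paper's two-phase scheme, and the 4-bit width, the taps (polynomial x^4 + x^3 + 1), and the rotate-and-XOR compaction are illustrative assumptions.

```python
def lfsr_patterns(seed, taps, width, count):
    """Fibonacci LFSR pattern generator: shift left, feed back the XOR of the tap bits."""
    mask = (1 << width) - 1
    state = seed & mask
    for _ in range(count):
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask

def signature(unit, patterns, width):
    """Compact a unit's responses into one word (rotate left, then XOR the response)."""
    mask = (1 << width) - 1
    sig = 0
    for p in patterns:
        sig = (((sig << 1) | (sig >> (width - 1))) & mask) ^ (unit(p) & mask)
    return sig

def self_test(units, golden, width=4, seed=1, taps=(3, 2), count=15):
    """Indices of units whose signature differs from the golden model's.
    Compaction can alias, so a matching signature is strong but not absolute evidence."""
    patterns = list(lfsr_patterns(seed, taps, width, count))
    reference = signature(golden, patterns, width)
    return [i for i, unit in enumerate(units) if signature(unit, patterns, width) != reference]
```

With these taps the 4-bit LFSR cycles through all 15 nonzero states. Units flagged by `self_test` would then be isolated and their work moved to spares (the recovery step), which is not modeled here.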
Creating an environment of “no doubt” for mission success is essential to most critical embedded applications. With reconfigurable devices such as field programmable gate arrays (FPGAs), designers are provided with a seductive tool to use as a basis for sophisticated but highly reliable platforms. We propose a two-level self-healing methodology for increasing the probability of success in missions. Our proposed system first undertakes healing at the node level. If healing fails to rectify the fault at the node level, network-level healing is undertaken....
Low-cost FPGAs have a comparable number of Configurable Logic Blocks (CLBs) with respect to resource-rich FPGAs but far fewer routing tracks. For CAD tools, this situation increases the difficulty of successfully mapping a circuit into a low-cost FPGA. Instead of switching FPGAs, designers could employ depopulation-based clustering techniques, which underuse CLBs and hence improve routability by spreading the logic over the architecture. However, all depopulation-based clustering algorithms to date increase the critical path delay. In this paper, we present...
We introduce a new metric, Value of Service (VoS), which enables resource management techniques for high-performance computing (HPC) systems to take into consideration the value of a task's completion time and of the energy used to compute that task at a given instant of time. These value functions have a soft threshold, where the function begins to decrease from its maximum value, and a hard threshold, where the value goes to zero. Each task has an associated importance factor to express its relative significance among tasks. We define VoS as a weighted sum of performance and energy values,...
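The VoS definition above can be sketched as follows. The linear decay between the soft and hard thresholds, the equal weights `w_perf = w_energy = 0.5`, summing over tasks, and the dictionary layout of a task are assumptions; the excerpt only states that each value function is at its maximum up to the soft threshold and reaches zero at the hard threshold, that tasks carry importance factors, and that VoS is a weighted sum of performance and energy value.

```python
from typing import Dict, Iterable

def value_fn(x: float, soft: float, hard: float, v_max: float = 1.0) -> float:
    """Full value up to the soft threshold, zero from the hard threshold on.
    The linear decay in between is an assumption for illustration."""
    if x <= soft:
        return v_max
    if x >= hard:
        return 0.0
    return v_max * (hard - x) / (hard - soft)

def vos(tasks: Iterable[Dict], w_perf: float = 0.5, w_energy: float = 0.5) -> float:
    """Sum over tasks of importance * (w_perf * performance value + w_energy * energy value)."""
    total = 0.0
    for t in tasks:
        v_perf = value_fn(t["completion_time"], *t["time_thresholds"])
        v_energy = value_fn(t["energy"], *t["energy_thresholds"])
        total += t["importance"] * (w_perf * v_perf + w_energy * v_energy)
    return total
```

For instance, a task with importance 2 that finishes halfway between its time thresholds (value 0.5) and stays under its soft energy threshold (value 1.0) contributes 2 * (0.5 * 0.5 + 0.5 * 1.0) = 1.5.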
Power-aware scheduling has become a critical research thrust for deploying exascale High Performance Computing (HPC) systems with a limited power budget. Time-varying pricing of electricity with respect to market demand, together with dynamic HPC workloads, can lead to unpredictable operational cost, which further complicates scheduling decisions. For an oversubscribed system, value-based heuristics have been shown to be a more productive option for time-constrained tasks than priority- and deadline-based heuristics. However, higher...
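To illustrate why time-varying prices complicate scheduling, this sketch computes the electricity cost of a job at constant power and picks the cheapest start hour inside a window. The hourly price table and the constant-power job model are hypothetical, and the value-based heuristics the paper discusses are not reproduced.

```python
from typing import Sequence

def energy_cost(power_kw: float, start_hour: int, duration_h: int,
                price_per_kwh: Sequence[float]) -> float:
    """Cost of running at constant power for `duration_h` whole hours from `start_hour`.
    The hourly price table wraps around (e.g. a 24-entry daily profile)."""
    n = len(price_per_kwh)
    return sum(power_kw * price_per_kwh[(start_hour + h) % n] for h in range(duration_h))

def cheapest_start(power_kw: float, duration_h: int, price_per_kwh: Sequence[float],
                   earliest: int, latest: int) -> int:
    """Start hour in [earliest, latest] with minimum cost (ties: earliest hour)."""
    return min(range(earliest, latest + 1),
               key=lambda s: (energy_cost(power_kw, s, duration_h, price_per_kwh), s))
```

The same job can cost three times more depending on when it starts, so a scheduler that ignores price trades performance for cost blindly; deadlines then constrain how far a job can be shifted.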
In this article, we investigate limitations in the traditional value-based algorithms for a power-constrained HPC system and evaluate their impact on productivity. We expose the trade-off between allocating the system-wide power budget uniformly and greedily under different power constraints for an oversubscribed system. We experimentally demonstrate that, under the tightest power constraint, the mean productivity of greedy allocation is 38 percent higher than that of uniform allocation, whereas under the intermediate power constraint, uniform allocation has 6 percent higher mean productivity than greedy allocation. We then propose a new algorithm...
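The two allocation policies being compared can be sketched as follows, assuming each job has a known power demand and that greedy allocation serves jobs in a given priority order (for example by value). These definitions are illustrative assumptions; the excerpt does not specify the paper's exact policies.

```python
from typing import List, Sequence

def uniform_allocation(budget: float, demands: Sequence[float]) -> List[float]:
    """Equal share per job, capped at each job's demand (unused share is left idle)."""
    share = budget / len(demands)
    return [min(share, d) for d in demands]

def greedy_allocation(budget: float, demands: Sequence[float],
                      order: Sequence[int]) -> List[float]:
    """Grant full demand in priority order until the budget runs out."""
    alloc = [0.0] * len(demands)
    remaining = budget
    for i in order:
        alloc[i] = min(demands[i], remaining)
        remaining -= alloc[i]
    return alloc
```

With a 300 W budget and three jobs demanding 200 W each, uniform allocation gives every job 100 W, while greedy allocation gives 200 W, 100 W, and 0 W. Which split yields higher productivity depends on how tight the power constraint is, as the results above indicate.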
Domain-specific systems-on-chip (DSSoCs) aim at bridging the gap between application-specific integrated circuits (ASICs) and general-purpose processors. Traditional operating system (OS) schedulers can undermine the potential of DSSoCs since their execution times can be orders of magnitude larger than the execution time of the task itself. To address this problem, we propose a dynamic adaptive scheduling (DAS) framework that combines the benefits of a fast (low-overhead) scheduler and a slow (sophisticated, high-performance but...
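A minimal sketch of the dynamic adaptive idea: a cheap policy handles easy situations, and a costlier earliest-finish-time policy is used when the system is busy. Using the fraction of busy PEs with a fixed threshold as the selector is an assumption standing in for whatever selection mechanism DAS actually uses; `PE`, `fast_policy`, `slow_policy`, and `das_assign` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class PE:
    name: str
    exec_time: Dict[str, float]  # task type -> estimated execution time on this PE
    busy_until: float = 0.0

def fast_policy(task: str, pes: List[PE], now: float) -> PE:
    """Low overhead: first capable PE that is idle now, else the first capable PE."""
    capable = [p for p in pes if task in p.exec_time]
    for p in capable:
        if p.busy_until <= now:
            return p
    return capable[0]

def slow_policy(task: str, pes: List[PE], now: float) -> PE:
    """Higher overhead: earliest finish time over all capable PEs."""
    capable = [p for p in pes if task in p.exec_time]
    return min(capable, key=lambda p: max(now, p.busy_until) + p.exec_time[task])

def das_assign(task: str, pes: List[PE], now: float,
               busy_threshold: float = 0.5) -> Tuple[str, str]:
    if not any(task in p.exec_time for p in pes):
        raise ValueError(f"no PE can execute {task!r}")
    # Hypothetical selector: fraction of PEs still busy at time `now`.
    busy_fraction = sum(p.busy_until > now for p in pes) / len(pes)
    policy = fast_policy if busy_fraction < busy_threshold else slow_policy
    pe = policy(task, pes, now)
    pe.busy_until = max(now, pe.busy_until) + pe.exec_time[task]
    return pe.name, policy.__name__
```

With one slow CPU and one fast accelerator, the first task (idle system) goes to the CPU via the fast policy; the second, arriving while the CPU is busy, triggers the slow policy, which picks the accelerator. The design choice is to pay the slow policy's overhead only when the system state makes a poor placement costly.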