Leonel Sousa

ORCID: 0000-0002-8066-221X
Research Areas
  • Parallel Computing and Optimization Techniques
  • Video Coding and Compression Technologies
  • Cryptography and Residue Arithmetic
  • Coding theory and cryptography
  • Cryptographic Implementations and Security
  • Advanced Vision and Imaging
  • Advanced Data Compression Techniques
  • Distributed and Parallel Computing Systems
  • Cryptography and Data Security
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Embedded Systems Design Techniques
  • Cloud Computing and Resource Management
  • Low-power high-performance VLSI design
  • Image and Video Quality Assessment
  • Error Correcting Code Techniques
  • Advanced Wireless Communication Techniques
  • Genomics and Phylogenetic Studies
  • Advanced Memory and Neural Computing
  • Evolutionary Algorithms and Applications
  • CCD and CMOS Imaging Sensors
  • Microfluidic and Bio-sensing Technologies
  • Numerical Methods and Algorithms
  • Neuroscience and Neural Engineering
  • Machine Learning in Bioinformatics

Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
2016-2025

University of Lisbon
2015-2024

Instituto Superior Técnico
2010-2024

Universidade de Brasília
2024

Instituto Politécnico de Lisboa
2009-2023

Institut National des Sciences Appliquées de Lyon
2022

Nvidia (United States)
2022

Institut national de recherche en informatique et en automatique
2022

Centre National de la Recherche Scientifique
2022

École Polytechnique
2022

Task scheduling is an essential aspect of parallel programming. Most heuristics for this NP-hard problem are based on a simple system model that assumes fully connected processors and concurrent interprocessor communication. Hence, contention for communication resources is not considered in task scheduling, yet it has a strong influence on the execution time of a parallel program. This paper investigates the incorporation of contention awareness into task scheduling. A new system model is proposed, allowing us to capture both end-point and network contention....

10.1109/tpds.2005.64 article EN IEEE Transactions on Parallel and Distributed Systems 2005-05-03

The Roofline model graphically represents the attainable upper bound performance of a computer architecture. This paper analyzes the original Roofline model and proposes a novel approach that provides more insightful performance modeling of modern architectures by introducing cache-awareness, thus significantly improving the guidelines for application optimization. The proposed model was experimentally verified on different architectures by taking advantage of built-in hardware counters, with a curve fitness above 90%.

10.1109/l-ca.2013.6 article EN IEEE Computer Architecture Letters 2013-04-22
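
The Roofline bound itself is a one-line formula: attainable performance is the minimum of peak compute and arithmetic intensity times memory bandwidth. The minimal sketch below evaluates that bound once per memory level, in the spirit of the cache-aware model, assuming (as we understand it) that arithmetic intensity counts the bytes moved between the core and the memory hierarchy. The machine numbers and function names are hypothetical and only for illustration.

```python
# Hypothetical machine parameters, for illustration only.
PEAK_GFLOPS = 100.0
BANDWIDTH_GBS = {"L1": 400.0, "L2": 200.0, "L3": 100.0, "DRAM": 20.0}


def attainable_gflops(ai, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(peak compute, arithmetic intensity * bandwidth).

    ai is arithmetic intensity in flops/byte; bandwidth is in GB/s.
    """
    return min(peak_gflops, ai * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a roof turns from memory-bound to compute-bound."""
    return peak_gflops / bandwidth_gbs


def cache_aware_roofs(ai, peak_gflops, bandwidths):
    """One slanted roof per memory level, all sharing the same compute ceiling."""
    return {level: attainable_gflops(ai, peak_gflops, bw) for level, bw in bandwidths.items()}


if __name__ == "__main__":
    for level, bound in cache_aware_roofs(0.5, PEAK_GFLOPS, BANDWIDTH_GBS).items():
        print(f"{level:>4}: {bound:6.1f} GFLOP/s")
```

A kernel whose measured performance sits near the DRAM roof but far below the L1 roof is limited by main-memory traffic, which is the kind of optimization guideline the model is meant to give.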

Unlike the usual VLSI approaches needed for computationally intensive Low-Density Parity-Check (LDPC) code decoders, this paper presents flexible software-based LDPC decoders. Algorithms and data structures suitable for parallel computing are proposed to perform LDPC decoding on multicore architectures. To evaluate the efficiency of the proposed algorithms, LDPC decoders were developed for recent multicores, such as off-the-shelf general-purpose x86 processors, Graphics Processing Units (GPUs), the CELL Broadband Engine...

10.1109/tpds.2010.66 article EN IEEE Transactions on Parallel and Distributed Systems 2010-04-09
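
To make the decoding workload concrete, here is a minimal sequential sketch of min-sum LDPC decoding, a common simplification of belief propagation chosen here for brevity; it is not necessarily the variant used in the paper. The per-check updates are mutually independent, which is the property that multicore and GPU decoders exploit. The function name and the toy (7,4) Hamming parity-check matrix are illustrative assumptions; that matrix is far too small to be genuinely low-density.

```python
def min_sum_decode(H, llr, max_iters=20):
    """Min-sum decoding over a binary parity-check matrix H (list of 0/1 rows).

    llr[j] > 0 means bit j is more likely 0. Returns the hard-decision bits.
    """
    m, n = len(H), len(H[0])
    checks = [[j for j in range(n) if H[i][j]] for i in range(m)]
    edges = [(i, j) for i in range(m) for j in checks[i]]
    v2c = {e: llr[e[1]] for e in edges}   # variable-to-check messages
    c2v = {e: 0.0 for e in edges}         # check-to-variable messages
    bits = [1 if v < 0 else 0 for v in llr]
    for _ in range(max_iters):
        # Check-node update: sign product and minimum magnitude of the *other* inputs.
        # Every check is processed independently, so this loop parallelizes well.
        for i in range(m):
            for j in checks[i]:
                others = [v2c[(i, k)] for k in checks[i] if k != j]
                sign = -1 if sum(x < 0 for x in others) % 2 else 1
                c2v[(i, j)] = sign * min(abs(x) for x in others)
        # A-posteriori LLRs and tentative hard decision.
        total = list(llr)
        for (i, j), msg in c2v.items():
            total[j] += msg
        bits = [1 if t < 0 else 0 for t in total]
        if all(sum(bits[j] for j in checks[i]) % 2 == 0 for i in range(m)):
            break  # every parity check is satisfied: valid codeword
        # Variable-node update: extrinsic message excludes the target check's own input.
        for i, j in edges:
            v2c[(i, j)] = total[j] - c2v[(i, j)]
    return bits


# Toy parity-check matrix of the (7,4) Hamming code (illustrative only).
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

if __name__ == "__main__":
    # All-zero codeword sent; bit 2 arrives with a weakly wrong LLR.
    print(min_sum_decode(H, [2.0, 1.5, -0.5, 2.5, 1.0, 1.8, 2.2]))
```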

This paper proposes two architectures for the acceleration of Number Theoretic Transforms (NTTs) using a novel Montgomery-based butterfly. We first design a custom NTT hardware accelerator for Field-Programmable Gate Arrays (FPGAs). The butterfly architecture is then expanded into a Modular Arithmetic Logic Unit (MALU) and, for greater reuse and easier programmability, integrated into a six-stage pipeline Linux-ready RISC-V core extended with custom instructions. The performance of the proposed architectures is assessed on a Xilinx Ultrascale+ FPGA an...

10.1109/tcsi.2022.3166550 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2022-04-27
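
The core idea of a Montgomery-based butterfly is that the twiddle-factor product can be reduced with shifts and masks instead of a division by the modulus. The minimal sketch below shows a generic Cooley-Tukey butterfly built on Montgomery reduction (REDC); it is not the paper's hardware datapath. The modulus Q = 3329 (the prime used by CRYSTALS-Kyber), the radix R = 2^16, and all names are assumptions chosen for illustration, and the final `% Q` stands in for the conditional add/subtract a hardware unit would use.

```python
Q = 3329                 # example NTT-friendly prime (assumption)
R_BITS = 16
R = 1 << R_BITS          # Montgomery radix: R > Q and gcd(R, Q) = 1
Q_NEG_INV = (-pow(Q, -1, R)) % R   # q' such that Q * q' = -1 (mod R)


def mont_reduce(t):
    """Montgomery reduction (REDC): return t * R^-1 mod Q, for 0 <= t < Q * R."""
    m = ((t & (R - 1)) * Q_NEG_INV) & (R - 1)   # m = t * q' mod R
    u = (t + m * Q) >> R_BITS                    # exact: t + m*Q = 0 (mod R)
    return u - Q if u >= Q else u                # u < 2Q, one conditional subtraction


def to_mont(x):
    """Montgomery form x * R mod Q (done once per precomputed twiddle factor)."""
    return (x * R) % Q


def ct_butterfly(a, b, w_mont):
    """Cooley-Tukey butterfly (a + w*b, a - w*b) mod Q, for a, b in [0, Q).

    With w_mont = w*R mod Q, REDC(b * w_mont) equals b*w mod Q.
    """
    t = mont_reduce(b * w_mont)
    return (a + t) % Q, (a - t) % Q


if __name__ == "__main__":
    print(ct_butterfly(3000, 17, to_mont(1234)))
```

Keeping the twiddle factors permanently in Montgomery form means the conversion cost is paid once, offline, while every butterfly only performs one multiplication and one REDC.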

This paper presents a prototype of a platform for biomolecular recognition detection. The system is based on a magnetoresistive biochip that performs biorecognition assays by detecting magnetically tagged targets. All the electronic circuitry for addressing, driving and reading out signals from spin-valve or magnetic tunnel junction sensors is implemented using off-the-shelf components. Taking advantage of digital signal processing techniques, the acquired signals are processed in real time and transmitted to an analyzer...

10.3390/s90604119 article EN cc-by Sensors 2009-05-27

Cryptography plays a major role in assuring security in computation and communication. In particular, public-key cryptography enables the asymmetrical ciphering of data along with the authentication of parties that are attempting to share data. This encryption is costly, which has motivated extensive research to efficiently accelerate the execution of the most relevant algorithms and to improve their resistance against Side-Channel Attacks (SCAs), which leverage features exposed by cryptographic systems, such as power...

10.1109/mcas.2016.2614714 article EN IEEE Circuits and Systems Magazine 2016-01-01
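
One classic countermeasure in this area is to make the sequence of operations independent of the secret. The Montgomery ladder sketched below performs one multiplication and one squaring per exponent bit regardless of the bit value; elliptic-curve scalar multiplication uses the same ladder with point addition and doubling. This is an illustrative sketch, not code from the article: the function name is hypothetical, and the Python `if` and big-integer arithmetic are not constant-time, so real implementations replace the branch with a constant-time conditional swap.

```python
def ladder_pow(base, exponent, modulus):
    """Montgomery ladder for base**exponent mod modulus.

    Each exponent bit costs exactly one multiplication and one squaring.
    Invariant maintained throughout: r1 == r0 * base (mod modulus).
    """
    r0, r1 = 1 % modulus, base % modulus
    for i in reversed(range(exponent.bit_length())):
        if (exponent >> i) & 1:
            r0 = (r0 * r1) % modulus
            r1 = (r1 * r1) % modulus
        else:
            r1 = (r0 * r1) % modulus
            r0 = (r0 * r0) % modulus
    return r0


if __name__ == "__main__":
    print(ladder_pow(7, 560, 561), pow(7, 560, 561))
```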

Over the last years, positioning systems have become increasingly pervasive, covering most of the planet's surface. Although they are accurate enough for a large number of uses, their precision, power consumption, and hardware requirements limit their adoption in mobile devices. In this paper, the energy consumption of a proposed deep learning-based millimeter wave positioning method is assessed and subsequently compared to state-of-the-art outdoor positioning systems. Requiring as little as 0.4 mJ per position fix, when...

10.1109/jetcas.2020.2991024 article EN IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2020-04-28

Arithmetic plays a major role in a computer's performance and efficiency. Building new computing platforms supported by traditional binary arithmetic and silicon-based technologies to meet the requirements of today's applications is becoming increasingly challenging, regardless of whether we consider embedded devices or high-performance computers. As a result, a significant amount of research effort has been devoted to the study of nonconventional number systems to investigate efficient circuits and improved computer...

10.1109/mcas.2020.3027425 article EN IEEE Circuits and Systems Magazine 2021-01-01

A new, simple and efficient method for avoiding useless computations in the video coding process is proposed. Experimental results show its practical interest for reducing computation in software coders and power consumption in hardware coders.

10.1049/el:20000272 article EN Electronics Letters 2000-02-17

Task scheduling is an important aspect of parallel programming. Most of the heuristics for this NP-hard problem are based on a very simple model of the target system. Experiments revealed the inappropriateness of this classic model for obtaining accurate and efficient schedules on real systems. In order to overcome this shortcoming, a new model was proposed that considers contention for communication resources. Even though the accuracy and efficiency of scheduling improved with the consideration of contention, they are still not good enough. The crucial involvement of the processor...

10.1109/tpds.2006.40 article EN IEEE Transactions on Parallel and Distributed Systems 2006-03-01

This paper presents a new set of techniques for hardware implementations of secure hash algorithm (SHA) functions. These techniques consist mostly of operation rescheduling and hardware reutilization, significantly decreasing both the critical path and the required area. Throughputs from 1.3 Gbit/s to 1.8 Gbit/s were obtained for the SHA implementations on a Xilinx VIRTEX II Pro. Compared to commercial cores and previously published research, these figures correspond to an improvement in throughput/slice in the range of 29% to 59% for SHA-1 and 54% to 100% for SHA-2. Experimental results...

10.1109/tvlsi.2008.2000450 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2008-07-29
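
A well-known form of operation rescheduling in SHA-1 exploits the fact that the next round's e equals the current round's d, so the sum e + K_t + W_t for round t+1 can be formed during round t, taking additions off the round's critical path. The sketch below applies that rearrangement in software and checks it against `hashlib`; it only demonstrates that the rescheduled dataflow computes the same digest, and it is an assumption that this matches the exact schedule used in the paper. Names are hypothetical.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
K = [0x5A827999] * 20 + [0x6ED9EBA1] * 20 + [0x8F1BBCDC] * 20 + [0xCA62C1D6] * 20


def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK


def f(t, b, c, d):
    if t < 20:
        return (b & c) | (~b & d)           # Ch
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)  # Maj
    return b ^ c ^ d                        # Parity (rounds 20-39 and 60-79)


def sha1_rescheduled(message: bytes) -> str:
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Standard padding: 0x80, zero bytes, then the 64-bit big-endian bit length.
    padded = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    for off in range(0, len(padded), 64):
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        a, b, c, d, e = h
        # Rescheduled term e + K_t + W_t, prepared one round ahead.
        pre = (e + K[0] + w[0]) & MASK
        for t in range(80):
            temp = (rotl(a, 5) + f(t, b, c, d) + pre) & MASK
            if t < 79:
                pre = (d + K[t + 1] + w[t + 1]) & MASK  # d becomes the next round's e
            e, d, c, b, a = d, c, rotl(b, 30), a, temp
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]
    return "".join(f"{x:08x}" for x in h)


# Sanity check against the standard library reference implementation.
assert sha1_rescheduled(b"abc") == hashlib.sha1(b"abc").hexdigest()
```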

Quantum-dot Cellular Automata (QCA) is a promising successor to CMOS transistor technology, allowing the implementation of logic circuits using quantum devices such as quantum dots or single-domain nanomagnets. A new set of tools must be developed to assist the design process. Examples are QCADesigner, for handmade layout and physical simulation, and tools for majority-logic optimization. Since no tool is available for assisting QCA layout generation, we propose a tool to automatically generate QCA circuits. This tool, designated QCA-Layout...

10.1109/norchp.2007.4481078 article EN NORCHIP 2007-11-01
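
QCA logic is built from three-input majority gates and inverters, which is why majority-logic optimization matters before layout. The sketch below only illustrates that logic level, not the QCA-Layout tool: fixing one majority input to 0 or 1 yields AND or OR, and a full adder needs three majority gates and two inverters (a standard construction in the QCA literature, assumed here rather than taken from the paper). Function names are hypothetical.

```python
from itertools import product


def maj(a, b, c):
    """Three-input majority gate, the basic QCA logic primitive."""
    return (a & b) | (b & c) | (a & c)


def inv(a):
    """QCA inverter (single-bit NOT)."""
    return 1 - a


def and2(a, b):
    return maj(a, b, 0)   # one input fixed to logic 0 gives AND


def or2(a, b):
    return maj(a, b, 1)   # one input fixed to logic 1 gives OR


def full_adder(a, b, cin):
    """Full adder from three majority gates and two inverters; returns (sum, carry)."""
    cout = maj(a, b, cin)
    s = maj(inv(cout), cin, maj(a, b, inv(cin)))
    return s, cout


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        print(a, b, cin, "->", full_adder(a, b, cin))
```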

Due to their huge computational requirements, powerful Low-Density Parity-Check (LDPC) error-correcting codes, discovered in the early 1960s, have only recently been adopted by emerging communication standards. LDPC decoders are supported by VLSI technology, which delivers good parallel computing power with excellent throughputs, but at the expense of significant costs.

10.1145/1542275.1542330 article EN 2009-06-08

Residue number systems (RNS) are non-weighted systems that allow addition, subtraction and multiplication to be performed concurrently and independently on each residue. The triple moduli set {2^n − 1, 2^n, 2^n + 1} and its respective extensions have gained unprecedented importance in RNS, mainly because of the simplicity of the arithmetic units for the individual channels and of the converters to and from RNS. However, there is neither a perfect balance between the various elements of this moduli set nor an exact equivalence in complexity. Two...

10.1049/iet-cdt:20060059 article EN IET Computers & Digital Techniques 2007-09-04
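
The channel independence described above can be seen in a few lines: each residue is processed on its own, with no carries between channels, and the integer is recovered only at the end. The sketch below uses the triple set {2^n − 1, 2^n, 2^n + 1} with a textbook Chinese Remainder Theorem reverse converter; hardware converters for this set use simplified closed forms instead, so this is an illustration under that assumption, with hypothetical names. It also shows why the 2^n − 1 channel is cheap: since 2^n ≡ 1, n-bit chunks can simply be added with an end-around carry.

```python
from math import prod


def triple_moduli(n):
    """The moduli set {2^n - 1, 2^n, 2^n + 1}, pairwise coprime for n >= 2."""
    return (2**n - 1, 2**n, 2**n + 1)


def to_rns(x, mods):
    """Forward conversion: one residue per channel."""
    return tuple(x % m for m in mods)


def rns_add(xs, ys, mods):
    return tuple((x + y) % m for x, y, m in zip(xs, ys, mods))


def rns_mul(xs, ys, mods):
    # Channels are independent: no carry propagates between residues.
    return tuple((x * y) % m for x, y, m in zip(xs, ys, mods))


def from_rns(residues, mods):
    """Reverse conversion with the Chinese Remainder Theorem."""
    big_m = prod(mods)
    x = 0
    for r, m in zip(residues, mods):
        mi = big_m // m
        x += r * mi * pow(mi, -1, m)
    return x % big_m


def mod_2n_minus_1(x, n):
    """Residue modulo 2^n - 1 by folding n-bit chunks (end-around carry), no division."""
    mask = (1 << n) - 1
    while x > mask:
        x = (x & mask) + (x >> n)
    return 0 if x == mask else x


if __name__ == "__main__":
    mods = triple_moduli(4)   # (15, 16, 17): dynamic range 4080
    print(from_rns(rns_mul(to_rns(123, mods), to_rns(29, mods), mods), mods))
```

Results are exact only while they stay below the dynamic range (the product of the moduli); beyond it they wrap around modulo that product.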

We are currently faced with a situation where applications have increasing computational demands and there is a wide selection of parallel processor systems. In this paper, we focus on exploiting fine-grain parallelism for a demanding bioinformatics application, MrBayes, and its phylogenetic likelihood functions (PLF) using different architectures. Our experiments compare side by side the scalability and performance achieved on general-purpose multi-core processors, the Cell/BE, and graphics processing units (GPUs). The...

10.1109/icpp.2009.30 article EN International Conference on Parallel Processing 2009-09-01
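
Phylogenetic likelihood functions of this kind are evaluated with Felsenstein's pruning algorithm, which combines per-site conditional likelihood vectors from child nodes; every site is independent, which is where the fine-grain parallelism comes from. The sketch below is a minimal, sequential illustration under the Jukes-Cantor (JC69) model with equal base frequencies, an assumption made for brevity (MrBayes supports richer substitution models), and it is not MrBayes code. Names and the example tree are hypothetical.

```python
import math

BASES = "ACGT"


def jc69(t):
    """JC69 transition matrix for branch length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def tip(seq):
    """Conditional likelihood vectors for an observed sequence (one 4-vector per site)."""
    return [[1.0 if b == s else 0.0 for b in BASES] for s in seq]


def prune(left, right, p_left, p_right):
    """Felsenstein pruning at an internal node, for every site and parent state x:
    L(x) = (sum_y P_left[x][y] * L_left(y)) * (sum_z P_right[x][z] * L_right(z)).
    Sites are independent, so this outer loop is the parallelizable dimension.
    """
    return [
        [sum(p_left[x][y] * ls[y] for y in range(4))
         * sum(p_right[x][z] * rs[z] for z in range(4))
         for x in range(4)]
        for ls, rs in zip(left, right)
    ]


def log_likelihood(root, freqs=(0.25, 0.25, 0.25, 0.25)):
    """Sum over sites of log(sum_x pi_x * L_root(x))."""
    return sum(math.log(sum(pi * v for pi, v in zip(freqs, site))) for site in root)


if __name__ == "__main__":
    p = jc69(0.1)
    # Hypothetical three-taxon tree ((s1, s2), s3) with equal branch lengths.
    inner = prune(tip("ACGT"), tip("ACGA"), p, p)
    print(log_likelihood(prune(inner, tip("ACTT"), p, p)))
```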

Acceleration of cryptographic applications on massively parallel computing platforms, such as Graphics Processing Units (GPUs), becomes a real challenge concerning practical implementations. In this paper, we propose an algorithm for Elliptic Curve (EC) point multiplication in order to compute EC cryptography on these platforms. The proposed approach relies on the usage of the Residue Number System (RNS) to extract parallelism from high-precision integer arithmetic. Results suggest a maximum throughput of 9827...

10.1093/comjnl/bxr119 article EN The Computer Journal 2011-11-30

The moduli set {2^n + 1, 2^n − 1, 2^n, 2^{2n+1} − 1} has been recently proposed for supporting residue number systems with dynamic ranges of 5n bits. In this brief, we suggest modifying it to {2^n + 1, 2^n − 1, 2^{2n}, 2^{2n+1} − 1}, in order to enlarge the dynamic range to 6n bits. We propose a method that unifies the design of efficient reverse converters for the original and modified moduli sets. A unified architecture was derived...

10.1109/tcsii.2012.2188456 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2012-03-22
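
Reverse converters for such four-moduli sets are often based on mixed-radix conversion (MRC), which recovers the integer digit by digit, X = v1 + v2·m1 + v3·m1·m2 + v4·m1·m2·m3. The generic MRC sketch below is not the paper's unified architecture; it assumes the original set {2^n + 1, 2^n − 1, 2^n, 2^{2n+1} − 1} and the modified set with 2^{2n} in place of 2^n, taken in that order. In hardware, the power-of-two form of these moduli typically turns the modular inverses into shifts and rotations, whereas here they are computed with `pow`. Function names are hypothetical.

```python
from math import prod


def four_moduli(n, modified=False):
    """Assumed sets {2^n+1, 2^n-1, 2^n, 2^(2n+1)-1}; 2^(2n) replaces 2^n when modified."""
    return (2**n + 1, 2**n - 1, 2**(2 * n) if modified else 2**n, 2**(2 * n + 1) - 1)


def mrc_reverse(residues, mods):
    """Mixed-radix conversion: X = v1 + v2*m1 + v3*m1*m2 + v4*m1*m2*m3."""
    digits = []
    for i, (r, m) in enumerate(zip(residues, mods)):
        v = r
        for j in range(i):
            # Remove earlier digit j, then divide by m_j modulo the current modulus m.
            v = ((v - digits[j]) * pow(mods[j], -1, m)) % m
        digits.append(v)
    x, weight = 0, 1
    for v, m in zip(digits, mods):
        x += v * weight
        weight *= m
    return x


if __name__ == "__main__":
    mods = four_moduli(3, modified=True)   # (9, 7, 64, 127)
    x = 123456
    print(prod(mods), mrc_reverse(tuple(x % m for m in mods), mods))
```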