NFDI4DS | UHH-SEMS - Publication Details

José Nelson Amaral

ORCID: 0000-0002-9943-1809

A5048554669

Research Areas

Parallel Computing and Optimization Techniques
Distributed and Parallel Computing Systems
Advanced Data Storage Technologies
Distributed systems and fault tolerance
Software Testing and Debugging Techniques
Cloud Computing and Resource Management
Software Engineering Research
Embedded Systems Design Techniques
Logic, programming, and type systems
Interconnection Networks and Systems
Algorithms and Data Compression
Network Packet Processing and Optimization
Formal Methods in Verification
AI-based Problem Solving and Planning
Caching and Content Delivery
Advanced Neural Network Applications
Software System Performance and Reliability
VLSI and FPGA Design Techniques
Advanced Database Systems and Queries
Neural Networks and Applications
Software-Defined Networks and 5G
Optimization and Packing Problems
Scheduling and Optimization Algorithms
Security and Verification in Computing
Computational Physics and Python Applications

University of Alberta
2016-2025

University of California System
2021

Athabasca University
2004-2016

Universidade Estadual de Campinas (UNICAMP)
2015

IBM (Canada)
2014

The University of Texas at Austin
1992-2002

University of Delaware
1998-2002

Pontifícia Universidade Católica do Rio Grande do Sul
1990-2002

In defense of soundiness

OPENALEX - Publications

Benjamin Livshits, Manu Sridharan, Yannis Smaragdakis, Ondřej Lhoták, José Nelson Amaral, and 5 more

Soundy is the new sound.

10.1145/2644805 article EN Communications of the ACM 2015-01-28

Evaluation of Blue Gene/Q hardware support for transactional memories

Amy Wang, Matthew Gaudet, Peng Wu, José Nelson Amaral, Martin Ohmacht, and 3 more

This paper describes an end-to-end system implementation of the transactional memory (TM) programming model on top of the hardware transactional memory (HTM) of the Blue Gene/Q (BG/Q) machine. The TM programming model supports most C/C++ constructs on top of a best-effort HTM with the help of a complete software stack including the compiler, the kernel, and the runtime.

10.1145/2370816.2370836 article EN 2012-09-19

Syntax and sensibility: Using language models to detect and correct syntax errors

Eddie Antonio Santos, Joshua Charles Campbell, Dhvani Patel, Abram Hindle, José Nelson Amaral

Syntax errors are made by novice and experienced programmers alike; however, novice programmers lack the years of experience that help them quickly resolve these frustrating errors. Standard LR parsers are of little help, typically resolving syntax errors and their precise location poorly. We propose a methodology that locates where syntax errors occur and suggests possible changes to the token stream that can fix the error identified. This methodology finds syntax errors by using language models trained on correct source code to find tokens that seem out of place. Fixes are synthesized by consulting...

10.1109/saner.2018.8330219 article EN 2018-03-01

Syntax errors just aren't natural: improving error reporting with language models

Joshua Charles Campbell, Abram Hindle, José Nelson Amaral

A frustrating aspect of software development is that compiler error messages often fail to locate the actual cause of a syntax error. An errant semicolon or brace can result in many errors reported throughout the file. We seek to find the actual source of these syntax errors by relying on the consistency of software: valid source code is usually repetitive and unsurprising. We exploit this consistency by constructing a simple N-gram language model of lexed source code tokens. We implemented an automatic Java syntax-error locator using the corpus of the project itself and evaluated its performance...

10.1145/2597073.2597102 article EN 2014-05-20

Methodological Principles for Reproducible Performance Evaluation in Cloud Computing

Alessandro Vittorio Papadopoulos, Laurens Versluis, André Bauer, Nikolas Herbst, Jóakim von Kistowski, and 5 more

The rapid adoption and the diversification of cloud computing technology exacerbate the importance of a sound experimental methodology for this domain. This work investigates how to measure and report performance in the cloud, and how well the cloud research community is already doing it. We propose a set of eight important methodological principles that combine best-practices from nearby fields with concepts applicable only to clouds, and with new ideas about the time-accuracy trade-off. We show how these principles are applicable using a practical use-case experiment. To...

10.1109/tse.2019.2927908 article EN publisher-specific-oa IEEE Transactions on Software Engineering 2019-07-10

Scalar Interpolation: A Better Balance between Vector and Scalar Execution for SuperScalar Architectures

Reza Ghanbari, Henry Kao, João P. L. de Carvalho, E. Maali Amiri, José Nelson Amaral

10.1145/3696443.3708950 article EN 2025-02-22

Shared memory programming for large scale machines

Christopher Barton, Cǎlin Caşcaval, George Almási, Yili Zheng, Montse Farreras, and 2 more

This paper describes the design and implementation of a scalable run-time system and an optimizing compiler for Unified Parallel C (UPC). An experimental evaluation on BlueGene/L®, a distributed-memory machine, demonstrates that the combination of the compiler with the runtime system produces programs with performance comparable to that of efficient MPI programs and good performance scalability up to hundreds of thousands of processors. Our runtime system design solves the problem of maintaining shared object consistency efficiently in a distributed memory machine. Our compiler infrastructure simplifies the code...

10.1145/1133255.1133995 article EN ACM SIGPLAN Notices 2006-06-11

The Truth, The Whole Truth, and Nothing But the Truth

Stephen M. Blackburn, Amer Diwan, Matthias Hauswirth, Peter F. Sweeney, José Nelson Amaral, and 14 more

An unsound claim can misdirect a field, encouraging the pursuit of unworthy ideas and the abandonment of promising ideas. An inadequate description of a claim can make it difficult to reason about the claim, for example, to determine whether the claim is sound. Many practitioners will acknowledge the threat of unsound claims or inadequate descriptions of claims to their field. We believe that this situation is exacerbated, and even encouraged, by the lack of a systematic approach to exploring, exposing, and addressing the source of unsound claims and poor exposition. This article proposes a framework that identifies three...

10.1145/2983574 article EN ACM Transactions on Programming Languages and Systems 2016-10-13

Teaching Digital Design to Computing Science Students in a Single Academic Term

José Nelson Amaral, Paul Berube, Paras Mehta

How should digital design be taught to computing science students in a single one-semester course? This work advocates the use of state-of-the-art design tools and programmable devices and presents a series of laboratory exercises to help students learn digital logic. Each exercise introduces new concepts and produces the complete design of a stand-alone apparatus that is fun and interesting to use. These exercises lead to the most challenging capstone designs for a single-semester course of which the authors are aware. Fast progress is made possible by providing students with predesigned...

10.1109/te.2004.837048 article EN IEEE Transactions on Education 2005-02-01

Shared memory programming for large scale machines

Christopher Barton, Cǎlin Caşcaval, George Almási, Yili Zheng, Montse Farreras, and 2 more

10.1145/1133981.1133995 article EN 2006-06-11

Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms

Timothy Furtak, José Nelson Amaral, Robert Niewiadomski

Most contemporary processors offer some version of Single Instruction Multiple Data (SIMD) machinery: vector registers and instructions to manipulate data stored in such registers. The central idea of this paper is to use these SIMD resources to improve the performance of the tail of recursive sorting algorithms. When the number of elements to be sorted reaches a set threshold, data is loaded into the vector registers, manipulated in-register, and the result stored back to memory. Three implementations of sorting with two different SIMD machineries, x86-64's SSE2 and G5's...

10.1145/1248377.1248436 article EN 2007-06-09

Compiling Python to a hybrid execution environment

Rahul Garg, José Nelson Amaral

A new compilation framework enables the execution of numerical-intensive applications, written in Python, on a hybrid execution environment formed by a CPU and a GPU. This compiler automatically computes the set of memory locations that need to be transferred to the GPU, and produces the correct mapping between the CPU and the GPU address spaces. Thus, the programming model implements a virtual shared address space. This framework is implemented as a combination of unPython, an ahead-of-time compiler from Python/NumPy to the C programming language, and jit4GPU, a just-in-time compiler from C to the AMD CAL interface. Experimental...

10.1145/1735688.1735695 article EN 2010-03-14

On the Merits of Distributed Work-Stealing on Selective Locality-Aware Tasks

Jeeva Paudel, Olivier Tardieu, José Nelson Amaral

Improving the performance of work-stealing load-balancing algorithms in distributed shared-memory systems is challenging. These algorithms need to overcome high costs of contention among workers, communication and remote data-references between nodes, and their impact on the locality preferences of tasks. Prior research focuses on stealing from a victim that best exploits data locality, and on using special deques to minimize contention among local workers. This work explores the selection of tasks that are favourable for migration across nodes in a distributed memory...

10.1109/icpp.2013.19 article EN 2013-10-01

Designing genetic algorithms for the state assignment problem

José Nelson Amaral, Kagan Tumer, Joydeep Ghosh

Finding the best state assignment for implementing a synchronous sequential circuit is important for reducing silicon area or chip count in many digital designs. This state assignment problem (SAP) belongs to a broader class of combinatorial optimization problems than the well studied traveling salesman problem, which can be formulated as a special case of SAP. The search for a good solution is considerably involved for the SAP due to a large number of equivalent solutions, and no effective heuristic has been found so far to cater to all types of circuits....

10.1109/21.370202 article EN IEEE Transactions on Systems Man and Cybernetics 1995-04-01

Minimum register instruction sequencing to reduce register spills in out-of-order issue superscalar architectures

R. Govindarajan, Hongbo Yang, José Nelson Amaral, Chihong Zhang, Guang R. Gao

In this paper, we address the problem of generating an optimal instruction sequence S for a Directed Acyclic Graph (DAG), where S is optimal in terms of the number of registers used. We call this the Minimum Register Instruction Sequence (MRIS) problem. The motivation for revisiting the MRIS problem stems from several modern architecture innovations/requirements that have put the instruction sequencing problem in a new context. We develop an efficient heuristic solution for the MRIS problem. This solution is based on the notion of instruction lineage: a set of instructions that can definitely share a single register. The formation...

10.1109/tc.2003.1159750 article EN IEEE Transactions on Computers 2003-01-01

MPADS

Stephen Curial, Peng Zhao, José Nelson Amaral, Yaoqing Gao, Shimin Cui, and 2 more

This paper describes Memory-Pooling-Assisted Data Splitting (MPADS), a framework that combines data structure splitting with memory pooling. Although MPADS may call to mind padding, a distinction of this framework is that it does not insert padding. MPADS relies on pointer analysis to ensure that splitting is safe and applicable to a type-unsafe language. MPADS makes no assumption about type safety. The analysis can identify cases in which the transformation could lead to incorrect code and thus abandons those cases.

10.1145/1375634.1375649 article EN 2008-06-07

Using Hardware-Transactional-Memory Support to Implement Thread-Level Speculation

Juan Salamanca, José Nelson Amaral, Guido Araújo

This paper presents a detailed analysis of the application of Hardware Transactional Memory (HTM) support for loop parallelization with Thread-Level Speculation (TLS) and describes a careful evaluation of the implementation of TLS on the HTM extensions available in such machines. The sample implementation of TLS over HTM described in this paper also provides evidence that the programming effort to implement TLS is non-trivial. Thus, an extension to OpenMP both makes TLS more accessible to programmers and allows easy tuning of TLS parameters. As a result, it several important...

10.1109/tpds.2017.2752169 article EN IEEE Transactions on Parallel and Distributed Systems 2017-09-14

Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions

Victor Ferrari, Rafael Sousa, Marcio Pereira, João P. L. de Carvalho, José Nelson Amaral, and 2 more

Convolution is one of the most computationally intensive operations that must be performed for machine learning model inference. A traditional approach to computing convolutions is known as the Im2Col + BLAS method. This article proposes SConv: a direct-convolution algorithm based on an MLIR/LLVM code-generation toolchain that can be integrated into machine-learning compilers. This algorithm introduces: (a) Convolution Slicing Analysis (CSA), a convolution-specific 3D cache-blocking analysis pass that focuses on tile reuse over the cache...

10.1145/3625004 article EN ACM Transactions on Architecture and Code Optimization 2023-09-20

Forma

Peng Zhao, Shimin Cui, Yaoqing Gao, Raúl Silvera, José Nelson Amaral

This article presents Forma, a practical, safe, and automatic data reshaping framework that reorganizes arrays to improve data locality. Forma splits large aggregated data-types into smaller ones. Arrays of these large types are then replaced by multiple arrays of the smaller types. These new arrays form natural data streams that have smaller memory footprints, better locality, and are more suitable for hardware stream prefetching. Forma consists of a field-sensitive alias analyzer, a data type checker, a portable structure reshaping planner, and an array reshaper. An extensive experimental...

10.1145/1290520.1290522 article EN ACM Transactions on Programming Languages and Systems 2007-11-01

KernelFaRer

João P. L. de Carvalho, Braedy Kuzma, Ivan Korostelev, José Nelson Amaral, Christopher Barton, and 2 more

Well-crafted libraries deliver much higher performance than code generated by sophisticated application programmers using advanced optimizing compilers. When a code pattern for which a well-tuned library implementation exists is found in the source code of an application, the highest performing solution is to replace the pattern with a call to the library. Idiom-recognition solutions in the past either required pattern matching machinery that was outside of the compilation framework or provided a very brittle solution that would fail even for minor variants of the pattern source code. This...

10.1145/3459010 article EN ACM Transactions on Architecture and Code Optimization 2021-06-28

Using machines to learn method-specific compilation strategies

Ricardo Nabinger Sanchez, José Nelson Amaral, Duane Szafron, Marius Pirvu, Mark Stoodley

Support Vector Machines (SVMs) are used to discover method-specific compilation strategies in Testarossa, a commercial Just-in-Time (JiT) compiler employed in the IBM® J9 Java™ Virtual Machine. The learning process explores a large number of different compilation strategies to generate the data needed for training models. The trained machine-learned model is integrated with the compiler to predict a compilation plan that balances code quality and compilation effort on a per-method basis. The machine-learned plans outperform the original Testarossa for start-up performance, but not for throughput performance, for which...

10.5555/2190025.2190072 article EN 2011-04-02

Improving communication in PGAS environments

Michail Alvanos, Montse Farreras, Ettore Tiotto, José Nelson Amaral, Xavier Martorell

The goal of Partitioned Global Address Space (PGAS) languages is to improve programmer productivity in large scale parallel machines. However, PGAS programs may have many fine-grained shared accesses that lead to performance degradation. Manual code transformations or compiler optimizations are required to improve the performance of programs with fine-grained accesses. The downside of manual code transformations is the increased program complexity that hinders programmer productivity. On the other hand, most compiler optimizations of fine-grain accesses require knowledge of physical data mapping and the use of parallel loop constructs.

10.1145/2464996.2465006 article EN 2013-05-28
