- Parallel Computing and Optimization Techniques
- Algorithms and Data Compression
- Artificial Intelligence in Healthcare and Education
- Cloud Computing and Resource Management
- Advanced Data Storage Technologies
- Interconnection Networks and Systems
- Software Engineering Research
- Software Testing and Debugging Techniques
- Software Reliability and Analysis Research
- Topic Modeling
- Natural Language Processing Techniques
- Optimization and Search Problems
- Protein Structure and Dynamics
- Tensor decomposition and applications
- Distributed systems and fault tolerance
- Graph Theory and Algorithms
- Ethics and Social Impacts of AI
- Quantum Computing Algorithms and Architecture
- Distributed and Parallel Computing Systems
- Machine Learning and Algorithms
- RNA and protein synthesis mechanisms
- Adversarial Robustness in Machine Learning
- Advanced Graph Neural Networks
- Radiation Effects in Electronics
- Embedded Systems Design Techniques
Intel (United States)
2017-2025
Intel (United Kingdom)
2023-2024
Stony Brook University
2012-2022
State University of New York
2015
The ethical and societal implications of artificial intelligence systems raise concerns. In this article, we outline a novel process based on applied ethics, namely, Z-Inspection®, to assess if an AI system is trustworthy. We use the definition of trustworthy AI given by the European Commission's High-Level Expert Group on AI. Z-Inspection is a general inspection process that can be applied to a variety of domains where AI systems are used, such as...
This paper documents how an ethically aligned co-design methodology ensures trustworthiness in the early design phase of an artificial intelligence (AI) system component for healthcare. The system explains decisions made by deep learning networks when analyzing images of skin lesions. The trustworthy AI developed here used a holistic approach rather than a static ethical checklist and required a multidisciplinary team of experts working with the AI designers and their managers. Ethical, legal, and technical issues potentially arising...
Artificial Intelligence (AI) has the potential to greatly improve the delivery of healthcare and other services that advance population health and wellbeing. However, the use of AI in healthcare also brings risks that may cause unintended harm. To guide future developments in AI, the High-Level Expert Group on AI set up by the European Commission (EC) recently published ethics guidelines for what it terms "trustworthy" AI. These guidelines are aimed at a variety of stakeholders, especially guiding practitioners toward more ethical and more robust...
This article's main contributions are twofold: 1) to demonstrate how to apply the general European Union's High-Level Expert Group's (EU HLEG) guidelines for trustworthy AI in practice in the domain of healthcare and 2) to investigate the research question of what "trustworthy AI" means at the time of the COVID-19 pandemic. To this end, we present the results of a post-hoc self-assessment to evaluate the trustworthiness of an AI system for predicting a multiregional score conveying the degree of lung compromise in COVID-19 patients, developed and verified by...
Building artificial intelligence (AI) systems that adhere to ethical standards is a complex problem. Even though a multitude of guidelines for the design and development of such trustworthy AI systems exist, these guidelines focus on high-level and abstract requirements for AI systems, and it is often very difficult to assess if a specific system fulfills these requirements. The Z-Inspection® process provides a holistic and dynamic framework to evaluate the trustworthiness of specific AI systems at different stages of the AI lifecycle, including intended use, design, and development....
State-of-the-art cache-oblivious parallel algorithms for dynamic programming (DP) problems usually guarantee asymptotically optimal cache performance without any tuning of tuning parameters, but they often fail to exploit the theoretically best parallelism at the same time. While these algorithms achieve cache-optimality through the use of a recursive divide-and-conquer (DAC) strategy, scheduling tasks at the granularity of the task dependency structure introduces artificial dependencies in addition to those arising from the defining recurrence...
We present AUTOGEN—an algorithm that, for a wide class of dynamic programming (DP) problems, automatically discovers highly efficient cache-oblivious parallel recursive divide-and-conquer algorithms from inefficient iterative descriptions of DP recurrences. AUTOGEN analyzes the set of DP table locations accessed by the iterative algorithm when run on a DP table of small size, and identifies a recursive access pattern corresponding to a provably correct recursive algorithm solving the same recurrence. We use AUTOGEN to autodiscover efficient algorithms for several well-known DP problems. Our experimental results show...
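The abstract's first step — observing which table cells an iterative DP implementation touches on a small instance — can be illustrated with a minimal sketch. The recurrence (LCS-style) and the tracing function below are illustrative choices, not AUTOGEN's actual implementation:

```python
# Hypothetical simplification of AUTOGEN's analysis step: run an iterative
# DP on a tiny table and record, for each written cell, the set of cells it
# reads. The real algorithm derives a provably correct recursive
# divide-and-conquer structure from such access patterns.

def trace_lcs_accesses(n, m):
    """Iterative LCS-style sweep over an (n+1) x (m+1) table,
    logging the cells read when each cell is written."""
    reads = {}  # written cell (i, j) -> list of cells read
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # cell (i, j) depends on (i-1, j-1), (i-1, j), (i, j-1)
            reads[(i, j)] = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
    return reads

accesses = trace_lcs_accesses(2, 2)
# Every read lies at a non-greater index in both dimensions -- the kind of
# regularity that makes a divide-and-conquer decomposition inferable.
```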
Dynamic Programming (DP) problems arise in a wide range of application areas spanning from logistics to computational biology. In this paper, we show how to obtain high-performing parallel implementations for a class of DP problems by reducing them to highly utilizable flexible kernels through cache-oblivious recursive divide-and-conquer (CORDAC). We implement CORDAC algorithms for four non-trivial DP problems, namely the parenthesization problem, Floyd-Warshall's all-pairs shortest path (FW-APSP), sequence...
The analysis of high-dimensional sparse data is becoming increasingly popular in many important domains. However, real-world sparse tensors are challenging to process due to their irregular shapes and data distributions. We propose the Adaptive Linearized Tensor Order (ALTO) format, a novel mode-agnostic (general) representation that keeps neighboring nonzero elements in the multi-dimensional space close to each other in memory. To generate the indexing metadata, ALTO uses an adaptive bit encoding scheme that trades off index...
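The core idea of a mode-agnostic linearized order — a single integer index that keeps spatially close nonzeros close in memory — can be sketched with plain Morton-style bit interleaving. ALTO's real encoding adapts the bit layout to the tensor's shape; the fixed-width version below is only an illustration of the principle:

```python
def interleave_bits(coords, bits=10):
    """Morton-style linearization: interleave the bits of each mode's
    index into one integer. ALTO's actual adaptive encoding assigns bits
    per mode based on the tensor's shape; this fixed-width sketch just
    shows why sorting by such a code clusters spatial neighbors."""
    code = 0
    ndim = len(coords)
    for b in range(bits):
        for d, c in enumerate(coords):
            code |= ((c >> b) & 1) << (b * ndim + d)
    return code

# Sorting nonzeros by the interleaved code places neighboring
# multi-dimensional coordinates near each other in memory.
nnz = [(0, 0, 1), (7, 7, 7), (0, 1, 0), (7, 6, 7)]
nnz.sort(key=interleave_bits)
```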
In this paper, we demonstrate the ability of spatial architectures to significantly improve both runtime performance and energy efficiency on edit distance, a broadly used dynamic programming algorithm. Spatial architectures are an emerging class of application accelerators that consist of a network of many small and efficient processing elements that can be exploited by large domains of applications. We utilize the dataflow characteristics and inherent pipeline parallelism within the edit distance algorithm to develop scalable implementations...
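For reference, the DP recurrence being accelerated is the standard Levenshtein edit distance. A minimal sequential version makes the three-way cell dependency explicit — this is the structure that a spatial (dataflow) implementation pipelines across processing elements:

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard DP recurrence. Each cell
    depends on its left, upper, and upper-left neighbors -- the wavefront
    structure that dataflow accelerators exploit for pipelining.
    Uses a rolling row, so memory is O(len(b))."""
    n, m = len(a), len(b)
    prev = list(range(m + 1))          # row i-1 of the DP table
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            sub = prev[j - 1] + (a[i - 1] != b[j - 1])   # substitute/match
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)  # insert/delete
        prev = cur
    return prev[m]
```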
Iterative wavefront algorithms for evaluating dynamic programming recurrences exploit optimal parallelism but show poor cache performance. Tiled-iterative wavefront algorithms achieve optimal cache complexity and high parallelism but are cache-aware and hence neither portable nor cache-adaptive. On the other hand, standard cache-oblivious recursive divide-and-conquer algorithms have optimal serial cache complexity but often low parallelism due to artificial dependencies among subtasks. Recently, we introduced cache-oblivious wavefront (COW) algorithms, which do not have any artificial dependencies, but they are too complicated to develop, analyze, implement,...
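The wavefront schedule the abstract refers to can be shown on a small example. In an LCS table, all cells on one anti-diagonal are mutually independent, so a wavefront evaluation sweeps diagonal by diagonal; this sketch does the sweep serially, but each inner loop could run in parallel (the poor cache behavior comes from the strided access along diagonals):

```python
def lcs_wavefront(a, b):
    """LCS length computed anti-diagonal by anti-diagonal. All cells on a
    diagonal (constant i + j) are independent of each other, so a parallel
    runtime could evaluate each diagonal's cells concurrently."""
    n, m = len(a), len(b)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(2, n + m + 1):                 # diagonal index i + j
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                T[i][j] = T[i - 1][j - 1] + 1
            else:
                T[i][j] = max(T[i - 1][j], T[i][j - 1])
    return T[n][m]
```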
High performance large scale graph analytics is essential to timely analyze relationships in big data sets. Conventional processor architectures suffer from inefficient resource usage and bad scaling on such workloads. To enable efficient and scalable graph analysis, Intel developed the Programmable Integrated Unified Memory Architecture (PIUMA). PIUMA consists of many multi-threaded cores, fine-grained memory and network accesses, a globally shared address space, and powerful offload engines. This paper presents...
High performance large scale graph analytics is essential to timely analyze relationships in big data sets. Conventional processor architectures suffer from inefficient resource usage and bad scaling on those workloads. To enable efficient and scalable graph analysis, Intel® developed the Programmable Integrated Unified Memory Architecture (PIUMA) as a part of the DARPA Hierarchical Identify Verify...
High-dimensional sparse data emerge in many critical application domains such as cybersecurity, healthcare, anomaly detection, and trend analysis. To quickly extract meaningful insights from massive volumes of these multi-dimensional data, scientists employ unsupervised analysis tools based on tensor decomposition (TD) methods. However, real-world tensors exhibit highly irregular shapes, data distributions, and sparsity, which pose significant challenges for making efficient use of modern parallel...
We present Autogen—an algorithm that, for a wide class of dynamic programming (DP) problems, automatically discovers highly efficient cache-oblivious parallel recursive divide-and-conquer algorithms from inefficient iterative descriptions of DP recurrences. Autogen analyzes the set of DP table locations accessed by the iterative algorithm when run on a DP table of small size and identifies a recursive access pattern corresponding to a provably correct recursive algorithm solving the same recurrence. We use Autogen to autodiscover efficient algorithms for several well-known DP problems. Our experimental results show...
Tensor decomposition (TD) is an important method for extracting latent information from high-dimensional (multi-modal) sparse data. This study presents a novel framework for accelerating fundamental TD operations on massively parallel GPU architectures. In contrast to prior work, the proposed Blocked Linearized Coordinate (BLCO) format enables efficient out-of-memory computation of tensor algorithms using a unified implementation that works on a single tensor copy. Our adaptive blocking and linearization...
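The fundamental TD operation in question is typically MTTKRP (matricized-tensor times Khatri-Rao product), the dominant kernel of CP decomposition. A pure-Python COO-based sketch shows what formats like BLCO reorganize for coalesced GPU access; the function name and COO layout here are illustrative, not the paper's API:

```python
def mttkrp_coo(nnz, B, C, n_rows, rank):
    """MTTKRP for a 3-D sparse tensor in COO form:
    out[i, :] += val * (B[j, :] * C[k, :]) for every nonzero (i, j, k, val).
    This scatter-style accumulation is the core kernel of CP decomposition;
    blocked/linearized formats reorder exactly this loop for locality."""
    out = [[0.0] * rank for _ in range(n_rows)]
    for (i, j, k), val in nnz:
        for r in range(rank):
            out[i][r] += val * B[j][r] * C[k][r]
    return out
```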
This report is a methodological reflection on Z-Inspection®. Z-Inspection® is a holistic process used to evaluate the trustworthiness of AI-based technologies at different stages of the AI lifecycle. It focuses, in particular, on the identification and discussion of ethical issues and tensions through the elaboration of socio-technical scenarios. It uses the general European Union's High-Level Expert Group's (EU HLEG) guidelines for trustworthy AI. It illustrates for both researchers...
Software configurations play a crucial role in determining the behavior of software systems. In order to ensure safe and error-free operation, it is necessary to identify the correct configurations, along with their valid bounds and rules, which are commonly referred to as specifications. As software systems grow in complexity and scale, the number of configurations and associated specifications required for correct operation can become large and prohibitively difficult to manipulate manually. Due to the fast pace of software development, it is often the case that configurations are not thoroughly checked or...
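To make the notion of a "specification" concrete, here is a toy sketch of mining numeric range specs from observed valid configurations. The function and option names are hypothetical and much simpler than a real spec-mining tool, which would also learn types, enumerations, and cross-option rules:

```python
def mine_bound_specs(configs):
    """Infer minimal range specifications from observed valid
    configurations: for each numeric option, record the [min, max] seen.
    A hypothetical, minimal sketch of specification mining."""
    specs = {}
    for cfg in configs:
        for key, value in cfg.items():
            lo, hi = specs.get(key, (value, value))
            specs[key] = (min(lo, value), max(hi, value))
    return specs

# Example: three configurations observed to run without error.
observed = [
    {"timeout_ms": 100, "retries": 1},
    {"timeout_ms": 5000, "retries": 5},
    {"timeout_ms": 250, "retries": 3},
]
specs = mine_bound_specs(observed)
```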
Large-scale Graph Convolutional Network (GCN) inference on traditional CPU/GPU systems is challenging due to a large memory footprint, sparse computational patterns, and irregular data accesses with poor locality. Intel's Programmable Integrated Unified Memory Architecture (PIUMA) was designed to address these challenges for graph analytics. In this paper, a detailed characterization of GCNs is presented using the Open-Graph Benchmark (OGB) datasets to determine the viability of PIUMA as a potential solution to GCN...
Community detection algorithms try to identify the underlying community structure (i.e., clearly distinguishable closely interacting groups of vertices) in a graph representing complex systems such as social networks, protein-protein interaction networks, and the World-Wide-Web. The Louvain algorithm iteratively moves vertices from one community to another to construct disjoint sets of vertices that form communities, such that vertices within the same community have more edges among themselves compared to their connections outside the community. A property of the algorithm is that the number of vertex moves drops...
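The local-moving phase described above can be sketched in a simplified form. Real Louvain moves each vertex to the neighboring community with the highest modularity gain; the version below uses raw shared-edge counts instead (a deliberate simplification), but it exposes the same quantity the abstract highlights: the number of vertices that move per pass:

```python
from collections import defaultdict

def local_moving_pass(adj, comm):
    """One simplified local-moving pass in the spirit of Louvain: each
    vertex moves to the neighboring community it shares the most edges
    with (real Louvain maximizes modularity gain instead). Mutates `comm`
    in place and returns the number of vertices that moved -- the count
    that drops sharply across iterations."""
    moves = 0
    for u in adj:
        links = defaultdict(int)          # neighbor community -> edge count
        for v in adj[u]:
            links[comm[v]] += 1
        best = max(links, key=lambda c: (links[c], -c))  # ties: lowest id
        if best != comm[u] and links[best] > links.get(comm[u], 0):
            comm[u] = best
            moves += 1
    return moves
```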
The state-of-the-art "trapezoidal decomposition algorithm" for stencil computations on modern multicore machines uses recursive divide-and-conquer (DAC) to achieve asymptotically optimal cache complexity cache-obliviously. But the same DAC approach restricts parallelism by introducing artificial dependencies among subtasks in addition to those arising from the defining equations. As a result, the trapezoidal decomposition algorithm has suboptimal parallelism.
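For context, the computation being decomposed is a time-iterated stencil. A naive full-sweep version of a 1-D three-point stencil shows the space-time dependency structure; the trapezoidal algorithm recursively cuts this space-time grid into cache-fitting trapezoids instead of sweeping whole time steps as done here:

```python
def jacobi_1d(u, steps):
    """Naive 1-D three-point averaging stencil with fixed boundaries.
    Cell i at time t+1 depends on cells i-1, i, i+1 at time t -- the
    dependency cone that trapezoidal decomposition exploits. This version
    sweeps full time steps, so it streams the whole array every step."""
    u = list(u)
    for _ in range(steps):
        nxt = u[:]                       # boundary cells stay fixed
        for i in range(1, len(u) - 1):
            nxt[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = nxt
    return u
```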
Dynamic load-balancing in parallel algorithms typically requires locks and/or atomic instructions for correctness. We have shown that sometimes an optimistic parallelization approach can be used to avoid the use of locks and atomic instructions during dynamic load balancing. In this approach one allows potentially conflicting operations to run concurrently in the hope that everything will complete without conflicts, and if any occasional inconsistencies arise due to conflicts, one must be able to handle them without hampering the overall correctness of the program. We implement two new types of high-performance...
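The optimistic pattern described above — proceed without locking, then detect and repair conflicts — can be sketched as a read-validate-commit loop. This is a single-threaded illustration under assumed names (`cell`, `optimistic_update`); a real lock-free implementation would close the validate-commit race with a hardware compare-and-swap:

```python
def optimistic_update(cell, compute):
    """Optimistic update loop: snapshot a (value, version) pair, compute
    off-line, then commit only if the version is unchanged; otherwise
    retry with a fresh snapshot. This validate-and-retry pattern is the
    essence of avoiding locks, sketched here without real concurrency."""
    while True:
        value, version = cell["value"], cell["version"]
        new_value = compute(value)
        if cell["version"] == version:       # nobody interfered: commit
            cell["value"] = new_value
            cell["version"] = version + 1
            return new_value
        # conflict detected: loop and retry on the fresh state
```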