- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- Teaching and Learning Programming
- Advanced Neural Network Applications
- Advanced Memory and Neural Computing
- Real-Time Simulation and Control Systems
- CCD and CMOS Imaging Sensors
- Distributed and Parallel Computing Systems
- Access Control and Trust
- Marxism and Critical Theory
- Machine Learning in Materials Science
- Security and Verification in Computing
- Computational Physics and Python Applications
- Political Conflict and Governance
- Meteorological Phenomena and Simulations
- Computational Drug Discovery Methods
- Politics and Society in Latin America
- Graph Theory and Algorithms
- Precipitation Measurement and Analysis
- Particle Detector Development and Performance
- Scientific Computing and Data Management
- Distributed Systems and Fault Tolerance
- Cryptography and Data Security
MIT Lincoln Laboratory
2017-2024
Massachusetts Institute of Technology
2021-2024
Ohio Supercomputer Center
2012
The Ohio State University
2012
Indian Institute of Technology Kanpur
2010
Over the past several years, new machine learning accelerators have been announced and released every month for a variety of applications, from speech recognition and video object detection to assisted driving and many data center applications. This paper updates the survey of AI accelerators and processors from the past two years. It collects and summarizes the current commercial accelerators that have been publicly announced, along with their peak performance and power consumption numbers. These values are plotted on a scatter graph, and a number of dimensions, observations, and trends from this plot are again...
GPU technology has been improving at an expedited pace in terms of size and performance, empowering HPC and AI/ML researchers to advance the scientific discovery process. However, this also leads to inefficient resource usage, as most workloads, including complicated AI/ML models, are not able to utilize the GPU resources to their fullest extent - encouraging support for multi-tenancy. We propose MISO, a technique to exploit the Multi-Instance GPU (MIG) capability on the latest NVIDIA datacenter GPUs (e.g., A100, H100) to dynamically...
As parallel applications become more complex, auto-tuning becomes more desirable, more challenging, and more time-consuming. We propose Bliss, a novel solution for auto-tuning parallel applications without requiring apriori information about the applications, domain-specific knowledge, or instrumentation. Bliss demonstrates how to leverage a pool of Bayesian Optimization models to find a near-optimal parameter setting 1.64× faster than state-of-the-art approaches.
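The idea of tuning an application's parameters by maintaining a pool of models with different search behaviors can be sketched in a few lines. The code below is a toy illustration only, not Bliss's actual algorithm: it replaces Bayesian Optimization surrogates with a hypothetical pool of coarse-to-fine local-search "models" that each propose candidates near the best setting seen so far.

```python
import random

def tune(objective, bounds, budget=30, pool_size=3, seed=0):
    """Toy auto-tuner in the spirit of a model pool (a hypothetical
    stand-in for a pool of Bayesian Optimization models): each 'model'
    proposes candidates near the best-known setting at its own search
    radius, and every proposal is evaluated against the objective."""
    rng = random.Random(seed)
    history = []  # list of (params, cost) observations

    def rand_point():
        return tuple(rng.uniform(lo, hi) for lo, hi in bounds)

    def propose(radius):
        if not history:
            return rand_point()
        best, _ = min(history, key=lambda h: h[1])
        # perturb the incumbent, clamped back into the search bounds
        return tuple(
            min(hi, max(lo, x + rng.gauss(0, radius * (hi - lo))))
            for x, (lo, hi) in zip(best, bounds)
        )

    radii = [0.5 / (i + 1) for i in range(pool_size)]  # coarse-to-fine pool
    for step in range(budget):
        cand = propose(radii[step % pool_size])
        history.append((cand, objective(cand)))
    return min(history, key=lambda h: h[1])

# usage: minimize a 2-D quadratic standing in for "application runtime"
best_params, best_cost = tune(lambda p: (p[0] - 3) ** 2 + (p[1] - 1) ** 2,
                              bounds=[(0, 10), (0, 10)], budget=200)
```

A real implementation would replace the perturbation rule with a fitted surrogate and an acquisition function; the pool structure, however, is the part this sketch means to convey.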
MATLAB® is a popular choice for algorithm development in signal and image processing. While traditionally this has been done using sequential MATLAB running on desktop systems, recent years have seen a surge of interest in running MATLAB in parallel to take advantage of multi-processor and multi-core systems. In this paper, we discuss three variations of parallel MATLAB, two of which are available as commercial, supported products. We also consider the case in which key computations are speeded up using multi-threading and GPGPUs. Two processing kernels (FFT and convolution) and a full...
Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users are running applications such as MATLAB and Octave. More recently, applications such as UC Berkeley's Caffe...
This paper presents a vision and description for query control, which is a paradigm for database access control. In this model, individual queries are examined before being executed and are either allowed or denied by a pre-defined policy. Traditional view-based access control requires the enforcer to view the query, the records, or both; that may present a difficulty when the enforcer is not permitted to view the contents itself. Our discussion of query control arises from our experience with privacy-preserving encrypted databases, in which no single entity learns both the query and the contents. Query...
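The core mechanic described above, examining each query against a pre-defined policy before execution, without the enforcer inspecting any record contents, can be illustrated with a minimal sketch. The policy rules and column names below are hypothetical, invented purely for illustration; they are not the paper's policy language.

```python
import re

# Hypothetical pre-defined policy: a query is matched against rules in
# order, and the first matching rule's verdict applies. The enforcer
# only ever sees the query text, never any stored records.
POLICY = [
    (re.compile(r"\b(drop|delete|update)\b", re.I), "deny"),   # no mutations
    (re.compile(r"\bssn\b", re.I),                  "deny"),   # no sensitive column
    (re.compile(r"^\s*select\b", re.I),             "allow"),  # plain reads OK
]

def check(query: str) -> str:
    """Return 'allow' or 'deny' for a query before it is executed."""
    for pattern, verdict in POLICY:
        if pattern.search(query):
            return verdict
    return "deny"  # default-deny for anything the policy does not cover

print(check("SELECT name FROM patients"))  # allow
print(check("SELECT ssn FROM patients"))   # deny
print(check("DELETE FROM patients"))       # deny
```

A production query-control enforcer would parse the query properly rather than pattern-match text, but the allow/deny-before-execution shape is the same.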
As research and deployment of AI grows, the computational burden required to support and sustain its progress inevitably does too. To train or fine-tune state-of-the-art models in NLP, computer vision, and other domains, some form of hardware acceleration is virtually a requirement. Recent large language models require considerable resources to train and deploy, resulting in significant energy usage, potential carbon emissions, and massive demand for GPUs and other hardware accelerators. However, this surge carries large implications for sustainability at...
The Intel Xeon Phi manycore processor is designed to provide high performance matrix computations of the type often performed in data analysis. Common data analysis environments include Matlab, GNU Octave, Julia, Python, and R. Achieving optimal matrix operations within these environments requires tuning the OpenMP settings, process pinning, and memory modes. This paper describes matrix multiplication results for Matlab and Octave over a variety of combinations of process counts and thread counts. These results indicate that using KMP_AFFINITY=granularity=fine, taskset...
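For readers unfamiliar with the settings named in the abstract, a representative invocation might look like the config fragment below. Only KMP_AFFINITY=granularity=fine comes from the text above; the thread count, core range, and benchmark command are illustrative placeholders, not the paper's measured configuration.

```shell
# Pin OpenMP threads at hardware-thread granularity (from the abstract);
# the remaining values are illustrative, not the paper's tuned settings.
export KMP_AFFINITY=granularity=fine
export OMP_NUM_THREADS=64

# Restrict the process to a fixed set of cores with taskset, then run a
# placeholder Octave matrix-multiplication benchmark.
taskset -c 0-63 octave --eval 'n=4096; A=rand(n); B=rand(n); tic; C=A*B; toc'
```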
This paper is an update of the survey of AI accelerators and processors from the past four years, which is now called the Lincoln AI Computing Survey - LAICS (pronounced "lace"). As in past years, this survey collects and summarizes the current commercial accelerators that have been publicly announced, along with their peak performance and power consumption numbers. These values are plotted on a scatter graph, and a number of dimensions, observations, and trends from this plot are again discussed and analyzed. Market segments are highlighted on the plot, and zoomed plots of each segment are also included. Finally, a brief...
The BigDAWG polystore database system aims to address workloads dealing with large, heterogeneous datasets. The need for such a system is motivated by an increase in Big Data applications involving disparate types of data, from large-scale analytics to real-time data streams to text-based records, each suited to different storage engines. These applications often perform cross-engine queries on correlated data, resulting in complex query planning, data migration, and execution. One such application for medical data was built with the Intel Science and Technology Center...
In this paper, we present a novel file-based communication architecture that uses the local filesystem for large-scale parallelization. This approach eliminates issues with overload and resource contention when using a central filesystem for parallel jobs. The approach incurs additional overhead due to inter-node message file transfers when the sending and receiving processes are not on the same node. However, even with this cost, its benefits for overall cluster operation are far greater, in addition to the performance enhancement of communications. For example,...
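A minimal sketch of file-based message passing may help make the idea concrete. The mailbox layout, file-naming scheme, and function names below are invented for illustration and are not the paper's actual format; the one load-bearing detail is staging each message to a temporary file and atomically renaming it, so a receiver never observes a partially written message.

```python
import os
import tempfile

def send(mailbox, sender, seq, payload):
    """Deliver one message as a file in the mailbox directory.
    The write is staged to a temp file and atomically renamed
    (os.rename is atomic on POSIX within one filesystem)."""
    os.makedirs(mailbox, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=mailbox)
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    os.rename(tmp, os.path.join(mailbox, f"{sender}.{seq}.msg"))

def recv(mailbox, sender, seq):
    """Poll for a message; returns None if not yet delivered."""
    path = os.path.join(mailbox, f"{sender}.{seq}.msg")
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return f.read()

# usage: a sender and receiver exchanging one message via the filesystem
mbox = tempfile.mkdtemp(prefix="mbox_demo_")
send(mbox, "rank0", 1, b"hello")
print(recv(mbox, "rank0", 1))  # b'hello'
```

In a cluster setting the mailbox would live on node-local storage, with inter-node file transfer handled separately when sender and receiver are on different nodes, as the abstract notes.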
One of the more complex tasks for researchers using HPC systems is performance monitoring and tuning of their applications. Developing a practice of continuous improvement, both for speed-up and for efficient use of resources, is essential to the long-term success of the practitioner and the research project. Profiling tools provide a nice view of an application but often have a steep learning curve and are rarely easy to use for interpreting resource utilization. Lower-level tools such as top and htop offer a view of resource utilization to those familiar and comfortable with Linux, but present a barrier to newer...
Cyber Physical Systems (CPS) are the conjoining of an entity's physical and computational elements. The development of a typical CPS system follows a sequence from conceptual modeling, testing in simulated (virtual) worlds, and testing in controlled (possibly laboratory) environments to, finally, deployment. Throughout each (repeatable) stage, the behavior of the entities, their sensing and situation assessment, and their computation of control options have to be understood and carefully represented through abstraction. The group at Ohio State University,...
Deep learning in the molecular and materials sciences is limited by the lack of integration between applied science, artificial intelligence, and high-performance computing. Bottlenecks with respect to the amount of training data, the size and complexity of model architectures, and the scale of the compute infrastructure are all key factors limiting the scaling of deep learning for molecules and materials. Here, we present $\textit{LitMatter}$, a lightweight framework for scaling molecular deep learning methods. We train four graph neural network architectures on over 400 GPUs...
Deep Learning has seen a dramatically increasing demand for compute resources and a corresponding increase in the energy required to develop, explore, and test model architectures for various applications. Parameter tuning of networks customarily involves training multiple models to search over a grid of parameter choices, either randomly or exhaustively; strategies that apply complex methods to identify candidate architectures require significant computation for each possible architecture sampled from such spaces. However, these approaches...
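The two customary search strategies named in the abstract, exhaustive grid search and random sampling from the same grid, can be contrasted in a short sketch. Here `train()` is a hypothetical stand-in for a full model-training run; in practice each call is what consumes the compute and energy the abstract is concerned with.

```python
import itertools
import random

def train(lr, width):
    """Hypothetical stand-in for training a model and returning its
    validation loss; the real cost of each call is a full training run."""
    return (lr - 0.01) ** 2 + (width - 64) ** 2 / 1e4

grid = {"lr": [0.001, 0.01, 0.1], "width": [16, 64, 256]}

# Exhaustive grid search: trains len(lr) * len(width) = 9 models.
exhaustive = min(itertools.product(grid["lr"], grid["width"]),
                 key=lambda p: train(*p))

# Random search: trains only a fixed budget of models sampled from the grid.
rng = random.Random(0)
sampled = min((tuple(rng.choice(v) for v in grid.values()) for _ in range(5)),
              key=lambda p: train(*p))

print(exhaustive)  # (0.01, 64) minimizes the toy loss
```

The trade-off the abstract points at is visible even here: exhaustive search guarantees the grid optimum at 9 training runs, while random search spends a bounded budget (5 runs) with no such guarantee.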