- Parallel Computing and Optimization Techniques
- Cloud Computing and Resource Management
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Interconnection Networks and Systems
- Distributed systems and fault tolerance
- Embedded Systems Design Techniques
- Low-power high-performance VLSI design
- IoT and Edge/Fog Computing
- Radiation Effects in Electronics
- Advanced Memory and Neural Computing
- Software System Performance and Reliability
- Graph Theory and Algorithms
- Green IT and Sustainability
- Ferroelectric and Negative Capacitance Devices
- Caching and Content Delivery
- Algorithms and Data Compression
- Blockchain Technology Applications and Security
- Scientific Computing and Data Management
- Peer-to-Peer Network Technologies
- Computational Geometry and Mesh Generation
- Network Security and Intrusion Detection
- Data Stream Mining Techniques
- Anomaly Detection Techniques and Applications
- Advanced Neural Network Applications
Virginia Tech
2006-2024
Southwest Jiaotong University
2021
Queen's University Belfast
2012-2020
Crete University Press
2019
University of Crete
2009-2019
Queens University
2019
Foundation for Research and Technology Hellas
2009-2017
National and Kapodistrian University of Athens
1998-2016
FORTH Institute of Electronic Structure and Laser
2008-2012
FORTH Institute of Computer Science
2010-2011
Many cloud-based applications employ a data center as a central server to process data that is generated by edge devices, such as smartphones, tablets and wearables. This model places ever-increasing demands on communication and computational infrastructure, with an inevitable adverse effect on Quality-of-Service and Experience. The concept of Edge Computing is predicated on moving some of this load towards the edge of the network to harness computational capabilities that are currently untapped in edge nodes, such as base stations, routers and switches. This position paper...
Current computing techniques using the cloud as a centralised server will become untenable as billions of devices get connected to the Internet. This raises the need for fog computing, which leverages computing at the edge of the network on nodes, such as routers, base stations and switches, along with the cloud. However, to realise fog computing, the challenge of managing edge nodes will need to be addressed. This paper is motivated to address the resource management challenge. We develop the first framework to manage edge nodes, namely the Edge NOde Resource Management (ENORM) framework. Mechanisms...
Power has become a primary concern for HPC systems. Dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT) are two software tools (or knobs) for reducing the power consumption of HPC systems. To date, few works have considered the synergistic integration of DVFS and DCT in performance-constrained systems, and, to the best of our knowledge, no prior research has developed application-aware simultaneous DVFS and DCT controllers in real systems and parallel programming frameworks. We present a multi-dimensional, online...
Power-aware execution of parallel programs is now a primary concern in large-scale HPC environments. Prior research in this area has explored models and algorithms based on dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT) to achieve power-aware execution of programs written in a single programming model, typically MPI or OpenMP. However, hybrid programming models combining MPI and OpenMP are growing in popularity as emerging large-scale systems have many nodes with several processors per node and multiple cores per processor. In this paper we present...
Task-based programming models for shared memory, such as Cilk Plus and OpenMP 3, are well established and documented. However, with the increase in parallel, many-core, and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing (HPC), no comprehensive overview or classification...
Fish fraud detection is mainly carried out using a genomic profiling approach requiring long and complex sample preparations and assay running times. Rapid evaporative ionisation mass spectrometry (REIMS) can circumvent these issues without sacrificing the quality of results. The aim was to demonstrate that REIMS can be used as a fast technique capable of achieving accurate species identification without the need for any sample preparation. Additionally, we wanted to demonstrate that aspects of fish fraud other than speciation are detectable using REIMS. 478...
With high-end systems featuring multicore/multithreaded processors and high component density, power-aware high-performance multithreading libraries become a critical element of the system software stack. Online power and performance adaptation of multithreaded code from within user-level runtime libraries is a relatively new and unexplored area of research. We present a library framework for nearly optimal online adaptation of multithreaded codes for low-power, high-performance execution. Our framework operates by regulating concurrency and changing the processors/threads...
Computing has recently reached an inflection point with the introduction of multi-core processors. On-chip thread-level parallelism is doubling approximately every other year. Concurrency lends itself naturally to allowing a program to trade performance for power savings by regulating the number of active cores; however, in several domains users are unwilling to sacrifice performance to save power. We present a prediction model for identifying energy-efficient operating points of concurrency in well-tuned multithreaded scientific...
Guaranteed numerical precision of each elementary step in a complex computation has been the mainstay of traditional computing systems for many years. This era, fueled by Moore's law and the constant exponential improvement in computing efficiency, is at its twilight: from tiny nodes of the Internet-of-Things to large HPC computing centers, sub-picoJoule/operation energy efficiency is essential for practical realizations. To overcome the power wall, a shift from traditional computing paradigms is now mandatory. In this paper we present the driving motivations, roadmap,...
Much research on school bullying and victimization has outlined several individual, family, and school parameters that function as risk factors for developing further psychosocial and psychopathological problems. Bullying and victimization are interrelated with symptoms of psychological trauma, as well as emotional/behavioural reactions, which can destabilize the scholastic pathways of children and adolescents. The current study explored the various dimensions of psychological trauma (depressive symptoms, somatization, dissociation, avoidance behaviours)...
In recent years an increasing number of researchers and practitioners have been suggesting algorithms for large-scale neural network architecture search: genetic algorithms, reinforcement learning, learning curve extrapolation, and accuracy predictors. None of them, however, demonstrated high performance without training new experiments in the presence of unseen datasets. We propose a deep predictor that estimates, in fractions of a second, classification performance for unseen input datasets, without training. In contrast to...
We investigate how graph partitioning adversely affects the performance of graph analytics. We demonstrate that graph partitioning induces extra work during graph traversal and that graph partitions have markedly different connectivity than the original graph. By consequence, increasing the number of partitions reaches a tipping point after which overheads quickly dominate performance gains. Moreover, we show that the heuristic to balance CPU load between graph partitions by balancing the number of edges is inappropriate for a range of graph analyses. However, even when it is appropriate, it is sub-optimal due to the skewed degree...
We present Streamflow, a new multithreaded memory manager designed for low-overhead, high-performance memory allocation while transparently favoring locality. Streamflow enables low-overhead simultaneous allocation by multiple threads and adapts to sequential allocation at speeds comparable to that of custom sequential allocators. It favors the transparent exploitation of temporal and spatial object access locality, and reduces allocator-induced cache conflicts and false sharing, all using a unified design based on segregated heaps. Streamflow introduces an...
This paper addresses the problem of orchestrating and scheduling parallelism at multiple levels of granularity on heterogeneous multicore processors. We present mechanisms and policies for adaptive exploitation and scheduling of layered parallelism on the Cell Broadband Engine. Our policies combine event-driven task scheduling with malleable loop-level parallelism, which is exploited from the runtime system whenever task-level parallelism leaves idle cores. We present a scheduler for applications with layered parallelism and investigate its performance with RAxML, an application that infers large phylogenetic trees, using...
Asymmetric multi-core processors (AMPs), with general-purpose and specialized cores packaged on the same chip, are emerging as a leading paradigm for high-end computing. A large body of existing research explores the use of standalone AMPs in computationally challenging and data-intensive applications. AMPs are rapidly being deployed as high-performance accelerators on clusters. In these settings, scheduling, communication and I/O are managed by general-purpose processors (GPPs), while computation is off-loaded to AMPs. Design space...
Many scientific applications are programmed using hybrid programming models that use both message passing and shared memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared memory or message passing, in isolation....
Computational phylogeny is a challenging application even for the most powerful supercomputers. It is also an ideal candidate for benchmarking emerging multiprocessor architectures, because it exhibits fine- and coarse-grain parallelism at multiple levels. In this paper, we present the porting, optimization, and evaluation of RAxML on the Cell Broadband Engine. RAxML is a provably efficient, hill climbing algorithm for computing phylogenetic trees, based on the maximum likelihood (ML) method. The Cell Broadband Engine, a heterogeneous multi-core...
The use of asymmetric multi-core processors with on-chip computational accelerators is becoming common in a variety of environments ranging from scientific computing to enterprise applications. The focus of current research has been on making efficient use of individual systems, and porting applications to asymmetric processors. In this paper, we take the next step by investigating the use of multi-core-based systems, especially the popular Cell processor, in a cluster setting. We present CellMR, an efficient and scalable implementation of the MapReduce framework for...
Task-based dataflow programming models and runtimes emerge as promising candidates for programming multicore and manycore architectures. These programming models dynamically analyze task dependencies at runtime and schedule independent tasks concurrently to the processing elements. In such models, cache locality, which is critical for performance, becomes more challenging in the presence of fine-grain tasks, and in architectures with many simple cores.
Modeling dynamical systems represents an important application class covering a wide range of disciplines including but not limited to biology, chemistry, finance, national security, and health care. Such applications typically involve large-scale, irregular graph processing, which makes them difficult to scale due to the evolutionary nature of their workload, irregular communication and load imbalance. EpiSimdemics is such an application, simulating epidemic diffusion in extremely large and realistic social contact networks. It...