- Parallel Computing and Optimization Techniques
- Graph Theory and Algorithms
- Distributed systems and fault tolerance
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Advanced Database Systems and Queries
- Advanced Graph Neural Networks
- Data Management and Algorithms
- Cloud Computing and Resource Management
- Software System Performance and Reliability
- Cognitive Functions and Memory
- Software Testing and Debugging Techniques
- Optimization and Search Problems
- Complex Network Analysis Techniques
- Network Packet Processing and Optimization
- Model-Driven Software Engineering Techniques
- Formal Methods in Verification
- Caching and Content Delivery
- Logic, programming, and type systems
- Scientific Computing and Data Management
- Data Visualization and Analytics
- Embedded Systems Design Techniques
- Topic Modeling
- Topological and Geometric Data Analysis
- Semantic Web and Ontologies
Oracle (United States)
2012-2023
Linköping University
2023
National Institute of Informatics
2023
Association for Computing Machinery
2023
Chinese University of Hong Kong
2023
Stanford University
2005-2014
The increasing importance of graph-data based applications is fueling the need for highly efficient and parallel implementations graph analysis software. In this paper we describe Green-Marl, a domain-specific language (DSL) whose high level constructs allow developers to their algorithms intuitively, but expose data-level parallelism inherent in algorithms. We also present our Green-Marl compiler which translates high-level algorithmic description written into an C++ implementation by...
The Linked Data Benchmark Council (LDBC) is now two years underway and has gathered strong industrial participation for its mission to establish benchmarks, benchmarking practices evaluating graph data management systems. LDBC introduced a new choke-point driven methodology developing benchmark workloads, which combines user input with from expert systems architects, we outline. This paper describes the Social Network (SNB), presents database innovation in terms of query functionality...
We propose a concurrent relaxed balance AVL tree algorithm that is fast, scales well, and tolerates contention. It based on optimistic techniques adapted from software transactional memory, but takes advantage of specific knowledge the to reduce overheads avoid unnecessary retries. extend our with fast linearizable clone operation, which can be used for consistent iteration tree. Experimental evidence shows outperforms highly tuned skip list many access patterns, an average 39% higher...
Computing systems are becoming increasingly parallel and heterogeneous, therefore new applications must be capable of exploiting parallelism in order to continue achieving high performance. However, targeting these emerging devices often requires using multiple disparate programming models making decisions that can limit forward scalability. In previous work we proposed the use domain-specific languages (DSLs) provide high-level abstractions enable transformations performance code without...
Developing high-performance software is a difficult task that requires the use of low-level, architecture-specific programming models (e.g., OpenMP for CMPs, CUDA GPUs, MPI clusters). It typically not possible to write single application can run efficiently in different environments, leading multiple versions and increased complexity. Domain-Specific Languages (DSLs) are promising avenue enable programmers high-level abstractions still achieve good performance on variety hardware. This...
Transactional memory (TM) provides mechanisms that promise to simplify parallel programming by eliminating the need for locks and their associated problems (deadlock, livelock, priority inversion, convoying). For TM be adopted in long term, not only does it deliver on these promises, but needs scale a high number of processors. To date, proposals scalable have relegated livelock issues user-level contention managers. This paper presents first implementation directory-based distributed shared...
Graph-based approaches to data analysis have become more widespread, which has given need for a query language graphs. Such graph needs not only SQL-like functionality querying structured data, but also intrinsic support typical graph-style applications: reachability analysis, path finding and construction.
Exploiting heterogeneous parallel hardware currently requires mapping application code to multiple disparate programming models. Unfortunately, general-purpose models available today can yield high performance but are too low-level be accessible the average programmer. We propose leveraging domain-specific languages (DSLs) map high-level devices. To demonstrate potential of this approach we present OptiML, a DSL for machine learning. OptiML programs implicitly and achieve on with no...
In this paper we advocate that it is time for a radical rethinking of database systems design. Developers should be able to leverage high-level programming languages without having pay price in efficiency. To realize our vision abstraction regret , present LegoBase, query engine written the language Scala. The key technique regain efficiency apply generative programming: Scala code constitutes engine, despite its appearance, actually program generator emits specialized, low-level C code. We...
Transactional Memory (TM) simplifies parallel programming by allowing for execution of atomic tasks. Thus far, TM systems have focused on implementing transactional state buffering and conflict resolution. Missing is a robust hardware/software interface, not limited to simplistic instructions defining transaction boundaries. Without rich semantics, current cannot support basic features modern languages operating such as transparent library calls, conditional synchronization, system I/O,...
Atomos is the first programming language with implicit transactions, strong atomicity, and a scalable multiprocessor implementation. derived from Java, but replaces its synchronization conditional waiting constructs simpler transactional alternatives.The watch statement allows programmers to specify fine-grained sets used retry for efficient conflict-driven wakeup even in memory systems limited number of contexts. supports open-nested transactions , which are necessary building both...
Transactional Memory (TM) simplifies parallel programming by allowing for execution of atomic tasks. Thus far, TM systems have focused on implementing transactional state buffering and conflict resolution. Missing is a robust hardware/software interface, not limited to simplistic instructions defining transaction boundaries. Without rich semantics, current cannot support basic features modern languages operating such as transparent library calls, conditional synchronization, system I/O,...
As heterogeneous parallel systems become dominant, application developers are being forced to turn an incompatiblemix of low level programming models (e.g. OpenMP, MPI, CUDA, OpenCL). However, these do little shield from the difficult problems parallelization, data decomposition and machine-specific details. Most programmersare having a time using effectively. To provide modelthat addresses productivity performance requirements for average programmer, we explore domainspecificapproach programming.
Domain-specific languages offer a solution to the performance and productivity issues in heterogeneous computing systems. The Delite compiler framework simplifies process of building embedded parallel DSLs. DSL developers can implement domain-specific operations by extending framework, which provides static optimizations code generation for hardware. runtime automatically schedules executes on
Graph analysis is a powerful method in data analysis. Although several frameworks have been proposed for processing large graph instances distributed environments, their performance much lower than using efficient single-machine implementations provided with enough memory. In this paper, we present fast system, namely PGX.D. We show that PGX.D outperforms other systems like GraphLab significantly (3x -- 90x). Furthermore, on 4 to 16 machines also faster an implementation optimized execution....
A dynamic graph is defined by an initial and a update stream consisting of edge insertions deletions. Identifying monitoring critical patterns in the important various application domains such as fraud detection, cyber security, emergency response. Given data query graph, continuous subgraph matching system reports positive matches for insertion negative deletion. Previous systems show significantly low throughput due to either repeated each or expensive overheads maintaining enormous...
Atomos is the first programming language with implicit transactions, strong atomicity, and a scalable multiprocessor implementation. derived from Java, but replaces its synchronization conditional waiting constructs simpler transactional alternatives.The watch statement allows programmers to specify fine-grained sets used retry for efficient conflict-driven wakeup even in memory systems limited number of contexts. supports open-nested which are necessary building both application programs...
Transactional memory (TM) provides an easy-to-use and high-performance parallel programming model for the upcoming chip-multiprocessor systems. Several researchers have proposed alternative hardware software TM implementations. However, lack of transaction-based programs makes it difficult to understand merits each proposal tune future implementations common case behavior real application. This work addresses this problem by analyzing transactional 35 multithreaded from a wide range...
RDF data are used to model knowledge in various areas such as life sciences, Semantic Web, bioinformatics, and social graphs. The size of real reaches billions triples. This calls for a framework efficiently processing data. core function is subgraph pattern matching. There have been two completely different directions supporting efficient One direction develop specialized query engines exploiting the properties last decade, while other isomorphism algorithms general, labeled graphs over 30...
For transactional memory (TM) to achieve widespread acceptance, transactions should not be limited the physical resources of any specific hardware implementation. TM systems guarantee correct execution even when exceed scheduling quanta, overflow capacity caches and memory, or include more independent nesting levels than what is supported in hardware. Existing proposals for virtualization are either incomplete rely on complex implementations, which an overkill if invoked infrequently common...
Transactional coherence and consistency (TCC) is a novel scheme for shared memory multiprocessors that uses programmer-defined transactions as the fundamental unit of parallel work, synchronization, coherence, consistency. TCC has potential to simplify program development optimization by providing smooth transition from sequential programs. In this paper, we study implementation on chip-multiprocessors (CMPs). We explore design alternatives such granularity state tracking, double-buffering,...
We propose a concurrent relaxed balance AVL tree algorithm that is fast, scales well, and tolerates contention. It based on optimistic techniques adapted from software transactional memory, but takes advantage of specific knowledge the to reduce overheads avoid unnecessary retries. extend our with fast linearizable clone operation, which can be used for consistent iteration tree. Experimental evidence shows outperforms highly tuned skip list many access patterns, an average 39% higher...