- Face and Expression Recognition
- Neural Networks and Applications
- Peer-to-Peer Network Technologies
- Caching and Content Delivery
- Cloud Computing and Resource Management
- Advanced Data Storage Technologies
- Topic Modeling
- Distributed Systems and Fault Tolerance
- Distributed and Parallel Computing Systems
- Domain Adaptation and Few-Shot Learning
- Image Retrieval and Classification Techniques
- Generative Adversarial Networks and Image Synthesis
- Sparse and Compressive Sensing Techniques
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Machine Learning and Data Classification
- Advanced Database Systems and Queries
- Scientific Computing and Data Management
- Advanced Neural Network Applications
- Blind Source Separation Techniques
- Machine Learning and ELM
- Gaussian Processes and Bayesian Inference
- Machine Learning and Algorithms
- Time Series Analysis and Forecasting
- Advanced Image and Video Retrieval Techniques
University of Waterloo
2016-2025
Aja University of Medical Sciences
2023
University of Alberta
2022
Huawei Technologies (Sweden)
2021-2022
Actua
2012-2022
University of Shahrood
2020-2021
Vector Institute
2021
University of California, Berkeley
2009-2020
Berkeley College
2011-2020
KTH Royal Institute of Technology
2004-2018
This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
We present Mesos, a platform for sharing commodity clusters between multiple diverse cluster computing frameworks, such as Hadoop and MPI. Sharing improves cluster utilization and avoids per-framework data replication. Mesos shares resources in a fine-grained manner, allowing frameworks to achieve data locality by taking turns reading data stored on each machine. To support the sophisticated schedulers of today's frameworks, Mesos introduces a distributed two-level scheduling mechanism called resource offers. Mesos decides how many resources to offer...
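The two-level idea is easy to illustrate: the master decides which framework receives each offer, and the framework decides which offered resources to accept. A minimal toy model in Python, with hypothetical `Offer`/`GreedyFramework` names rather than Mesos's actual API:

```python
# Toy model of two-level scheduling via resource offers (illustrative, not the Mesos API).
from dataclasses import dataclass

@dataclass
class Offer:
    slave: str
    cpus: float
    mem: float  # GB

class GreedyFramework:
    """Hypothetical framework scheduler: accepts offered resources while it has pending tasks."""
    def __init__(self, name, task_cpus, task_mem, pending_tasks):
        self.name, self.task_cpus, self.task_mem = name, task_cpus, task_mem
        self.pending = pending_tasks

    def resource_offer(self, offer):
        # Second level: the framework decides which resources to accept
        # and which tasks to launch on them.
        launched = []
        while (self.pending and offer.cpus >= self.task_cpus
               and offer.mem >= self.task_mem):
            offer.cpus -= self.task_cpus
            offer.mem -= self.task_mem
            self.pending -= 1
            launched.append((self.name, offer.slave))
        return launched

def master_allocate(offers, frameworks):
    # First level: the master decides which framework each offer goes to
    # (round-robin here for simplicity).
    tasks = []
    for i, offer in enumerate(offers):
        tasks += frameworks[i % len(frameworks)].resource_offer(offer)
    return tasks

offers = [Offer("slave-1", cpus=4, mem=8), Offer("slave-2", cpus=4, mem=8)]
frameworks = [GreedyFramework("hadoop", 1, 2, pending_tasks=3),
              GreedyFramework("mpi", 2, 4, pending_tasks=2)]
print(master_allocate(offers, frameworks))
```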
Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API. Built on our experience with Shark, Spark SQL lets programmers leverage the benefits of relational processing (e.g. declarative queries and optimized storage), and lets SQL users call complex analytics libraries (e.g. machine learning). Compared to previous systems, Spark SQL makes two main additions. First, it offers much tighter integration between relational and procedural processing, through a declarative DataFrame API that integrates with procedural Spark code. Second, it includes a highly extensible...
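A small PySpark sketch of that mix of declarative and procedural code; the column names and data are made up, and a local Spark installation is assumed:

```python
# Declarative DataFrame queries combined with ordinary Python post-processing.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dataframe-sketch").getOrCreate()

df = spark.createDataFrame(
    [("alice", "ads", 3.0), ("bob", "ads", 5.0), ("carol", "ml", 7.0)],
    ["user", "team", "hours"],
)

# Relational part: filter and aggregate declaratively; the optimizer plans the execution.
per_team = (df.filter(F.col("hours") > 2)
              .groupBy("team")
              .agg(F.avg("hours").alias("avg_hours")))

# Procedural part: the result is a plain collection we can loop over in Python.
for row in per_team.collect():
    print(f"{row['team']}: {row['avg_hours']:.1f}h on average")

spark.stop()
```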
We consider the problem of fair resource allocation in a system containing different resource types, where each user may have different demands for each resource. To address this problem, we propose Dominant Resource Fairness (DRF), a generalization of max-min fairness to multiple resource types. We show that DRF, unlike other possible policies, satisfies several highly desirable properties. First, DRF incentivizes users to share resources, by ensuring that no user is better off if resources are equally partitioned among them. Second,...
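A minimal progressive-filling sketch of DRF in Python: repeatedly grant a task to the user with the smallest dominant share. This is a simplified illustration (it stops as soon as the poorest user no longer fits), with per-task demands assumed fixed per user:

```python
def drf_allocate(capacity, demands, max_rounds=1000):
    """capacity: {resource: total}; demands: {user: {resource: per-task demand}}."""
    used = {r: 0.0 for r in capacity}
    tasks = {u: 0 for u in demands}

    def dominant_share(user):
        # Largest fraction of any single resource this user currently holds.
        return max(tasks[user] * d / capacity[r] for r, d in demands[user].items())

    for _ in range(max_rounds):
        user = min(demands, key=dominant_share)   # poorest user by dominant share
        d = demands[user]
        if any(used[r] + d[r] > capacity[r] for r in d):
            break                                 # simplification: stop when they no longer fit
        for r in d:
            used[r] += d[r]
        tasks[user] += 1
    return tasks

# With 9 CPUs and 18 GB, per-task demands of <1 CPU, 4 GB> and <3 CPUs, 1 GB>,
# this yields 3 tasks for A and 2 for B, equalizing dominant shares at 2/3.
print(drf_allocate({"cpu": 9, "mem": 18},
                   {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}))
```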
There have been many recent papers on data-oriented or content-centric network architectures. Despite the voluminous literature, surprisingly little clarity is emerging, as most papers focus on what differentiates them from other proposals. We begin this paper by identifying the existing commonalities and important differences in these designs, and then discuss some remaining research issues. After our review, we emerge skeptical (but open-minded) about the value of this approach to networking.
Information-Centric Networking (ICN) has seen a significant resurgence in recent years. ICN promises benefits to users and service providers along several dimensions (e.g., performance, security, mobility). These benefits, however, come at a non-trivial cost, as many ICN proposals envision adding complexity to the network by having routers serve as content caches and support nearest-replica routing. This paper is driven by the simple question of whether this additional complexity is justified if we can achieve these benefits in an...
Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks. While caching today improves read workloads, writes are either network or disk bound, as replication is used for fault tolerance. Tachyon eliminates this bottleneck by pushing lineage, a well-known technique, into the storage layer. The key challenge in making a long-running lineage-based storage system work is timely data recovery in case of failures. Tachyon addresses this issue by introducing a checkpointing algorithm that...
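The core lineage idea can be sketched in a few lines: instead of replicating an output, record the recipe that produced it and recompute on loss. A toy Python illustration with made-up names, not Tachyon's actual interface:

```python
# Toy lineage-based recovery: keep one in-memory copy plus the recipe, recompute on loss.
class LineageStore:
    def __init__(self):
        self.data = {}      # file name -> in-memory contents
        self.lineage = {}   # file name -> (function, input file names)

    def write(self, name, fn, inputs):
        # Persist the lineage (cheap) rather than replicating the output (expensive).
        self.lineage[name] = (fn, inputs)
        self.data[name] = fn(*[self.read(i) for i in inputs])

    def read(self, name):
        if name not in self.data:            # lost, e.g. due to a machine failure
            fn, inputs = self.lineage[name]  # fall back to recomputation
            self.data[name] = fn(*[self.read(i) for i in inputs])
        return self.data[name]

store = LineageStore()
store.data["raw"] = [3, 1, 2]                          # ingested input, assumed durable elsewhere
store.write("sorted", lambda xs: sorted(xs), ["raw"])
del store.data["sorted"]                               # simulate losing the in-memory copy
print(store.read("sorted"))                            # recomputed from lineage: [1, 2, 3]
```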
We consider the problem of separating consistency-related safety properties from availability and durability in distributed data stores via the application of a "bolt-on" shim layer that upgrades the consistency of an underlying general-purpose store. This shim provides the same consistency guarantees atop a wide range of widely deployed but often inflexible stores. As causal consistency is one of the strongest models that remain available during system partitions, we develop a shim that upgrades eventually consistent stores to provide convergent causal consistency. Accordingly, we leverage...
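The shim idea can be illustrated with a small sketch: writes carry explicit dependency metadata, and the shim reveals a value to readers only once its dependencies are locally visible. The store, metadata layout, and version format below are simplified stand-ins, not the paper's exact design:

```python
# Minimal "bolt-on" causal shim over an eventually consistent key-value store.
class EventualStore:
    """Stand-in for an eventually consistent store: a plain dict."""
    def __init__(self):
        self.kv = {}
    def put(self, k, v):
        self.kv[k] = v
    def get(self, k):
        return self.kv.get(k)

class CausalShim:
    def __init__(self, store):
        self.store = store
        self.local = {}   # causally consistent local cut, exposed to readers

    def put(self, key, value, deps=()):
        # Each write records the (key, value) versions it causally depends on.
        self.store.put(key, {"value": value, "deps": list(deps)})
        return (key, value)

    def get(self, key):
        item = self.store.get(key)
        # Reveal a write only if all of its dependencies are already visible locally.
        if item is not None and all(self.local.get(k) == v for k, v in item["deps"]):
            self.local[key] = item["value"]
        return self.local.get(key)

shim = CausalShim(EventualStore())
v1 = shim.put("post", "hello")
shim.get("post")                          # make the post visible locally
shim.put("comment", "nice!", deps=[v1])
print(shim.get("comment"))                # shown only because its dependency is visible
```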
With the increasing commoditization of computer vision, speech recognition, and machine translation systems, and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production. These changes have been made possible by unprecedented levels of data and computation, methodological advances in machine learning, innovations in software architectures, and the broad accessibility of these technologies. The...
Feature vectors provided by pre-trained deep artificial neural networks have become a dominant source for image representation in recent literature. Their contribution to the performance of image analysis can be improved through fine-tuning. As an ultimate solution, one might even train a network from scratch with domain-relevant images, a highly desirable option which is generally impeded in pathology by a lack of labeled images and by computational expense. In this study, we propose a new network, namely KimiaNet,...
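For context, fine-tuning a pre-trained network for a new image domain typically means freezing the feature extractor and retraining a new classification head. A generic PyTorch sketch of that step (not KimiaNet's actual architecture or training recipe; the class count and input sizes are assumptions):

```python
# Generic fine-tuning sketch: ImageNet-pretrained DenseNet-121, new classifier head.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 30  # hypothetical number of tissue/tumor classes

model = models.densenet121(weights="IMAGENET1K_V1")
for p in model.parameters():          # freeze the pretrained feature extractor
    p.requires_grad = False
model.classifier = nn.Linear(model.classifier.in_features, num_classes)  # new head

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One optimization step on a batch of (images, labels)."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random tensors standing in for 224x224 RGB patches.
print(train_step(torch.randn(4, 3, 224, 224), torch.randint(0, num_classes, (4,))))
```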
There have been several recent proposals for content-oriented network architectures whose underlying mechanisms are surprisingly similar in spirit, but which differ in many details. In this paper we step back from the mechanistic details and focus only on the area where these approaches have a fundamental difference: naming. In particular, some designs adopt hierarchical, human-readable names, whereas others use self-certifying names. When discussing architecture, three of the most important requirements...
To minimize network latency and remain online during server failures and partitions, many modern distributed data storage systems eschew transactional functionality, which provides strong semantic guarantees for groups of multiple operations over multiple items. In this work, we consider the problem of providing Highly Available Transactions (HATs): transactions that do not suffer unavailability during system partitions or incur high latency. We introduce a taxonomy of highly available systems and analyze existing ACID isolation and consistency...
Minimizing coordination, or blocking communication between concurrently executing operations, is key to maximizing scalability, availability, and high performance in database systems. However, uninhibited coordination-free execution can compromise application correctness, or consistency. When is coordination necessary for correctness? The classic use of serializable transactions is sufficient to maintain correctness but is not necessary for all applications, sacrificing potential scalability. In this paper, we develop a...
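The question "when is coordination necessary?" can be made concrete with a brute-force check in the spirit of invariant confluence: an invariant tolerates coordination-free execution for a set of operations if merging any two independently reachable valid states is still valid. This is an illustrative sketch with a toy state and merge function, not the paper's analysis:

```python
from itertools import product

def reachable(initial, ops, depth=2):
    """All states reachable from `initial` by applying at most `depth` operations."""
    states, frontier = {initial}, {initial}
    for _ in range(depth):
        frontier = {op(s) for s in frontier for op in ops}
        states |= frontier
    return states

# State: an account balance. Merge: sum the deltas both replicas applied since the initial state.
initial = 100
merge = lambda a, b: a + b - initial
non_negative = lambda b: b >= 0

# Deposits only: every merged pair of valid states stays valid -> no coordination needed.
valid = {s for s in reachable(initial, [lambda b: b + 10]) if non_negative(s)}
print(all(non_negative(merge(a, b)) for a, b in product(valid, repeat=2)))  # True

# Withdrawals: two replicas can each drain the balance, the merge goes negative -> coordinate.
valid = {s for s in reachable(initial, [lambda b: b - 100]) if non_negative(s)}
print(all(non_negative(merge(a, b)) for a, b in product(valid, repeat=2)))  # False
```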
Middleboxes are ubiquitous in today's networks and perform a variety of important functions, including IDS, VPN, firewalling, and WAN optimization. These functions differ vastly in their requirements for hardware resources (e.g., CPU cycles and memory bandwidth). Thus, depending on the functions they go through, different flows can consume different amounts of a middlebox's resources. While there is much literature on weighted fair sharing of link bandwidth to isolate flows, it is unclear how to schedule multiple middlebox resources to achieve similar...
Over the past decade a variety of network architectures have been proposed to address IP's limitations in terms of flexible forwarding, security, and data distribution. Meanwhile, fueled by the explosive growth of video traffic and HTTP infrastructure (e.g., CDNs, web caches), HTTP has become the de-facto protocol for deploying new services and applications. Given these developments, we argue that new architectures should be evaluated not only with respect to IP, but also to HTTP, which could be fertile ground (more so than IP) for new functionalities...
How can applications be built on eventually consistent infrastructure given no guarantee of safety?
Max-Min Fairness is a flexible resource allocation mechanism used in most datacenter schedulers. However, an increasing number of jobs have hard placement constraints, restricting the machines they can run on due to special hardware or software requirements. It is unclear how to define, and achieve, max-min fairness in the presence of such constraints. We propose Constrained Max-Min Fairness (CMMF), an extension that supports placement constraints, and show it is the only policy satisfying an important property that incentivizes users to pool resources. Optimally...
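To make the constrained setting concrete, here is a rough progressive-filling heuristic in Python: repeatedly give one unit of capacity to the currently poorest user, drawn only from machines that user is allowed to run on. It illustrates the problem shape but is not the optimal CMMF algorithm from the paper:

```python
def constrained_max_min(machine_capacity, allowed, unit=1.0):
    """machine_capacity: {machine: slots}; allowed: {user: set of permitted machines}."""
    free = dict(machine_capacity)
    alloc = {u: 0.0 for u in allowed}
    while True:
        # Users that still have spare capacity on some machine they are allowed to use.
        eligible = [u for u in allowed if any(free[m] >= unit for m in allowed[u])]
        if not eligible:
            return alloc
        user = min(eligible, key=lambda u: alloc[u])               # poorest eligible user
        machine = next(m for m in allowed[user] if free[m] >= unit)
        free[machine] -= unit
        alloc[user] += unit

# Two machines with 2 slots each; user A is constrained to m1, users B and C can use both.
print(constrained_max_min({"m1": 2, "m2": 2},
                          {"A": {"m1"}, "B": {"m1", "m2"}, "C": {"m1", "m2"}}))
```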