Michael J. Franklin

ORCID: 0000-0003-3332-8574
Research Areas
  • Advanced Database Systems and Queries
  • Data Management and Algorithms
  • Cloud Computing and Resource Management
  • Distributed Systems and Fault Tolerance
  • Advanced Data Storage Technologies
  • Data Quality and Management
  • Mobile Crowdsensing and Crowdsourcing
  • Data Stream Mining Techniques
  • Privacy-Preserving Technologies in Data
  • Machine Learning and Data Classification
  • Distributed and Parallel Computing Systems
  • Scientific Computing and Data Management
  • Peer-to-Peer Network Technologies
  • Semantic Web and Ontologies
  • Anomaly Detection Techniques and Applications
  • Time Series Analysis and Forecasting
  • Caching and Content Delivery
  • Graph Theory and Algorithms
  • Energy Efficient Wireless Sensor Networks
  • Rangeland and Wildlife Management
  • Machine Learning and Algorithms
  • Algorithms and Data Compression
  • Web Data Mining and Analysis
  • Big Data and Business Intelligence
  • Context-Aware Activity Recognition Systems

Wrightington Hospital
2025

Western Sydney University
2013-2025

University of Chicago
2008-2024

University of Wollongong
2006-2022

University of Illinois Chicago
2019-2020

University of California, Berkeley
2009-2018

University of Toronto
2013-2018

Agency for Toxic Substances and Disease Registry
2018

Global Affairs Canada
2018

Tsinghua University
2017

This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.

10.1145/2934664 article EN Communications of the ACM 2016-10-28

We discuss the design of an acquisitional query processor for data collection in sensor networks. Acquisitional issues are those that pertain to where, when, and how often data is physically acquired (sampled) and delivered to query processing operators. By focusing on the locations and costs of acquiring data, we are able to significantly reduce power consumption over traditional passive systems that assume the a priori existence of data. We discuss simple extensions to SQL for controlling data acquisition, and show how acquisitional issues influence query optimization, dissemination,...

10.1145/1061318.1061322 article EN ACM Transactions on Database Systems 2005-03-01

Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API. Built on our experience with Shark, Spark SQL lets Spark programmers leverage the benefits of relational processing (e.g., declarative queries and optimized storage), and lets SQL users call complex analytics libraries in Spark (e.g., machine learning). Compared to previous systems, Spark SQL makes two main additions. First, it offers much tighter integration between relational and procedural processing, through a declarative DataFrame API that integrates with procedural Spark code. Second, it includes a highly extensible...

10.1145/2723372.2742797 article EN 2015-05-27

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark's open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark's rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced rapid growth...

10.48550/arxiv.1505.06807 preprint EN other-oa arXiv (Cornell University) 2015-01-01

We discuss the design of an acquisitional query processor for data collection in sensor networks. Acquisitional issues are those that pertain to where, when, and how often data is physically acquired (sampled) and delivered to query processing operators. By focusing on the locations and costs of acquiring data, we are able to significantly reduce power consumption over traditional passive systems that assume the a priori existence of data. We discuss simple extensions to SQL for controlling data acquisition, and show how acquisitional issues influence query optimization, dissemination,...

10.1145/872757.872817 article EN 2003-06-09

In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern general-purpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet...

10.5555/2685048.2685096 article EN 2014-10-06

The development of relational database management systems served to focus the data management community for decades, with spectacular results. In recent years, however, the rapidly-expanding demands of "data everywhere" have led to a field comprised of interesting and productive efforts, but without a central focus or coordinated agenda. The most acute information management challenges today stem from organizations (e.g., enterprises, government agencies, libraries, "smart" homes) relying on a large number of diverse, interrelated data sources,...

10.1145/1107499.1107502 article EN ACM SIGMOD Record 2005-12-01

Some queries cannot be answered by machines only. Processing such queries requires human input for providing information that is missing from the database, for performing computationally difficult functions, and for matching, ranking, or aggregating results based on fuzzy criteria. CrowdDB uses human input via crowdsourcing to process queries that neither database systems nor search engines can adequately answer. It uses SQL both as a language for posing complex queries and as a way to model data. While CrowdDB leverages many aspects of traditional database systems, there are...

10.1145/1989323.1989331 article EN 2011-06-12

From social networks to targeted advertising, big graphs capture the structure in data and are central to recent advances in machine learning and data mining. Unfortunately, directly applying existing data-parallel tools to graph computation tasks can be cumbersome and inefficient. The need for intuitive, scalable tools for graph computation has led to the development of new graph-parallel systems (e.g., Pregel, PowerGraph) which are designed to efficiently execute graph algorithms. Unfortunately, these new graph-parallel systems do not address the challenges of graph construction and transformation, which are often just as...

10.1145/2484425.2484427 article EN 2013-06-23

Entity resolution is central to data integration and data cleaning. Algorithmic approaches have been improving in quality, but remain far from perfect. Crowdsourcing platforms offer a more accurate but expensive (and slow) way to bring human insight into the process. Previous work has proposed batching verification tasks for presentation to human workers, but even with batching, a human-only approach is infeasible for data sets of even moderate size, due to the large numbers of matches to be tested. Instead, we propose a hybrid human-machine approach in which...

10.14778/2350229.2350263 article EN Proceedings of the VLDB Endowment 2012-07-01

If industry visionaries are correct, our lives will soon be full of sensors, connected together in loose conglomerations via wireless networks, each monitoring and collecting data about the environment at large. These sensors behave very differently from traditional database sources: they have intermittent connectivity, are limited by severe power constraints, and typically sample periodically and push immediately, keeping no record of historical information. These limitations make traditional database systems inappropriate for...

10.1109/icde.2002.994774 article EN 2003-06-25

We show how the database community's notion of a generic query interface for data aggregation can be applied to ad-hoc networks of sensor devices. As has been noted in the sensor network literature, aggregation is important as a data reduction tool; networking approaches, however, have focused on application specific solutions, whereas our in-network aggregation approach is driven by a general purpose, SQL-style interface that can execute queries over any type of sensor data while providing opportunities for significant optimization. We present a variety of techniques to improve...

10.1109/mcsa.2002.1017485 article EN 2003-06-25

To compensate for the inherent unreliability of RFID data streams, most RFID middleware systems employ a smoothing filter, a sliding-window aggregate that interpolates for lost readings. In this paper, we propose SMURF, the first declarative, adaptive smoothing filter for RFID data cleaning. SMURF models the unreliability of RFID readings by viewing RFID streams as a statistical sample of tags in the physical world, and exploits techniques grounded in sampling theory to drive its cleaning processes. Through the use of tools such as binomial sampling and π-estimators, SMURF continuously adapts the smoothing window...

10.5555/1182635.1164143 article EN Very Large Data Bases 2006-09-01

The increasing ability to interconnect computers through internet-working, wireless networks, high-bandwidth satellite, and cable networks has spawned a new class of information-centered applications based on data dissemination. These applications employ broadcast to deliver data to very large client populations. We have proposed the Broadcast Disks paradigm [Zdon94, Acha95b] for organizing the contents of a data broadcast program and for managing client resources in response to such a program. Our previous work on Broadcast Disks focused exclusively on the "push-based" approach,...

10.1145/253260.253293 article EN 1997-01-01

Shark is a new data analysis system that marries query processing with complex analytics on large clusters. It leverages a novel distributed memory abstraction to provide a unified engine that can run SQL queries and sophisticated analytics functions (e.g., iterative machine learning) at scale, and efficiently recovers from failures mid-query. This allows Shark to run SQL queries up to 100X faster than Apache Hive, and machine learning programs more than 100X faster than Hadoop. Unlike previous systems, Shark shows that it is possible to achieve these speedups while retaining a MapReduce-like...

10.1145/2463676.2465288 article EN 2013-06-22

The most acute information management challenges today stem from organizations relying on a large number of diverse, interrelated data sources, but having no means of managing them in a convenient, integrated, or principled fashion. These challenges arise in enterprise and government data management, digital libraries, "smart" homes and personal information management. We have proposed dataspaces as a data management abstraction for these diverse applications and DataSpace Support Platforms (DSSPs) as systems that should be built to provide the required...

10.1145/1142351.1142352 article EN 2006-06-26