NFDI4DS | UHH-SEMS - Publication Details

Michael J. Franklin

ORCID: 0000-0003-3332-8574

About

Contact & Profiles

A5102019638

Research Areas

Advanced Database Systems and Queries
Data Management and Algorithms
Cloud Computing and Resource Management
Distributed systems and fault tolerance
Advanced Data Storage Technologies
Data Quality and Management
Mobile Crowdsensing and Crowdsourcing
Data Stream Mining Techniques
Privacy-Preserving Technologies in Data
Machine Learning and Data Classification
Distributed and Parallel Computing Systems
Scientific Computing and Data Management
Peer-to-Peer Network Technologies
Semantic Web and Ontologies
Anomaly Detection Techniques and Applications
Time Series Analysis and Forecasting
Caching and Content Delivery
Graph Theory and Algorithms
Energy Efficient Wireless Sensor Networks
Rangeland and Wildlife Management
Machine Learning and Algorithms
Algorithms and Data Compression
Web Data Mining and Analysis
Big Data and Business Intelligence
Context-Aware Activity Recognition Systems

Wrightington Hospital
2025

Western Sydney University
2013-2025

University of Chicago
2008-2024

University of Wollongong
2006-2022

University of Illinois Chicago
2019-2020

University of California, Berkeley
2009-2018

University of Toronto
2013-2018

Agency for Toxic Substances and Disease Registry
2018

Global Affairs Canada
2018

Tsinghua University
2017

Apache Spark

OPENALEX - Publications

Matei Zaharia Reynold Xin Patrick Wendell Tathagata Das Michael Armbrust and 9 more

This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.

10.1145/2934664 article EN Communications of the ACM 2016-10-28

TinyDB: an acquisitional query processing system for sensor networks

Samuel Madden Michael J. Franklin Joseph M. Hellerstein Wei Hong

We discuss the design of an acquisitional query processor for data collection in sensor networks. Acquisitional issues are those that pertain to where, when, and how often data is physically acquired (sampled) and delivered to query processing operators. By focusing on the locations and costs of acquiring data, we are able to significantly reduce power consumption over traditional passive systems that assume the a priori existence of data. We discuss simple extensions to SQL for controlling data acquisition, and show how acquisitional issues influence query optimization, dissemination,...

10.1145/1061318.1061322 article EN ACM Transactions on Database Systems 2005-03-01

Spark SQL

Michael Armbrust Reynold Xin Cheng Lian Yin Huai Davies Liu and 6 more

Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API. Built on our experience with Shark, Spark SQL lets Spark programmers leverage the benefits of relational processing (e.g., declarative queries and optimized storage), and lets SQL users call complex analytics libraries in Spark (e.g., machine learning). Compared to previous systems, Spark SQL makes two main additions. First, it offers much tighter integration between relational and procedural processing, through a declarative DataFrame API that integrates with procedural Spark code. Second, it includes a highly extensible...

10.1145/2723372.2742797 article EN 2015-05-27

MLlib: Machine Learning in Apache Spark

Xiangrui Meng Joseph K. Bradley Burak Yavuz Evan Sparks Shivaram Venkataraman and 11 more

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark's open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark's rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced rapid growth...

10.48550/arxiv.1505.06807 preprint EN other-oa arXiv (Cornell University) 2015-01-01

The design of an acquisitional query processor for sensor networks

Samuel Madden Michael J. Franklin Joseph M. Hellerstein Wei Hong

We discuss the design of an acquisitional query processor for data collection in sensor networks. Acquisitional issues are those that pertain to where, when, and how often data is physically acquired (sampled) and delivered to query processing operators. By focusing on the locations and costs of acquiring data, we are able to significantly reduce power consumption over traditional passive systems that assume the a priori existence of data. We discuss simple extensions to SQL for controlling data acquisition, and show how acquisitional issues influence query optimization, dissemination,...

10.1145/872757.872817 article EN 2003-06-09

GraphX: graph processing in a distributed dataflow framework

Joseph E. Gonzalez Reynold Xin Ankur Dave Daniel Crankshaw Michael J. Franklin and 1 more

In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern general-purpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet...

10.5555/2685048.2685096 article EN 2014-10-06

TelegraphCQ

Sirish Chandrasekaran Owen Cooper Amol Deshpande Michael J. Franklin Joseph M. Hellerstein and 5 more

No abstract available.

10.1145/872757.872857 article EN 2003-06-09

From databases to dataspaces

Michael J. Franklin Alon Halevy David Maier

The development of relational database management systems served to focus the data management community for decades, with spectacular results. In recent years, however, the rapidly-expanding demands of "data everywhere" have led to a field comprised of interesting and productive efforts, but without a central focus or coordinated agenda. The most acute information management challenges today stem from organizations (e.g., enterprises, government agencies, libraries, "smart" homes) relying on a large number of diverse, interrelated data sources,...

10.1145/1107499.1107502 article EN ACM SIGMOD Record 2005-12-01

CrowdDB

Michael J. Franklin Donald Kossmann Tim Kraska Sukriti Ramesh Reynold Xin

Some queries cannot be answered by machines only. Processing such queries requires human input for providing information that is missing from the database, for performing computationally difficult functions, and for matching, ranking, or aggregating results based on fuzzy criteria. CrowdDB uses human input via crowdsourcing to process queries that neither database systems nor search engines can adequately answer. It uses SQL both as a language for posing complex queries and as a way to model data. While CrowdDB leverages many aspects of traditional database systems, there are...

10.1145/1989323.1989331 article EN 2011-06-12

GraphX

Reynold Xin Joseph E. Gonzalez Michael J. Franklin Ion Stoica

From social networks to targeted advertising, big graphs capture the structure in data and are central to recent advances in machine learning and data mining. Unfortunately, directly applying existing data-parallel tools to graph computation tasks can be cumbersome and inefficient. The need for intuitive, scalable tools for graph computation has led to the development of new graph-parallel systems (e.g., Pregel, PowerGraph) which are designed to efficiently execute graph algorithms. Unfortunately, these new graph-parallel systems do not address the challenges of graph construction and transformation which are often just as...

10.1145/2484425.2484427 article EN 2013-06-23

CrowdER

Jiannan Wang Tim Kraska Michael J. Franklin Jianhua Feng

Entity resolution is central to data integration and data cleaning. Algorithmic approaches have been improving in quality, but remain far from perfect. Crowdsourcing platforms offer a more accurate but expensive (and slow) way to bring human insight into the process. Previous work has proposed batching verification tasks for presentation to human workers but even with batching, a human-only approach is infeasible for data sets of even moderate size, due to the large numbers of matches to be tested. Instead, we propose a hybrid human-machine approach in which...

10.14778/2350229.2350263 article EN Proceedings of the VLDB Endowment 2012-07-01

Fjording the stream: an architecture for queries over streaming sensor data

Samuel Madden Michael J. Franklin

If industry visionaries are correct, our lives will soon be full of sensors, connected together in loose conglomerations via wireless networks, each monitoring and collecting data about the environment at large. These sensors behave very differently from traditional database sources: they have intermittent connectivity, are limited by severe power constraints, and typically sample periodically and push immediately, keeping no record of historical information. These limitations make traditional database systems inappropriate for...

10.1109/icde.2002.994774 article EN 2003-06-25

Supporting aggregate queries over ad-hoc wireless sensor networks

Samuel Madden Robert Szewczyk Michael J. Franklin David Culler

We show how the database community's notion of a generic query interface for data aggregation can be applied to ad-hoc networks of sensor devices. As has been noted in the sensor network literature, aggregation is important as a data reduction tool; networking approaches, however, have focused on application specific solutions, whereas our in-network aggregation approach is driven by a general purpose, SQL-style interface that can execute queries over any type of sensor data while providing opportunities for significant optimization. We present a variety of techniques to improve...

10.1109/mcsa.2002.1017485 article EN 2003-06-25

Adaptive cleaning for RFID data streams

Shawn R. Jeffery Minos Garofalakis Michael J. Franklin

To compensate for the inherent unreliability of RFID data streams, most RFID middleware systems employ a "smoothing filter", a sliding-window aggregate that interpolates for lost readings. In this paper, we propose SMURF, the first declarative, adaptive smoothing filter for RFID data cleaning. SMURF models the unreliability of RFID readings by viewing RFID streams as a statistical sample of tags in the physical world, and exploits techniques grounded in sampling theory to drive its cleaning processes. Through the use of tools such as binomial sampling and π-estimators, SMURF continuously adapts the smoothing window...

10.5555/1182635.1164143 article EN Very Large Data Bases 2006-09-01

Balancing push and pull for data broadcast

Swarup Acharya Michael J. Franklin Stanley B. Zdonik

The increasing ability to interconnect computers through internet-working, wireless networks, high-bandwidth satellite, and cable networks has spawned a new class of information-centered applications based on data dissemination. These applications employ broadcast to deliver data to very large client populations. We have proposed the Broadcast Disks paradigm [Zdon94, Acha95b] for organizing the contents of a data broadcast program and for managing client resources in response to such a program. Our previous work on Broadcast Disks focused exclusively on the "push-based" approach,...

10.1145/253260.253293 article EN 1997-01-01

Shark

Reynold Xin Josh Rosen Matei Zaharia Michael J. Franklin Scott Shenker and 1 more

Shark is a new data analysis system that marries query processing with complex analytics on large clusters. It leverages a novel distributed memory abstraction to provide a unified engine that can run SQL queries and sophisticated analytics functions (e.g., iterative machine learning) at scale, and efficiently recovers from failures mid-query. This allows Shark to run SQL queries up to 100X faster than Apache Hive, and machine learning programs more than 100X faster than Hadoop. Unlike previous systems, Shark shows that it is possible to achieve these speedups while retaining a MapReduce-like...

10.1145/2463676.2465288 article EN 2013-06-22

Principles of dataspace systems

Alon Halevy Michael J. Franklin David Maier

The most acute information management challenges today stem from organizations relying on a large number of diverse, interrelated data sources, but having no means of managing them in a convenient, integrated, or principled fashion. These challenges arise in enterprise and government data management, digital libraries, "smart" homes and personal information management. We have proposed dataspaces as a data management abstraction for these diverse applications and DataSpace Support Platforms (DSSPs) as systems that should be built to provide the required...

10.1145/1142351.1142352 article EN 2006-06-26
