- Data Management and Algorithms
- Advanced Database Systems and Queries
- Geographic Information Systems Studies
- Data Mining Algorithms and Applications
- Graph Theory and Algorithms
- Data Visualization and Analytics
- Advanced Image and Video Retrieval Techniques
- Recommender Systems and Techniques
- Advanced Data Storage Technologies
- Remote-Sensing Image Classification
- Human Mobility and Location-Based Analysis
- Automated Road and Building Extraction
- Cloud Computing and Resource Management
- Parallel Computing and Optimization Techniques
- Advanced Clustering Algorithms Research
- Traffic Prediction and Management Techniques
- Constraint Satisfaction and Optimization
- Fire effects on ecosystems
- 3D Modeling in Geospatial Applications
- Remote Sensing in Agriculture
- Distributed and Parallel Computing Systems
- Numerical Methods and Algorithms
- Data Quality and Management
- Caching and Content Delivery
- Peer-to-Peer Network Technologies
University of California, Riverside
2016-2024
University of California System
2017-2022
Georgia Institute of Technology
2019
Aalborg University
2019
Hong Kong University of Science and Technology
2019
University of Hong Kong
2019
University of Minnesota
2011-2016
University of Minnesota System
2011-2016
Northwestern University
2015
Twin Cities Orthopedics
2011-2015
This paper describes SpatialHadoop, a full-fledged MapReduce framework with native support for spatial data. SpatialHadoop is a comprehensive extension to Hadoop that injects spatial data awareness in each Hadoop layer, namely, the language, storage, MapReduce, and operations layers. In the language layer, SpatialHadoop adds a simple and expressive high-level language for spatial data types and operations. In the storage layer, SpatialHadoop adapts traditional spatial index structures, Grid, R-tree, and R+-tree, to form a two-level spatial index. SpatialHadoop enriches the MapReduce layer by two new components, SpatialFileSplitter...
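The two-level index idea mentioned in the abstract above can be illustrated with a small in-memory sketch. This is not SpatialHadoop's API: the class name `TwoLevelGridIndex`, the `cell_size` parameter, and the use of a sorted list as the local index (standing in for a real structure such as an R-tree) are all assumptions made for illustration only.

```python
from collections import defaultdict


class TwoLevelGridIndex:
    """Toy two-level index: a global grid of cells, each holding a local point list.

    Hypothetical illustration only; not SpatialHadoop's actual classes or file layout.
    """

    def __init__(self, points, cell_size):
        self.cell_size = cell_size
        cells = defaultdict(list)
        for x, y in points:
            cells[self._cell_of(x, y)].append((x, y))
        # Local level: points kept sorted inside each cell, a stand-in for a
        # real local index such as an R-tree.
        self.cells = {key: sorted(pts) for key, pts in cells.items()}

    def _cell_of(self, x, y):
        # Global level: map a coordinate to the grid cell that contains it.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def range_query(self, x1, y1, x2, y2):
        """Return all points inside the closed rectangle [x1, x2] x [y1, y2]."""
        cx1, cy1 = self._cell_of(x1, y1)
        cx2, cy2 = self._cell_of(x2, y2)
        result = []
        # Global step: visit only the grid cells that overlap the query rectangle.
        for cx in range(cx1, cx2 + 1):
            for cy in range(cy1, cy2 + 1):
                # Local step: filter the points stored in that cell.
                for x, y in self.cells.get((cx, cy), []):
                    if x1 <= x <= x2 and y1 <= y <= y2:
                        result.append((x, y))
        return sorted(result)
```

In this sketch the global grid plays the role of pruning whole partitions, while the per-cell list is where a real system would place a proper local index.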
This paper proposes LARS, a location-aware recommender system that uses location-based ratings to produce recommendations. Traditional recommender systems do not consider spatial properties of users nor items; LARS, on the other hand, supports a taxonomy of three novel classes of location-based ratings, namely, spatial ratings for non-spatial items, non-spatial ratings for spatial items, and spatial ratings for spatial items. LARS exploits user rating locations through user partitioning, a technique that influences recommendations with ratings spatially close to querying users in a manner that maximizes system scalability while not sacrificing recommendation...
Despite the increasing importance of data quality and the rich theoretical and practical contributions in all aspects of data cleaning, there is no single end-to-end off-the-shelf solution to (semi-)automate the detection and the repairing of violations w.r.t. a set of heterogeneous and ad-hoc quality constraints. In short, there is no commodity platform similar to general purpose DBMSs that can be easily customized and deployed to solve application-specific data quality problems. In this paper, we present NADEEF, an extensible, generalized and easy-to-deploy data cleaning platform....
This demo presents SpatialHadoop as the first full-fledged MapReduce framework with native support for spatial data. SpatialHadoop is a comprehensive extension to Hadoop that pushes spatial data inside the core functionality of Hadoop. SpatialHadoop runs existing Hadoop programs as is, yet it achieves order(s) of magnitude better performance than Hadoop when dealing with spatial data. SpatialHadoop employs a simple spatial high-level language, a two-level spatial index structure, basic spatial components built inside the MapReduce layer, and three basic spatial operations: range queries, k-NN queries, and spatial join. Other spatial operations can be similarly deployed...
This paper proposes LARS*, a location-aware recommender system that uses location-based ratings to produce recommendations. Traditional recommender systems do not consider spatial properties of users nor items; LARS*, on the other hand, supports a taxonomy of three novel classes of location-based ratings, namely, spatial ratings for non-spatial items, non-spatial ratings for spatial items, and spatial ratings for spatial items. LARS* exploits user rating locations through user partitioning, a technique that influences recommendations with ratings spatially close to querying users in a manner that maximizes system scalability while not sacrificing...
SpatialHadoop is an extended MapReduce framework that supports global indexing that spatially partitions the data across machines, providing orders of magnitude speedup compared to traditional Hadoop. In this paper, we describe seven alternative partitioning techniques and experimentally study their effect on the quality of the generated index and the performance of range and spatial join queries. We found that using a 1% sample is enough to produce high-quality partitions. Also, we found that the total area of partitions is a reasonable measure of the quality of indexes when running spatial join. This will...
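The observation that a small sample can be enough to choose partition boundaries can be sketched as follows. The visible text does not name the seven partitioning techniques, so the one-dimensional equi-depth strip partitioning used here is a deliberately simplified assumption, and the function names `sample_based_boundaries` and `assign` are hypothetical.

```python
import bisect
import random


def sample_based_boundaries(xs, num_partitions, sample_ratio=0.01, seed=0):
    """Choose partition boundaries on x from a small random sample of the data.

    Hypothetical simplification: one-dimensional strips with roughly equal
    numbers of sampled points, not any specific technique from the paper.
    """
    rng = random.Random(seed)
    sample_size = min(len(xs), max(num_partitions, int(len(xs) * sample_ratio)))
    sample = sorted(rng.sample(xs, sample_size))
    # Boundaries are equi-depth quantiles of the sample.
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def assign(x, boundaries):
    """Return the index of the partition that x falls into."""
    return bisect.bisect_right(boundaries, x)
```

The boundaries are computed from the sample only, and every record in the full dataset is then routed with `assign`; if the sample is representative, the resulting partitions hold similar numbers of records.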
Hadoop, employing the MapReduce programming paradigm, has been widely accepted as the standard framework for analyzing big data in distributed environments. Unfortunately, this rich framework was not truly exploited towards processing large-scale computational geometry operations. This paper introduces CG_Hadoop, a suite of scalable and efficient MapReduce algorithms for various fundamental computational geometry problems, namely, polygon union, skyline, convex hull, farthest pair, and closest pair, which present a set of key components for other geometric...
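As a rough illustration of how one of the listed problems, the skyline, lends itself to a partition-then-merge style of computation, the sketch below computes a local skyline per chunk and then the skyline of the combined local results. It is a single-process toy under stated assumptions (a max-max skyline over 2D points, round-robin chunking) and does not claim to reproduce the CG_Hadoop algorithms; the function names are hypothetical.

```python
def skyline(points):
    """Max-max skyline: points not dominated in both x and y by another point."""
    best_y = float("-inf")
    result = []
    # Sweep from the largest x to the smallest; a point survives only if its
    # y is strictly greater than every y seen so far.
    for p in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if p[1] > best_y:
            result.append(p)
            best_y = p[1]
    return sorted(result)


def partitioned_skyline(points, num_partitions):
    """Compute a local skyline per chunk, then the skyline of their union."""
    # "Map" step: each chunk is processed independently.
    chunks = [points[i::num_partitions] for i in range(num_partitions)]
    local = [p for chunk in chunks for p in skyline(chunk)]
    # "Reduce" step: a global skyline point is never dominated inside its own
    # chunk, so it always survives to this merge.
    return skyline(local)
```

The merge is correct because any point dropped by a local skyline is dominated by some point in the same chunk, and therefore cannot belong to the global skyline.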
Research and development of recommender systems has been a vibrant field for over a decade, having produced proven methods for “preference-aware” computing. Recommenders use community opinion histories to help users identify interesting items from a considerably large search space (e.g., inventory from Amazon [7], movies from Netflix [9]). Personalization, recommendation, and the “human side” of data-centric applications are even becoming important topics in the data management community [3]. A popular recommendation method used...
This paper introduces HadoopViz, a MapReduce-based framework for visualizing big spatial data. HadoopViz has three unique features that distinguish it from other techniques. (1) It exposes an extensible interface which allows users to define new visualization types, e.g., scatter plot, road network, or heat map, by defining five abstract functions, without delving into the implementation details of the MapReduce algorithms. As HadoopViz is open source, algorithm designers can focus on how the data should be...
Remote sensing data collected by satellites are now made publicly available by several space agencies. This data is very useful for scientists pursuing research in several applications including climate change, desertification, and land use change. The benefit of this data comes from its richness, as it provides an archived history of over 15 years of satellite observations of natural phenomena such as temperature and vegetation. Unfortunately, the use of such data is very limited due to the huge size of the archives (> 500TB) and the limited capabilities of traditional...
Recently, MapReduce frameworks, e.g., Hadoop, have been used extensively in different applications that include terabyte sorting, machine learning, and graph processing. With the huge volumes of spatial data coming from different sources, there is an increasing demand to exploit the efficiency of Hadoop, coupled with the flexibility of the MapReduce framework, in spatial data processing. However, Hadoop falls short in supporting spatial data efficiently, as the core is unaware of spatial data properties. This paper describes SpatialHadoop, a full-fledged MapReduce framework with native support for spatial data. SpatialHadoop...
With the huge amounts of spatial data collected every day, MapReduce frameworks, such as Hadoop, have become a common choice to analyze big spatial data for scientists and people from industry. Users prefer to use high-level languages, such as Pig Latin, to deal with Hadoop for simplicity. Unfortunately, these languages are designed for primitive non-spatial data and have no support for spatial data types or functions. This demonstration presents Pigeon, a spatial extension to Pig which provides spatial functionality in Pig. Pigeon is implemented through user defined functions...
Main memories are becoming sufficiently large that most OLTP databases can be stored entirely in main memory, but this may not be the best solution. OLTP workloads typically exhibit skewed access patterns where some records are hot (frequently accessed) but many records are cold (infrequently or never accessed). It is still more economical to store the coldest records on secondary storage such as flash. This paper introduces Siberia, a framework for managing cold data in the Microsoft Hekaton main-memory database engine. We discuss how...
The importance and complexity of the spatial join operation resulted in the availability of many join algorithms, some of which are tailored for big-data platforms like Hadoop and Spark. The choice among them is not trivial and depends on different factors. This paper proposes the first machine-learning-based framework for spatial join query optimization that can accommodate the characteristics of both the datasets and the algorithms. The main challenge is how to develop portable cost models that, once trained, can be applied to any pair of input datasets, because they...
This demo presents Sindbad, a location-based social networking system. Sindbad supports three new services beyond traditional social networking services, namely, location-aware news feed, location-aware recommender, and location-aware ranking. These new services not only consider social relevance for its users, but they also consider spatial relevance. Since location-based social networking systems have to deal with a large number of users, a large number of messages, and user mobility, efficiency and scalability are important issues. To this end, Sindbad encapsulates its three main services inside the query processing engine of PostgreSQL. Usage and internal...
Real spatial data, e.g., detailed road networks, rivers, buildings, and parks, are not really available in most of the world. This hinders the practicality of many research ideas that need real spatial data for testing and experiments. Such data is often available for governmental use, or at major software companies, but it is prohibitively expensive to build or buy for academia or individual researchers. This demo presents TAREEG, a web service that makes real spatial data, from anywhere in the world, available at the fingertips of every researcher or individual. TAREEG gets all its data by leveraging...
In this tutorial, we present the recent work in the database community for handling Big Spatial Data. This topic became very hot due to the recent explosion in the amount of spatial data generated by smart phones, satellites and medical devices, among others. This tutorial goes beyond the use of existing systems as-is (e.g., Hadoop, Spark or Impala), and digs deep into the core components of big systems (e.g., indexing and query processing) to describe how they are designed to handle big spatial data. During this 90-minute tutorial, we review the state-of-the-art work in the area of Big Spatial Data while classifying...
The recent explosion in the amount of spatial data calls for specialized systems to handle big spatial data. In this paper, we discuss the main features and components that need to be supported in a system to handle big spatial data efficiently. We review the recent work in the area of big spatial data according to these four components, namely, language, indexing, query processing, and visualization. We describe each component in detail and give examples of how it is implemented in existing work. After that, we describe a few case studies of systems for big spatial data and show how they support these four components. This assists researchers...
In this tutorial, we present the recent work in the database community for handling Big Spatial Data. This topic became very hot due to the recent explosion in the amount of spatial data generated by smartphones, satellites and medical devices, among others. This tutorial goes beyond the use of existing systems as-is (e.g., Hadoop, Spark or Impala), and digs deep into the core components of big systems (e.g., indexing and query processing) to describe how they are designed to handle big spatial data. During this 90-minute tutorial, we review the state-of-the-art work in the area of Big Spatial Data while classifying...
In recent years, several extensions of the Hadoop system have been proposed for dealing with spatial data, and SpatialHadoop belongs to this group. In the MapReduce paradigm, a task can be parallelized by partitioning the data into chunks and performing the same operation on them, eventually combining the partial results at the end. Thus, the applied partitioning technique can tremendously affect the performance of a parallel execution, since it is the key point for obtaining balanced map tasks. However, when skewed distributed datasets are considered, using...
This paper introduces the open-source Beast system for scalable exploratory data science on big spatio-temporal data. Beast is based on well-established research and has been released to assist the research community with analyzing big spatio-temporal data. Beast provides a set of extensible components that naturally integrate with Spark to build data science pipelines. Users can install Beast in less than a minute on an existing Spark cluster with a wide array of features including loading vector and raster data represented in standard file formats, synthetic data generation for benchmarking, load-balanced spatial...