Zhifeng Bao

ORCID: 0000-0003-2477-381X
Research Areas
  • Data Management and Algorithms
  • Advanced Database Systems and Queries
  • Human Mobility and Location-Based Analysis
  • Geographic Information Systems Studies
  • Semantic Web and Ontologies
  • Web Data Mining and Analysis
  • Traffic Prediction and Management Techniques
  • Data Quality and Management
  • Transportation Planning and Optimization
  • Advanced Graph Neural Networks
  • Time Series Analysis and Forecasting
  • Recommender Systems and Techniques
  • Complex Network Analysis Techniques
  • Algorithms and Data Compression
  • Topic Modeling
  • Data Mining Algorithms and Applications
  • Caching and Content Delivery
  • Anomaly Detection Techniques and Applications
  • Graph Theory and Algorithms
  • Transportation and Mobility Innovations
  • Privacy-Preserving Technologies in Data
  • Data Visualization and Analytics
  • Data Stream Mining Techniques
  • Peer-to-Peer Network Technologies
  • Advanced Image and Video Retrieval Techniques

MIT University
2015-2025

RMIT University
2016-2025

The Royal Melbourne Hospital
2017-2025

Zhejiang University
2019

Nanjing University of Aeronautics and Astronautics
2019

ResearchWorks (United States)
2019

University of Tasmania
2014-2015

National University of Singapore
2007-2014

Institute for Infocomm Research
2013

Yanshan University
2012

In this modern era, traffic congestion has become a major source of severe negative economic and environmental impact for urban areas worldwide. One of the most efficient ways to mitigate traffic congestion is through future traffic prediction. The research field of traffic prediction has evolved greatly ever since its inception in the late 70s. Earlier studies mainly use classical statistical models such as ARIMA and its variants. Recently, researchers have started to focus on machine learning models because of their power and flexibility. As theoretical...

10.1109/tkde.2020.3001195 article EN IEEE Transactions on Knowledge and Data Engineering 2020-01-01

Inspired by the great success of information retrieval (IR) style keyword search on the Web, keyword search on XML has emerged recently. The difference between a text database and an XML database results in three new challenges: (1) Identify the user's search intention, i.e., identify the XML node types that the user wants to search for and search via. (2) Resolve keyword ambiguity problems: a keyword can appear as both a tag name and a text value of some node; a keyword can appear as the text values of different XML node types and carry different meanings. (3) As the search results are sub-trees of the XML document, a new scoring function is needed to estimate a result's relevance to a given query. However, existing...

10.1109/icde.2009.16 article EN Proceedings - International Conference on Data Engineering 2009-03-01

Trajectory analytics can benefit many real-world applications, e.g., frequent trajectory based navigation systems, road planning, car pooling, and transportation optimizations. Existing algorithms focus on optimizing this problem on a single machine. However, the amount of trajectories exceeds the storage and processing capability of a single machine, which calls for large-scale trajectory analytics in distributed environments. Distributed trajectory analytics faces the challenges of data locality aware partitioning, load balance, easy-to-use interface, and versatility to...

10.1145/3183713.3183743 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25

Identifying the labels of points of interest (POIs), aka POI labelling, provides significant benefits in location-based services. However, the quality of raw labels manually added by users or generated by artificial algorithms cannot be guaranteed. Such low-quality labels decrease usability and result in bad user experiences. In this paper, observing that crowdsourcing is a best fit for computer-hard tasks, we leverage crowdsourcing to improve the quality of POI labelling. To our best knowledge, this is the first work on crowdsourced POI labelling tasks. In particular,...

10.1109/icde.2016.7498229 article EN 2016-05-01

Detecting anomalous trajectories has become an important and fundamental concern in many real-world applications. However, most of the existing studies 1) cannot handle the complexity and variety of trajectory data and 2) do not support efficient anomaly detection in an online manner. To this end, we propose a novel model, namely Gaussian Mixture Variational Sequence AutoEncoder (GM-VSAE), to tackle these challenges. Our GM-VSAE model is able to (1) capture complex sequential information enclosed in trajectories, (2) discover...

10.1109/icde48307.2020.00087 article EN 2020 IEEE 36th International Conference on Data Engineering (ICDE) 2020-04-01

In this paper, we study the problem of origin-destination (OD) travel time estimation where the OD input consists of an OD pair and a departure time. We propose a novel neural network based prediction model that fully exploits an important fact neglected by the literature -- for a past OD trip, its travel time is usually affiliated with the trajectory it travels along, whereas that trajectory does not exist during prediction. At the training phase, our goal is to design representations for the OD input and its affiliated trajectory, such that they are close to each other in the latent space. First,...

10.1145/3318464.3389771 article EN 2020-05-29

In this paper, we study the problem of large-scale trajectory data clustering, k-paths, which aims to efficiently identify k "representative" paths in a road network. Unlike traditional clustering approaches that require multiple data-dependent hyperparameters, k-paths can be used for visual exploration in applications such as traffic monitoring, public transit planning, and site selection. By combining map matching with an efficient intermediate representation of trajectories and a novel edge-based...

10.14778/3357377.3357380 article EN Proceedings of the VLDB Endowment 2019-09-01

Over the past few decades, a large number of algorithms have been developed for dimensionality reduction. Despite the different motivations of these algorithms, they can be interpreted by a common framework known as graph embedding. In order to explore the significant features of data, some sparse regression algorithms have been proposed based on graph embedding. However, the problem is that these algorithms include two separate steps: (1) embedding learning and (2) sparse regression. Thus their performance is largely determined by the effectiveness of the constructed graph. In this paper,...

10.1109/tip.2015.2405474 article EN IEEE Transactions on Image Processing 2015-02-19

This paper presents a new trajectory search engine called Torch for querying road network trajectory data. Torch is able to efficiently process two types of typical queries (similarity search and Boolean search), and supports a wide variety of trajectory similarity functions. Additionally, we propose a new similarity function, LORS, in Torch to measure similarity in a more effective and efficient manner. Indexing and search in Torch works as follows. First, each raw vehicle trajectory is transformed into a set of road segments (edges) and a set of crossings (vertices) on the road network. Then a lightweight edge and vertex index called LEVI is built. Given a query,...

10.1145/3209978.3209989 article EN 2018-06-27
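
The edge-based indexing idea in the abstract above can be illustrated with a plain inverted index from road segments to trajectories. This is only a minimal sketch under stated assumptions, not Torch's LEVI index: the function names `build_edge_index` and `boolean_search`, the edge ids, and the sample trajectories are all hypothetical, and trajectories are assumed to be already map-matched to edge sequences.

```python
from collections import defaultdict


def build_edge_index(trajectories):
    """Map each road-segment (edge) id to the set of trajectory ids that traverse it."""
    index = defaultdict(set)
    for traj_id, edges in trajectories.items():
        for edge in edges:
            index[edge].add(traj_id)
    return index


def boolean_search(index, query_edges, mode="and"):
    """Return trajectory ids covering all (mode="and") or any (mode="or") query edges."""
    postings = [index.get(edge, set()) for edge in query_edges]
    if not postings:
        return set()
    if mode == "and":
        return set.intersection(*postings)
    return set.union(*postings)


# Hypothetical map-matched trajectories: trajectory id -> sequence of edge ids.
trajectories = {
    "t1": ["e1", "e2", "e3"],
    "t2": ["e2", "e3", "e4"],
    "t3": ["e5"],
}
index = build_edge_index(trajectories)
print(boolean_search(index, ["e2", "e3"]))
print(boolean_search(index, ["e1", "e5"], mode="or"))
```

A similarity search could reuse the same postings lists to collect candidate trajectories before scoring them, but how Torch does this is not described in the truncated abstract.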

We study the problem of index selection to maximize workload performance, which is critical to database systems. In contrast to existing methods, we seamlessly integrate index recommendation rules and deep reinforcement learning, such that we can recommend single-attribute and multi-attribute indexes together for complex queries and meanwhile support multiple-index access for a table. Specifically, we first propose five heuristic rules to generate index candidates. Then, we formulate the index selection problem as a reinforcement learning task and employ Deep Q Network (DQN) on it....

10.1145/3340531.3412106 article EN 2020-10-19

Accurate house price prediction is of great significance to various real estate stakeholders such as house owners, buyers, and investors. We propose a location-centered prediction framework that differs from existing work in terms of data profiling and prediction model. Regarding data profiling, we make an important observation: besides in-house features such as floor area, the location plays a critical role in house price prediction. Unfortunately, existing work either overlooked it or had a coarse-grained measurement of locations. Thereby, we define and capture...

10.1145/3501806 article EN ACM Transactions on Intelligent Systems and Technology 2022-01-05

Labeling schemes lie at the core of query processing for many XML database management systems. Designing labeling schemes for dynamic XML documents is an important problem that has received a lot of research attention. Existing dynamic labeling schemes, however, often sacrifice query performance and introduce additional labeling cost to facilitate arbitrary updates even when the documents actually seldom get updated. Since the line between static and dynamic XML documents is often blurred in practice, we believe it is important to design a labeling scheme that is compact and efficient regardless of whether the documents are frequently updated or...

10.1145/1559845.1559921 article EN 2009-06-29
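
To make the notion of a labeling scheme concrete, the following is a minimal sketch of the classical static containment (interval) labeling, which is not the scheme proposed in the paper above. The helper names `containment_labels`, `is_ancestor`, and `is_parent` and the sample document are hypothetical. Each element receives a `(start, end, level)` triple from one depth-first pass, and structural relationships are then decided from labels alone.

```python
import xml.etree.ElementTree as ET


def containment_labels(root):
    """Assign (start, end, level) to every element via one depth-first pass."""
    labels = {}
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        start = counter          # number handed out when the element is entered
        for child in node:
            visit(child, level + 1)
        counter += 1             # number handed out when the element is left
        labels[node] = (start, counter, level)

    visit(root, 0)
    return labels


def is_ancestor(a, d):
    """a is an ancestor of d iff a's interval strictly contains d's interval."""
    return a[0] < d[0] and d[1] < a[1]


def is_parent(a, d):
    """Parent-child is ancestor-descendant plus a level difference of one."""
    return is_ancestor(a, d) and a[2] + 1 == d[2]


# Hypothetical sample document.
doc = ET.fromstring("<a><b><c/></b><d/></a>")
labels = containment_labels(doc)
print(is_ancestor(labels[doc], labels[doc[0][0]]))  # a vs. c
print(is_parent(labels[doc[0]], labels[doc[0][0]]))  # b vs. c
```

Because the numbers are consecutive, inserting a new element into such a document forces later labels to be reassigned, which is the update cost that dynamic labeling schemes aim to avoid.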

As businesses and enterprises generate and exchange XML data more often, there is an increasing need for efficient processing of queries on XML data. Searching for the occurrences of a tree pattern query in an XML database is a core operation in XML query processing. Prior works demonstrate that the holistic twig pattern matching algorithm is an efficient technique to answer an XML tree pattern with parent-child (P-C) and ancestor-descendant (A-D) relationships, as it can effectively control the size of intermediate results during query processing. However, XML query languages (e.g., XPath and XQuery) define more axes...

10.1109/tkde.2010.126 article EN IEEE Transactions on Knowledge and Data Engineering 2010-08-24

In this paper, we propose a new location-aware pub/sub system, Elaps, that continuously monitors moving users subscribing to dynamic event streams from social media and E-commerce applications. Users are notified instantly when there is a matching event nearby. To the best of our knowledge, Elaps is the first to take into account continuous moving queries against dynamic event streams. Like existing works on continuous moving query processing, Elaps employs the concept of safe region to reduce communication overhead. However, unlike existing works which assume data...

10.1145/2723372.2746481 article EN 2015-05-27

Real-time urban traffic speed estimation provides significant benefits in many real-world applications. However, existing traffic information acquisition systems only obtain coarse-grained traffic information on a small number of roads but cannot acquire fine-grained traffic information on every road. To address this problem, in this paper we study the traffic speed estimation problem, which, given a budget K, identifies K roads (called seeds) where the real traffic speeds on these seeds can be obtained using crowdsourcing, and infers the speeds of other roads (called non-seed roads) based on the speeds of these seeds. This problem includes two...

10.1109/icde.2016.7498298 article EN 2016-05-01

With the proliferation of mobile devices, large collections of geospatial data are becoming available, such as geo-tagged photos. Map rendering systems play an important role in presenting such large geospatial datasets to end users. We propose that such systems should support the following desirable features: representativeness, visibility constraint, zooming consistency, and panning consistency. The first two constraints are fundamental challenges to a map exploration system, which aims to efficiently select a small set of representative objects...

10.1145/3183713.3183738 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25

In this work, we propose a robust road network representation learning framework called Toast, which serves as a cornerstone to boost the performance of numerous demanding transport planning tasks. Specifically, we first propose a traffic context aware skip-gram module to incorporate auxiliary tasks of predicting the traffic context of a target road segment. Furthermore, we propose a trajectory-enhanced Transformer module that utilizes trajectory data to extract traveling semantics on road networks. Apart from obtaining effective road segment representations, this module also enables us...

10.1145/3459637.3482293 article EN 2021-10-26

In this paper, we study how to jointly predict travel demands and traffic flows for all regions of a city at a future time interval. From an empirical analysis of traffic data, we outline three desired properties, namely region-level correlations, temporal periodicity and inter-traffic correlations. Then, we propose a comprehensive neural network based traffic prediction model, where various effective embeddings or encodings are designed to capture the aforementioned properties. First, we design region embeddings to capture two forms of region-level correlations:...

10.1109/icde51399.2021.00037 article EN 2021 IEEE 37th International Conference on Data Engineering (ICDE) 2021-04-01

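Inspired by the great success of information retrieval (IR) style keyword search on the web, keyword search on XML has emerged recently. The difference between a text database and an XML database results in three new challenges: 1) Identify the user's search intention, i.e., identify the XML node types that the user wants to search for and search via. 2) Resolve keyword ambiguity problems: a keyword can appear as both a tag name and a text value of some node; a keyword can appear as the text values of different XML node types and carry different meanings; a keyword can appear as the tag name of different XML node types with different meanings. 3) As the search results are subtrees of the XML document, a new scoring function is needed to estimate a result's relevance to a given query. However,...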

10.1109/tkde.2010.63 article EN IEEE Transactions on Knowledge and Data Engineering 2010-04-28

GPS enables mobile devices to continuously provide new opportunities to improve our daily lives. For example, the data collected in applications created by Uber or Public Transport Authorities can be used to plan transportation routes, estimate capacities, and proactively identify low coverage areas. In this paper, we study a new kind of query, Reverse k Nearest Neighbor Search over Trajectories (RkNNT), which can be used for route planning and capacity estimation. Given a set of existing routes D...

10.1109/tkde.2017.2776268 article EN IEEE Transactions on Knowledge and Data Engineering 2017-11-22

Triangle count is a critical parameter in mining relationships among people in social networks. However, directly publishing the findings obtained from triangle counts may raise potential privacy concerns, which brings great challenges and opportunities for privacy-preserving triangle counting. In this paper, we choose to use differential privacy to protect triangle counting for large scale graphs. To reduce the large sensitivity caused by large graphs, we propose a novel graph projection method that can be used to obtain an upper bound on sensitivity in different...

10.1109/tkde.2021.3052827 article EN IEEE Transactions on Knowledge and Data Engineering 2022-10-06

Although many updatable learned indexes have been proposed in recent years, whether they can outperform traditional approaches on disk remains unknown. In this study, we revisit and implement four state-of-the-art updatable learned indexes on disk, and compare them against the B+-tree under a wide range of settings. Through our evaluation, we make some key observations: 1) Overall, the B+-tree performs well across a range of workload types and datasets. 2) A learned index could outperform the B+-tree or other learned indexes on disk for a specific workload. For example, PGM achieves the best performance...

10.1145/3589284 article EN Proceedings of the ACM on Management of Data 2023-06-13
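
For readers unfamiliar with learned indexes, the core idea can be sketched as a model that predicts a key's position in a sorted array, followed by a search bounded by the model's worst-case error. The toy class below is an in-memory illustration under stated assumptions, not PGM or any of the on-disk indexes evaluated in the study above: the class name `LinearLearnedIndex` is hypothetical, it fits a single straight line, and it assumes a non-empty set of unique numeric keys with no updates.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: one linear model plus an error-bounded local search."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        lo, hi = self.keys[0], self.keys[-1]
        # Straight line through the smallest and largest key (assumed model).
        self.slope = (n - 1) / (hi - lo) if hi != lo else 0.0
        self.intercept = -self.slope * lo
        # Worst prediction error over the stored keys bounds every lookup window.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        """Predicted array position for key, clamped to the valid range."""
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


idx = LinearLearnedIndex([3, 8, 15, 16, 42, 100, 101, 250])
print(idx.lookup(42), idx.lookup(7))
```

On skewed key distributions a single line gives a large error window, which is one reason practical designs use many piecewise models; how each design behaves on disk is the question the study evaluates.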