NFDI4DS | UHH-SEMS - Publication Details

Nan Tang

ORCID: 0000-0003-2832-0295

About

Contact & Profiles

A5101824160

Research Areas

Data Quality and Management
Advanced Database Systems and Queries
Data Management and Algorithms
Topic Modeling
Privacy-Preserving Technologies in Data
Data Mining Algorithms and Applications
Semantic Web and Ontologies
Scientific Computing and Data Management
Machine Learning and Data Classification
Natural Language Processing Techniques
Data Visualization and Analytics
Anomaly Detection Techniques and Applications
Web Data Mining and Analysis
Advanced Data Storage Technologies
Distributed systems and fault tolerance
Graph Theory and Algorithms
Research Data Management Practices
Software Engineering Research
Cloud Data Security Solutions
Advanced Computational Techniques and Applications
Video Analysis and Summarization
Advanced Text Analysis Techniques
Data Stream Mining Techniques
Machine Learning and Algorithms
Mobile Crowdsensing and Crowdsourcing

Hong Kong University of Science and Technology
2023-2025

University of Hong Kong
2023-2025

Chengdu University of Technology
2021-2024

Zhejiang Police College
2024

Shenzhen Children's Hospital
2024

Qatar Airways (Qatar)
2013-2023

Qatar Cardiovascular Research Center
2013-2023

South China University of Technology
2023

Hamad bin Khalifa University
2016-2022

University of Electronic Science and Technology of China
2020

Graph pattern matching

OPENALEX - Publications

Wenfei Fan Jianzhong Li Shuai Ma Nan Tang Yinghui Wu and 1 more

Graph pattern matching is typically defined in terms of subgraph isomorphism, which makes it an NP-complete problem. Moreover, it requires bijective functions, which are often too restrictive to characterize patterns in emerging applications. We propose a class of graph patterns, in which an edge denotes the connectivity in a data graph within a predefined number of hops. In addition, we define matching based on a notion of bounded simulation, an extension of graph simulation. We show that with this revision, graph pattern matching can be performed in cubic-time, by providing such...

10.14778/1920841.1920878 article EN Proceedings of the VLDB Endowment 2010-09-01
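
The abstract's idea that a pattern edge stands for connectivity within a predefined number of hops can be illustrated with a hop-bounded breadth-first search. This is only a minimal sketch of that single check, not the paper's matching algorithm; the function name `within_hops` and the adjacency-dict graph encoding are assumptions made for the example.

```python
from collections import deque


def within_hops(graph, source, target, max_hops):
    """Return True if a non-empty path of at most max_hops edges leads
    from source to target in a directed graph given as an adjacency dict.

    Hypothetical helper for illustration only.
    """
    frontier = deque([(source, 0)])
    seen = {source}
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            # Paths through this node would exceed the hop bound.
            continue
        for nxt in graph.get(node, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


# A small chain a -> b -> c -> d.
chain = {"a": ["b"], "b": ["c"], "c": ["d"]}
print(within_hops(chain, "a", "c", 2))  # True: a -> b -> c uses 2 edges
print(within_hops(chain, "a", "d", 2))  # False: d is 3 edges away
```

Breadth-first order reaches each node at its shortest distance first, so marking nodes as seen never hides a shorter path within the bound.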

NADEEF

Michele Dallachiesa Amr Ebaid Ahmed Eldawy Ahmed K. Elmagarmid Ihab F. Ilyas and 2 more

Despite the increasing importance of data quality and the rich theoretical and practical contributions in all aspects of data cleaning, there is no single end-to-end off-the-shelf solution to (semi-)automate the detection and the repairing of violations w.r.t. a set of heterogeneous and ad-hoc quality constraints. In short, there is no commodity platform similar to general purpose DBMSs that can be easily customized and deployed to solve application-specific data quality problems. In this paper, we present NADEEF, an extensible, generalized and easy-to-deploy data cleaning platform....

10.1145/2463676.2465327 article EN 2013-06-22

KATARA

Xu Chu John Morcos Ihab F. Ilyas Mourad Ouzzani Paolo Papotti and 2 more

Classical approaches to clean data have relied on using integrity constraints, statistics, or machine learning. These approaches are known to be limited in the cleaning accuracy, which can usually be improved by consulting master data and involving experts to resolve ambiguity. The advent of knowledge bases (KBs), both general-purpose and within enterprises, and crowdsourcing marketplaces are providing yet more opportunities to achieve higher accuracy at a larger scale. We propose KATARA, a knowledge base and crowd powered data cleaning system that, given a table,...

10.1145/2723372.2749431 article EN 2015-05-27

Detecting data errors

Ziawasch Abedjan Xu Chu Dong Deng Raul Castro Fernandez Ihab F. Ilyas and 4 more

Data cleaning has played a critical role in ensuring data quality for enterprise applications. Naturally, there has been extensive research in this area, and many data cleaning algorithms have been translated into tools to detect and to possibly repair certain classes of errors such as outliers, duplicates, missing values, and violations of integrity constraints. Since different types of errors may coexist in the same data set, we often need to run more than one kind of tool. In this paper, we investigate two pragmatic questions: (1) are these tools robust enough...

10.14778/2994509.2994518 article EN Proceedings of the VLDB Endowment 2016-08-01

DeepEye: Towards Automatic Data Visualization

Yuyu Luo Xuedi Qin Nan Tang Guoliang Li

Data visualization is invaluable for explaining the significance of data to people who are visually oriented. The central task of automatic data visualization is, given a dataset, to visualize its compelling stories by transforming the data (e.g., selecting attributes, grouping and binning values) and deciding the right type of visualization (e.g., bar or line charts). We present DEEPEYE, a novel system for automatic data visualization that tackles three problems: (1) Visualization recognition: given a visualization, is it "good" or "bad"? (2) Visualization ranking: given two visualizations, which one is "better"? And (3)...

10.1109/icde.2018.00019 article EN 2018 IEEE 34th International Conference on Data Engineering (ICDE) 2018-04-01

Making data visualization more efficient and effective: a survey

Xuedi Qin Yuyu Luo Nan Tang Guoliang Li

10.1007/s00778-019-00588-3 article EN The VLDB Journal 2019-11-19

Distributed representations of tuples for entity resolution

Muhammad Ebraheem Saravanan Thirumuruganathan Shafiq Joty Mourad Ouzzani Nan Tang

Entity resolution (ER) is a key data integration problem. Despite the efforts in 70+ years in all aspects of ER, there is still a high demand for democratizing ER - humans are heavily involved in labeling data, performing feature engineering, tuning parameters, and defining blocking functions. With the recent advances in deep learning, in particular distributed representation of words (a.k.a. word embeddings), we present a novel ER system, called DeepER, that achieves good accuracy, high efficiency, as well as ease-of-use...

10.14778/3236187.3236198 article EN Proceedings of the VLDB Endowment 2018-07-01

Towards certain fixes with editing rules and master data

Wenfei Fan Jianzhong Li Shuai Ma Nan Tang Wenyuan Yu

A variety of integrity constraints have been studied for data cleaning. While these constraints can detect the presence of errors, they fall short of guiding us to correct the errors. Indeed, data repairing based on these constraints may not find certain fixes that are absolutely correct, and worse, may introduce new errors when repairing the data. We propose a method for finding certain fixes, based on master data, a notion of certain regions, and a class of editing rules. A certain region is a set of attributes that are assured correct by the users. Given a certain region and master data, editing rules tell us what attributes to fix and how to update them. We show how the method can be used in data monitoring and enrichment....

10.14778/1920841.1920867 article EN Proceedings of the VLDB Endowment 2010-09-01

Adding regular expressions to graph reachability and pattern queries

Wenfei Fan Jianzhong Li Shuai Ma Nan Tang Yinghui Wu

It is increasingly common to find graphs in which edges bear different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressing the connectivity in a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible information than their traditional...

10.1109/icde.2011.5767858 article EN 2011-04-01

BigDansing

Zuhair Khayyat Ihab F. Ilyas Alekh Jindal Samuel Madden Mourad Ouzzani and 4 more

Data cleansing approaches have usually focused on detecting and fixing errors with little attention to scaling to big datasets. This presents a serious impediment since data cleansing often involves costly computations such as enumerating pairs of tuples, handling inequality joins, and dealing with user-defined functions. In this paper, we present BigDansing, a Big Data Cleansing system to tackle efficiency, scalability, and ease-of-use issues in data cleansing. The system can run on top of most common general purpose data processing platforms,...

10.1145/2723372.2747646 article EN 2015-05-27

Distributed representations of tuples for entity resolution

Muhammad Ebraheem Saravanan Thirumuruganathan Shafiq Joty Mourad Ouzzani Nan Tang

Despite the efforts in 70+ years in all aspects of entity resolution (ER), there is still a high demand for democratizing ER - by reducing the heavy human involvement in labeling data, performing feature engineering, tuning parameters, and defining blocking functions. With the recent advances in deep learning, in particular distributed representations of words (a.k.a. word embeddings), we present a novel ER system, called DeepER, that achieves good accuracy, high efficiency, as well as ease-of-use (i.e., much less human efforts). We...

10.5555/3236187.3269461 article EN Very Large Data Bases 2018-07-01

Towards certain fixes with editing rules and master data

Wenfei Fan Jianzhong Li Shuai Ma Nan Tang Wenyuan Yu

10.1007/s00778-011-0253-7 article EN The VLDB Journal 2011-10-29

Interaction between record matching and data repairing

Wenfei Fan Jianzhong Li Shuai Ma Nan Tang Wenyuan Yu

Central to a data cleaning system are record matching and data repairing. Matching aims to identify tuples that refer to the same real-world object, and repairing is to make a database consistent by fixing errors in the data by using constraints. These are treated as separate processes in current data cleaning systems, based on heuristic solutions. This paper studies a new problem, namely, the interaction between record matching and data repairing. We show that repairing can effectively help us identify matches, and vice versa. To capture the interaction, we propose a uniform framework that seamlessly unifies repairing and matching operations,...

10.1145/1989323.1989373 article EN 2011-06-12

Reinforcement Learning with Tree-LSTM for Join Order Selection

Yu Xiang Guoliang Li Chengliang Chai Nan Tang

Join order selection (JOS) - the problem of finding the optimal join order for an SQL query - is a primary focus of database query optimizers. The problem is hard due to its large solution space. Exhaustively traversing the solution space is prohibitively expensive, which is often combined with heuristic pruning. Despite decades-long effort, traditional optimizers still suffer from low scalability or low accuracy when handling complicated SQL queries. Recent attempts using deep reinforcement learning (DRL), by encoding join trees with fixed-length handtuned...

10.1109/icde48307.2020.00116 article EN 2020 IEEE 36th International Conference on Data Engineering (ICDE) 2020-04-01

Synthesizing entity matching rules by examples

Rohit Singh Venkata Vamsikrishna Meduri Ahmed K. Elmagarmid Samuel Madden Paolo Papotti and 3 more

Entity matching (EM) is a critical part of data integration. We study how to synthesize entity matching rules from positive-negative matching examples. The core of our solution is program synthesis, a powerful tool to automatically generate rules (or programs) that satisfy a given high-level specification, via a predefined grammar. This grammar describes a General Boolean Formula (GBF) that can include arbitrary attribute matching predicates combined by conjunctions (∧), disjunctions (∨) and negations (¬), and is expressive enough to model EM problems,...

10.14778/3149193.3149199 article EN Proceedings of the VLDB Endowment 2017-10-01

Towards dependable data repairing with fixing rules

Jiannan Wang Nan Tang

Authors: Jiannan Wang (UC Berkeley, CA, USA) and Nan Tang (Qatar Computing Research Institute (QCRI), Doha, Qatar). SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, June 2014, pages 457–468. https://doi.org/10.1145/2588555.2610494. Published online 18 June 2014.

10.1145/2588555.2610494 article EN 2014-06-18

Graph Stream Summarization

Nan Tang Qing Chen Prasenjit Mitra

A graph stream, which refers to the graph with edges being updated sequentially in a form of a stream, has important applications in cyber security and social networks. Due to the sheer volume and highly dynamic nature of graph streams, the practical way of handling them is by summarization. Given a graph stream G, directed or undirected, the problem of graph stream summarization is to summarize G as SG with a much smaller (sublinear) space, linear construction time and constant maintenance cost for each edge update, such that SG allows many queries over G to be approximately conducted...

10.1145/2882903.2915223 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-14
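
The requirements stated in the abstract (a small summary, constant cost per edge update, approximate queries) can be illustrated with a simple hash-based summary that maps nodes to a fixed number of buckets and aggregates edge weights between buckets. The abstract is truncated, so this sketch is an assumption-laden illustration and should not be read as the paper's construction; the class name `GraphSketch`, the bucket count, and the use of SHA-256 for bucketing are all choices made for the example.

```python
import hashlib


class GraphSketch:
    """Illustrative summary of a weighted, directed graph stream.

    Nodes are hashed into num_buckets buckets, and edge weights are
    aggregated in a num_buckets x num_buckets matrix. Hypothetical
    design for illustration only.
    """

    def __init__(self, num_buckets):
        self.num_buckets = num_buckets
        self.weights = [[0] * num_buckets for _ in range(num_buckets)]

    def _bucket(self, node):
        # A stable hash, so results do not depend on Python's per-process
        # string hash randomization.
        digest = hashlib.sha256(str(node).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def add_edge(self, src, dst, weight=1):
        """Process one stream element in constant time."""
        self.weights[self._bucket(src)][self._bucket(dst)] += weight

    def edge_weight(self, src, dst):
        """Estimate the total weight of edge (src, dst).

        With non-negative weights the estimate never falls below the true
        value; bucket collisions can only inflate it.
        """
        return self.weights[self._bucket(src)][self._bucket(dst)]


sketch = GraphSketch(num_buckets=16)
for src, dst in [("u", "v"), ("u", "v"), ("v", "w")]:
    sketch.add_edge(src, dst)
print(sketch.edge_weight("u", "v"))  # at least 2; larger only on collisions
```

The summary size depends only on the number of buckets, not on the length of the stream, which is the trade-off between space and accuracy that the abstract describes.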

Raha

Mohammad Mahdavi Ziawasch Abedjan Raul Castro Fernandez Samuel Madden Mourad Ouzzani and 2 more

Detecting erroneous values is a key step in data cleaning. Error detection algorithms usually require a user to provide input configurations in the form of rules or statistical parameters. However, providing a complete, yet correct, set of configurations for each new dataset is not trivial, as the user has to know about both the dataset and the error detection algorithms upfront. In this paper, we present Raha, a new configuration-free error detection system. By generating a limited number of configurations for error detection algorithms that cover various types of data errors, we can generate an expressive feature vector for each tuple value. Leveraging...

10.1145/3299869.3324956 article EN Proceedings of the 2019 International Conference on Management of Data 2019-06-18

Natural Language to Visualization by Neural Machine Translation

Yuyu Luo Nan Tang Guoliang Li Jiawei Tang Chengliang Chai and 1 more

Supporting the translation from natural language (NL) query to visualization (NL2VIS) can simplify the creation of data visualizations because if successful, anyone can generate visualizations by their natural language from the tabular data. The state-of-the-art NL2VIS approaches (e.g., NL4DV and FlowSense) are based on semantic parsers and heuristic algorithms, which are not end-to-end and are not designed for supporting (possibly) complex data transformations. Deep neural network powered neural machine translation models have made great strides in many machine translation tasks, which suggests that they...

10.1109/tvcg.2021.3114848 article EN IEEE Transactions on Visualization and Computer Graphics 2021-11-16

Deep learning analysis for rapid detection and classification of household plastics based on Raman spectroscopy

Yazhou Qin Jiaxin Qiu Nan Tang Yingsheng He Fan Li

10.1016/j.saa.2024.123854 article EN Spectrochimica Acta Part A Molecular and Biomolecular Spectroscopy 2024-01-09

Seeping Semantics: Linking Datasets Using Word Embeddings for Data Discovery

Raul Castro Fernandez Essam Mansour Abdulhakim Qahtan Ahmed K. Elmagarmid Ihab F. Ilyas and 4 more

Employees that spend more time finding relevant data than analyzing it suffer from a data discovery problem. The large volume of data in enterprises, and sometimes the lack of knowledge of the schemas aggravates this problem. Similar to how we navigate the Web, we propose to identify semantic links that assist analysts in their discovery tasks. These links relate tables to each other, to facilitate navigating the schemas. They also relate data to external data sources, such as ontologies and dictionaries, to help explain the schema meaning. We materialize the links in an enterprise knowledge graph, where they become...

10.1109/icde.2018.00093 article EN 2018 IEEE 34th International Conference on Data Engineering (ICDE) 2018-04-01

Interactive and Deterministic Data Cleaning

Jian He Enzo Veltri Donatello Santoro Guoliang Li Giansalvatore Mecca and 2 more

We present Falcon, an interactive, deterministic, and declarative data cleaning system, which uses SQL update queries as the language to repair data. Falcon does not rely on the existence of a set of pre-defined data quality rules. On the contrary, it encourages users to explore the data, identify possible problems, and make updates to fix them. Bootstrapped by one user update, Falcon guesses a set of possible SQL update queries that can be used to repair the data. The main technical challenge addressed in this paper consists in finding a set of SQL update queries that is minimal in size and at the same time fixes the largest...

10.1145/2882903.2915242 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-14

Querying shortest paths on time dependent road networks

Yong Wang Guoliang Li Nan Tang

For real-world time dependent road networks (TDRNs), answering shortest path-based route queries and plans in real-time is highly desirable by many industrial applications. Unfortunately, traditional (Dijkstra- or A*-like) algorithms are computationally expensive for such tasks on TDRNs. Naturally, indexes are needed to meet the real-time constraint required by real applications. In this paper, we propose a novel height-balanced tree-structured index, called TD-G-tree, which supports fast route queries over TDRNs. The key idea is to use...

10.14778/3342263.3342265 article EN Proceedings of the VLDB Endowment 2019-07-01

DeepEye

Yuyu Luo Xuedi Qin Nan Tang Guoliang Li Xinran Wang

Creating good visualizations for ordinary users is hard, even with the help of the state-of-the-art interactive data visualization tools, such as Tableau, Qlik, because they require the users to understand the data and visualizations very well. DeepEye is an innovative visualization system that aims at helping everyone create good visualizations simply like a Google search. Given a dataset and a keyword query, DeepEye understands the query intent, generates and ranks good visualizations. The user can pick the one she likes and do a further faceted navigation to easily navigate the candidate visualizations. In this...

10.1145/3183713.3193545 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25
