- Advanced Database Systems and Queries
- Data Management and Algorithms
- Web Data Mining and Analysis
- Semantic Web and Ontologies
- Data Mining Algorithms and Applications
- Service-Oriented Architecture and Web Services
- Business Process Modeling and Analysis
- Cloud Computing and Resource Management
- Distributed and Parallel Computing Systems
- BIM and Construction Integration
- Face and Expression Recognition
- Data Quality and Management
- Distributed systems and fault tolerance
- Usability and User Interface Design
- Scientific Computing and Data Management
- Multimedia Communication and Technology
- Web Applications and Data Management
- Mobile Agent-Based Network Management
- Context-Aware Activity Recognition Systems
- Neural Networks and Applications
- Algorithms and Data Compression
- Multi-Agent Systems and Negotiation
- Web visibility and informetrics
- Advanced Computational Techniques and Applications
- Sparse and Compressive Sensing Techniques
University of Hong Kong
1999-2014
Hong Kong University of Science and Technology
2004-2014
University of Science and Technology
2004
University of Toronto
1980-1994
Systems Research Institute
1991
Esri (Canada)
1986-1990
Understanding the intent behind a user's query can help a search engine automatically route the query to corresponding vertical search engines to obtain particularly relevant content, thus greatly improving user satisfaction. There are three major challenges in the query intent classification problem: (1) intent representation; (2) domain coverage; and (3) semantic interpretation. Current approaches to predicting intent mainly utilize machine learning techniques. However, it is difficult, and often requires much human effort, to meet all these challenges by...
Many tools have been developed to help users query, extract and integrate data from web pages generated dynamically from databases, i.e., the Hidden Web. A key prerequisite for such tools is to obtain the schema of the attributes of the retrieved data. In this paper, we describe a system called DeLa, which reconstructs (part of) a "hidden" back-end web database. It does this by sending queries through HTML forms, automatically generating regular expression wrappers to extract data objects from the result pages, and restoring the retrieved data into an annotated (labelled) table. The...
Feature selection is an important component of text categorization. This technique can both increase a classifier's computation speed and reduce the overfitting problem. Several feature selection methods, such as information gain and mutual information, have been widely used. Although they greatly improve the classifier's performance, they share a common drawback, which is that they do not consider the relationships among features. In this situation, where one feature's predictive power is weakened by others, the selected features tend to bias towards...
Online databases respond to a user query with result records encoded in HTML files. Data extraction, which is important for many applications, extracts the records from the HTML files automatically. We present a novel data extraction method, ODE (Ontology-assisted Data Extraction), which automatically extracts the query result records from the HTML pages. ODE first constructs an ontology for a domain according to information matching between the query interfaces and query result pages from different Web sites within the same domain. Then, the constructed domain ontology is used during data extraction to identify the query result section in a query result page and to align and label the data values in the extracted...
Record matching, which identifies the records that represent the same real-world entity, is an important step for data integration. Most state-of-the-art record matching methods are supervised, which requires the user to provide training data. These methods are not applicable to the Web database scenario, where the records to match are query results dynamically generated on-the-fly. Such records are query-dependent, and a prelearned method using training examples from previous query results may fail on the results of a new query. To address the problem of record matching in the Web database scenario, we present an unsupervised, online record matching method,...
In this paper, regularization path algorithms were proposed as a novel approach to the model selection problem by exploring the path of possibly all solutions with respect to some hyperparameter in an efficient way. This approach was later extended to a support vector regression (SVR) model called ε-SVR. However, the method requires that the error parameter be set a priori. This is only...
Web databases generate query result pages based on a user's query. Automatically extracting the data from these query result pages is very important for many applications, such as data integration, which need to cooperate with multiple web databases. We present a novel data extraction and alignment method called CTVS that combines both tag and value similarity. CTVS automatically extracts data from query result pages by first identifying and segmenting the query result records (QRRs) in the query result pages and then aligning the segmented QRRs into a table, in which the data values from the same attribute are put into the same column. Specifically,...
Although the concept of roles is becoming a popular research issue in object-oriented databases and has been proven to be useful for dynamic and evolving applications, it has only been described conceptually in most of the previous work. Moreover, important issues such as the semantics of roles (e.g., message passing) are seldom discussed. Furthermore, none of the previous work has investigated the idea of role player qualification, which models the fact that not every object is qualified to play a particular role. We present a data model for roles. We discuss each of the above...
We propose a novel algorithm, DSE (data-rich subtree extraction), to recognize and extract the data-rich section of an HTML page. We apply the DSE algorithm as a pre-processing "clean-up" step for two typical Web information retrieval problems: topic distillation and data extraction. Our experiments show that, for the test data sets used, the DSE algorithm can correctly identify the data-rich sections of HTML pages with 100% accuracy. Therefore, it can effectively reduce the root set size for the topic distillation problem, thereby improving the precision and accuracy of the IETS algorithm. Furthermore, when...
ADOME, which stands for ADvanced Object Modeling Environment, is an approach to integrating data and knowledge management based on object-oriented technology. Next-generation information systems will require more flexible data modeling capabilities than those provided by current object-oriented DBMSs. In particular, the integration of data and knowledge management will become increasingly important. In this context, ADOME provides versatile role facilities that serve as "dynamic binders" between data objects and production rules, thereby facilitating...
To improve the effectiveness of office workers in their decision making, office systems have been built to support (rather than replace) their judgment. However, these systems model office work in a centralized environment, and/or they can only support a single office worker. Office work that is divided into specialized domains handled by different office workers (where cooperation is needed in order to accomplish the work) is not supported. In this paper, we will present a model that supports office problem solving in a logically distributed environment. (In some systems, information...
The choice of the kernel function, which determines the mapping between the input space and the feature space, is of crucial importance to kernel methods. The past few years have seen many efforts in learning either the kernel function or the kernel matrix. In this paper, we address this model selection issue by learning the hyperparameter of the kernel function for a support vector machine (SVM). We trace the solution path with respect to the kernel hyperparameter without having to train the model multiple times. Given a kernel hyperparameter value and the optimal solution obtained for that value, we find that the solutions of the neighborhood hyperparameters can be calculated exactly. However, the solution path does not...
User performance as well as system-related considerations should be part of the DBMS selection process. However, appropriate procedures and measures for determining the user performance characteristics of a DBMS are lacking. This paper describes a methodology and proposes a set of measures for determining the user performance level of the data model and data language of a DBMS. The methodology was applied to three DBMSs having different data models and data languages, and the results of its application are discussed.
Recently, a very appealing approach was proposed to compute the entire solution path for support vector classification (SVC) with very low extra computational cost. This approach was later extended to a support vector regression (SVR) model called ε-SVR. However, the method requires that the error parameter ε be set a priori, which is only possible if the desired accuracy of the approximation can be specified in advance. In this paper, we show that the solution path for ε-SVR is also piecewise linear with respect to ε. We further propose an efficient algorithm for exploring the two-dimensional...
To deal with the problem of too many results returned from an E-commerce Web database in response to a user query, this paper proposes a novel approach to rank the query results. Based on the user query, we speculate how much the user cares about each attribute and assign a corresponding weight to it. Then, for each tuple in the query result, each attribute value is assigned a score according to its "desirableness" to the user. These attribute value scores are combined according to the attribute weights to get a final ranking score for each tuple. Tuples with the top ranking scores are presented to the user first. Our ranking method is domain independent and requires no user feedback....
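The weighted combination described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not say how attribute weights or "desirableness" scores are derived, so the function name `rank_tuples`, the weights, the scoring functions, and the sample data below are all hypothetical placeholders.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def rank_tuples(
    rows: List[Row],
    weights: Dict[str, float],
    scorers: Dict[str, Callable[[float], float]],
) -> List[Row]:
    """Order rows by the weighted sum of their per-attribute scores, best first."""

    def ranking_score(row: Row) -> float:
        # Combine each attribute value's score using that attribute's weight.
        return sum(weights[a] * scorers[a](row[a]) for a in weights)

    return sorted(rows, key=ranking_score, reverse=True)


# Hypothetical query result: three used-car listings.
cars = [
    {"id": "a", "price": 20000, "mileage": 50000},
    {"id": "b", "price": 10000, "mileage": 90000},
    {"id": "c", "price": 15000, "mileage": 20000},
]

# Assumed weights: this user is taken to care more about price than mileage.
weights = {"price": 0.7, "mileage": 0.3}

# Assumed scorers: lower values are more desirable, scaled into [0, 1].
scorers = {
    "price": lambda v: 1 - v / 20000,
    "mileage": lambda v: 1 - v / 100000,
}

ranked_ids = [row["id"] for row in rank_tuples(cars, weights, scorers)]
print(ranked_ids)  # ['c', 'b', 'a']
```

In this toy data, listing "c" ranks first because its low mileage outweighs the price advantage of "b" under the assumed weights. Changing the weights changes the order, which is the role the abstract assigns to the inferred user preferences.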
Today's office is plagued by problems of rising costs and low productivity. Office automation and office information systems are proposed as a possible solution to many of the problems of information handling in the office. The technology available in the market place is ready for office automation. However, several challenges need to be met before the remedy can be applied effectively. Automated solutions to different aspects of the office problem need to be integrated. Models and techniques need to be developed to represent and analyze the information flow in an office. Interfaces that are easy to use and that integrate the capabilities need to be developed. Finally, one...
While there are many difficulties in computerizing office tasks, two of the major ones are a lack of appropriate end-user facilities for specifying office tasks and inadequate system-level support for managing office tasks. We are investigating these issues within the Office Task Manager (OTM) project at the University of Toronto. To address the user-level aspects, we believe that a programming-by-example approach to task specification holds much promise for providing office workers with facilities to help them computerize their own activities. We outline our approach to such...