- Web Data Mining and Analysis
- Advanced Malware Detection Techniques
- Caching and Content Delivery
- Web Application Security Vulnerabilities
- Surface Chemistry and Catalysis
- Mobile and Web Applications
- Advanced Database Systems and Queries
- Algorithms and Data Compression
- Software Testing and Debugging Techniques
- Service-Oriented Architecture and Web Services
- Mobile Agent-Based Network Management
- Software Engineering Research
- Peer-to-Peer Network Technologies
- Web Applications and Data Management
- Spam and Phishing Detection
- Semantic Web and Ontologies
- Data Quality and Management
- Computability, Logic, AI Algorithms
- Web Visibility and Informetrics
- Scientific Computing and Data Management
- Business Process Modeling and Analysis
- Advanced Image and Video Retrieval Techniques
- Data Management and Algorithms
- Distributed and Parallel Computing Systems
- Industrial Automation and Control Systems
Universidade da Coruña
2004-2015
Semi-automatic wrapper generation tools aim to ease the task of building structured views over Web sources. However, the techniques presented to date show several weaknesses when dealing with today's complex commercial Web sources, especially when constructing the advanced navigational sequences needed to access the data. We present Wargo, a semi-automatic wrapper generation tool that non-programmer staff have used to successfully wrap more than 700 Web sources in industrial applications.
Today's crawler engines cannot reach most of the information contained in the Web. A great amount of valuable information is "hidden" behind the query forms of online databases and/or is dynamically generated by technologies such as JavaScript. This portion of the Web is usually known as the Deep Web or the Hidden Web. We have built DeepBot, a prototype hidden-web focused crawler able to access such content. DeepBot receives a set of domain definitions as input, each one describing a specific data-collecting task, and automatically identifies and learns to execute...
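The abstract does not say how a domain definition is represented or matched against query forms. As a minimal sketch, assuming a domain definition is simply a list of attributes with optional aliases, a crawler could score a candidate form by the fraction of attributes that some form field label matches. The names `DomainAttribute`, `DomainDefinition`, `score_form`, `is_relevant`, and the 0.5 threshold are all hypothetical and are not DeepBot's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class DomainAttribute:
    """One attribute of a hypothetical domain definition, e.g. 'title'."""
    name: str
    aliases: list[str] = field(default_factory=list)

    def matches(self, label: str) -> bool:
        # Case-insensitive comparison of a form field label with the name or any alias.
        text = label.strip().lower()
        return text == self.name.lower() or text in (a.lower() for a in self.aliases)


@dataclass
class DomainDefinition:
    """Hypothetical data-collecting task: a named list of attributes."""
    name: str
    attributes: list[DomainAttribute]

    def score_form(self, field_labels: list[str]) -> float:
        """Fraction of domain attributes matched by at least one form field label."""
        if not self.attributes:
            return 0.0
        hits = sum(
            1 for attr in self.attributes
            if any(attr.matches(label) for label in field_labels)
        )
        return hits / len(self.attributes)


def is_relevant(domain: DomainDefinition, field_labels: list[str], threshold: float = 0.5) -> bool:
    # Assumed rule: a form is relevant if it covers at least `threshold` of the attributes.
    return domain.score_form(field_labels) >= threshold


books = DomainDefinition("books", [
    DomainAttribute("title"),
    DomainAttribute("author", ["writer"]),
    DomainAttribute("isbn"),
])
print(books.score_form(["Title", "Writer", "Price"]))  # 2 of 3 attributes matched
```

A real system would also need fuzzier label matching and a way to fill and submit the form; the sketch only illustrates the relevance test.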
In recent years, significant attention has been paid to the problem of building wrappers for extracting data from semistructured Web sources. Nevertheless, since the sources are autonomous, they may experience changes that invalidate the current wrappers. In this paper, we present new heuristics and algorithms to address automatic wrapper maintenance. Our approach is based on collecting query results during wrapper operation and later using them to generate sets of examples that can be used to induce a new wrapper when the source changes.
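The idea of reusing stored query results as training examples can be sketched as follows. This is a simplified illustration, not the authors' heuristics: it assumes stored values reappear verbatim as whole text nodes in the changed page and that no two attributes share a value. `TextCollector` and `label_examples` are hypothetical names.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-empty text nodes of a page, in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def label_examples(stored_results, new_html):
    """Label text nodes of the changed page that match previously stored values.

    stored_results: list of {attribute: value} dicts collected while the old
    wrapper still worked. Returns (attribute, node_index, text) triples that an
    induction step could use as labeled examples.
    """
    collector = TextCollector()
    collector.feed(new_html)
    collector.close()  # flush any buffered text
    # Map each stored value to its attribute (assumes values are unambiguous).
    value_to_attr = {}
    for record in stored_results:
        for attr, value in record.items():
            value_to_attr[value.strip()] = attr
    examples = []
    for i, text in enumerate(collector.texts):
        attr = value_to_attr.get(text)
        if attr is not None:
            examples.append((attr, i, text))
    return examples


stored = [{"title": "Dune", "author": "Frank Herbert"}]
new_page = "<div><h2>Dune</h2><span>by</span><em>Frank Herbert</em></div>"
print(label_examples(stored, new_page))
```

Here the title and author are found in the new page layout, which gives an induction algorithm positive examples even though the page's markup has changed.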
The problem of data extraction from the deep Web can be divided into two tasks: crawling the client-side deep Web and crawling the server-side deep Web. The objective is to define an architecture and a set of related techniques to access the information placed in the client-side deep Web. This involves dealing with aspects such as JavaScript technology, nonstandard session maintenance mechanisms, client redirections, pop-up menus, etc. We use current browser APIs as building blocks and leverage them to implement novel models and algorithms.
A substantial subset of Web data follows some kind of underlying structure. Nevertheless, HTML does not contain any schema or semantic information about the data it represents. A program able to provide software applications with a structured view of those semi-structured sources is usually called a wrapper. Wrappers are able to accept a query against the source and return a set of results, thus enabling access to the source in a manner similar to access to databases. A significant problem with this approach arises because the sources may experience changes...
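To make the notion of a wrapper concrete, here is a minimal sketch, assuming the source publishes its data as an HTML table whose header row uses `<th>` cells. The class `TableWrapper` and the function `query` are hypothetical; real wrappers handle far less regular markup and navigation.

```python
from html.parser import HTMLParser


class TableWrapper(HTMLParser):
    """Turns HTML table rows into records keyed by the <th> header cells."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.records = []
        self._row = None
        self._cell = None
        self._row_has_th = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._row_has_th = False
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            if tag == "th":
                self._row_has_th = True

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row_has_th and not self.headers:
                self.headers = self._row  # first header row defines the schema
            elif self._row:
                self.records.append(dict(zip(self.headers, self._row)))
            self._row = None


def query(html, **conditions):
    """Return the records whose attributes equal all the given conditions."""
    wrapper = TableWrapper()
    wrapper.feed(html)
    wrapper.close()
    return [r for r in wrapper.records
            if all(r.get(k) == v for k, v in conditions.items())]


page = ("<table><tr><th>title</th><th>author</th></tr>"
        "<tr><td>Dune</td><td>Frank Herbert</td></tr>"
        "<tr><td>Emma</td><td>Jane Austen</td></tr></table>")
print(query(page, author="Jane Austen"))
```

The query interface is what lets client applications treat the Web source like a database table. It is also why a change in the page's markup silently breaks the wrapper.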
In order to let software programs gain full benefit from semi-structured Web sources, a wrapper must be built to provide a "machine-readable" view over them. A significant problem with this approach is that, since the sources are autonomous, they may experience changes that invalidate the current wrapper. In this paper, we address this problem by introducing novel heuristics and algorithms for automatically maintaining wrappers. In our approach, the system collects some query results during normal operation and, when the source...
Web automation applications are widely used for different purposes such as B2B integration, Web mashups, automated testing of Web applications, Internet metasearch, or technology and business watch. One crucial requirement of intensive Web automation applications that need real-time responses is executing their navigation sequences in the shortest possible time. The approach of building the automatic navigation component on top of the APIs of conventional browsers, followed by most current systems, is not appropriate in this scenario, because it presents performance...