- Web Data Mining and Analysis
- Caching and Content Delivery
- Big Data and Business Intelligence
- Peer-to-Peer Network Technologies
- Topic Modeling
- Data Quality and Management
- Online Learning and Analytics
- Advanced Data Processing Techniques
- Competitive and Knowledge Intelligence
- Smart Agriculture and AI
- Natural Language Processing Techniques
- Air Quality Monitoring and Forecasting
University of Belgrade
2013-2019
In this paper, we present the development and characteristics of a specialized web-scale forum crawler. The main idea is to crawl relevant content from the web with minimal server resource consumption and to organize the crawled content into logical units, in order to make it easier for further processing and analysis. Forum posts contain information of interest. Although forums have different designs and are built on different technologies, they always have the same navigation logic, which connects the homepage to particular posts through lists of threads...
In this paper we present a Structure-driven Incremental Forum crawler (SInFo) that targets the latest content in crawling cycles. On a Web forum, user-generated content is almost never changed or deleted, but it is constantly added. There is a wide spectrum of forum technologies that have different content representations and navigational paths that lead to the content. Targeting the latest content is not a trivial task, since adding some new content often results in shifting the old content between pages. Ignoring the way the content is distributed and sorted can lead to repetitive visits to pages with the same...
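The page-shifting problem described in the abstract above can be illustrated with a minimal sketch. It is not the SInFo algorithm, whose details are not given here. It assumes a simulated forum that lists posts newest-first, integer post IDs, and a fixed page size. The names `make_fetcher`, `crawl_cycle`, and `PAGE_SIZE` are hypothetical. The crawler walks pages from the newest end and stops at the first post it has already seen, so old pages are not revisited even though new posts have shifted old ones to later pages.

```python
from typing import Callable, Iterable, List, Set, Tuple

PAGE_SIZE = 3  # hypothetical number of posts shown per page


def make_fetcher(post_ids: Iterable[int]) -> Callable[[int], List[int]]:
    """Simulate a forum that lists posts newest-first, PAGE_SIZE per page."""
    newest_first = sorted(post_ids, reverse=True)

    def fetch_page(page: int) -> List[int]:
        start = page * PAGE_SIZE
        return newest_first[start:start + PAGE_SIZE]

    return fetch_page


def crawl_cycle(fetch_page: Callable[[int], List[int]],
                seen: Set[int]) -> Tuple[List[int], int]:
    """Collect unseen posts, stopping at the first already-seen post.

    Returns the new post IDs and the number of page fetches made.
    """
    new_posts: List[int] = []
    pages_visited = 0
    page = 0
    while True:
        posts = fetch_page(page)
        pages_visited += 1
        if not posts:  # ran past the last page
            break
        for post_id in posts:
            if post_id in seen:
                # Everything older was crawled in an earlier cycle.
                seen.update(new_posts)
                return new_posts, pages_visited
            new_posts.append(post_id)
        page += 1
    seen.update(new_posts)
    return new_posts, pages_visited


if __name__ == "__main__":
    seen: Set[int] = set()
    # First cycle: nothing is known yet, so every page is visited.
    print(crawl_cycle(make_fetcher(range(1, 8)), seen))
    # Two posts are added; old posts shift, but one page fetch suffices.
    print(crawl_cycle(make_fetcher(range(1, 10)), seen))
```

Under these assumptions the second cycle fetches a single page, whereas a crawler that ignored the sort order would re-download every page whose contents had shifted.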