Using neighbors to date web documents
Sample (material)
Feature (linguistics)
DOI:
10.1145/1316902.1316924
Publication Date:
2007-11-16T10:58:50Z
AUTHORS (3)
ABSTRACT
Time has been successfully used as a feature in web information retrieval tasks. In this context, estimating a document's inception date or last update date is a necessary task. Classic approaches have used HTTP header fields to estimate a document's last update time. The main problem with this approach is that it is applicable to only a small part of web documents. In this work, we evaluate an alternative strategy based on a document's neighborhood. Using a random sample containing 10,000 URLs from the Yahoo! Directory, we study each document's links and media assets to determine its age. If we only consider isolated documents, we are able to date 52% of them. Including the document's neighborhood, we are able to date more than 86% of the same sample. Also, we find that estimates differ significantly according to the type of neighbors used. The most reliable estimates are based on the document's media assets, while the worst estimates are based on incoming links. These results are experimentally evaluated with a real world application using different datasets.
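The strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `neighbors` dictionary layout, the use of a low median to combine neighbor dates, and the position of outgoing links in the priority order are all assumptions made for illustration. Only the ranking of media assets as most reliable and incoming links as least reliable comes from the abstract. Header lookup assumes the key is spelled exactly `Last-Modified`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from statistics import median_low
from typing import Mapping, Optional, Sequence, Tuple

Headers = Mapping[str, str]

# Assumed preference order: the abstract ranks assets best and incoming
# links worst; placing outgoing links in between is a guess.
NEIGHBOR_PRIORITY = ("assets", "outlinks", "inlinks")


def header_date(headers: Headers) -> Optional[datetime]:
    """Parse the Last-Modified header, returning None if absent or invalid."""
    value = headers.get("Last-Modified")
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None
    if parsed.tzinfo is None:
        # Treat zone-less dates as UTC so all dates stay comparable.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def estimate_date(
    headers: Headers,
    neighbors: Mapping[str, Sequence[Headers]],
) -> Tuple[Optional[datetime], str]:
    """Date a document from its own header, else from its neighborhood."""
    own = header_date(headers)
    if own is not None:
        return own, "self"
    for kind in NEIGHBOR_PRIORITY:
        candidates = (header_date(h) for h in neighbors.get(kind, ()))
        dates = [d for d in candidates if d is not None]
        if dates:
            # Hypothetical aggregation: the low median of the neighbor dates.
            return median_low(dates), kind
    return None, "unknown"
```

A document with no usable header of its own but with a dated image would be dated from that asset, and the second element of the returned tuple records which source was used so that less reliable estimates can be treated differently downstream.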
REFERENCES (18)
CITATIONS (13)