Technical Papers Parallel Session-V: A framework for focused linked data crawler using context graphs

Abstract/Description

In this paper, we propose a framework for a focused Linked Data (LD) crawler based on context graphs. A focused crawler searches for a specific subset of the web; in our case, it targets interlinked RDF data stores. The proposed crawler constructs a set of context graphs for the given seed URIs by back-crawling the web, and classifiers are trained to detect and assign documents to different categories based on content type. These classifiers help the crawler search and update the context graphs automatically. The crawler is trained using supervised learning. Additionally, an extensive overview of existing LD crawlers is provided, along with their basic requirements, architecture, issues, and challenges.
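
The abstract does not spell out how a context graph is built, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a layered context graph: layer 0 holds the seed URI, and layer i holds URIs that link to some URI in layer i-1. An in-memory `backlinks` dictionary stands in for real back-crawling, and the function name and sample URIs are hypothetical.

```python
def build_context_graph(seed, backlinks, max_depth):
    """Return {layer_index: set of URIs}; layer 0 holds only the seed.

    `backlinks` maps a URI to the URIs that link to it. It is an
    assumed stand-in for whatever service answers "who links here?".
    """
    layers = {0: {seed}}
    seen = {seed}          # each URI is placed in its nearest layer only
    frontier = [seed]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for uri in frontier:
            for parent in backlinks.get(uri, ()):
                if parent not in seen:
                    seen.add(parent)
                    next_frontier.append(parent)
        if not next_frontier:
            break          # no further in-links known; stop early
        layers[depth] = set(next_frontier)
        frontier = next_frontier
    return layers


# Hypothetical toy link structure, for illustration only.
sample_backlinks = {
    "ex:seed": ["ex:a", "ex:b"],
    "ex:a": ["ex:c"],
    "ex:b": ["ex:c", "ex:seed"],
}
graph = build_context_graph("ex:seed", sample_backlinks, max_depth=2)
```

In a setup like this, the documents in each layer could serve as labelled training examples for a per-layer classifier, so that the crawler can estimate how many links away a newly fetched document is from a target.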

Location

C-10, AMAN CED

Session Theme

Technical Papers Parallel Session-V (Information Retrieval)

Session Type

Parallel Technical Session

Start Date

13-12-2015 4:10 PM

End Date

13-12-2015 4:30 PM
