Digital Library

Title:      VFX: A VISION-BASED APPROACH TO FORUM DATA EXTRACTION
Author(s):      Chen Hui Ng, Choon Jin Ng and Tong Ming Lim
ISBN:      978-989-8533-90-6
Editors:      Piet Kommers and Guo Chao Peng
Year:      2019
Edition:      Single
Keywords:      Forum Data Extraction, Forum Visual Cues, Forum Layout Structure, Vision-Based Extraction
Type:      Full Paper
First Page:      317
Last Page:      324
Language:      English
Paper Abstract:      Rapid development of the Internet has dramatically increased the information available on the World Wide Web. Amongst these vast sources of information, discussion forums may be useful for businesses and organizations to get a glimpse of customer opinions or to extract product information. Little existing work reported in the literature has systematically investigated the problem of extracting user posts from forum sites. Extracting forum posts accurately raises a few challenges. First, forums come in a variety of templates, which makes it hard to formalize general rules for extracting forum posts. Second, post records can look quite different from one another, which introduces inconsistencies in the Document Object Model (DOM) that complicate comparisons. Third, each post in a forum can consist of complicated subtrees rather than a single node in the DOM tree. To tackle these challenges, a vision-based approach was introduced to automatically extract posts from a web forum page based on its visual cues. In this paper, we propose a vision-based forum extraction (VFX) algorithm that can extract user posts from any type of forum without the need to inspect its template structure in advance.
   
