Title:
|
VFX: A VISION-BASED APPROACH TO FORUM DATA
EXTRACTION |
Author(s):
|
Chen Hui Ng, Choon Jin Ng and Tong Ming Lim |
ISBN:
|
978-989-8533-90-6 |
Editors:
|
Piet Kommers and Guo Chao Peng |
Year:
|
2019 |
Edition:
|
Single |
Keywords:
|
Forum Data Extraction, Forum Visual Cues, Forum Layout Structure, Vision-Based Extraction |
Type:
|
Full Paper |
First Page:
|
317 |
Last Page:
|
324 |
Language:
|
English |
Paper Abstract:
|
The rapid development of the Internet has dramatically increased the amount of information available on the World Wide Web.
Among these vast sources of information, discussion forums can help businesses and organizations get a glimpse of
customer opinions or extract product information. Little existing work reported in the literature has systematically
investigated the problem of extracting user posts from forum sites. Extracting forum posts accurately raises three
challenges. First, forums come in a variety of templates, which makes it hard to formalize general rules for extracting
posts. Second, post records may differ noticeably from one another, which introduces inconsistency when their
Document Object Model (DOM) structures are compared. Third, each post in a forum can consist of complicated subtrees rather
than a single node in the DOM tree. To tackle these challenges, a vision-based approach was introduced to automatically
extract posts from a web forum page based on its visual cues. In this paper, we propose a visual-based forum extraction
(VFX) algorithm that can extract user posts from any type of forum without the need to inspect its template structure in
advance. |