Title:
|
LOG DATA PREPARATION FOR MINING WEB USAGE PATTERNS |
Author(s):
|
G. Castellano, A. M. Fanelli, M. A. Torsello |
ISBN:
|
978-972-8924-30-0 |
Editors:
|
Nuno Guimarães and Pedro Isaías |
Year:
|
2007 |
Edition:
|
Single |
Keywords:
|
Data cleaning, data filtering, data preprocessing, user sessions identification, Web usage mining. |
Type:
|
Full Paper |
First Page:
|
371 |
Last Page:
|
378 |
Language:
|
English |
Paper Abstract:
|
In this paper we focus on log data preprocessing, the first step of a common Web Usage Mining process. In particular, we present LODAP (LOg DAta Preprocessor), a software tool that we designed and implemented to preprocess log data. LODAP works in several steps. First, log files are cleaned by removing irrelevant data. Then, the remaining requests are structured into user sessions, which encode the browsing behavior of users. Next, uninteresting sessions and the least visited pages are removed to reduce the size of the extracted session data. In addition, LODAP can create reports containing the results obtained in each step and summaries mined from the analysis of the considered log files. During preprocessing, the analyst is guided by a sequence of panels that make up the tool's wizard-based interface. Each panel is a graphical window offering a basic function of the preprocessor. Preliminary results on log files of a specific Web site show that the tool can effectively reduce the log data size and identify user sessions that meaningfully encode user browsing behavior. |
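The three preprocessing steps named in the abstract (cleaning, session identification, filtering) can be illustrated with a minimal sketch. This is not LODAP's implementation. The abstract does not specify the log format, the cleaning rules, the session criterion, or the thresholds, so everything below is an assumption chosen for illustration: Common Log Format input, sessions split per host on a 30-minute gap, and the hypothetical names `clean`, `sessionize`, `filter_sessions`, `IGNORED_SUFFIXES`, `min_requests`, and `min_page_visits`.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Assumed input: Common Log Format lines (the abstract does not name a format).
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
# Assumed notion of "irrelevant data": embedded resources rather than pages.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def clean(lines):
    """Step 1 (cleaning): keep only successful GET requests for pages."""
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed lines
        if m["method"] != "GET" or not m["status"].startswith("2"):
            continue
        if m["url"].lower().endswith(IGNORED_SUFFIXES):
            continue
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        yield m["host"], when, m["url"]


def sessionize(requests, timeout=timedelta(minutes=30)):
    """Step 2 (session identification): group requests per host and start
    a new session whenever the gap between two requests exceeds `timeout`."""
    by_host = defaultdict(list)
    for host, when, url in requests:
        by_host[host].append((when, url))

    sessions = []
    for host, hits in by_host.items():
        hits.sort()  # chronological order within each host
        current = [hits[0][1]]
        for (prev, _), (now, url) in zip(hits, hits[1:]):
            if now - prev > timeout:
                sessions.append((host, current))
                current = []
            current.append(url)
        sessions.append((host, current))
    return sessions


def filter_sessions(sessions, min_requests=2, min_page_visits=2):
    """Step 3 (filtering): drop rarely visited pages, then drop sessions
    that are left with fewer than `min_requests` requests."""
    counts = Counter(url for _, urls in sessions for url in urls)
    kept = []
    for host, urls in sessions:
        urls = [u for u in urls if counts[u] >= min_page_visits]
        if len(urls) >= min_requests:
            kept.append((host, urls))
    return kept
```

Chaining the steps as `filter_sessions(sessionize(clean(lines)))` over an iterable of log lines yields a list of `(host, [url, ...])` pairs. Identifying users by host alone is a simplification; a real tool may also use the user agent or cookies.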