Title:
|
DATA PREPROCESSING DEPENDENCY FOR WEB USAGE MINING BASED ON SEQUENCE RULE ANALYSIS |
Author(s):
|
Michal Munk, Jozef Kapusta, Peter Švec |
ISBN:
|
978-972-8924-88-1 |
Editors:
|
Ajith P. Abraham |
Year:
|
2009 |
Edition:
|
Single |
Keywords:
|
Sequence rule analysis, web usage mining, data preprocessing |
Type:
|
Poster/Demonstration |
First Page:
|
179 |
Last Page:
|
181 |
Language:
|
English |
Paper Abstract:
|
Systematic analysis of a portal whose content is modified on a regular basis is a very important phase of its
development. The data for the analysis come from the web server log file. However, analyzing the log file is time
consuming, and so is preprocessing its data. It is very important to clean the data by excluding search engine visits and
possibly also visitors coming from NAT or proxy devices. We also identify user sessions by defining time slots. In
this paper we examine which data preprocessing steps are required and determine which of these steps can
be integrated and automated. We conducted an experiment and compared the results of sequence rule analysis on four files
preprocessed to different levels. Across the files we tracked the number of web accesses, the number of customer sequences,
the number of frequent sequences, the proportion of discovered rules, and the confidence values of the discovered rules.
The experimental results suggest that including session time slots is very important for sequence rule analysis, even when
search engine robots are excluded. |
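
The two preprocessing steps named in the abstract, excluding search engine robots and splitting visits into sessions by time slot, could be sketched roughly as follows. This is only an illustration and not the authors' implementation: the record layout `(ip, timestamp, url, user_agent)`, the `ROBOT_MARKERS` substrings, the 30-minute timeout, and the function names are all assumptions, since the abstract does not specify them.

```python
from datetime import datetime, timedelta

# Hypothetical substrings used to spot crawler user agents (an assumption,
# not a list taken from the paper).
ROBOT_MARKERS = ("bot", "crawler", "spider")


def is_robot(user_agent):
    """Return True if the user-agent string looks like a search engine robot."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def sessionize(records, timeout=timedelta(minutes=30)):
    """Group (ip, timestamp, url, user_agent) records into sessions.

    Robot requests are dropped. A new session starts for an IP address when
    the gap since its previous request exceeds `timeout` (assumed 30 minutes).
    """
    sessions = []        # list of (ip, [urls in visit order])
    open_session = {}    # ip -> index of its current session in `sessions`
    last_seen = {}       # ip -> timestamp of its previous request
    for ip, ts, url, ua in sorted(records, key=lambda r: r[1]):
        if is_robot(ua):
            continue
        if ip not in last_seen or ts - last_seen[ip] > timeout:
            sessions.append((ip, [url]))
            open_session[ip] = len(sessions) - 1
        else:
            sessions[open_session[ip]][1].append(url)
        last_seen[ip] = ts
    return sessions


# Made-up sample data for illustration only.
t0 = datetime(2009, 1, 1, 10, 0)
records = [
    ("10.0.0.1", t0, "/home", "Mozilla/5.0"),
    ("10.0.0.1", t0 + timedelta(minutes=5), "/courses", "Mozilla/5.0"),
    ("10.0.0.2", t0 + timedelta(minutes=6), "/home", "Googlebot/2.1"),
    ("10.0.0.1", t0 + timedelta(minutes=50), "/home", "Mozilla/5.0"),
]
sessions = sessionize(records)
```

With this sample, the robot request is discarded and the 45-minute gap splits the remaining visitor's requests into two sessions. Each session's URL list is the kind of sequence that a sequence rule analysis would then take as input.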