Digital Library

Title:      DATA PREPROCESSING DEPENDENCY FOR WEB USAGE MINING BASED ON SEQUENCE RULE ANALYSIS
Author(s):      Michal Munk, Jozef Kapusta, Peter Švec
ISBN:      978-972-8924-88-1
Editors:      Ajith P. Abraham
Year:      2009
Edition:      Single
Keywords:      Sequence rule analysis, web usage mining, data preprocessing
Type:      Poster/Demonstration
First Page:      179
Last Page:      181
Language:      English
Paper Abstract:      Systematic analysis of a portal whose content is modified on a regular basis is a very important phase of its development. Data for the analysis is provided by the web server log file. However, analyzing the log file is time-consuming, and so is preprocessing its data. Cleaning the data by excluding search engine visits, and perhaps also visitors coming from NAT or proxy devices, is very important. We also identify user sessions by defining time slots. In this paper we address the question of which data preprocessing steps are required and determine which of these steps can be integrated and automated. We carried out an experiment and compared the results of sequence rule analysis for four files preprocessed to different levels. Across the files, we tracked the number of web accesses, the number of customers' sequences, the number of frequent sequences, the proportion of discovered rules, and the confidence values of the discovered rules. The experimental results suggest that including session time slots is very important for sequence rule analysis, despite excluding search engine robots.
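
The two preprocessing steps named in the abstract, excluding search engine robots and splitting visits into sessions by time slot, might be sketched as follows. This is only an illustration, not the authors' procedure: the abstract gives neither the slot length nor the robot detection method, so the 30-minute gap, the user-agent markers, the record layout, and the names `is_robot` and `split_sessions` are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed values: the abstract states neither the slot length nor a robot list.
SESSION_GAP = timedelta(minutes=30)
ROBOT_MARKERS = ("bot", "crawler", "spider")


def is_robot(user_agent):
    """Rough robot check: look for a known marker in the user-agent string."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def split_sessions(records, gap=SESSION_GAP):
    """Group log records into per-visitor sessions.

    records: iterable of (ip, timestamp, url, user_agent) tuples (hypothetical layout).
    Returns a list of (ip, [url, ...]) pairs, one per session.
    """
    # Drop robot traffic and group the remaining hits by visitor IP.
    hits_by_ip = defaultdict(list)
    for ip, ts, url, user_agent in records:
        if not is_robot(user_agent):
            hits_by_ip[ip].append((ts, url))

    sessions = []
    for ip, hits in hits_by_ip.items():
        hits.sort(key=lambda hit: hit[0])  # order each visitor's hits by time
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            # A pause longer than the gap closes the current session.
            if ts - prev_ts > gap:
                sessions.append((ip, current))
                current = []
            current.append(url)
        sessions.append((ip, current))
    return sessions
```

Each resulting session is an ordered list of visited URLs, which is the kind of sequence a sequence rule analysis would take as input. Visitors behind NAT or proxy devices share an IP address, so grouping by IP alone, as this sketch does, would merge them.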
   
