Digital Library

Title:      SPAM FILTERING USING COMPRESSION
Author(s):      M. Farmer, G. Richard, F. Faure, M. Lopusniac
ISBN:      978-972-8924-49-2
Editors:      Sandeep Krishnamurthy and Pedro Isaías
Year:      2007
Edition:      Single
Keywords:      Spam, Kolmogorov Complexity, Compression, Clustering, K-Nearest Neighbors
Type:      Full Paper
First Page:      67
Last Page:      74
Language:      English
Paper Abstract:      One of the most unwelcome side effects of e-commerce technology is the development of spamming as an e-marketing technique. Spam emails (unsolicited commercial emails) impose a burden on everyone who has an electronic mailbox. Detecting and filtering spam is therefore a challenging task, and many approaches have been developed to identify spam before it reaches the end user's mailbox. In this paper, we focus on a relatively new approach founded on the work of A. Kolmogorov. The main idea is to give a formal meaning to the notion of “information content” and to provide a measure of this content. With such a quantitative measure, it becomes possible to define a distance, which is a major tool for classification. To validate our approach, we proceed in two steps. First, we apply the classical compression distance to a mix of spam and legitimate emails to check whether they can be properly clustered without any supervision. They can, which highlights a kind of underlying structure in spam emails. Second, we implement a k-nearest neighbors algorithm that achieves an accuracy rate of 85%. Coupled with other anti-spam techniques, compression-based methods could be of great help in meeting the spam filtering challenge.
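The two ideas named in the abstract, a compression-based distance and a k-nearest neighbors vote, can be illustrated with a minimal sketch. The abstract gives neither the distance formula nor the compressor, so this sketch assumes the commonly used normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with Python's zlib standing in for the compressor C. The function names csize, ncd and knn_classify are hypothetical, and nothing here should be read as the authors' implementation.

```python
import zlib
from collections import Counter


def csize(data: bytes) -> int:
    """Compressed length in bytes; zlib is an assumed stand-in compressor."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (assumed formula, see lead-in).

    Values near 0 suggest the two inputs share most of their content;
    values near 1 suggest they share very little.
    """
    cx, cy = csize(x), csize(y)
    cxy = csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(message: bytes, labeled: list[tuple[bytes, str]], k: int = 3) -> str:
    """Label a message by majority vote among its k nearest labeled emails."""
    # Rank the labeled emails by their distance to the new message.
    ranked = sorted(labeled, key=lambda item: ncd(message, item[0]))
    # Count the labels of the k closest emails and return the most frequent.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Under these assumptions, a call would look like knn_classify(raw_email_bytes, training_set, k=3), where training_set is a list of (email bytes, "spam" or "ham") pairs. A general-purpose compressor such as zlib adds fixed overhead that makes the distance noisy on very short messages, so any accuracy obtained with this sketch need not match the 85% reported in the abstract.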
   
