MODELING MISSING DATA WITH MARKOV RANDOM FIELDS IN LARGE DATA SETS


Document Info

Title:	MODELING MISSING DATA WITH MARKOV RANDOM FIELDS IN LARGE DATA SETS
Author(s):	Esa Junttila, Marko Salmenkivi
ISBN:	978-972-8924-40-9
Editors:	Jörg Roth, Jairo Gutiérrez and Ajith P. Abraham (series editors: Piet Kommers, Pedro Isaías and Nian-Shing Chen)
Year:	2007
Edition:	Single
Keywords:	Spatial data, missing data, Bayesian methods, Markov random fields, linguistic data.
Type:	Full Paper
First Page:	73
Last Page:	80
Language:	English
Paper Abstract:	A key issue in data analysis is the treatment of missing data. In the spatial domain, the assumption of autocorrelation is often employed to make inferences about missing data from observations made in nearby areas. Bayesian data analysis methods provide a well-founded framework for such statistical inference. Due to their computational requirements, however, they have been used almost exclusively in confirmatory data analysis. Our basic idea is to utilize Bayesian methods as a preprocessing phase in the KDD (knowledge discovery in databases) process. We employ relatively simple but reasonable models and apply them to large data sets. The resulting posterior distributions can then be analyzed by other data mining methods. In particular, we analyze a large linguistic data set: 17,100 geographic distributions of Finnish dialect words.
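The abstract does not spell out the model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each word's distribution is a 0/1 presence map on a regular grid, that missing cells are marked `None`, and that spatial autocorrelation is captured by a generic autologistic (Ising-type) Markov random field with a 4-cell neighbourhood. The intercept `alpha` and coupling `beta` are hypothetical fixed values; a full Bayesian treatment would also place priors on them. The function names `neighbours` and `impute_missing` are illustrative. Gibbs sampling draws each missing cell from its conditional distribution given its neighbours, and averaging the draws after burn-in estimates the posterior probability that the word occurs in that cell.

```python
import math
import random

def neighbours(r, c, rows, cols):
    """Yield the 4-connected (rook) neighbours of cell (r, c) on a grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield rr, cc

def impute_missing(grid, alpha=0.0, beta=1.0, n_iter=2000, burn_in=500, seed=0):
    """Gibbs-sample the missing (None) cells of a 0/1 grid under an
    autologistic MRF with hypothetical fixed parameters alpha and beta.

    Returns {(r, c): estimated posterior probability that the cell is 1}.
    """
    if burn_in >= n_iter:
        raise ValueError("burn_in must be smaller than n_iter")
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    missing = [(r, c) for r in range(rows) for c in range(cols) if grid[r][c] is None]
    # Observed cells stay fixed; missing cells get a random initial value.
    state = [[grid[r][c] if grid[r][c] is not None else rng.randint(0, 1)
              for c in range(cols)] for r in range(rows)]
    counts = {cell: 0 for cell in missing}
    for it in range(n_iter):
        for r, c in missing:
            # Neighbours coded as +1 (present) / -1 (absent), so the
            # conditional logit is alpha + beta * (sum of neighbour spins).
            s = sum(2 * state[rr][cc] - 1 for rr, cc in neighbours(r, c, rows, cols))
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * s)))
            state[r][c] = 1 if rng.random() < p else 0
        if it >= burn_in:
            # Accumulate post-burn-in draws for the Monte Carlo estimate.
            for r, c in missing:
                counts[(r, c)] += state[r][c]
    kept = n_iter - burn_in
    return {cell: counts[cell] / kept for cell in missing}
```

For example, with `grid = [[1, 1, 1, 0, 0, 0], [1, None, 1, 0, None, 0], [1, 1, 1, 0, 0, 0]]`, the cell `(1, 1)` is surrounded by presences and `(1, 4)` by absences, so `impute_missing(grid)` should return estimates near 1 and near 0 respectively (the exact conditionals are about 0.98 and 0.02). The output is a per-cell probability rather than a single filled-in value, which matches the abstract's point that posterior distributions, not point imputations, are passed on to later data mining steps.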
