Title: MODELING MISSING DATA WITH MARKOV RANDOM FIELDS IN LARGE DATA SETS
Author(s): Esa Junttila, Marko Salmenkivi
ISBN: 978-972-8924-40-9
Editors: Jörg Roth, Jairo Gutiérrez and Ajith P. Abraham (series editors: Piet Kommers, Pedro Isaías and Nian-Shing Chen)
Year: 2007
Edition: Single
Keywords: Spatial data, missing data, Bayesian methods, Markov random fields, linguistic data.
Type: Full Paper
First Page: 73
Last Page: 80
Language: English
Paper Abstract:

A key issue in data analysis is the treatment of missing data. In the spatial domain, the assumption of spatial
autocorrelation is often employed to make inferences about missing data from the observations made in nearby areas.
Bayesian data analysis methods provide a well-founded framework for such statistical inference, but because of their
computational requirements they have been used almost exclusively in confirmatory data analysis. Our basic idea is to
utilize Bayesian methods as a preprocessing phase in the KDD (knowledge discovery in databases) process. We employ
relatively simple but reasonable models and apply them to large data sets. The resulting posterior distributions can
then be analyzed by other data mining methods. In particular, we analyze a large linguistic data set: 17,100 geographic
distributions of Finnish dialect words.
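To illustrate the general idea of inferring missing spatial values through autocorrelation, here is a minimal sketch that is not the authors' model. It assumes that one word's distribution is coded as presence (1) or absence (0) on a regular grid of map cells, with unobserved cells marked `None`. It also uses a hypothetical autologistic Markov random field with a 4-cell neighbourhood and fixed, hand-picked parameters `alpha` (baseline log-odds) and `beta` (neighbour interaction). A Gibbs sampler resamples only the missing cells, conditioning on the observed ones, and the post-burn-in average of each missing cell estimates its posterior probability of presence. The function name `impute_autologistic` and all parameter values are illustrative. A full Bayesian treatment would also place priors on `alpha` and `beta` and sample them.

```python
import math
import random


def impute_autologistic(grid, alpha=-1.0, beta=0.8, n_iter=2000, burn_in=500, seed=0):
    """Gibbs-sample missing cells (None) of a binary grid under an autologistic MRF.

    Hypothetical model: logit P(x_i = 1 | rest) = alpha + beta * (sum of 4-neighbours).
    Observed cells stay fixed; returns {(row, col): estimated P(x = 1 | observed)}.
    """
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    # Observed values are kept; missing cells get a random initial state.
    state = [[v if v is not None else rng.randint(0, 1) for v in row] for row in grid]
    missing = [(r, c) for r in range(rows) for c in range(cols) if grid[r][c] is None]
    counts = {cell: 0 for cell in missing}

    for it in range(n_iter):
        # One sweep: update each missing cell from its full conditional distribution.
        for r, c in missing:
            s = 0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    s += state[nr][nc]
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * s)))
            state[r][c] = 1 if rng.random() < p else 0
        # After burn-in, accumulate samples for the posterior marginal estimates.
        if it >= burn_in:
            for r, c in missing:
                counts[(r, c)] += state[r][c]

    kept = n_iter - burn_in
    return {cell: counts[cell] / kept for cell in missing}


grid = [
    [1, 1, 0, 0],
    [1, None, 0, 0],
    [1, 1, None, 0],
    [0, 0, 0, 0],
]
probs = impute_autologistic(grid)
print(probs)
```

In this toy grid the two missing cells are not neighbours, so each one's conditional distribution depends only on observed cells. The estimates should therefore come out near 0.80 for cell (1, 1), which has three "present" neighbours, and near 0.45 for cell (2, 2), which has one. The output is a per-cell probability rather than a single filled-in value. This matches the abstract's point that posterior distributions, not hard imputations, are handed on to later data mining steps.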