Brock TR # CS-02-03 Abstract

Maximum consistency of incomplete data via non-invasive imputation
Günther Gediga and Ivo Düntsch, May 2002.

In this paper we describe an algorithm that imputes missing values from the given data alone, without representational or other assumptions, and we analyse its performance. Our approach is based on non-numeric, rule-based data analysis. Unlike statistical procedures, such analysis offers no straightforward way to define a loss function or a likelihood function, because these rest on statistical pre-assumptions that rule-based data analysis does not make. Other optimisation criteria must therefore be used. A simple criterion is to demand that the rules of the system be maximally consistent: if we fill a missing entry with a value, the resulting rule should be consistent with the other rules of the system. Our algorithm imputes missing values in an attribute vector x by presenting a list of possible values drawn from the set of all vectors y that do not contradict x, i.e. that have the same entries wherever both are defined.
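The notion of non-contradicting vectors in the last sentence can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: each record is assumed to be a tuple with None marking a missing entry, and the names `compatible` and `candidate_values` are hypothetical. The sketch only builds the candidate list for each missing entry; how the paper chooses among candidates and measures performance is not reproduced here.

```python
from typing import Hashable, Optional, Sequence

# Assumption for this sketch: a record is a tuple of attribute values,
# and None marks a missing entry.
Row = Sequence[Optional[Hashable]]


def compatible(x: Row, y: Row) -> bool:
    """True if y does not contradict x, i.e. both have the same
    entry wherever both are defined (hypothetical helper)."""
    return all(a == b for a, b in zip(x, y) if a is not None and b is not None)


def candidate_values(data: Sequence[Row], i: int) -> dict[int, list]:
    """For each missing attribute j of data[i], list the distinct defined
    values that the other compatible records take at j, in first-seen order.
    An empty list means no compatible record supplies a value."""
    x = data[i]
    compat = [y for k, y in enumerate(data) if k != i and compatible(x, y)]
    result: dict[int, list] = {}
    for j, a in enumerate(x):
        if a is None:
            values: list = []
            for y in compat:
                if y[j] is not None and y[j] not in values:
                    values.append(y[j])
            result[j] = values
    return result


if __name__ == "__main__":
    sample = [
        ("a", None, "x"),
        ("a", "b", "x"),
        ("a", "c", None),
        ("d", "b", "x"),
    ]
    for i in range(len(sample)):
        print(i, candidate_values(sample, i))
```

For the first sample record, the second and third records agree with it wherever both are defined, while the fourth differs in the first attribute, so the missing second attribute receives the candidates `"b"` and `"c"`. Collecting every candidate, rather than committing to one, mirrors the abstract's description of presenting a list of possible values.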