Abstract

Motivation: Missing value estimation is an important preprocessing step in microarray analysis. Although several methods have been developed to solve this problem, their performance is unsatisfactory for datasets with high rates of missing data, high measurement noise, or limited numbers of samples. In fact, more than 80% of the time-series datasets in the Stanford Microarray Database contain fewer than eight samples.


Results:
We present the integrative Missing Value Estimation method (iMISS), which incorporates information from multiple reference microarray datasets to improve missing value estimation. For each gene with missing data, we derive a consistent neighbor-gene list by taking the reference datasets into consideration. To determine whether the given reference datasets are sufficiently informative for integration, we use a submatrix imputation approach. Our experiments showed that iMISS can significantly and consistently improve the accuracy of the state-of-the-art Local Least Squares (LLS) imputation algorithm, by up to 15% in our benchmark tests.
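
The exact rule iMISS uses to build the consistent neighbor list is not given here. As a rough illustration only, the C++ sketch below assumes that all datasets share the same gene rows, marks missing entries with NaN, and ranks candidate neighbors by their distance averaged over the target and reference datasets, so that a gene ranks highly only if it tracks the target gene in every dataset. The names `rmsDistance` and `integrativeNeighbors` and the averaging rule are hypothetical; the actual iMISS procedure may weight or combine datasets differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Rows = genes, columns = samples; NaN marks a missing entry (assumed encoding).
using Matrix = std::vector<std::vector<double>>;

// Root-mean-square difference between genes g and h over the columns where
// both are observed. Returns +infinity if they share no observed column.
double rmsDistance(const Matrix& m, std::size_t g, std::size_t h) {
    double sum = 0.0;
    std::size_t n = 0;
    for (std::size_t j = 0; j < m[g].size(); ++j) {
        const double a = m[g][j], b = m[h][j];
        if (std::isnan(a) || std::isnan(b)) continue;
        sum += (a - b) * (a - b);
        ++n;
    }
    return n == 0 ? std::numeric_limits<double>::infinity() : std::sqrt(sum / n);
}

// Hypothetical integrative ranking: average each candidate's distance to `gene`
// across all datasets (target first, then references) and keep the k closest.
std::vector<std::size_t> integrativeNeighbors(const std::vector<Matrix>& datasets,
                                              std::size_t gene, std::size_t k) {
    assert(!datasets.empty());
    const std::size_t nGenes = datasets[0].size();
    std::vector<double> score(nGenes, 0.0);
    for (const Matrix& m : datasets) {
        assert(m.size() == nGenes);  // rows must refer to the same genes
        for (std::size_t h = 0; h < nGenes; ++h)
            score[h] += rmsDistance(m, gene, h) / datasets.size();
    }
    std::vector<std::size_t> candidates;
    for (std::size_t h = 0; h < nGenes; ++h)
        if (h != gene && std::isfinite(score[h])) candidates.push_back(h);
    // Stable sort keeps ties in gene order, making the result deterministic.
    std::stable_sort(candidates.begin(), candidates.end(),
                     [&](std::size_t a, std::size_t b) { return score[a] < score[b]; });
    if (candidates.size() > k) candidates.resize(k);
    return candidates;
}
```

Passing only the target dataset reduces this to ordinary single-dataset neighbor selection, which makes it easy to see how adding a reference dataset can demote a gene that matches the target gene only by chance in a few samples.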

Availability: The iMISS package is available for download. The following algorithms are implemented in C++ in this package: KNN, LLS, iLLS-O, iLLS-D, iKNN-D, and iKNN-O. The last two are provided for comparison purposes.
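
For orientation, the KNN baseline listed above estimates a missing entry from the most similar genes that observe the same sample. The C++ sketch below follows the common weighted-KNN scheme (nearest genes by a Euclidean-type distance, each weighted by 1/distance). It is a minimal illustration, not the code shipped in the iMISS package; the function name `knnImpute`, the NaN encoding of missing values, and the root-mean-square distance over co-observed columns are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Rows = genes, columns = samples; NaN marks a missing entry (assumed encoding).
using Matrix = std::vector<std::vector<double>>;

// Estimate the missing value m[gene][col] from the k nearest genes that observe
// column `col`, weighting each neighbor's value by 1 / distance.
double knnImpute(const Matrix& m, std::size_t gene, std::size_t col, std::size_t k) {
    assert(k > 0 && gene < m.size() && col < m[gene].size() && std::isnan(m[gene][col]));
    std::vector<std::pair<double, std::size_t>> cand;  // (distance, neighbor row)
    for (std::size_t h = 0; h < m.size(); ++h) {
        if (h == gene || std::isnan(m[h][col])) continue;  // neighbor must observe `col`
        double sum = 0.0;
        std::size_t n = 0;
        for (std::size_t j = 0; j < m[gene].size(); ++j) {
            if (std::isnan(m[gene][j]) || std::isnan(m[h][j])) continue;
            const double d = m[gene][j] - m[h][j];
            sum += d * d;
            ++n;
        }
        if (n > 0) cand.emplace_back(std::sqrt(sum / n), h);
    }
    if (cand.empty()) return std::numeric_limits<double>::quiet_NaN();  // no usable neighbor
    std::sort(cand.begin(), cand.end());  // nearest first
    if (cand.size() > k) cand.resize(k);
    // An identical profile would get infinite weight, so use its value directly.
    if (cand.front().first == 0.0) return m[cand.front().second][col];
    double num = 0.0, den = 0.0;
    for (const auto& c : cand) {
        num += m[c.second][col] / c.first;
        den += 1.0 / c.first;
    }
    return num / den;
}
```

For example, if the two nearest genes lie at distances 1 and 3 and observe values 4 and 8 in the missing column, the estimate is (4/1 + 8/3) / (1/1 + 1/3) = 5. LLS replaces this weighted average with a least-squares regression of the target gene on a similar set of neighbors.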

Citation: Jianjun Hu, Haifeng Li, Michael S. Waterman, and Xianghong Jasmine Zhou. Integrative Missing Value Estimation for Microarray Data. BMC Bioinformatics, 2006.

Acknowledgement: This work is supported by the NIH grants R01GM074163 and P50HG002790, the NSF grant 0515936, and a pilot grant from the Seaver Foundation.