Salman Ahmed and Shaikh Hiroyuki. Top-k Outlier Detection from Uncertain Data. International Journal of Automation and Computing, vol. 11, no. 2, pp. 128-142, 2014. DOI: 10.1007/s11633-014-0775-8

Top-k Outlier Detection from Uncertain Data

Uncertain data are common due to the increasing use of sensors, radio frequency identification (RFID), GPS and similar devices for data collection. Causes of uncertainty include measurement limitations, noise, inconsistent supply voltage, and delay or loss of data in transfer. To manage, query or mine such data, this uncertainty must be taken into account. This paper therefore studies the problem of top-k distance-based outlier detection from uncertain data objects, where each uncertain object is modelled by the probability density function of a Gaussian distribution. The naive approach to distance-based outlier detection uses a nested loop, which is very costly because the distance function between two uncertain objects is expensive to evaluate. To reduce this cost, a populated-cells list (PC-list) approach to outlier detection is proposed. Using the PC-list, the proposed top-k outlier detection algorithm needs to consider only a fraction of the dataset objects and hence quickly identifies candidate objects for the top-k outliers. Two approximate top-k outlier detection algorithms are also presented to further improve efficiency. An extensive empirical study on synthetic and real datasets demonstrates the accuracy, efficiency and scalability of the proposed algorithms.
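The nested-loop baseline the abstract describes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm or its PC-list method. It assumes each object is an isotropic Gaussian (one `sigma` shared by all dimensions). It also assumes an outlier score equal to the expected number of other objects within distance `d`, where fewer neighbours means more outlying; the paper's exact definition may differ. Each pairwise probability is estimated by Monte Carlo sampling, a swapped-in technique chosen only for simplicity. The names `UncertainObject`, `prob_within` and `top_k_outliers` are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class UncertainObject:
    """Hypothetical model: isotropic Gaussian with mean `mu` and per-dimension std `sigma`."""
    mu: Tuple[float, ...]
    sigma: float


def prob_within(a: UncertainObject, b: UncertainObject, d: float,
                rng: random.Random, samples: int = 2000) -> float:
    """Monte Carlo estimate of Pr(||A - B|| <= d) (illustrative, not from the paper).

    For independent isotropic Gaussians, A - B is Gaussian with mean
    a.mu - b.mu and per-dimension variance a.sigma**2 + b.sigma**2.
    """
    s = math.sqrt(a.sigma ** 2 + b.sigma ** 2)
    diff = [x - y for x, y in zip(a.mu, b.mu)]
    hits = 0
    for _ in range(samples):
        # Draw one sample of A - B and test whether it falls inside the d-ball.
        sq = sum((m + rng.gauss(0.0, s)) ** 2 for m in diff)
        if sq <= d * d:
            hits += 1
    return hits / samples


def top_k_outliers(objs: Sequence[UncertainObject], d: float, k: int,
                   seed: int = 0) -> List[Tuple[int, float]]:
    """Naive nested loop: rank objects by expected number of d-neighbours, fewest first.

    Assumed score (hypothetical): sum over other objects of Pr(distance <= d).
    """
    rng = random.Random(seed)
    n = len(objs)
    score = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Pr(||A - B|| <= d) is symmetric, so each pair is evaluated once.
            p = prob_within(objs[i], objs[j], d, rng)
            score[i] += p
            score[j] += p
    ranked = sorted(range(n), key=lambda idx: score[idx])
    return [(idx, score[idx]) for idx in ranked[:k]]


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (-0.2, 0.2)]
    objs = [UncertainObject(m, 0.2) for m in cluster]
    objs.append(UncertainObject((5.0, 5.0), 0.2))  # far from the cluster
    print(top_k_outliers(objs, d=1.0, k=1))
```

The loop performs N(N-1)/2 evaluations of an expensive pairwise function (here, `samples` random draws each), which illustrates why the nested-loop approach is costly. The PC-list approach is described as avoiding most of this work by considering only a fraction of the objects; this sketch makes no attempt at that pruning.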