EDBT/ICDT 2009 Proceedings

Estimating the Number of Frequent Itemsets in a Large Database

Authors

Ruoming Jin (Kent State University, USA)
Scott McCallen (Kent State University, USA)
Yuri Breitbart (Kent State University, USA)
David Fuhry (Kent State University, USA)
Dong Wang (Kent State University, USA)

Abstract

Estimating the number of frequent itemsets for minimal support α in a large dataset is of great interest from both theoretical and practical perspectives. However, finding not only the number of frequent itemsets, but even the number of maximal frequent itemsets, is #P-complete. In this study, we provide a theoretical investigation on the sampling estimator. We discover and prove several fundamental but also rather surprising properties of the sampling estimator. We also propose a novel algorithm to estimate the number of frequent itemsets without using sampling. Our detailed experimental results have shown the accuracy and efficiency of our proposed approach.

Session

EDBT Research Session 15: Data Mining (Wednesday, March 25, 16:00—17:30)

EDBT/ICDT 2009 Joint Conference

Electronic Conference Proceedings

Estimating the Number of Frequent Itemsets in a Large Database

Authors

Abstract

Session