Estimating the Number of Frequent Itemsets in a Large Database
Authors
- Ruoming Jin (Kent State University, USA)
- Scott McCallen (Kent State University, USA)
- Yuri Breitbart (Kent State University, USA)
- David Fuhry (Kent State University, USA)
- Dong Wang (Kent State University, USA)
Abstract
Estimating the number of frequent itemsets for minimal support α in a large dataset is of great interest from both theoretical and practical perspectives. However, finding not only the number of frequent itemsets, but even the number of maximal frequent itemsets, is #P-complete. In this study, we provide a theoretical investigation on the sampling estimator. We discover and prove several fundamental but also rather surprising properties of the sampling estimator. We also propose a novel algorithm to estimate the number of frequent itemsets without using sampling. Our detailed experimental results have shown the accuracy and efficiency of our proposed approach.
Session
EDBT Research Session 15: Data Mining (Wednesday, March 25, 16:00—17:30)