Technical Library

A SQL-Based Implementation for the Discovery of Frequent Itemsets

Document Type

Thesis

Christopher R. Reader, Grand Valley State University

Advisors

Dr. Jamal Alsabbagh, alsabbaj@gvsu.edu

Committee Members

Dr. Yonglei Tao, taoy@gvsu.edu; Dr. Christian Trefftz, trefftzc@gvsu.edu

Embargo Period

8-13-2010

Abstract

An important problem in data mining is the discovery of frequent itemsets. Briefly, given a large set of transactions, each of which contains items drawn from a set of N possible item types, we are interested in counting those subsets (from among the 2^N possible subsets) that occur frequently (i.e., above a given threshold) in the transactions. The problem is computationally expensive since the value of N can be in the thousands (e.g., words in a document or items in a large retail chain). In order to manage the computational complexity, algorithms for discovering frequent itemsets rely, in one way or another, on the Apriori principle, which states that all the subsets of a frequent itemset must themselves be frequent.
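To make the Apriori principle concrete, the following is a minimal in-memory sketch in Python of level-wise frequent itemset discovery. It is an illustration only, not the implementation described in the thesis; the function name `frequent_itemsets`, the parameter `min_support` (an absolute transaction count), and the sample data are all assumptions introduced here.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori principle): a candidate survives only if every
        # one of its (k-1)-subsets is itself frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    for itemset, n in sorted(frequent_itemsets(baskets, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), n)
```

In this sample, {milk, butter} appears in only one transaction, so the three-item candidate {bread, milk, butter} is pruned without ever being counted. That pruning is what keeps the search from enumerating all 2^N subsets.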

The vast majority of published algorithms take as input a flat file of transactions and use algorithm-specific data structures and optimization techniques. This research explores an implementation using SQL on relational data. The motivation is to capitalize on the data storage and query optimization capabilities of a typical relational database management system.
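As a rough sketch of what counting itemsets in SQL over relational data can look like, the Python example below uses the standard-library `sqlite3` module with an in-memory database. The schema (`transactions(tid, item)`), the helper table `f1`, the function name `frequent_pairs`, and the choice of SQLite are assumptions made for illustration; the abstract does not specify the thesis's actual schema, queries, or database system.

```python
import sqlite3


def frequent_pairs(rows, min_support):
    """Return frequent 2-itemsets as (item_a, item_b, support) tuples.

    rows: iterable of (tid, item) pairs, one row per item per transaction.
    Hypothetical helper; table and column names are illustrative only.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE transactions ("
            " tid INTEGER, item TEXT, PRIMARY KEY (tid, item))"
        )
        conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

        # Frequent 1-itemsets: items appearing in at least min_support
        # transactions.
        conn.execute("CREATE TABLE f1 (item TEXT PRIMARY KEY, support INTEGER)")
        conn.execute(
            "INSERT INTO f1"
            " SELECT item, COUNT(*) FROM transactions"
            " GROUP BY item HAVING COUNT(*) >= ?",
            (min_support,),
        )

        # Frequent 2-itemsets: self-join on tid, restricted to frequent items
        # (the Apriori pruning), with a.item < b.item to avoid duplicates.
        cur = conn.execute(
            "SELECT a.item, b.item, COUNT(*) AS support"
            " FROM transactions a"
            " JOIN transactions b ON a.tid = b.tid AND a.item < b.item"
            " JOIN f1 fa ON fa.item = a.item"
            " JOIN f1 fb ON fb.item = b.item"
            " GROUP BY a.item, b.item"
            " HAVING COUNT(*) >= ?"
            " ORDER BY a.item, b.item",
            (min_support,),
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    data = [
        (1, "bread"), (1, "milk"),
        (2, "bread"), (2, "butter"),
        (3, "bread"), (3, "milk"), (3, "butter"),
        (4, "milk"),
    ]
    print(frequent_pairs(data, 2))
```

Expressing the join and count steps as queries leaves the choice of join order and access paths to the database's optimizer, which is the kind of capability the paragraph above refers to.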

A tightly coupled implementation of the Apriori and AprioriTid algorithms was developed as part of this research. Its performance was compared empirically with that of a classical published implementation on several available standard datasets.

ScholarWorks Citation

Reader, Christopher R., "A SQL-Based Implementation for the Discovery of Frequent Itemsets" (2010). Technical Library. 1.
https://scholarworks.gvsu.edu/cistechlib/1

This document is currently not available here.


ScholarWorks@GVSU
