| carpenter | Linux executable | (113 kb) |
| carpenter.exe | Windows console executable | (121 kb) |
| carpenter.zip | C sources, version 1.47, 2012.01.20 | (148 kb) |
| carpenter.tar.gz | (126 kb) | |
| census.zip | census data set (from the UCI ML repository) | (390 kb) |
| census | shell script used for the conversion | (1 kb) |
A program to find closed frequent item sets with the carpenter algorithm (Pan et al. 2003), which enumerates transaction sets, in contrast to many other frequent item set mining algorithms, which enumerate item sets. Such an approach can be highly competitive in special cases, namely if there are few transactions and (very) many items, which is a common situation in biological data sets, like gene expression data. For other data sets, however, it is not a recommendable approach.
The program can also find maximal item sets, but the filtering of the closed item sets is a quick hack and not particularly efficient.
This implementation offers a variant based on transaction identifier lists according to the description in (Pan et al. 2003), although with several optimizations due to which it significantly outperforms the implementation of the Gemini package, which is provided by the authors of (Pan et al. 2003).
The default algorithm, however, is a variant based on an item occurrence counter table, which bears some vague resemblance to the horizontal approach in the RERII algorithm (Cong et al. 2004). This algorithm is the default, because it usually outperforms the variant based on transaction identifier lists.
The improved carpenter algorithm used in this program is described in the following paper:
Some other references:
More information about frequent item set mining, implementations of other algorithms as well as test data sets can be found at the Frequent Itemset Mining Implementations Repository.