Lecture "Frequent Pattern Mining"

Contents

Since the beginning of the 90s, frequent pattern mining has become one of the most actively researched topics in data mining and knowledge discovery in databases. The starting point was market basket analysis and especially the task to mine transactional data, which describe the shopping behavior of customers of supermarkets, mail-order companies and online shops, for products that are frequently bought together. For this task, which became generally known as frequent item set mining, a large number of efficient algorithms were developed, wich are based on sophisticated data structures and clever processing schemes. Among them, Apriori, Eclat, and FP-growth are most widely known. Extensions from item sets to item sequences are fairly straightforward, but open up exciting new application areas, like genome mining and temporal pattern extraction from data describing, for instance, alarms occurring in telecommunication networks. Recent developments tackled the more complex tasks of mining frequent trees and general frequent subgraphs, which have applications in biochemistry, web mining, citation network and program flow analysis. These tasks are more complicated, since the structure of the data introduces new problems, the solution of which is trivial for item sets. This lecture introduces the basic algorithms for frequent item set mining and their extensions to frequent sequences. Then the core problems of mining structured data like trees and general graphs are studied and the core approaches to solve these problems are presented. It closes with a discussion of an application of frequent subgraph mining to the analysis of high throughput screening data for drug discovery and design.