Contents |
|
|
Since the beginning of the 90s, frequent pattern mining has become
one of the most actively researched topics in data mining and knowledge
discovery in databases. The starting point was market basket analysis
and especially the task to mine transactional data, which describe the
shopping behavior of customers of supermarkets, mail-order companies
and online shops, for products that are frequently bought together.
For this task, which became generally known as frequent item set mining,
a large number of efficient algorithms were developed, wich are based
on sophisticated data structures and clever processing schemes. Among
them, Apriori, Eclat, and FP-growth are most widely known. Extensions
from item sets to item sequences are fairly straightforward, but open
up exciting new application areas, like genome mining and temporal
pattern extraction from data describing, for instance, alarms occurring
in telecommunication networks. Recent developments tackled the more
complex tasks of mining frequent trees and general frequent subgraphs,
which have applications in biochemistry, web mining, citation network
and program flow analysis. These tasks are more complicated, since the
structure of the data introduces new problems, the solution of which
is trivial for item sets. This lecture introduces the basic algorithms
for frequent item set mining and their extensions to frequent sequences.
Then the core problems of mining structured data like trees and general
graphs are studied and the core approaches to solve these problems are
presented. It closes with a discussion of an application of frequent
subgraph mining to the analysis of high throughput screening data for
drug discovery and design.