\documentclass[12pt,a4paper]{article}
\pagestyle{empty}
\oddsidemargin 2.1mm
\textwidth 155mm
\topmargin -15mm
\textheight 240mm
\begin{document}
%-----------------------------------------------------------------------
\noindent
{\bf Data Mining 2: Frequent Pattern Mining} \hfill Summer 2017 \\
Christian Borgelt \hfill from 09.05.2017
%-----------------------------------------------------------------------
\vskip3ex
\centerline{\bf Organisational Rules for Exam Admission}
\medskip\noindent
The lecture is accompanied by exercise sheets.
At the beginning of each exercise lesson, the exercises are
{\em voted for}. By voting for an exercise, one expresses one's
willingness to present something about it. (Suggestions for a
solution will be discussed, they need not be correct right away.)
To the exam will be admitted, who
\begin{enumerate}\itemsep0pt\parskip2pt
\item voted for {\em at least\/} half of the exercises {\em and}
\item presented something for {\em at least\/} two exercises.
\end{enumerate}
%-----------------------------------------------------------------------
\vskip3ex
\centerline{\bf Exercise Sheet 1}
%-----------------------------------------------------------------------
\subsubsection*{Exercise 1 \quad\rm Use of Frequent Patterns}
\begin{enumerate}\itemsep0pt
\item[a)\hfill]
How could information in the form of frequent item sets
or association rules be used for marketing purposes etc.?
\item[b)\hfill]
What are potential applications of frequent item set mining? \\
Try to find others than those mentioned on the motivation slide
of the lecture!
\item[c)\hfill]
Does its frequency already make a pattern
or a rule interesting? \\
If not, why not? Try to find counterexamples!
\item[d)\hfill]
What are advantages of representing frequent patterns as
association rules compared to representing them as
frequent item sets? What are disadvantages?
\end{enumerate}
%-----------------------------------------------------------------------
\subsubsection*{Exercise 2 \quad\rm Basic Notions}
\begin{enumerate}\itemsep0pt
\item[a)\hfill]
Why are transactions databases represented as vectors/arrays? \\
Why can't we use simple sets?
What are alternative representations? \\
What are advantages/disadvantages of alternative representations?
\item[b)\hfill]
Why is the item base often not given explicitly?
How can it be obtained?
\item[c)\hfill]
How can absolute and relative (minimum) support
be translated into each other? \\
What additional information is needed for the transformation?
\item[d)\hfill]
In frequent item set mining we consider only whether an item
is present in or absent from a transaction. However, in market
basket data, items have multiplicities (a customer buys 3 cartons
of milk, 5 apples, 2 cans of soup etc.). \\
How can this be handled?
Can it be handled by preprocessing? If yes, how?
\end{enumerate}
%-----------------------------------------------------------------------
\subsubsection*{Exercise 3 \quad\rm Properties of Support}
\begin{enumerate}\itemsep0pt
\item[a)\hfill]
What does it mean that support
is anti-monotone or downward closed? \\
How can this be expressed as a formula?
\item[b)\hfill]
What relation of item set covers corresponds
to the anti-monotonicity of support?
\item[c)\hfill]
What is the apriori property?
How is it stated formally? How does it follow?
What does the contraposition of the apriori property say?
What does it suggest?
\end{enumerate}
%-----------------------------------------------------------------------
\subsubsection*{Exercise 4 \quad\rm Properties of Support}
\begin{enumerate}\itemsep0pt
\item[a)\hfill]
Demonstrate that the support of an item set
and one of its supersets can be equal!
(That is, the support of the superset need
not be actually smaller.)
How can one describe the condition under which this holds?
Can there be multiple different supersets
that have the same support?
\item[b)\hfill]
Let $I$ be an item set and $a$ and $b$ two items
that are not in $I$. \\
Let $s(I) = s(I \cup \{a\}) = s(I \cup \{b\})$.
What is the support of $I \cup \{a,b\}$?
\end{enumerate}
%-----------------------------------------------------------------------
\subsubsection*{Exercise 5 \quad\rm Properties of Support}
Find an example where one can determine from the support of
(several) item sets $I_k$ that $\bigcup_k I_k$ must be frequent
w.r.t. $s_{\min} \ge 3$ without accessing the transaction database
(you may, however, use the size~$n$ of the transaction
database). \\
Do {\em not\/} use the solution/assumptions of Exercise~4b!
%-----------------------------------------------------------------------
\subsubsection*{Exercise 6 \quad\rm Searching for Frequent Patterns}
\begin{enumerate}\itemsep0pt
\item[a)\hfill]
What is the size of the search space
of frequent item set mining? \\
How does it grow with the number of items?
\item[b)\hfill]
Why is an exhaustive search through all possible
item sets not advisable? \\
What is the (asymptotic) time complexity
of an exhaustive search? \\
Is the pruned/reduced search fundamentally more efficient? \\
If yes, in what way? If no, why may it be useful nevertheless?
\end{enumerate}
%-----------------------------------------------------------------------
\subsubsection*{Exercise 7 \quad\rm Searching for Frequent Patterns}
\begin{enumerate}\itemsep0pt
\item[a)\hfill]
How many subsets of size $k-1$ does an item set
with $k$ items have? \\
Why is this a problem for the candidate generation?
\item[b)\hfill]
Show why any frequent item set of size $k+1$ can be formed in
$j = (k+1)k/2$ possible ways with the original version of the
Apriori algorithm!
\item[c)\hfill]
In how many ways can an {\em in\/}frequent item set
of size $k+1$ be generated? \\
Does this number differ from the answer for frequent item sets?
Why?
\end{enumerate}
%-----------------------------------------------------------------------
\subsubsection*{Exercise 8 \quad\rm Searching for Frequent Patterns}
\begin{enumerate}\itemsep0pt
\item[a)\hfill]
What structure of the search space are we exploiting
for the search? \\
Why is this structure advantageous
given the properties of support?
\item[b)\hfill]
What is a Hasse diagram? What does it represent? \\
For what are we using Hasse diagrams?
\item[c)\hfill]
How is the Hasse diagram transformed to improve the search? \\
What is the purpose of this transformation?
\item[d)\hfill]
What do we achieve by assigning unique parent item sets? \\
Could we also assign unique child item sets?
\item[e)\hfill]
Are there any item sets
that always have a unique parent item set, \\
even in the unmodified Hasse diagram?
\end{enumerate}
%-----------------------------------------------------------------------
\end{document}