\documentclass[12pt,a4paper]{article}
\pagestyle{empty}
\oddsidemargin 2.1mm
\textwidth 155mm
\topmargin -15mm
\textheight 240mm
\begin{document}
%-----------------------------------------------------------------------
\noindent
{\bf Data Mining 2: Frequent Pattern Mining} \hfill Summer 2018 \\
Christian Borgelt \hfill from 25.06.2018
%-----------------------------------------------------------------------
\vskip3ex
\centerline{\bf Exercise Sheet 8}
%-----------------------------------------------------------------------
\subsubsection*{Exercise 28 \quad\rm Closed and Maximal Item Sets}
\begin{enumerate}\itemsep0pt
\item[a)\hfill]
Suppose we tried to find all closed (or all maximal) (frequent)
item sets by first finding all frequent item sets and then
filtering the result according to the defining conditions for
closed (or maximal) item sets. Is this a good idea? Justify
your answer!
\item[b)\hfill]
What do the notions {\em head\/} and {\em tail\/} of a search
tree node (or a subproblem) mean? Is any of them related to
the {\em prefix\/} of a subproblem?
\item[c)\hfill]
Given that we use a global item order, is there a (fixed)
relationship between the items in the head and those in the tail?
\item[d)\hfill]
Does the relation {\em head\/} $\cup$ {\em tail\/} ${}= B$ hold
for all search tree nodes? If no, are there any search tree nodes
for which this holds or any specific conditions that must be
satisfied so that this holds?
\end{enumerate}
%-----------------------------------------------------------------------
\subsubsection*{Exercise 29 \quad\rm Excluded Items}
\begin{enumerate}\itemsep0pt
\item[a)\hfill]
What is the set of {\em excluded items\/} (relative to a search
tree node or subproblem)? How can one characterize this set
based on the subproblem splits we carry out in the
divide-and-conquer algorithm? What could be called, in analogy,
the set of {\em included items}?
\item[b)\hfill]
Suppose the divide-and-conquer scheme
for finding frequent item sets \\
is extended with perfect extension pruning, \\
so that subproblems are described by triplets $S = (T,P,X)$. \\
Is $P \cup X$ always a closed item set?
Is it always a maximal item set? \\
Justify your answer!
\item[c)\hfill]
Is the following statement true? If the tail of a search tree
node is empty, its head is a maximal item set.
Justify your answer!
\item[d)\hfill]
Is the following statement true? If no item in the tail of a
search tree node has the same support as the head, the head
is a closed item set.
Justify your answer!
\end{enumerate}
%-----------------------------------------------------------------------
\subsubsection*{Exercise 30 \quad\rm Closed and Maximal Item Sets}
\begin{enumerate}\itemsep0pt
\item[a)\hfill]
How can one check the defining condition of closed item sets
with a {\em horizontal\/} transaction (database) representation?
\item[b)\hfill]
How can one check the defining condition of closed item sets
with a {\em vertical\/} transaction (database) representation?
\item[c)\hfill]
What is an alternative to checking the defining condition? \\
How is the check performed in this alternative approach? \\
What are possibilities (e.g.\ data structures) to implement
this alternative?
\end{enumerate}
%-----------------------------------------------------------------------
\subsubsection*{Exercise 31 \quad\rm Closed and Maximal Item Sets:
Pruning}
\begin{enumerate}\itemsep0pt
\item[a)\hfill]
In the preceding exercises we considered how one can ascertain
whether an item set is closed (or maximal) and thus how we can
avoid reporting non-closed (or non-maximal) item sets. Is this
simple filtering the only advantage? Or can we simplify/prune
the search with the help of these methods as well? If yes, how?
\item[b)\hfill]
How is perfect extension pruning applied in the search for
closed item sets? \\ Why is it guaranteed that we do not lose
closed item sets due to such pruning?
\item[c)\hfill]
What is meant by ``head union tail pruning'' for finding
maximal item sets \\ (two variants)?
\end{enumerate}
%-----------------------------------------------------------------------
\subsubsection*{Additional Exercise \quad\rm Alternatives to Support}
\begin{enumerate}\itemsep0pt
\item[a)\hfill]
Find some measure(s) other than support
on the partial order $(2^B,\subseteq)$ \\
that is/are anti-monotone (or downward closed)!
\item[b)\hfill]
We define the {\em carrier\/}~$L_T(I)$ of an item set~$I$
w.r.t.\ a transaction database~$T$ as
\begin{eqnarray*}
L_T(I) & = & \{ k \in \{1,\ldots,n\} \mid
I \cap t_k \neq \emptyset \} \\
& = & \{ k \in \{1,\ldots,n\} \mid
\exists i \in I\!: i \in t_k \} \\
& = & \textstyle \bigcup_{i \in I} K_T(\{i\}).
\end{eqnarray*}
The {\em extent\/}~$r_T(I)$ of an item set~$I$ w.r.t.\ a
transaction database~$T$ is the size of its carrier,
that is, $r_T(I) = |L_T(I)|$. Is the extent anti-monotone
(or downward closed)? If not, what behavior does it exhibit?
\item[c)\hfill]
Could one combine support and extent to obtain a more meaningful
measure for the association of the items in an item set?
(Hint: Look up the measure called {\em Jaccard index\/}
and consider how it could be generalized.)
\end{enumerate}
%-----------------------------------------------------------------------
\end{document}