Hiding Co-Occurring Frequent Itemsets
Authors
- Osman Abul, TOBB University of Economics and Technology, Ankara (Turkey)
Abstract
Knowledge hiding, that is, hiding rules or patterns that can be
inferred from published data and are deemed sensitive, has been
studied extensively in the context of frequent itemset and association
rule mining from transactional data. Research in this thread has focused
mainly on developing sophisticated methods that achieve less
distortion in data quality. In this work, we extend frequent itemset
hiding to the co-occurring frequent itemset hiding problem. Co-occurring
frequent itemsets are those itemsets that co-exist in the
output of frequent itemset mining. What distinguishes this problem from
classical frequent itemset hiding is the new definition of sensitivity:
a set of itemsets is sensitive if its itemsets all appear together in
the frequent itemset mining results. In other words, co-occurrence is defined with
reference to the mining results rather than the raw input dataset, and
thus it is a kind of meta-knowledge. Our notion of co-occurrence also
differs from association rules: the itemsets in an association rule
must be frequently present in the same set of transactions, whereas
co-occurrence does not require joint occurrence in the same set of
transactions.
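To make the sensitivity definition concrete, the following is a minimal Python sketch rather than the paper's implementation. The function names, the brute-force miner, the toy transactions, and the absolute support threshold are all assumptions chosen for illustration. It checks whether every itemset of a sensitive set appears in the mining output, and shows that this can hold even when the itemsets never occur jointly in any transaction.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner (illustrative only): map each frequent itemset to its support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
    return result


def co_occurs(sensitive_set, frequent):
    """A set of itemsets co-occurs if every member is in the mining output."""
    return all(frozenset(itemset) in frequent for itemset in sensitive_set)


# Hypothetical toy data: {a, b} and {c, d} never share a transaction.
transactions = [frozenset(t) for t in (
    {"a", "b"}, {"a", "b"}, {"c", "d"}, {"c", "d"}, {"a", "c"},
)]
frequent = frequent_itemsets(transactions, min_support=2)

sensitive = [{"a", "b"}, {"c", "d"}]
exposed = co_occurs(sensitive, frequent)

# Joint support of the union, as an association rule would require.
joint_support = sum(1 for t in transactions if {"a", "b", "c", "d"} <= t)
```

Here `exposed` is true because both itemsets are individually frequent, while `joint_support` is zero, which is the sense in which co-occurrence refers to the mining results and not to the transactions themselves.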
In this paper, we briefly review the frequent itemset and association
rule hiding problems and define co-occurrence hiding along with
its real-world motivations. We explore its fundamental properties
and show that frequent itemset hiding is a special case of
co-occurring frequent itemset hiding. As a solution, we propose
a two-stage sanitization framework, essentially a reduction, where
an instance of frequent itemset hiding is constructed in the first
stage and that instance is solved in the second stage. Since the task
is shown to be NP-hard and the reduction is one-to-many, we propose
heuristics only for the first stage, as the second stage is already
well studied. Finally, an experimental evaluation is carried out
on a couple of datasets, and the results are presented.
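The two-stage idea can be sketched as follows, under stated assumptions. This is one plausible reading of the reduction, not the heuristics proposed in the paper: stage one assumes that making any single member of a sensitive set infrequent is enough to break its co-occurrence, and picks the member with the lowest support; stage two is a deliberately naive stand-in for an existing frequent itemset hiding algorithm. All function names and the toy data are hypothetical.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def build_hiding_instance(sensitive_sets, transactions):
    """Stage 1 (assumed heuristic): per sensitive set, choose its lowest-support member."""
    return {
        min((frozenset(i) for i in s), key=lambda i: support(i, transactions))
        for s in sensitive_sets
    }


def hide_itemsets(to_hide, transactions, min_support):
    """Stage 2 (naive stand-in): delete one item from supporting
    transactions until each chosen itemset becomes infrequent."""
    sanitized = [set(t) for t in transactions]
    for itemset in to_hide:
        victim = min(itemset)  # arbitrary but deterministic item choice
        for t in sanitized:
            if support(itemset, sanitized) < min_support:
                break
            if itemset <= t:
                t.discard(victim)
    return [frozenset(t) for t in sanitized]


# Hypothetical toy data and threshold.
transactions = [frozenset(t) for t in (
    {"a", "b"}, {"a", "b"}, {"a", "b"}, {"c", "d"}, {"c", "d"}, {"a", "c"},
)]
min_support = 2
sensitive_sets = [[{"a", "b"}, {"c", "d"}]]

to_hide = build_hiding_instance(sensitive_sets, transactions)
sanitized = hide_itemsets(to_hide, transactions, min_support)
```

In this toy run, stage one selects {c, d}, and stage two alters a single transaction, after which {c, d} is infrequent while {a, b} remains frequent. A sensitive set with one member makes stage one trivial, which matches the observation that classical frequent itemset hiding is a special case.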
