A method for mining top-rank-k frequent closed itemsets

Mining frequent closed itemsets (FCIs) is important in mining non-redundant (minimal) association rules. Therefore, many algorithms have been developed for mining FCIs with reduced mining time and memory usage. For mining FCIs, algorithms use the minimum support threshold, minSup, to prune itemsets. However, using a fixed minSup is not suitable for mining top-rank-k FCIs. A large threshold will lead to a small number of generated FCIs, leading to insufficient FCIs to query when k is large. On the other hand, a small minSup will generate a huge number of generated FCIs, leading to large runtimes and high memory usage. In this paper, we propose a method for mining top-rank-k FCIs without using a fixed minimum support threshold. A strategy is first used to eliminate 1-items that cannot generate FCIs belonging to top-rank-k FCIs. Next, based on the set of candidate 1-items, we propose TRK-FCI, a DCI-Plus-based algorithm, for mining top-rank-k FCIs. In the process of mining top-rank-k FCIs, TRK-FCI automatically increases minSup according to the mined FCIs, efficiently pruning itemsets that cannot belong to top-rank-k FCIs. We also modify the dynamic bit vector (DBV) structure and apply it to reduce memory usage and runtime in the TRK-FCI-DBV algorithm. Experimental results show that TRK-FCI-DBV is more efficient than TRK-FCI for various databases.

Title: A method for mining top-rank-k frequent closed itemsets
Authors: Nguyen, Loan T. T.
Trinh, Truc
Nguyen Ngoc Thanh
Keywords: DCI-Plus;dynamic bit vectors,frequent closed itemsets;top-rank-k frequent closed itemsets
Issue Date: 2017
Publisher: IOS PRESS, NIEUWE HEMWEG 6B, 1013 BG AMSTERDAM, NETHERLANDS
Citation: ISIKNOWLEDGE
Abstract: Mining frequent closed itemsets (FCIs) is important in mining non-redundant (minimal) association rules. Therefore, many algorithms have been developed for mining FCIs with reduced mining time and memory usage. For mining FCIs, algorithms use the minimum support threshold, minSup, to prune itemsets. However, using a fixed minSup is not suitable for mining top-rank-k FCIs. A large threshold will lead to a small number of generated FCIs, leading to insufficient FCIs to query when k is large. On the other hand, a small minSup will generate a huge number of generated FCIs, leading to large runtimes and high memory usage. In this paper, we propose a method for mining top-rank-k FCIs without using a fixed minimum support threshold. A strategy is first used to eliminate 1-items that cannot generate FCIs belonging to top-rank-k FCIs. Next, based on the set of candidate 1-items, we propose TRK-FCI, a DCI-Plus-based algorithm, for mining top-rank-k FCIs. In the process of mining top-rank-k FCIs, TRK-FCI automatically increases minSup according to the mined FCIs, efficiently pruning itemsets that cannot belong to top-rank-k FCIs. We also modify the dynamic bit vector (DBV) structure and apply it to reduce memory usage and runtime in the TRK-FCI-DBV algorithm. Experimental results show that TRK-FCI-DBV is more efficient than TRK-FCI for various databases.
Description: TNS07012 ; JOURNAL OF INTELLIGENT & FUZZY SYSTEMS Volume: 32 Issue: 2 Pages: 1297-1305 Published: 2017
URI: http://repository.vnu.edu.vn/handle/VNU_123/28816
ISSN: 1064-1246
1875-8967
Appears in Collections:Bài báo của ĐHQGHN trong Web of Science

Nhận xét

Bài đăng phổ biến từ blog này

Đá Silic

Xây dựng mạng lưới quan trắc môi trường nước, bùn đáy tại thượng nguồn hệ thống sông Hồng : Luận văn ThS. Khoa học môi trường và bảo vệ môi trường: 60 44 03

Finding semantic similarity in Vietnamese