“Frequent subtree mining”的意思、由来-开放百科全书

In computer science, frequent subtree mining is the problem of finding all patterns in a given database whose support (a metric related to its number of occurrences in other subtrees) is over a given threshold.^[1] It is a more general form of the maximum agreement subtree problem.^[2]

Definition

Frequent subtree mining is the problem of trying to find all of the patterns whose "support" is over a certain user-specified level, where "support" is calculated as the number of trees in a database which have at least one subtree isomorphic to a given pattern.^[3]

Formal definition

Algorithms

In 2002, Mohammed J. Zaki introduced TreeMiner, an efficient algorithm for solving the frequent subtree mining problem, which used a "scope list" to represent tree nodes and which was contrasted with PatternMatcher, an algorithm based on pattern matching.^[4]

Applications

Domains in which frequent subtree mining is useful tend to involve complex relationships between data entities: for instance, the analysis of XML documents often requires frequent subtree mining.^[1] Another domain where this is useful is the web usage mining problem: since the actions taken by users when visiting a web site can be recorded and categorized in many different ways, complex databases of trees need to be analyzed with frequent subtree mining.^[4] Other domains in which frequent subtree mining is useful include computational biology,^[5]^[6] RNA structure analysis,^[6] pattern recognition,^[11] bioinformatics,^[7] and analysis of the KEGG GLYCAN database.^[8]

Challenges

Checking whether a pattern (or a transaction) supports a given subgraph is an NP-complete problem, since it is an NP-complete instance of the subgraph isomorphism problem.^[9] Furthermore, due to combinatorial explosion, according to Lei et al., "mining all frequent subtree patterns becomes infeasible for a large and dense tree database".^[10]

References

1. ^¹²{{cite journal|last1=Chi|first1=Yun|last2=Muntz|first2=Richard R.|last3=Nijssen|first3=Siegfried|last4=Kok|first4=Joost N.|title=Frequent Subtree Mining - An Overview|journal=Fundamenta Informaticae|date=28 June 2005|volume=66|pages=161–198|url=http://iospress.metapress.com/content/h18hd7r56yjk6jyb/|accessdate=3 July 2014}}
2. ^{{Cite journal|title = EvoMiner: frequent subtree mining in phylogenetic databases|last = Deepak|first = Akshay|date = July 2013|journal = Knowledge and Information Systems|volume = 41|issue = 3|pages = 559–590|doi = 10.1007/s10115-013-0676-0|last2 = Fernández-Baca|first2 = David|last3 = Tirthapura|first3 = Srikanta|last4 = Sanderson|first4 = Michael J.|last5 = McMahon|first5 = Michelle M.|url = http://lib.dr.iastate.edu/cs_techreports/11}}
3. ^Dai, H., Srikant, R. and Zhang, C. (2004). "[https://link.springer.com/book/10.1007%2Fb97861 Advances in Knowledge Discovery and Data Mining.]" 8th Pacific-Asia Conference, PAKDD 2004, Sydney, Australia, May 26–28, 2004, Proceedings. 1st ed. p. 65.
4. ^¹{{Cite book|url = //dl.acm.org/citation.cfm?id=775058|title = Efficiently mining frequent trees in a forest|last = Zaki|first = Mohammed J.|journal = Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining|accessdate = 16 June 2014|year = 2002|pages = 71–80|doi = 10.1145/775047.775058|isbn = 978-1581135671}}
5. ^Deepak, Akshay, David Fernández-Baca, Srikanta Tirthapura, Michael J. Sanderson, and Michelle M. McMahon. "[https://link.springer.com/article/10.1007/s10115-013-0676-0 EvoMiner: frequent subtree mining in phylogenetic databases]." Knowledge and Information Systems (2011): 1-32.
6. ^¹Chi, Yun, Yirong Yang, and Richard R. Muntz. "[https://link.springer.com/article/10.1007/s10115-004-0180-7 Canonical forms for labelled trees and their applications in frequent subtree mining]." Knowledge and Information Systems 8, no. 2 (2005): 203–234.
7. ^{{Cite conference |last1=Xiao |first1=Yongqiao |last2=Yao |first2=Jenq-Foung |last3=Li |first3=Zhigang |last4=Dunham |first4=Margaret H. |date=2003 |title=Efficient data mining for maximal frequent subtrees |book-title=Third IEEE International Conference on Data Mining |conference=ICDM 2003 |pages=379–386 |doi=10.1109/ICDM.2003.1250943}}
8. ^{{Cite book|title = Glycome Informatics: Methods and Applications|last = Aoki-Kinoshita|first = Kiyoko F.|publisher = CRC Press|year = 2009|isbn = 9781420083347|location = |pages = 141}}
9. ^¹{{Cite journal|url = http://www.cs.ucla.edu/tech-report/2003-reports/030043.pdf|title = Mining Frequent Rooted Trees and Free Trees Using Canonical Forms|last = Chi|first = Yun|journal = Knowledge and Information Systems|accessdate = 16 June 2014|year = 2004|last2 = Yang|first2 = Yirong|last3 = Muntz|first3 = Richard R.}}
10. ^{{cite conference |last1=Zou |first1=Lei |last2=Lu |first2=Yansheng |last3=Zhang |first3=Huaming |last4=Hu |first4=Rong |date=2006 |title=Mining Frequent Induced Subtree Patterns with Subtree-Constraint |book-title=Sixth IEEE International Conference on Data Mining Workshops |conference=ICDM Workshops 2006 |pages=3–7 |doi=10.1109/ICDMW.2006.112}}