词条 | Frequent subtree mining |
释义 |
In computer science, frequent subtree mining is the problem of finding all patterns in a given database whose support (a metric related to its number of occurrences in other subtrees) is over a given threshold.[1] It is a more general form of the maximum agreement subtree problem.[2] DefinitionFrequent subtree mining is the problem of trying to find all of the patterns whose "support" is over a certain user-specified level, where "support" is calculated as the number of trees in a database which have at least one subtree isomorphic to a given pattern.[3] Formal definitionThe problem of frequent subtree mining has been formally defined as:[1] Given a threshold minfreq, a class of trees , a transitive subtree relation between trees , a finite set of trees , the frequent subtree mining problem is the problem of finding all trees such that no two trees in are isomorphic and where {{mvar|d}} is an anti-monotone function such that if then AlgorithmsIn 2002, Mohammed J. Zaki introduced TreeMiner, an efficient algorithm for solving the frequent subtree mining problem, which used a "scope list" to represent tree nodes and which was contrasted with PatternMatcher, an algorithm based on pattern matching.[4] ApplicationsDomains in which frequent subtree mining is useful tend to involve complex relationships between data entities: for instance, the analysis of XML documents often requires frequent subtree mining.[1] Another domain where this is useful is the web usage mining problem: since the actions taken by users when visiting a web site can be recorded and categorized in many different ways, complex databases of trees need to be analyzed with frequent subtree mining.[4] Other domains in which frequent subtree mining is useful include computational biology,[5][6] RNA structure analysis,[6] pattern recognition,[11] bioinformatics,[7] and analysis of the KEGG GLYCAN database.[8] ChallengesChecking whether a pattern (or a transaction) supports a given subgraph is an NP-complete problem, since it is an NP-complete instance of the subgraph isomorphism problem.[9] Furthermore, due to combinatorial explosion, according to Lei et al., "mining all frequent subtree patterns becomes infeasible for a large and dense tree database".[10] References1. ^1 2 {{cite journal|last1=Chi|first1=Yun|last2=Muntz|first2=Richard R.|last3=Nijssen|first3=Siegfried|last4=Kok|first4=Joost N.|title=Frequent Subtree Mining - An Overview|journal=Fundamenta Informaticae|date=28 June 2005|volume=66|pages=161–198|url=http://iospress.metapress.com/content/h18hd7r56yjk6jyb/|accessdate=3 July 2014}} 2. ^{{Cite journal|title = EvoMiner: frequent subtree mining in phylogenetic databases|last = Deepak|first = Akshay|date = July 2013|journal = Knowledge and Information Systems|volume = 41|issue = 3|pages = 559–590|doi = 10.1007/s10115-013-0676-0|last2 = Fernández-Baca|first2 = David|last3 = Tirthapura|first3 = Srikanta|last4 = Sanderson|first4 = Michael J.|last5 = McMahon|first5 = Michelle M.|url = http://lib.dr.iastate.edu/cs_techreports/11}} 3. ^Dai, H., Srikant, R. and Zhang, C. (2004). "[https://link.springer.com/book/10.1007%2Fb97861 Advances in Knowledge Discovery and Data Mining.]" 8th Pacific-Asia Conference, PAKDD 2004, Sydney, Australia, May 26–28, 2004, Proceedings. 1st ed. p. 65. 4. ^1 {{Cite book|url = //dl.acm.org/citation.cfm?id=775058|title = Efficiently mining frequent trees in a forest|last = Zaki|first = Mohammed J.|journal = Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining|accessdate = 16 June 2014|year = 2002|pages = 71–80|doi = 10.1145/775047.775058|isbn = 978-1581135671}} 5. ^Deepak, Akshay, David Fernández-Baca, Srikanta Tirthapura, Michael J. Sanderson, and Michelle M. McMahon. "[https://link.springer.com/article/10.1007/s10115-013-0676-0 EvoMiner: frequent subtree mining in phylogenetic databases]." Knowledge and Information Systems (2011): 1-32. 6. ^1 Chi, Yun, Yirong Yang, and Richard R. Muntz. "[https://link.springer.com/article/10.1007/s10115-004-0180-7 Canonical forms for labelled trees and their applications in frequent subtree mining]." Knowledge and Information Systems 8, no. 2 (2005): 203–234. 7. ^{{Cite conference |last1=Xiao |first1=Yongqiao |last2=Yao |first2=Jenq-Foung |last3=Li |first3=Zhigang |last4=Dunham |first4=Margaret H. |date=2003 |title=Efficient data mining for maximal frequent subtrees |book-title=Third IEEE International Conference on Data Mining |conference=ICDM 2003 |pages=379–386 |doi=10.1109/ICDM.2003.1250943}} 8. ^{{Cite book|title = Glycome Informatics: Methods and Applications|last = Aoki-Kinoshita|first = Kiyoko F.|publisher = CRC Press|year = 2009|isbn = 9781420083347|location = |pages = 141}} 9. ^1 {{Cite journal|url = http://www.cs.ucla.edu/tech-report/2003-reports/030043.pdf|title = Mining Frequent Rooted Trees and Free Trees Using Canonical Forms|last = Chi|first = Yun|journal = Knowledge and Information Systems|accessdate = 16 June 2014|year = 2004|last2 = Yang|first2 = Yirong|last3 = Muntz|first3 = Richard R.}} 10. ^{{cite conference |last1=Zou |first1=Lei |last2=Lu |first2=Yansheng |last3=Zhang |first3=Huaming |last4=Hu |first4=Rong |date=2006 |title=Mining Frequent Induced Subtree Patterns with Subtree-Constraint |book-title=Sixth IEEE International Conference on Data Mining Workshops |conference=ICDM Workshops 2006 |pages=3–7 |doi=10.1109/ICDMW.2006.112}} 1 : Computational problems in graph theory |
随便看 |
|
开放百科全书收录14589846条英语、德语、日语等多语种百科知识,基本涵盖了大多数领域的百科知识,是一部内容自由、开放的电子版国际百科全书。