Classifier chains

Contents

  1. Problem transformation

  2. Method description

  3. References

  4. External links

Classifier chains is a machine learning method for problem transformation in multi-label classification. It combines the computational efficiency of the Binary Relevance method with the ability to take label dependencies into account during classification.[1]

Problem transformation

Problem transformation methods transform a multi-label classification problem into one or more single-label classification problems.[2] In this way, existing single-label classification algorithms such as SVM and Naive Bayes can be used without modification.

Several problem transformation methods exist. One of them is the Binary Relevance method (BR). Given a set of labels L and a data set with instances of the form (x, Y), where x is a feature vector and Y ⊆ L is the set of labels assigned to the instance, BR transforms the data set into |L| data sets and learns one binary classifier for each label λ_j ∈ L. During this process the information about dependencies between labels is not preserved. This can lead to a situation where a set of labels is assigned to an instance although these labels never co-occur in the data set. Thus, information about label co-occurrence can help to assign correct label combinations, and losing it can in some cases decrease classification performance.[3]
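The BR transformation described above can be sketched in a few lines. The nearest-centroid base learner below is a toy stand-in assumption; any single-label learner such as SVM or Naive Bayes would take its place:

```python
# Minimal sketch of the Binary Relevance (BR) transformation.
# NearestCentroid is a toy binary base learner standing in for
# any single-label classifier (SVM, Naive Bayes, ...).

def _centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

class NearestCentroid:
    def fit(self, X, y):
        # one centroid per class value (assumes both 0 and 1 occur in y)
        self.c = {t: _centroid([x for x, s in zip(X, y) if s == t])
                  for t in (0, 1)}
        return self

    def predict_one(self, x):
        d = {t: sum((a - b) ** 2 for a, b in zip(x, c))
             for t, c in self.c.items()}
        return min(d, key=d.get)

class BinaryRelevance:
    """Learn |L| independent binary classifiers, one per label."""

    def fit(self, X, Y, n_labels):
        self.models = []
        for j in range(n_labels):
            # j-th transformed data set: is label j present or not?
            yj = [1 if j in labels else 0 for labels in Y]
            self.models.append(NearestCentroid().fit(X, yj))
        return self

    def predict_one(self, x):
        return {j for j, m in enumerate(self.models) if m.predict_one(x) == 1}
```

Note that each of the |L| classifiers is trained and queried in isolation, which is exactly why label dependencies are lost.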

Another approach, which takes label correlations into account, is the Label Powerset method (LP). Each distinct combination of labels in a data set is treated as a single label. After the transformation a single-label classifier h: X → P(L) is trained, where P(L) is the power set of all labels in L. The main drawback of this approach is that the number of label combinations grows exponentially with the number of labels. For example, a multi-label data set with 10 labels can have up to 2^10 = 1024 label combinations. This increases the run-time of classification.
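The LP transformation can likewise be sketched briefly. Here the multi-class learner over the atomic combination classes is again a hypothetical nearest-centroid stand-in:

```python
# Sketch of the Label Powerset (LP) transformation: every distinct
# label combination seen in training becomes one atomic class.
# Nearest-centroid classification over those classes is a toy
# stand-in for any multi-class learner.

class LabelPowerset:
    def fit(self, X, Y):
        groups = {}
        for x, labels in zip(X, Y):
            # frozenset of labels acts as the single "powerset" class
            groups.setdefault(frozenset(labels), []).append(x)
        # one centroid per observed label combination
        self.centroids = {c: [sum(col) / len(rows) for col in zip(*rows)]
                          for c, rows in groups.items()}
        return self

    def predict_one(self, x):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        # predict the nearest combination, then unpack it into a label set
        return set(min(self.centroids, key=dist))
```

Because only combinations seen in training become classes, LP can never predict an unseen label combination, and the number of classes can approach 2^|L|.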

The Classifier Chains method is based on the BR method and is efficient even for a large number of labels. Furthermore, it considers dependencies between labels.

Method description

For a given set of labels L, the Classifier Chain model (CC) learns |L| classifiers, as in the Binary Relevance method. All classifiers are linked in a chain through the feature space.

Given a data set in which the i-th instance has the form (x_i, Y_i), where Y_i ⊆ L is a subset of labels and x_i is a feature vector, the data set is transformed into |L| data sets, where instances of the j-th data set have the form ((x_i, y_1, ..., y_{j-1}), y_j). Here y_k is 1 if the k-th label was assigned to the instance, and 0 otherwise. Thus, the |L| classifiers form a chain in which each one learns binary classification of a single label, and the features given to each classifier are extended with binary values that indicate which of the previous labels were assigned to the instance.

When classifying new instances, the labels are again predicted by traversing the chain of classifiers. Classification begins with the first classifier and proceeds to the last one, passing label information between classifiers through the feature space. In this way inter-label dependencies are preserved. However, the result can vary with the order of the chain: for example, if one label often co-occurs with another, only the label that comes later in the chain order receives information about the other one in its feature vector. To mitigate this problem and increase accuracy, an ensemble of classifiers can be used.[4]
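The training and prediction steps above can be sketched together. As before, the nearest-centroid base learner is a toy assumption standing in for any binary classifier:

```python
# Sketch of a Classifier Chain (CC): classifier j sees the original
# features plus the values of labels 1..j-1 (true values at training
# time, predicted values at prediction time).

def _centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

class _Binary:
    # toy nearest-centroid binary learner (assumes both classes occur)
    def fit(self, X, y):
        self.c = {t: _centroid([x for x, s in zip(X, y) if s == t])
                  for t in (0, 1)}
        return self

    def predict_one(self, x):
        d = {t: sum((a - b) ** 2 for a, b in zip(x, c))
             for t, c in self.c.items()}
        return min(d, key=d.get)

class ClassifierChain:
    def fit(self, X, Y, n_labels):
        self.models = []
        Xj = [list(x) for x in X]  # working copies of the feature vectors
        for j in range(n_labels):
            yj = [1 if j in labels else 0 for labels in Y]
            self.models.append(_Binary().fit(Xj, yj))
            # extend features with the true value of label j
            # for the next link in the chain
            for row, v in zip(Xj, yj):
                row.append(v)
        return self

    def predict_one(self, x):
        row, out = list(x), set()
        for j, m in enumerate(self.models):
            v = m.predict_one(row)
            if v == 1:
                out.add(j)
            row.append(v)  # pass the prediction down the chain
        return out
```

The chain order here is simply label index order; as the text notes, a different order can change the predictions.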

In an Ensemble of Classifier Chains (ECC), several CC classifiers are trained with random chain orders (i.e. random orders of labels), each on a random subset of the data set. Labels of a new instance are predicted by each classifier separately. After that, the total number of predictions, or "votes", is counted for each label, and a label is accepted if it was predicted by a fraction of the classifiers larger than some threshold value.
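The voting step of this scheme can be sketched in isolation (chain training itself proceeds as in plain CC). The function name and the 0.5 default threshold are illustrative assumptions:

```python
# Sketch of the ECC voting step: given the label sets predicted by m
# independently trained chains for one instance, accept each label
# whose vote share exceeds a threshold.

def ecc_vote(predictions, n_labels, threshold=0.5):
    m = len(predictions)
    votes = [sum(1 for p in predictions if j in p) for j in range(n_labels)]
    return {j for j, v in enumerate(votes) if v / m > threshold}
```

With three chains predicting {0, 2}, {0}, and {0, 1}, only label 0 clears a 0.5 threshold, so the ensemble outputs {0}.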

Another extension of CC, related to ECC, is Monte Carlo CC (MCC),[5] which employs Monte Carlo methods to find a good chain order and to perform efficient inference. Other variants of CC, using different random search methods or considering different dependence structures among classifiers, have been proposed in the literature.[6][7][8]

References

1. Read, Jesse; Pfahringer, Bernhard; Holmes, Geoff; Frank, Eibe (2009). "Classifier Chains for Multi-label Classification". Proc. 13th European Conference on Principles and Practice of Knowledge Discovery in Databases and 20th European Conference on Machine Learning. http://www.cs.waikato.ac.nz/~ml/publications/2009/chains.pdf
2. Tsoumakas, Grigorios; Katakis, Ioannis (2007). "Multi-label classification: An overview". Int J Data Warehousing and Mining 2007 (3): 1–13. doi:10.4018/jdwm.2007070101. http://lpis.csd.auth.gr/publications/tsoumakas-ijdwm.pdf
3. Dembczynski, Krzysztof; Waegeman, Willem; Cheng, Weiwei; Hüllermeier, Eyke (2010). "On label dependence in multi-label classification". Workshop Proceedings of Learning from Multi-Label Data: 5–12. http://www.mathematik.uni-marburg.de/~eyke/publications/mld10.pdf
4. Rokach, Lior (2010). "Ensemble-based classifiers". Artificial Intelligence Review 33 (1–2): 1–39. doi:10.1007/s10462-009-9124-7. http://www.ise.bgu.ac.il/faculty/liorr/AI.pdf
5. Read, Jesse; Martino, Luca; Luengo, David (2014). "Efficient Monte Carlo methods for multi-dimensional learning with classifier chains". Pattern Recognition 47 (3): 1535–1546. doi:10.1016/j.patcog.2013.10.006. arXiv:1211.2190. http://www.sciencedirect.com/science/article/pii/S0031320313004160
6. Gonçalves, Eduardo C.; Plastino, Alexandre; Freitas, Alex A. (2015). "Simpler is Better: A Novel Genetic Algorithm to Induce Compact Multi-label Chain Classifiers". Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation (GECCO '15): 559–566. ACM. doi:10.1145/2739480.2754650
7. Read, Jesse; Martino, Luca; Olmos, Pablo M.; Luengo, David (2015). "Scalable multi-output label prediction: From classifier chains to classifier trellises". Pattern Recognition 48 (6): 2096–2109. doi:10.1016/j.patcog.2015.01.004. arXiv:1501.04870. http://www.sciencedirect.com/science/article/pii/S0031320315000084
8. Soufan, Othman; Ba-Alawi, Wail; Afeef, Moataz; Essack, Magbubah; Kalnis, Panos; Bajic, Vladimir B. (2016). "DRABAL: novel method to mine large high-throughput screening assays using Bayesian active learning". Journal of Cheminformatics 8: 64. doi:10.1186/s13321-016-0177-8

External links

  • Better Classifier Chains for Multi-label Classification (dead link as of August 2017) – presentation on Classifier Chains by Jesse Read and Fernando Pérez Cruz
  • MEKA – open-source implementation of methods for multi-label classification, including Classifier Chains
  • Mulan – open-source Java library for multi-label learning, includes an implementation of Classifier Chains
