Manifold regularization

In machine learning, manifold regularization is a technique for using the shape of a dataset to constrain the functions that should be learned on that dataset. In many machine learning problems, the data to be learned do not cover the entire input space. For example, a facial recognition system may not need to classify any possible image, but only the subset of images that contain faces. The technique of manifold learning assumes that the relevant subset of data comes from a manifold, a mathematical structure with useful properties. The technique also assumes that the function to be learned is smooth: data with different labels are not likely to be close together, and so the labeling function should not change quickly in areas where there are likely to be many data points. Because of this assumption, a manifold regularization algorithm can use unlabeled data to inform where the learned function is allowed to change quickly and where it is not, using an extension of the technique of Tikhonov regularization. Manifold regularization algorithms can extend supervised learning algorithms in semi-supervised learning and transductive learning settings, where unlabeled data are available. The technique has been used for applications including medical imaging, geographical imaging, and object recognition.

Manifold regularizer

Motivation

Manifold regularization is a type of regularization, a family of techniques that reduces overfitting and ensures that a problem is well-posed by penalizing complex solutions. In particular, manifold regularization extends the technique of Tikhonov regularization as applied to reproducing kernel Hilbert spaces (RKHSs). Under standard Tikhonov regularization on RKHSs, a learning algorithm attempts to learn a function $f$ from among a hypothesis space of functions $\mathcal{H}$. The hypothesis space is an RKHS, meaning that it is associated with a kernel $K$, and so every candidate function $f$ has a norm $\|f\|_K$, which represents the complexity of the candidate function in the hypothesis space. When the algorithm considers a candidate function, it takes its norm into account in order to penalize complex functions.

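Formally, given a set of labeled training data $(x_1, y_1), \ldots, (x_\ell, y_\ell)$ with $x_i \in X$, $y_i \in Y$ and a loss function $V$, a learning algorithm using Tikhonov regularization will attempt to solve the expression

$$\underset{f \in \mathcal{H}}{\arg\min} \; \frac{1}{\ell} \sum_{i=1}^{\ell} V(f(x_i), y_i) + \gamma \|f\|_K^2$$

where $\gamma$ is a hyperparameter that controls how much the algorithm will prefer simpler functions to functions that fit the data better.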

Manifold regularization adds a second regularization term, the intrinsic regularizer, to the ambient regularizer used in standard Tikhonov regularization. Under the manifold assumption in machine learning, the data in question do not come from the entire input space $X$, but instead from a nonlinear manifold $M \subset X$. The geometry of this manifold, the intrinsic space, is used to determine the regularization norm.^[1]

Laplacian norm

There are many possible choices for the intrinsic regularizer $\|f\|_I$. Many natural choices involve the gradient on the manifold $\nabla_M$, which can provide a measure of how smooth a target function is. A smooth function should change slowly where the input data are dense; that is, the gradient $\nabla_M f(x)$ should be small where the marginal probability density $\mathcal{P}_X(x)$, the probability density of a randomly drawn data point appearing at $x$, is large. This gives one appropriate choice for the intrinsic regularizer:

$$\|f\|_I^2 = \int_{x \in M} \|\nabla_M f(x)\|^2 \, d\mathcal{P}_X(x)$$

In practice, this norm cannot be computed directly because the marginal distribution $\mathcal{P}_X$ is unknown, but it can be estimated from the provided data. In particular, if the input points are interpreted as the vertices of a graph whose edge weights reflect how close the points are, then the Laplacian matrix of the graph can help to estimate the marginal distribution. Suppose that the input data include $\ell$ labeled examples (pairs of an input $x$ and a label $y$) and $u$ unlabeled examples (inputs without associated labels). Define $W$ to be a matrix of edge weights for a graph, where $W_{ij}$ is a similarity weight derived from the distance between the data points $x_i$ and $x_j$ (larger for closer points). Define $D$ to be a diagonal matrix with $D_{ii} = \sum_{j=1}^{\ell+u} W_{ij}$ and $L$ to be the Laplacian matrix $D - W$. Then, as the number of data points $\ell + u$ increases, $L$ converges to the Laplace-Beltrami operator $\Delta_M$, which is the divergence of the gradient $\nabla_M$.^[2]^[3] Then, if $\mathbf{f}$ is a vector of the values of $f$ at the data, $\mathbf{f} = [f(x_1), \ldots, f(x_{\ell+u})]^{\mathrm{T}}$, the intrinsic norm can be estimated:

$$\|f\|_I^2 = \frac{1}{(\ell+u)^2} \mathbf{f}^{\mathrm{T}} L \mathbf{f}$$

As the number of data points $\ell + u$ increases, this empirical definition of $\|f\|_I^2$ converges to the definition when $\mathcal{P}_X$ is known.^[1]
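The graph-based estimate described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the heat-kernel form of the weights, the bandwidth `t`, the circle-shaped toy data, and all function names are assumptions made for the example. It builds $W$, $D$, and $L = D - W$ and evaluates $\mathbf{f}^{\mathrm{T}} L \mathbf{f} / (\ell+u)^2$ for a slowly varying and a rapidly varying function.

```python
import numpy as np

def heat_kernel_weights(X, t=1.0):
    """Edge weights W_ij = exp(-||x_i - x_j||^2 / (4 t)), with a zero diagonal.
    The heat-kernel form and the bandwidth t are assumptions of this sketch."""
    diffs = X[:, None, :] - X[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    W = np.exp(-sq_dists / (4.0 * t))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W, where D_ii = sum_j W_ij."""
    return np.diag(W.sum(axis=1)) - W

def intrinsic_norm_sq(f_vals, L):
    """Empirical intrinsic regularizer f^T L f / (l + u)^2."""
    n = f_vals.shape[0]
    return float(f_vals @ L @ f_vals) / n ** 2

# Toy data: 50 points on the unit circle, a one-dimensional manifold in R^2.
n_points = 50
theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
L = graph_laplacian(heat_kernel_weights(X, t=0.05))

smooth_f = np.cos(theta)  # varies slowly along the circle
rough_f = np.where(np.arange(n_points) % 2 == 0, 1.0, -1.0)  # flips between neighbours

smooth_penalty = intrinsic_norm_sq(smooth_f, L)
rough_penalty = intrinsic_norm_sq(rough_f, L)
print(smooth_penalty < rough_penalty)
```

Because $\mathbf{f}^{\mathrm{T}} L \mathbf{f} = \tfrac{1}{2} \sum_{i,j} W_{ij} (f_i - f_j)^2$, the penalty grows when strongly connected points receive very different function values, which is the behaviour the intrinsic regularizer is meant to discourage.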

Solving the regularization problem

Using the weights $\gamma_A$ and $\gamma_I$ for the ambient and intrinsic regularizers, the final expression to be solved becomes:

$$\underset{f \in \mathcal{H}}{\arg\min} \; \frac{1}{\ell} \sum_{i=1}^{\ell} V(f(x_i), y_i) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(\ell+u)^2} \mathbf{f}^{\mathrm{T}} L \mathbf{f}$$

As with other kernel methods, $\mathcal{H}$ may be an infinite-dimensional space, so if the regularization expression cannot be solved explicitly, it is impossible to search the entire space for a solution. Instead, a representer theorem shows that under certain conditions on the choice of the norm $\|f\|_I$, the optimal solution $f^*$ must be a linear combination of the kernel centered at each of the input points: for some weights $\alpha_i$,

$$f^*(x) = \sum_{i=1}^{\ell+u} \alpha_i K(x_i, x)$$

Using this result, it is possible to search for the optimal solution $f^*$ by searching the finite-dimensional space defined by the possible choices of $\alpha_i$.^[1]
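To make the finite-dimensional form concrete, the following sketch evaluates a function of the form $f(x) = \sum_i \alpha_i K(x_i, x)$ at new points and computes its ambient norm, which for such an expansion is $\|f\|_K^2 = \alpha^{\mathrm{T}} K \alpha$. The Gaussian (RBF) kernel, the bandwidth `sigma`, the toy points, and the function names are assumptions chosen for illustration only.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def evaluate_expansion(alpha, X_train, X_new, sigma=1.0):
    """Evaluate f(x) = sum_i alpha_i K(x_i, x) at every row x of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha

def rkhs_norm_sq(alpha, X_train, sigma=1.0):
    """Ambient norm ||f||_K^2 = alpha^T K alpha of the kernel expansion."""
    K = rbf_kernel(X_train, X_train, sigma)
    return float(alpha @ K @ alpha)

# Three one-dimensional input points and hypothetical coefficients.
X_train = np.array([[0.0], [1.0], [2.0]])
alpha = np.array([1.0, 0.0, -1.0])

values = evaluate_expansion(alpha, X_train, np.array([[0.0], [1.0]]))
norm_sq = rkhs_norm_sq(alpha, X_train)
print(values, norm_sq)
```

With these coefficients the function is antisymmetric about $x = 1$, so its value there is zero, while its value at $x = 0$ is $1 - e^{-2}$.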

Applications

Manifold regularization can extend a variety of algorithms that can be expressed using Tikhonov regularization, by choosing an appropriate loss function $V$ and hypothesis space $\mathcal{H}$. Two commonly used examples are the families of support vector machines and regularized least squares algorithms. (Regularized least squares includes the ridge regression algorithm; the related algorithms of LASSO and elastic net regularization can be expressed as support vector machines.^[4]^[5]) The extended versions of these algorithms are called Laplacian Regularized Least Squares (abbreviated LapRLS) and Laplacian Support Vector Machines (LapSVM), respectively.^[1]

Laplacian Regularized Least Squares (LapRLS)

Regularized least squares (RLS) is a family of regression algorithms: algorithms that predict a value $y = f(x)$ for their inputs $x$, with the goal that the predicted values should be close to the true labels for the data. In particular, RLS is designed to minimize the mean squared error between the predicted values and the true labels, subject to regularization. Ridge regression is one form of RLS; in general, RLS is the same as ridge regression combined with the kernel method. The problem statement for RLS results from choosing the loss function $V$ in Tikhonov regularization to be the mean squared error:

$$f^* = \underset{f \in \mathcal{H}}{\arg\min} \; \frac{1}{\ell} \sum_{i=1}^{\ell} (f(x_i) - y_i)^2 + \gamma \|f\|_K^2$$

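Thanks to the representer theorem, the solution can be written as a weighted sum of the kernel evaluated at the data points:

$$f^*(x) = \sum_{i=1}^{\ell} \alpha_i^* K(x_i, x)$$

and solving for $\alpha^*$ gives:

$$\alpha^* = (K + \gamma \ell I)^{-1} Y$$

where $K$ is defined to be the kernel matrix, with $K_{ij} = K(x_i, x_j)$, and $Y$ is the vector of data labels.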
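The closed-form RLS solution $\alpha^* = (K + \gamma \ell I)^{-1} Y$ can be sketched as follows. The Gaussian kernel, its bandwidth, the noisy sine data, and the value of `gamma` are assumptions made for this example; any positive-definite kernel could be substituted.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_rls(X, y, gamma, sigma=1.0):
    """Solve (K + gamma * l * I) alpha = Y for the expansion coefficients."""
    ell = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + gamma * ell * np.eye(ell), y)

def predict(alpha, X_train, X_new, sigma=1.0):
    """Evaluate f(x) = sum_i alpha_i K(x_i, x) at the rows of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha

# Toy regression problem: noisy samples of sin(x) on [0, 2*pi].
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 30)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

gamma = 1e-3
alpha = fit_rls(X, y, gamma)
train_mse = float(np.mean((predict(alpha, X, X) - y) ** 2))
print(train_mse)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a common numerical choice; larger values of `gamma` shrink the coefficients and give a smoother fit.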

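Adding a Laplacian term for manifold regularization gives the Laplacian RLS statement:

$$f^* = \underset{f \in \mathcal{H}}{\arg\min} \; \frac{1}{\ell} \sum_{i=1}^{\ell} (f(x_i) - y_i)^2 + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(\ell+u)^2} \mathbf{f}^{\mathrm{T}} L \mathbf{f}$$

The representer theorem for manifold regularization again gives

$$f^*(x) = \sum_{i=1}^{\ell+u} \alpha_i^* K(x_i, x)$$

and this yields an expression for the vector $\alpha^*$. Letting $K$ be the kernel matrix as above, now computed over all $\ell+u$ labeled and unlabeled points, $Y$ be the vector of data labels padded with zeros for the unlabeled points, $Y = [y_1, \ldots, y_\ell, 0, \ldots, 0]^{\mathrm{T}}$, and $J$ be the $(\ell+u) \times (\ell+u)$ block matrix $\begin{bmatrix} I_\ell & 0 \\ 0 & 0_u \end{bmatrix}$:

$$\alpha^* = \underset{\alpha \in \mathbf{R}^{\ell+u}}{\arg\min} \; \frac{1}{\ell} (Y - JK\alpha)^{\mathrm{T}} (Y - JK\alpha) + \gamma_A \alpha^{\mathrm{T}} K \alpha + \frac{\gamma_I}{(\ell+u)^2} \alpha^{\mathrm{T}} K L K \alpha$$

with a solution of

$$\alpha^* = \left( JK + \gamma_A \ell I + \frac{\gamma_I \ell}{(\ell+u)^2} LK \right)^{-1} Y$$ ^[1]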
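A minimal NumPy sketch of the LapRLS closed form follows, solving $\left(JK + \gamma_A \ell I + \frac{\gamma_I \ell}{(\ell+u)^2} LK\right)\alpha = Y$ with $Y$ padded by zeros for the unlabeled points. The Gaussian kernel, the heat-kernel graph weights, the two-cluster toy data, the hyperparameter values, and all function names are assumptions of this example rather than part of the method's definition.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def graph_laplacian(X, t=1.0):
    """L = D - W for heat-kernel weights W_ij = exp(-||x_i - x_j||^2 / (4 t))."""
    W = rbf_kernel(X, X, sigma=np.sqrt(2.0 * t))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def laprls_system(X, y_labeled, gamma_A, gamma_I, sigma=1.0, t=1.0):
    """Return (A, Y) such that the LapRLS coefficients solve A @ alpha = Y.
    Rows of X must list the labeled points first, then the unlabeled ones."""
    n, ell = X.shape[0], y_labeled.shape[0]
    K = rbf_kernel(X, X, sigma)
    L = graph_laplacian(X, t)
    J = np.diag(np.r_[np.ones(ell), np.zeros(n - ell)])  # selects labeled rows
    Y = np.r_[y_labeled, np.zeros(n - ell)]               # labels padded with zeros
    A = J @ K + gamma_A * ell * np.eye(n) + (gamma_I * ell / n**2) * (L @ K)
    return A, Y

def fit_laprls(X, y_labeled, gamma_A, gamma_I, sigma=1.0, t=1.0):
    """Solve the LapRLS linear system for the expansion coefficients alpha."""
    A, Y = laprls_system(X, y_labeled, gamma_A, gamma_I, sigma, t)
    return np.linalg.solve(A, Y)

# Toy data: two clusters, with a single labeled point in each.
rng = np.random.default_rng(0)
blob_pos = rng.normal(loc=[-2.0, 0.0], scale=0.3, size=(20, 2))
blob_neg = rng.normal(loc=[2.0, 0.0], scale=0.3, size=(20, 2))
X = np.vstack([blob_pos[:1], blob_neg[:1], blob_pos[1:], blob_neg[1:]])
y_labeled = np.array([1.0, -1.0])

alpha = fit_laprls(X, y_labeled, gamma_A=1e-2, gamma_I=1.0)
scores = rbf_kernel(X, X) @ alpha  # f*(x_i) at every labeled and unlabeled point
print(np.sign(scores[2:]))
```

The unlabeled points enter only through $K$ and $L$: they influence where the learned function is allowed to vary, but contribute nothing to the squared-error term because $J$ zeroes out their rows.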

Laplacian Support Vector Machines (LapSVM)

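Support vector machines (SVMs) are a family of algorithms often used for classifying data into two or more groups, or classes. Intuitively, an SVM draws a boundary between classes so that the closest labeled examples to the boundary are as far away as possible. This can be directly expressed as a quadratic program, but it is also equivalent to Tikhonov regularization with the hinge loss function, $V(f(x), y) = \max(0, 1 - y f(x))$:

$$f^* = \underset{f \in \mathcal{H}}{\arg\min} \; \frac{1}{\ell} \sum_{i=1}^{\ell} \max(0, 1 - y_i f(x_i)) + \gamma \|f\|_K^2$$

Adding the intrinsic regularization term to this expression gives the LapSVM problem statement:

$$f^* = \underset{f \in \mathcal{H}}{\arg\min} \; \frac{1}{\ell} \sum_{i=1}^{\ell} \max(0, 1 - y_i f(x_i)) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(\ell+u)^2} \mathbf{f}^{\mathrm{T}} L \mathbf{f}$$

Again, the representer theorem allows the solution to be expressed in terms of the kernel evaluated at the data points:

$$f^*(x) = \sum_{i=1}^{\ell+u} \alpha_i^* K(x_i, x)$$

The coefficients $\alpha^*$ can be found by writing the problem as a quadratic program and solving the dual problem. Again letting $K$ be the kernel matrix over all $\ell+u$ points, and here letting $J$ be the $\ell \times (\ell+u)$ block matrix $\begin{bmatrix} I_\ell & 0 \end{bmatrix}$ and $Y = \operatorname{diag}(y_1, \ldots, y_\ell)$ be the diagonal matrix of the labels $y_i \in \{-1, +1\}$, the solution can be shown to be

$$\alpha^* = \left( 2\gamma_A I + 2\frac{\gamma_I}{(\ell+u)^2} LK \right)^{-1} J^{\mathrm{T}} Y \beta^*$$

where $\beta^*$ is the solution to the dual problem

$$\beta^* = \underset{\beta \in \mathbf{R}^{\ell}}{\arg\max} \; \sum_{i=1}^{\ell} \beta_i - \frac{1}{2} \beta^{\mathrm{T}} Q \beta \quad \text{subject to} \quad \sum_{i=1}^{\ell} \beta_i y_i = 0, \quad 0 \le \beta_i \le \frac{1}{\ell}, \; i = 1, \ldots, \ell$$

and $Q$ is defined by

$$Q = Y J K \left( 2\gamma_A I + 2\frac{\gamma_I}{(\ell+u)^2} LK \right)^{-1} J^{\mathrm{T}} Y$$

The equality constraint arises from the unregularized bias term used in the original formulation.^[1]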
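The matrices involved in the LapSVM dual can be assembled as in the sketch below, which follows the formulation in Belkin et al.^[1] with $J = [I_\ell \; 0]$ of size $\ell \times (\ell+u)$ and $Y = \operatorname{diag}(y_1, \ldots, y_\ell)$. It builds the dual matrix $Q = YJK\left(2\gamma_A I + 2\frac{\gamma_I}{(\ell+u)^2}LK\right)^{-1}J^{\mathrm{T}}Y$ and maps a dual vector $\beta$ to expansion coefficients $\alpha$. The quadratic program itself is not solved here; the vector `beta_demo` is only a feasible placeholder, and the kernel, graph weights, toy data, and function names are assumptions of the example.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def graph_laplacian(X, t=1.0):
    """L = D - W for heat-kernel weights W_ij = exp(-||x_i - x_j||^2 / (4 t))."""
    W = rbf_kernel(X, X, sigma=np.sqrt(2.0 * t))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def lapsvm_dual_matrices(K, L, y, gamma_A, gamma_I):
    """Return (Q, B): Q is the dual matrix, and alpha = B @ beta for a dual vector beta."""
    n, ell = K.shape[0], y.shape[0]
    J = np.hstack([np.eye(ell), np.zeros((ell, n - ell))])  # l x (l + u)
    Y = np.diag(y)                                           # labels in {-1, +1}
    M = 2.0 * gamma_A * np.eye(n) + (2.0 * gamma_I / n**2) * (L @ K)
    B = np.linalg.solve(M, J.T @ Y)                          # M^{-1} J^T Y, shape (n, l)
    Q = Y @ J @ K @ B
    return Q, B

# Toy data: two clusters, with one labeled point per cluster listed first.
rng = np.random.default_rng(0)
blob_pos = rng.normal(loc=[-2.0, 0.0], scale=0.3, size=(10, 2))
blob_neg = rng.normal(loc=[2.0, 0.0], scale=0.3, size=(10, 2))
X = np.vstack([blob_pos[:1], blob_neg[:1], blob_pos[1:], blob_neg[1:]])
y = np.array([1.0, -1.0])

K = rbf_kernel(X, X)
L = graph_laplacian(X)
Q, B = lapsvm_dual_matrices(K, L, y, gamma_A=1e-2, gamma_I=1.0)

# A feasible (not optimal) dual vector: 0 <= beta_i <= 1/l and sum_i beta_i y_i = 0.
beta_demo = np.full(y.shape[0], 1.0 / (2 * y.shape[0]))
alpha_demo = B @ beta_demo
print(Q.shape, alpha_demo.shape)
```

In practice $Q$ and the constraints would be handed to a quadratic programming solver to obtain $\beta^*$, after which `B @ beta_star` gives the expansion coefficients. Since $Q$ has only $\ell \times \ell$ entries, the dual problem scales with the number of labeled points, while the linear solve involves all $\ell + u$ points.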

LapSVM has been applied to problems including geographical imaging.^[16]^[17]^[18]


References

1. Belkin, Mikhail; Niyogi, Partha; Sindhwani, Vikas (2006). "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples". The Journal of Machine Learning Research. 7: 2399–2434. http://dl.acm.org/citation.cfm?id=1248632. Retrieved 2015-12-02.
2. Hein, Matthias; Audibert, Jean-Yves; Von Luxburg, Ulrike (2005). "From graphs to manifolds–weak and strong pointwise consistency of graph laplacians". Learning theory. Lecture Notes in Computer Science. 3559. Springer. pp. 470–485. CiteSeerX 10.1.1.103.82. doi:10.1007/11503415_32. ISBN 978-3-540-26556-6.
3. Belkin, Mikhail; Niyogi, Partha (2005). "Towards a theoretical foundation for Laplacian-based manifold methods". Learning theory. Lecture Notes in Computer Science. 3559. Springer. pp. 486–500. CiteSeerX 10.1.1.127.795. doi:10.1007/11503415_33. ISBN 978-3-540-26556-6.
4. Jaggi, Martin (2014). Suykens, Johan; Signoretto, Marco; Argyriou, Andreas (eds.). An Equivalence between the Lasso and Support Vector Machines. Chapman and Hall/CRC.
5. Zhou, Quan; Chen, Wenlin; Song, Shiji; Gardner, Jacob; Weinberger, Kilian; Chen, Yixin. "A Reduction of the Elastic Net to Support Vector Machines with an Application to GPU Computing". Association for the Advancement of Artificial Intelligence. https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9856.
6. Pan, Jeffrey Junfeng; Yang, Qiang; Chang, Hong; Yeung, Dit-Yan (2006). "A manifold regularization approach to calibration reduction for sensor-network based tracking". Proceedings of the national conference on artificial intelligence. 21. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999. p. 988. http://www.aaai.org/Papers/AAAI/2006/AAAI06-155.pdf. Retrieved 2015-12-02.
7. Zhang, Daoqiang; Shen, Dinggang (2011). "Semi-supervised multimodal classification of Alzheimer's disease". Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on. IEEE. pp. 1628–1631. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5872715. Retrieved 2015-12-15.
8. Park, Sang Hyun; Gao, Yaozong; Shi, Yinghuan; Shen, Dinggang (2014). "Interactive Prostate Segmentation Based on Adaptive Feature Selection and Manifold Regularization". Machine Learning in Medical Imaging. Lecture Notes in Computer Science. 8679. Springer. pp. 264–271. doi:10.1007/978-3-319-10581-9_33. ISBN 978-3-319-10580-2.
9. Pillai, Sudeep. "Semi-supervised Object Detector Learning from Minimal Labels". http://people.csail.mit.edu/spillai/data/papers/ssl-cv-project-paper.pdf. Retrieved 2015-12-15.
10. Wan, Songjing; Wu, Di; Liu, Kangsheng (2012). "Semi-Supervised Machine Learning Algorithm in Near Infrared Spectral Calibration: A Case Study on Diesel Fuels". Advanced Science Letters. 11 (1): 416–419. doi:10.1166/asl.2012.3044. http://www.ingentaconnect.com/content/asp/asl/2012/00000011/00000001/art00076. Retrieved 2015-12-15.
11. Wang, Ziqiang; Sun, Xia; Zhang, Lijie; Qian, Xu (2013). "Document Classification based on Optimal Laprls". Journal of Software. 8 (4): 1011–1018. doi:10.4304/jsw.8.4.1011-1018. http://ojs.academypublisher.com/index.php/jsw/article/view/8009. Retrieved 2015-12-15.
12. Xia, Zheng; Wu, Ling-Yun; Zhou, Xiaobo; Wong, Stephen TC (2010). "Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces". BMC Systems Biology. 4 (Suppl 2): –6. http://www.biomedcentral.com/qc/1752-0509/4/S2/S6/. Retrieved 2015-12-15.
13. Cheng, Li; Vishwanathan, S. V. N. (2007). "Learning to compress images and videos". Proceedings of the 24th international conference on Machine learning. ACM. pp. 161–168. http://dl.acm.org/citation.cfm?id=1273517. Retrieved 2015-12-16.
14. Lin, Yi; Wahba, Grace; Zhang, Hao; Lee, Yoonkyung (2002). "Statistical properties and adaptive tuning of support vector machines". Machine Learning. 48 (1–3): 115–136. doi:10.1023/A:1013951620650.
15. Wahba, Grace; others (1999). "Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV". Advances in Kernel Methods-Support Vector Learning. 6: 69–87. CiteSeerX 10.1.1.53.2114.
16. Kim, Wonkook; Crawford, Melba M. (2010). "Adaptive classification for hyperspectral image data using manifold regularization kernel machines". Geoscience and Remote Sensing, IEEE Transactions on. 48 (11): 4110–4121. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5599864. Retrieved 2015-12-02.
17. Camps-Valls, Gustavo; Tuia, Devis; Bruzzone, Lorenzo; Atli Benediktsson, Jon (2014). "Advances in hyperspectral image classification: Earth monitoring with statistical learning methods". Signal Processing Magazine, IEEE. 31 (1): 45–54. arXiv:1310.5107. Bibcode:2014ISPM...31...45C. doi:10.1109/msp.2013.2279179. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6678612. Retrieved 2015-12-16.
18. Gómez-Chova, Luis; Camps-Valls, Gustavo; Muñoz-Marí, Jordi; Calpe, Javier (2007). "Semi-supervised cloud screening with Laplacian SVM". Geoscience and Remote Sensing Symposium, 2007. IGARSS 2007. IEEE International. IEEE. pp. 1521–1524. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4423098. Retrieved 2015-12-16.
19. Cheng, Bo; Zhang, Daoqiang; Shen, Dinggang (2012). "Domain transfer learning for MCI conversion prediction". Medical Image Computing and Computer-Assisted Intervention–MICCAI 2012. Lecture Notes in Computer Science. 7510. Springer. pp. 82–90. doi:10.1007/978-3-642-33415-3_11. ISBN 978-3-642-33414-6.
20. Jamieson, Andrew R.; Giger, Maryellen L.; Drukker, Karen; Pesce, Lorenzo L. (2010). "Enhancement of breast CADx with unlabeled data". Medical Physics. 37 (8): 4155–4172. Bibcode:2010MedPh..37.4155J. doi:10.1118/1.3455704. PMC 2921421. PMID 20879576.
21. Wu, Jiang; Diao, Yuan-Bo; Li, Meng-Long; Fang, Ya-Ping; Ma, Dai-Chuan (2009). "A semi-supervised learning based method: Laplacian support vector machine used in diabetes disease diagnosis". Interdisciplinary Sciences: Computational Life Sciences. 1 (2): 151–155. doi:10.1007/s12539-009-0016-2. PMID 20640829.
22. Wang, Ziqiang; Zhou, Zhiqiang; Sun, Xia; Qian, Xu; Sun, Lijun (2012). "Enhanced LapSVM Algorithm for Face Recognition". International Journal of Advancements in Computing Technology. 4 (17). http://search.ebscohost.com/login.aspx?direct=true&profile=ehost&scope=site&authtype=crawler&jrnl=20058039&AN=98908455&h=8QzzRizi2IKxCZ4EHJjzxbGY%2FQazcifd58fcAGEG17GiFk0wZE59DrEge0xfEGhXRqsBaMwuBNyenVSP6sjwsA%3D%3D&crl=c. Retrieved 2015-12-16.
23. Zhao, Xiukuan; Li, Min; Xu, Jinwu; Song, Gangbing (2011). "An effective procedure exploiting unlabeled data to build monitoring system". Expert Systems with Applications. 38 (8): 10199–10204. doi:10.1016/j.eswa.2011.02.078. http://www.sciencedirect.com/science/article/pii/S0957417411002843. Retrieved 2015-12-16.
24. Zhong, Ji-Ying; Lei, Xu; Yao, D. (2009). "Semi-supervised learning based on manifold in BCI". Journal of Electronics Science and Technology of China. 7 (1): 22–26. http://www.journal.uestc.edu.cn/archives/2009/1/7/22-2677907.pdf. Retrieved 2015-12-16.
25. Zhu, Xiaojin (2005). "Semi-supervised learning literature survey". CiteSeerX 10.1.1.99.9681.
26. Sindhwani, Vikas; Rosenberg, David S. (2008). "An RKHS for multi-view learning and manifold co-regularization". Proceedings of the 25th international conference on Machine learning. ACM. pp. 976–983. http://dl.acm.org/citation.cfm?id=1390279. Retrieved 2015-12-02.
27. Goldberg, Andrew; Li, Ming; Zhu, Xiaojin (2008). "Online manifold regularization: A new learning setting and empirical study". Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science. 5211. pp. 393–407. doi:10.1007/978-3-540-87479-9_44. ISBN 978-3-540-87478-2. http://www.springerlink.com/index/ln1805476103536p.pdf. Retrieved 2015-12-02.