Entry | Cluster-weighted modeling |
Definition |
In data mining, cluster-weighted modeling (CWM) is an algorithm-based approach to non-linear prediction of outputs (dependent variables) from inputs (independent variables). It is based on density estimation using a set of models (clusters), each notionally appropriate in a sub-region of the input space. The overall approach works in the joint input-output space, and an initial version was proposed by Neil Gershenfeld.[1][2]

Basic form of model

The procedure for cluster-weighted modeling of an input-output problem can be outlined as follows.[2] In order to construct predicted values for an output variable y from an input variable x, the modeling and calibration procedure arrives at a joint probability density function, p(y,x). Here the "variables" might be univariate, multivariate or time series. For convenience, model parameters are not indicated in the notation here; several different treatments of these are possible, including setting them to fixed values as a step in the calibration or treating them using a Bayesian analysis. The required predicted values are obtained by constructing the conditional probability density p(y|x): the conditional expected value gives the prediction, and the conditional variance provides an indication of uncertainty.

The important step of the modeling is that the joint density p(y,x) is assumed to take the following form, as a mixture model:

$$p(y,x) = \sum_{j=1}^{n} w_j \, p_j(y,x),$$

where n is the number of clusters and {w_j} are weights that sum to one. The functions p_j(y,x) are joint probability density functions that relate to each of the n clusters. These functions are modeled using a decomposition into a conditional and a marginal density:

$$p_j(y,x) = p_j(y|x) \, p_j(x),$$

where p_j(y|x) is the cluster-wise conditional density of y given x for cluster j (in the simplest cases a regression model), and p_j(x) is the cluster-weighting density of x for cluster j (in the simplest cases a normal distribution).
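The prediction step described above can be illustrated with a minimal sketch. Dividing the mixture by the marginal density of x gives p(y|x) = Σ_j w_j p_j(x) p_j(y|x) / Σ_k w_k p_k(x), so the conditional expected value is a weighted average of the local model predictions. The sketch assumes univariate x and y, a normal cluster-weighting density p_j(x), and a linear regression with constant output variance for p_j(y|x). The function names, the dictionary keys, and the example parameter values are all hypothetical and are not taken from the cited papers; parameter fitting is not shown.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def cwm_predict(x, clusters):
    """Return (conditional mean, conditional variance) of y given x.

    Each cluster is a dict with hypothetical keys:
      w      mixture weight w_j (the weights sum to one)
      mean   centre of the normal cluster-weighting density p_j(x)
      var    variance of p_j(x)
      a, b   local linear model, E_j[y | x] = a + b * x
      noise  output variance of p_j(y | x)
    """
    # Unnormalised cluster weights w_j * p_j(x) at this input.
    resp = [c["w"] * gaussian_pdf(x, c["mean"], c["var"]) for c in clusters]
    total = sum(resp)
    if total == 0.0:
        raise ValueError("x is too far from every cluster to be weighted")
    post = [r / total for r in resp]

    # Prediction of each local regression model at x.
    local = [c["a"] + c["b"] * x for c in clusters]

    # Conditional mean: weighted average of the local predictions.
    mean = sum(p * m for p, m in zip(post, local))

    # Conditional variance: mixture second moment minus squared mean.
    second = sum(p * (c["noise"] + m * m)
                 for p, c, m in zip(post, clusters, local))
    return mean, second - mean * mean


# Assumed example: two clusters placed symmetrically around x = 0.
example_clusters = [
    {"w": 0.5, "mean": -2.0, "var": 1.0, "a": -1.0, "b": 0.0, "noise": 0.1},
    {"w": 0.5, "mean": 2.0, "var": 1.0, "a": 1.0, "b": 0.0, "noise": 0.1},
]

mean_mid, var_mid = cwm_predict(0.0, example_clusters)      # between the clusters
mean_right, var_right = cwm_predict(2.0, example_clusters)  # at the right centre
```

In this example the conditional variance is larger midway between the two clusters than at a cluster centre, because both local models contribute there and they disagree. This is the sense in which the conditional variance indicates uncertainty.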
In the same way as for regression analysis, it is important to consider preliminary data transformations as part of the overall modeling strategy if the core components of the model are to be simple regression models for the cluster-wise conditional densities and normal distributions for the cluster-weighting densities p_j(x).

General versions

The basic CWM algorithm gives a single output cluster for each input cluster. However, CWM can be extended to multiple clusters that are still associated with the same input cluster.[3] Each cluster in CWM is localized to a Gaussian input region and contains its own trainable local model.[4] CWM is recognized as a versatile inference algorithm that provides simplicity, generality, and flexibility; even when a feedforward layered network might be preferred, it is sometimes used as a "second opinion" on the nature of the training problem.[5] The original form proposed by Gershenfeld describes two innovations:
CWM can be used to classify media in printer applications, using at least two parameters to generate an output that has a joint dependency on the input parameters.[6]

References

1. Gershenfeld, N. (1997). "Nonlinear Inference and Cluster-Weighted Modeling". Annals of the New York Academy of Sciences. 808: 18–24. doi:10.1111/j.1749-6632.1997.tb51651.x
2. Gershenfeld, N.; Schoner; Metois, E. (1999). "Cluster-weighted modelling for time-series analysis". Nature. 397 (6717): 329–332. doi:10.1038/16873. http://www.nature.com/nature/journal/v397/n6717/pdf/397329a0.pdf
3. Feldkamp, L.A.; Prokhorov, D.V.; Feldkamp, T.M. (2001). "Cluster-weighted modeling with multiclusters". International Joint Conference on Neural Networks. 3 (1): 1710–1714. http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/7474/20319/00938419.pdf?temp=x
4. Boyden, Edward S. "Tree-based Cluster Weighted Modeling: Towards A Massively Parallel Real-Time Digital Stradivarius". Cambridge, MA: MIT Media Lab. http://edboyden.org/violin.pdf
5. Prokhorov, Danil V.; Feldkamp, Lee A.; Feldkamp, Timothy M. "A New Approach to Cluster-Weighted Modeling". Dearborn, MI: Ford Research Laboratory. http://home.comcast.net/~dvp/cwm.pdf
6. Gao, Jun; Allen, Ross R. (2003-07-24). "Cluster-Weighted Modeling for Media Classification". Palo Alto, CA: World Intellectual Property Organization. http://www.wipo.int/pctdb/en/wo.jsp?wo=2003059630

Categories: Multivariate statistics | Cluster analysis algorithms | Estimation of densities