Chou's invariance theorem

Chou's invariance theorem, named after Kuo-Chen Chou, was developed to address a problem in bioinformatics and cheminformatics related to multivariate statistics. The problem arises when a distance that standard statistical theory would define as a Mahalanobis distance cannot be defined in this way because the relevant covariance matrix is singular. One effective approach is to reduce the dimension of the multivariate space until the covariance matrix is invertible. This can be achieved simply by omitting one or more of the original components until the matrix concerned is no longer singular. Chou's invariance theorem states that it does not matter which of the components or coordinates are selected for removal, because exactly the same final value is obtained.
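A minimal numerical illustration of the singularity, assuming NumPy and synthetic data: each row below is a random vector of Ω normalized components drawn from a Dirichlet distribution (so every row sums to 1), standing in for real compositions. The covariance matrix of all Ω components has rank Ω − 1 and so cannot be inverted, while leaving out one component gives an invertible (Ω − 1) × (Ω − 1) matrix. The function name `covariance_ranks` is hypothetical, and the rank tolerance of 1e-10 is an assumption chosen to separate rounding noise from genuine variance.

```python
import numpy as np


def covariance_ranks(n_samples: int = 500, n_components: int = 5, seed: int = 0):
    """Return (full_rank, reduced_rank, n_components) for synthetic normalized vectors.

    Rows come from a Dirichlet distribution, so each row sums to 1
    (a stand-in for the normalization condition; not real data).
    """
    rng = np.random.default_rng(seed)
    X = rng.dirichlet(np.ones(n_components), size=n_samples)
    # Covariance of all components: the all-ones vector lies in its null space.
    C_full = np.cov(X, rowvar=False)
    # Dimension reduction: leave out the first component.
    C_reduced = np.cov(X[:, 1:], rowvar=False)
    # Explicit tolerance (an assumption) so rounding noise is not counted as rank.
    full_rank = np.linalg.matrix_rank(C_full, tol=1e-10)
    reduced_rank = np.linalg.matrix_rank(C_reduced, tol=1e-10)
    return full_rank, reduced_rank, n_components


full_rank, reduced_rank, omega = covariance_ranks()
print(full_rank, reduced_rank, omega)  # 4 4 5
```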

Background

When the Mahalanobis distance or the covariant discriminant is used to calculate the similarity of two proteins based on their amino acid compositions, the normalization condition imposed on the 20 constituent components (their frequencies sum to 1) causes a divergence problem. To avoid it, a dimension-reduction operation is needed: one of the 20 components is left out, so that the remaining 19 components are independent. But which of the 20 components should be removed, and will the result differ if a different component is removed? The same questions arise when the calculation is based on the (20 + λ)-dimensional pseudo amino acid composition, where λ is an integer.
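A short sketch of the normalization condition for the 20-D amino acid composition: the composition of a sequence is the vector of occurrence frequencies of the 20 standard amino acids, which sums to 1, so any one component is determined by the other 19. The helper names `amino_acid_composition` and `drop_component`, and the choice to ignore non-standard residue letters, are assumptions of this sketch; the peptide string is an arbitrary example.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the 20 normalized occurrence frequencies of a protein sequence.

    Letters outside the 20 standard amino acids are ignored (an assumption).
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def drop_component(vector: list[float], k: int) -> list[float]:
    """Dimension reduction: leave out component k (20-D -> 19-D)."""
    return vector[:k] + vector[k + 1:]


comp = amino_acid_composition("MKTAYIAKQRQISFVKSHFSRQ")
print(len(comp), round(sum(comp), 12))  # 20 1.0
reduced = drop_component(comp, 0)
print(len(reduced))  # 19
# The left-out component is fully determined by the remaining 19.
print(abs(comp[0] - (1 - sum(reduced))) < 1e-12)  # True
```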

More generally, calculating the Mahalanobis distance or covariant discriminant between two vectors, each with Ω normalized components, always requires a dimension-reduction operation, so the questions above always arise. Chou's invariance theorem was developed in 1995 to answer them.
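The derivation below writes the normalization condition out to show why the full covariance matrix is singular and what the reduced distance looks like. The notation is an assumption of this sketch rather than the source's: x is a vector of Ω normalized components, μ the mean of such vectors (which also sums to 1), C their covariance matrix, 1 the all-ones vector, and a superscript (k) marks a vector or matrix with component k (row and column k) removed.

```latex
\begin{aligned}
\sum_{i=1}^{\Omega} x_i = 1 = \sum_{i=1}^{\Omega} \mu_i
  &\;\Longrightarrow\; \mathbf{1}^{\mathsf T}(\mathbf{x}-\boldsymbol{\mu}) = 0, \\
\mathbf{C}\,\mathbf{1}
  = \mathbb{E}\big[(\mathbf{x}-\boldsymbol{\mu})\,(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\mathbf{1}\big]
  = \mathbf{0}
  &\;\Longrightarrow\; \det\mathbf{C} = 0, \\
D^2(\mathbf{x},\boldsymbol{\mu})
  = (\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\,\mathbf{C}^{-1}\,(\mathbf{x}-\boldsymbol{\mu})
  &\;\text{is undefined, so one uses} \\
D^2_{(k)}(\mathbf{x},\boldsymbol{\mu})
  = \big(\mathbf{x}^{(k)}-\boldsymbol{\mu}^{(k)}\big)^{\mathsf T}\big(\mathbf{C}^{(k)}\big)^{-1}\big(\mathbf{x}^{(k)}-\boldsymbol{\mu}^{(k)}\big)
  &\;\text{for some chosen } k \in \{1,\dots,\Omega\}.
\end{aligned}
```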

Essence

According to Chou's invariance theorem, the value of the Mahalanobis distance or covariant discriminant remains the same regardless of which component is left out. Accordingly, any one of the normalized constituent components can be left out to overcome the divergence problem without changing the final value of the Mahalanobis distance or covariant discriminant.
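The invariance can be checked numerically. The sketch below assumes NumPy and uses synthetic normalized vectors drawn from a Dirichlet distribution as stand-ins for real compositions; the function names are hypothetical. It estimates a mean vector and covariance matrix from training vectors, then computes the squared Mahalanobis distance of a query vector with each of the Ω components left out in turn, together with the log-determinant of each reduced covariance matrix.

```python
import numpy as np


def reduced_mahalanobis(x, mean, cov, k):
    """Squared Mahalanobis distance between x and mean with component k left out."""
    keep = [i for i in range(len(x)) if i != k]
    d = x[keep] - mean[keep]
    c = cov[np.ix_(keep, keep)]  # delete row k and column k
    return float(d @ np.linalg.solve(c, d))


def reduced_log_det(cov, k):
    """Log-determinant of the covariance matrix with row and column k deleted."""
    keep = [i for i in range(cov.shape[0]) if i != k]
    _, logdet = np.linalg.slogdet(cov[np.ix_(keep, keep)])
    return float(logdet)


rng = np.random.default_rng(1)
omega = 6
# Synthetic normalized training vectors (each row sums to 1).
train = rng.dirichlet(np.full(omega, 2.0), size=300)
mean = train.mean(axis=0)
cov = np.cov(train, rowvar=False)  # singular Omega x Omega matrix
query = rng.dirichlet(np.full(omega, 2.0))

distances = [reduced_mahalanobis(query, mean, cov, k) for k in range(omega)]
log_dets = [reduced_log_det(cov, k) for k in range(omega)]
print(np.round(distances, 8))
print(np.round(log_dets, 8))
```

All Ω distances agree up to floating-point rounding, as do the log-determinants. Agreement of the log-determinants matters if the covariant discriminant is taken in its commonly used form, the squared Mahalanobis distance plus the log-determinant of the covariance matrix (an assumption here, not stated in this article).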

Proof

A rigorous mathematical proof of the theorem was given in the appendix of a paper by Chou[1] and in Appendix E of a review paper by Chou and Zhang.[2]
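As a hedged illustration only, one short argument (not necessarily the one used in the cited appendices) runs as follows. Write d = x − μ for the difference between a normalized vector and the mean of such vectors, so the entries of d sum to 0, and let a superscript (k) mark removal of component k, with C the covariance matrix. For j ≠ k, the missing entry satisfies d_j = −(sum of the entries of d^(j)), so d^(k) = A d^(j) for an (Ω − 1) × (Ω − 1) matrix A that, with a suitable ordering of indices, is the identity except for one row of −1 entries; hence det A = ±1. Because every sample difference obeys the same constraint, the reduced covariance matrices are related by the same map. The sketch assumes C^(j) is invertible, that is, normalization is the only linear dependence among the components.

```latex
\begin{aligned}
\mathbf{d}^{(k)} &= \mathbf{A}\,\mathbf{d}^{(j)}, \qquad \det\mathbf{A} = \pm 1, \\
\mathbf{C}^{(k)} &= \mathbb{E}\big[\mathbf{d}^{(k)}\mathbf{d}^{(k)\mathsf T}\big]
  = \mathbf{A}\,\mathbb{E}\big[\mathbf{d}^{(j)}\mathbf{d}^{(j)\mathsf T}\big]\mathbf{A}^{\mathsf T}
  = \mathbf{A}\,\mathbf{C}^{(j)}\mathbf{A}^{\mathsf T}, \\
D^2_{(k)} &= \mathbf{d}^{(k)\mathsf T}\big(\mathbf{C}^{(k)}\big)^{-1}\mathbf{d}^{(k)}
  = \mathbf{d}^{(j)\mathsf T}\mathbf{A}^{\mathsf T}\mathbf{A}^{-\mathsf T}\big(\mathbf{C}^{(j)}\big)^{-1}\mathbf{A}^{-1}\mathbf{A}\,\mathbf{d}^{(j)}
  = D^2_{(j)}, \\
\det\mathbf{C}^{(k)} &= (\det\mathbf{A})^2\,\det\mathbf{C}^{(j)} = \det\mathbf{C}^{(j)}.
\end{aligned}
```

The last line means the log-determinant of the reduced covariance matrix is also unchanged, which extends the invariance to a covariant discriminant of the form D² + ln det C (an assumed form).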

Applications

The theorem has been used in predicting protein subcellular localization,[3] identifying the subcellular location of apoptosis proteins,[4] and predicting protein structural classes,[5][6] as well as in identifying various other important attributes of proteins.
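One common way to use such a measure for prediction is to assign a query to the class with the smallest discriminant. The sketch below is a hypothetical minimal classifier, not the method of any cited paper: it assumes NumPy, synthetic Dirichlet-distributed "compositions" for two made-up classes (`class_A`, `class_B`), and a covariant discriminant of the form F = D² + ln det C computed with one component left out. It then checks that every choice of left-out component yields the same predictions.

```python
import numpy as np


def fit_classes(samples_by_class):
    """Estimate a mean vector and a (singular) covariance matrix per class."""
    return {label: (X.mean(axis=0), np.cov(X, rowvar=False))
            for label, X in samples_by_class.items()}


def covariant_discriminant(x, mean, cov, k):
    """Assumed form F = D^2 + ln det C, computed with component k left out."""
    keep = [i for i in range(len(x)) if i != k]
    d = x[keep] - mean[keep]
    c = cov[np.ix_(keep, keep)]
    _, logdet = np.linalg.slogdet(c)
    return float(d @ np.linalg.solve(c, d) + logdet)


def predict(x, models, k):
    """Assign x to the class with the smallest discriminant."""
    return min(models, key=lambda label: covariant_discriminant(x, *models[label], k))


rng = np.random.default_rng(2)
omega = 5
# Two hypothetical classes with different typical compositions.
alphas = {"class_A": np.array([8.0, 2, 2, 2, 2]),
          "class_B": np.array([2.0, 2, 2, 2, 8])}
training = {label: rng.dirichlet(a, size=200) for label, a in alphas.items()}
models = fit_classes(training)

queries = rng.dirichlet(np.full(omega, 3.0), size=20)
# Predictions for every choice of the left-out component k.
predictions_by_k = [[predict(q, models, k) for q in queries] for k in range(omega)]
print(all(p == predictions_by_k[0] for p in predictions_by_k))
```

Because the discriminant values agree for every k up to rounding, the final check is expected to print True, so under these assumptions the left-out component can be chosen purely for convenience.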

References

1. Chou KC (April 1995). "A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space". Proteins. 21 (4): 319–44. doi:10.1002/prot.340210406. PMID 7567954.
2. Chou KC, Zhang CT (1995). "Review: Prediction of protein structural classes". Critical Reviews in Biochemistry and Molecular Biology. 30: 275–349. doi:10.3109/10409239509083488.
3. Pan YX, Zhang ZZ, Guo ZM, Feng GY, Huang ZD, He L (May 2003). "Application of pseudo amino acid composition for predicting protein subcellular location: stochastic signal processing approach". J. Protein Chem. 22 (4): 395–402. doi:10.1023/A:1025350409648. PMID 13678304.
4. Zhou GP, Doctor K (January 2003). "Subcellular location prediction of apoptosis proteins". Proteins. 50 (1): 44–8. doi:10.1002/prot.10251. PMID 12471598.
5. Zhou GP (November 1998). "An intriguing controversy over protein structural class prediction". J. Protein Chem. 17 (8): 729–38. doi:10.1023/A:1020713915365. PMID 9988519.
6. Zhou GP, Assa-Munt N (July 2001). "Some insights into protein structural class prediction". Proteins. 44 (1): 57–9. doi:10.1002/prot.1071. PMID 11354006.