Jensen's inequality

  1. Statements

     • Finite form
     • Measure-theoretic and probabilistic form
     • General inequality in a probabilistic setting
     • A sharpened and generalized form

  2. Proofs

     • Proof 1 (finite form)
     • Proof 2 (measure-theoretic form)
     • Proof 3 (general inequality in a probabilistic setting)

  3. Applications and special cases

     • Form involving a probability density function
     • Example: even moments of a random variable
     • Alternative finite form
     • Statistical physics
     • Information theory
     • Rao–Blackwell theorem

  4. See also

  5. Notes

  6. References

  7. External links

{{Short description|Theorem of convex functions}}{{For|Jensen's inequality for analytic functions|Jensen's formula}}

In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proven by Jensen in 1906.[1] Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after convex transformation; it is a simple corollary that the opposite is true of concave transformations.

Jensen's inequality generalizes the statement that the secant line of a convex function lies above the graph of the function, which is Jensen's inequality for two points: the secant line consists of weighted means of the convex function {{mvar|φ}} (for {{math|t ∈ [0,1]}}),
<math display="block">t\,\varphi(x_1) + (1-t)\,\varphi(x_2),</math>

while the graph of the function is the convex function of the weighted means,
<math display="block">\varphi\bigl(t x_1 + (1-t) x_2\bigr).</math>

Thus, Jensen's inequality is
<math display="block">\varphi\bigl(t x_1 + (1-t) x_2\bigr) \le t\,\varphi(x_1) + (1-t)\,\varphi(x_2).</math>
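For a concrete check, take {{math|φ(x) {{=}} x2}}, {{math|x1 {{=}} 0}}, {{math|x2 {{=}} 2}} and {{math|t {{=}} 1/2}}:
<math display="block">\varphi\!\left(\tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot 2\right) = 1 \;\le\; \tfrac{1}{2}\,\varphi(0) + \tfrac{1}{2}\,\varphi(2) = 2.</math>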

In the context of probability theory, it is generally stated in the following form: if X is a random variable and {{mvar|φ}} is a convex function, then
<math display="block">\varphi\!\left(\operatorname{E}[X]\right) \le \operatorname{E}\left[\varphi(X)\right].</math>

Statements

The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its full strength.

Finite form

For a real convex function {{mvar|φ}}, numbers {{math|x1, x2, ..., xn}} in its domain, and positive weights {{math|ai}}, Jensen's inequality can be stated as:
<math display="block">\varphi\!\left(\frac{\sum_{i=1}^n a_i x_i}{\sum_{i=1}^n a_i}\right) \le \frac{\sum_{i=1}^n a_i\,\varphi(x_i)}{\sum_{i=1}^n a_i} \qquad (1)</math>

and the inequality is reversed if {{mvar|φ}} is concave, which is
<math display="block">\varphi\!\left(\frac{\sum_{i=1}^n a_i x_i}{\sum_{i=1}^n a_i}\right) \ge \frac{\sum_{i=1}^n a_i\,\varphi(x_i)}{\sum_{i=1}^n a_i}. \qquad (2)</math>

Equality holds if and only if {{math|x1 {{=}} x2 {{=}} ⋯ {{=}} xn}} or {{mvar|φ}} is linear on a domain containing {{math|x1, x2, ..., xn}}.

As a particular case, if the weights {{math|ai}} are all equal, then (1) and (2) become
<math display="block">\varphi\!\left(\frac{\sum_{i=1}^n x_i}{n}\right) \le \frac{\sum_{i=1}^n \varphi(x_i)}{n} \qquad (3)</math>
<math display="block">\varphi\!\left(\frac{\sum_{i=1}^n x_i}{n}\right) \ge \frac{\sum_{i=1}^n \varphi(x_i)}{n} \qquad (4)</math>

For instance, the function {{math|log(x)}} is concave, so substituting {{math|φ(x) {{=}} log(x)}} in the previous formula (4) establishes the (logarithm of the) familiar arithmetic mean–geometric mean inequality:
<math display="block">\log\!\left(\frac{\sum_{i=1}^n x_i}{n}\right) \ge \frac{\sum_{i=1}^n \log(x_i)}{n}, \qquad\text{that is,}\qquad \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}.</math>
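For example, with {{math|n {{=}} 2}}, {{math|x1 {{=}} 1}} and {{math|x2 {{=}} 9}} this reads
<math display="block">\frac{1+9}{2} = 5 \;\ge\; \sqrt{1 \cdot 9} = 3.</math>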

A common application has {{mvar|x}} as a function of another variable (or set of variables) {{mvar|t}}, that is, {{math|xi {{=}} g(ti)}}. All of this carries directly over to the general continuous case: the weights {{math|ai}} are replaced by a non-negative integrable function {{math|f(x)}}, such as a probability density, and the summations are replaced by integrals.

Measure-theoretic and probabilistic form

Let <math>(\Omega, A, \mu)</math> be a probability space, such that <math>\mu(\Omega) = 1</math>. If {{mvar|g}} is a real-valued function that is {{mvar|μ}}-integrable, and if {{mvar|φ}} is a convex function on the real line, then:
<math display="block">\varphi\!\left(\int_\Omega g\, d\mu\right) \le \int_\Omega \varphi \circ g\, d\mu.</math>

In real analysis, we may require an estimate on
<math display="block">\varphi\!\left(\int_a^b f(x)\, dx\right)</math>
where <math>a, b \in \mathbb{R}</math>, and {{mvar|f}} is a non-negative Lebesgue-integrable function. In this case, the Lebesgue measure of {{math|[a, b]}} need not be unity. However, by integration by substitution, the interval can be rescaled so that it has measure unity. Then Jensen's inequality can be applied to get[2]
<math display="block">\varphi\!\left(\frac{1}{b-a}\int_a^b f(x)\, dx\right) \le \frac{1}{b-a}\int_a^b \varphi(f(x))\, dx.</math>

The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let <math>(\Omega, \mathfrak{F}, \operatorname{P})</math> be a probability space, {{mvar|X}} an integrable real-valued random variable and {{mvar|φ}} a convex function. Then:
<math display="block">\varphi\!\left(\operatorname{E}[X]\right) \le \operatorname{E}\left[\varphi(X)\right].</math>

In this probability setting, the measure {{mvar|μ}} is intended as a probability {{math|P}}, the integral with respect to {{mvar|μ}} as an expected value {{math|E}}, and the function {{mvar|g}} as a random variable {{mvar|X}}.

Notice that the equality holds if {{mvar|X}} is constant (degenerate random variable) or if {{mvar|φ}} is linear, and, more generally, if there is <math>A \subseteq \mathbb{R}</math> (a Borel set, in fact) such that
<math display="block">\operatorname{P}(X \in A) = 1</math>
and {{mvar|φ}} is a linear function over {{mvar|A}} (that is, there are <math>a, b \in \mathbb{R}</math> such that <math>\varphi(x) = ax + b</math> for every <math>x \in A</math>).

General inequality in a probabilistic setting

More generally, let {{mvar|T}} be a real topological vector space, and {{mvar|X}} a {{mvar|T}}-valued integrable random variable. In this general setting, integrable means that there exists an element <math>\operatorname{E}[X]</math> in {{mvar|T}}, such that for any element {{mvar|z}} in the dual space of {{mvar|T}}: <math>\operatorname{E}\,|\langle z, X\rangle| < \infty</math>, and <math>\langle z, \operatorname{E}[X]\rangle = \operatorname{E}[\langle z, X\rangle]</math>. Then, for any measurable convex function {{mvar|φ}} and any sub-{{mvar|σ}}-algebra <math>\mathfrak{G}</math> of <math>\mathfrak{F}</math>:
<math display="block">\varphi\!\left(\operatorname{E}\left[X \mid \mathfrak{G}\right]\right) \le \operatorname{E}\left[\varphi(X) \mid \mathfrak{G}\right].</math>

Here <math>\operatorname{E}[\,\cdot \mid \mathfrak{G}\,]</math> stands for the expectation conditioned to the σ-algebra <math>\mathfrak{G}</math>. This general statement reduces to the previous ones when the topological vector space {{mvar|T}} is the real axis, and <math>\mathfrak{G}</math> is the trivial {{mvar|σ}}-algebra {{math|{∅, Ω}}}.[3]

A sharpened and generalized form

Let {{mvar|X}} be a one-dimensional random variable with mean {{mvar|μ}} and variance <math>\sigma^2 \ge 0</math>. Let <math>\varphi(x)</math> be a twice differentiable function, and define the function
<math display="block">h(x) \triangleq \frac{\varphi(x) - \varphi(\mu)}{(x-\mu)^2} - \frac{\varphi'(\mu)}{x-\mu}.</math>

Then[4]
<math display="block">\sigma^2 \inf_x h(x) \;\le\; \operatorname{E}[\varphi(X)] - \varphi(\operatorname{E}[X]) \;\le\; \sigma^2 \sup_x h(x).</math>

In particular, when <math>\varphi(x)</math> is convex, then <math>h(x) \ge 0</math>, and the standard form of Jensen's inequality immediately follows.
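As a sanity check, assuming the definition of {{math|h(x)}} given above, take {{math|φ(x) {{=}} x2}}. Then
<math display="block">h(x) = \frac{x^2-\mu^2}{(x-\mu)^2} - \frac{2\mu}{x-\mu} = \frac{(x+\mu) - 2\mu}{x-\mu} = 1</math>
for every {{mvar|x}}, so both bounds equal <math>\sigma^2</math> and the sharpened inequality collapses to the identity <math>\operatorname{E}[X^2] - (\operatorname{E}[X])^2 = \sigma^2</math>.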

Proofs

Jensen's inequality can be proved in several ways, and three different proofs corresponding to the different statements above will be offered. Before embarking on these mathematical derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where {{mvar|X}} is a real number (see figure). Assuming a hypothetical distribution of {{mvar|X}} values, one can immediately identify the position of {{math|E[X]}} and its image {{math|φ(E[X])}} in the graph. Noticing that for convex mappings {{math|Y {{=}} φ(X)}} the corresponding distribution of {{mvar|Y}} values is increasingly "stretched out" for increasing values of {{mvar|X}}, it is easy to see that the distribution of {{mvar|Y}} is broader in the interval corresponding to {{math|X > X0}} and narrower in {{math|X < X0}} for any {{math|X0}}; in particular, this is also true for {{math|X0 {{=}} E[X]}}. Consequently, in this picture the expectation of {{mvar|Y}} will always shift upwards with respect to the position of {{math|φ(E[X])}}. A similar reasoning holds if the distribution of {{mvar|X}} covers a decreasing portion of the convex function, or both a decreasing and an increasing portion of it. This "proves" the inequality, i.e.
<math display="block">\varphi(\operatorname{E}[X]) \le \operatorname{E}[\varphi(X)] = \operatorname{E}[Y],</math>

with equality when {{math|φ(X)}} is not strictly convex, e.g. when it is a straight line, or when {{mvar|X}} follows a degenerate distribution (i.e. is a constant).

The proofs below formalize this intuitive notion.

Proof 1 (finite form)

If {{math|λ1}} and {{math|λ2}} are two arbitrary nonnegative real numbers such that {{math|λ1 + λ2 {{=}} 1}} then convexity of {{mvar|φ}} implies
<math display="block">\varphi(\lambda_1 x_1 + \lambda_2 x_2) \le \lambda_1\,\varphi(x_1) + \lambda_2\,\varphi(x_2) \qquad\text{for all } x_1, x_2.</math>

This can be easily generalized: if {{math|λ1, ..., λn}} are nonnegative real numbers such that {{math|λ1 + ... + λn {{=}} 1}}, then
<math display="block">\varphi(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n) \le \lambda_1\,\varphi(x_1) + \lambda_2\,\varphi(x_2) + \cdots + \lambda_n\,\varphi(x_n)</math>

for any {{math|x1, ..., xn}}. This finite form of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for {{math|n {{=}} 2}}. Suppose the statement is true for some {{mvar|n}}; one needs to prove it for {{math|n + 1}}. At least one of the {{math|λi}} is strictly positive, say {{math|λ1}}; therefore, by the convexity inequality:
<math display="block">\varphi\!\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) = \varphi\!\left(\lambda_1 x_1 + (1-\lambda_1)\sum_{i=2}^{n+1} \frac{\lambda_i}{1-\lambda_1}\, x_i\right) \le \lambda_1\,\varphi(x_1) + (1-\lambda_1)\,\varphi\!\left(\sum_{i=2}^{n+1} \frac{\lambda_i}{1-\lambda_1}\, x_i\right).</math>

Since
<math display="block">\sum_{i=2}^{n+1} \frac{\lambda_i}{1-\lambda_1} = 1,</math>

one can apply the induction hypothesis to the last term in the previous formula to obtain the result, namely the finite form of Jensen's inequality.

In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:
<math display="block">\varphi\!\left(\int_{-\infty}^{\infty} x\, d\mu_n(x)\right) \le \int_{-\infty}^{\infty} \varphi(x)\, d\mu_n(x),</math>

where {{math|μn}} is a measure given by an arbitrary convex combination of Dirac deltas:
<math display="block">\mu_n = \sum_{i=1}^n \lambda_i\, \delta_{x_i}.</math>

Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as could be easily verified), the general statement is obtained simply by a limiting procedure.
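The finite form can also be checked numerically; the following minimal Python sketch (the convex function {{math|exp}} and the random trials are illustrative choices, not part of the proof) verifies the inequality on randomly drawn points and weights:

<syntaxhighlight lang="python">
import numpy as np

# Informal check of the finite form of Jensen's inequality:
# phi(sum_i lambda_i x_i) <= sum_i lambda_i phi(x_i) for a convex phi.
rng = np.random.default_rng(0)
phi = np.exp  # a convex function

for _ in range(1000):
    n = rng.integers(2, 10)
    x = rng.normal(size=n)            # arbitrary points
    lam = rng.random(n)
    lam /= lam.sum()                  # nonnegative weights summing to 1
    lhs = phi(np.dot(lam, x))         # phi of the weighted mean
    rhs = np.dot(lam, phi(x))         # weighted mean of phi
    assert lhs <= rhs + 1e-12         # allow tiny floating-point slack

print("finite-form Jensen inequality held in all random trials")
</syntaxhighlight>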

Proof 2 (measure-theoretic form)

Let g be a real-valued μ-integrable function on a probability space Ω, and let {{mvar|φ}} be a convex function on the real numbers. Since {{mvar|φ}} is convex, at each real number {{mvar|x}} we have a nonempty set of subderivatives, which may be thought of as lines touching the graph of {{mvar|φ}} at {{mvar|x}}, but which are at or below the graph of {{mvar|φ}} at all points (support lines of the graph).

Now, if we define
<math display="block">x_0 := \int_\Omega g\, d\mu,</math>

because of the existence of subderivatives for convex functions, we may choose {{mvar|a}} and {{mvar|b}} such that
<math display="block">ax + b \le \varphi(x)</math>

for all real {{mvar|x}} and
<math display="block">ax_0 + b = \varphi(x_0).</math>

But then we have that
<math display="block">\varphi(g(x)) \ge a\,g(x) + b</math>

for all {{mvar|x}}. Since we have a probability measure, the integral is monotone with {{math|μ(Ω) {{=}} 1}} so that
<math display="block">\int_\Omega \varphi(g(x))\, d\mu(x) \ge \int_\Omega \bigl(a\,g(x) + b\bigr)\, d\mu(x) = a\int_\Omega g(x)\, d\mu(x) + b = a x_0 + b = \varphi(x_0) = \varphi\!\left(\int_\Omega g\, d\mu\right),</math>

as desired.

Proof 3 (general inequality in a probabilistic setting)

Let {{mvar|X}} be an integrable random variable that takes values in a real topological vector space {{mvar|T}}. Since <math>\varphi : T \to \mathbb{R}</math> is convex, for any <math>x, y \in T</math>, the quantity
<math display="block">\frac{\varphi(x + \theta\, y) - \varphi(x)}{\theta}</math>

is decreasing as {{mvar|θ}} approaches 0+. In particular, the subdifferential of {{mvar|φ}} evaluated at {{mvar|x}} in the direction {{mvar|y}} is well defined by
<math display="block">(D\varphi)(x)\cdot y := \lim_{\theta \downarrow 0} \frac{\varphi(x + \theta\, y) - \varphi(x)}{\theta} = \inf_{\theta > 0} \frac{\varphi(x + \theta\, y) - \varphi(x)}{\theta}.</math>

It is easily seen that the subdifferential is linear in {{mvar|y}}{{Citation needed|date=October 2013}} (in fact, the directional derivative above is only sublinear in {{mvar|y}}, and making this step rigorous requires the Hahn–Banach theorem) and, since the infimum taken in the right-hand side of the previous formula is smaller than the value of the same term for {{math|θ {{=}} 1}}, one gets
<math display="block">\varphi(x) \le \varphi(x + y) - (D\varphi)(x)\cdot y.</math>

In particular, for an arbitrary sub-{{mvar|σ}}-algebra <math>\mathfrak{G}</math> we can evaluate the last inequality when <math>x = \operatorname{E}[X \mid \mathfrak{G}]</math> and <math>y = X - \operatorname{E}[X \mid \mathfrak{G}]</math> to obtain
<math display="block">\varphi\!\left(\operatorname{E}[X \mid \mathfrak{G}]\right) \le \varphi(X) - (D\varphi)\!\left(\operatorname{E}[X \mid \mathfrak{G}]\right)\cdot \bigl(X - \operatorname{E}[X \mid \mathfrak{G}]\bigr).</math>

Now, if we take the expectation conditioned to <math>\mathfrak{G}</math> on both sides of the previous expression, we get the result since:
<math display="block">\operatorname{E}\!\left[\left. (D\varphi)\!\left(\operatorname{E}[X \mid \mathfrak{G}]\right)\cdot \bigl(X - \operatorname{E}[X \mid \mathfrak{G}]\bigr)\, \right| \mathfrak{G}\right] = (D\varphi)\!\left(\operatorname{E}[X \mid \mathfrak{G}]\right)\cdot \operatorname{E}\!\left[\left. X - \operatorname{E}[X \mid \mathfrak{G}]\, \right| \mathfrak{G}\right] = 0,</math>

by the linearity of the subdifferential in the {{mvar|y}} variable, and the following well-known property of the conditional expectation:
<math display="block">\operatorname{E}\!\left[\left.\operatorname{E}[X \mid \mathfrak{G}] \,\right| \mathfrak{G}\right] = \operatorname{E}[X \mid \mathfrak{G}].</math>

Applications and special cases

Form involving a probability density function

Suppose {{math|Ω}} is a measurable subset of the real line and {{math|f(x)}} is a non-negative function such that
<math display="block">\int_{-\infty}^{\infty} f(x)\, dx = 1.</math>

In probabilistic language, f is a probability density function.

Then Jensen's inequality becomes the following statement about convex integrals:

If {{mvar|g}} is any real-valued measurable function and {{mvar|φ}} is convex over the range of {{mvar|g}}, then
<math display="block">\varphi\!\left(\int_{-\infty}^{\infty} g(x)\, f(x)\, dx\right) \le \int_{-\infty}^{\infty} \varphi(g(x))\, f(x)\, dx.</math>

If {{math|g(x) {{=}} x}}, then this form of the inequality reduces to a commonly used special case:
<math display="block">\varphi\!\left(\int_{-\infty}^{\infty} x\, f(x)\, dx\right) \le \int_{-\infty}^{\infty} \varphi(x)\, f(x)\, dx.</math>
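For a concrete illustration, take the exponential density {{math|f(x) {{=}} e−x}} on {{math|[0, ∞)}} and the convex function {{math|φ(x) {{=}} x2}}:
<math display="block">\varphi\!\left(\int_0^\infty x\, e^{-x}\, dx\right) = 1 \;\le\; \int_0^\infty x^2\, e^{-x}\, dx = 2.</math>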

Example: even moments of a random variable

If {{math|g(x) {{=}} x2n}}, and {{mvar|X}} is a random variable, then {{mvar|g}} is convex as
<math display="block">\frac{d^2 g}{dx^2}(x) = 2n(2n-1)\, x^{2n-2} \ge 0 \quad \forall\, x \in \mathbb{R},</math>

and so
<math display="block">g(\operatorname{E}[X]) = (\operatorname{E}[X])^{2n} \le \operatorname{E}\left[X^{2n}\right].</math>

In particular, if some even moment 2n of X is finite, X has a finite mean. An extension of this argument shows X has finite moments of every order dividing n.
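For the simplest case {{math|n {{=}} 1}}, the inequality reads {{math|(E[X])2 ≤ E[X2]}}, which is just the statement that the variance of {{mvar|X}} is non-negative:
<math display="block">\operatorname{Var}(X) = \operatorname{E}[X^2] - (\operatorname{E}[X])^2 \ge 0.</math>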

Alternative finite form

Let {{math|Ω {{=}} {x1, ..., xn},}} and take {{mvar|μ}} to be the counting measure on {{math|Ω}}; then the general form reduces to a statement about sums:
<math display="block">\varphi\!\left(\sum_{i=1}^n g(x_i)\, \lambda_i\right) \le \sum_{i=1}^n \varphi(g(x_i))\, \lambda_i,</math>

provided that {{math|λi ≥ 0}} and
<math display="block">\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1.</math>

There is also an infinite discrete form.

Statistical physics

Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:
<math display="block">e^{\operatorname{E}[X]} \le \operatorname{E}\left[e^X\right],</math>

where the expected values are with respect to some probability distribution in the random variable {{mvar|X}}.

The proof in this case is very simple (cf. Chandler, Sec. 5.5). The desired inequality follows directly, by writing
<math display="block">\operatorname{E}\left[e^X\right] = e^{\operatorname{E}[X]}\, \operatorname{E}\left[e^{X - \operatorname{E}[X]}\right]</math>

and then applying the inequality {{math|eX ≥ 1 + X}} to the final exponential.
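For example, for a Gaussian random variable <math>X \sim N(\mu, \sigma^2)</math> both sides can be computed in closed form:
<math display="block">\operatorname{E}\left[e^X\right] = e^{\mu + \sigma^2/2} \;\ge\; e^{\mu} = e^{\operatorname{E}[X]},</math>
with equality exactly when {{math|σ2 {{=}} 0}}, i.e. when {{mvar|X}} is degenerate.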

Information theory

If {{math|p(x)}} is the true probability density for {{mvar|X}}, and {{math|q(x)}} is another density, then applying Jensen's inequality for the random variable {{math|Y(X) {{=}} q(X)/p(X)}} and the convex function {{math|φ(y) {{=}} −log(y)}} gives
<math display="block">\operatorname{E}[\varphi(Y)] \ge \varphi(\operatorname{E}[Y]).</math>

Therefore:
<math display="block">-D(p\,\|\,q) = \int p(x) \log\!\left(\frac{q(x)}{p(x)}\right) dx \;\le\; \log\!\left(\int p(x)\, \frac{q(x)}{p(x)}\, dx\right) = \log\!\left(\int q(x)\, dx\right) = 0,</math>

a result called Gibbs' inequality.

It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities {{mvar|p}} rather than any other distribution {{mvar|q}}. The non-negative quantity {{math|D(p ‖ q)}} is called the Kullback–Leibler divergence of {{mvar|q}} from {{mvar|p}}.

Since {{math|−log(x)}} is a strictly convex function for {{math|x > 0}}, it follows that equality holds when {{math|p(x)}} equals {{math|q(x)}} almost everywhere.
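Gibbs' inequality is easy to check numerically for discrete distributions; the following minimal Python sketch (the distributions {{mvar|p}} and {{mvar|q}} are arbitrary illustrative values) computes the Kullback–Leibler divergence directly:

<syntaxhighlight lang="python">
import numpy as np

# Gibbs' inequality: D(p || q) = sum_x p(x) log(p(x)/q(x)) >= 0,
# with equality iff p == q. Checked on two small discrete distributions.
p = np.array([0.5, 0.3, 0.2])   # "true" distribution (illustrative values)
q = np.array([0.4, 0.4, 0.2])   # alternative distribution

kl_pq = np.sum(p * np.log(p / q))
kl_pp = np.sum(p * np.log(p / p))

print(f"D(p||q) = {kl_pq:.6f}")   # strictly positive since p != q
print(f"D(p||p) = {kl_pp:.6f}")   # exactly zero
</syntaxhighlight>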

Rao–Blackwell theorem

{{main article|Rao–Blackwell theorem}}

If {{mvar|L}} is a convex function and <math>\mathfrak{G}</math> a sub-sigma-algebra, then, from the conditional version of Jensen's inequality, we get
<math display="block">L\!\left(\operatorname{E}[\delta(X) \mid \mathfrak{G}]\right) \le \operatorname{E}\left[L(\delta(X)) \mid \mathfrak{G}\right] \quad\Longrightarrow\quad \operatorname{E}\left[L\!\left(\operatorname{E}[\delta(X) \mid \mathfrak{G}]\right)\right] \le \operatorname{E}\left[L(\delta(X))\right].</math>

So if {{math|δ(X)}} is some estimator of an unobserved parameter {{mvar|θ}} given a vector of observables {{mvar|X}}, and if {{math|T(X)}} is a sufficient statistic for {{mvar|θ}}, then an improved estimator, in the sense of having a smaller expected loss {{mvar|L}}, can be obtained by calculating
<math display="block">\delta_1(X) = \operatorname{E}_{\theta}\left[\delta(X') \mid T(X') = T(X)\right],</math>

the expected value of δ with respect to θ, taken over all possible vectors of observations X compatible with the same value of T(X) as that observed.

This result is known as the Rao–Blackwell theorem.
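The variance-reduction effect can be illustrated with a small simulation; the sketch below assumes i.i.d. Gaussian observations {{math|N(θ, 1)}}, for which {{math|δ(X) {{=}} X1}} is unbiased, the sample mean is sufficient, and the Rao–Blackwellized estimator {{math|E[X1 {{!}} X̄]}} is simply the sample mean:

<syntaxhighlight lang="python">
import numpy as np

# Rao-Blackwell illustration with Gaussian data (assumed setup for this sketch):
# X_1, ..., X_n i.i.d. N(theta, 1); delta(X) = X_1 is unbiased for theta,
# T(X) = mean(X) is sufficient, and E[X_1 | T] = mean(X) is the improved estimator.
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 20, 100_000

samples = rng.normal(theta, 1.0, size=(reps, n))
delta = samples[:, 0]             # crude estimator: first observation
delta_rb = samples.mean(axis=1)   # Rao-Blackwellized estimator: sample mean

# Squared-error loss is convex, so the conditioned estimator cannot do worse.
print("MSE of delta    :", np.mean((delta - theta) ** 2))      # ~ 1
print("MSE of delta_RB :", np.mean((delta_rb - theta) ** 2))   # ~ 1/n = 0.05
</syntaxhighlight>

The mean squared error of the conditioned estimator is smaller by roughly a factor of {{mvar|n}}, as the theorem predicts for the convex squared-error loss.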

See also

  • Karamata's inequality for a more general inequality
  • Popoviciu's inequality
  • Law of averages
  • A proof without words of Jensen's inequality

Notes

1. ^{{cite journal |last=Jensen |first=J. L. W. V. |authorlink=Johan Jensen (mathematician) |date=1906 |title=Sur les fonctions convexes et les inégalités entre les valeurs moyennes |journal=Acta Mathematica |volume=30 |issue=1 |pages=175–193 |doi=10.1007/BF02418571 }}
2. ^Niculescu, Constantin P. "Integral inequalities", P. 12.
3. ^Attention: In this generality additional assumptions on the convex function and/or the topological vector space are needed, see Example (1.3) on p. 53 in {{cite journal |last=Perlman |first=Michael D. |title=Jensen's Inequality for a Convex Vector-Valued Function on an Infinite-Dimensional Space |journal=Journal of Multivariate Analysis |volume=4 |issue=1 |pages=52–65 |year=1974 |doi=10.1016/0047-259X(74)90005-0 }}
4. ^{{cite journal | last = Liao | first = J. | last2 = Berg | first2 = A. | year = 2018 | title = Sharpening Jensen's Inequality |journal=The American Statistician | doi=10.1080/00031305.2017.1419145}}

References

  • {{cite book|author=David Chandler|title=Introduction to Modern Statistical Mechanics|publisher=Oxford|year=1987| isbn=0-19-504277-8 |authorlink=David Chandler (chemist)}}
  • Tristan Needham (1993) "A Visual Explanation of Jensen's Inequality", American Mathematical Monthly 100(8):768–71.
  • {{cite book | author= Nicola Fusco, Paolo Marcellini, Carlo Sbordone| title= Analisi Matematica Due | publisher= Liguori | year= 1996| isbn=978-88-207-2675-1}}
  • {{cite book|author=Walter Rudin|title=Real and Complex Analysis|publisher=McGraw-Hill|year=1987|isbn=0-07-054234-1| authorlink=Walter Rudin}}

External links

  • [https://arxiv.org/abs/math/0204049 Jensen's Operator Inequality] of Hansen and Pedersen.
  • {{springer|title=Jensen inequality|id=p/j054220}}
  • {{MathWorld|urlname=JensensInequality|title=Jensen's inequality}}
  • {{cite web|title=Introduction to Inequalities|url=http://www.mediafire.com/?1mw1tkgozzu |author=Arthur Lohwater|date=1982|publisher=Online e-book in PDF format}}

Categories: Inequalities | Probabilistic inequalities | Statistical inequalities | Theorems in analysis | Articles containing proofs | Convex analysis
