STAT 6710 Mathematical Statistics I Fall Semester

STAT 6710
Mathematical Statistics I
Fall Semester 2000
Dr. Jurgen Symanzik
Utah State University
Department of Mathematics and Statistics
3900 Old Main Hill
Logan, UT 84322{3900
Tel.: (435) 797{0696
FAX: (435) 797{1822
e-mail:
[email protected]
Contents
Acknowledgements
1
1 Axioms of Probability
1
1.1
1.2
1.3
1.4
{Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Manipulating Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Combinatorics and Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Conditional Probability and Independence . . . . . . . . . . . . . . . . . . . . . 17
2 Random Variables
2.1
2.2
2.3
2.4
24
Measurable Functions . . . . . . . . . . . . . .
Probability Distribution of a Random Variable
Discrete and Continuous Random Variables . .
Transformations of Random Variables . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3 Moments and Generating Functions
3.1
3.2
3.3
3.4
3.5
42
Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . .
Complex{Valued Random Variables and Characteristic Functions
Probability Generating Functions . . . . . . . . . . . . . . . . . .
Moment Inequalities . . . . . . . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
4 Random Vectors
4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
24
27
31
36
42
50
56
66
68
71
Joint, Marginal, and Conditional Distributions
Independent Random Variables . . . . . . . . .
Functions of Random Vectors . . . . . . . . . .
Order Statistics . . . . . . . . . . . . . . . . . .
Multivariate Expectation . . . . . . . . . . . .
Multivariate Generating Functions . . . . . . .
Conditional Expectation . . . . . . . . . . . . .
Inequalities and Identities . . . . . . . . . . . .
1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
71
77
82
89
91
97
102
106
5 Particular Distributions
112
5.1 Multivariate Normal Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.2 Exponential Family of Distributions . . . . . . . . . . . . . . . . . . . . . . . . 119
6 Limit Theorems
121
6.1 Modes of Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.2 Weak Laws of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Index
139
2
Acknowledgements
I would like to thank my students, Hanadi B. Eltahir, Rich Madsen, and Bill Morphet, who
helped during the Fall 1999 and Spring 2000 semesters in typesetting these lecture notes
using LATEX and for their suggestions how to improve some of the material presented in class.
Thanks are also due to my Fall 2000 and Spring 2001 semester students, Weiping Deng, David
B. Neal, Emily Simmons, Shea Watrin, Aonan Zhang, and Qian Zhao, for their comments
that helped to improve these lecture notes.
In addition, I particularly would like to thank Mike Minnotte and Dan Coster, who previously
taught this course at Utah State University, for providing me with their lecture notes and other
materials related to this course. Their lecture notes, combined with additional material from
Rohatgi (1976) and other sources listed below, form the basis of the script presented here.
The textbook required for this class is:
Rohatgi, V. K. (1976): An Introduction to Probability Theory and Mathematical Statistics, John Wiley and Sons, New York.
A Web page dedicated to this class is accessible at:
http://www.math.usu.edu/~symanzik/teaching/2000_stat6710/stat6710.html
This course closely follows Rohatgi (1976) as described in the syllabus. Additional material
originates from the lectures from Professors Hering, Trenkler, and Gather I have attended
while studying at the Universitat Dortmund, Germany, the collection of Masters and PhD
Preliminary Exam questions from Iowa State University, Ames, Iowa, and the following textbooks:
Bandelow, C. (1981): Einfuhrung in die Wahrscheinlichkeitstheorie, Bibliographisches
Institut, Mannheim, Germany.
Casella, G., and Berger, R. L. (1990): Statistical Inference, Wadsworth & Brooks/Cole,
Pacic Grove, CA.
Fisz, M. (1989): Wahrscheinlichkeitsrechnung und mathematische Statistik, VEB Deutscher
Verlag der Wissenschaften, Berlin, German Democratic Republic.
Kelly, D. G. (1994): Introduction to Probability, Macmillan, New York, NY.
Mood, A. M., and Graybill, F. A., and Boes, D. C. (1974): Introduction to the Theory
of Statistics (Third Edition), McGraw-Hill, Singapore.
Parzen, E. (1960): Modern Probability Theory and Its Applications, Wiley, New York,
NY.
3
Searle, S. R. (1971): Linear Models, Wiley, New York, NY.
Additional denitions, integrals, sums, etc. originate from the following formula collections:
Bronstein, I. N. and Semendjajew, K. A. (1985): Taschenbuch der Mathematik (22.
Auage), Verlag Harri Deutsch, Thun, German Democratic Republic.
Bronstein, I. N. and Semendjajew, K. A. (1986): Erganzende Kapitel zu Taschenbuch der
Mathematik (4. Auage), Verlag Harri Deutsch, Thun, German Democratic Republic.
Sieber, H. (1980): Mathematische Formeln | Erweiterte Ausgabe E, Ernst Klett, Stuttgart,
Germany.
Jurgen Symanzik, January 12, 2001
4
Lecture 02:
We 08/30/00
1 Axioms of Probability
1.1
{Fields
Let be the sample space of all possible outcomes of a chance experiment. Let ! 2 (or
x 2 ) be any outcome.
Example:
Count # of heads in n coin tosses. = f0; 1; 2; : : : ; ng.
Any subset A of is called an event.
For each event A , we would like to assign a number (i.e., a probability). Unfortunately,
we cannot always do this for every subset of .
Instead, we consider classes of subsets of called elds and {elds.
Denition 1.1.1:
A class L of subsets of is called a eld if 2 L and L is closed under complements and
nite unions, i.e., L satises
(i) 2 L
(ii) A 2 L =) AC 2 L
(iii) A; B 2 L =) A [ B 2 L
Since C = , (i) and (ii) imply 2 L. Therefore, (i)': 2 L [can replace (i)].
Note: De Morgan's Laws
For any class A of sets, and sets A 2 A, it holds:
[
A2A
A=(
\
A2A
AC )C and
\
A2A
A=(
[
A2A
AC )C :
Note:
So (ii), (iii) imply (iii)': A; B 2 L =) A \ B 2 L [can replace (iii)].
1
Proof:
A; B 2 L =(ii)) AC ; B C 2 L =(iii)) (AC [ B C ) 2 L =(ii)) (AC [ B C )C 2 L =DM
)A\B 2L
Denition 1.1.2:
A class L of subsets of is called a {eld (Borel eld, {algebra) if it is a eld and closed
under countable unions,1i.e.,
[
(iv) fAn g1
An 2 L.
n=1 2 L =)
n=1
Note:
(iv) implies (iii) by taking An = for n 3.
Example 1.1.3:
For some , let L contain all nite and all conite sets (A is conite if AC is nite | for
example, if = IN , A = fx j x cg is not nite but since AC = fx j x < cg is nite, A is
conite). Then L is a eld. But L is a {eld i (if and only if) is nite.
For example, let = Z . Take An = fng, each nite, so An 2 L. But
the set is not nite (it is innite) and also not conite ((
Question: Does this construction work for
= Z+
??
1
[
n=1
1
[
n=1
An = Z + 62 L, since
An)C = Z0; is innite, too).
Note:
The largest {eld in is the power set P(
) of all subsets of . The smallest {eld is
L = f; g.
Terminology:
A set A 2 L is said to be \measurable L".
Note:
We often begin with a class of sets, say a, which may not be a eld or a {eld.
Denition 1.1.4:
The {eld generated by a, (a), is the smallest {eld containing a, or the intersection
of all {elds containing a.
2
Note:
(i) Such {elds containing a always exist (e.g., P(
)), and (ii) the intersection of an arbitrary
# of {elds is always a {eld.
Proof:
\
(ii) Suppose L = L . We have to show that conditions (i) and (ii) of Def. 1.1.1 and (iv) of
Def. 1.1.2 are fullled:
(i) 2 L 8 =) 2 L
(ii) Let A 2 L =) A 2 L 8 =) AC 2 L 8[
=) AC 2 L
[
(iv) Let An 2 L 8n =) An 2 L 8 8n =) An 2 L 8 =) An 2 L
n
n
Example 1.1.5:
= f0; 1; 2; 3g; a = ff0gg; b = ff0g; f0; 1gg.
What is (a)?
(a): must include ; ; f0g
also: f1; 2; 3g by 1.1.1 (ii)
Since all unions are included, we have (a) = f
; ; f0g; f1; 2; 3gg
What is (b)?
(b): must include ; ; f0g; f0; 1g
also: f1; 2; 3g; f2; 3g by 1.1.1 (ii)
f0; 2; 3g by 1.1.1 (iii)
f1g by 1.1.1 (ii)
Since all unions are included, we have (b) = f
; ; f0g; f1g; f0; 1g; f2; 3g; f0; 2; 3g; f1; 2; 3gg
Note:
If is nite or countable, we will usually use L = P(
). If j j= n < 1, then j L j= 2n .
If is uncountable, P(
) may be too large to be useful and we may have to use some smaller
{eld.
3
Lecture 03:
Fr 09/01/00
Denition 1.1.6:
If = IR, an important special case is the Borel {eld, i.e., the {eld generated from all
half{open intervals of the form (a; b], denoted B or B1 . The sets of B are called Borel sets.
The Borel {eld on IRd (Bd ) is the {eld generated by d{dimensional rectangles of the form
f(x1 ; x2 ; : : : ; xd) j ai < xi bi ; i = 1; 2; : : : ; dg.
Note:
1
\
B contains all points: fxg = (x ; n1 ; x]
n=1
closed intervals: [x; y] = (x; y] + fxg = (x; y] [ fxg
open intervals: (x; y) = (x; y] ; fyg = (x; y] \ fygC
and semi{innite intervals: (x; 1) =
1
[
(x; x + n]
n=1
Note:
We now have a measurable space (
; L). We next dene a probability measure P () on (
; L)
to obtain a probability space (
; L; P ).
Denition 1.1.7: Kolmogorov Axioms of Probability
A probability measure (pm), P , on (
; L) is a set function P : L ! IR satisfying
(i) 0 P (A) 8A 2 L
(ii) P (
) = 1
(iii) If fAn g1
n=1 are disjoint sets in L and
1
[
n=1
Note:
1
[
An 2 L, then P (
1
[
n=1
An ) =
1
X
n=1
P (An ).
An 2 L holds automatically if L is a {eld but it is needed as a precondition in the case
that L is just a eld. Property (iii) is called countable additivity.
n=1
4
1.2 Manipulating Probability
Theorem 1.2.1:
For P a pm on (
; L), it holds:
(i) P () = 0
(ii) P (AC ) = 1 ; P (A) 8A 2 L
(iii) P (A) 1 8A 2 L
(iv) P (A [ B ) = P (A) + P (B ) ; P (A \ B ) 8A; B 2 L
(v) If A B , then P (A) P (B ).
Proof:
(i) An 8n )
1
[
n=1
An = 2 L .
Ai \ Aj = \ = 8i; j ) An are disjoint 8n.
P () = P (
1
[
n=1
:1:7(iii)
An ) Def 1=
1
X
n=1
P (A n ) =
This can only hold if P () = 0.
1
X
n=1
P ()
(ii) A1 A; A2 AC ; An 8n 3.
1
[
An = A1 [ A2 [
1
[
An = A1 [ A2 [ .
n=1
n=3
A1 \ A2 = A1 \ = A2 \ = ) A1 ; A2 ; are disjoint.
=
1 = P (
)
1
[
=
P(
Def 1:1:7(iii)
1
X
=
n=1
n=1
An)
P (An)
P (A1 ) + P (A2 ) +
=
Th1:2:1(i)
P (A1 ) + P (A2 )
P (A) + P (AC )
=
=
=) P (AC ) = 1 ; P (A) 8A 2 L.
5
1
X
n=3
P (An)
(iii) By Th. 1.2.1 (ii) P (A) = 1 ; P (AC )
=) P (A) 1 8A 2 L since P (AC ) 0 by Def. 1.1.7 (i).
(iv) A [ B = (A \ B C ) [ (A \ B ) [ (B \ AC ). So, (A [ B ) can be written as a union of
disjoint sets (A \ B C ); (A \ B ); (B \ AC ):
=) P (A [ B )
=
Def:1:1:7(iii)
=
=
=
Def:1:1:7(iii)
=
P ((A \ B C ) [ (A \ B ) [ (B \ AC ))
P (A \ B C ) + P (A \ B ) + P (B \ AC )
P (A \ B C ) + P (A \ B ) + P (B \ AC ) + P (A \ B ) ; P (A \ B )
(P (A \ B C ) + P (A \ B )) + (P (B \ AC ) + P (A \ B )) ; P (A \ B )
P (A) + P (B ) ; P (A \ B )
Lecture 04:
We 09/06/00
(v) B = (B \ AC ) [ A where (B \ AC ) and A are disjoint sets.
:1:7(iii)
P (B ) = P ((B \ AC ) [ A) Def 1=
P (B \ AC ) + P (A)
=) P (A) = P (B ) ; P (B \ AC )
=) P (A) P (B ) since P (B \ AC ) 0 by Def. 1.1.7 (i).
Theorem 1.2.2: Principle of Inclusion{Exclusion
Let A1 ; A2 ; : : : ; An 2 L. Then
P(
[n
k=1
Ak ) =
n
X
k=1
n
X
P (Ak );
k1 <k2
P (Ak1 \Ak2 )+
n
X
k1 <k2 <k3
P (Ak1 \Ak2 \Ak3 );: : :+(;1)n+1 P (
\n
k=1
Ak )
Proof:
n = 1 is trivial
n = 2 is Theorem 1.2.1 (iv)
Use induction for higher n (Homework).
Note:
A proof by induction consists of two steps:
First, we have to establish the induction base. For example, if we state that something
holds for all non{negative integers, then we have to show that it holds for n = 0. Similarly, if
we state that something holds for all integers, then we have to show that it holds for n = 1.
Formally, it is sucient to verify a claim for the smallest valid integer only. However, to get
some feeling how the proof from n to n + 1 might work, it is sometimes benecial to verify a
claim for 1; 2; or 3 as well.
6
In the second step, we have to establish the result in the induction step, showing that something holds for n + 1, using the fact that it holds for n (alternatively, we can show that it
holds for n, using the fact that it holds for n ; 1).
Theorem 1.2.3: Bonferroni's Inequality
Let A1 ; A2 ; : : : ; An 2 L. Then
n
X
i=1
P (Ai ) ;
X
i<j
[n
n
X
i=1
i=1
P (Ai \ Aj ) P ( Ai) P (Ai )
Proof:
n
X
[n
Right side: P ( Ai ) P (Ai )
i=1
i=1
Induction Base:
For n = 1, the right side evaluates to P (A1 ) P (A1 ), which is true.
Formally, the next step is not required. However, it does not harm to verify the claim for
n = 2 as well. For n = 2, the right side evaluates to P (A1 [ A2) P (A1 ) + P (A2 ).
P (A1 [ A2 ) Th:1:=2:1(iv) P (A1 ) + P (A2 ) ; P (A1 \ A2) P (A1 ) + P (A2 ) since P (A1 \ A2 ) 0
by Def. 1.1.7 (i).
This establishes the induction base for the right side.
Induction Step:
We assume the right side is true for n and show that it is true for n + 1:
n[
+1
P(
i=1
Ai )
=
Th:1:2:1(iv)
=
Def:1:1:7(i)
I:B:
=
Left side:
n
X
i=1
P (Ai ) ;
X
i<j
[n
P (( Ai ) [ An+1)
i=1
[n
[n
i=1
[n
i=1
P ( Ai ) + P (An+1) ; P (( Ai ) \ An+1)
P ( Ai ) + P (An+1)
i=1
n
X
i=1
nX
+1
i=1
P (Ai ) + P (An+1)
P (Ai )
[n
P (Ai \ Aj ) P ( Ai )
i=1
Induction Base:
For n = 1, the left side evaluates to P (A1 ) P (A1 ), which is true.
7
For n = 2, the left side evaluates to P (A1 ) + P (A2 ) ; P (A1 \ A2 ) P (A1 [ A2 ), which is true
by Th. 1.2.1 (iv).
For n = 3, the left side evaluates to
P (A1 ) + P (A2 ) + P (A3 ) ; P (A1 \ A2 ) ; P (A1 \ A3 ) ; P (A2 \ A3 ) P (A1 [ A2 [ A3 ).
This holds since
P (A1 [ A2 [ A3 )
= P ((A1 [ A2 ) [ A3 )
Th:1:2:1(iv)
= P (A1 [ A2 ) + P (A3 ) ; P ((A1 [ A2 ) \ A3 )
= P (A1 [ A2 ) + P (A3 ) ; P ((A1 \ A3 ) [ (A2 \ A3 ))
Th:1:2:1(iv)
P (A1 ) + P (A2 ) ; P (A1 \ A2 ) + P (A3 ) ; P (A1 \ A3 ) ; P (A2 \ A3 )
+P ((A1 \ A3 ) \ (A2 \ A3 ))
= P (A1 ) + P (A2 ) + P (A3 ) ; P (A1 \ A2 ) ; P (A1 \ A3 ) ; P (A2 \ A3 ) + P (A1 \ A2 \ A3 )
=
Def 1:1:7(i)
P (A1 ) + P (A2 ) + P (A3 ) ; P (A1 \ A2 ) ; P (A1 \ A3 ) ; P (A2 \ A3 )
This establishes the induction base for the left side.
Induction Step:
We assume the left side is true for n and show that it is true for n + 1:
P(
n[
+1
i=1
Ai )
[n
=
P (( Ai) [ An+1)
=
P ( Ai) + P (An+1 ) ; P (( Ai) \ An+1 )
left I:B:
=
i=1
[n
i=1
n
X
i=1
nX
+1
i=1
n
+1
Th:1:2:3 right side X
=
i=1
nX
+1
i=1
P (Ai ) ;
P (Ai ) ;
P (Ai ) ;
P (Ai ) ;
[n
n
X
i<j
n
X
i=1
[n
P (Ai \ Aj ) + P (An+1 ) ; P (( Ai ) \ An+1 )
i<j
n
X
i<j
nX
+1
[n
i=1
P (Ai \ Aj ) ; P ( (Ai \ An+1 ))
P (Ai \ Aj ) ;
i<j
8
P (Ai \ Aj )
i=1
n
X
i=1
P (Ai \ An+1)
Theorem 1.2.4: Boole's Inequality
Let A; B 2 L. Then
(i) P (A \ B ) P (A) + P (B ) ; 1
(ii) P (A \ B ) 1 ; P (AC ) ; P (B C )
Proof:
Homework
Denition 1.2.5: Continuity of sets
For a sequence of sets fAn g1
n=1 ; An 2 L and A 2 L, we say
(i) An " A if A1 A2 A3 : : : and A =
(ii) An # A if A1 A2 A3 : : : and A =
1
[
n=1
1
\
n=1
An .
An .
Theorem 1.2.6:
If fAn g1
n=1 ; An 2 L and A 2 L, then nlim
!1 P (An ) = P (A) if 1.2.5 (i) or 1.2.5 (ii) holds.
Proof:
Part (i): Assume that 1.2.5 (i) holds.
Let B1 = A1 and Bk = Ak ; Ak;1 = Ak \ ACk;1 8k 2
By construction, Bi \ Bj = for i =
6 j
1
1
[
[
It is A =
n=1
An =
and also An =
P(A) = P(
1
[
k=1
[n
i=1
Bn
Ai =
[n
i=1
Bi
Bk ) Def. 1.1.7
= (iii)
Def. 1.1.7 (iii)
=
n=1
[n
nlim
!1[P (
k=1
1
X
k=1
n
X
P (Bk ) = nlim
!1[
Bk )] = nlim
!1[P (
The last step is possible since An =
[n
k=1
[n
k=1
Ak
9
k=1
P (Bk )]
Ak )] = nlim
!1 P (An )
Part (ii): Assume that 1.2.5 (ii) holds. 1
1
\
[
Then, AC1 AC2 AC3 : : : and AC = ( An )C De Morgan
=
ACn
n=1
n=1
C
P (AC ) By Part
= (i) nlim
!1 P (An )
C
So 1 ; P (AC ) = 1 ; nlim
!1 P (An )
C
=) P (A) = nlim
!1(1 ; P (An )) = nlim
!1 P (An )
Lecture 05:
Fr 09/08/00
Theorem 1.2.7:
(i) Countable unions of probability 0 sets have probability 0.
(ii) Countable intersections of probability 1 sets have probability 1.
Proof:
Part (i):
Let fAn g1
n=1 2 L, P (An ) = 0 8n
1
1
1
[
X
Def. 1.7.7 (i)
Th. 1.2.3 X
0 P ( An ) P (An) = 0 = 0
n=1
Therefore P (
1
[
n=1
n=1
An) = 0.
Part (ii):
Let fAn g1
n=1 2 L, P (An ) = 1 8n
=)
Th. 1.2.1 (ii)
n=1
P (ACn ) = 0 8n Th.=1.2.7
) (i) P (
1
[
n=1
ACn ) = 0 De=Morgan
) P(
1
\
n=1
10
An) = 1
1.3 Combinatorics and Counting
For now, we restrict ourselves to sample spaces containing a nite number of points.
X
Let = f!1 ; : : : ; !n g and L = P(
). For any A 2 L; P (A) =
P (!j ).
!j 2A
Denition 1.3.1:
We say the elements of are equally likely (or occur with uniform probability) if
P (!j ) = n1 8j = 1; : : : ; n.
Note:
!j in A
If this is true, P (A) = number
number !j in . Therefore, to calculate such probabilities, we just
need to be able to count elements accurately.
Theorem 1.3.2: Fundamental Theorem of Counting
If we wish to select one element (a1 ) out of n1 choices, a second element (a2 ) out of n2 choices,
and so on for a total of k elements, there are
n1 n2 n3 : : : nk
ways to do it.
Proof: (By Induction)
Induction Base:
k = 1: trivial
k = 2: n1 ways to choose a1 . For each, n2 ways to choose a2 .
Total # of ways = n| 2 + n2 +
{z : : : + n2} = n1 n2.
n1 times
Induction Step:
Suppose it is true for (k ; 1). We show that it is true for k = (k ; 1) + 1.
There are n1 n2 n3 : : : nk;1 ways to select one element (a1 ) out of n1 choices, a
second element (a2 ) out of n2 choices, and so on, up to the (k ; 1)th element (ak;1 ) out of
nk;1 choices. For each of these n1 n2 n3 : : : nk;1 possible ways, we can select the kth
element (ak ) out of nk choices. Thus, the total # of ways = (n1 n2 n3 : : : nk;1) nk .
11
Denition 1.3.3:
For positive integer n, we dene n factorial as n! = n(n;1)(n;2): : :21 = n(n;1)!
and 0! = 1.
Denition 1.3.4:
For nonnegative integers n r, we dene the binomial coecient (read as n choose r) as
!
n = n! = n (n ; 1) (n ; 2) : : : (n ; r + 1) :
r!(n ; r)!
1 2 3 ::: r
r
Note:
A useful extension for the binomial coecient for n < r is
!
n = n (n ; 1) : : : 0 : : : (n ; r + 1) = 0:
r
1 2 ::: r
Note:
Most counting problems consist of drawing a xed number of times from a set of elements
(e.g., f1; 2; 3; 4; 5; 6g). To solve such problems, we need to know
(i) the size of the set, n;
(ii) the size of the sample, r;
(iii) whether the result will be ordered (i.e., is f1; 2g dierent from f2; 1g); and
(iv) whether the draws are with replacement (i.e, can results like f1; 1g occur?).
Theorem 1.3.5:
The number of ways to draw r elements from a set of n, if
(i) ordered, without replacement, is (n;n!r)! ;
(ii) ordered, with replacement, is nr ;
;
(iii) unordered, without replacement, is r!(nn;! r)! = nr ;
1)! = ;n+r;1.
(iv) unordered, with replacement, is (rn!(+nr;;1)!
r
12
Proof:
(i) n choices to select 1st
n ; 1 choices to select 2nd
..
.
n ; r + 1 choices to select rth
By Theorem 1.3.2, there are n (n ; 1) : : : (n ; r +1) = n(n;1):::(n(;nr;)!r+1)(n;r)! =
n!
(n;r)! ways to do so.
Corollary:
The number of permutations of n objects is n!.
(ii) n choices to select 1st
n choices to select 2nd
..
.
n choices to select rth
By Theorem 1.3.2, there are n| n {z: : : n} = nr ways to do so.
r times
(iii) We know from (i) above that there are (n;n!r)! ways to draw r elements out of n elements
without replacement in the ordered case. However, for each unordered set of size r,
there are r! related ordered sets that consist of the same elements. Thus, there are
1 ;n
n!
(n;r)! r! = r ways to draw r elements out of n elements without replacement in the
unordered case.
(iv) There is no immediate direct way to show this part. We have to come up with some
extra motivation. We assume that there are (n ; 1) walls that separate the n bins of
possible outcomes and there are r markers. If we shake everything, there are (n ; 1+ r)!
permutations to arrange these (n ; 1) walls and r markers according to the Corollary.
Since the r markers are indistinguishable and the (n ; 1) walls are also indistinguishable,
we have to divide the number of permutations by r! to get rid of identical permutations
where only the markers are changed and by (n ; 1)! to get rid of identical permutations
r)! ;n+r;1 ways to draw r
where only the walls are changed. Thus, there are (rn!(;n1+
;1)! = r
elements out of n elements with replacement in the unordered case.
13
Theorem 1.3.6: The Binomial Theorem
If n is a non{negative integer, then
n
X
(1 + x)n =
r=0
!
n xr
r
Proof: (By Induction)
Induction Base:
!
0 0!
X
0
n = 0: 1 = (1 + x)0 =
xr = 0 x0 = 1
r
r=0
n = 1:
(1 + x)1
=
1 1!
X
r=0
r
xr
!
!
= 10 x0 + 11 x1 = 1 + x
Induction Step:
Suppose it is true for k. We show that it is true for k + 1.
(1 + x)k+1 = (1 + x)k (1 + x)
IB
=
=
k k! !
X
r
r=0 r
k k!
X
r=0
!
r
x (1 + x)
k k!
X
r
x +
xr+1
r
r=0
k " k!
X
k
!#
!
=
k x0 +
0
=
k
k + 1 x0 + X
k + k
r + k + 1 xk+1
x
0
r;1
k+1
r=1 r
()
=
=
!
r=1
r + r;1
" !
!
!
xr +
!#
k xk+1
k
!
!
k k+1
k + 1 x0 + X
1 xk+1
xr + kk +
0
r
+
1
r=1
kX
+1
r=0
!
k + 1 xr
r
(*) Here we use Theorem 1.3.8 (i). Since the proof of Theorem 1.3.8 (i) only needs algebraic
transformations without using the Binomial Theorem, part (i) of Theorem 1.3.8 can be applied here.
14
Lecture 06:
Mo 09/11/00
Corollary 1.3.7:
For a non{negative integer n, it holds:
!
!
!
!
!
(i) n0 + n1 + : : : + nn = 2n
!
!
!
(ii) n0 ; n1 + n2 ; n3 + : : : + (;1)n nn = 0
!
!
!
!
!
!
!
(iii) 1 n1 + 2 n2 + 3 n3 + : : : + n nn = n2n;1
!
(iv) 1 n1 ; 2 n2 + 3 n3 + : : : + (;1)n;1 n nn = 0
Proof:
Use the Binomial Theorem:
(i) Let x = 1. Then
n
X
2n = (1 + 1)n Th:=1:3:6
r=0
(ii) Let x = ;1. Then
(iii)
n
d (1 + x)n = d X
dx
dx
r=0
!
n
X
0 = (1 + (;1))n Th:=1:3:6
r=0
n xr
r
n
X
=) n(1 + x)n;1 = r r=1
Substitute x = 1, then
!
!
n (;1)r
r
!
n xr;1
r
n2n;1 = n(1 + 1)n;1
(iv) Substitute x = ;1 in (iii) above, then
n
X
0 = n(1 + (;1))n;1 = r r=1
since for
!
n n
n 1r = X
r
r=0 r
P a = 0 also P(;a ) = 0.
i
i
15
!
n
X
!
= r nr
r=1
!
n
n (;1)r = X
n (;1)r;1
r
r
r
r=1
Theorem 1.3.8:
For non{negative integers, n; m; r, it holds:
!
!
!
;1 = n
(i) n ;r 1 + nr ;
1
r
! !
!
!
! !
!
n m = m+n
(ii) n0 mr + n1 r m
+
:
:
:
+
;1
r 0
r
!
!
!
!
+1
(iii) 0r + 1r + 2r + : : : + nr = nr +
1
Proof:
Homework
16
!
1.4 Conditional Probability and Independence
So far, we have computed probability based only on the information that is used for a
probability space (
; L; P ). Suppose, instead, we know that event H 2 L has happened.
What statement should we then make about the chance of an event A 2 L ?
Denition 1.4.1:
Given (
; L; P ) and H 2 L; P (H ) > 0, and A 2 L, we dene
\ H ) = P (A)
P (AjH ) = P (PA(H
H
)
and call this the conditional probability of A given H .
Note:
P (AjH ) is undened if P (H ) = 0.
Theorem 1.4.2:
In the situation of Denition 1.4.1, (
; L; PH ) is a probability space.
Proof:
If PH is a probability measure, it must satisfy Def. 1.1.7.
(i) P (H ) > 0 and by Def. 1.1.7 (i) P (A \ H ) 0 =) PH (A) = P P(A(H\H) ) 0 8A 2 L
(ii) PH (
) = P P(
(H\H) ) = PP ((HH )) = 1
(iii) Let fAn g1
n=1 be a sequence of disjoint sets. Then,
PH (
1
[
n=1
An )
Def:1:4:1
=
=
P ((
Def 1:4:1
=
17
n=1
An) \ H )
P (H )
P(
1
[
n=1
1
X
Def:1:1:7(iii) n=1
=
=
1
[
(An \ H ))
P (H )
P (An \ H )
P (H )
1 P (A \ H )
X
( n
)
n=1
1
X
n=1
P (H )
PH (An )
Note:
What we have done is to move to a new sample space H and a new {eld LH = L \ H of
subsets A \ H for A 2 L. We thus have a new measurable space (H; LH ) and a new probability
space (H; LH ; PH ).
Note:
From Denition 1.4.1, if A; B 2 L; P (A) > 0, and P (B ) > 0, then
P (A \ B ) = P (A)P (B jA) = P (B )P (AjB );
which generalizes to the following Theorem.
Theorem 1.4.3: Multiplication Rule
n\
;1
If A1 ; : : : ; An 2 L and P ( Aj ) > 0, then
j =1
P(
\n
j =1
Aj ) = P (A1 ) P (A2 jA1 ) P (A3 jA1 \ A2 ) : : : P (An j
n\
;1
j =1
Aj ):
Proof:
Homework
Denition 1.4.4:
A collection of subsets fAn g1
n=1 of form a partition of if
(i)
1
[
n=1
An = , and
(ii) Ai \ Aj = 8i 6= j , i.e., elements are pairwise disjoint.
Theorem 1.4.5: Law of Total Probability
If fHj g1
j =1 is a partition of , and P (Hj ) > 0 8j , then, for A 2 L,
P (A) =
1
X
j =1
P (A \ Hj ) =
1
X
j =1
P (Hj )P (AjHj ):
Proof:
By the Note preceding Theorem 1.4.3, the summands on both sides are equal
=) the right side of Th. 1.4.5 is true.
18
The left side proof:
Hj are disjoint ) A \ Hj are disjoint
1
[
A = A \ Def=1:4:4 A \ (
1
[
j =1
Hj ) =
=) P (A) = P ( (A \ Hj ))
j =1
1
[
(A \ Hj )
j =1
1
Def 1:1:7(iii) X
=
j =1
P (A \ Hj )
Theorem 1.4.6: Bayes' Rule
Let fHj g1
j =1 be a partition of , and P (Hj ) > 0 8j . Let A 2 L and P (A) > 0. Then
P (Hj )P (AjHj ) 8j:
P (Hj jA) = X
1
P (Hn)P (AjHn )
n=1
Proof:
P (Hj \ A) Def=1:4:1 P (A) P (Hj jA) = P (Hj ) P (AjHj )
P (Hj )P (AjHj ) .
=) P (Hj jA) = P (HjP)P(A()AjHj ) Th:=1:4:5 X
1
P (Hn)P (AjHn)
n=1
Denition 1.4.7:
For A; B 2 L, A and B are independent i P (A \ B ) = P (A)P (B ).
Note:
There are no restrictions on P (A) or P (B ).
If A and B are independent, then P (AjB ) = P (A) (given that P (B ) > 0) and P (B jA) =
P (B ) (given that P (A) > 0).
If A and B are independent, then the following events are independent as well: A and
B C ; AC and B ; AC and B C .
Denition 1.4.8:
Let A be a collection of L{sets. The events of A are pairwise independent i for every
distinct A1 ; A2 2 A it holds P (A1 \ A2 ) = P (A1 )P (A2 ).
19
Lecture 07:
We 09/13/00
Denition 1.4.9:
Let A be a collection of L{sets. The events of A are mutually independent (or completely
independent) i for every nite subcollection fAi1 ; : : : ; Aik g; Aij 2 A, it holds
P(
\k
j =1
Ai ) =
j
Yk
j =1
P (Ai ):
j
Note:
To check for mutually independence of n events fA1 ; : : : ; An g 2 L, there are 2n ; n ; 1 relations (i.e., all subcollections of size 2 or more) to check.
Example 1.4.10:
Flip a fair coin twice. = fHH; HT; TH; TT g.
A1 = \H on 1st toss"
A2 = \H on 2nd toss"
A3 = \Exactly one H "
Obviously, P (A1 ) = P (A2 ) = P (A3 ) = 12 .
Question: Are A1; A2 and A3 pairwise independent and also mutually independent?
P (A1 \ A2 ) = :25 = :5 :5 = P (A1 ) P (A2 ) ) A1; A2 are independent.
P (A1 \ A3 ) = :25 = :5 :5 = P (A1 ) P (A3 ) ) A1; A3 are independent.
P (A2 \ A3 ) = :25 = :5 :5 = P (A2 ) P (A3 ) ) A2; A3 are independent.
Thus, A1 ; A2 ; A3 are pairwise independent.
P (A1 \ A2 \ A3 ) = 0 6= :5 :5 :5 = P (A1 ) P (A2 ) P (A3 ) ) A1 ; A2 ; A3 are not mutually
independent.
Example 1.4.11: (from Rohatgi, page 37, Example 5)
r students. 365 possible birthdays for each student that are equally likely.
One student at a time is asked for his/her birthday.
If one of the other students hears this birthday and it matches his/her birthday, this
other student has to raise his/her hand | if at least one other student raises his/her
hand, the procedure is over.
20
We are interested in
pk = P (procedure terminates at the kth student)
= P (a hand is rst risen when the kth student is asked for his/her birthday)
It is
p1 = P (at least 1 other from the (r ; 1) students has a birthday on this particular day.)
= 1 ; P (all (r ; 1) students have a birthday on the remaining 364 out of 365 days)
364 r;1
= 1 ; 365
p2 = P (no student has a birthday matching the rst student and at least one
of the other (r ; 2) students has a b-day matching the second student)
Let A No student has a b-day matching the 1st student
Let B At least one of the other (r ; 2) has b-day matching 2nd
So p2 = P (A \ B )
= P (A) P (B jA)
= P (no student has a matching b-day with the 1st student ) P (at least one of the remaining students has a matching b-day with the second,
given that no one matched the rst.)
= (1 ; p1 )[1 ; P (all (r ; 2) students have a b-day on the remaining 363 out of 364 days)
364 r;1 " 363 r;2 #
1 ; 364
= 365
=
365 ; 1 r;1 "
363 r;2#
1 ; 364
365
"
()
1 r ;1
363 r;2
365
= 365 1 ; 365
1 ; 364
=
365P2;1 (365)2;1
#
"
r;2+1
365 ; 2 r;2
2
;
1
1 ; 365
1 ; 365 ; 2 + 1
#
Formally, we have to write this sequence of equalities in this order. However, it often might
be easier to rst work from both sides towards a particular result and combine partial results
afterwards. Here, one might decide to stop at () with the \forward" direction of the equalities
21
and rst work \backwards" from the book, which makes things a lot simpler:
"
365P2;1 p2 =
(365)2;1
r;2+1
365 ; 2 r;2
2
;
1
1 ; 365
1 ; 365 ; 2 + 1
"
1 r;1
363 r;2
365
= 365 1 ; 365
1 ; 364
365 ; 1 r;1 "
=
365
#
#
363 r;2#
1 ; 364
We see that this is the same result as ().
Now let us consider p3 :
p3 = P (No one has same b-day as rst and no one same as second, and at least one of the
remaining (r ; 3) has a matching b-day with the 3rd student)
Let A No one has the same b-day as the rst student
Let B No one has the same b-day as the second student
Let C At least one of the other (r ; 3) has the same b-day as the third students
Now:
p3 = P (A \ B \ C )
= P (A) P (B jA) P (C jA \ B )
364 r;1 363 r;2
= 365
[1 ; P (all (r ; 3) students have a b-day on the remaining 362 out of 363 days]
364
364 r;1 363 r;2 " 362 r;3#
1 ; 363
= 365
364
"
362 r;3
r;1 (363)r;2
(364)
= (365)r;1 (364)r;2 1 ; 363
=
364r;1
364r;2
!
! "
#
363r;2 1 ; 362 r;3
365r;1
363
#
364 363r;2 ! " 362 r;3#
= 365 365r;2 1 ; 363
(365)(364) 363 r;2 " 362 r;3#
=
1 ; 363
(365)2
365
22
=
=
365P3;1 "
365P2 (365)2
(365)3;1
r;2
362 r;3
2
1 ; 365
1 ; 363
#
"
r;3+1
365 ; 3 r;3
3
;
1
1 ; 365
1 ; 365 ; 3 + 1
#
Once again, working \backwards" from the book should help to better understand these
transformations.
For general pk and restrictions on r and k see Homework.
23
Lecture 08:
Fr 09/15/00
2 Random Variables
2.1 Measurable Functions
Denition 2.1.1:
A random variable (rv) is a set function from to IR.
More formally: Let (
; L; P ) be any probability space. Suppose X : ! IR and that
X is a measurable function, then we call X a random variable.
More generally: If X : ! IRk , we call X a random vector, X = (X1 (!); X2 (!); : : : ; Xk (!)).
What does it mean to say that a function is measurable?
Denition 2.1.2:
Suppose (
; L) and (S; B) are two measurable spaces and X : ! S is a mapping from
to S. We say that X is measurable L ; B if X ;1 (B ) 2 L for every set B 2 B, where
X ;1 (B ) = f! 2 : X (!) 2 B g.
Example 2.1.3:
Record the opinion of 50 people: \yes" (y) or \no" (n).
= fAll 250 possible sequences of y/ng | HUGE !
L = P(
)
X : ! S = fAll 250 possible sequences of 1 (=y) and 0 (=n)g
B = P(S)
X is a random vector since each element in S has a corresponding element in , for B 2
B; X ;1 (B ) 2 L = P(
).
Consider X : ! S = f0; 1; 2; : : : ; 50g, where X (!) = \# of y's in !" is a more manageable
random variable.
A simple function, which takes only nite many values x1 ; : : : ; xk is measurable i
X ;1 (xi ) 2 L 8xi.
Here, X ;1 (k) = f! 2 : # 1's in sequence ! = kg is a subset of , so it is in L = P(
)
24
Example 2.1.4:
Let = \innite fair coin tossing space", i.e., innite sequence of H's and T's.
Let Ln be a {eld
for the 1st n tosses.
1
[
Dene L = ( Ln ).
n=1
Let Xn : ! IR be Xn (!) = \proportion of H's in 1st n tosses".
For each n, Xn () is simple (values f0; n1 ; n2 ; : : : ; ng) and Xn;1 ( nk ) 2 Ln 8k = 0; 1; : : : ; n.
Therefore, Xn;1 ( nk ) 2 L.
So every random variable Xn () is measurable L ; B. Now we have a sequence of rv's fXn g1
n=1 .
1
We will show later that P (f! : Xn (!) ! 2 g) = 1, i.e., the Strong Law of Large Numbers
(SLLN).
Some Technical Points about Measurable Functions
2.1.5:
Suppose (
; L) and (S; B) are measure spaces and that a collection of sets A generates B, i.e.,
(A) = B. Let X : ! S. If X ;1 (A) 2 L 8A 2 A, then X is measurable L ; B.
This means we only have to check measurability on a basis collection A. The usage is: B on
IR is generated by f(;1; x] : x 2 IRg.
2.1.6:
If (
; L); (
0 ; L0 ), and (
00 ; L00 ) are measure spaces and X : ! 0 and Y : 0 ! 00 are
measurable, then the composition (Y X ) : ! 00 is measurable L ; L00 .
2.1.7:
If f : IRi ! IRk is a continuous function, then f is measurable Bi ; Bk .
2.1.8:
If fj : ! IR; j = 1; : : : k and g : IRk ! IR are measurable, then g(f1 (); : : : ; fk ()) is measurable.
The usage is: g could be sum, average, dierence, product, (nite) maximums and minimums
of x1 ; : : : ; xk , etc.
25
2.1.9:
Limits: Extend the real line to [;1; 1] = IR [ f;1; 1g.
We say f : ! IR is measurable L ; B if
(i) f ;1 (B ) 2 L 8B 2 B, and
(ii) f ;1 (;1); f ;1 (1) 2 L also.
2.1.10:
Suppose f1 ; f2 ; : : : is a sequence of real{valued measurable functions (
; L) ! (IR; B). Then
it holds:
(i) sup fn ; inf
n fn; lim sup fn ; lim inf
n fn , are measurable.
n
n
(ii) If f = lim
n fn exists, then f is measurable.
(iii) The set f! : fn (!) convergesg 2 L.
(iv) If f is any measurable function, the set f! : fn (!) ! f (!)g 2 L.
26
2.2 Probability Distribution of a Random Variable
The denition of a random variable X : (
; L) ! (S; B) makes no mention of P . We now
introduce a probability measure on (S; B).
Theorem 2.2.1:
A random variable X on (
; L; P ) induces a probability measure on a space (IR; B; Q) with
the probability distribution Q of X dened by
Q(B ) = P (X ;1 (B )) = P (f! : X (!) 2 B g) 8B 2 B:
Note:
By the denition of a random variable, X ;1 (B ) 2 L 8B 2 B. Q is called induced probability.
Proof:
If X induces a probability measure Q on (IR; B), then Q must satisfy the Kolmogorov Axioms
of probability.
X : (
; L) ! (S; B). X is a rv ) X ;1 (B ) = f! : X (!) 2 B g = A 2 L 8B 2 B.
(i) Q(B ) = P (X ;1 (B )) = P (f! : X (!) 2 B g) = P (A)
Def:1:1:7(i)
0 8B 2 B
(ii) Q(IR) = P (X ;1 (IR)) X=rv P (
) Def:1=:1:7(ii) 1
(iii) Let fBn g1
n=1 2 B; Bi \ Bj = 8i 6= j . Then,
Q(
1
[
n=1
1
1
1
1
[
[
X
()
Def:1:1:7(iii) X
;
1
;
1
;
1
Bn ) = P (X ( Bn)) = P ( (X (Bn ))) =
P (X (Bn )) = Q(Bn)
n=1
n=1
n=1
n=1
() holds since X ;1 () commutes with unions/intersections and preserves disjointedness.
Lecture 09:
Denition 2.2.2:
Mo 09/18/00
A real{valued function F on (;1; 1) that is non{decreasing, right{continuous, and satises
F (;1) = 0; F (1) = 1
is called a cumulative distribution function (cdf) on IR.
Note:
No mention of probability space or measure P in Denition 2.2.2 above.
27
Denition 2.2.3:
Let P be a probability measure on (IR; B). The cdf associated with P is
F (x) = FP (x) = P ((;1; x]) = P (f! : X (!) xg) = P (X x)
for a random variable X dened on (IR; B; P ).
Note:
F () dened as in Denition 2.2.3 above indeed is a cdf.
Proof (of Note):
(i) Let x1 < x2
=) (;1; x1 ] (;1; x2 ]
Th:1:2:1(v)
=) F (x1 ) = P (f! : X (!) x1 g) P (f! : X (!) x2 g) = F (x2 )
Thus, since x1 < x2 and F (x1 ) F (x2 ), F (:) is non{decreasing.
(ii) Since F is non-decreasing, it is sucient to show that F (:) is right{continuous if for any
sequence of numbers xn ! x+ (which means that xn is approaching x from the right)
with x1 > x2 > : : : > xn > : : : > x : F (xn ) ! F (x):
Let An = f! : X (!) 2 (x; xn ]g 2 L and An # . None of the intervals (x; xn ] contains x.
As xn ! x+, the number of points ! in An diminishes until the set is empty. Formally,
nlim
!1 An = nlim
!1
\n
i=1
Ai =
1
\
n=1
An = .
By Theorem 1.2.6 it follows that
nlim
!1 P (An ) = P (nlim
!1 An ) = P () = 0.
It is
P (An ) = P (f! : X (!) xn g) ; P (f! : X (!) xg) = F (xn ) ; F (x).
=) (nlim
!1 F (xn )) ; F (x) = nlim
!1(F (xn ) ; F (x)) = nlim
!1 P (An ) = 0
=) nlim
!1 F (xn ) = F (x)
=) F (x) is right{continuous.
28
(iii) F (;n) Def:=2:2:3 P (f! : X (!) ;ng)
=)
F (;1) = nlim
!1 F (;n)
= nlim
!1 P (f! : X (!) ;ng)
= P (nlim
!1f! : X (!) ;ng)
= P ()
= 0
(iv) F (n) Def:=2:2:3 P (f! : X (!) ng)
=)
F (1 ) =
=
=
=
=
nlim
!1 F (n)
nlim
!1 P (f! : X (!) ng)
P (nlim
!1f! : X (!) ng)
P (
)
1
Note that (iii) and (iv) implicitly use Theorem 1.2.6. In (iii), we use An = (;1; ;n) where
An An+1 and An # . In (iv), we use An = (;1; n) where An An+1 and An " IR.
Denition 2.2.4:
If a random variable X : ! IR has induced a probability measure PX on (IR; B) with cdf
F (x), we say
(i) rv X is continuous if F (x) is continuous in x.
(ii) rv X is discrete if F (x) is a step function in x.
Note:
There are rvs that are mixtures of continuous and discrete rvs. One such example is a truncated failure time distribution. We assume a continuous distribution (e.g., exponential) up
to a given truncation point x and assign the \remaining" probability to the truncation point.
Thus, a single point has a probability > 0 and F (x) jumps at the truncation point x.
29
Denition 2.2.5:
Two random variables X and Y are identically distributed i
PX (X 2 A) = PY (Y 2 A) 8A 2 L:
Note:
Def. 2.2.5 does not mean that X (!) = Y (!) 8! 2 . For example,
X = # H in 3 coin tosses
Y = # T in 3 coin tosses
X; Y are both Bin(3; 0:5), i.e., identically distributed, but for ! = (H; H; T ); X (!) = 2 6= 1 =
Y (!), i.e., X 6= Y .
Theorem 2.2.6:
The following two statements are equivalent:
(i) X; Y are identically distributed.
(ii) FX (x) = FY (x) 8x 2 IR.
Proof:
(i) ) (ii):
FX (x)
=
=
byDef:2:2:5
=
=
=
PX ((;1; x])
P (f! : X (!) 2 (;1; x]g)
P (f! : Y (!) 2 (;1; x]g)
PY ((;1; x])
FY (X )
(ii) ) (i):
Requires extra knowledge from measure theory.
30
Lecture 10:
We 09/20/00
2.3 Discrete and Continuous Random Variables
We now extend Denition 2.2.4 to make our denitions a little bit more formal.
Denition 2.3.1:
Let X be a real{valued random variable with cdf F on (
; L; P ). X is discrete if there exists
a countable set E IR such that P (X 2 E ) = 1, i.e., P (f! : X (!) 2 E g) = 1. The points of E
which have positive probability are the jump points of the step function F , i.e., the cdf of X .
Dene pi = P (f! : X (!) = xi ; xi 2 E g) = PX (X = xi ) 8i 1. Then, pi 0;
1
X
i=1
pi = 1.
We call fpi : pi 0g the probability mass function (pmf) (also: probability frequency
function) of X .
Note:
1
X
Given any set of numbers fpn g1
pn = 1, fpn g1
n=1 ; pn 0 8n 1;
n=1 is the pmf of some rv
n=1
X.
Note:
The issue of continuous rv's and probability density functions (pdfs) is more complicated. A
rv X : ! IR always has a cdf F . Whether there exists a function f such that f integrates
to F and F 0 exists and equals f (almost everywhere) depends on something stronger than
just continuity.
Denition 2.3.2:
A real{valued function F is continuous in x0 2 IR i
8 > 0 9 > 0 8x : j x ; x0 j< )j F (x) ; F (x0) j< :
F is continuous i F is continuous in all x 2 R.
31
Denition 2.3.3:
A real{valued function F dened on [a; b] is absolutely continuous on [a; b] i
8 > 0 9 > 0 8 nite subcollection of disjoint subintervals [ai; bi ]; i = 1; : : : ; n :
n
n
X
X
(bi ; ai ) < ) j F (bi ) ; F (ai ) j< :
i=1
i=1
Note:
Absolute continuity implies continuity.
Theorem 2.3.4:
(i) If F is absolutely continuous, then F 0 exists almost everywhere.
(ii) A function F is an indenite integral i it is absolutely continuous. Thus, every absolutely continuous function F is the indenite integral of its derivative F 0 .
Denition 2.3.5:
Let X be a random variable on (
; L; P ) with cdf F . We say X is a continuous rv i F
is absolutely continuous. In this case, there exists a non{negative integrable function f , the
probability density function (pdf) of X , such that
F (x) =
Zx
;1
f (t)dt = P (X x):
From this it follows that, if a; b 2 IR; a < b, then
PX (a < X b) = F (b) ; F (a) =
Zb
a
f (t)dt
exists and is well dened.
Theorem 2.3.6:
Let X be a continuous random variable with pdf f . Then it holds:
(i) For every Borel set B 2 B; P (B ) =
Z
B
f (t)dt.
(ii) If F is absolutely continuous and f is continuous at x, then F 0 (x) = dFdx(x) = f (x).
32
Proof:
Part (i): From Denition 2.3.5 above.
Part (ii): By Fundamental Theorem of Calculus.
Note:
As already stated in the Note following Denition 2.2.4, not every rv will fall into one of
these two (or if you prefer { three {, i.e., discrete, continuous/absolutely continuous) classes.
However, most rv which arise in practice will. We look at one example that is unlikely to
occur in practice in the next Homework assignment.
However, note that every cdf F can be written as
F (x) = aFd (x) + (1 ; a)Fc(x); 0 a 1;
where Fd is the cdf of a discrete rv and Fc is a continuous (but not necessarily absolute
continuous) cdf.
Some authors, such as Marek Fisz Wahrscheinlichkeitsrechnung und mathematische Statistik,
VEB Deutscher Verlag der Wissenschaften, Berlin, 1989, are even more specic. There it is
stated that every cdf F can be written as
F (x) = a1 Fd (x) + a2Fc (x) + a3 Fs (x); a1 ; a2 ; a3 0; a1 + a2 + a3 = 1:
Here, Fd (x) and Fc (x) are discrete and absolute continuous cdfs. Fs (x) is called a singular cdf. Singular means that Fs (x) is continuous and its derivative F 0(x) equals 0 almost
everywhere (i.e., everywhere but in those points that belong to a Borel{measurable set of
probability 0).
Question: Does \continuous" but \not absolutely continuous" mean \singular"? | We will
(hopefully) see later: : :
Example 2.3.7:
Consider
8
>
0;
x<0
>
>
< 1=2;
x=0
F (x) = >
1=2 + x=2; 0 < x < 1
>
>
: 1;
x1
We can write F (x) as aFd (x) + (1 ; a)Fc (x); 0 a 1. How?
Since F (x) has only one jump at x = 0, it is reasonable to get started with a pmf p0 = 1 and
corresponding cdf
(
Fd (x) = 0; x < 0
1; x 0
33
Since F (x) = 0 for x < 0 and F (x) = 1 for x 1, it must clearly hold that Fc (x) = 0 for
x < 0 and Fc (x) = 1 for x 1. In addition F (x) increases linearly in 0 < x < 1. A good
guess would be a pdf fc(x) = 1 I(0;1) (x) and corresponding cdf
8
>
< 0; x 0
Fc(x) = > x; 0 < x < 1
: 1; x 1
Knowing that F (0) = 1=2, we have at least to multiply Fd (x) by 1=2. And, indeed, F (x) can
be written as
F (x) = 21 Fd (x) + 12 Fc(x):
Denition 2.3.8:
The two{valued function IA (x) is called indicator function and it is dened as follows:
IA(x) = 1 if x 2 A and IA(x) = 0 if x 62 A for any set A.
34
An Excursion into Logic
When proving theorems we only used direct methods so far. We used induction proofs to
show that something holds for arbitrary n. To show that a statement A implies a statement
B , i.e., A ) B , we used proofs of the type A ) A1 ) A2 ) : : : ) An;1 ) An ) B where
one step directly follows from the previous step. However, there are dierent approaches to
obtain the same result.
Implication: (A implies B )
A ) B is equivalent to :B ) :A is equivalent to :A _ B :
A B A ) B :A :B :B ) :A :A _ B
1
1
0
0
1
0
1
0
1
0
1
1
0
0
1
1
0
1
0
1
1
0
1
1
1
0
1
1
Lecture 11:
Fr 09/22/00
Equivalence: (A is equivalent to B )
A , B is equivalent to (A ) B ) ^ (B ) A) is equivalent to (:A _ B ) ^ (A _ :B ):
A B A , B A ) B B ) A (A ) B ) ^ (B ) A) :A _ B A _ :B (:A _ B ) ^ (A _ :B )
1
1
0
0
1
0
1
0
1
0
0
1
1
0
1
1
1
1
0
1
1
0
0
1
1
0
1
1
Negations of Quantiers:
:8x 2 X : B (x) is equivalent to 9x 2 X : :B (x)
:9x 2 X : B (x) is equivalent to 8x 2 X : :B (x)
9x 2 X 8y 2 Y : B (x; y) implies 8y 2 Y 9x 2 X : B (x; y)
35
1
1
0
1
1
0
0
1
2.4 Transformations of Random Variables
Let X be a real{valued random variable on (
; L; P ), i.e., X : (
; L) ! (IR; B). Let g be
any Borel{measurable real{valued function on IR. Then, by statement 2.1.6, Y = g(X ) is a
random variable.
Theorem 2.4.1:
Given a random rariable X with known induced distribution and a Borel{measurable function
g, then the distribution of the random variable Y = g(X ) is determined.
Proof:
FY (y) = PY (Y y)
= P (f! : g(X (!)) yg)
= P (f! : X (!) 2 By g) where By = g;1 ((;1; y]) 2 B since g is Borel{measureable.
= P (X ;1 (By ))
Note:
From now on, we restrict ourselves to real{valued (vector{valued) functions that are Borel{
measurable, i.e., measurable with respect to (IR; B) or (IRk ; Bk ).
More generally, PY (Y 2 C ) = PX (X 2 g;1 (C )) 8C 2 B.
Example 2.4.2:
Suppose X is a discrete random variable. Let A be a countable set such that P (X 2 A) = 1
and P (X = x) > 0 8x 2 A.
Let Y = g(X ). Obviously, the sample space of Y is also countable. Then,
PY (Y = y) =
X
x2g;1 (fyg)
PX (X = x) =
X
fx:g(x)=yg
PX (X = x) 8y 2 g(A):
Example 2.4.3:
X U (;1; 1) so the pdf of X is fX (x) = 1=2I[;1;1] (x), which, according to Denition 2.3.8,
reads as fX (x) = 1=2 for ;1 x 1 and 0 otherwise.
(
x; x 0
+
Let Y = X =
0; otherwise
36
Then,
8
>
0;
>
>
< 1=2;
FY (y) = PY (Y y) = >
1=2 + y=2;
>
>
: 1;
y<0
y=0
0<y<1
y1
This is the mixed discrete/continuous distribution from Example 2.3.7.
Note:
We need to put some conditions on g to ensure g(X ) is continuous if X is continuous and
avoid cases as in Example 2.4.3 above.
Denition 2.4.4:
For a random variable X from (
; L; P ) to (IR; B), the support of X (or P ) is any set A 2 L
for which P (A) = 1. For a continuous random variable X with pdf f , we can think of the
support of X as X = X ;1 (fx : fX (x) > 0g).
Denition 2.4.5:
Let f be a real{valued function dened on D IR; D 2 B. We say:
f is (strictly) non{decreasing if x < y =) f (x) (<) f (y) 8x; y 2 D
f is (strictly) non{increasing if x < y =) f (x) (>) f (y) 8x; y 2 D
f is monotonic on D if f is either increasing or decreasing and write f " or f #.
Lecture 12:
Theorem 2.4.6:
Mo 09/25/00
Let X be a continuous rv with pdf fX and support X. Let y = g(x) be dierentiable for all x
and either (i) g0 (x) > 0 or (ii) g0 (x) < 0 for all x.
Then, Y = g(X ) is also a continuous rv with pdf
d g;1 (y) j I (y):
fY (y) = fX (g;1 (y)) j dy
g(X)
Proof:
Part (i): g0 (x) > 0 8x 2 X
So g is strictly increasing and continuous.
Therefore, x = g;1 (y) exists and it is also strictly increasing and also dierentiable.
37
Then, from Rohatgi, page 9, Theorem 15:
d g;1 (y) = d g(x) j ;1 ;1 > 0
x=g (y)
dy
dx
We get FY (y) = PY (Y y) = PY (g(X ) y) = PX (X g;1 (y)) = FX (g;1 (y)) for y 2 g(X)
and, by dierentiation,
d g ;1 (y )
d (F (g;1 (y))) By Chain
= Rule fX (g;1 (y)) dy
fY (y) = FY0 (y) = dy
X
Part (ii): g0 (x) < 0 8x 2 X
So g is strictly decreasing and continuous.
Therefore, x = g;1 (y) exists and it is also strictly decreasing and also dierentiable.
Then, from Rohatgi, page 9, Theorem 15:
d g;1 (y) = d g(x) j ;1 ;1 < 0
x=g (y)
dy
dx
We get FY (y) = PY (Y y) = PY (g(X ) y) = PX (X g;1 (y)) = 1 ; PX (X g;1 (y)) =
1 ; FX (g;1 (y)) for y 2 g(X) and, by dierentiation,
d
d
d
By Chain Rule
0
;
1
;
1
;
1
;
1
;
1
fY (y) = FY (y) = dy (1;FX (g (y)))
=
;fX (g (y)) dy g (y) = fX (g (y)) ; dy g (y)
Since dyd g;1 (y) < 0, the negative sign will cancel out, always giving us a positive value. Hence
the need for the absolute value signs.
Combining parts (i) and (ii), we can therefore write
d g;1 (y) j I (y):
fY (y) = fX (g;1 (y)) j dy
g(X)
Note:
In Theorem 2.4.6, we can also write
fY (y) = fdg((xx))
j
dx
; y 2 g(X)
j x=g;1(y)
If g is monotonic over disjoint intervals, we can also get an expression for the pdf/cdf of
Y = g(X ) as stated in the following Theorem.
38
Theorem 2.4.7:
Let Y = g(X ) where X is a rv with pdf fX (x) on support X. Suppose there exists a partition
A0 ; A1 ; : : : ; Ak of X such that P (X 2 A0) = 0 and fX (x) is continuous on each Ai . Suppose
there exist functions g1 (x); : : : ; gk (x) dened on A1 through Ak , respectively, satisfying
(i) g(x) = gi (x) 8x 2 Ai ,
(ii) gi (x) is monotonic on Ai ,
(iii) the set Y = gi (Ai ) = fy : y = gi (x) for some x 2 Ai g is the same for each i = 1; : : : ; k,
and
(iv) gi;1 (y) has a continuous derivative on Y for each i = 1; : : : ; k.
Then,
fY (y) =
k
X
i=1
d g;1 (y) j I (y)
fX (gi;1 (y)) j dy
i
Y
Note:
Rohatgi, page 73, Theorem 4, removes condition (iii) by dening n = n(y) and x1 (y); : : : ; xn (y).
Example 2.4.8:
Let X be a rv with pdf fX (x) = 2x2 I(0;) (x).
Let Y = sin(X ). What is fY (y)?
Since sin is not monotonic on (0; ), Theorem 2.4.6 cannot be used to determine the pdf of
Y.
Two possible approaches:
Method 1: cdfs
For 0 < y < 1 we have
FY (y) = PY (Y y)
= PX (sin X y)
= PX ([0 X sin;1 (y)] or [ ; sin;1 (y) X ])
= FX (sin;1 (y)) + (1 ; FX ( ; sin;1 (y)))
39
since [0 X sin;1 (y)] and [ ; sin;1 (y) X ] are disjoint sets. Then,
fY (y) = FY0 (y)
= fX (sin;1 (y)) p 1 2 + (;1)fX ( ; sin;1 (y)) p ;1 2
1;y
1;y
= p 1 2 fX (sin;1 (y)) + fX ( ; sin;1 (y))
1;y
;1 (y)) 2( ; sin;1 (y)) !
1
2(sin
= p
+
2
2
1 ; y2
= 2 p 1 2 2
1;y
= p 2 2 I(0;1) (y)
1;y
Method 2: Use of Theorem 2.4.7
Let A1 = (0; 2 ), A2 = ( 2 ; ), and A0 = f 2 g.
Let g1;1 (y) = sin;1 (y) and g2;1 (y) = ; sin;1 (y).
It is dyd g1;1 (y) = p11;y2 = ; dyd g2;1 (y) and Y = (0; 1).
Thus, by use of Theorem 2.4.7, we get
fY (y) =
2
X
i=1
d g;1 (y) j I (y)
fX (gi;1 (y)) j dy
Y
i
;1
;1 (y)) 1
p 2 I(0;1) (y)
= 2 sin2 (y) p 1 2 I(0;1) (y) + 2( ; sin
2
1;y
1;y
= 22 p 1 2 I(0;1) (y)
1;y
= p 2 2 I(0;1) (y)
1;y
Obviously, both results are identical.
40
Lecture 13:
Theorem 2.4.9:
We 09/27/00
Let X be a rv with a continuous cdf FX (x) and let Y = FX (X ). Then, Y U (0; 1).
Proof:
We have to consider two possible cases:
(a) FX is strictly increasing, i.e., FX (x1 ) < FX (x2 ) for x1 < x2 , and
(b) FX is non{decreasing, i.e., there exists x1 < x2 and FX (x1 ) = FX (x2 ). Assume that
x1 is the inmum and x2 the supremum of those values for which FX (x1) = FX (x2 ) holds.
In (a), FX;1 (y) is uniquely dened. In (b), we dene FX;1 (y) = inf fx : FX (x) yg
Without loss of generality:
FX;1 (1) = +1 if FX (x) < 1 8x 2 IR and
FX;1 (0) = ;1 if FX (x) > 0 8x 2 IR.
For Y = FX (X ) and 0 < y < 1, we have
P (Y y)
=
F ;1 "
=
()
=
=
=
X
P (FX (X ) y)
P (FX;1 (FX (X )) FX;1 (y))
P (X FX;1(y))
FX (FX;1 (y))
y
At the endpoints, we have P (Y y) = 1 if y 1 and P (Y y) = 0 if y 0.
But why is () true? | In (a), if FX is strictly increasing and continuous, it is certainly
x = FX;1 (FX (x)).
In (b), if FX (x1 ) = FX (x2 ) for x1 < x < x2 , it may be that FX;1 (FX (x)) 6= x. But by
denition, FX;1 (FX (x)) = x1 8x 2 [x1 ; x2 ]. () holds since on [x1 ; x2 ], it is P (X x) =
P (X x1) 8x 2 [x1 ; x2 ]. The at cdf denotes FX (x2 ) ; FX (x1 ) = P (x1 < X x2 ) = 0 by
denition.
Note:
This proof also holds if there exist multiple intervals with xi < xj and FX (xi ) = FX (xj ), i.e.,
if the support of X is split in more than just 2 disjoint intervals.
41
3 Moments and Generating Functions
3.1 Expectation
Denition 3.1.1:
Let X be a real-valued rv with cdf FX and pdf fX if X is continuous (or pmf fX and support
X if X is discrete). The expected value (mean) of a measurable function g() of X is
8Z1
>
>
< ;1 g(x)fX (x)dx; if X is continuous
E (g(X )) = > X
>
: g(x)fX (x); if X is discrete
x2X
if E (j g(X ) j) < 1; otherwise E (g(X )) is undened, i.e., it does not exist.
Example:
X Cauchy; fX (x) = (1+1 x2 ) ; ;1 < x < 1:
Z1 x
dx = 1 [log(1 + x2 )]1
E (j X j) = 2
0 =1
2
1
+
x
0
So, E (X ) does not exist for the Cauchy distribution.
Theorem 3.1.2:
If E (X ) exists and a and b are nite constants, then E (aX + b) exists and equals aE (X ) + b.
Proof:
Continuous case only:
Existence:
E (j aX + b j) =
Z1
j ax + b j fX (x)dx
Z;1
1
(j a j j x j + j b j)fX (x)dx
;1Z
Z1
1
= jaj
j x j fX (x)dx+ j b j
fX (x)dx
;1
;1
= j a j E (j X j)+ j b j
Numerical Result:
< 1
E (aX + b) =
Z1
(ax + b)fX (x)dx
;1
Z1
Z1
= a
xfX (x)dx + b
fX (x)dx
;1
;1
= aE (X ) + b
42
Lecture 14:
Theorem 3.1.3:
Fr 09/29/00
If X is bounded (i.e., there exists a M; 0 < M < 1, such that P (j X j< M ) = 1), then E (X )
exists.
Denition 3.1.4:
The kth moment of X , if it exists, is mk = E (X k ).
The kth central moment of X , if it exists, is k = E ((X ; E (X ))k ).
Denition 3.1.5:
The variance of X , if it exists, is the second central moment of X , i.e.,
V ar(X ) = E ((X ; E (X ))2 ):
Theorem 3.1.6:
V ar(X ) = E (X 2 ) ; (E (X ))2 .
Proof:
V ar(X ) = E ((X ; E (X ))2 )
= E (X 2 ; 2XE (X ) + (E (X ))2 )
= E (X 2 ) ; 2E (X )E (X ) + (E (X ))2
= E (X 2 ) ; (E (X ))2
Theorem 3.1.7:
If V ar(X ) exists and a and b are nite constants, then V ar(aX + b) exists and equals
a2V ar(X ).
Proof:
Existence & Numerical Result:
;
;
V ar(aX + b) = E ((aX + b) ; E (aX + b))2 exists if E j ((aX + b) ; E (aX + b))2 j exists.
43
It holds that
=
=
Th:=
3:1:6
Th:=
3:1:2
Th:=
3:1:2
=
Th:=
3:1:6
<
E j ((aX + b) ; E (aX + b))2 j
E ((aX + b) ; E (aX + b))2
V ar(aX + b)
E ((aX + b)2 ) ; (E (aX + b))2
E (a2 X 2 + 2abX + b2 ) ; (aE (X ) + b)2
a2 E (X 2 ) + 2abE (X ) + b2 ; a2 (E (X ))2 ; 2abE (X ) ; b2
a2 (E (X 2 ) ; (E (X ))2 )
a2 V ar(X )
1 since V ar(X ) exists
Theorem 3.1.8:
If the tth moment of a rv X exists, then all moments of order 0 < s < t exist.
Proof:
Continuous case only:
Z
j x js fX (x)dx +
j x js fX (x)dx
j
x
j
1
j
x
j
>
1
Z
Z
1 fX (x)dx +
j x jt fX (x)dx
jxj1
jxj>1
P (j X j 1) + E (j X jt)
< 1
E (j X js) =
Z
Lecture 16:
We 10/04/00
Theorem 3.1.9:
If the tth moment of a rv X exists, then
t
nlim
!1 n P (j X
Proof:
Continuous case only:
Z
Z
t
1 > j x j fX (x)dx = nlim
!1
IR
=) nlim
!1
But, nlim
!1
Z
Z
jxj>n
jxj>n
jxjn
j> n) = 0:
j x jt fX (x)dx
j x jt fX (x)dx = 0
t
j x jt fX (x)dx nlim
!1 n
Z
jxj>n
t
fX (x)dx = nlim
!1 n P (j X j> n) = 0
44
Note:
t
th
The inverse is not necessarily true, i.e., if nlim
!1 n P (j X j> n) = 0, then the t moment of a
rv X does not necessarily exist. We can only approach t up to some > 0 as the following
Theorem 3.1.10 indicates.
Theorem 3.1.10:
t
Let X be a rv with a distribution such that nlim
!1 n P (j X j> n) = 0 for some t > 0. Then,
E (j X js ) < 1 8 0 < s < t:
Note:
To prove this Theorem, we need Lemma 3.1.11 and Corollary 3.1.12.
Lemma 3.1.11:
Let X be a non{negative rv with cdf F . Then,
E (X ) =
Z1
0
(1 ; FX (x))dx
(if either side exists).
Proof:
Continuous case only:
To prove that the left side implies that the right side is nite and both sides are identical, we
assume that E (X ) exists. It is
Z1
Zn
E (X ) = xfX (x)dx = nlim
!1 xfX (x)dx
0
0
Replace the expression for the right side integral using integration by parts.
Let u = x and dv = fX (x)dx, then
Zn
0
xfX (x)dx = (xF (x)) jn0 ;
= nFX (n) ; 0FX (0) ;
= nFX (n) ; n + n ;
= nFX (n) ; n +
Zn
0
Zn
0
Zn
Z
0
0
n
FX (x)dx
FX (x)dx
FX (x)dx
[1 ; FX (x)]dx
45
= n[FX (n) ; 1] +
Zn
0
= ;n[1 ; FX (n)] +
= ;nP (X > n) +
[1 ; FX (x)]dx
Zn
0
n
Z
0
X 0
[1 ; FX (x)]dx
[1 ; FX (x)]dx
= ;n1 P (jX j > n) +
Zn
0
[1 ; FX (x)]dx
1
=) E (X 1 ) = nlim
!1[;n P (jX j > n) +
Th:=3:1:9 0 +
=
Z1
0
nlim
!1
Zn
0
Zn
0
[1 ; FX (x)]dx]
[1 ; FX (x)]dx
[1 ; FX (x)]dx
Thus, the existence of E (X ) implies that
identical.
Z1
0
[1 ; FX (x)]dx is nite and that both sides are
We still have to show the converse implication:
Z1
If
[1 ; FX (x)]dx is nite, then E (X ) exists, i.e., E (j X j) = E (X ) < 1, and both sides
0
are identical. It is
Zn
0
xfX (x)dx X=0
Zn
j x j fX (x)dx = ;n[1 ; FX (n)] +
0
as seen above.
Since ;n[1 ; FX (n)] 0, we get
Zn
0
Thus,
j x j fX (x)dx Zn
nlim
!1 0
Zn
0
j x j fX (x) =
=) E (X ) exists and is identical to
[1 ; FX (x)]dx Z1
0
Z1
0
Z1
0
j x j fX (x)dx Zn
0
[1 ; FX (x)]dx < 1 8n
Z1
0
[1 ; FX (x)]dx < 1
[1 ; FX (x)]dx as seen above.
46
[1 ; FX (x)]dx
Corollary 3.1.12:
Proof:
Let
E (j X js) = s
E (j X js ) Lemma= 3:1:11
z = ys .
Then
dz
dy
Z1
0
= sys;1
P (j X
Z1
0
Z1
0
ys;1P (j X j> y)dy
[1 ; FjX js (z )]dz =
and dz = sys;1dy.
js>
z)dz
=
Therefore,
Z1
=
s
monotonic "
s
=
Z1
0Z
P (j X js > ys )sys;1dy
1 s;1
y P (j X js > ys )dy
Z01
0
0
P (j X js > z)dz
ys;1P (j X j> y)dy
Proof (of Theorem 3.1.10):
For any given > 0, choose
N such that the tail probability P (j X j> n) < nt 8n N .
Z
1
E (j X js ) Cor:=3:1:12 s ys;1 P (j X j> y)dy
=s
ZN
0
N
Z
0
0
ys;1P (j X
Z1
j> y)dy + s ys;1P (j X j> y)dy
N
Z1
sys;1 1 dy + s
= ys jN0 + s
= N s + s
Z1
N
Z1
N
N
ys;1 yt dy
ys;1 y1t dy
ys;1;t dy
It is
Z1
N
ycdy
(
1 c+1 1
c+1 y jN ;
ln y j1
N;
c 6= ;1
c = ;1
(
1;
c ;1
=
1
c
+1
; c+1 N < 1; c < ;1
=
Thus, for E (j X js ) < 1, it must hold that s ; 1 ; t < ;1, or equivalently, s < t. So
E (j X js ) < 1, i.e., it exists, for every s with 0 < s < t for a rv X with a distribution such
t
that nlim
!1 n P (j X j> n) = 0 for some t > 0.
47
Theorem 3.1.13:
Let X be a rv such that
Lecture 17:
Fr 10/06/00
k) = 0 8 > 1:
lim PP((j jXXj>
j> k)
k!1
Then, all moments of X exist.
Proof:
For > 0; we select some k0 such that
P (j X j> k) < 8k k :
0
P (j X j> k)
Select k1 such that P (j X j> k) < 8k k1 :
Select N = max(k0 ; k1 ).
If we have some xed positive integer r:
P (j X j> r k) = P (j X j> k) P (j X j> 2k) P (j X j> 3 k) : : : P (j X j> r k)
P (j X j> k)
P (j X j> k) P (j X j> k) P (j X j> 2 k)
P (j X j> r;1k)
k) P (j X j> (k)) P (j X j> (2 k)) : : : P (j X j> (r;1 k))
= PP((j jXX j>
j> k) P (j X j> 1 (k)) P (j X j> 1 (2 k))
P (j X j> 1 (r;1 k))
Note: Each of these r terms on the right side is < by our original statement of selecting
)
n
some k0 such that PP((jXjXj>k
j>k) < 8k k0 and since > 1 and therefore k k0 .
k)
r
Now we get for our entire expression that PP(j(XjXj>
j>k) for k N (since in this case also
k k0 ) and > 1:
Overall, we have P (j X j> r k) r P (j X j> k) r+1 for k N (since in this case also
k k1 ).
For a xed positive integer n:
r
E(j X
jn ) Cor:=3:1:12
We know that:
n
but is
Z1
n
ZN
0
0
xn;1P(j X
xn;1 P (j X
j> x)dx = n
ZN
0
xn;1 P(j X
ZN
Z1
j> x)dx + n xn;1P(j X j> x)dx
N
j> x)dx nxn;1dx = xn jN0 = N n < 1
0
Z1
n xn;1P (j X j> x)dx < 1 ?
N
48
To check the second part, we use:
Z1
1 Z N
X
n
;
1
x P (j X j> x)dx =
xn;1 P (j X j> x)dx
r
r=1r;1 N
N
We know that:
Zr N
r;1 N
xn;1 P (j X
j> x)dx r
Zr N
r;1 N
xn;1 dx
This step is possible since r P (j X j r;1 N ) P (j X j> x) P (j X j r N )
8x 2 (r;1 N; r N ) and N = max(k0 ; k1 ).
Since (r;1 N )n;1 xn;1 (r N )n;1 8x 2 (r;1N; r N ), we get:
Z N
Z N
r
n
;
1
r
r
n
;
1
x dx ( N )
1dx r (r N )n;1 (r N ) = r (r N )n
r
r
r;1 N
r;1 N
Now we go back to our original inequality:
Z1
N
xn;1P (j X j> x)dx n n
1
X
r
Zr N
r=1 r;1 N
xn;1 dx 1
X
r=1
r (r N )n = N n
n n
1
n
= 1N; n if < 1 or, equivalently, if < n
n
Since 1N; n is nite, all moments E (j X j ) exist.
49
1
X
r=1
( n )r
Lecture 18:
Mo 10/09/00
3.2 Generating Functions
Denition 3.2.1:
Let X be a rv with cdf FX . The moment generating function (mgf) of X is dened as
MX (t) = E (etX )
provided that this expectation exists in an (open) interval around 0, i.e., for ;h < t < h for
some h > 0.
Theorem 3.2.2:
If a rv X has a mgf MX (t) that exists for ;h < t < h for some h > 0, then
n
E (X n ) = MX(n) (0) = dtd n MX (t) jt=0 :
Proof:
We assume that we can dierentiate under the integral sign. If, and when, this really is true
will be discussed later in this section.
d M (t) = d Z 1 etx f (x)dx
X
dt X
Zdt1 ;1@
=
( @t etx fX (x))dx
;1
Z1
=
xetx fX (x)dx
;1
= E (XetX )
Evaluating this at t = 0, we get: dtd MX (t) jt=0 = E (X )
By induction, we get for n 2:
!
dn M (t) = d dn;1 M (t)
dtn X
dt dtn;1 X
Z 1
= dtd
xn;1 etx fX (x)dx
Z 1 ;1
@ xn;1 etxf (x))dx
=
( @t
X
Z;1
1
xn etx fX (x)dx
= E (X n etX )
=
;1
Evaluating this at t = 0, we get: dtdnn MX (t) jt=0 = E (X n )
50
Example 3.2.3:
X U (a; b), where a < b; fX (x) = b;1 a I[a;b](x).
Then,
MX (t)
=
Z b etx
etb ; eta
dx
=
t(b ; a)
a b;a
MX (0)
=
0
0
=
betb ; aeta
b;a
=
1
L'Hospital
t=0
So MX (0) = 1 and since et(b ;; ea) is continuous, it also exists in an open interval around 0 (in
fact, it exists for every t 2 IR).
tb
MX0 (t)
ta
=
=
=) E (X )
=
(betb ; aeta )t(b ; a) ; (etb ; eta )(b ; a)
t2(b ; a)2
t(betb ; aeta ) ; (etb ; eta )
t2(b ; a)
MX0 (0) = 00
=
betb ; aeta + tb2 etb ; ta2 eta ; betb + aeta
2t(b ; a)
=
tb2 etb ; ta2eta
2t(b ; a)
L'Hospital
=
=
=
b2 etb ; a2 eta
2(b ; a)
b2 ; a2
2(b ; a)
b+a
2
51
t=0
t=0
t=0
Note:
In the previous example, we made use of L'Hospital's rule. This rule gives conditions under
1 ".
which we can resolve indenite expressions of the type \ 00 " and \ 1
(i) Let f and g be functions that are dierentiable in an open interval around x0 , say
in (x0 ; ; x0 + ); but not necessarily dierentiable in x0 . Let f (x0 ) = g(x0 ) = 0
f 0(x) = A implies that also
and g0 (x) 6= 0 8x 2 (x0 ; ; x0 + ) ; fx0 g. Then, xlim
!x0 g0 (x)
f (x)
+
xlim
!x0 g(x) = A. The same holds for the cases xlim
!x0 f (x) = xlim
!x0 g(x) = 1 and x ! x0
or x ! x;0 .
(ii) Let f and g be functions that are dierentiable for x > a (a > 0). Let
0
0 (x) 6= 0. Then, lim f (x) = A implies that
lim
f
(
x
)
=
lim
g
(
x
)
=
0
and
lim
g
x!1
x!1
x!1
x!1 g0 (x)
f (x)
also xlim
!1 g(x) = A.
(iii) We can iterate this process as long as the required conditions are met and derivatives
exist, e.g., if the rst derivatives still result in an indenite expression, we can look at
the second derivatives, then at the third derivatives, and so on.
(iv) It is recommended to keep expressions as simple as possible. If we have identical factors
in the numerator and denominator, we can exclude them from both and continue with
the simpler functions.
(v) Indenite expressions of the form \0 1" can be handled by rearranging them to \ 1=01 "
f (;x)
and x!;1
lim fg((xx)) can be handled by use of the rules for xlim
!1 g(;x) .
Lecture 19:
Note:
We 10/11/00
The following Theorems provide us with rules that tell us when we can dierentiate under
the integral sign. Theorem 3.2.4 relates to nite integral bounds a() and b() and Theorems
3.2.5 and 3.2.6 to innite bounds.
Theorem 3.2.4: Leibnitz's Rule
If f (x; ); a(), and b() are dierentiable with respect to (for all x) and ;1 < a() <
b() < 1, then
d Z b() f (x; )dx = f (b(); ) d b() ; f (a(); ) d a() + Z b() @ f (x; )dx:
d a()
d
d
a() @
The rst 2 terms are vanishing if a() and b() are constant in .
52
Proof:
Uses the Fundamental Theorem of Calculus and the chain rule.
Theorem 3.2.5: Lebesque's Dominated Convergence Theorem
Z1
Let g be an integrable function such that
g(x)dx < 1. If j fn j g almost everywhere
;1
(i.e., except for a set of Borel{measure 0) and if fn ! f almost everywhere, then fn and f
are integrable and
Z1
Z1
fn(x)dx !
f (x)dx:
;1
;1
Note:
If f is dierentiable with respect to , then
@ f (x; ) = lim f (x; + ) ; f (x; )
!0
@
and
Z 1 f (x; + ) ; f (x; )
Z1 @
f
(
x;
)
dx
=
lim
dx
;1 !0
;1 @
Z 1 f (x; + ) ; f (x; )
@ Z1
while
!0 ;1
@ ;1 f (x; )dx = lim
dx
Theorem 3.2.6:
Let fn (x; Z0 ) = f (x;0 +nn);f (x;0 ) for some 0 . Suppose there exists an integrable function g(x)
1
such that
g(x)dx < 1 and j fn(x; ) j g(x) 8x, then
;1
Z1@
d ;1 f (x; )dx =0 = ;1 @ f (x; ) j=0 dx:
d Z1
Usually, if f is dierentiable for all , we write
d Z 1 f (x; )dx = Z 1 @ f (x; )dx:
d ;1
;1 @
Corollary 3.2.7:
Let fZ(x;1) be dierentiable for
@all . Suppose there exists an integrable function g(x; ) such
that
g(x; )dx < 1 and @ f (x; ) j=0 g(x; ) 8x 80 in some {neighborhood of ,
;1
then
d Z 1 f (x; )dx = Z 1 @ f (x; )dx:
d
@
;1
;1
53
More on Moment Generating Functions
@
etxfX (x) jt=t0 =j x j et0 xfX (x) for j t0 ; t j 0 :
@t
Choose t; 0 small enough such that t + 0 2 (;h; h) and t ; 0 2 (;h; h), or, equivalently,
j t + 0 j< h and j t ; 0 j< h. Then,
@
etxfX (x) jt=t0 g(x; t)
Consider
@t
where
g(x; t) =
R
(
j x j e(t+0 )xfX (x); x 0
j x j e(t;0 )xfX (x); x < 0
To verify g(x; t)dx < 1, we need to know fX (x).
Suppose mgf MX (t) exists for j t j h for some h > 1, where h ;1 h. Then j t+0 +1 j< h
and j t ; 0 ; 1 j< h . Since j x j ejxj 8x, we get
(
(t+0 +1)x
g(x; t) e(t;0 ;1)xfX (x); x 0
e
fX (x); x < 0
Z1
Z0
g(x; t)dx MX (t + 0 + 1) < 1 and
g(x; t)dx MX (t ; 0 ; 1) < 1 and,
0 Z1
;1
therefore,
g(x)dx < 1.
Then,
;1
Together with Corollary 3.2.7, this establishes that we can dierentiate under the integral in
the Proof of Theorem 3.2.2.
If h 1, we may need to check more carefully to see if the condition holds.
Note:
If MX (t) exists for t 2 (;h; h), then we have an innite collection of moments.
Does a collection of integer moments fmk : k = 1; 2; 3; : : :g completely characterize the distribution, i.e., cdf, of X ? | Unfortunately not, as Example 3.2.8 shows.
Example 3.2.8:
Let X1 and X2 be rv's with pdfs
fX1 (x) = p1 x1 exp(; 21 (log x)2 ) I(0;1) (x)
2
and
fX2 (x) = fX1 (x) (1 + sin(2 log x)) I(0;1)(x)
54
It is E (X1r ) = E (X2r ) = er2 =2 for r = 0; 1; 2; : : : as you have to show in the Homeworks.
Two dierent pdfs/cdfs have the same moment sequence! What went wrong? In this example,
MX1 (t) does not exist as shown in the Homeworks!
Theorem 3.2.9:
Let X and Y be 2 rv's with cdf's FX and FY for which all moments exist.
(i) If FX and FY have bounded support, then FX (u) = FY (u) 8u i E (X r ) = E (Y r ) for
r = 0; 1; 2; : : : .
(ii) If both mgf's exist, i.e., MX (t) = MY (t) for t in some neighborhood of 0, then FX (u) =
FY (u) 8u.
Note:
The existence of moments is not equivalent to the existence of a mgf as seen in Example 3.2.8
above and some of the Homework assignments.
Theorem 3.2.10:
Suppose rv's fXi g1
M (t) = MX (t) 8t 2 (;h; h) for some
i=1 have mgf's MXi (t) and that ilim
!1 Xi
h > 0 and that MX (t) itself is a mgf. Then, there exists a cdf FX whose moments are determined by MX (t) and for all continuity points x of FX (x) it holds that ilim
F (x) = FX (x),
!1 Xi
i.e., the convergence of mgf's implies the convergence of cdf's.
Proof:
Uniqueness of Laplace transformations, etc.
Theorem 3.2.11:
For constants a and b, the mgf of Y = aX + b is
MY (t) = ebt MX (at);
given that MX (t) exists.
Proof:
MY (t) = E (e(aX +b)t )
= E (eaXt ebt )
= ebt E (eXat )
= ebt MX (at)
55
Lecture 20:
Fr 10/13/00
3.3 Complex{Valued Random Variables and Characteristic Functions
Recall the following facts regarding complex numbers:
p
i0 = +1; i = ;1; i2 = ;1; i3 = ;i; i4 = +1; etc.
in the planar Gauss'ian number plane it holds that i = (0; 1)
z = a + ib = r(cos + i sin )
p
r =j z j= a2 + b2
tan = ab
Euler's Relation: z = r(cos + i sin ) = rei
Mathematical Operations on Complex Numbers:
z1 z2 = (a1 a2) + i(b1 b2 )
z1 z2 = r1 r2ei(1 +2) = r1 r2 (cos(1 + 2 ) + i sin(1 + 2 ))
z1 r1 i(1 ;2 ) = r1 (cos(1 ; 2 ) + i sin(1 ; 2 ))
z2 = r2 e
r2
Moivre's Theorem: z n = (r(cos + i sin ))n = rn(cos(n) + i sin(n))
pn z = pn a + ib = pn r cos( +k3600 ) + i sin( +k3600 ) for k = 0; 1; : : : ; (n ; 1) and the main
n
n
value for k = 0
ln z = ln(a + ib) = ln(j z j) + i i2n where = arctan ab and the main value for n = 0
Conjugate Complex Numbers:
For z = a + ib, we dene the conjugate complex number z = a ; ib. It holds:
z=z
z = z i z 2 IR
z1 z2 = z1 z2
z1 z2 = z1 z2
z1 z1
z2 = z2
z z = a2 + b2
Re(z ) = a = 12 (z + z )
Im(z ) = b = 21i (z ; z )
p
p
j z j= a2 + b2 = z z
56
Denition 3.3.1:
Let (
; L; P ) be a probability space and X and Y real{valued rv's, i.e., X; Y : (
; L) ! (IR; B)
(i) Z = X + iY : (
; L) ! (CI; BCI) is called a complex{valued random variable (CI{rv).
(ii) If E (X ) and E (Y ) exist, then E (Z ) is dened as E (Z ) = E (X ) + iE (Y ) 2 CI.
Note:
E (Z ) exists i E (j X j) and E (j Y j) exist. It also holds that if E (Z ) exists, then j E (Z ) j
E (j Z j) (see Homework).
Denition 3.3.2:
Let X be a real{valued rv on (
; L; P ). Then, X (t) : IR ! CI with X (t) = E (eitX ) is called
the characteristic function of X .
Note:
(i) X (t) =
Z1
;1
eitx fX (x)dx =
Z1
if X is continuous.
(ii) X (t) =
X
x2X
eitx P (X = x) =
;1
X
x2X
cos(tx)fX (x)dx + i
Z1
;1
cos(tx)P (X = x) + i
if X is discrete and X is the support of X .
sin(tx)fX (x)dx
X
x2X
sin(tx)P (X = x)(x)
(iii) X (t) exists for all real{valued rv's X since j eitx j= 1.
Theorem 3.3.3:
Let X be the characteristic function of a real{valued rv X . Then it holds:
(i) X (0) = 1.
(ii) j X (t) j 1 8t 2 IR.
(iii) X is uniformly continuous, i.e., 8 > 0 9 > 0 8t1; t2 2 IR :j t1 ; t2 j< )j (t1 ) ;
(t2 ) j< .
(iv) X is a positive denite function, i.e., 8n 2 IN 81 ; : : : ; n 2 CI 8t1 ; : : : ; tn 2 IR :
n X
n
X
l=1 j =1
l j X (tl ; tj ) 0.
57
(v) X (t) = X (;t).
(vi) If X is symmetric around 0, i.e., if X has a pdf that is symmetric around 0, then
X (t) 2 IR 8t 2 IR.
(vii) aX +b (t) = eitb X (at).
Proof:
See Homework for parts (i), (ii), (iv), (v), (vi), and (vii).
Part (iii):
Known conditions:
(i) Let > 0.
(ii) 9 a > 0 : P (;a < X < +a) > 1 ; 4 and P (j X j> a) < 4
(iii) 9 > 0 : j e{(t0 ;t)x ; 1 j< 2 8x s.t. j x j< a and 8(t0 ; t) s.t. 0 < (t0 ; t) < .
This third condition holds since j e{0 ; 1 j= 0 and the exponential function is continuous.
Therefore, if we select (t0 ; t) and x small enough, j e{(t0 ;t)x ; 1 j will be < 2 for a given
.
Let t; t0 2 IR, t < t0 , and t0 ; t < . Then,
j X (t0) ; X (t) j =
= j
= j
j
Z +1
;1
Z ;a
;1
Z ;a
;1
j
Z +1
;1
Z +1
0x
{t
e fX (x)dx ;
e{tx fX (x)dx j
;1
(e{t0 x ; e{tx )fX (x)dx j
(e{t0 x ; e{tx )fX (x)dx +
Z +a
;a
(e{t0 x ; e{tx )fX (x)dx j + j
+j
Z +1
+a
(e{t0 x ; e{tx )fX (x)dx +
Z +a
;a
Z +1
+a
(e{t0 x ; e{tx )fX (x)dx j
(e{t0 x ; e{tx )fX (x)dx j
58
(e{t0 x ; e{tx )fX (x)dx j
We now take a closer look at the rst and third of these absolute integrals. It is:
j
Z ;a
;1
(e{t0 x ; e{tx )fX (x)dx j
Z ;a
j
Z ;a
=
e{t0 xf
X (x)dx j + j
j e{t0 x j fX (x)dx +
;1
(A)
=
;1
= j
Z ;a
;1
Z ;a
;1
1fX (x)dx +
Z ;a
Z ;a
;1
;1
Z ;a
;1
Z ;a
;1
e{t0 x fX (x)dx ;
Z ;a
;1
e{tx fX (x)dx j
e{tx fX (x)dx j
j e{tx j fX (x)dx
1fX (x)dx
2fX (x)dx.
(A) holds due to Note (iii) that follows Denition 3.3.2.
Similarly,
j
Z +1
+a
(e{t0 x ; e{tx )fX (x)dx j
Z +1
+a
2fX (x)dx
Returning to the main part of the proof, we get
j X (t0) ; X (t) j
=2
Z ;a
;1
Z ;a
;1
fX (x)dx +
= 2P (j X j> a) + j
Condition (ii)
2 4 + j
2fX (x)dx + j
Z +1
+a
Z +a
;a
Z +a
;a
Z +a
Z +1
0x
{t
{tx
(e ; e )fX (x)dx j +
2fX (x)dx
;a
+a
fX (x)dx + j
Z +a
;a
(e{t0 x ; e{tx )fX (x)dx j
(e{t0 x ; e{tx )fX (x)dx j
Z +a
= 2 +j
e{tx (e{(t0 ;t)x ; 1)fX (x)dx j
;a
Z +a
j e{tx (e{(t0 ;t)x ; 1) j fX (x)dx
2+
;a
Z +a
2+
j e{tx j j (e{(t0 ;t)x ; 1) j fX (x)dx
;a
59
(e{t0 x ; e{tx )fX (x)dx j
Lecture 21:
Mo 10/16/00
Z +a 1 2 fX (x)dx
2+
(B )
;a
Z +1 2+
2 fX (x)dx
;1
= 2 + 2
=
(B) holds due to Note (iii) that follows Denition 3.3.2 and due to condition (iii).
Theorem 3.3.4: Bochner's Theorem
Let : IR ! CI be any function with properties (i), (ii), (iii), and (iv) from Theorem 3.3.3.
Then there exists a real{valued rv X with X = .
Theorem 3.3.5:
Let X be a real{valued rv and E (X k ) exists for an integer k. Then, X is k times dierentiable and (Xk) (t) = ik E (X k eitX ). In particular for t = 0, it is (Xk) (0) = ik mk .
Theorem 3.3.6:
Let X be a real{valued rv with characteristic function X and let X be k times dierentiable,
where k is an even integer. Then the kth moment of X , mk , exists and it is (Xk) (0) = ik mk .
Theorem 3.3.7: Levy's Theorem
Let X be a real{valued rv with cdf FX and characteristic function X . Let a; b 2 IR, a < b.
If P (X = a) = P (X = b) = 0, i.e., FX is continuous in a and b, then
F (b) ; F (a) = 21
Z 1 e;ita ; e;itb
X (t)dt:
it
;1
Theorem 3.3.8:
Let X and Y be a real{valued rv with characteristic functions X and Y . If X = Y , then
X and Y are identically distributed.
60
Theorem 3.3.9:
Z1
Let X be a real{valued rv with characteristic function X such that
j X (t) j dt < 1.
;1
Then X has pdf
Z1
fX (x) = 21
e;itx X (t)dt:
;1
Theorem 3.3.10:
Let X be a real{valued rv with mgf MX (t), i.e., the mgf exists. Then X (t) = MX (it).
Theorem 3.3.11:
1
1
Suppose real{valued rv's fXi g1
i=1 have cdf's fFXi gi=1 and characteristic functions fXi (t)gi=1 .
If ilim
(t) = X (t) 8t 2 (;h; h) for some h > 0 and X (t) is itself a characteristic func!1 Xi
tion (of a rv X with cdf FX ), then ilim
F (x) = FX (x) for all continuity points x of FX (x),
!1 Xi
i.e., the convergence of characteristic functions implies the convergence of cdf's.
Theorem 3.3.12:
Characteristic functions for some well{known distributions:
Distribution
X (t)
(i) X Dirac(c)
eitc
(ii) X Bin(1; p) 1 + p(eit ; 1)
(iii) X Poisson(c) exp(c(eit ; 1))
eitb ;eita
(iv) X U (a; b)
(b;a)it
(v) X N (0; 1)
exp(;t2 =2)
(vi) X N (; 2 ) eit exp(;2 t2 =2)
(1 ; itq );p
(vii) X ;(p; q)
(1 ; itc );1
(viii) X Exp(c)
(1 ; 2it);n=2
(ix)
X 2n
Proof:
(i) X (t) = E (eitX ) = eitc P (X = c) = eitc
(ii) X (t) =
1
X
k=0
eitk P (X = k) = eit0 (1 ; p) + eit1 p = 1 + p(eit ; 1)
61
(iii) X (t) =
since
1
n
X
eitn cn! e;c = e;c n1! (c eit )n = e;c ece = ec(e ;1)
X
it
n2IN 0
1
X xn
n=0
= ex
n
!
n=0
1
(iv) X (t) = b;a
it
Zb
a
eitx dx =
"
#
eitx b = eitb ; eita
b ; a it a (b ; a)it
1
(v) X N (0; 1) is symmetric around 0
=) X (t) is real since there is no imaginary part according to Theorem 3.3.3 (vi)
=) X (t) = p12
Z1
;1
cos(tx)e
;x2
2
dx
Since E (X ) exists, X (t) is dierentiable according to Theorem 3.3.5 and the following
holds:
0X (t) = Re(0X (t))
0
Z1
B
= Re @
ix
;1
=
=
=
=
1
p1 e dxC
eitx
A
|{z}
2
cos(tx)+i sin(tx)
Z 1
Z1
2
2 Re
ix cos(tx) p1 e ;2 dx +
;x sin(tx) p1 e ;2 dx
2
2
;1
;1
Z
1
;2
0 = ;t cos(tx) and v = ;e ;2 2
2 dx
p1
(|; sin(
tx
))
xe
j
u
{z } | {zv0 }
2 ;1
u
Z1
2
;2
p1 sin(tx)e ;2 j1;1 ; p1
(;t cos(tx))(;e 2 )dx
| 2 {z
} 2 ;1
=0 since sin is odd
Z1
;2
1
;t p
cos(tx)e 2 dx
2 ;1
= ;tX (t)
; x2
2
x
x
x
x
x
x
x
0
Thus, 0X (t) = ;tX (t). It follows that XX ((tt)) = ;t and by integrating both sides, we
get ln j X (t) j= ; 12 t2 + c with c 2 IR.
For t = 0, we know that X (0) = 1 by Theorem 3.3.3 (i) and ln j X (0) j= 0. It follows
that 0 = 0 + c. Therefore, c = 0 and j X (t) j= e; 12 t2 .
If we take t = 0, then X (0) = 1 by Theorem 3.3.3 (i). Since X is uniformly continuous,
X must take the value 0 before it can eventually take a negative value. However, since
e; 21 t2 > 0 8t 2 IR, X cannot take 0 as a possible value and therefore cannot pass into
the negative numbers. So, it must hold that X (t) = e; 21 t2 8t 2 IR:
62
Lecture 22:
We 10/18/00
(vi) For > 0; 2 IR, we know that if X N (0; 1), then X + N (; 2 ). By Theorem
3.3.3 (vii) we have
1
X + (t) = eit X (t) = eit e; 2 2 t2 :
(vii)
X (t) =
Z1
0
eitx (p; q; x)dx
Z 1 qp
xp;1e;(q;it)x dx
=
0 ;(p)
Z1
p
= ;(q p) (q ; it);p ((q ; it)x)p;1 e;(q;it)x (q ; it)dx j u = (q ; it)x; du = (q ; it)dx
0
Z
p
1
= ;(q p) (q ; it);p (u)p;1 e;u du
| 0 {z
}
=;(p)
= qp(q ; it);p
;p
= (q ; it)
q;p
;p
= q ;q it
;p
= 1 ; itq
Lecture 23:
(viii) Since an Exp(c) distribution is a ;(1; c) distribution, we get for X Exp(c) = ;(1; c): Fr 10/20/00
X (t) = (1 ; itc );1
(ix) Since a 2n distribution (for n 2 IN ) is a ;( n2 ; 21 ) distribution, we get for X 2n =
;( n2 ; 12 ):
X (t) = (1 ; 1it=2 );n=2 = (1 ; 2it);n=2
63
Example 3.3.13:
Since we know that m1 = E (X ) and m2 = E (X 2 ) exist for X Bin(1; p), we can determine
these moments according to Theorem 3.3.5 using the characteristic function.
It is
X (t) = 1 + p(eit ; 1)
0X (t) = pieit
0X (0) = pi
0
=) m1 = Xi(0) = pii = p = E (X )
00X (t) = pi2 eit
00X (0) = pi2
00
2
=) m2 = Xi2(0) = pii2 = p = E (X 2 )
=) V ar(X ) = E (X 2 ) ; (E (X ))2 = p ; p2 = p(1 ; p)
Note:
Z1
The restriction
j X (t) j dt < 1 in Theorem 3.3.9 works in such a way that we don't
;1
end up with a (non{existing) pdf if X is a discrete rv. For example,
X Dirac(c):
Z1
;1
j X (t) j dt =
Z1
j eitc j dt
Z;1
1
=
1dt
;1
= x j1
;1
which is undened.
Also for X Bin(1; p):
Z1
Z1
j X (t) j dt =
j 1 + p(eit ; 1) j dt
;1
;1
Z1
=
j peit ; (p ; 1) j dt
;1
Z1
j j peit j ; j (p ; 1) j j dt
;1
Z1
Z1
j peit j dt ;
j (p ; 1) j dt
;1
;1
64
= p
Z1
;1
1dt ; (1 ; p)
Z1
= (2p ; 1)
1 dt
;1
= (2p ; 1)x j1
;1
which is undened for p 6= 1=2.
If p = 1=2, we have
Z1
;1
j peit ; (p ; 1) j dt = 1=2
= 1=2
Z1
Z;1
1
Z1
;1
1 dt
j eit + 1 j dt
j cos t + i sin t + 1 j dt
Z;1
1q
= 1=2
(cos t + 1)2 + (sin t)2 dt
;1
Z1q
= 1 =2
cos2 t + 2 cos t + 1 + sin2 t dt
Z;1
1p
= 1 =2
2 + 2 cos t dt
;1
which also does not exist.
Otherwise, X N (0; 1):
Z1
;1
j X (t) j dt =
Z1
exp(;t2 =2)dt
p 1 1
p exp(;t2 =2)dt
= 2
2
;1
p
= 2
;1 Z
< 1
65
3.4 Probability Generating Functions
Denition 3.4.1:
Let X be a discrete rv which only takes non{negative integer values, i.e., pk = P (X = k), and
1
X
pk = 1. Then, the probability generating function (pgf) of X is dened as
k=0
G(s) =
1
X
k=0
pk sk :
Theorem 3.4.2:
G(s) converges for j s j< 1.
Proof: 1
1
X
X
j G(s) j j pk sk j j pk j= 1
k=0
k=0
Theorem 3.4.3:
Let X be a discrete rv which only takes non{negative integer values and has pgf G(s). Then
it holds:
dk G(s) j
P (X = k) = k1! ds
s=0
k
Theorem 3.4.4:
Let X be a discrete rv which only takes non{negative integer values and has pgf G(s). If
E (X ) exists, then it holds:
d G(s) j
E (X ) = ds
s=1
Denition 3.4.5:
The kth factorial moment of X is dened as
E [X (X ; 1)(X ; 2) : : : (X ; k + 1)]
if this expectation exists.
66
Theorem 3.4.6:
Let X be a discrete rv which only takes non{negative integer values and has pgf G(s). If
E [X (X ; 1)(X ; 2) : : : (X ; k + 1)] exists, then it holds:
dk G(s) j
E [X (X ; 1)(X ; 2) : : : (X ; k + 1)] = ds
s=1
k
Note:
Similar to the Cauchy distribution for the continuous case, there exist discrete distributions
where the mean (or higher moments) do not exist. See Homework.
67
Lecture 24:
Mo 10/23/00
3.5 Moment Inequalities
Theorem 3.5.1:
Let h(X ) be a non{negative Borel{measurable function of a rv X . If E (h(X )) exists, then it
holds:
P (h(X ) ) E (h(X )) 8 > 0
Proof:
Continuous case only:
E (h(X )) =
=
Z1
Z;1
ZA
ZA
h(x)fX (x)dx
h(x)fX (x)dx +
h(x)fX (x)dx
Z
AC
h(x)fX (x)dx j where A = fx : h(x) g
fX (x)dx
= P (h(X ) ) 8 > 0
A
Therefore, P (h(X ) ) E (h(X )) 8 > 0.
Corollary 3.5.2: Markov's Inequality
Let h(X ) =j X jr and = kr where r > 0 and k > 0. If E (j X jr ) exists, then it holds:
r
P (j X j k) E (j X j )
kr
Proof:
Since P (j X j k) = P (j X jr kr ) for k > 0, it follows using Theorem 3.5.1:
r
Th:3:5:1
P (j X j k) = P (j X jr kr ) E (j kXr j )
Corollary 3.5.3: Chebychev's Inequality
Let h(X ) = (X ; )2 and = k2 2 where E (X ) = , V ar(X ) = 2 < 1, and k > 0. Then it
holds:
P (j X ; j> k) k12
68
Proof:
Since P (j X ; j> k) = P (j X ; j2 > k2 2 ) for k > 0, it follows using Theorem 3.5.1:
Th:3:5:1
j2 ) = V ar(X ) = 2 = 1
P (j X ; j> k) = P (j X ; j2 > k2 2 ) E (j Xk2;
2
k2 2
k2 2 k2
Theorem 3.5.4: Lyapunov's Inequality
Let 0 < n = E (j X jn ) < 1. For arbitrary k such that 2 k n, it holds that
1
1
(k;1 ) k;1 (k ) k ;
1
i.e., (E (j X jk;1 )) k;1 (E (j X jk )) k1 .
Proof:
Continuous case only:
k+1
k;1
Let Q(u; v) = E (u j X j 2 +v j X j 2 )2 . Obviously, Q(u; v) 0 8u; v 2 IR. Also,
Q(u; v) =
Z1
;1
= u2
(u j x j
Z1
;1
k
;1
2
+v j x j
k
+1 2
2 )
fX (x)dx
j x jk;1 fX (x)dx + 2uv
Z1
;1
= u2 k;1 + 2uvk + v2 k+1
j x jk fX (x)dx + v2
Z1
;1
j x jk+1 fX (x)dx
0 8u; v 2 IR
Using the fact that Ax2 + 2Bxy + Cy2 0 8x; y 2 IR i A > 0 and AC ; B 2 > 0 (see
Rohatgi, page 6, Section P2.4), we get with A = k;1 ; B = k , and C = k+1 :
k;1 k+1 ; k2 0
=) k2 k;1 k+1
=) k2k kk;1 kk+1
This means that 12 0 2 , 24 12 32 , 36 23 43 , and so on. Multiplying these, we get:
kY
;1
j =1
j2j kY
;1
j =1
jj;1 jj+1
= (0 2 )(12 32 )(23 43 )(34 54 ) : : : (kk;;32 kk;;12 )(kk;;21 kk;1 )
= 0 kk;;12 kk;1
kY
;2
j =1
j2j
69
Dividing both sides by
kY
;2
j =1
j2j , we get:
k2;k;1 2 0 kk;1 kk;;12
0 =1 k
=) k;1
kk;1
;1
=) k1;1 k k
k
1
1
=) kk;;11 kk
70
Lecture 25:
We 10/25/00
4 Random Vectors
4.1 Joint, Marginal, and Conditional Distributions
Denition 4.1.1:
The vector X = (X1 ; : : : ; Xn )0 on (
; L; P ) ! IRn dened by X (!) = (X1 (!); : : : ; Xn (!))0 ; ! 2
, is an n{dimensional random vector (n{rv) if X ;1 (I ) = f! : X1 (!) a1 ; : : : ; Xn (!) ang 2 L for all n{dimensional intervals I = f(x1 ; : : : ; xn) : ;1 < xi ai; ai 2 IR 8i =
1; : : : ; ng.
Note:
It follows that if X1 ; : : : ; Xn are any n rv's on (
; L; P ), then X = (X1 ; : : : ; Xn )0 is an n{rv
on (
; L; P ) since for any I , it holds:
X ;1 (I ) = f! : (X1 (!); : : : ; Xn (!)) 2 I g
= f! : X1 (!) a1 ; : : : ; Xn (!) an g
=
\n
f| ! : Xk ({z!) ak g}
k=1
|
{z2L
}
2L
Denition 4.1.2:
For an n{rv X , a function F dened by
F (x) = P (X x) = P (X1 x1 ; : : : ; Xn xn ) 8x 2 IRn
is the joint cumulative distribution function (joint cdf) of X .
Note:
(i) F is non{decreasing and right{continuous in each of its arguments xi .
(ii) xlim
!1 F (x) = x1 !1lim
;:::xn !1 F (x) = 1 and xklim
!;1 F (x) = 0 8x1 ; : : : ; xk;1 ; xk+1 ; : : : ; xn 2
IR.
However, conditions (i) and (ii) together are not sucient for F to be a joint cdf. Instead we
need the conditions from the next Theorem.
71
Theorem 4.1.3:
A function F (x) = F (x1 ; : : : ; xn ) is the joint cdf of some n{rv X i
(i) F is non{decreasing and right{continuous with respect to each xi ,
(ii) F (;1; x2 ; : : : ; xn ) = F (x1 ; ;1; x3 ; : : : ; xn ) = : : : = F (x1 ; : : : ; xn;1 ; ;1) = 0 and
F (1; : : : ; 1) = 1, and
(iii) 8x 2 IRn 8i > 0; i = 1; : : : n, the following inequality holds:
F (x + ) ;
+
n
X
F (x1 + 1 ; : : : ; xi;1 + i;1; xi ; xi+1 + i+1 ; : : : ; xn + n)
i=1X
1i<j n
F (x1 + 1 ; : : : ; xi;1 + i;1 ; xi ; xi+1 + i+1 ; : : : ;
:::
+ (;1)n F (x)
0
xj ;1 + j ;1 ; xj ; xj +1 + j +1; : : : ; xn + n )
Note:
We won't prove this Theorem but just see why we need condition (iii) for n = 2:
P (x1 < X x2; y1 < Y y2 ) =
P (X x2 ; Y y2 ) ; P (X x1 ; Y y2 ) ; P (X x2 ; Y y1)+ P (X x1; Y y1) 0
Note:
We will restrict ourselves to n = 2 for most of the next Denitions and Theorems but those
can be easily generalized to n > 2. The term bivariate rv is often used to refer to a 2{rv
and multivariate rv is used to refer to an n{rv, n 2.
Denition 4.1.4:
A 2{rv (X; Y ) is discrete if there exists a countable collection X ofX
pairs (xi ; yi ) that has
probability 1. Let pij = P (X = xi ; Y = yj ) > 0 8(xi ; yj ) 2 X. Then, pij = 1 and fpij g is
the joint probability mass function (joint pmf) of (X; Y ).
72
i;j
Denition 4.1.5:
Let (X; Y ) be a discrete 2{rv with joint pmf fpij g. Dene
pi =
and
pj =
1
X
j =1
1
X
i=1
pij =
pij =
1
X
j =1
1
X
i=1
P (X = xi ; Y = yj ) = P (X = xi )
P (X = xi ; Y = yj ) = P (Y = yj ):
Then fpi g is called the marginal probability mass function (marginal pmf) of X and
fpj g is called the marginal probability mass function of Y .
Denition 4.1.6:
A 2{rv (X; Y ) is continuous if there exists a non{negative function f such that
F (x; y) =
Zx Zy
;1 ;1
f (u; v) dv du 8(x; y) 2 IR2
where F is the joint cdf of (X; Y ). We call f the joint probability density function (joint
pdf) of (X; Y ).
Note:
If F is continuous at (x; y), then
d2 F (x; y) = f (x; y):
dx dy
Denition 4.1.7:
Z1
Let (X; Y ) be a continuous 2{rv with joint pdf f . Then fX (x) =
f (x; y)dy is called the
;1
marginal probability density function (marginal pdf) of X and fY (y) =
is called the marginal probability density function of Y .
Z1
Note:
(i)
Z1
;1
fX (x)dx =
Z 1 Z 1
f (x; y)dy dx = F (1; 1) = 1 =
;1 ;1
Z 1 Z 1
Z1
f (x; y)dx dy =
fY (y)dy
;1 ;1
;1
and fX (x) 0 8x 2 IR and fY (y) 0 8y 2 IR.
73
;1
f (x; y)dx
(ii) Given a 2{rv (X; Y ) with joint cdf F (x; y), how do we generate a marginal cdf
FX (x) = P (X x) ? | The answer is P (X x) = P (X x; ;1 < Y < 1) =
F (x; 1).
Denition 4.1.8:
If FX (x1 ; : : : ; xn ) = FX (x) is the joint cdf of an n{rv X = (X1 ; : : : ; Xn ), then the marginal
cumulative distribution function (marginal cdf) of (Xi1 ; : : : ; Xik ); 1 k n ; 1; 1 i1 < i2 < : : : < ik n, is given by
lim
F (x) = FX (1; : : : ; 1; xi1 ; 1; : : : ; 1; xi2 ; 1; : : : ; 1; xik ; 1; : : : ; 1):
xi !1;i6=i1;:::;ik X
Note:
In Denition 1.4.1, we dened conditional probability distributions in some probability space
(
; L; P ). This denition extends to conditional distributions of 2{rv's (X; Y ).
Denition 4.1.9:
Let (X; Y ) be a discrete 2{rv. If P (Y = yj ) = pj > 0, then the conditional probability
mass function (conditional pmf) of X given Y = yj (for xed j ) is dened as
pijj = P (X = xi j Y = yj ) = P (XP=(Yxi=; Yy =) yj ) = ppij :
j
j
Note:
For a continuous 2{rv (X; Y ) with pdf f , P (X x j Y = y) is not dened. Let > 0 and
suppose that P (y ; < Y y + ) > 0. For every x and every interval (y ; ; y + ], consider
the conditional probability of X x given Y 2 (y ; ; y + ]. We have
P (X x j y ; < Y y + ) = P (XP (y x;; y ;< Y<Yy + y)+ )
which is well{dened if P (y ; < Y y + ) > 0 holds.
So, when does
lim P (X x j Y 2 (y ; ; y + ])
!0+
exist? See the next denition.
74
Denition 4.1.10:
The conditional cumulative distribution function (conditional cdf) of a rv X given
that Y = y is dened to be
FX jY (x j y) = lim
P (X x j Y 2 (y ; ; y + ])
!0+
provided that this limit exists. If it does exist, the conditional probability density function (conditional pdf) of X given that Y = y is any non{negative function fX jY (x j y)
satisfying
Zx
FX jY (x j y) =
fX jY (t j y)dt 8x 2 IR:
;1
Note:
Z1
For xed y, fX jY (x j y) 0 and
fX jY (x j y)dx = 1. So it is really a pdf.
;1
Theorem 4.1.11:
Let (X; Y ) be a continuous 2{rv with joint pdf fX;Y . It holds that at every point (x; y) where
f is continuous and the marginal pdf fY (y) > 0, we have
x; Y 2 (y ; ; y + ])
FX jY (x j y) = !
lim0+ P (XP (Y 2 (y ; ; y + ])
1
0 Z x Z y +
1
B 2 ;1 y; fX;Y (u; v)dv du CC
= !
lim0+ B
CA
B@
Z y+
1
f (v)dv
Zx
=
;1
2 y;
Y
fX;Y (u; y)du
fY (y)
Z x fX;Y (u; y)
=
f (y) du:
;1
Y
(x;y)
Thus, fX jY (x j y) exists and equals fX;Y
fY (y) , provided that fY (y) > 0. Furthermore, since
Zx
;1
fX;Y (u; y)du = fY (y)FX jY (x j y);
we get the following marginal cdf of X :
FX (x) =
Z 1 Z x
;1
;1
fX;Y (u; y)du dy =
75
Z1
;1
fY (y)FX jY (x j y)dy
Example 4.1.12:
Consider
fX;Y (x; y) =
(
Lecture 26:
Fr 10/27/00
2; 0 < x < y < 1
0; otherwise
We calculate the marginal pdf's fX (x) and fY (y) rst:
fX (x) =
and
Z1
;1
fY (y) =
fX;Y (x; y)dy =
Z1
;1
Z1
x
fX;Y (x; y)dx =
2dy = 2(1 ; x) for 0 < x < 1
Zy
0
2dx = 2y for 0 < y < 1
The conditional pdf's fY jX (y j x) and fX jY (x j y) are calculated as follows:
(x; y) = 2 = 1 for x < y < 1 (where 0 < x < 1)
fY jX (y j x) = fX;Y
f (x)
2(1 ; x) 1 ; x
X
and
(x; y) = 2 = 1 for 0 < x < y (where 0 < y < 1)
fX jY (x j y) = fX;Y
f (y )
2y y
Y
Thus, it holds that Y j X = x U (x; 1) and X j Y = y U (0; y), i.e., both conditional pdf's
are related to uniform distributions.
76
4.2 Independent Random Variables
Example 4.2.1: (from Rohatgi, page 119, Example 1)
Let f1 ; f2 ; f3 be 3 pdf's with cdf's F1 ; F2 ; F3 and let j j 1. Dene
f(x1 ; x2 ; x3 ) = f1 (x1 )f2 (x2 )f3 (x3 ) (1 + (2F1 (x1 ) ; 1)(2F2 (x2) ; 1)(2F3 (x3 ) ; 1)):
We can show
(i) f is a pdf for all 2 [;1; 1].
(ii) ff : ;1 1g all have marginal pdf's f1 ; f2 ; f3 .
See book for proof and further discussion | but when do the marginal distributions uniquely
determine the joint distribution?
Denition 4.2.2:
Let FX;Y (x; y) be the joint cdf and FX (x) and FY (y) be the marginal cdf's of a 2{rv (X; Y ).
X and Y are independent i
FX;Y (x; y) = FX (x)FY (y) 8(x; y) 2 IR2:
Lemma 4.2.3:
If X and Y are independent, a; b; c; d 2 IR, and a < b and c < d, then
P (a < X b; c < Y d) = P (a < X b)P (c < Y d):
Proof:
P (a < X b; c < Y d) = FX;Y (b; d) ; FX;Y (a; d) ; FX;Y (b; c) + FX;Y (a; c)
= FX (b)FY (d) ; FX (a)FY (d) ; FX (b)FY (c) + FX (a)FY (c)
= (FX (b) ; FX (a))(FY (d) ; FY (c))
= P (a < X b)P (c < Y d)
Denition 4.2.4:
A collection of rv's X1 ; : : : ; Xn with joint cdf FX (x) and marginal cdf's FXi (xi ) are mutually
(or completely) independent i
FX (x) =
n
Y
i=1
FX (xi ) 8x 2 IRn :
i
77
Note:
We often simply say that the rv's X1 ; : : : ; Xn are independent when we really mean that
they are mutually independent.
Theorem 4.2.5: Factorization Theorem
(i) A necessary and sucient condition for discrete rv's X1 ; : : : ; Xn to be independent is
that
n
Y
P (X = x) = P (X1 = x1 ; : : : ; Xn = xn ) = P (Xi = xi) 8x 2 X
where X IRn
is the countable support of X .
i=1
(ii) For an absolutely continuous n{rv X = (X1 ; : : : ; Xn ), X1 ; : : : ; Xn are independent i
fX (x) = fX1 ;:::;X (x1 ; : : : ; xn ) =
n
n
Y
i=1
fX (xi);
i
where fX is the joint pdf and fX1 ; : : : ; fXn are the marginal pdfs of X .
Proof:
(i) Discrete case:
Let X be a random vector whose components are independent random variables of the
discrete type with P (X = b) > 0. Lemma 4.2.3 extends to:
lim
P (a < X b)
a"b
=
=
=
indep:
=
lim
P (a1 < X1 b1 ; a2 < X2 b2 ;
ai "bi 8i2f1;:::;ng
P (X1 = b1 ; X2 = b2 ; : : : ; Xn = bn)
: : : ; an < Xn bn )
P (X = b)
n
Y
P (Xi = bi )
i=1
Before considering the converse factorization of the joint cdf of an n-rv, recall that
independence allows each value in the support of a component to combine with each
of all possible combinations of the other component values: a 3-dimensional vector
where the rst component has support fx1a ; x1b ; x1c g, the second component has support
fx2a ; x2b ; x2cg, and the third component has support fx3a ; x3b ; x3c g, has 3 3 3 = 27
points in its support. Due to independence, all these vectors can be arranged into three
sets: the rst set having x1a , the second set having x1b , and the third set having x1c ,
with each set having 9 combinations of x2 and x3 .
78
=) F (x1 :; x2 :; x3 :)
= P (X1 = x1a )F (x2 :; x3 :) + P (X1 = x1b )F (x2 :; x3 :) + P (X1 = x1c )F (x2 :; x3 :)
= F (x1 :)F (x2 :; x3 :)
= F (x1 :)[P (X2 = x2a )F (x3 :) + P (X2 = x2b )F (x3 :) + P (X2 = x2c )F (x3 :)]
= F (x1 :)[P (X2 = x2a ) + P (X2 = x2b ) + P (X2 = x2c )]F (x3 :)
= F (x1 :)F (x2 :)F (x3 :)
More generally, for n dimensions, let xi = (xi1 ; xi2 ; : : : ; xin ); B = fxi : xi1 x1 ; xi2 x2 ; : : : ; xin xng; B 2 X. Then it holds:
FX (x)
=
=
indep:
=
=
=
X
xi 2B
X
xi 2B
X
xi 2B
P (X = xi)
P (X1 = xi1 ; X2 = xi2 ; : : : ; Xn = xin)
P (X1 = xi1 )P (X2 = xi2 ; : : : ; Xn = xin)
X
xi1 x1 ; :::; xin xn
X
xi1 x1
P (X1 = xi1 )P (X2 = xi2 ; : : : ; Xn = xin)
P (X1 = xi1 )
X
X
xi2 x2 ; :::; xin xn
=
FX1 (x1 )
indep:
=
FX1 (x1 )
=
FX1 (x1 )FX2 (x2 )
=
=
:::
FX1 (x1 )FX2 (x2 ) : : : FX (xn)
=
n
Y
i=1
xi2 x2 ; :::; xin xn
X
xi2 x2
P (X2 = xi2 ; : : : ; Xn = xin)
P (X2 = xi2 )
X
X
xi3 x3 ; :::; xin xn
xi3 x3 ; :::; xin xn
n
FX (xi )
i
(ii) Continuous case: Homework
79
P (X2 = xi2; : : : ; Xn = xin )
P (X3 = xi3 ; : : : ; Xn = xin )
P (X3 = xi3 ; : : : ; Xn = xin )
Lecture 27:
Theorem 4.2.6:
Mo 10/30/00
n
Y
X1 ; : : : ; Xn are independent i P (Xi 2 Ai ; i = 1; : : : ; n) = P (Xi 2 Ai ) 8 Borel sets Ai 2 B
i=1
(i.e., rv's are independent i all events involving these rv's are independent).
Proof:
Lemma 4.2.3 and denition of Borel sets.
Theorem 4.2.7:
Let X1 ; : : : ; Xn be independent rv's and g1 ; : : : ; gn be Borel{measurable functions. Then
g1 (X1 ); g2 (X2 ); : : : ; gn(Xn ) are independent.
Proof:
Fg1 (X1 );g2 (X2 );:::;g
n
(Xn ) (h1 ; h2 ; : : : ; hn )
=
P (g1 (X1 ) h1 ; g2 (X2 ) h2 ; : : : ; gn (Xn ) hn )
()
P (X1 2 g1;1 ((;1; h1 ]); : : : ; Xn 2 gn;1 ((;1; hn ]))
=
Th:=
4:2:6
=
=
n
Y
i=1
n
Y
i=1
n
Y
i=1
P (Xi 2 gi;1 ((;1; hi ]))
P (gi (Xi ) hi )
Fg (X ) (hi )
i
i
() holds since g1;1 ((;1; h1 ]) 2 B; : : : ; gn;1 ((;1; hn ]) 2 B
Theorem 4.2.8:
If X1 ; : : : ; Xn are independent, then also every subcollection Xi1 ; : : : ; Xik ; k = 2; : : : ; n ; 1,
1 i1 < i2 : : : < ik n, is independent.
Denition 4.2.9:
A set (or a sequence) of rv's fXn g1
n=1 is independent i every nite subcollection is independent.
80
Note:
Recall that X and Y are identically distributed i FX (x) = FY (x) 8x 2 IR according to
Denition 2.2.5 and Theorem 2.2.6.
Denition 4.2.10:
We say that fXn g1
n=1 is a set (or a sequence) of independent identically distributed (iid)
1
rv's if fXn gn=1 is independent and all Xn are identically distributed.
Note:
Recall that X and Y being identically distributed does not say that X = Y with probability
1. If this happens, we say that X and Y are equivalent rv's.
Note:
We can also extend the dention of independence to 2 random vectors X n1 and Y n1 : X
and Y are independent i FX;Y (x; y) = FX (x)FY (y ) 8x; y 2 IRn .
This does not mean that the components Xi of X or the components Yi of Y are independent.
However, it does mean that each pair of components (Xi ; Yi ) are independent, any subcollections (Xi1 ; : : : ; Xik ) and (Yj1 ; : : : ; Yjl ) are independent, and any Borel{measurable functions
f (X ) and g(Y ) are independent.
Corollary 4.2.11: (to Factorization Theorem 4.2.5)
If X and Y are independent rv's, then
FX jY (x j y) = FX (x) 8x;
and
FY jX (y j x) = FY (y) 8y:
81
4.3 Functions of Random Vectors
Theorem 4.3.1:
If X and Y are rv's on (
; L; P ) ! IR, then
(i) X Y is a rv.
(ii) XY is a rv.
(iii) If f! : Y (!) = 0g = , then XY is a rv.
Theorem 4.3.2:
Let X1 ; : : : ; Xn be rv's on (
; L; P ) ! IR. Dene
MAXn = maxfX1 ; : : : ; Xn g = X(n)
by
MAXn(!) = maxfX1 (!); : : : ; Xn(!)g 8! 2 and
MINn = minfX1 ; : : : ; Xng = X(1) = ; maxf;X1 ; : : : ; ;Xn g
by
MINn(!) = minfX1 (!); : : : ; Xn (!)g 8! 2 :
Then,
(i) MINn and MAXn are rv's.
(ii) If X1 ; : : : ; Xn are independent, then
FMAX (z ) = P (MAXn z) = P (Xi z 8i = 1; : : : ; n) =
n
and
FMIN (z) = P (MINn z ) = 1 ; P (Xi > z 8i = 1; : : : ; n) = 1 ;
n
(iii) If fXi gni=1 are iid rv's with common cdf FX , then
FMAX (z) = FXn (z )
n
and
FMIN (z) = 1 ; (1 ; FX (z))n :
n
82
n
Y
i=1
n
Y
FX (z)
i
(1 ; FXi (z )):
i=1
If FX is absolutely continuous with pdf fX , then the pdfs of MAXn and MINn are
fMAX (z) = n FXn;1 (z ) fX (z)
n
and
fMIN (z ) = n (1 ; FX (z))n;1 fX (z)
for all continuity points of FX .
n
Note:
Using Theorem 4.3.2, it is easy to derive the joint cdf and pdf of MAXn and MINn for iid
rv's fX1 ; : : : ; Xn g. For example, if the Xi 's are iid with cdf FX and pdf fX , then the joint
pdf of MAXn and MINn is
fMAX
n
;MINn (x; y) =
(
0;
xy
n
;
2
n(n ; 1) (FX (x) ; FX (y)) fX (x)fX (y); x > y
However, note that MAXn and MINn are not independent. See Rohatgi, page 129, Corollary,
for more details.
Note:
The previous transformations are special cases of the following Theorem 4.3.3.
Theorem 4.3.3:
If g : IRn ! IRm is a Borel{measurable function (i.e., 8B 2 Bm : g;1 (B ) 2 Bn ) and if
X = (X1 ; : : : ; Xn ) is an n{rv, then g(X ) is an m{rv.
Proof:
If B 2 Bm , then f! : g(X (!)) 2 B g = f! : X (!) 2 g;1 (B )g 2 Bn .
Question: How do we handle more general transformations of X ?
Discrete Case:
Let X = (X1 ; : : : ; Xn ) be a discrete n{rv and X IRn be the countable support of X , i.e.,
P (X 2 X) = 1 and P (X = x) > 0 8x 2 X.
Dene ui = gi (x1 ; : : : ; xn ); i = 1; : : : ; n to be 1{to{1-mappings of X onto B . Let u =
(u1 ; : : : ; un )0 . Then
P (U = u) = P (g1 (X ) = u1 ; : : : ; gn (X ) = un) = P (X1 = h1 (u); : : : ; Xn = hn (u)) 8u 2 B
83
where xi = hi (u); i = 1; : : : ; n, is the inverse transformation (and P (U = u) = 0 8u 62 B ).
The joint marginal pmf of any subcollection of ui 's is now obtained by summing over the other
remaining uj 's.
Example 4.3.4:
Let X; Y be iid Bin(n; p); 0 < p < 1. Let U = YX+1 and V = Y + 1.
Then X = UV and Y = V ; 1. So the joint pmf of U; V is
P (U = u; V = v) =
!
!
n puv (1 ; p)n;uv n pv;1 (1 ; p)n+1;v
uv
v;1
!
!
n
n puv+v;1 (1 ; p)2n+1;uv;v
= uv
v;1
for v 2 f1; 2; : : : ; n + 1g and uv 2 f0; 1; : : : ; ng.
Continuous Case:
Let X = (X1 ; : : : ; Xn ) be a continuous n{rv with joint cdf FX and joint pdf fX .
Let
0
B
U =B
@
1
0 g (X )
BB 1 ..
.. C
C
=
g
(
X
)
=
. A
@ .
U1
Un
i.e., Ui = gi (X ), be a mapping from IRn into IRn .
If B 2 Bn , then
R
R
gn (X )
1
CC ;
A
R
R
Y
P (U 2 B ) = P (X 2 g;1 (B )) = g;:1:(:B ) fX (x)d(x) = g;:1:(:B ) fX (x) dxi
n
i=1
where g;1 (B ) = fx = (x1 ; : : : ; xn ) 2 IRn : g(x) 2 B g.
Suppose we dene B as the half{innite n{dimensional interval
Bu = f(u01 ; : : : ; u0n ) : ;1 < u0i < ui 8i = 1; : : : ; ng
for any u 2 IRn . Then the joint cdf of U is
R :::R
G(u) = P (U 2 Bu ) = P (g1 (X ) u1 ; : : : ; gn (X ) un) = g;1 (B ) fX (x)d(x):
u
84
Lecture 28:
We 11/01/00
If G happens to be absolutely continuous, the joint pdf of U will be given by fU (u) =
@ n G(u)
@u1 @u2 :::@un at every continuity point of fU .
Under certain conditions, we can write fU in terms of the original pdf fX of X as stated in
the next Theorem:
Theorem 4.3.5: Multivariate Transformation
Let X = (X1 ; : : : ; Xn ) be a continuous n{rv with joint pdf fX .
(i) Let
0
B
U =B
@
1
0 g (X )
1
B
.
.. C
. C
A = g(X ) = B@ ..
U1
1
CC ;
A
gn (X )
(i.e., Ui = gi (X )) be a 1{to{1{mapping from IRn into IRn , i.e., there exist inverses hi ,
i = 1; : : : ; n, such that xi = hi(u) = hi (u1 ; : : : ; un ); i = 1; : : : ; n, over the range of the
transformation g.
Un
(ii) Assume both g and h are continuous.
@xi = @hi (u) ; i; j = 1; : : : ; n, exist and are continuous.
(iii) Assume partial derivatives @u
@uj
j
(iv) Assume that the Jacobian of the inverse transformation
0 @x1 : : : @x1 1 @x1 : : : @x1 @u C @u1
@u B @u.. 1
.. C = ..
.. J = det @ (u ; : : : ; u ) = det B
.
.
.
. @
A
1
n
@x
@x
@x : : : @x
@u1
@u
@u1 : : : @u
@ (x1 ; : : : ; xn) n
n
is dierent from 0 for all u in the range of g.
n
n
n
n
n
n
Then the n{rv U = g(X ) has a joint absolutely continuous cdf with corresponding joint pdf
fU (u) =j J j fX (h1 (u); : : : ; hn (u)):
Proof:
Let u 2 IRn and
Then,
Bu = f(u01 ; : : : ; u0n ) : ;1 < u0i < ui 8i = 1; : : : ; ng:
R :::R
GU (u) = g;1 (B ) fX (x)d(x)
u
=
R :::R
Bu fX (h1 (u); : : : ; hn (u)) j J j d(u)
85
The result follows from dierentiation of GU .
For additional steps of the proof see Rohatgi (page 135 and Theorem 17 on page 10) or a book
on multivariate calculus.
Theorem 4.3.6:
Let X = (X1 ; : : : ; Xn ) be a continuous n{rv with joint pdf fX .
(i) Let
0
B
U =B
@
1
0 g (X )
BB 1 ..
.. C
C
=
g
(
X
)
=
. A
@ .
U1
Un
(i.e., Ui = gi (X )) be a mapping from IRn into IRn .
gn (X )
1
CC ;
A
(ii) Let X = fx : fX (x) > 0g be the support of X .
(iii) Suppose that for each u 2 B = fu 2 IRn : u = g(x) for some x 2 Xg there is a nite
number k = k(u) of inverses.
(iv) Suppose we can partition X into X0 ; X1 ; : : : ; Xk s.t.
(a) P (X 2 X0 ) = 0.
(b) U = g(X ) is a 1{to{1{mapping
1 Xl onto B for all l = 1; : : : ; k, with inverse
0 h (u) from
l1
C ; u 2 B , i.e., for each u 2 B , h (u) is the unique
B
B
transformation hl (u) = @ ... C
l
A
hln (u)
x 2 Xl such that u = g(x).
@xi = @hli (u) ; l = 1; : : : ; k; i; j = 1; : : : ; n, exist and are
(v) Assume partial derivatives @u
@uj
j
continuous.
(vi) Assume the Jacobian of each of the inverse transformations
1 @h 1 : : :
@u1
.. C
.
C
=
. A ..
@h@u1 : : :
: : : @h
@u
0 @h 1 : : :
0 @x1 : : : @x1 1
@u
@u
1
BB @u.. 1
B ..
.. C
C
=
det
Jl = det B
. A
@ .
@ .
l
n
@xn
@u1
:::
@hln
@u1
@xn
@un
is dierent from 0 for all u in the range of g.
Then the joint pdf of U is given by
fU (u) =
k
X
l=1
@hl1
@un
ln
n
j Jl j fX (hl1 (u); : : : ; hln(u)):
86
l
ln
.. ; l = 1; : : : ; k;
. @h @u
@hl1
@un
ln
n
Example 4.3.7:
Let X; Y be iid N (0; 1). Dene
U = g1 (X; Y ) =
(
Y=
6 0
0; Y = 0
X;
Y
and
V = g2 (X; Y ) =j Y j :
X = IR2 , but U; V are not 1{to{1 mappings since (U; V )(x; y) = (U; V )(;x; ;y), i.e., conditions
do not apply for the use of Theorem 4.3.5. Let
X0 = f(x; y) : y = 0g
X1 = f(x; y) : y > 0g
X2 = f(x; y) : y < 0g
Then P ((X; Y ) 2 X0 ) = 0.
Let B = f(u; v) : v > 0g = g(X1 ) = g(X2 ).
Inverses:
B ! X1 : x
y
B ! X2 : x
y
h11 (u; v) = uv
h12 (u; v) = v
h21 (u; v) = ;uv
h22 (u; v) = ;v
v
u
) jJ1 j =j v j
J1 = =
=
=
=
0 1
;
v
;
u
) jJ2 j =j v j
J2 = 0 ;1 fX;Y (x; y) = 21 e;x2 =2 e;y2 =2
fU;V (u; v) = j v j 21 e;(uv)2 =2 e;v2 =2 + j v j 21 e;(;uv)2 =2 e;(;v)2 =2
;( 2 +1) 2
= v e 2 ; ;1 < u < 1; 0 < v < 1
u
Marginal:
fU (u) =
=
Z1v
0
Z1
0
e
;(u2 +1)v2
2
v
2
2
2
dv j z = (u +2 1)v ; dz
dv = (u + 1)v
1
;z
(u2 + 1) e dz
87
1
= (u21+ 1) (;e;z )
0
= (1 +1 u2 ) ; ;1 < u < 1
Thus, the ratio of two iid N (0; 1) rv's is a rv that has a Cauchy distribution.
88
Lecture 29:
Fr 11/03/00
4.4 Order Statistics
Denition 4.4.1:
Let (X1 ; : : : ; Xn ) be an n{rv. The kth order statistic X(k) is the kth smallest of the Xi0 s, i.e.,
X(1) = minfX1 ; : : : ; Xn g, X(2) = minffX1 ; : : : ; Xng ; X(1) g, : : :, X(n) = maxfX1 ; : : : ; Xng.
It is X(1) X(2) : : : X(n) and fX(1) ; X(2) ; : : : ; X(n) g is the set of order statistics for
(X1 ; : : : ; Xn ).
Note:
As shown in Theorem 4.3.2, X(1) and X(n) are rv's. This result will be extended in the following Theorem:
Theorem 4.4.2:
Let (X1 ; : : : ; Xn ) be an n{rv. Then the kth order statistic X(k) , k = 1; : : : ; n, is also an rv.
Theorem 4.4.3:
Let X1 ; : : : ; Xn be continuous iid rv's with pdf fX . The joint pdf of X(1) ; : : : ; X(n) is
8 Y
n
>
< n! fX (xi); x1 x2 : : : xn
fX(1);:::;X( ) (x1 ; : : : ; xn ) = > i=1
: 0;
otherwise
n
Proof:
For the case n = 3, look at the following scenario how X1 ; X2 , and X3 can be possibly ordered
to yield X(1) < X(2) < X(3) . Columns represent X(1) ; X(2) , and X(3) . Rows represent X1 ; X2 ,
and X3 :
1 0 0 X1 < X2 < X3 : 0 1 0 0 0 1 1
X1 < X3 < X2 : 0
0
0
X2 < X1 < X3 : 1
0
89
0 0 0 1 1 0 1 0 0 0 0 1 0
X2 < X3 < X1 : 1
0
0
X3 < X1 < X2 : 0
1
0
X3 < X2 < X1 : 0
1
0 1 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 For n = 3, there are 3! = 6 possible arrangements.
In general, there are n! arrangements of X1 ; : : : ; Xn for each (X(1) ; : : : ; X(n) ). This mapping
is not 1{to{1. For each mapping, we have a n n matrix J that results from an n n identity
matrix through the rearrangement of rows. Therefore, j J j= 1. By Theorem 4.3.6, we get
fX(1) ;:::;X( ) (x(1) ; : : : ; x(n) ) = n!fX1 ;:::;X (x(k1 ); x(k2 ) ; : : : ; x(k ))
n
= n!
= n!
n
Y
i=1
n
Y
i=1
n
n
fX (x(k ) )
i
i
fX (xi )
Theorem 4.4.4: Let X1 ; : : : ; Xn be continuous iid rv's with pdf fX and cdf FX . Then the
following holds:
(i) The marginal pdf of X(k) , k = 1; : : : ; n, is
n!
k;1
n; k
fX( ) (x) = (k ; 1)!(
n ; k)! (FX (x)) (1 ; FX (x)) fX (x):
k
(ii) The joint pdf of X(j ) and X(k) , 1 j < k n, is
fX( ) ;X( ) (xj ; xk ) = (j ; 1)!(k ; nj !; 1)!(n ; k)! j
k
(FX (xj ))j ;1 (FX (xk ) ; FX (xj ))k;j ;1 (1 ; FX (xk ))n;k fX (xj )fX (xk )
if xj < xk and 0 otherwise.
90
4.5 Multivariate Expectation
In this section, we assume that X = (X1 ; : : : ; Xn ) is an n{rv and g : IRn ! IRn is a Borel{
measurable function.
Denition 4.5.1:
If n = 1, i.e., g is univariate, we dene the following:
(i) Let
1 ; : : : ; Xn = xin ). If
X X be discrete with joint pmf pi1 ;:::;in = P (X1 = xiX
pi1 ;:::;in g(xi1 ; : : : ; xin )
pi1 ;:::;in j g(xi1 ; : : : ; xin ) j< 1, we dene E (g(X )) =
i1 ;:::;in
i1 ;:::;in
and this value exists.
(ii) Let X be continuous with joint pdf fX (x). If
E (g(X )) =
Z
IRn
Z
IRn
j g(x) j fX (x) dx < 1, we dene
g(x)fX (x)dx and this value exists.
Note:
The above can be extended to vector{valued functions g (n > 1) in the obvious way. For
example, if g is the identity mapping from IRn ! IRn , then
0 E (X )
1
B
.
E (X ) = B
@ ..
E (Xn )
1 0
CC = BB
A @
1
.. C
. C
A
1
n
provided that E (j Xi j) < 1 8i = 1; : : : ; n.
Similarly, provided that all expectations exist, we get for the variance{covariance matrix:
V ar(X ) = X = E ((X ; E (X )) (X ; E (X ))0 )
with (i; j )th component
E ((Xi ; E (Xi )) (Xj ; E (Xj ))) = Cov(Xi ; Xj )
and with (i; i)th component
E ((Xi ; E (Xi )) (Xi ; E (Xi ))) = V ar(Xi ) = i2 :
Joint higher{order moments can be dened similarly when needed.
91
Note:
We are often interested in (weighted) sums of rv's or products of rv's and their expectations.
This will be addressed in the next two Theorems.
Theorem 4.5.2:
Let Xi ; i = 1; : : : ; n, be rv's such that E (j Xi j) < 1. Let a1 ; : : : ; an 2 IR and dene
n
X
S = aiXi . Then it holds that E (j S j) < 1 and
i=1
E (S ) =
Proof:
Continuous case only:
E (j S j) =
=
=
=
<
n
X
i=1
aiE (Xi ):
Z
n
X
j ai xi j fX (x)dx
IR i=1
Z X
n
j ai j j xi j fX (x)dx
IR i=1
Z
Z
n
X
j ai j j x i j
fX (x)dx1 : : : dxi;1 dxi+1 : : : dxn dxi
IR
IR ;1
i=1
Z
n
X
j ai j j xi j fX (xi )dxi
IR
i=1
n
X
j ai j E (j Xi j)
i=1
1
n
X
n
n
n
i
It follows that E (S ) = ai E (Xi ) by the same argument without using the absolute values
i=1
j j.
Note:
If Xi ; i = 1; : : : ; n, are iid with E (Xi ) = , then
E (X ) = E ( n1
n
X
i=1
Xi ) =
92
n 1
X
i=1 n
E (Xi ) = :
Lecture 30:
Theorem 4.5.3:
Mo 11/06/00
Let Xi ; i = 1; : : : ; n, be independent rv's such that E (j Xi j) < 1. Let gi ; i = 1; : : : ; n, be
Borel{measurable functions. Then
Yn
n
Y
i=1
i=1
E ( gi (Xi )) =
E (gi (Xi ))
if all expectations exist.
Proof:
n
Y
By Theorem 4.2.5, fX (x) = fXi (xi ), and by Theorem 4.2.7, gi (Xi ); i = 1; : : : ; n, are also
i=1
independent. Therefore,
n
Y
E ( gi (Xi ))
i=1
=
Th:=
4:2:5
=
Th:=4:2:7
=
=
Z Y
n
IRn =1
Z iY
n
gi (xi)fX (x)dx
(gi (xi )fXi (xi )dxi )
IRn i=1
Z
IR
Z
:::
Z Y
n
IR i=1
gi (xi )
n
Y
i=1
Z
g1 (x1 )fX1 (x1)dx1
IR
n Z
Y
i=1 IR
n
Y
i=1
fX (xi)
IR
i
n
Y
i=1
dxi
g2 (x2)fX2 (x2 )dx2 : : :
Z
IR
gn (xn)fX (xn)dxn
n
gi(xi )fX (xi )dxi
i
E (gi (Xi ))
Corollary 4.5.4:
If X; Y are independent, then Cov(X; Y ) = 0.
Theorem 4.5.5:
Two rv's X; Y are independent i for all pairs of Borel{measurable functions g1 and g2 it
holds that E (g1 (X ) g2 (Y )) = E (g1 (X )) E (g2 (Y )) if all expectations exist.
Proof:
\=)": It follows from Theorem 4.5.3 and the independence of X and Y that
E (g1 (X )g2 (Y )) = E (g1 (X )) E (g2 (Y )):
\(=": From Theorem 4.2.6, we know that X and Y are independent i
P (X 2 A1 ; Y 2 A2) = P (X 2 A1) P (Y 2 A2 ) 8 Borel sets A1 and A2 :
93
How do we relate Theorem 4.2.6 to g1 and g2 ? Let us dene two Borel{measurable functions
g1 and g2 as:
(
g1 (x) = IA1 (x) = 1; x 2 A1
0; otherwise
(
g2 (y) = IA2 (y) = 1; y 2 A2
0; otherwise
Then,
E (g1 (X )) = 0 P (X 2 Ac1 ) + 1 P (X 2 A1 ) = P (X 2 A1 );
E (g2 (Y )) = 0 P (Y 2 Ac2 ) + 1 P (Y 2 A2 ) = P (Y 2 A2 )
and
E (g1 (X ) g2 (Y ))
=
P (X 2 A1 ; Y 2 A2 ):
=) P (X 2 A1 ; Y 2 A2 )
=
E (g1 (X ) g2 (Y ))
given
=
E (g1 (X )) E (g2 (Y ))
=
P (X 2 A1 )P (Y 2 A2 )
=) X; Y independent by Theorem 4.2.6.
Denition 4.5.6:
th
th
The (ith
1 ; i2 ; : : : ; in ) multi{way moment of X = (X1 ; : : : ; Xn ) is dened as
mi1 i2 :::i = E (X1i1 X2i2 : : : Xni )
n
n
if it exists.
th
th
The (ith
1 ; i2 ; : : : ; in ) multi{way central moment of X = (X1 ; : : : ; Xn ) is dened as
n
Y
i1 i2 :::i = E ( (Xj ; E (Xj ))i )
n
j
j =1
if it exists.
Note:
If we set ir = is = 1 and ij = 0 8j 6= r; s in Denition 4.5.6 for the multi{way central moment,
we get
0 : : : 0 1 0 : : : 0 1 0 : : : 0 = rs = Cov(Xr ; Xs ):
"
r
"
s
94
Theorem 4.5.7: Cauchy{Schwarz Inequality
Let X; Y be 2 rv's with nite variance. Then it holds:
(i) Cov(X; Y ) exists.
(ii) (E (XY ))2 E (X 2 )E (Y 2 ).
(iii) (E (XY ))2 = E (X 2 )E (Y 2 ) i there exists an (; ) 2 IR2 ; f(0; 0)g such that
P (X + Y = 0) = 1.
Proof:
Assumptions: V ar(X ); V ar(Y ) < 1. Then also E (X 2 ); E (X ); E (Y 2 ); E (Y ) < 1.
Result used in proof:
0 (a ; b)2 = a2 ; 2ab + b2 =) ab a2 +2 b2
0 (a + b)2 = a2 + 2ab + b2 =) ;ab a2 +2 b2
=)j ab j a2 +2 b2 8 a; b 2 IR ()
(i)
E (j XY j) =
Z
IR2
j xy j fX;Y (x; y)dx dy
Z x2 + y2
2 fX;Y (x; y)dx dy
IR2
Z y2
Z x2
fX;Y (x; y)dx dy + 2 2 fX;Y (x; y)dy dx
=
IR
IR2 2
Z x2
Z y2
()
2 fX (x)dx + IR 2 fY (y)dy
2
2
= E (X ) +2 E (Y )
=
IR
< 1
=) E (XY ) exists
=) Cov(X; Y ) = E (XY ) ; E (X )E (Y ) exists
(ii) 0 E ((X + Y )2 ) = 2 E (X 2 ) + 2E (XY ) + 2 E (Y 2 ) 8 ; 2 IR (A)
If E (X 2 ) = 0, then X has a degenerate 1{point Dirac distribution and the inequality
trivially is true. Therefore, we can assume that E (X 2 ) > 0. As (A) is true for all
; 2 IR, we can choose = ;EE((XXY2) ) ; = 1.
95
2
2
))
(E (XY ))
2
=) (EE(XY
(X 2 ) ; 2 E (X 2 ) + E (Y ) 0
=) ;(E (XY ))2 + E (Y 2 )E (X 2 ) 0
=) (E (XY ))2 E (X 2 ) E (Y 2 )
(iii) When are the left and right sides of the inequality in (ii) equal?
Assume that E (X 2 ) > 0. (E (XY ))2 = E (X 2 )E (Y 2 ) holds i E ((X + Y )2 ) = 0 based
on (ii). It is therefore sucient to show that E ((X +Y )2 ) = 0 i P (X +Y = 0) = 1:
\=)":
Let Z = X + Y . Since E ((X + Y )2 ) = E (Z 2 ) = V ar(Z ) + (E (Z ))2 = 0 and
V ar(Z ) 0 and (E (Z ))2 0, it follows that E (Z ) = 0 and V ar(Z ) = 0.
This means that Z has a degenerate 1{point Dirac(0) distribution with P (Z = 0) =
P (X + Y = 0) = 1.
\(=":
If P (X + Y = 0) = P (Y = ; X ) = 1 for some (; ) 2 IR2 ; f(0; 0)g, i.e., Y is
linearly dependent on X with probability 1, this implies:
2 = ( )2 (E (X 2 ))2 = E (X 2 )( )2 E (X 2 ) = E (X 2 )E (Y 2 )
(E (XY ))2 = (E (X ;X
))
96
Lecture 31:
We 11/08/00
4.6 Multivariate Generating Functions
Denition 4.6.1:
Let X = (X1 ; : : : ; Xn ) be an n{rv. We dene the multivariate moment generating function (mmgf) of X as
MX
(t) = E (et0 X ) = E
exp
n
X
i=1
tiXi
!!
v
u
n
uX
if this expectation exists for j t j= t t2i < h for some h > 0.
i=1
Denition 4.6.2:
Let X = (X1 ; : : : ; Xn ) be an n{rv. We dene the n{dimensional characteristic function
X : IRn ! CI of X as
0 0 n
11
X
X (t) = E (e ) = E @exp @i tj Xj AA :
it0 X
j =1
Note:
(i) X (t) exists for any real{valued n{rv.
(ii) If MX (t) exists, then X (t) = MX (it).
Theorem 4.6.3:
(i) If MX (t) exists, it is unique and uniquely determines the joint distribution of X . X (t)
is also unique and uniquely determines the joint distribution of X .
(ii) MX (t) (if it exists) and X (t) uniquely determine all marginal distributions of X , i.e.,
MXi (ti ) = MX (0; ti ; 0) and and Xi (ti ) = X (0; ti ; 0).
(iii) Joint moments of all orders (if they exist) can be obtained as
i1 +i2 +:::+i
(
t
)
= @i1 i2
M
= E (X1i1 X2i2 : : : Xni )
X
@t1 @t2 : : : @tin
t=0
n
mi1 :::i
n
if the mmgf exists and
n
n
i1 +i2 +:::+i
mi1 :::i = ii1 +i21+:::+i @i1 i2
X (0) = E (X1i1 X2i2 : : : Xni ):
i
@t1 @t2 : : : @tn
n
n
n
n
n
97
(iv) X1 ; : : : ; Xn are independent rv's i
MX (t1 ; : : : ; tn) = MX (t1 ; 0) MX (0; t2 ; 0) : : : MX (0; tn) 8t1 ; : : : ; tn 2 IR;
given that MX (t) exists.
Similarly, X1 ; : : : ; Xn are independent rv's i
X (t1 ; : : : ; tn ) = X (t1 ; 0) X (0; t2 ; 0) : : : X (0; tn ) 8t1; : : : ; tn 2 IR:
Proof:
Rohatgi, page 162: Theorem 7, Corollary, Theorem 8, and Theorem 9 (for mmgf and the case
n = 2).
Theorem 4.6.4:
Let X1 ; : : : ; Xn be independent rv's.
(i) If mgf's MX1 (t); : : : ; MXn (t) exist, then the mgf of Y =
MY (t) =
n
Y
n
X
i=1
MX (ait)
i
i=1
on the common interval where all individual mgf's exist.
(ii) The characteristic function of Y =
n
X
j =1
aj Xj is
Y (t) =
n
Y
Xj (aj t)
j =1
(iii) If mgf's MX1 (t); : : : ; MXn (t) exist, then the mmgf of X is
MX (t) =
n
Y
i=1
MX (ti)
i
on the common interval where all individual mgf's exist.
(iv) The n{dimensional characteristic function of X is
X (t) =
98
n
Y
j =1
Xj (tj ):
aiXi is
Proof:
Homework (parts (ii) and (iv) only)
Theorem 4.6.5:
Let X1 ; : : : ; Xn be independent discrete rv's on the non{negative integers with pgf's GX1 (s); : : : ; GXn (s).
n
X
The pgf of Y = Xi is
i=1
GY (s) =
n
Y
i=1
GX (s):
i
Proof:
Version 1:
GX (s)
i
=
E (sX )
GY (s)
=
E (sY )
=
E (s
i
P
indep:
=
n
Y
i=1
n
Y
=
i=1
=1 Xi )
n
i
E (sX )
i
GX (s)
i
Version 2: (case n = 2 only)
GY (s)
=
=
indep:
=
=
=
P (Y = 0) + P (Y = 1)s + P (Y = 2)s2 + : : :
P (X1 = 0; X2 = 0) +
(P (X1 = 1; X2 = 0) + P (X1 = 0; X2 = 1)) s +
(P (X1 = 2; X2 = 0) + P (X1 = 1; X2 = 1) + P (X1 = 0; X2 = 2)) s2 + : : :
P (X1 = 0)P (X2 = 0) +
(P (X1 = 1)P (X2 = 0) + P (X1 = 0)P (X2 = 1)) s +
2
(P (X1 = 2)P (X2 = 0) + P (X1 = 1)2P (X2 = 1) + P (X1 = 0)P (X2 = 2)) s + : : :
P (X1 = 0) + P (X1 = 1)s + P (X1 = 2)s + : : : P (X2 = 0) + P (X2 = 1)s + P (X2 = 2)s2 + : : :
GX1 (s) GX2 (s)
A generalized proof for n 3 needs to be done by induction on n.
99
Theorem 4.6.6:
Let X1 ; : : : ; XN be iid discrete rv's on the non{negative integers with common pgf GX (s).
Let N be a discrete rv on the non{negative integers with pgf GN (s). Let N be independent
of the Xi 's. Dene SN =
N
X
i=1
Xi . The pgf of SN is
GS (s) = GN (GX (s)):
N
Proof:
P (SN = k)
=
=) GSN (s)
=
=
=
=
Th:=4:6:5
iid
=
=
1
X
n=0
P (SN = kjN = n) P (N = n)
1 X
1
X
P (SN = kjN = n) P (N = n) sk
k=0 n=0
1
X
n=0
1
X
n=0
1
X
n=0
1
X
n=0
1
X
1
X
P (N = n) P (SN = kjN = n) sk
k=0
1
X
P (N = n) P (Sn = k) sk
k=0
1 X
n
X
P (N = n) P ( Xi = k) sk
k=0
n
Y
i=1
P (N = n) GX (s)
i=1
i
P (N = n) (GX (s))n
n=0
GN (GX (s))
Example 4.6.7:
Starting with a single cell at time 0, after one time unit there is probability p that the cell will
have split (2 cells), probability q that it will survive without splitting (1 cell), and probability
r that it will have died (0 cells). It holds that p; q; r 0 and p + q + r = 1. Any surviving
cells have the same probabilities of splitting or dying. What is the pgf for the # of cells at
time 2?
GX (s) = GN (s) = ps2 + qs + r
GS (s) = p(ps2 + ps + r)2 + q(ps2 + ps + r) + r
N
100
Theorem 4.6.8:
Let X1 ; : : : ; XN be iid rv's with common mgf MX (t). Let N be a discrete rv on the non{
N
X
negative integers with mgf MN (t). Let N be independent of the Xi 's. Dene SN = Xi .
i=1
The mgf of SN is
MSN (t) = MN (ln MX (t)):
Proof:
Consider the case that the Xi 's are non{negative integers:
We know that
GX (s) = E (sX ) = E (eln s ) = E (e(ln s)X ) = MX (ln s)
=) MX (s) = GX (es )
=) MS (t) = GS (et ) Th:=4:6:6 GN (GX (et )) = GN (MX (t)) = MN (ln MX (t))
X
N
N
In the general case, i.e., if the Xi0 s are not non{negative integers, we need results from Section
4.7 (conditional expectation) to proof this Theorem.
101
Lecture 32:
Fr 11/10/00
4.7 Conditional Expectation
In Section 4.1, we established that the conditional pmf of X given Y = yj (for PY (yj ) > 0)
(x;y)
is a pmf. For continuous rv's X and Y , when fY (y) > 0, fX jY (x j y) = fX;Y
fY (y) , and fX;Y
and fY are continuous, then fX jY (x j y) is a pdf and it is the conditional pdf of X given Y = y.
Denition 4.7.1:
Let X; Y be rv's on (
; L; P ). Let h be a Borel{measurable function. Assume that E (h(X ))
exists. Then the conditional expectation of h(X ) given Y , i.e., E (h(X ) j Y ), is a rv that
takes the value E (h(X ) j y). It is dened as
8X
>
h(x)P (X = x j Y = y); if (X; Y ) is discrete and P (Y = y) > 0
>
< x2X
E (h(X ) j y) = > Z
1
>
:
h(x)fX jY (x j y)dx;
if (X; Y ) is continuous and fY (y) > 0
;1
Note:
(i) The rv E (h(X ) j Y ) = g(Y ) is a function of Y as a rv.
(ii) The usual properties of expectations apply to the conditional expectation:
(a) E (c j Y ) = c 8c 2 IR.
(b) E (aX + b j Y ) = aE (X j Y ) + b 8a; b 2 IR.
(c) If g1 ; g2 are Borel{measurable functions and if E (g1 (X )); E (g2 (X )) exist, then
E (a1 g1 (X ) + a2g2 (X ) j Y ) = a1 E (g1 (X ) j Y ) + a2 E (g2 (X ) j Y ) 8a1; a2 2 IR.
(d) If X 0 then E (X j Y ) 0.
(e) If X1 X2 then E (X1 j Y ) E (X2 j Y ).
(iii) Moments are dened in the usual way. If E (j X jr ) < 1, then E (X r j Y ) exists and is
the rth conditional moment of X given Y .
102
Example 4.7.2:
Recall Example 4.1.12:
(
2; 0 < x < y < 1
0; otherwise
The conditional pdf's fY jX (y j x) and fX jY (x j y) have been calculated as:
fX;Y (x; y) =
and
fY jX (y j x) = 1 ;1 x for x < y < 1 (where 0 < x < 1)
fX jY (x j y) = y1 for 0 < x < y (where 0 < y < 1).
So,
and
E (X j y) =
Zyx
0
y
dx
=
y
2
Z1 1
2 1 1 1 ; x2 1 + x
1
y
E (Y j x) = 1 ; x y dy = 1 ; x 2 = 2 1 ; x = 2 :
x
x
Therefore, we get the rv's E (X j Y ) = Y2 and E (Y j X ) = 1+2X .
Theorem 4.7.3:
If E (h(X )) exists, then
EY (EX jY (h(X ) j Y )) = E (h(X )):
Proof:
Continuous case only:
EY (E (h(X ) j Y )) =
Z1
E (h(x) j y)fY (y)dy
;1
Z1Z1
=
h(x)fX jY (x j y)fY (y)dxdy
;1 ;1
Z1
Z1
=
h(x)
fX;Y (x; y)dydx
;1
;1
Z1
=
h(x)fX (x)dx
;1
= E (h(X ))
103
Theorem 4.7.4:
If E (X 2 ) exists, then
V arY (E (X j Y )) + EY (V ar(X j Y )) = V ar(X ):
Proof:
V arY (E (X j Y )) + EY (V ar(X j Y )) = EY ((E (X j Y ))2 ) ; (EY (E (X j Y )))2
+ EY (E (X 2 j Y ) ; (E (X j Y ))2 )
= EY ((E (X j Y ))2 ) ; (E (X ))2 + E (X 2 ) ; EY ((E (X j Y ))2 )
= E (X 2 ) ; (E (X ))2
= V ar(X )
Note:
If E (X 2 ) exists, then V ar(X ) V arY (E (X j Y )). V ar(X ) = V arY (E (X j Y )) i X = g(Y ).
The inequality directly follows from Theorem 4.7.4.
For equality, it is necessary that EY (V ar(X j Y )) = EY ((X ; E (X j Y ))2 j Y ) = 0 which
holds if X = E (X j Y ) = g(Y ).
If X; Y are independent, FX jY (x j y) = FX (x) 8x.
Thus, if E (h(X )) exists, then E (h(X ) j Y ) = E (h(X )).
Proof: (of Theorem 4.6.8)
MS (t)
N
Def:
=
=
E (etS )
N
1
0
CC
BB
N
X
C
B
EB
B@exp(t i=1 Xi )CCA
| {z }
\h(X )00
Th:=4:7:3
0
1
!CC
BB
N
X
EN B
exp(t Xi ) j N C
N
B@EX
CA
i=1
Xi j N
i=1
104
First consider
EX
N
i=1
Xi j N
exp(t
N
X
i=1
Xi ) j n
!
=
EX
n
i=1
n
Th: 4:6:4(i) Y
=
Xi iid
=
=
=) MSN (t)
105
i=1
exp(t
Xi
i=1
MX (t)
i
MX (t)
(MX (t))n
N
Y
!
EN
=
EN ((MX (t))N )
=
EN (exp(N ln MX (t)))
()
MN (ln MX (t))
i=1
!
Xi )
=
=
() holds since MN (k) = EN (exp(N k)).
i=1
n
Y
n
X
MX (t)
Lecture 33:
Mo 11/13/00
4.8 Inequalities and Identities
Lemma 4.8.1:
Let a; b be positive numbers and p; q > 1 such that p1 + 1q = 1 (i.e., pq = p + q and q = p;p 1 ).
Then it holds that
1 ap + 1 bq ab
p
q
with equality i ap = bq .
Proof:
Fix b. Let
g(a) = 1p ap + 1q bq ; ab
=) g0 (a) = ap;1 ; b =! 0
=) b = ap;1
=) bq = a(p;1)q = ap
g00 (a) = (p ; 1)ap;2 > 0
Since g00 (a) > 0, this is really a minimum. The minimum value is obtained for b = ap;1 and
it is
1 ap + 1 (ap;1 )q ; aap;1 = 1 ap + 1 ap ; ap = ap( 1 + 1 ; 1) = 0:
p
q
p
q
p q
Since g00 (a) > 0, the minimum is unique and g(a) 0. Therefore,
g(a) + ab = 1 ap + 1 bq ab:
p
q
Theorem 4.8.2: Holder's Inequality
Let X; Y be 2 rv's. Let p; q > 1 such that 1p + 1q = 1 (i.e., pq = p + q and q = p;p 1 ). Then it
holds that
1
1
E (j XY j) (E (j X jp )) p (E (j Y jq )) q :
Proof:
In Lemma 4.8.1, let
a=
jXj
jY j :
1 and b =
1
(E (j X jp ))
(E (j Y jq ))
p
q
106
1 j X jp + 1 j Y jq j XY j
p E (j X jp) q E (j Y jq ) (E (j X jp )) p1 (E (j Y jq )) 1q
Taking expectations on both sides of this inequality, we get
E (j XY j)
1 = p1 + 1q 1
1
(E (j X jp )) p (E (j Y jq )) q
1
1
The result follows immediately when multiplying both sides with (E (j X jp )) p (E (j Y jq )) q .
Lemma
=)4:8:1
Note:
Note that Theorem 4.5.7 (ii) (Cauchy{Schwarz Inequality) is a special case of Theorem 4.8.2
with p = q = 2.
Theorem 4.8.3: Minkowski's Inequality
Let X; Y be 2 rv's. Then it holds for 1 p < 1 that
1
1
1
(E (j X + Y jp )) p (E (j X jp )) p + (E (j Y jp )) p :
Proof:
E (j X + Y jp)
=
E (j X + Y j j X + Y jp;1 )
E ((j X j + j Y j) j X + Y jp;1 )
=
E (j X j j X + Y jp;1) + E (j Y j j X + Y jp;1 )
Th:4:8:2
()
=
1
1
( E (j X jp ) ) p ( E ( (j X + Y jp;1 )q ) ) q
1
1
+ ( E (j Y jp ) ) p ( E ( (j X + Y jp;1 )q ) ) q (A)
1
1
1
( E (j X jp ) ) p + ( E (j Y jp ) ) p ( E (j X + Y jp ) ) q
1
Divide the left and right side of this inequality by (E (j X + Y jp )) q .
1
1
The result on the left side1 is (E (j X + Y 1jp))1; q = (E (j X + Y jp )) p , and the result on the
right side is ( E (j X jp ) ) p + ( E (j Y jp ) ) p . Therefore, Theorem 4.8.3 holds.
In (A), we dene q = p;p 1 . Therefore, p1 + 1q = 1 and (p ; 1)q = p.
() holds since p1 + 1q = 1, a condition to meet Holder's Inequality.
107
Denition 4.8.4:
A function g(x) is convex if
g(x + (1 ; )y) g(x) + (1 ; )g(y) 8x; y 2 IR 80 < < 1:
Note:
(i) Geometrically, a convex function falls above all of its tangent lines. Also, a connecting
line between any pairs of points (x; g(x)) and (y; g(y)) in the 2{dimensional plane always
falls above the curve.
(ii) A function g(x) is concave i ;g(x) is convex.
Theorem 4.8.5: Jensen's Inequality
Let X be a rv. If g(x) is a convex function, then
E (g(X )) g(E (X ))
given that both expectations exist.
Proof:
Construct a tangent line l(x) to g(x) at the (constant) point x0 = E (X ):
l(x) = ax + b for some a; b 2 IR
The function g(x) is a convex function and falls above the tangent line l(x)
=) g(x) ax + b 8x 2 IR
=) E (g(X )) E (aX + b) = aE (X ) + b = l(E (X )) tangent= point g(E (X ))
Therefore, Theorem 4.8.5 holds.
Note:
Typical convex functions g are:
(i) g1 (x) =j x j ) E (j X j) j E (X ) j.
(ii) g2 (x) = x2 ) E (X 2 ) (E (X ))2 ) V ar(X ) 0.
(iii) g3 (x) = x1p for x > 0; p > 0 ) E ( X1p ) (E (X1 ))p ; for p = 1: E ( X1 ) E (1X )
(iv) Other convex functions are xp for x > 0; p 1; x for > 1; ; ln(x) for x > 0; etc.
108
(v) Recall that if g is convex and dierentiable, then g00 (x) 0 8x.
(vi) If the function g is concave, the direction of the inequality in Jensen's Inequality is
reversed, i.e., E (g(X )) g(E (X )).
Example 4.8.6:
Given the real numbers a1 ; a2 ; : : : ; an > 0, we dene
n
X
arithmetic mean : aA = n1 (a1 + a2 + : : : + an) = n1 ai
1
geometric mean : aG = (a1 a2 : : : an) =
n
n
Y
i=1
i=1
ai
!1
n
1
= X
harmonic mean : aH = 1 1 1 1
n 1
1
1
n a1 + a2 + : : : + an
n
i=1 ai
Let X be a rv that takes values a1 ; a2 ; : : : ; an > 0 with probability n1 each.
(i) aA aG :
ln(aA )
n
X
ln n1 ai
=
!
i=1
ln(E (X ))
ln concave
E (ln(X ))
n 1
X
=
n ln(ai )
=
i=1
n
1X
ln(a )
=
n i=1
n
1 ln( Y
a)
=
i
i=1
n
Y
1
ln(( ai ) n )
i=1
ln(aG )
n
=
=
Taking the anti{log of both sides gives aA aG .
(ii) aA aH :
1
aA
=
1
1
n
109
i
n
X
i=1
ai
Lecture 35/1:
Fr 11/17/00
1
E (X )
1=X convex
E( 1 )
=
X
=
=
=
Inverting both sides gives aA aH .
n 11
X
i=1n ai
1 + 1 + ::: 1 n a1 a2
an
1
1
aH
(iii) aG aH :
; ln(aH )
ln( a1 )
1H 1 1
1
=
ln n ( a + a + : : : + a )
1
2
n
1
=
ln(E ( X ))
ln concave
E (ln( 1 ))
=
=
=
=
=
=
=
n 1
X
X
1
ln( )
i=1 n ai
n
1X
ln( 1 )
n i=1
n
1X
ai
n i=1 ; ln ai
n
Y
; n1 ln( ai)
i=1
n
Y
; ln( ai ) 1
i=1
; ln aG
n
Multiplying both sides with ;1 gives ln aH ln aG . Then taking the anti{log of both
sides gives aH aG .
In summary, aH aG aA . Note that it would have been sucient to prove steps (i) and
(iii) only to establish this result. However, step (ii) has been included to provide another
example how to apply Theorem 4.8.5.
110
Theorem 4.8.7: Covariance Inequality
Let X be a rv with nite mean .
(i) If g(x) is non{decreasing, then
E (g(X )(X ; )) 0
if this expectation exists.
(ii) If g(x) is non{decreasing and h(x) is non{increasing, then
E (g(X )h(X )) E (g(X ))E (h(X ))
if all expectations exist.
(iii) If g(x) and h(x) are both non{decreasing or if g(x) and h(x) are both non{increasing,
then
E (g(X )h(X )) E (g(X ))E (h(X ))
if all expectations exist.
Proof:
Homework
111
Lecture 34:
We 11/15/00
5 Particular Distributions
5.1 Multivariate Normal Distributions
Denition 5.1.1:
A rv X has a (univariate) Normal distribution, i.e., X N (; 2 ) with 2 IR and > 0,
i it has the pdf
1
2
fX (x) = p 1 2 e; 22 (x;) :
2
X has a standard Normal distribution i = 0 and 2 = 1, i.e., X N (0; 1).
Note:
If X N (; 2 ), then E (X ) = and V ar(X ) = 2 .
If X1 N (1 ; 12 ); X2 N (2 ; 22 ), X1 and X2 independent, and c1 ; c2 2 IR, then
Y = c1 X1 + c2 X2 N (c1 1 + c2 2 ; c21 12 + c22 22 ):
Denition 5.1.2:
A 2{rv (X; Y ) has a bivariate Normal distribution i there exist constants a11 ; a12 ; a21 ; a22 ; 1 ; 2 2
IR and iid N (0; 1) rv's Z1 and Z2 such that
X = 1 + a11 Z1 + a12 Z2 ; Y = 2 + a21 Z1 + a22Z2 :
If we dene
!
!
!
!
A = a11 a12 ; = 1 ; X = X ; Z = Z1 ;
a21 a22
2
Y
Z2
then we can write
X = AZ + :
Note:
(i) Recall that for X N (; 2 ), X can be dened as X = Z + , where Z N (0; 1).
(ii) E (X ) = 1 + a11 E (Z1 ) + a12 E (Z2 ) = 1 and E (Y ) = 2 + a21 E (Z1 ) + a22 E (Z2 ) = 2 .
The marginal distributions are X N (1 ; a211 + a212 ) and Y N (2 ; a221 + a222 ). Thus,
X and Y have (univariate) Normal marginal densities or degenerate marginal densities
(which correspond to Dirac distributions) if ai1 = ai2 = 0.
112
(iii) There exists another (equivalent) formulation of the previous dention using the joint
pdf (see Rohatgi, page 227).
Example:
Let X N (1 ; 12 ); Y N (2 ; 22 ), X and Y independent.
What is the distribution of
X
Y
!
?
Since X N (1 ; 12 ), it follows that X = 1 + 1 Z1 , where Z1 N (0; 1).
Since Y N (2 ; 22 ), it follows that Y = 2 + 2 Z2 , where Z2 N (0; 1).
Therefore,
!
!
!
!
X = 1 0
Z1 + 1 :
Y
0 2
Z2
2
Since X; Y are independent, it follows by Theorem 4.2.7 that Z1 = X;11 and Z2 = X!;22 are
independent. Thus, Z1 ; Z2 are iid N (0; 1). It follows from Denition 5.1.2 that
bivariate Normal distribution.
X
Y
has a
Theorem 5.1.3:
Dene g : IR2 ! IR2 as g(x) = Cx + d. If X is a bivariate Normal rv, then g(X ) also is a
bivariate Normal rv.
Proof:
g(X ) = CX + d
= C (AZ + ) + d
= (|CA
{z }) Z + (C + d)
another matrix
| {z }
another vector
~ + ~ which represents another bivariate Normal distribution
= AZ
Note:
1 2 = Cov(X; Y ) = Cov(a11 Z1 + a12 Z2 ; a21 Z1 + a22 Z2)
= a11 a21 Cov(Z1 ; Z1 ) + (a11 a22 + a12 a21 )Cov(Z1 ; Z2 ) + a12 a22 Cov(Z2 ; Z2 )
= a11 a21 + a12 a22
since Z1 ; Z2 are iid N (0; 1) rv's.
113
Denition 5.1.4:
The variance{covariance matrix of (X; Y ) is
= AA0 = a11 a12
a21 a22
Note:
and
!
a11 a21
a12 a22
!
a211 + a212
a11 a21 + a12 a22
=
a11 a21 + a12 a22
a221 + a222
!
!
12 1 2 :
=
1 2 22
det() = j j = 1222 ; 2 12 22 = 12 22 (1 ; 2 );
q
q
jj = 12 1 ; 2
;1 = j1 j 22 ;1 2
;12 12
! 0
=@
1
12 (1;2 )
;
1 2 (1;2 )
;
1 2 (1;2 )
1
22 (1;2 )
1
A
Theorem 5.1.5:
Assume that 1 > 0; 2 > 0 and j j< 1. Then the joint pdf of X = (X; Y ) = AZ + (as
dened in Denition 5.1.2) is
1
1
0
;
1
fX (x) = p exp ; 2 (x ; ) (x ; )
2 j j
x ; 1 2
x ; 1 y ; 2 y ; 2 2!!
1
1
p
=
exp ; 2(1 ; 2 )
; 2 1
2 + 2
21 2 1 ; 2
1
Proof:
Since is positive denite and symmetric, A is invertible.
The mapping Z ! X is 1{to{1:
X = AZ + =) Z = A;1 (X ; )
J = jA;1 j = jA1 j
q
q
q
q
q
q
jAj = jAj2 = jAj jAT j = jAAT j = jj = 1222 ; 212 22 = 12 1 ; 2
We can use this result to get to the second line of the theorem:
2
2
fZ (z ) = p1 e; 21 p1 e; 22
2
2
1
= 21 e; 2 (z z)
z
T
114
z
As already stated, the mapping from Z to X is 1{to{1, so we can apply Theorem 4.3.5:
p1 exp(; 21 (x ; )T (|A;1{z)T A;1}(x ; ))
fX (x) =
2 jj
;1
1 exp(; 1 (x ; )T ;1 (x ; ))
()
p
=
2
2 jj
This proves the 1st line of the Theorem. Step () holds since
(A;1 )T A;1 = (AT );1 A;1 = (AAT );1 = ;1 :
The second line of the Theorem is based on the transformations stated in the Note following
Denition 5.1.4:
x ; 1 2
x ; 1 y ; 2 y ; 2 2!!
1
1
p
exp ; 2(1 ; 2 )
fX (x) =
; 2 1
2 + 2
21 2 1 ; 2
1
Note:
In the situation of Theorem 5.1.5, we say that (X; Y ) N (1 ; 2 ; 12 ; 22 ; ).
Theorem 5.1.6:
If (X; Y ) has a non{degenerate N (1 ; 2 ; 12 ; 22 ; ) distribution, then the conditional distribution of X given Y = y is
N (1 + 1 (y ; 2); 12 (1 ; 2 )):
2
Proof:
Homework
Example 5.1.7:
Let rv's (X1 ; Y1 ) be N (0; 0; 1; 1; 0) with pdf f1 (x; y) and (X2 ; Y2 ) be N (0; 0; 1; 1; ) with pdf
f2(x; y). Let (X; Y ) be the rv that corresponds to the pdf
fX;Y (x; y) = 21 f1 (x; y) + 12 f2(x; y):
(X; Y ) is a bivariate Normal rv i = 0. However, the marginal distributions of X and Y are
always N (0; 1) distributions. See also Rohatgi, page 229, Remark 2.
115
Lecture 35/2:
Theorem 5.1.8:
Fr 11/17/00
The mgf MX (t) of a bivariate Normal rv X = (X; Y ) is
1
1
0
0
2
2
2
2
MX (t) = MX;Y (t1 ; t2 ) = exp( t+ 2 t t) = exp 1t1 + 2 t2 + 2 (1 t1 + 2 t2 + 21 2 t1 t2 ) :
Proof:
The mgf of a univariate Normal rv X N (; 2 ) will be used to develop the mgf of a bivariate
Normal rv X = (X; Y ):
MX (t) = E (exp(tX ))
=
=
=
=
=
=
=
=
1
1
2
exp(tx) p 2 exp ; 22 (x ; ) dx
;1
2
Z1 1
p 2 exp ; 21 2 [;22 tx + (x ; )2] dx
;1 2
1
Z1 1
2
2
2
p
exp ; 22 [;2 tx + (x ; 2x + )] dx
;1 22
Z1 1
p 2 exp ; 21 2 [x2 ; 2( + 2 t)x + ( + t2)2 ; ( + t2)2 + 2] dx
;1 2
Z 1 1
p 2 exp ; 21 2 [x ; ( + t2)]2 dx
exp ; 21 2 [;( + t2 )2 + 2 ]
{z
}
| ;1 2
pdf of N ( + t2 ; 2 ), that integrates to 1
exp ; 21 2 [;2 ; 2t2 ; t2 4 + 2 ]
2 ; t2 4 !
;
2
t
exp
;22
exp t + 12 2 t2
Z1
Bivariate Normal mgf:
MX;Y (t1 ; t2 ) = E (exp(t1 X + t2Y ))
=
Z1Z1
exp(t1 x + t2 y) fX;Y (x; y) dx dy
;1 ;1
Z1Z1
=
exp(t1 x) exp(t2 y) fX (x) fY jX (y j x) dy dx
;1 ;1
Z 1 Z 1
=
exp(t2 y) fY jX (y j x) dy exp(t1 x) fX (x) dx
;1 ;1
116
(A)
=
(B )
=
=
=
=
(C )
=
=
=
! !
Z1 Z1
;1
Z1
;1
2
pexp(t22yp) exp ;2(2y(1;;X2)) dy
;1 2 1 ; 2
2
exp(t1 x) fX (x) dx
j X = 2 + 2 (x ; 1)
exp X t2 + 21 22 (1 ; 2 )t22 exp(t1 x) fX (x) dx
1
exp [2 + 2 (x ; 1 )]t2 + 12 22 (1 ; 2 )t22 + t1 x fX (x) dx
;1
1
Z1
1
2
2
2
2
2
exp 2 t2 + t2 x ; 1 t2 + 2 2 (1 ; )t2 + t1 x fX (x) dx
;1
1
1
1
Z 1 exp 2 22 (1 ; 2 )t22 + t2 2 ; 2 1 t2
exp (t1 + 2 t2 )x fX (x) dx
;1
1
1
1
1
2
2
2
2
2
2
2
2
exp 2 2 (1 ; )t2 + t2 2 ; 1 t2 exp 1 (t1 + t2 ) + 2 1 (t1 + t2 )
1
1
1
1
exp 2 22 t22 ; 12 2 22 t22 + 2 t2 ; 1 2 t2 + 1 t1 + 1 2 t2 + 21 12 t21 + 1 2 t1 t2 + 12 2 22 t22
1
1
!
22
22
exp 1 t1 + 2 t2 + 1 t1 + 2 t2 2+ 21 2 t1 t2
Z1
(A) follows from Theorem 5.1.6 since Y j X N (X ; 22 (1 ; 2 )). (B ) follows when we apply
our calculations of the mgf of a N (; 2 ) distribution to a N (X ; 22 (1 ; 2 )) distribution. (C )
holds since the integral represents MX (t1 + 21 t2 ).
Corollary 5.1.9:
Let (X; Y ) be a bivariate Normal rv. X and Y are independent i = 0.
Denition 5.1.10:
Let Z be a k{rv of k iid N (0; 1) rv's. Let A 2 IRkk be a k k matrix, and let 2 IRk be
a k{dimensional vector. Then X = AZ + has a multivariate Normal distribution with
mean vector and variance{covariance matrix = AA0 .
Note:
(i) If is non{singular, X has the joint pdf
1
1
0
;
1
fX (x) = (2)k=2 (j j)1=2 exp ; 2 (x ; ) (x ; ) :
If is singular, the joint pdf does exist but it cannot be written explicitly.
117
Lecture 36:
Mo 11/20/00
(ii) If is singular, then X ; takes values in a linear subspace of IRk with probability 1.
(iii) If is non{singular, then X has mgf
MX (t) = exp(0 t + 12 t0t):
(iv) X has characteristic function
X (t) = exp(i0 t ; 21 t0 t)
(no matter whether is singular or non{singular).
(v) See Searle, S. R. (1971): \Linear Models", Chapter 2.7, for more details on singular
Normal distributions.
Theorem 5.1.11:
The components X1 ; : : : ; Xk of a normally distributed k{rv X are independent i Cov(Xi ; Xj ) =
0 8i; j = 1; : : : ; k; i 6= j .
Theorem 5.1.12:
Let X = (X1 ; : : : ; Xk )0 . X has a k{dimensional Normal distribution i every linear function
of X , i.e., X 0 t = t1 X1 + t2 X2 + : : : + tk Xk , has a univariate Normal distribution.
Proof:
The Note following Denition 5.1.1 states that any linear function of two Normal rv's has a
univariate Normal distribution. By induction on k, we can show that every linear function of
X , i.e., X 0 t, has a univariate Normal distribution.
Conversely, if X 0 t has a univariate Normal distribution, we know from Theorem 5.1.8 that
1
0
0
2
MX 0 t (s) = exp E (X t) s + 2 V ar(X t) s
= exp 0 ts + 12 t0 ts2
1
0
0
=) MX 0 t (1) = exp t + 2 t t
= MX (t)
By uniqueness of the mgf and Note (iii) that follows Denition 5.1.10, X has a multivariate
Normal distribution.
118
5.2 Exponential Family of Distributions
Denition 5.2.1:
Let # be an interval on the real line. Let ff (; ) : 2 #g be a family of pdf's (or pmf's). We
assume that the set fx : f (x; ) > 0g is independent of , where x = (x1 ; : : : ; xn ).
We say that the family ff (; ) : 2 #g is a one{parameter exponential family if there
exist real{valued functions Q() and D() on # and Borel{measurable functions T (X ) and
S (X ) on IRn such that
f (x; ) = exp(Q()T (x) + D() + S (x)):
Note:
We can also write f (x; ) as
f (x; ) = h(x)c() exp(T (x))
where h(x) = exp(S (x)), = Q(), and c() = exp(D(Q;1 ())), and call this the exponential family in canonical form for a natural parameter .
Denition 5.2.2:
Let # IRk be a k{dimensional interval. Let ff (; ) : 2 #g be a family of pdf's (or pmf's).
We assume that the set fx : f (x; ) > 0g is independent of , where x = (x1 ; : : : ; xn ).
We say that the family ff (; ) : 2 #g is a k{parameter exponential family if there
exist real{valued functions Q1 (); : : : Qk () and D() on # and Borel{measurable functions
T1 (X ); : : : ; Tk (X ) and S (X ) on IRn such that
f (x; ) = exp
k
X
i=1
!
Qi ()Ti (x) + D() + S (x) :
Note:
Similar to the Note following Denition 5.2.1, we can express the k{parameter exponential
family in canonical form for a natural k 1 parameter vector = (1 ; : : : ; k )0 .
119
Example 5.2.3:
Let X N (; 2 ) with both parameters and 2 unknown. We have:
2
f (x; ) = p 1 2 exp ; 21 2 (x ; )2 = exp ; 21 2 x2 + 2 x ; 22 ; 12 ln(22 )
2
= (; 2 )
# = f(; 2 ) : 2 IR; 2 > 0g
Therefore,
Q1 ()
T1 (x)
Q2 ()
T2 (x)
D()
S (x)
=
=
=
=
; 21 2
x2
2
x
2
= ; 22 ; 21 ln(22 )
= 0
Thus, this is a 2{parameter exponential family.
120
!
6 Limit Theorems
Motivation:
I found this slide from my Stat 250, Section 003, \Introductory Statistics" class (an undergraduate class I taught at George Mason University in Spring 1999):
What does this mean at a more theoretical level???
121
Lecture 37:
Mo 11/27/00
6.1 Modes of Convergence
Denition 6.1.1:
Let X1 ; : : : ; Xn be iid rv's with common cdf FX (x). Let T = T (X ) be any statistic, i.e., a
Borel{measurable function of X that does not involve the population parameter(s) #, dened
on the support X of X . The induced probability distribution of T (X ) is called the sampling
distribution of T (X ).
Note:
(i) Commonly used statistics are:
n
X
1
Sample Mean: X n = n Xi
i=1
Sample Variance: Sn2 = n;1 1
n
X
i=1
(Xi ; X n )2
Sample Median, Order Statistics, Min, Max, etc.
(ii) Recall that if X1 ; : : : ; Xn are iid and if E (X ) and V ar(X ) exist, then E (X n ) = =
E (X ), E (Sn2 ) = 2 = V ar(X ), and V ar(X n) = n2 .
(iii) Recall that if X1 ; : : : ; Xn are iid and if X has mgf MX (t) or characteristic function X (t)
then MX n (t) = (MX ( nt ))n or X n (t) = (X ( nt ))n .
Note: Let fXn g1
n=1 be a sequence of rv's on some probability space (
; L; P ). Is there any
meaning behind the expression nlim
!1 Xn = X ? Not immediately under the usual denitions
of limits. We rst need to dene modes of convergence for rv's and probabilities.
Denition 6.1.2:
1
Let fXn g1
n=1 be a sequence of rv's with cdf's fFn gn=1 and let X be a rv with cdf F . If
Fn (x) ! F (x) at all continuity points of F , we say that Xn converges in distribution to
d X ) or X converges in law to X (X ;!
L X ), or F converges weakly to F
X (Xn ;!
n
n
n
w
(Fn ;! F ).
Example 6.1.3:
Let Xn N (0; n1 ). Then
Z x exp ; 21 nt2
q 2 dt
Fn (x) =
;1
122
n
Z pnx exp(; 1 s2)
p 2 ds
=
2
;1
p
= ( nx)
(
8
>
< (1) =11; if x > 0
=) Fn (x) ! > (0) = 2 ;
if x = 0
: (;1) = 0; if x < 0
1; x 0 the only point of discontinuity is at x = 0. Everywhere else,
0; x < 0
p
( nx) = Fn (x) ! FX (x).
d X , where P (X = 0) = 1, or X ;!
d 0 since the limiting rv here is degenerate,
So, Xn ;!
n
i.e., it has a Dirac(0) distribution.
If FX (x) =
Example 6.1.4:
In this example, the sequence fFn g1
n=1 converges pointwise to something that is not a cdf:
Let Xn Dirac(n), i.e., P (Xn = n) = 1. Then,
(
Fn (x) = 0; x < n
1; x n
d X.
It is Fn (x) ! 0 8x which is not a cdf. Thus, there is no rv X such that Xn ;!
Example 6.1.5:
1
1
Let fXn g1
n=1 be a sequence of rv's such that P (Xn = 0) = 1 ; n and P (Xn = n) = n and let
X Dirac(0), i.e., P (X = 0) = 1.
It is
8
>
< 0; 1 x < 0
Fn(x) = > 1 ; n ; 0 x < n
: 1;
xn
FX (x) =
(
0; x < 0
1; x 0
w F but
It holds that Fn ;!
X
E (Xnk ) = nk;1 6! E (X k ) = 0:
Thus, convergence in distribution does not imply convergence of moments/means.
123
Note:
Convergence in distribution does not say that the Xi 's are close to each other or to X . It only
means that their cdf's are (eventually) close to some cdf F . The Xi 's do not even have to be
dened on the same probability space.
Example 6.1.6:
d
Let X and fXn g1
n=1 be iid N (0; 1). Obviously, Xn ;! X but nlim
!1 Xn 6= X .
Theorem 6.1.7:
Let X and fXn g1
rv's with support X and fXn g1
n=1 be discrete
n=1 , respectively. Dene
1
[
the countable set A = X [ Xn = fak : k = 1; 2; 3; : : : g. Let pk = P (X = ak ) and
n=1
d X.
pnk = P (Xn = ak ). Then it holds that pnk ! pk 8k i Xn ;!
Theorem 6.1.8:
1
Let X and fXn g1
n=1 be continuous rv's with pdf's f and ffn gn=1 , respectively. If fn (x) ! f (x)
d X.
for almost all x as n ! 1 then Xn ;!
Theorem 6.1.9:
d
Let X and fXn g1
n=1 be rv's such that Xn ;! X . Let c 2 IR be a constant. Then it holds:
d X + c.
(i) Xn + c ;!
d cX .
(ii) cXn ;!
d aX + b.
(iii) If an ! a and bn ! b, then an Xn + bn ;!
Proof:
Part (iii):
Suppose that a > 0; an > 0. (If a < 0; an < 0, the result follows via (ii) and c = ;1.)
Let Yn = an Xn + bn and Y = aX + b. It is
Likewise,
FY (y) = P (Y < y) = P (aX + b < y) = P (X < y ;a b ) = FX ( y ;a b ):
FY (y) = FX ( y ;a bn ):
n
n
n
If y is a continuity point of FY , y;a b is a continuity point of FX . Since an ! a; bn ! b and
FXn (x) ! FX (x), it follows that FYn (y) ! FY (y) for every continuity point y of FY . Thus,
d aX + b.
an Xn + bn ;!
124
Lecture 38:
Denition 6.1.10:
We 11/29/00
Let fXn g1
n=1 be a sequence of rv's dened on a probability space (
; L; P ). We say that Xn
p
converges in probability to a rv X (Xn ;!
X , P- nlim
!1 Xn = X ) if
nlim
!1 P (j Xn ; X
j> ) = 0 8 > 0:
Note:
The following are equivalent:
nlim
!1 P (j Xn ; X
j> ) = 0
() nlim
!1 P (j Xn ; X j ) = 1
() nlim
!1 P (f! : j Xn (!) ; X (!) j> )) = 0
If X is degenerate, i.e., P (X = c) = 1, we say that Xn is consistent for c. For example, let
Xn such that P (Xn = 0) = 1 ; n1 and P (Xn = 1) = n1 . Then
P (j Xn j> ) =
(
1
n;
0<<1
0; 1
p
Therefore, nlim
!1 P (j Xn j> ) = 0 8 > 0. So Xn ;! 0, i.e., Xn is consistent for 0.
Theorem 6.1.11:
p
p
(i) Xn ;!
X () Xn ; X ;!
0.
p
p
(ii) Xn ;!
X; Xn ;!
Y =) P (X = Y ) = 1.
p
p
p
(iii) Xn ;!
X; Xm ;!
X =) Xn ; Xm ;!
0 as n; m ! 1.
p
p
p
(iv) Xn ;!
X; Yn ;!
Y =) Xn Yn ;!
X Y.
p
p
(v) Xn ;!
X; k 2 IR a constant =) kXn ;!
kX .
p
p r
(vi) Xn ;!
k; k 2 IR a constant =) Xnr ;!
k 8r 2 IN .
p
p
p
(vii) Xn ;!
a; Yn ;!
b; a; b 2 IR =) Xn Yn ;!
ab.
p
p
(viii) Xn ;!
1 =) Xn;1 ;!
1.
p
p
p a
(ix) Xn ;!
a; Yn ;!
b; a 2 IR; b 2 IR ; f0g =) XYnn ;!
b.
125
p
p
(x) Xn ;!
X; Y an arbitrary rv =) XnY ;!
XY .
p
p
p
(xi) Xn ;!
X; Yn ;!
Y =) XnYn ;!
XY .
Proof:
See Rohatgi, page 244{245 for partial proofs.
Theorem 6.1.12:
p
p
Let Xn ;!
X and let g be a continuous function on IR. Then g(Xn ) ;!
g(X ).
Proof:
Preconditions:
1.) X rv =) 8 > 0 9k = k() : P (jX j > k) < 2
2.) g is continuous on IR
=) g is also uniformly continuous on [;k; k] (see Denition of uniformly continuous
in Theorem 3.3.3 (iii))
=) 9 = (; k) : jX j k; jXn ; X j < ) jg(Xn ) ; g(X )j < Let
A = fjX j kg = f! : jX (!)j kg
B = fjXn ; X j < g = f! : jXn(!) ; X (!)j < g
C = fjg(Xn ) ; g(X )j < g = f! : jg(Xn (!)) ; g(X (!))j < g
If ! 2 A \ B
:)
=2)
!2C
=) A \ B C
=) C C (A \ B )C = AC [ B C
=) P (C C ) P (AC [ B C ) P (AC ) + P (B C )
Now:
P (jg(Xn ) ; g(X )j ) P| (jX{zj > k}) +
2 by 1.)
P| (jXn ;{zX j })
p
X
2 for nn0 (;;k) since Xn ;!
for n n0(; ; k)
Corollary 6.1.13:
p
p
Let Xn ;!
c; c 2 IR and let g be a continuous function on IR. Then g(Xn ) ;!
g(c).
126
Theorem 6.1.14:
p
d X.
Xn ;!
X =) Xn ;!
Proof:
p
Xn ;!
X , P (jXn ; X j > ) ! 0 as n ! 1 8 > 0
It holds:
P (X x ; ) = P (X x ; ; jXn ; X j ) + P (X x ; ; jXn ; X j > )
(A)
P (Xn x) + P (jXn ; X j > )
(A) holds since X x ; and Xn within of X , thus Xn x.
Similarly, it holds:
P (Xn x) = P (Xn x; j Xn ; X j ) + P (Xn x; j Xn ; X j> )
P (X x + ) + P (jXn ; X j > )
Combining the 2 inequalities from above gives:
P (X x ; ) ; P| (jXn ;{zX j > }) P| (Xn{z x}) P (X x + ) + P| (jXn ;{zX j > })
!0 as n!1
Therefore,
=Fn (x)
!0 as n!1
P (X x ; ) Fn(x) P (X x + ) as n ! 1:
Since the cdf's Fn () are not necessarily left continuous, we get the following result for # 0:
P (X < x) Fn (x) P (X x) = FX (x)
Let x be a continuity point of F . Then it holds:
F (x) = P (X < x) Fn (x) F (x)
=) Fn (x) ! F (x)
d X
=) Xn ;!
Theorem 6.1.15:
Let c 2 IR be a constant. Then it holds:
p
d c () X ;!
Xn ;!
c:
n
127
Example 6.1.16:
In this example, we will see that
p
d X 6=) X ;!
Xn ;!
X
n
for some rv X . Let Xn be identically distributed rv's and let (Xn ; X ) have the following joint
distribution:
Xn 0 1
X
0
1
0
1
2
1
2
1
2
0
1
2
1
2
1
2
1
d X since all have exactly the same cdf, but for any 2 (0; 1), it is
Obviously, Xn ;!
P (j Xn ; X j> ) = P (j Xn ; X j= 1) = 1 8n;
so nlim
6 p X.
!1 P (j Xn ; X j> ) 6= 0. Therefore, Xn ;!
Theorem 6.1.17:
1
Let fXn g1
n=1 and fYn gn=1 be sequences of rv's and X be a rv dened on a probability space
(
; L; P ). Then it holds:
p
d X; j X ; Y j;!
d X:
Yn ;!
0 =) Xn ;!
n
n
Proof:
Similar to the proof of Theorem 6.1.14. See also Rohatgi, page 253, Theorem 14.
Lecture 41:
Theorem 6.1.18: Slutsky's Theorem
We 12/06/00
1
Let fXn g1
n=1 and fYn gn=1 be sequences of rv's and X be a rv dened on a probability space
(
; L; P ). Let c 2 IR be a constant. Then it holds:
p
d X; Y ;!
d X + c.
(i) Xn ;!
c =) Xn + Yn ;!
n
p
d X; Y ;!
d cX .
(ii) Xn ;!
c =) Xn Yn ;!
n
p
If c = 0, then also Xn Yn ;!
0.
p
d X if c 6= 0.
d X; Y ;!
(iii) Xn ;!
c =) XYnn ;!
n
c
128
Proof:
p Th:6:1:11(i)
p
(i) Yn ;!
c () Yn ; c ;!
0
p
=) Yn ; c = Yn + (Xn ; Xn ) ; c = (Xn + Yn ) ; (Xn + c) ;!
0 (A)
6:1:9(i)
d X Th:=
d X + c (B )
Xn ;!
) Xn + c ;!
Combining (A) and (B ), it follows from Theorem 6.1.17:
d X +c
Xn + Yn ;!
(ii) Case c = 0:
8 > 0 8k > 0, it is
P (j Xn Yn j> ) = P (j XnYn j> ; Yn k ) + P (j Xn Yn j> ; Yn > k )
P (j Xn k j> ) + P (Yn > k )
P (j Xn j> k) + P (j Yn j> k )
p
d X and Y ;!
Since Xn ;!
0, it follows
n
nlim
!1P (j Xn Yn j> ) P (j Xn j> k) ! 0 as k ! 1:
p
Therefore, Xn Yn ;!
0.
Case c 6= 0:
p
d X and Y ;!
Since Xn ;!
c, it follows from (ii), case c = 0, that Xn Yn ; cXn =
n
p
Xn (Yn ; c) ;! 0.
p
=) Xn Yn ;!
cXn
Th:=6)
:1:14 X Y ;!
d cX
n n
n
d cX by Theorem 6.1.9 (ii), it follows from Theorem 6.1.17:
Since cXn ;!
d cX
Xn Yn ;!
p
(iii) Let Zn ;!
1 and let Yn = cZn .
=c6=)0 Y1n = Z1n 1c
Th:6:1:11(v;viii) 1 p 1
=)
Yn ;! c
With part (ii) above, it follows:
p 1
d X and 1 ;!
Xn ;!
Yn
c
d X
=) XYnn ;!
c
129
Denition 6.1.19:
r
Let fXn g1
n=1 be a sequence of rv's such that E (j Xn j ) < 1 for some r > 0. We say that Xn
r X ) if E (j X jr ) < 1 and
converges in the rth mean to a rv X (Xn ;!
nlim
!1 E (j Xn ; X
jr ) = 0:
Example 6.1.20:
1
1
Let fXn g1
n=1 be a sequence of rv's dened by P (Xn = 0) = 1 ; n and P (Xn = 1) = n .
r 0 8r > 0.
It is E (j Xn jr ) = n1 8r > 0. Therefore, Xn ;!
Note:
The special cases r = 1 and r = 2 are called convergence in absolute mean for r = 1
1 X ) and convergence in mean square for r = 2 (X ;!
ms X or X ;!
2 X ).
(Xn ;!
n
n
Theorem 6.1.21:
p
r X for some r > 0. Then X ;!
Assume that Xn ;!
X.
n
Proof:
Using Markov's Inequality (Corollary 3.5.2), it holds for any > 0:
E (j Xn ; X jr ) P (j X ; X j )
n
r
r X =) lim E (j X ; X jr ) = 0
Xn ;!
n
n!1
E (j Xn ; X jr ) = 0
=) nlim
P
(
j
X
;
X
j
)
lim
n
!1
n!1
r
p
=) Xn ;! X
Example 6.1.22:
1
1
Let fXn g1
n=1 be a sequence of rv's dened by P (Xn = 0) = 1 ; nr and P (Xn = n) = nr for
some r > 0.
p
For any > 0, P (j Xn j> ) ! 0 as n ! 1; so Xn ;!
0.
s 0. But E (j X jr ) = 1 6! 0 as
For 0 < s < r, E (j Xn js ) = nr1;s ! 0 as n ! 1; so Xn ;!
n
n ! 1; so Xn ;!
6 r 0.
130
Theorem 6.1.23:
r X , then it holds:
If Xn ;!
r
r
(i) nlim
!1 E (j Xn j ) = E (j X j ); and
s X for 0 < s < r.
(ii) Xn ;!
Proof:
(i) For 0 < r 1, it holds:
()
E (j Xn jr ) = E (j Xn ; X + X jr ) E (j Xn ; X jr + j X jr )
=) E (j Xn jr ) ; E (j X jr ) E (j Xn ; X jr )
r
r
r
=) nlim
!1 E (j Xn j ) ; nlim
!1 E (j X j ) nlim
!1 E (j Xn ; X j ) = 0
r
r
=) nlim
!1 E (j Xn j ) E (j X j ) (A)
() holds due to Bronstein/Semendjajew (1986), page 36 (see Handout)
Similarly,
E (j X jr ) = E (j X ; Xn + Xn jr ) E (j Xn ; X jr + j Xn jr )
=) E (j X jr ) ; E (j Xn jr ) E (j Xn ; X jr )
r
r
r
=) nlim
!1 E (j X j ) ; nlim
!1 E (j Xn j ) nlim
!1 E (j Xn ; X j ) = 0
r
=) E (j X jr ) nlim
!1 E (j Xn j ) (B )
Combining (A) and (B ) gives
r
r
nlim
!1 E (j Xn j ) = E (j X j )
For r > 1, it follows from Minkowski's Inequality (Theorem 4.8.3):
[E (j X ; Xn + Xn jr )] r1 [E (j X ; Xn jr )] 1r + [E (j Xn jr )] 1r
=) [E (j X jr )] r1 ; [E (j Xn jr )] 1r [E (j X ; Xn jr )] 1r
r
r 1r
r 1r
=) [E (j X jr )] 1r ; nlim
!1[E (j Xn ; X j )] = 0 since Xn ;! X
!1[E (j Xn j )] nlim
1
r r
=) [E (j X jr )] 1r nlim
!1[E (j Xn j )] (C )
Similarly,
[E (j Xn ; X + X jr )] 1r [E (j Xn ; X jr )] 1r + [E (j X jr )] r1
r X
r )] 1r ; lim [E (j X jr )] 1r lim [E (j X ;X jr )] 1r = 0 since X ;!
=) nlim
[
E
(
j
X
j
n
n
n
!1
n!1
n!1
131
1
1
r r
r r
=) nlim
!1[E (j Xn j )] [E (j X j )] (D)
Combining (C ) and (D) gives
r 1r
r r1
nlim
!1[E (j Xn j )] = [E (j X j )]
r
r
=) nlim
!1 E (j Xn j ) = E (j X j )
(ii) For 1 s < r, it follows from Lyapunov's Inequality (Theorem 3.5.4):
[E (j Xn ; X js )] 1s [E (j Xn ; X jr )] 1r
=) E (j Xn ; X js ) [E (j Xn ; X jr )] rs
r X
s ) lim [E (j X ; X jr )] sr = 0 since X ;!
=) nlim
E
(
j
X
;
X
j
n
n
n
!1
n!1
s X
=) Xn ;!
An additional proof is required for 0 < s < r < 1.
Denition 6.1.24:
Let fXn g1
n=1 be a sequence of rv's on (
; L; P ). We say that Xn converges almost surely
1
a:s: X ) or X converges with probability 1 to X (X w:p:
to a rv X (Xn ;!
n
n ;! X ) or Xn
converges strongly to X i
P (f! : Xn (!) ! X (!) as n ! 1g) = 1:
Note:
An interesting characterization of convergence with probability 1 and convergence in probability can be found in Parzen (1960) \Modern Probability Theory and Its Applications" on
page 416 (see Handout).
Example 6.1.25:
Let = [0; 1] and P a uniform distribution on . Let Xn (!) = ! + !n and X (!) = !.
For ! 2 [0; 1), !n ! 0 as n ! 1. So Xn (!) ! X (!) 8! 2 [0; 1).
However, for ! = 1, Xn (1) = 2 6= 1 = X (1) 8n, i.e., convergence fails at ! = 1.
a:s: X .
Anyway, since P (f! : Xn (!) ! X (!) as n ! 1g) = P (f! 2 [0; 1)g) = 1, it is Xn ;!
132
Lecture 42/1:
Fr 12/08/00
Theorem 6.1.26:
p
a:s: X =) X ;!
Xn ;!
X.
n
Proof:
Choose > 0 and > 0. Find n0 = n0 (; ) such that
P
Since
1
\
n=n0
1
\
n=n0
!
fj Xn ; X j g 1 ; :
fj Xn ; X j g fj Xn ; X j g 8n n0, it is
P (fj Xn ; X j g) P
1
\
n=n0
!
fj Xn ; X j g 1 ; 8n n0:
p
Therefore, P (fj Xn ; X j g) ! 1 as n ! 1. Thus, Xn ;!
X.
Example 6.1.27:
p
a:s: X :
Xn ;!
X 6=) Xn ;!
Let = (0; 1] and P a uniform distribution on .
Dene An by
A1 = (0; 12 ]; A2 = ( 21 ; 1]
A3 = (0; 14 ]; A4 = ( 41 ; 12 ]; A5 = ( 21 ; 34 ]; A6 = ( 43 ; 1]
A7 = (0; 18 ]; A8 = ( 81 ; 14 ]; : : :
Let Xn (!) = IAn (!).
p
It is P (j Xn ; 0 j ) ! 0 8 > 0 since Xn is 0 except on An and P (An ) # 0. Thus Xn ;!
0.
But P (f! : Xn (!) ! 0g) = 0 (and not 1) because any ! keeps being in some An beyond any
n0 , i.e., Xn (!) looks like 0 : : : 010 : : : 010 : : : 010 : : :, so Xn ;!
6 a:s: 0.
Example 6.1.28:
r X 6=) X ;!
a:s: X :
Xn ;!
n
Let Xn be independent rv's such that P (Xn = 0) = 1 ; n1 and P (Xn = 1) = n1 .
r 0 8r > 0.
It is E (j Xn ; 0 jr ) = E (j Xn jr ) = E (j Xn j) = n1 ! 0 as n ! 1, so Xn ;!
But
133
P (Xn = 0 8m n n0) =
n0
Y
n=m
m + 1 ) : : : ( n0 ; 2 )( n0 ; 1 ) = m ; 1
(1; n1 ) = ( mm; 1 )( mm+ 1 )( m
+2
n ;1 n
n
0
0
As n0 ! 1, it is P (Xn = 0 8m n n0 ) ! 0 8m, so Xn ;!
6 a:s: 0.
Example 6.1.29:
a:s: X 6=) X ;!
r X:
Xn ;!
n
Let = [0; 1] and P a uniform distribution on .
Let An = [0; ln1n ].
Let Xn (!) = nIAn (!) and X (!) = 0.
It holds that 8! > 0 9n0 : ln1n0 < ! =) Xn (!) = 0 8n > n0 and P (! = 0) = 0. Thus,
a:s: 0.
Xn ;!
But E (j Xn ; 0 jr ) = lnnrn ! 1 8r > 0, so Xn ;!
6 r X.
134
0
Lecture 39:
Fr 12/01/00
6.2 Weak Laws of Large Numbers
Theorem 6.2.1: WLLN: Version I
Let fXi g1
a sequence of iid rv's with mean E (Xi ) = and variance V ar(Xi ) = 2 < 1.
i=1 be
n
X
Let X n = n1 Xi . Then it holds
i=1
nlim
!1 P (j X n ; j ) = 0;
p
i.e., X n ;!
.
Proof:
By Markov's Inequality, it holds for all > 0:
2
2 ;! 0 as n ! 1
P (j X n ; j ) E ((X n2; ) ) = V ar(2X n ) = n
2
Note:
For iid rv's with nite variance, X n is consistent for .
A more general way to derive a \WLLN" follows in the next Denition.
Denition 6.2.2:
n
X
Let fXi g1
be
a
sequence
of
rv's.
Let
T
=
Xi . We say that fXi g obeys the WLLN
n
i=1
i=1
with respect to a sequence of norming constants fBi g1
i=1 , Bi > 0; Bi " 1, if there exists a
1
sequence of centering constants fAi gi=1 such that
p
Bn;1 (Tn ; An) ;!
0:
Theorem 6.2.3:
2
Let fXi g1
i=1n be a sequence of pairwise uncorrelated rv'snwith E (Xi ) = ni and V ar(Xi ) = i ,
X
X
X
i 2 IN . If i2 ! 1 as n ! 1, we can choose An = i and Bn = i2 and get
i=1
n
X
i=1
i=1
(Xi ; i )
n
X
i=1
i2
135
p
;!
0:
i=1
Proof:
By Markov's Inequality, it holds for all > 0:
P (j
n
X
i=1
Xi ;
n
X
i=1
i j> n
X
i=1
n
X
i2 ) E (( (Xi ; ))2 )
i=1
n
X
2 ( i2)2
i=1
1
= X
n
2
i=1
i2
;! 0 as n ! 1
Note:
To obtain Theorem 6.2.1, we choose An = n and Bn = n2 .
Theorem 6.2.4:
n
X
1
Xi . A necessary and sucient condition
Let fXi g1
be
a
sequence
of
rv's.
Let
X
=
n n
i=1
i=1
for fXi g to obey the WLLN with respect to Bn = n is that
!
as n ! 1.
Proof:
Rohatgi, page 258, Theorem 2.
2
X
n
E
!0
1 + X 2n
Example 6.2.5:
Let (X1 ; : : : ; Xn ) be jointly Normal with E (Xi ) = 0, E (Xi2 ) = 1 for all i, and Cov(Xi ; Xj ) = if j i ; j j= 1 and Cov(Xi ; Xj ) = 0 if j i ; j j> 1. Then, Tn N (0; n + 2(n ; 1)) = N (0; 2 ).
!
It is
2 !
2
X
T
n
n
E
=
E n2 + T 2
1 + X 2n
n
Z
1 x2 ; x2
dx
x
22 dx j y = ; dy =
p2
=
e
2
2
2 0 n + x
Z 1 2 y2
; y22 dy
p2
=
e
2
2
2
2 0 n + y
Z 1 (n + 2(n ; 1))y2
; y22 dy
p2
e
=
2
2
2 0 n + (n + 2(n ; 1))y
Z1 2
2
n + 2(n ; 1)
2 e; y2 dy
p
y
n2
| 0 2{z
}
=1; since Var of N (0;1) distribution
!0
as n ! 1
p
=) X n ;! 0
136
Note:
We would like to have a WLLN that just depends on means but does not depend on the
existence of nite variances. To approach this, we consider the following:
Let fXi g1
i=1 be a sequence of rv's. Let Tn =
Xic =
Let Tnc =
n
X
i=1
Xic and mn =
n
X
i=1
(
n
X
i=1
Xi . We truncate each Xi at c > 0 and get
Xi; j Xi j c
0; otherwise
E (Xic ).
Lemma 6.2.6:
For Tn , Tnc and mn as dened in the Note above, it holds:
P (j Tn ; mn j> ) P (j Tnc ; mn j> ) +
n
X
i=1
P (j Xi j> c) 8 > 0
Proof:
It holds for all > 0:
P (j Tn ; mn j> ) = P (j Tn ; mn j> and j Xi j c 8i 2 f1; : : : ; ng) +
P (j Tn ; mn j> and j Xi j> c for at least one i 2 f1; : : : ; ng
()
P (j Tnc ; mn j> ) + P (j Xi j> c for at least one i 2 f1; : : : ; ng)
n
X
P (j Tnc ; mn j> ) + P (j Xi j> c)
i=1
() holds since Tnc = Tn when j Xi j c 8i 2 f1; : : : ; ng.
Note:
If the Xi 's are identically distributed, then
P (j Tn ; mn j> ) P (j Tnc ; mn j> ) + nP (j X1 j> c) 8 > 0:
If the Xi 's are iid, then
c2
P (j Tn ; mn j> ) nE ((X2 1 ) ) + nP (j X1 j> c) 8 > 0 ():
Note that P (j Xi j> c) = P (j X1 j> c) 8i 2 IN if the Xi 's are identically distributed and that
E ((Xic )2 ) = E ((X1c )2 ) 8i 2 IN if the Xi 's are iid.
137
Theorem 6.2.7: Khintchine's WLLN
Let fXi g1
i=1 be a sequence of iid rv's with nite mean E (Xi ) = . Then it holds:
p
X = 1 T ;!
n
n
n
Proof:
If we take c = n and replace by n in () in the Note above, we get
Tn ; mn X1n )2 ) + nP (j X j> n):
P n > = P (j Tn ; mn j> n) E ((n
1
2
Since E (j X1 j) < 1, it is nP (j X1 Zj>1n) ! 0 as n ! 1 by Theorem 3.1.9. From Corollary
3.1.12 we know that E (j X j ) = x;1 P (j X j> x)dx. Therefore,
0
Zn
E ((X1n )2 ) = 2 xP (j X1n j> x)dx
Z0 A
Zn
n
= 2 xP (j X1 j> x)dx + 2 xP (j X1n j> x)dx
0
A
Zn
(+)
K + dx
A
K + n
In (+), A is chosen suciently large such that xP (j X1n j> x) < 2 8x A for an arbitrary
constant > 0 and K > 0 a constant.
Therefore,
E ((X1n )2 ) K + n2
n2 2
Since is arbitrary, we can make the right hand side of this last inequality arbitrarily small
for suciently large n.
Since E (Xi ) = 8i, it is mnn ! as n ! 1.
Note:
Theorem 6.2.7 meets the previously stated goal of not having a nite variance requirement.
Merry Xmas and a Happy New Year!
138
Lecture 42/2:
Fr 12/08/00
Index
{Algebra, 2
{Field, 2
{Field, Borel, 4
{Field, Generated, 2
k{Parameter Exponential Family, 119
kth Central Moment, 43
kth Factorial Moment, 66
kth Moment, 43
n{Dimensional Characteristic Function, 97
n{RV, 71
Consistent, 125
Continuity of Sets, 9
Continuous, 29, 31, 32
Continuous RV, 73
Continuous, Absolutely, 32
Convergence, Almost Sure, 132
Convergence, In rth Mean, 130
Convergence, In Absolute Mean, 130
Convergence, In Distribution, 122
Convergence, In Law, 122
Convergence, In Mean Square, 130
Convergence, In Probability, 125
Convergence, Strong, 132
Convergence, Weak, 122
Convergence, With Probability 1, 132
Convex, 108
Countable Additivity, 4
Counting, Fundamental Theorem of, 11
Covariance Inequality, 111
Cumulative Distribution Function, 27
Cumulative Distribution Function, Joint, 71
Absolutely Continuous, 32
Additivity, Countable, 4
Arithmetic Mean, 109
Bayes' Rule, 19
Binomial Coecient, 12
Binomial Theorem, 14
Bivariate Normal Distribution, 112
Bivariate RV, 72
Bochner's Theorem, 60
Bonferroni's Inequality, 7
Boole's Inequality, 9
Borel {Field, 4
Borel Field, 2
Borel Sets, 4
Discrete, 29, 31
Discrete RV, 72
Distribution, Sampling, 122
Equally Likely, 11
Equivalence, 35
Equivalent RV, 81
Euler's Relation, 56
Event, 1
Expectation, Conditional, 102
Expectation, Multivariate, 91
Expected Value, 42
Exponential Family in Canonical Form, 119
Exponential Family, k{Parameter, 119
Exponential Family, One{Parameter, 119
Canonical Form, 119
Cauchy{Schwarz Inequality, 95
CDF, 27
CDF, Conditional, 75
CDF, Joint, 71
CDF, Marginal, 74
Centering Constants, 135
Characteristic Function, 57
Characteristic Function, n{Dimensional, 97
Chebychev's Inequality, 68
Conite, 2
Completely Independent, 20, 77
Complex Numbers, 56
Complex Numbers, Conjugate, 56
Complex{Valued Random Variable, 57
Concave, 108
Conditional CDF, 75
Conditional Cumulative Distribution Function, 75
Conditional Expectation, 102
Conditional PDF, 75
Conditional PMF, 74
Conditional Probability, 17
Conditional Probability Density Function, 75
Conditional Probability Mass Function, 74
Conjugate Complex Numbers, 56
Factorial, 12
Factorization Theorem, 78
Field, 1
Fundamental Theorem of Counting, 11
Geometric Mean, 109
Holder's Inequality, 106
Harmonic Mean, 109
Identically Distributed, 30
IID, 81
Implication, 35
Inclusion{Exclusion, Principle, 6
Independent, 19, 77, 78
139
Measurable Space, 4
MGF, 50
MIN, 82
Minkowski's Inequality, 107
MMGF, 97
Moivre's Theorem, 56
Moment Generating Function, 50
Moment Generating Function, Multivariate, 97
Moment, kth , 43
Moment, kth Central, 43
Moment, kth Factorial, 66
Moment, Multi{Way, 94
Moment, Multi{Way Central, 94
Monotonic, 37
Multi{Way Central Moment, 94
Multi{Way Moment, 94
Multiplication Rule, 18
Multivariate Expectation, 91
Multivariate Moment Generating Function, 97
Multivariate Normal Distribution, 117
Multivariate RV, 72
Multivariate Transformation, 85
Mutually Independent, 20, 77
Independent Identically Distributed, 81
Independent, Completely, 20, 77
Independent, Mutually, 20, 77
Independent, Pairwise, 19
Indicator Function, 34
Induced Probability, 27
Induction, 6
Induction Base, 6
Induction Step, 7
Inequality, Bonferroni's, 7
Inequality, Boole's, 9
Inequality, Cauchy{Schwarz, 95
Inequality, Chebychev's, 68
Inequality, Covariance, 111
Inequality, Holder's, 106
Inequality, Jensen's, 108
Inequality, Lyapunov's, 69
Inequality, Markov's, 68
Inequality, Minkowski's, 107
Jensen's Inequality, 108
Joint CDF, 71
Joint Cumulative Distribution Function, 71
Joint PDF, 73
Joint PMF, 72
Joint Probability Density Function, 73
Joint Probability Mass Function, 72
Jump Points, 31
Non{Decreasing, 37
Non{Decreasing, Strictly, 37
Non{Increasing, 37
Non{Increasing, Strictly, 37
Normal Distribution, 112
Normal Distribution, Bivariate, 112
Normal Distribution, Multivariate, 117
Normal Distribution, Standard, 112
Normal Distribution, Univariate, 112
Norming Constants, 135
Khintchine's Weak Law of Large Numbers, 138
Kolmogorov Axioms of Probability, 4
L'Hospital's Rule, 52
Law of Total Probability, 18
Lebesque's Dominated Convergence Theorem, 53
Leibnitz's Rule, 52
Levy's Theorem, 60
Logic, 35
Lyapunov's Inequality, 69
One{Parameter Exponential Family, 119
Order Statistic, 89
Pairwise Independent, 19
Partition, 18
PDF, 32
PDF, Conditional, 75
PDF, Joint, 73
PDF, Marginal, 73
PGF, 66
PM, 4
PMF, 31
PMF, Conditional, 74
PMF, Joint, 72
PMF, Marginal, 73
Power Set, 2
Principle of Inclusion{Exclusion, 6
Probability Density Function, 32
Probability Generating Function, 66
Probability Mass Function, 31
Marginal CDF, 74
Marginal Cumulative Distribution Function, 74
Marginal PDF, 73
Marginal PMF, 73
Marginal Probability Density Function, 73
Marginal Probability Mass Function, 73
Markov's Inequality, 68
MAX, 82
Mean, 42
Mean, Arithmetic, 109
Mean, Geometric, 109
Mean, Harmonic, 109
Measurable, 2, 24
Measurable L ; B, 24
Measurable Functions, 25
140
Probability Measure, 4
Probability Space, 4
Probability, Conditional, 17
Probability, Induced, 27
Probability, Total, 18
Quantier, 35
Random Variable, 24
Random Variable, Complex{Valued, 57
Random Vector, 24
Random Vector, n{Dimensional, 71
RV, 24
RV, Bivariate, 72
RV, Continuous, 73
RV, Discrete, 72
RV, Equivalent, 81
RV, Multivariate, 72
Sample Space, 1
Sampling Distribution, 122
Sets, Continuity of, 9
Singular, 33
Slutsky's Theorem, 128
Standard Normal Distribution, 112
Statistic, 122
Support, 37
Total Probability, 18
Transformation, 36
Transformation, Multivariate, 85
Univariate Normal Distribution, 112
Variance, 43
Variance{Covariance Matrix, 114
Weak Law Of Large Numbers, 135
Weak Law Of Large Numbers, Khintchine's, 138
141