Stat 451 – Homework 02 Solutions

Prepared by Prof. Ryan Martin ([email protected]) for his Stat 451 course at UIC, Spring 2016.
1. Consider the function

      p(α) = 1 − (πα)^{1/2} e^{α/4} {1 − Φ((α/2)^{1/2})},   α > 0,

which is the probability that a certain random quadratic function has two real roots;
see the Bonus Problem below.
(a) A plot of p(α) as a function of α is shown in Figure 1. It is clearly a strictly
decreasing function, starting at 1 and eventually reaching 0.
(b) By monotonicity of p(α), we know that there is a solution to the equation
p(α) = 1/2. We can apply the bisection method, with bracketing interval
[0, 10], to find the solution: α̃ = 0.7491; see the code in the appendix.
(c) The R function uniroot gives the same answer (up to significant digits) as the simple
implementation of bisection. It's a bit more efficient, however: for me, it took fewer
than half as many iterations as my naive bisection.
[Figure 1: Plot of the function p(α), shown for α between 0 and 6.]
2. Let f satisfy f'(x) > 0 and f''(x) > 0 for all x in the domain (d, ∞). Suppose that
f has a root r, i.e., f(r) = 0, and let (xt : t ≥ 0) be the sequence generated by
Newton's method starting at x0 and targeting the root.

(a) We assume that f has a root r and, since f' > 0 means the function is strictly
increasing, the root r must be unique.
(b) First, suppose x0 = r. This case is not interesting, since xt = r for all t and,
therefore, the sequence obviously converges to the root. Next, suppose that x0 > r. The
key point here is the fact that f is a strictly convex function, a consequence of the
assumption that f'' > 0 (sketch a picture to visualize what's going on). In particular,
since f is convex, we have

      f(x) ≥ f(y) + f'(y)(x − y),   for all (x, y) in the domain.   (1)
So, consider Newton's update from x0 to x1:

      x1 = x0 − f(x0)/f'(x0).

Since x0 > r, we know that f(x0) > 0 and f'(x0) > 0, so x1 must be smaller than x0,
i.e., the move is in the correct direction. We need to check that the move is not too
far, and for this we need (1). In particular, if we let x = r and y = x0, then (1) gives

      0 = f(r) ≥ f(x0) + f'(x0)(r − x0)  ⟹  f(x0)/f'(x0) ≤ x0 − r,

which implies that

      x1 = x0 − f(x0)/f'(x0) ≥ x0 − (x0 − r) = r,

i.e., x1 is less than x0 but greater than r. If x0 < r, then the same argument as above
shows that x1 ≥ r. Therefore, since this argument can be applied recursively, it must be
that xt ≥ r for all t ≥ 1.
(c) From Part (b), we have that xt is non-increasing and bounded below by r; therefore,
there is a limit x∞. By continuity of f and f', x∞ must satisfy

      x∞ = x∞ − f(x∞)/f'(x∞)  ⟺  f(x∞) = 0.

Since the root r is unique, it must be that x∞ = r, so Newton's method converges to the
root for any x0.
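As a quick numerical illustration of this monotone convergence (an aside, not part of
the assigned problem), take f(x) = e^x − 1, which has f' > 0, f'' > 0 and unique root
r = 0. Starting Newton's method above the root, the iterates decrease monotonically
toward 0:

f  <- function(x) exp(x) - 1    # f' > 0 and f'' > 0, root at x = 0
df <- function(x) exp(x)
x <- 3                          # starting value above the root
for(t in 1:8) {
  x <- x - f(x) / df(x)         # Newton update
  cat("t =", t, " x_t =", format(x, digits = 6), "\n")
}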
3. Consider the density f(x) = (2π)^{−1}{1 − cos(x − θ)}, where 0 ≤ x ≤ 2π and θ is a
parameter between −π and π. The iid observations (X1, . . . , Xn) are
3.91 4.85 2.28 4.06 3.70 4.04 5.46 3.53 2.28 1.96
2.53 3.88 2.22 3.47 4.82 2.46 2.99 2.54 0.52 2.50
(a) Since the observations are iid according to f, the log-likelihood function is

      ℓ(θ) = Σ_{i=1}^n log{1 − cos(Xi − θ)}.

The plot of the log-likelihood is given in Figure 2. This is obviously very bumpy, so
any optimization routine will be highly sensitive to the starting point, since there are
many nearby local maxima. To implement Newton's method, we will also need the first and
second derivatives:

      ℓ'(θ) = −Σ_{i=1}^n sin(Xi − θ)/{1 − cos(Xi − θ)}   and   ℓ''(θ) = −Σ_{i=1}^n 1/{1 − cos(Xi − θ)}.
[Figure 2: Log-likelihood function ℓ(θ), plotted for θ between −π and π.]
(b) The method-of-moments estimator θ̃ is found by setting the mean function
μ(θ) = ∫ x f(x) dx equal to the sample mean X̄ and solving for θ. First,

      μ(θ) = (2π)^{−1} ∫_0^{2π} x[1 − cos(x − θ)] dx = π − (2π)^{−1} ∫_0^{2π} x cos(x − θ) dx.

Using integration by parts on the right-hand side above, we get

      ∫_0^{2π} x cos(x − θ) dx = x sin(x − θ)|_0^{2π} − ∫_0^{2π} sin(x − θ) dx = 2π sin(2π − θ).

Next, recall that sin(−x) = sin(x) and sin(2π + x) = sin(x). Putting everything together,
we have

      μ(θ) = π − (2π)^{−1} ∫_0^{2π} x cos(x − θ) dx = π − sin θ.

Now, the method-of-moments estimator θ̃ is the solution of μ(θ) = X̄. In particular,

      μ(θ) = X̄  ⟺  π − sin θ = X̄  ⟺  θ = arcsin(π − X̄).

Therefore, the method-of-moments estimator is θ̃ = arcsin(π − X̄) = −0.0584, which is
fairly close to the actual maximizer, based on Figure 2.
(c) We can use Newton's method to find the MLE θ̂. I will use my function newton provided
on the course website. Using the starting value θ(0) = θ̃ = −0.0584, the method-of-moments
estimator, Newton's method converges to θ̂ = −0.0199. Alternatively, if we start at
θ(0) = 2.7 or θ(0) = −2.7, the resulting estimates are θ̂ = 2.873 and θ̂ = −2.667,
respectively.
      i       θ̂i        Lower        Upper
      1   −3.0931    −3.110177    −2.827433
      2   −2.7863    −2.796017    −2.796017
      3   −2.7862    −2.764602    −2.764602
      4   −2.6667    −2.733186    −2.607522
      5   −2.5076    −2.576106    −2.419026
      6   −2.3882    −2.387610    −2.387610
      7   −2.2973    −2.356194    −2.261947
      8   −2.2320    −2.230531    −2.230531
      9   −1.6583    −2.199115    −1.476549
     10   −1.4475    −1.445133    −1.445133
     11   −0.9533    −1.413717    −0.848230
     12   −0.0120    −0.816814     0.502655
     13    0.7906     0.534071     1.947787
     14    2.0036     1.979203     2.199115
     15    2.2361     2.230531     2.261947
     16    2.3607     2.293363     2.450442
     17    2.4754     2.481858     2.481858
     18    2.5136     2.513274     2.513274
     19    2.8731     2.544690     2.984513
     20    3.1901     3.015929     3.141593

Table 1: Partition of the interval [−π, π]: the 20 distinct solutions θ̂i and the
estimated lower and upper endpoints of the corresponding subintervals Ei.
(d) For 200 equi-spaced starting values between −π and π, we get 20 different
solutions, denoted by θ̂1 , . . . , θ̂20 . We can partition the interval [−π, π] into
subintervals Ei based on the following recipe: θ ∈ Ei if and only if θ(0) = θ
implies θ(t) → θ̂i . Table 1 gives the θ̂i ’s as well as estimates of the upper and
lower bounds of the corresponding Ei . Notice that the method-of-moments
estimator θ̃ belongs to E12 .
(e) Notice that the solutions corresponding to the subintervals E13 and E14 in Table 1,
namely θ̂13 = 0.7906 and θ̂14 = 2.0036, are quite different. In an attempt to find
starting values very near to one another for which Newton's method converges to different
solutions, we take starting values near the boundary between E13 and E14. After several
trials, we find that

      θ(0) = 1.9580 → 0.7906   and   θ(0) = 1.9582 → 1.9563.

Note that these two starting values are within 0.0002 of one another, demonstrating the
sensitivity of Newton's method to the choice of starting value, at least for this
particular problem.
4. Let F be the Gamma(α, β) distribution function, where α > 0 is the shape parameter
and β > 0 is the scale parameter. The particular problem considers α = 2 and β = 1. The
goal is to find the shortest interval with 95% probability. Let f = F' be the density
function, and notice that this function is unimodal, i.e., there is exactly one point
where the derivative of f changes sign. (This claim holds for any α > 1; for α < 1, the
peak is at the left endpoint, which actually simplifies the problem.) It's not difficult
to check that the point where f' vanishes, called the mode of the distribution, is
x = max{β(α − 1), 0}. Anyway, the fact that the density is unimodal means that the
shortest interval of a given probability is a level set of the density, i.e., an interval
[ac, bc] = {x : f(x) ≥ c} for some c. So, the goal is to select c such that the interval
has probability 0.95. This requires two things:

• For a given c, we can identify the interval [ac, bc];

• then choose c so that F(bc) − F(ac) = 0.95.

R code in the appendix uses the uniroot function in both of these steps; we can use the
fact that the interval [ac, bc] contains the mode to identify appropriate intervals to
search for the desired roots. I found that the appropriate cut point is c ≈ 0.041, which
corresponds to a 0.95-probability interval [ac, bc] = [0.0423, 4.766].
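As a quick sanity check (an aside, not part of the solution), the reported endpoints can
be verified directly in R: the density should be roughly equal at the two endpoints, and
the interval should cover probability 0.95.

dgamma(c(0.0423, 4.766), shape = 2, scale = 1)        # both near the cut point c = 0.041
diff(pgamma(c(0.0423, 4.766), shape = 2, scale = 1))  # approximately 0.95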
5. Consider a model in which, independently, Yi ∼ Pois(λi), where λi = xi^T α, xi is a
(column) vector of known covariates, and α is a vector parameter of interest. The
log-likelihood function (up to additive constants) is

      ℓ(α) = −Σ_{i=1}^n xi^T α + Σ_{i=1}^n Yi log(xi^T α).

Data {(xi, Yi) : i = 1, . . . , n} are available on the textbook's website; see my code.
(a) For Newton's update, we need both the first and second derivatives of ℓ(α). The first
derivative vector is

      ℓ'(α) = Σ_{i=1}^n {Yi/(xi^T α) − 1} xi,

and the second derivative matrix is

      ℓ''(α) = −Σ_{i=1}^n {Yi/(xi^T α)^2} xi xi^T.

Then, for a given starting value α(0), Newton's update looks like

      α(t+1) = α(t) − [ℓ''(α(t))]^{−1} ℓ'(α(t)),   t ≥ 0.
(b) For Fisher scoring, we replace the second derivative matrix with the negative Fisher
information matrix:

      −I(α) = E{ℓ''(α)} = −Σ_{i=1}^n {E(Yi)/(xi^T α)^2} xi xi^T = −Σ_{i=1}^n {1/(xi^T α)} xi xi^T.

Then, for a given starting value α(0), the Fisher scoring update looks like

      α(t+1) = α(t) − [−I(α(t))]^{−1} ℓ'(α(t)),   t ≥ 0.
(c) R code implementing both Newton's method and Fisher scoring is given in the appendix.
Based on several experiments, it seems that Newton's method is more sensitive to the
choice of starting value α(0) than Fisher scoring, i.e., Newton's method can give very
different answers for nearby starting values, whereas Fisher scoring gives the same
answer for those same starting values. However, if the starting value is good, then
Newton's method converges much more quickly. For α(0) = (0.5, 0.5)^T, both Newton and
Fisher scoring converge to the MLE α̂ = (1.0972, 0.9376)^T, in 6 and 16 iterations,
respectively.
(d) Both Newton and Fisher scoring produce output that can be used to estimate the
standard errors (and correlation) of the MLEs, namely [−ℓ''(α̂)]^{−1} and I(α̂)^{−1},
respectively. The following estimates were obtained:

      [−ℓ''(α̂)]^{−1} ≈ (  0.1518  −0.1680 )      I(α̂)^{−1} ≈ (  0.1915  −0.2281 )
                        ( −0.1680   0.3077 )                  ( −0.2281   0.3988 )

(Actually, these two matrices are from the iteration just before termination of the
updates; it would probably be a bit better to report the matrices based on α̂ itself,
but the difference between the final two iterates is small, so, by continuity, the
matrices should be similar.)
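For reference, here is a small sketch of how standard errors and the correlation can be
read off such a matrix in R, using the Newton-based matrix reported above:

V <- matrix(c(0.1518, -0.1680, -0.1680, 0.3077), nrow = 2)  # estimated covariance matrix
sqrt(diag(V))   # standard errors of the two components of the MLE
cov2cor(V)      # corresponding correlation matrix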
(e) The method of steepest ascent is like Newton's method but, instead of ℓ'', it uses a
(scaled) negative identity matrix, so the updates look like

      α(t+1) = α(t) + c(t) ℓ'(α(t)),   t ≥ 0.

The number c(t) is chosen to ensure that the update leads to an uphill move on the
likelihood surface. Starting from α(0) = (0.5, 0.5)^T, the method of steepest ascent
finds the MLE α̂ = (1.0972, 0.9376)^T, the same as the other methods. However, this
method took 62 iterations to reach the MLE, many more than the others.
(f) The R function optim, starting at α(0) = (0.5, 0.5)^T and using the BFGS method,
produces the MLE α̂ = (1.0972, 0.9376)^T, which agrees with the results from Newton,
Fisher scoring, and the method of steepest ascent.
(g) A contour plot with the paths taken by the various methods described above is given
in Figure 3. For all three starting values, each method converges to the same point; the
only difference is in how direct the path to the MLE is. Newton takes the most direct
path, followed by Fisher scoring, while the method of steepest ascent takes some
inefficient steps.
6. Graduate only. The logistic model for population growth is given by the differential
equation

      dN/dx = ρN(κ − N)/κ,

where N = Nx is the population size at time x, ρ is the growth rate, and κ is the
population carrying capacity. A particular solution of this differential equation is

      Nx = fθ(x) = κN0 / [N0 + (κ − N0) exp{−ρx}].

Using the data {(Ni, xi) : 1 ≤ i ≤ n}, we can find the least-squares estimator of
θ = (κ, ρ)^T by maximizing

      Q(θ) = −Σ_{i=1}^n (Ni − fθ(xi))^2.

Figure 4 shows a contour plot of Q(θ). It appears that the maximum value of Q(θ) occurs
in the neighborhood of θ = (840, 0.125)^T.
[Figure 3: Contour plot of the log-likelihood over (α1, α2), with the paths taken by the
various methods: Newton (solid), Fisher scoring (dashed), and steepest ascent (dotted),
starting from three different points, (0.5, 0.5), (1.5, 1.5), and (0.5, 1.5).]
[Figure 4: Contour plot of Q(θ), over roughly κ ∈ (750, 900) and ρ ∈ (0.05, 0.20).]
(a) We first find the least-squares estimator using the Gauss–Newton method. The idea of
Gauss–Newton is to approximate the function fθ(x) by a linear function. In particular,
we will need the gradient vector ḟθ(xi), i.e., the vector of partial derivatives of fθ
with respect to κ and ρ, respectively:

      ∂fθ(xi)/∂κ = N0^2 (1 − e^{−ρxi}) / [N0(1 − e^{−ρxi}) + κe^{−ρxi}]^2

      ∂fθ(xi)/∂ρ = xi κ N0 (κ − N0) e^{−ρxi} / [N0(1 − e^{−ρxi}) + κe^{−ρxi}]^2.

Therefore, the gradient vector gi = ḟθ(xi) is

      gi = [N0(1 − e^{−ρxi}) + κe^{−ρxi}]^{−2} ( N0^2 (1 − e^{−ρxi}),  xi κ N0 (κ − N0) e^{−ρxi} )^T.

Now, for iterates θ(t) = (κ(t), ρ(t))^T, let gi(t) = ḟθ(t)(xi). Further, we define the
"residuals" at iteration t as zi(t) = Ni − fθ(t)(xi), and z(t) as the corresponding
residual vector. Finally, if we let G(t) be the matrix with (gi(t))^T as its i-th row,
then the update of θ(t) is

      θ(t+1) = θ(t) + [(G(t))^T G(t)]^{−1} (G(t))^T z(t).

Using my R function gauss.newton defined in the appendix, the least-squares estimates of
κ and ρ are, respectively,

      κ̂ = 827.332   and   ρ̂ = 0.1323,

based on the starting value θ(0) = (800, 0.1)^T. The algorithm took 9 iterations to
converge and, for θ̂ = (κ̂, ρ̂)^T, we have Q(θ̂) = −693025.2.
(b) Next, we want to fit the logistic model using Newton's method to maximize Q(θ).
Compared to the Gauss–Newton method, Newton's method approximates Q(θ) itself by a
quadratic function and maximizes that approximation. Some preliminary calculations are
needed. For notational simplicity, let κ = θ1 and ρ = θ2. Then, for i, j ∈ {1, 2},

      ∂Q/∂θi = 2 Σ_{k=1}^n (Nk − fθ(xk)) ∂fθ(xk)/∂θi,

      ∂^2 Q/∂θi ∂θj = 2 Σ_{k=1}^n [ (Nk − fθ(xk)) ∂^2 fθ(xk)/∂θi ∂θj − {∂fθ(xk)/∂θi}{∂fθ(xk)/∂θj} ].

Using the first-order partials of fθ given in Part (a), we have

      ∂Q/∂κ = 2 Σ_{i=1}^n [ Ni − κN0/{N0 + (κ − N0)e^{−ρxi}} ] [ N0^2 (1 − e^{−ρxi}) / {N0 + (κ − N0)e^{−ρxi}}^2 ],

      ∂Q/∂ρ = 2 Σ_{i=1}^n [ Ni − κN0/{N0 + (κ − N0)e^{−ρxi}} ] [ xi κ N0 (κ − N0) e^{−ρxi} / {N0 + (κ − N0)e^{−ρxi}}^2 ].
We also need the second-order partial derivatives of fθ (note that, since these are
continuous, the mixed partials are equal):

      ∂^2 fθ(xi)/∂κ^2 = 2 N0^2 e^{−ρxi} (1 − e^{ρxi}) / [N0 + (κ − N0)e^{−ρxi}]^3

      ∂^2 fθ(xi)/∂ρ^2 = κ xi^2 N0 (κ − N0) e^{−ρxi} {κe^{−ρxi} − N0(1 − e^{−ρxi})} / [N0 + (κ − N0)e^{−ρxi}]^3

      ∂^2 fθ(xi)/∂κ ∂ρ = {N0 e^{−ρxi} + (κ − N0)e^{−2ρxi} + 2(1 − e^{−ρxi})(κ − N0)} / [N0 + (κ − N0)e^{−ρxi}]^3.

Then the second partial derivatives of Q(θ) are found by plugging these three expressions
into the general formulae above. Define the gradient vector and Hessian matrix as

      g(θ) = ( ∂Q/∂κ, ∂Q/∂ρ )^T   and   H(θ) = (  ∂^2Q/∂κ^2    ∂^2Q/∂κ∂ρ  )
                                                (  ∂^2Q/∂ρ∂κ   ∂^2Q/∂ρ^2  ).

If g(t) = g(θ(t)) and H(t) = H(θ(t)), then the Newton update is

      θ(t+1) = θ(t) − [H(t)]^{−1} g(t),   t = 0, 1, 2, . . .

Using the function mv.newton from the course website (see the appendix), starting at
θ(0) = (830, 0.145)^T, the least-squares estimates of κ and ρ are

      κ̂ = 827.332   and   ρ̂ = 0.1323.

The algorithm took 9 iterations to converge and Q(θ̂) = −693025.4.
Bonus problem: Let X and Y be iid exponential random variables with common mean 1/α.
Then the (random) quadratic function x ↦ x^2 − 2Xx + Y has two real roots iff X^2 > Y.
This probability can be written as

      p(α) = P(Y < X^2)
           = ∫_0^∞ P(Y < x^2) αe^{−αx} dx
           = 1 − α ∫_0^∞ e^{−α(x^2 + x)} dx.

Write x^2 + x = (x + 1/2)^2 − 1/4 to complete the square; then we have

      p(α) = 1 − α e^{α/4} ∫_0^∞ e^{−α(x + 1/2)^2} dx.

The remaining integral is clearly related to a normal distribution function, and some
change of variables will lead to the formula for p(α) in Problem 1 above.
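As a quick sanity check (an aside, not part of the solution), the closed form can be
compared against a crude Monte Carlo estimate of P(Y < X^2); using α = 0.7491 from
Problem 1(b), both should be close to 1/2.

p <- function(a) 1 - sqrt(pi * a) * exp(a / 4) * (1 - pnorm(sqrt(a / 2)))
set.seed(1)
a <- 0.7491
X <- rexp(1e5, rate = a)   # exponential with mean 1/a
Y <- rexp(1e5, rate = a)
mean(Y < X^2)              # Monte Carlo estimate of P(Y < X^2)
p(a)                       # closed-form value, roughly 0.5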
Appendix: R codes
Problem 1.
p <- function(a) 1 - sqrt(pi * a) * exp(a / 4) * (1 - pnorm(sqrt(a / 2)))
curve(p, xlim=c(0, 6), xlab=expression(alpha), ylab=expression(p(alpha)))
abline(h=0.5, lty=3)
f <- function(a) p(a) - 0.5
o <- bisection(f, 0, 10); print(o)
# function ’bisection’ from course website
abline(v=o$solution, lty=3)
uniroot(f, c(0, 10))
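# The 'bisection' function called above is from the course website and is not reproduced
# in these solutions.  A minimal sketch with the assumed interface (a list containing
# $solution and the iteration count) is:
bisection <- function(f, lo, hi, tol = 1e-08, maxiter = 1000) {
  # assumes f(lo) and f(hi) have opposite signs
  t <- 0
  repeat {
    t <- t + 1
    mid <- (lo + hi) / 2
    if(f(lo) * f(mid) <= 0) hi <- mid else lo <- mid
    if(hi - lo < tol || t >= maxiter) break
  }
  return(list(solution = (lo + hi) / 2, iter = t))
}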
Problem 3.
X <- c(3.91, 4.85, 2.28, 4.06, 3.70, 4.04, 5.46, 3.53, 2.28, 1.96, 2.53,
3.88, 2.22, 3.47, 4.82, 2.46, 2.99, 2.54, 0.52, 2.50)
loglik <- function(x) sum(log(1 - cos(X - x)))
dloglik <- function(x) -sum(sin(X - x) / (1 - cos(X - x)))
ddloglik <- function(x) -sum(1 / (1 - cos(X - x)))
# Part (a)
U <- seq(-pi, pi, len=1000)
loglik.U <- sapply(U, loglik)
plot(U, loglik.U, type="l", xlab=expression(theta), ylab=expression(log~L(theta)))
# Part (b)
mme <- asin(pi - mean(X))
# Part (c)
newton(dloglik, ddloglik, mme) # function ’newton’ from course website
newton(dloglik, ddloglik, -2.7)
newton(dloglik, ddloglik, 2.7)
# Parts (d) and (e)
U <- seq(-pi, pi, len=200)
N <- function(x) round(newton(dloglik, ddloglik, x)$solution, 4)
theta <- sapply(U, N)
soln <- split(U, theta)
m <- sapply(soln, min)
M <- sapply(soln, max)
cbind(min=round(m,6), max=round(M,6))
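# The 'newton' function called above is from the course website and is not reproduced in
# these solutions.  A minimal univariate sketch with the assumed interface (arguments:
# first derivative, second derivative, starting value; returns a list with $solution) is:
newton <- function(df, ddf, x0, tol = 1e-08, maxiter = 100) {
  x <- x0
  for(t in 1:maxiter) {
    x.new <- x - df(x) / ddf(x)   # Newton-Raphson update for a root of df
    if(abs(x.new - x) < tol) break
    x <- x.new
  }
  return(list(solution = x.new, iter = t))
}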
Problem 4.
prob.cut <- function(x, a, b) {
  # For cut point x, find the level set {y : f(y) >= x} of the Gamma density
  # (shape a, scale b) and the probability it contains
  mode <- max(0, b * (a - 1))
  p <- function(y) dgamma(y, shape=a, scale=b) - x
  if(x < 0 || p(mode) < 0) stop("cut point is too small/large!")
  if(a <= 1) left <- 0 else left <- uniroot(p, c(0, mode))$root
  right <- uniroot(p, c(mode, qgamma(0.999, shape=a, scale=b)))$root
  interval <- c(left, right)
  pp <- sum(c(-1, 1) * pgamma(interval, shape=a, scale=b))
  return(list(interval=interval, prob=pp))
}
a <- 2
b <- 1
f <- function(x) prob.cut(x, a, b)$prob - 0.95
o <- uniroot(f, c(1e-03, dgamma(b * (a - 1), shape=a, scale=b)))
curve(dgamma(x, shape=a, scale=b), xlim=c(0, 7))
oo <- prob.cut(o$root, a, b); print(oo)
abline(h=o$root, v=oo$interval, lty=3)
Problem 5.
Y <- c(2, 5, 3, 3, 1, 5, 2, 2, 1, 1, 1, 2, 3, 4, 2, 2, 3, 2, 1, 0, 0, 1,
0, 0, 0, 0)
n <- length(Y)
x1 <- c(0.720, 0.850, 1.120, 1.345, 1.290, 1.260, 1.015, 0.870, 0.750,
0.605, 0.570, 0.540, 0.720, 0.790, 0.840, 0.995, 1.030, 0.975,
1.070, 1.190, 1.290, 1.235, 1.340, 1.440, 1.450, 1.510)
x2 <- c(0.22, 0.17, 0.15, 0.20, 0.59, 0.64, 0.84, 0.87, 0.94, 0.99,
0.92, 1.00, 0.99, 1.06, 1.00, 0.88, 0.82, 0.82, 0.76, 0.66,
0.65, 0.59, 0.56, 0.51, 0.42, 0.44)
x <- matrix(c(x1, x2), nrow=n, ncol=2, byrow=FALSE)
loglik <- function(a) {
  p <- as.numeric(x %*% a)
  if(any(p <= 0)) out <- -Inf else out <- -sum(p) + sum(Y * log(p))
  return(out)
}
# Part (c)
library(MASS)  # for ginv(), used in oil.newton below

oil.newton <- function(a, cov=FALSE, eps=1e-08, maxiter=100) {
  # Newton's method for the Poisson regression log-likelihood
  t <- 0
  A <- matrix(a, nrow=2)
  repeat {
    t <- t + 1
    d <- as.numeric(x %*% a)
    g <- t(x) %*% (Y / d - 1)                        # gradient
    gg <- matrix(0, 2, 2)                            # second derivative matrix
    gg[1, 1] <- -sum(Y * x1**2 / d**2)
    gg[1, 2] <- gg[2, 1] <- -sum(Y * x1 * x2 / d**2)
    gg[2, 2] <- -sum(Y * x2**2 / d**2)
    ggi <- ginv(gg)
    a.new <- a - ggi %*% g
    A <- cbind(A, a.new)
    if(mean(abs(a.new - a)) < eps || t >= maxiter) break
    a <- a.new
  }
  if(cov) cov <- -ggi else cov <- NULL
  return(list(iter=t, est=t(A), cov=cov))
}
oil.scoring <- function(a, cov=FALSE, eps=1e-08, maxiter=100) {
  # Fisher scoring for the Poisson regression log-likelihood
  t <- 0
  A <- matrix(a, nrow=2)
  repeat {
    t <- t + 1
    d <- as.numeric(x %*% a)
    g <- t(x) %*% (Y / d - 1)                        # gradient
    fi <- matrix(0, 2, 2)                            # negative Fisher information
    fi[1, 1] <- -sum(x1**2 / d)
    fi[1, 2] <- fi[2, 1] <- -sum(x1 * x2 / d)
    fi[2, 2] <- -sum(x2**2 / d)
    fii <- solve(fi)
    a.new <- a - fii %*% g
    A <- cbind(A, a.new)
    if(mean(abs(a.new - a)) < eps || t >= maxiter) break
    a <- a.new
  }
  if(cov) cov <- -fii else cov <- NULL
  return(list(iter=t, est=t(A), cov=cov))
}
# Part (d)
o.newton <- oil.newton(c(0.5, 0.5), cov=TRUE)
o.score <- oil.scoring(c(0.5, 0.5), cov=TRUE)
# Part (e)
oil.steep <- function(a, eps=1e-08, maxiter=100) {
  # Steepest ascent with step halving to guarantee an uphill move
  t <- 0
  ll.a <- loglik(a)
  A <- matrix(a, nrow=2)
  repeat {
    b <- 1
    t <- t + 1
    d <- as.numeric(x %*% a)
    g <- t(x) %*% (Y / d - 1)                        # gradient
    ascent <- FALSE
    while(!ascent) {
      a.new <- a + b * g
      ll.new <- loglik(a.new)
      if(ll.new < ll.a) b <- b / 2 else ascent <- TRUE
    }
    A <- cbind(A, a.new)
    if(mean(abs(a.new - a)) < eps || t >= maxiter) break
    a <- a.new
    ll.a <- ll.new
  }
  return(list(iter=t, est=t(A)))
}
o.steep <- oil.steep(c(0.5, 0.5))
# Part (f)
f <- function(a) -loglik(a)
optim(c(0.5, 0.5), f, method="BFGS")
# Part (g)
a1 <- a2 <- seq(0.05, 2, len=150)
LL <- 0 * outer(a1, a2)
for(i in seq_along(a1)) {
  for(j in seq_along(a2)) LL[i, j] <- loglik(c(a1[i], a2[j]))
}
contour(a1, a2, LL, col="gray", nlevels=50, drawlabels=FALSE,
xlab=expression(alpha[1]), ylab=expression(alpha[2]))
points(1.097, 0.938, pch="*")
points(0.5, 0.5)
o.newton <- oil.newton(c(0.5, 0.5))
o.score <- oil.scoring(c(0.5, 0.5))
o.steep <- oil.steep(c(0.5, 0.5))
lines(x=o.newton$est[,1], y=o.newton$est[,2])
lines(x=o.score$est[,1], y=o.score$est[,2], lty=2)
lines(x=o.steep$est[,1], y=o.steep$est[,2], lty=3)
points(1.5, 1.5)
o.newton <- oil.newton(c(1.5, 1.5))
o.score <- oil.scoring(c(1.5, 1.5))
o.steep <- oil.steep(c(1.5, 1.5))
lines(x=o.newton$est[,1], y=o.newton$est[,2])
lines(x=o.score$est[,1], y=o.score$est[,2], lty=2)
lines(x=o.steep$est[,1], y=o.steep$est[,2], lty=3)
points(0.5, 1.5)
o.newton <- oil.newton(c(0.5, 1.5))
o.score <- oil.scoring(c(0.5, 1.5))
o.steep <- oil.steep(c(0.5, 1.5))
lines(x=o.newton$est[,1], y=o.newton$est[,2])
lines(x=o.score$est[,1], y=o.score$est[,2], lty=2)
lines(x=o.steep$est[,1], y=o.steep$est[,2], lty=3)
Problem 6.
B <- c(2, 47, 192, 256, 768, 896, 120, 896, 1184, 1024)
D <- c(0, 8, 28, 41, 63, 79, 97, 117, 135, 154)
N0 <- B[1]
n <- length(B)
# Part (a)
gauss.newton <- function(y, f, df, x, eps=1e-08, maxiter=1000, ...) {
  # Gauss-Newton for nonlinear least squares: y is the response vector, f the mean
  # function, df a function returning the n x p matrix of partial derivatives, and
  # x the starting value
  if(!exists("ginv")) library(MASS)
  fx <- f(x, ...)
  dfx <- df(x, ...)
  t <- 0
  repeat {
    t <- t + 1
    x.new <- x + ginv(t(dfx) %*% dfx) %*% t(dfx) %*% (y - fx)
    if(mean(abs(x.new - x)) < eps || t >= maxiter) {
      if(t >= maxiter) warning("Maximum number of iterations reached!")
      break
    }
    x <- x.new
    fx <- f(x.new, ...)
    dfx <- df(x.new, ...)
  }
  return(list(x=as.numeric(x.new), fx=f(x.new, ...), iter=t))
}
f <- function(u) {
  K <- u[1]
  r <- u[2]
  return(K * N0 / (N0 + (K - N0) * exp(-r * D)))
}
df <- function(u) {
  fu <- f(u)
  K <- u[1]
  r <- u[2]
  o1 <- fu * (1 - fu * exp(-r * D)) / K
  o2 <- fu**2 * (K - N0) * D * exp(-r * D) / K / N0
  return(matrix(c(o1, o2), nrow=n, ncol=2, byrow=FALSE))
}
o.gn <- gauss.newton(B, f, df, c(800, 0.1)); print(o.gn)
plot(D, B, xlab="Days", ylab="Number of Beetles")
g <- function(t) o.gn$x[1] * N0 / (N0 + (o.gn$x[1] - N0) * exp(-o.gn$x[2] * t))
curve(g, add=TRUE)
k <- seq(750, 900, by=5)
r <- seq(0.05, 0.2, by=0.025)
Q <- 0 * outer(k, r)
for(i in seq_along(k)) {
  for(j in seq_along(r)) Q[i, j] <- -sum((B - f(c(k[i], r[j])))**2)
}
plot(x=0, y=0, xlim=range(k), ylim=range(r), xlab=expression(kappa),
     ylab=expression(rho), main="")
contour(k, r, Q, lty="solid", add=TRUE, nlevels=15)
# Part (b)
F <- function(u) {
  K <- u[1]
  r <- u[2]
  fu <- K * N0 / (N0 + (K - N0) * exp(-r * D))
  return(sum((B - fu)**2))
}
dF <- function(u) {
  K <- u[1]
  r <- u[2]
  fu <- K * N0 / (N0 + (K - N0) * exp(-r * D))
  o1 <- fu * (1 - fu * exp(-r * D)) / K
  o2 <- fu**2 * (K - N0) * D * exp(-r * D) / K / N0
  dfu <- matrix(c(o1, o2), nrow=2, ncol=n, byrow=TRUE)
  return(2 * dfu %*% (B - fu))
}
ddF <- function(u) {
  # numerical second derivative matrix, via central differences of dF
  H <- matrix(0, 2, 2)
  h <- .Machine$double.eps**(1 / 3)
  for(i in 1:2) {
    for(j in 1:2) {
      u2 <- u1 <- u
      u2[j] <- u2[j] + h
      u1[j] <- u1[j] - h
      H[i, j] <- (dF(u2) - dF(u1))[i, 1] / 2 / h
    }
  }
  H <- 0.5 * H + 0.5 * t(H)  # to guarantee symmetry
  return(H)
}
o.nr <- mv.newton(dF, ddF, c(800, 0.1)) # ’mv.newton’ from course website
print(o.nr)
g <- function(t) o.nr$x[1] * N0 / (N0 + (o.nr$x[1] - N0) * exp(-o.nr$x[2] * t))
curve(g, col=2, add=TRUE)
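# The 'mv.newton' function called above is from the course website and is not reproduced
# in these solutions.  A minimal multivariate sketch with the assumed interface
# (arguments: gradient function, Hessian function, starting value; returns a list whose
# component $x holds the final iterate) is:
mv.newton <- function(g, h, x0, tol = 1e-08, maxiter = 100) {
  x <- x0
  for(t in 1:maxiter) {
    x.new <- x - solve(h(x), g(x))   # Newton step: solve H d = g, move to x - d
    if(mean(abs(x.new - x)) < tol) break
    x <- x.new
  }
  return(list(x = as.numeric(x.new), iter = t))
}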