Stata Walkthrough #3: Sampling Distributions

We are going to create a bunch of simulated random samples. Here’s the scenario: X is a random variable which (in truth) is normally distributed with mean zero and standard deviation one. Each of our samples will consist of ten observations of X, which we will call x1, x2, x3, x4, x5, x6, x7, x8, x9, and x10. We will create 1000 samples like this. Keep in mind that for this experiment, each “observation” (row) in Stata is a different sample. In reality, as a researcher, you would have only one of these samples. However, what we are trying to see is what would happen if a thousand researchers each collected one random sample, and each researcher created his own estimate of the mean and a confidence interval for this estimate.[1]

[1] This type of experiment is often called a “Monte Carlo” simulation, and we use it to verify that statistical theory is correct. In this case, the prediction is that the 95% confidence interval will contain the true mean 95% of the time. In a Monte Carlo simulation we create a bunch of random samples for which we know the true value of the parameters. We then estimate the parameter for each of the samples, and see whether the prediction holds for these samples.

Just as a note, any command listed in italics is something that I will not expect you to commit to memory.

First, create your 1000 samples:

    drawnorm x1 x2 x3 x4 x5 x6 x7 x8 x9 x10, n(1000)

Each researcher has been asked to come up with an estimate of the mean, and a confidence interval for that estimate. You can calculate the mean of each researcher’s sample:

    gen xbar = (x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10)/10

You can look at the distribution of xbar. Almost never is it exactly zero:

    list xbar

However, theory tells us that the average of these many sample means should be close to zero: $\mu_{\bar{X}} = \mu_X$. We can check that this is true:
    summ xbar

The average across these 1000 samples might not be exactly zero, but it should be pretty darned close. Also, we know that the standard deviation of xbar should be $\sigma_{\bar{X}} = \sigma_X / \sqrt{N} = 1 / \sqrt{10} = 0.3162$. This should also match closely with your standard deviation of xbar. Here’s what I got for my 1000 researchers:

    . summ xbar

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
            xbar |      1000    .0082488    .3030376  -1.314279   .9897205

Next, each researcher will calculate a 95% confidence interval around xbar, which is $\bar{X} \pm t_{0.05/2} \times (s_X / \sqrt{N - 1})$, according to the formula on p. 236 of the textbook. (Why is it $t_{0.05/2}$? We want the 95% confidence interval, and $0.05 = (100 - 95) / 100$.) In order to calculate this interval, each researcher needs to know $s_X^2$ and $s_X$ for his sample:

    gen varx = ( (x1-xbar)^2 + (x2-xbar)^2 + (x3-xbar)^2 + (x4-xbar)^2 + (x5-xbar)^2 + (x6-xbar)^2 + (x7-xbar)^2 + (x8-xbar)^2 + (x9-xbar)^2 + (x10-xbar)^2 )/(9)
    gen sx = varx^0.5

(Remember not to confuse $s_X$ and $\sigma_{\bar{X}}$; they are both standard deviations, but of different random variables.)

When we have $9 = N - 1$ “degrees of freedom” in our sample, the critical value of $t_{0.025}$ is 2.262, according to the table in Appendix II. You can also get Stata to show you this value:

    display invttail(9,0.025)

Anyhow, the final task is for each researcher to form a confidence interval. He will set the lower and upper bounds as:

    gen lowerx = xbar - 2.262 * sx/9^0.5
    gen upperx = xbar + 2.262 * sx/9^0.5

This is the 95% confidence interval. This (supposedly) means that for 95% of these samples, the true mean (zero) will lie within this interval. Let’s see if this is true.
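Before checking coverage, it can be reassuring to confirm that the hand-typed formulas did what was intended. The following is a minimal sketch, assuming the variables x1 through x10, xbar, and sx already exist in memory and that x1 through x10 are stored next to each other in that order (so that the range x1-x10 picks up exactly those ten). The names xbar_check, sx_check, diff_mean, diff_sd, and tcrit are made up for this illustration and do not appear in the walkthrough.

```stata
* Row-wise mean and standard deviation computed by Stata's egen functions
egen xbar_check = rowmean(x1-x10)
egen sx_check = rowsd(x1-x10)

* Differences from the hand-built versions; both should be zero
* apart from rounding error
gen diff_mean = abs(xbar - xbar_check)
gen diff_sd = abs(sx - sx_check)
summ diff_mean diff_sd

* Keep the critical value in a scalar instead of retyping 2.262
scalar tcrit = invttail(9, 0.025)
display tcrit
```

If either difference is noticeably larger than zero, the most likely culprit is a typo in one of the long gen commands.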
We’ll create a dummy variable that equals one if the researcher’s confidence interval contains the true value:

    gen containszero = (lowerx < 0 & 0 < upperx)

Now let’s look at the distribution of this variable:

    tab containszero

You should find that about 95% of researchers gave confidence intervals that did, in fact, contain the true mean (zero). In my dataset:

    containszer |
              o |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         48        4.80        4.80
              1 |        952       95.20      100.00
    ------------+-----------------------------------
          Total |      1,000      100.00

You might have slightly different percentages, but approximately 95% of your researchers should have gotten it right.
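To get the same numbers every time you run the experiment, the steps can be collected into a do-file with a fixed random-number seed. This is only a sketch of one way to do it: the seed value 12345 and the scalar name tcrit are arbitrary choices made for this illustration, and the formulas are copied from the steps above without change.

```stata
* Sketch of the whole experiment as a do-file.
* Assumption: the seed 12345 is arbitrary; any fixed value makes the run repeatable.
clear
set seed 12345

* 1000 samples (rows), each with ten independent standard normal draws
drawnorm x1 x2 x3 x4 x5 x6 x7 x8 x9 x10, n(1000)

* Each researcher's sample mean and sample standard deviation
gen xbar = (x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10)/10
gen varx = ( (x1-xbar)^2 + (x2-xbar)^2 + (x3-xbar)^2 + (x4-xbar)^2 + (x5-xbar)^2 + (x6-xbar)^2 + (x7-xbar)^2 + (x8-xbar)^2 + (x9-xbar)^2 + (x10-xbar)^2 )/(9)
gen sx = varx^0.5

* 95% confidence interval, same formula as in the walkthrough
scalar tcrit = invttail(9, 0.025)
gen lowerx = xbar - tcrit * sx/9^0.5
gen upperx = xbar + tcrit * sx/9^0.5

* The mean of the dummy is the share of intervals containing zero
gen containszero = (lowerx < 0 & 0 < upperx)
summ containszero
display "Coverage: " r(mean)

* Rough simulation standard error of that share if true coverage is 0.95
display sqrt(0.95 * 0.05 / 1000)
```

The last line evaluates to roughly 0.007, which gives a sense of what “approximately 95%” means here: with 1000 samples, a share a percentage point or so above or below 95% is ordinary simulation noise.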