WEBVTT
Kind: captions
Language: en
00:00:17.949 --> 00:00:30.370
Welcome, students, to the 7th lecture of the
MOOC series on statistical inference. In
00:00:30.370 --> 00:00:42.000
the last 6 lectures, I have covered the very
basics of statistical inference, including
00:00:42.000 --> 00:00:52.660
simple random sampling, with replacement and
without replacement, and also some probability
00:00:52.660 --> 00:01:06.640
distributions, namely chi square, t and F, which
are very important from the inference point
00:01:06.640 --> 00:01:18.970
of view, as we will see later.
Now, let us look at some basics of statistical
00:01:18.970 --> 00:01:51.600
inference. You know that our aim is to estimate
some population parameters
00:01:51.600 --> 00:02:32.680
with the help of some sample statistic.
So, if theta is the parameter of the population
00:02:32.680 --> 00:02:58.340
such as the mean or the variance,
and we want to estimate the value of theta
00:02:58.340 --> 00:03:17.230
, then what do we do? First we take a sample
x 1 x 2 up to xn.
00:03:17.230 --> 00:03:43.749
Then we compute the value of an appropriate
statistic.
00:03:43.749 --> 00:04:14.090
Let t x 1 x 2 xn be a statistic to estimate
theta . Now there can be
00:04:14.090 --> 00:04:26.550
many different functions
00:04:26.550 --> 00:04:47.820
of x 1 x 2 xn that we can compute. Therefore,
the question is how to choose an appropriate
00:04:47.820 --> 00:05:20.250
statistic .
Therefore, a statistic t of x 1
00:05:20.250 --> 00:05:41.640
up to x n should have some desirable properties.
Let me talk about the simplest property,
00:05:41.640 --> 00:05:55.030
and the easiest to understand.
00:05:55.030 --> 00:06:02.460
And many of you might have rightly guessed:
the property that I am talking about is
00:06:02.460 --> 00:06:40.130
unbiasedness. What does it mean? A statistic
t x 1 up to xn is said to be unbiased
00:06:40.130 --> 00:07:02.360
for the parameter theta if the expected value of
t x 1 to xn is equal to theta; that is, our
00:07:02.360 --> 00:07:11.930
aim is to estimate the parameter theta, we
have taken a sample x 1 x 2 up to xn, we compute
00:07:11.930 --> 00:07:23.330
the value of t, and if the expected value of that
statistic is equal to theta, then we call
00:07:23.330 --> 00:07:31.920
that particular statistic to be unbiased for
theta, and that value that we compute out
00:07:31.920 --> 00:07:45.930
of the sample taken may be considered an appropriate
value for theta or an estimate for theta.
00:07:45.930 --> 00:08:06.410
Have we seen such an unbiased estimator?
00:08:06.410 --> 00:08:23.370
What do you think? Yes, we have. For example,
consider simple random sampling with replacement
00:08:23.370 --> 00:08:52.160
from a finite population
X 1 X 2 up to X N. We know that we take a sample
00:08:52.160 --> 00:09:06.590
x 1 x 2 xn.
Therefore, expected value of x 1 is equal
00:09:06.590 --> 00:09:33.680
to 1 by capital N into X 1 plus X 2 up to X N,
which is equal to mu, the population mean. Therefore, x 1 is unbiased
00:09:33.680 --> 00:10:09.540
for mu. In a similar way, each xi, for i equal
to 2 to n, is also unbiased for mu, because
00:10:09.540 --> 00:10:15.580
the expected value of each xi is going to be the population
mean.
00:10:15.580 --> 00:10:25.210
And this is true not only for simple random sampling
with replacement; even if we sample without replacement,
00:10:25.210 --> 00:10:32.750
we have seen earlier that each one of them
will actually be an unbiased estimator for
00:10:32.750 --> 00:10:40.450
mu, because each xi, irrespective of whether
it is with replacement or without replacement,
00:10:40.450 --> 00:10:59.209
will take the values X 1, X 2, up to X capital
N, each with probability 1 by capital N. Not only this
00:10:59.209 --> 00:11:12.840
, can you find some other unbiased estimators
for mu? Let me give you a few. Consider x 1
00:11:12.840 --> 00:11:21.280
plus 2 x 2, divided by 3.
What is going to be its expectation? Its
00:11:21.280 --> 00:11:28.540
expectation is: for this x 1 we will get mu,
from this x 2 we will get 2
00:11:28.540 --> 00:11:40.260
mu. So, the sum is 3 mu; divided by 3, it is equal
to mu. In a similar way, let us consider
00:11:40.260 --> 00:11:57.290
x 1 plus x 3 plus 2 x 5 plus 6 x 10, all divided
by 10. What will be its expectation? So,
00:11:57.290 --> 00:12:04.680
the expectation of x 1 will be mu, x 3 will give
another mu, 2 x 5 will give 2 mu, and 6 x 10
00:12:04.680 --> 00:12:15.270
will give 6 mu. So, the sum is going
to be 10 mu; that divided by 10 is equal to
00:12:15.270 --> 00:12:30.420
mu.
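As an aside (not part of the lecture), the unbiasedness of an estimator like x 1 plus 2 x 2, divided by 3, can be checked with a quick simulation sketch; the population values below are made up for illustration.

```python
import random

random.seed(5)
population = list(range(1, 11))          # illustrative finite population
mu = sum(population) / len(population)   # population mean, 5.5 here
trials = 100_000

total = 0.0
for _ in range(trials):
    # SRSWR: each draw is uniform over the population, independently.
    x1, x2 = random.choice(population), random.choice(population)
    total += (x1 + 2 * x2) / 3

# The average of the estimator over many trials should be close to mu.
print(round(mu, 2), round(total / trials, 2))
```
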
In fact, if we consider
00:12:30.420 --> 00:12:51.610
sigma wi xi, i equal to 1 to n, such that
sigma wi is equal to 1, then the expected value
00:12:51.610 --> 00:13:15.650
of sigma wi xi is equal to mu, that is very
clear. Therefore, each such combination
00:13:15.650 --> 00:13:30.040
gives
an unbiased estimator
00:13:30.040 --> 00:13:41.110
for mu.
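The general claim, that any weighted combination with weights summing to 1 is unbiased under SRSWR, can likewise be sketched in a short simulation (again with made-up, illustrative values):

```python
import random

random.seed(0)
population = [3.0, 7.0, 1.0, 9.0, 5.0, 4.0, 8.0, 2.0, 6.0, 10.0]
mu = sum(population) / len(population)
weights = [0.2, 0.3, 0.5]   # arbitrary weights that sum to 1
trials = 100_000

total = 0.0
for _ in range(trials):
    sample = [random.choice(population) for _ in weights]  # SRSWR draws
    total += sum(w * x for w, x in zip(weights, sample))

# E[sum wi*xi] = (sum wi) * mu = mu, so the trial average should sit near mu.
print(round(mu, 2), round(total / trials, 2))
```
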
Thus we can see that we can find many different
00:13:41.110 --> 00:14:02.750
unbiased estimators. Therefore, the question
is which of the infinitely many
00:14:02.750 --> 00:14:25.110
unbiased estimators
we should use as an estimator for mu.
00:14:25.110 --> 00:14:38.170
The most important concept here is the variance
of the estimator.
00:14:38.170 --> 00:15:21.731
Under SRSWR, all the
samples are independent; therefore, the variance
00:15:21.731 --> 00:15:45.800
of sigma wi xi, i equal to 1 to n, is equal
to sigma wi square, i equal to 1 to n, into
00:15:45.800 --> 00:16:09.990
sigma square, where sigma square is the population
variance. So, the question is how do we choose
00:16:09.990 --> 00:16:20.220
the weights of the different samples, so
that the linear combination sigma wi xi is an unbiased
00:16:20.220 --> 00:16:34.560
estimator for mu, and the variance of sigma wi
xi is minimum.
00:16:34.560 --> 00:16:53.470
Consider n is equal to 2.
Therefore we want to choose w 1 and w 2 such
00:16:53.470 --> 00:17:13.299
that w 1 plus w 2 is equal to 1, and w 1
square plus w 2 square is minimum. Do we know
00:17:13.299 --> 00:17:25.829
the answer? We know, but let me still work
it out. So, we are minimizing w 1 square plus
00:17:25.829 --> 00:17:36.710
w 2 square. Putting w 2 equal to 1
minus w 1 from here, this is equal to w 1 square
00:17:36.710 --> 00:17:46.960
plus 1 minus w 1, whole square.
In order to minimize this, let us differentiate
00:17:46.960 --> 00:17:59.899
it with respect to
00:17:59.899 --> 00:18:16.830
w 1, what do we get? 2 times w 1 plus 2 times
1 minus w 1 into minus 1, because of this minus
00:18:16.830 --> 00:18:40.090
sign, is equal to 0.
Or 2 times w 1 plus 2 times w 1 minus 2 is
00:18:40.090 --> 00:18:55.480
equal to 0, or 4 times w 1 is equal to 2,
or w 1 is equal to half. Therefore, w 2 is
00:18:55.480 --> 00:19:05.509
equal to half as well.
What does it tell you? It tells us that if
00:19:05.509 --> 00:19:32.429
we take 2 samples , then
the weights should be 1 by 2 and 1 by 2; that
00:19:32.429 --> 00:19:53.869
is, x 1 plus x 2 by 2 has minimum variance.
And what is going to be that variance? We
00:19:53.869 --> 00:20:03.809
know that that variance is going to be sigma
square by 2. Or in other words if I take 2
00:20:03.809 --> 00:20:10.889
samples, then the arithmetic mean of the sample
values is going to be the minimum variance
00:20:10.889 --> 00:20:22.359
unbiased estimator for mu.
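The minimization just worked out by calculus can also be checked numerically; this small sketch scans w 1 on a grid instead of differentiating:

```python
# For n = 2 under SRSWR, Var(w1*x1 + w2*x2) = (w1^2 + w2^2) * sigma^2 with
# w2 = 1 - w1, so it suffices to minimize w1^2 + (1 - w1)^2 over w1.
def sum_of_squares(w1):
    w2 = 1.0 - w1
    return w1 ** 2 + w2 ** 2

grid = [i / 1000 for i in range(1001)]   # candidate weights in [0, 1]
best_w1 = min(grid, key=sum_of_squares)

# The minimum sits at w1 = 0.5, where w1^2 + w2^2 = 0.5.
print(best_w1, sum_of_squares(best_w1))  # 0.5 0.5
```
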
Similarly, we can show
00:20:22.359 --> 00:20:49.899
that if we take n samples x 1 x 2 up to
xn, then the linear combination
00:20:49.899 --> 00:21:18.910
that gives the minimum variance
is x 1 plus x 2 up to xn, all divided by n; that is, x bar
00:21:18.910 --> 00:21:35.009
that is, the sample mean. Therefore, the sample
mean is not only an unbiased estimator for mu,
00:21:35.009 --> 00:21:46.769
it also has minimum variance among all
linear combinations of the sample values x
00:21:46.769 --> 00:22:11.230
1 x 2 xn .
If we consider SRSWOR we have found what is
00:22:11.230 --> 00:22:24.010
the variance of x bar? Do you remember? It
is sigma square by n into 1 minus n minus
00:22:24.010 --> 00:22:45.639
1 upon capital N minus 1. What does it mean? It
means that as n increases the variance of
00:22:45.639 --> 00:23:07.629
x bar is getting reduced .
00:23:07.629 --> 00:23:15.659
Because as n increases, sigma square by n decreases;
and not only that: as n minus
00:23:15.659 --> 00:23:22.519
1 increases, 1 minus n minus 1 upon capital
N minus 1 also decreases. Therefore, the overall
00:23:22.519 --> 00:23:29.769
variance keeps on reducing as we keep on taking
more and more samples.
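The shrinking variance can be tabulated directly from the formula in the lecture; the capital N and sigma square below are illustrative choices, not values from the lecture:

```python
# Var(x_bar) under SRSWOR = (sigma^2 / n) * (1 - (n - 1) / (N - 1))
N = 100          # population size (illustrative)
sigma_sq = 4.0   # population variance (illustrative)

def var_xbar(n):
    return (sigma_sq / n) * (1 - (n - 1) / (N - 1))

variances = [var_xbar(n) for n in range(1, N + 1)]

# The sequence decreases strictly, and taking the whole population (n = N)
# drives the variance to exactly 0, as argued in the lecture.
print(variances[0], variances[-1])   # 4.0 0.0
```
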
00:23:29.769 --> 00:23:44.129
In particular, if n is equal to capital N;
that means, my population size is capital
00:23:44.129 --> 00:23:53.009
N, instead of taking small n many samples,
here I am saying that
00:23:53.009 --> 00:24:01.479
I am taking capital N many samples, and since
it is SRSWOR. What does it mean? It means
00:24:01.479 --> 00:24:09.649
I have taken the entire population as my sample.
And therefore, the sample mean is same as
00:24:09.649 --> 00:24:22.350
the population mean, which is mu, and the variance:
here n is capital N, so n minus 1 upon capital N minus
1 is 1, and this part becomes
00:24:22.350 --> 00:24:28.380
0; therefore, the variance becomes 0, which is
understandable.
00:24:28.380 --> 00:24:36.519
Because if I consider the entire population,
and take its mean, then it has to be the same
00:24:36.519 --> 00:24:44.499
as the population mean, therefore, there is
no deviation from the population mean and
00:24:44.499 --> 00:24:53.549
therefore, the variance is going to be 0; that
is, there is no dispersion, and
00:24:53.549 --> 00:25:02.909
we get the sample mean equal to population
mean and therefore, variance of sample mean
00:25:02.909 --> 00:25:14.359
is 0.
Now, you may ask me why do we need small variance
00:25:14.359 --> 00:25:25.100
?
Because variance is a measure of dispersion.
00:25:25.100 --> 00:25:37.070
So, the smaller the variance, the closer my
estimate is to the parameter that
00:25:37.070 --> 00:26:00.279
it is estimating. This can be seen from Chebyshev's
inequality.
00:26:00.279 --> 00:26:08.729
Are you familiar with Chebyshev's inequality?
I hope all of you have seen, in the first course
00:26:08.729 --> 00:26:15.659
that you might have done on probability, the
concept of Chebyshev's inequality. If you do
00:26:15.659 --> 00:26:25.149
not know it, I am keeping it as a practice problem
in tutorial sheet one; you should try
00:26:25.149 --> 00:26:32.659
and prove Chebyshev's inequality. This is not
to be graded by us; it is for your own knowledge,
00:26:32.659 --> 00:26:38.549
that you try to prove Chebyshev's inequality
by studying some material.
00:26:38.549 --> 00:26:44.929
If you cannot, we will upload the solution.
But Chebyshev's inequality says that
00:26:44.929 --> 00:27:02.860
if the expectation of X is equal to mu, then
the probability that modulus of X minus mu is greater
00:27:02.860 --> 00:27:15.889
than epsilon is less than or equal to the variance
of X upon epsilon square; that means, mu is
00:27:15.889 --> 00:27:25.960
the expectation, and the probability that X minus mu is
greater than epsilon: if we take this epsilon,
00:27:25.960 --> 00:27:33.929
that means that the probability that X will
lie outside this limit, that
00:27:33.929 --> 00:27:39.879
is going to be less than or equal to the variance
of X upon epsilon square.
00:27:39.879 --> 00:27:48.679
Therefore, as the variance of X gets smaller,
the probability that it is going beyond that
00:27:48.679 --> 00:27:56.190
gets smaller; or in other words, it says that
X will remain within epsilon distance
00:27:56.190 --> 00:28:04.460
of the mean, and that probability increases as
the variance of X gets smaller.
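Chebyshev's inequality can be checked empirically; this is a sketch using a standard normal X and epsilon equal to 2 (the bound is loose, which is expected):

```python
import random

random.seed(1)
mu, sigma = 0.0, 1.0
epsilon = 2.0
trials = 100_000

# Empirical estimate of P(|X - mu| > epsilon) for X ~ N(mu, sigma^2).
exceed = sum(1 for _ in range(trials)
             if abs(random.gauss(mu, sigma) - mu) > epsilon)
tail_prob = exceed / trials

chebyshev_bound = sigma ** 2 / epsilon ** 2   # Var(X) / epsilon^2 = 0.25
# The empirical tail probability should sit well below the bound.
print(tail_prob, chebyshev_bound)
```
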
00:28:04.460 --> 00:28:32.239
Now, in practice,
mostly we have a very large population, and we are
00:28:32.239 --> 00:28:43.710
sampling
00:28:43.710 --> 00:28:58.320
x 1 x 2 up to xn from
the population. Obviously, if the population
00:28:58.320 --> 00:29:08.779
is large, we cannot take a very small sample.
We have to take a larger sample; otherwise, we
00:29:08.779 --> 00:29:14.139
cannot get a meaningful estimate of the population
parameter.
00:29:14.139 --> 00:29:21.919
For example, if you want to compute the average
income of the people of Delhi, we cannot just
00:29:21.919 --> 00:29:28.129
take a few samples and, based on the average
of the sample, say that that is going to
00:29:28.129 --> 00:29:35.759
be the average income of the Delhi population;
no, that will not work. We have to take a meaningful
00:29:35.759 --> 00:29:43.759
sample size, so that it represents the population.
Therefore, what will happen is that the sample size
00:29:43.759 --> 00:29:58.779
is going to increase , n increases, right?
And if n increases we have something called
00:29:58.779 --> 00:30:19.900
central limit theorem .
Again I expect all of you have some basic
00:30:19.900 --> 00:30:25.859
idea of the central limit theorem. I am stating
one simple version of this theorem; there is
00:30:25.859 --> 00:30:32.190
not a single central limit theorem, there are
different versions. But the simple version
00:30:32.190 --> 00:31:02.059
that we will be using is by Lindeberg and
Levy. And it says: let X 1 up to Xn be n independent
00:31:02.059 --> 00:31:30.409
random variables, all of which have the same
distribution. Let the expected value of Xi be
00:31:30.409 --> 00:31:40.200
equal to mu and the variance of Xi be equal to
sigma square for all i.
00:31:40.200 --> 00:31:50.009
What does it mean? It means that each of the
Xi has the same expectation, each of the Xi
00:31:50.009 --> 00:32:05.870
has the same variance .
Then consider S equal to sigma Xi, i
00:32:05.870 --> 00:32:28.330
equal to 1 to n. Let me call
it Sn. What is Sn? Sn is the sum of the n samples.
00:32:28.330 --> 00:32:42.769
Now if n increases, the central limit theorem
says that Sn minus n mu upon root over
00:32:42.769 --> 00:33:19.190
n sigma converges to normal 0 1 in distribution.
That is, if we call this to be Tn, the
00:33:19.190 --> 00:33:36.960
limit as n goes to infinity of the probability that Tn is less
than x is equal to the integral from minus infinity to x of f t
00:33:36.960 --> 00:33:56.370
dt, where f is the pdf of normal 0 1. So, what
does it mean? It means that if I take a sample of size
00:33:56.370 --> 00:34:05.330
n, then as n increases Sn minus n mu upon
root n sigma converges to normal 0 1.
00:34:05.330 --> 00:34:21.140
Or in other words, Sn therefore
converges to normal with mean n mu, and variance
00:34:21.140 --> 00:34:35.179
n sigma square; that is the advantage.
Therefore if we take more and more samples
00:34:35.179 --> 00:34:46.010
we can approximate the distribution of the
sum of the random variables using the normal distribution.
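The Lindeberg-Levy statement above can be illustrated with uniform draws; this sketch standardizes the sum Sn and checks that roughly 68 percent of its values fall within one unit of 0, as they would for a normal 0 1 variable:

```python
import math
import random

random.seed(2)
# Uniform(0, 1) draws: mu = 1/2 and sigma^2 = 1/12 (an illustrative choice).
mu, sigma = 0.5, math.sqrt(1 / 12)
n, trials = 50, 20_000

t_values = []
for _ in range(trials):
    s_n = sum(random.random() for _ in range(n))   # Sn, the sum of n draws
    t_values.append((s_n - n * mu) / (math.sqrt(n) * sigma))  # standardized

within_one = sum(1 for t in t_values if abs(t) <= 1.0) / trials
print(round(within_one, 3))   # close to 0.683 for a standard normal
```
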
00:34:46.010 --> 00:34:57.869
This is very convenient for us, because
then we can approximate many of the statistics
00:34:57.869 --> 00:35:05.950
using normal, and that is very important because
in the last few classes, we have studied normal
00:35:05.950 --> 00:35:12.990
distribution in depth, and we have also obtained
a family of distributions like chi square
00:35:12.990 --> 00:35:24.119
n, t n, F m comma n, which all depend upon the normal
distribution. That is why in large sample
00:35:24.119 --> 00:35:30.760
theory the normal distribution is of prime
importance .
00:35:30.760 --> 00:35:52.029
So, let us examine some more properties of
normal distribution .
00:35:52.029 --> 00:36:07.190
What have we seen? We have seen that if X
is normal with mu, sigma square, then a plus
00:36:07.190 --> 00:36:22.900
bX is normal with a plus b mu comma b square
sigma square. This is something that we have
00:36:22.900 --> 00:36:40.170
already proved. Now let us consider a different
problem: if X 1 and X 2 are normal 0 1 and
00:36:40.170 --> 00:36:56.510
independent
what is the distribution of
00:36:56.510 --> 00:37:09.339
X 1 plus X 2? This is very simple.
Because, we know that moment generating function
00:37:09.339 --> 00:37:24.349
of X 1 plus X 2 is equal to moment generating
function of X 1 multiplied by moment generating
00:37:24.349 --> 00:37:33.890
function of X 2.
And we know that moment generating function
00:37:33.890 --> 00:37:57.359
of X 1 is e to the power t square by 2. Similarly,
the MGF of X 2 is e to the power t square by 2. Therefore,
00:37:57.359 --> 00:38:16.309
this is equal to e to the power 2 t square
upon 2, which is equal to e to the power
00:38:16.309 --> 00:38:35.680
half into 2 t square. Now this is the MGF
of a normal population, normal with mean 0
00:38:35.680 --> 00:38:54.380
, variance is equal to 2, right ? Because
we know that MGF of normal mu sigma square
00:38:54.380 --> 00:39:04.630
is equal to e to the power mu t plus half
sigma square t square.
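That MGF formula can be sanity-checked by Monte Carlo: the average of e to the power t X over normal draws should match e to the power mu t plus half sigma square t square. The parameters below are arbitrary illustrative values.

```python
import math
import random

random.seed(6)
mu, sigma, t = 1.0, 0.8, 0.5   # illustrative parameters
trials = 200_000

# Empirical MGF: average of exp(t * X) over draws X ~ N(mu, sigma^2).
empirical = sum(math.exp(t * random.gauss(mu, sigma))
                for _ in range(trials)) / trials
theoretical = math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)
print(round(empirical, 2), round(theoretical, 2))   # the two should agree
```
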
00:39:04.630 --> 00:39:11.750
Here mu is equal to 0, so you are looking
at only e to the power half sigma square t
00:39:11.750 --> 00:39:21.940
square, and that sigma square is coming out
to be 2. Therefore, X 1 plus X 2 is distributed
00:39:21.940 --> 00:39:31.710
as normal with mean 0 and variance 2. We can
elaborate it even further.
00:39:31.710 --> 00:39:49.460
Suppose we consider
X 1 to be normal with mu 1 sigma 1 square
00:39:49.460 --> 00:40:08.380
and X 2 to be normal with mu 2 sigma 2 square; then
what is the distribution of
00:40:08.380 --> 00:40:21.589
X 1 plus X 2?
As before MGF of X 1 plus X 2 at t is equal
00:40:21.589 --> 00:40:34.680
to e to the power mu 1 t plus half sigma 1
square t square multiplied by e to the power
00:40:34.680 --> 00:40:45.859
mu 2 t plus half sigma 2 square t square.
Therefore, this is equal to e to the power
00:40:45.859 --> 00:40:58.501
mu 1 plus mu 2, into t, plus half of sigma 1 square
plus sigma 2 square, into t square.
00:40:58.501 --> 00:41:16.279
Therefore, it says that
if x 1 is normal with some arbitrary mean
00:41:16.279 --> 00:41:23.220
and arbitrary variance, and x 2 is normal
with some other arbitrary mean and arbitrary
00:41:23.220 --> 00:41:32.819
variance, and if X 1 and X 2 are independent,
00:41:32.819 --> 00:41:43.130
then x 1 plus x 2 is distributed as normal
with mean mu 1 plus mu 2 and variance sigma
00:41:43.130 --> 00:42:03.150
1 square plus sigma 2 square.
I have proved it for 2; we can show by induction
00:42:03.150 --> 00:42:25.569
that if X 1 X 2 Xn are independent such that
Xi is normal with mu i and sigma i square,
00:42:25.569 --> 00:42:39.970
then X 1 plus X 2 up to Xn is distributed
as normal with mean sigma mu i and variance
00:42:39.970 --> 00:42:47.029
equal to sigma of sigma i square, i equal to 1 to n.
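A quick simulation illustrates this result; three independent normals with made-up parameters are summed, and the sample mean and variance of the sum are compared with the sum of the mu i and the sum of the sigma i square:

```python
import random
import statistics

random.seed(3)
params = [(1.0, 2.0), (-0.5, 0.5), (3.0, 1.5)]   # illustrative (mu_i, sigma_i)
trials = 100_000

sums = [sum(random.gauss(m, s) for m, s in params) for _ in range(trials)]

expected_mean = sum(m for m, _ in params)        # 1.0 - 0.5 + 3.0 = 3.5
expected_var = sum(s ** 2 for _, s in params)    # 4.0 + 0.25 + 2.25 = 6.5
print(round(statistics.fmean(sums), 2), round(statistics.variance(sums), 2))
```
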
So, this is a very strong result, and that
00:42:47.029 --> 00:42:56.510
helps us a lot, because when we are taking
samples from a population, we know that if
00:42:56.510 --> 00:43:04.289
the sample size is large, we can approximate
it with normal; and not only that, we can find
00:43:04.289 --> 00:43:10.010
the distribution of the sum of the samples,
and therefore, from here we can calculate
00:43:10.010 --> 00:43:19.930
the distribution of the sample mean, and we can see that the sample
mean will be distributed as normal as well.
00:43:19.930 --> 00:43:27.299
Okay students, now we can understand the utility
of normal distribution. And we have already
00:43:27.299 --> 00:43:37.140
seen that chi squared, t, and F are all derived
from the normal distribution. In particular, the
00:43:37.140 --> 00:44:04.599
chi squared distribution
is very important, as it is defined as a sum
00:44:04.599 --> 00:44:19.970
of squares
of normal 0 1 variates. In the next lecture,
00:44:19.970 --> 00:44:29.940
I will be talking about the estimation of sigma
square, which is the population variance. And
00:44:29.940 --> 00:44:41.579
we will see that we can use the chi square
distribution there for estimating the population
00:44:41.579 --> 00:44:50.480
variance.
So, so far what we have studied is that if x
00:44:50.480 --> 00:45:14.490
1 x 2 up to xn are samples from a normal distribution,
then sigma xi is normal with mean is equal
00:45:14.490 --> 00:45:25.990
to n mu and variance equal
to n sigma square. Therefore,
00:45:25.990 --> 00:45:41.910
x bar which is sigma xi by n will be distributed
as
00:45:41.910 --> 00:45:49.490
normal: since I am dividing it by n, I
can divide the mean n mu by n here. That is going to
00:45:49.490 --> 00:46:07.500
be mu; and since I am dividing by n, the variance
n sigma square is going to be divided by n square, giving sigma square by n.
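Putting the pieces together, x bar from normal samples should itself look normal with mean mu and variance sigma square by n; here is a short sketch with made-up mu, sigma, and n:

```python
import random
import statistics

random.seed(4)
mu, sigma, n = 10.0, 3.0, 25    # illustrative population and sample size
trials = 50_000

xbars = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(trials)]

# Mean of x_bar should be near mu, and its variance near sigma^2 / n = 0.36.
print(round(statistics.fmean(xbars), 2), round(statistics.variance(xbars), 3))
```
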
00:46:07.500 --> 00:46:19.680
Therefore, the expectation of the sample
mean is going to be mu. Or, the sample mean is
00:46:19.680 --> 00:46:25.820
going to be an unbiased estimator for the population
mean, and as n increases, the variance will
00:46:25.820 --> 00:46:36.200
keep decreasing. So far, we have got an estimator
for the population mean. Our next target is the population
00:46:36.200 --> 00:46:45.420
variance, that is, sigma square. And therefore,
we need to find an estimate for sigma square.
00:46:45.420 --> 00:46:54.560
So, I stop here now. In the next lecture, I
shall talk about how to find an unbiased estimator
00:46:54.560 --> 00:47:00.599
for sigma square, which is the population variance.
Thank you.