WEBVTT
Kind: captions
Language: en
00:00:17.759 --> 00:00:27.710
Welcome, students, to the MOOC lecture on Statistical
Inference. This is lecture number 12. And
00:00:27.710 --> 00:00:45.570
today I will start what is called the Theory of
Estimation. As I told you in my earlier classes,
00:00:45.570 --> 00:01:02.809
the most important aspect of statistics
is statistical inference: to learn about
00:01:02.809 --> 00:01:29.310
the population parameter with the help of
sample statistics.
00:01:29.310 --> 00:01:45.729
And I have mentioned that there are two basic
ways of statistical inference, one is estimation
00:01:45.729 --> 00:02:04.179
of parameters . Here, the underlying assumption
is that the form of the distribution is known,
00:02:04.179 --> 00:02:10.940
and our job is to identify the parameters
of the distribution .
00:02:10.940 --> 00:02:21.629
And the other one is testing of
hypotheses.
00:02:21.629 --> 00:02:33.150
Here we want to check whether the sample gives
us enough evidence, so that some hypothesis
00:02:33.150 --> 00:02:45.280
can be accepted. Here, by a hypothesis we mean
some pre-assigned values of the parameters.
00:02:45.280 --> 00:03:04.610
As you can understand, now I am going to focus
on estimation. So, what is an estimator? If
00:03:04.610 --> 00:03:39.109
x 1 up to x n is a sample from a population,
our aim is to estimate the value of some parameter
00:03:39.109 --> 00:04:00.200
theta of the distribution. So, what is theta?
Theta is something such that if we know the
00:04:00.200 --> 00:04:06.489
value, we can understand the distribution
completely.
00:04:06.489 --> 00:04:20.750
For example, Bernoulli distribution . Here
P is the only unknown parameter. If we know
00:04:20.750 --> 00:04:36.000
the value of P, then we know what the distribution
is. So, in this case theta is equal to P . Similarly,
00:04:36.000 --> 00:04:46.740
for the geometric distribution,
here also the parameter is P; therefore theta
00:04:46.740 --> 00:05:04.950
is equal to P. But suppose we consider the normal distribution
with parameters mu and sigma square. Here, in order to know the
00:05:04.950 --> 00:05:21.610
distribution completely we need to know the
value of both mu and sigma square . Therefore,
00:05:21.610 --> 00:05:39.130
here theta is the pair mu, sigma, or say mu, sigma square;
or in other words we are looking at a
00:05:39.130 --> 00:05:59.900
bivariate parameter. You will often find the term
capital Theta; it is called the parameter space,
00:05:59.900 --> 00:06:09.680
the set such that small theta belongs to capital Theta.
For example, when we are looking at P the
00:06:09.680 --> 00:06:23.550
capital Theta is the interval 0 to 1 on the real
line. When we are looking at mu, mu can belong
00:06:23.550 --> 00:06:35.920
to minus infinity to infinity or on the real
line, sigma square may belong to 0 to infinity
00:06:35.920 --> 00:06:47.680
on the real line, so that is the parameter
space to which that parameter of the distribution
00:06:47.680 --> 00:07:07.460
can belong. We have seen that, if
00:07:07.460 --> 00:07:35.980
we consider a finite population
X 1, X2 up to X N . And we take sample x 1,
00:07:35.980 --> 00:07:49.110
x 2 up to x n, then expected value of each
x i is equal to mu, the population
00:07:49.110 --> 00:08:25.710
mean. Therefore, each x i can be an estimator
for theta equal to mu. Thus there can be
00:08:25.710 --> 00:08:53.610
a large number of estimators. Hence, the
question arises: which of the different estimators
00:08:53.610 --> 00:09:07.510
should we choose for our purpose?
Hence, the focus is on
00:09:07.510 --> 00:09:30.000
the properties of an estimator . So, if an
estimator satisfies those properties, then
00:09:30.000 --> 00:09:47.060
we consider it to be better suited for our
purpose. One such property, which you have already
00:09:47.060 --> 00:10:21.900
seen, is unbiasedness. Take a statistic T(x 1, x 2,
..., x n). What does it mean? It means that I have
00:10:21.900 --> 00:10:30.339
taken the samples x 1, x 2, x n. And T is
the function that is defined on the sample.
00:10:30.339 --> 00:10:51.550
So, once the sample is taken we can compute
T. It is said to be unbiased for the parameter
00:10:51.550 --> 00:11:09.360
theta if the expected value of T(x 1, ..., x n) is
equal to theta.
00:11:09.360 --> 00:11:50.170
Therefore, from my earlier example each x
i is an unbiased estimator
00:11:50.170 --> 00:12:12.580
for mu .
What about the following
00:12:12.580 --> 00:12:38.080
I have the sample x 1, x 2, ..., x n. So let me
consider: a, x 1 plus x 2 by 2; b, x 1 plus
00:12:38.080 --> 00:13:07.910
x 2 plus up to x n by n; c, x 1 plus x 2 plus x
3 by 2; d, x 1 plus 2 x 2 plus up to n x n by n
00:13:07.910 --> 00:13:21.000
into n plus 1 by 2.
Let us see their expectations, expected value
00:13:21.000 --> 00:13:35.890
of x 1 plus x 2 by 2, because of the linearity
is equal to half of expected value of x 1
00:13:35.890 --> 00:13:51.460
plus expected value of x 2 is equal to half
times mu plus mu is equal to mu . Therefore,
00:13:51.460 --> 00:14:22.190
x 1 plus x 2 by 2 is unbiased for mu. Next, b:
x 1 plus x 2 plus up to x n by n, or in other words
00:14:22.190 --> 00:14:47.240
we are looking at sample mean . Its expectation
is equal to 1 by n times, expectation of x
00:14:47.240 --> 00:15:01.110
1 plus up to expectation of x n is equal to
1 by n times n times mu is equal to mu . Therefore,
00:15:01.110 --> 00:15:24.300
sample mean is also an unbiased estimator
for mu .
00:15:24.300 --> 00:15:39.060
For c, x 1 plus x 2 plus x 3 by 2: you can now easily
understand that its expectation is equal to
00:15:39.060 --> 00:15:55.149
half into mu plus mu plus mu is equal to 3
by 2 mu. Therefore, x 1 plus x 2 plus x 3
00:15:55.149 --> 00:16:21.220
by 2 is not unbiased for mu. For d, x
1 plus 2 x 2 plus 3 x 3 up to n x n divided
00:16:21.220 --> 00:16:37.720
by n into n plus 1 by 2 . Its expectation
is equal to 1 upon n into n plus 1 by 2 multiplied
00:16:37.720 --> 00:16:57.030
by mu plus 2 mu plus up to n mu, which is equal to mu
into 1 plus 2 plus up to n upon n into n plus 1 by 2. And
00:16:57.030 --> 00:17:07.339
we know that the sum of first n natural numbers
is n into n plus 1 by 2, therefore this cancels
00:17:07.339 --> 00:17:21.679
with this . Therefore, the expectation of
x 1 plus 2 x 2 up to n x n upon n into n plus
00:17:21.679 --> 00:17:36.100
1 by 2 is equal to mu . Therefore, this is
also an unbiased estimator for the population
00:17:36.100 --> 00:17:48.990
mean mu .
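The expectations worked out above can also be checked numerically. This is only a hedged sketch: the population (normal), its mean mu = 5, sigma = 2, the sample size, and the repetition count are hypothetical choices for illustration, not from the lecture. Averaging each candidate estimator over many simulated samples should give roughly mu for the first, second, and fourth candidates, and roughly 1.5 times mu for the third.

```python
import random

def estimators(x):
    """The four candidate estimators discussed in the lecture."""
    n = len(x)
    return {
        "(x1+x2)/2": (x[0] + x[1]) / 2,
        "sample mean": sum(x) / n,
        "(x1+x2+x3)/2": (x[0] + x[1] + x[2]) / 2,
        # weights i / (n(n+1)/2), i = 1..n
        "weighted": sum((i + 1) * xi for i, xi in enumerate(x)) / (n * (n + 1) / 2),
    }

def average_estimates(mu=5.0, sigma=2.0, n=10, reps=20000, seed=0):
    """Monte Carlo average of each estimator over `reps` samples of size n."""
    rng = random.Random(seed)
    totals = {k: 0.0 for k in estimators([0.0] * n)}
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        for k, v in estimators(x).items():
            totals[k] += v
    return {k: t / reps for k, t in totals.items()}
```

With these settings, the averages for the unbiased candidates land close to 5, while (x1+x2+x3)/2 lands close to 7.5 = 1.5 mu, matching the expectations derived above.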
Another example: consider Bernoulli(P). What
00:17:48.990 --> 00:17:59.010
a Bernoulli distribution is, we know.
If I toss a coin, then P is the probability
00:17:59.010 --> 00:18:15.910
of head or mathematically . If x follows Bernoulli
of P, then x can take two values 1 with probability
00:18:15.910 --> 00:18:45.429
P, and 0 with probability 1 minus P . What
is an estimator for P. Suppose, we take
00:18:45.429 --> 00:19:16.480
n samples, then consider x 1 plus x 2 plus up to
x n, which is equal to the sample sum.
00:19:16.480 --> 00:19:35.929
We know that x 1 plus x 2 plus up to x n is distributed
as binomial with parameters n and P. Therefore, expected
00:19:35.929 --> 00:19:52.640
value of x 1 plus up to x n is equal to n P. Therefore,
expected value of x 1 plus x 2 plus x n by
00:19:52.640 --> 00:20:20.720
n is equal to n P by n, which is equal to P. Therefore,
the mean of the samples is unbiased for P. Let
00:20:20.720 --> 00:20:37.169
me warn you about one point.
Unbiasedness does not mean
00:20:37.169 --> 00:21:13.590
that the obtained value will be equal to the
parameter theta . Example suppose, we toss
00:21:13.590 --> 00:21:51.419
a coin n times, and we obtain the sample mean
after each toss . So, consider n is equal
00:21:51.419 --> 00:22:06.010
to say 1, 2, 3, 4, 5, 6 . Suppose, I am looking
at 6 tosses, and outcomes are that is x i
00:22:06.010 --> 00:22:23.230
are, say, 1, 0, 0, 1, 1, 0. Therefore, x bar
is equal to 1 after the first toss, because it is the mean of one
00:22:23.230 --> 00:22:37.710
sample; at the next point it is 0.5, at the next point
it is 1 by 3, at the next point it is again 0.5
00:22:37.710 --> 00:22:47.310
or half, at the next point it is 3 by 5, and at the last
point it is again half.
00:22:47.310 --> 00:22:58.720
So, depending upon how many tosses you have
actually done, the obtained value of the statistic
00:22:58.720 --> 00:23:10.220
can change. It is not mandatory that they
will be equal to the actual value of the parameter,
00:23:10.220 --> 00:23:19.740
but what we are saying is that, if we consider
its expectation, that is going to be equal
00:23:19.740 --> 00:23:46.669
to P. I hope the distinction is clear .
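The worked example above (tosses 1, 0, 0, 1, 1, 0) can be reproduced exactly; this small sketch computes the running sample mean after each toss, showing how the obtained value fluctuates around P even though the estimator is unbiased.

```python
from fractions import Fraction

tosses = [1, 0, 0, 1, 1, 0]  # the outcomes used in the lecture's example

def running_means(xs):
    """Sample mean after each successive observation, as exact fractions."""
    total = 0
    out = []
    for k, x in enumerate(xs, start=1):
        total += x
        out.append(Fraction(total, k))
    return out

# running_means(tosses) -> [1, 1/2, 1/3, 1/2, 3/5, 1/2]
```

These are exactly the values read out in the lecture: 1, 0.5, 1/3, 0.5, 3/5, and 0.5.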
We have also seen that, if theta is equal
00:23:46.669 --> 00:24:08.850
to sigma square is equal to population variance,
then the sample variance s square, which is equal to
00:24:08.850 --> 00:24:29.900
1 by n sigma x i minus x bar whole square
is not unbiased for sigma square. But, S square
00:24:29.900 --> 00:24:47.490
is equal to 1 upon n minus 1 sigma x i minus
x bar whole square is unbiased for sigma square.
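The bias of the 1/n sample variance can be verified without any simulation. This sketch (a Bernoulli population with hypothetical parameter p = 1/3 and n = 4, chosen only because all 2^n samples can be enumerated) computes the exact expectation of both versions: the 1/(n-1) version equals sigma square exactly, while the 1/n version equals ((n-1)/n) sigma square.

```python
from fractions import Fraction
from itertools import product

def expected_variances(n, p):
    """Exact expectations of (1/n) and (1/(n-1)) sample-variance estimators
    over all 2^n Bernoulli(p) samples; pass p as a Fraction for exactness."""
    sigma2 = p * (1 - p)  # population variance p q
    e_biased = Fraction(0)
    e_unbiased = Fraction(0)
    for sample in product([0, 1], repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= p if x == 1 else (1 - p)
        xbar = Fraction(sum(sample), n)
        ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
        e_biased += prob * ss / n
        e_unbiased += prob * ss / (n - 1)
    return e_biased, e_unbiased, sigma2

# With p = Fraction(1, 3) and n = 4: e_unbiased == sigma2 exactly,
# while e_biased == (3/4) * sigma2.
```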
00:24:47.490 --> 00:25:05.750
Therefore, given a parameter, if we want to
estimate it, we can get many different statistics,
00:25:05.750 --> 00:25:18.600
such that each of them is an unbiased estimator
for the population parameter under consideration,
00:25:18.600 --> 00:25:41.580
which we generally denote as theta .
Therefore, the question is, which of the different
00:25:41.580 --> 00:26:03.340
estimators we should choose. An important
property
00:26:03.340 --> 00:26:49.010
here is consistency. Definition: an estimator
T(x 1, x 2, ..., x n) is said to be consistent for
00:26:49.010 --> 00:27:20.780
estimating theta if probability of modulus of T(x 1, x
2, ..., x n) minus theta less than epsilon is greater
00:27:20.780 --> 00:27:44.970
than 1 minus eta for all n greater than
or equal to n naught, for given epsilon and eta
00:27:44.970 --> 00:28:04.580
greater than 0, however small they are. Let
us, understand what it means .
00:28:04.580 --> 00:28:26.770
We are given
two positive quantities epsilon and eta . They
00:28:26.770 --> 00:28:51.909
can be very very small, but greater than 0
. So, the definition says, that probability
00:28:51.909 --> 00:29:06.370
T of x 1, x 2, x n the obtained value of the
statistic from the sample of size n minus
00:29:06.370 --> 00:29:15.840
theta less than epsilon . So, this is the
event. We have the n samples, we have calculated
00:29:15.840 --> 00:29:23.669
the statistic, and we are saying that, this
statistic should be very close to the actual
00:29:23.669 --> 00:29:32.740
value of the parameter. How close? The probability that
the absolute difference between them is less
00:29:32.740 --> 00:29:42.280
than epsilon can be made
arbitrarily large, that is, greater
00:29:42.280 --> 00:29:50.980
than or equal to 1 minus eta, for all n greater
than n naught.
00:29:50.980 --> 00:30:06.220
So, what it means suppose this is the actual
parameter theta. And we have given an epsilon
00:30:06.220 --> 00:30:16.360
bound around it. And we are saying that, if
we keep on taking the samples, then there
00:30:16.360 --> 00:30:25.210
will be an integer n naught . Such that if
the sample size is greater than n naught,
00:30:25.210 --> 00:30:32.710
then the probability that the obtained value
of the statistic will remain within this interval,
00:30:32.710 --> 00:30:40.400
that probability is going to be arbitrarily
large; that is, it can be made as large as you
00:30:40.400 --> 00:30:51.070
want, greater than 1 minus eta,
where eta can be however small you want
00:30:51.070 --> 00:31:03.100
.
Obviously, this n naught depends upon both
00:31:03.100 --> 00:31:14.000
epsilon, and eta . It is not mandatory that
the same n naught will work for all values
00:31:14.000 --> 00:31:21.320
of epsilon and eta, but given epsilon and
eta we can choose an n naught, or we can find
00:31:21.320 --> 00:31:27.780
an n naught. Such that, if the number of samples
is more than n naught, then the probability
00:31:27.780 --> 00:31:34.929
that this sample mean, sample statistic will
be very close to the parameter that probability
00:31:34.929 --> 00:31:44.540
is going to be very very high .
The question is, how do we obtain a consistent
00:31:44.540 --> 00:32:24.070
statistic . So, I give you a theorem . A statistic
T n is equal to T of x 1 up to x n will be
00:32:24.070 --> 00:32:48.981
consistent if the expected value of T n goes
to theta, as n goes to infinity, and variance
00:32:48.981 --> 00:33:11.990
of T n goes to 0, as n goes to infinity . So,
first thing we note is that T n need not be
00:33:11.990 --> 00:33:31.820
unbiased .
For example, T n is equal to x n plus 1 by
00:33:31.820 --> 00:33:52.010
n. What is the expected value? It is mu
plus 1 by n, and this goes to mu as n goes
00:33:52.010 --> 00:33:59.990
to infinity, because this is the bias, this
is the quantity by which it is different from
00:33:59.990 --> 00:34:10.109
the parameter or the intended parameter mu
. Therefore, x n plus 1 by n is not an unbiased
00:34:10.109 --> 00:34:20.520
estimator for mu for any n, but as n goes
to infinity, its expected value converges
00:34:20.520 --> 00:34:27.980
to mu. Therefore, in order to be consistent,
unbiasedness is not necessary.
00:34:27.980 --> 00:34:47.480
But, we have to also look at that variance
of T n should go to 0, as n goes to infinity
00:34:47.480 --> 00:35:00.130
. We know that, variance gives us the measure
of dispersion . So, if expected value of T
00:35:00.130 --> 00:35:12.260
n converges to mu, and in addition the variance
of T n goes to 0, then what we can say is that
00:35:12.260 --> 00:35:24.660
it is coming within arbitrary closeness of
the parameter mu . So, the above is a sufficient
00:35:24.660 --> 00:36:22.140
condition for consistency. If each T n is
unbiased, then we are even better off . So,
00:36:22.140 --> 00:36:33.890
we need to check that the variance of T n is going
to 0 as n is going to infinity.
00:36:33.890 --> 00:37:13.650
Let us now consider, we had x 1 plus x 2 by
2, it is unbiased .
00:37:13.650 --> 00:37:27.020
And variance of T n is equal to 1 by 4 into sigma
square plus sigma square, which is equal to sigma
00:37:27.020 --> 00:37:38.660
square by 2 . First, you notice that, even
if we have chosen samples x 1 up to x n, we
00:37:38.660 --> 00:37:46.380
are trying to estimate the parameter on the
basis of only the first two samples, actually
00:37:46.380 --> 00:37:57.270
here we are not using the remaining samples
. Therefore, the variance of T n will not
00:37:57.270 --> 00:38:10.970
change, even if we take many different samples,
it will remain sigma square by 2 . And therefore,
00:38:10.970 --> 00:38:21.640
variance of T n does not go to 0, as n goes
to infinity .
00:38:21.640 --> 00:38:36.461
Suppose, instead of x 1 plus x 2 by 2, I would
have chosen x 1 plus x n by 2 . Then, this
00:38:36.461 --> 00:38:45.380
is also unbiased, also I am taking care of
the large sample, that I have taken, but here
00:38:45.380 --> 00:38:55.680
also variance of T n will not go to 0 . Therefore,
neither this, nor this is a consistent estimator
00:38:55.680 --> 00:39:13.390
of mu in this case .
The second example was x 1
00:39:13.390 --> 00:39:28.460
plus x 2 plus up to x n by n, which is equal to x bar.
Expected value of x bar is equal to mu . Therefore,
00:39:28.460 --> 00:39:37.849
the first condition is automatically satisfied
. Variance of x bar we know is equal to sigma
00:39:37.849 --> 00:39:58.070
square by n . Therefore, as n goes to infinity,
variance of T n, which is nothing but x bar
00:39:58.070 --> 00:40:06.220
goes to 0, because sigma square by n sigma
square is fixed, therefore as n increases,
00:40:06.220 --> 00:40:20.460
this goes to 0. Therefore, by the above theorem,
x bar is unbiased, but that is not important,
00:40:20.460 --> 00:40:33.750
and what is important is that, it is consistent
for mu .
00:40:33.750 --> 00:40:53.670
Let me consider this also x 1 plus 2 x 2 plus
n x n upon n into n plus 1 by 2 . What is
00:40:53.670 --> 00:41:13.060
the variance, its expectation is mu, and variance
is equal to 1 upon n into n plus 1 by 2 whole
00:41:13.060 --> 00:41:28.550
square into sigma square plus 4 sigma square
plus 9 sigma square plus up to n square sigma
00:41:28.550 --> 00:41:40.599
square is equal to 1 upon n into n plus 1
by 2 whole square multiplied by sigma square
00:41:40.599 --> 00:41:53.359
into 1 square plus 2 square plus up to n square
is equal to 1 upon n into n plus 1 by 2 whole
00:41:53.359 --> 00:42:03.560
square into the sum of squares of the first n natural
numbers, which is n into n plus 1 into 2
00:42:03.560 --> 00:42:23.270
n plus 1 by 6 multiplied by sigma square .
This is equal to 4 by 6 into n into n plus
00:42:23.270 --> 00:42:37.720
1 into 2 n plus 1 divided by n square into
n plus 1 square sigma square is equal to 2
00:42:37.720 --> 00:42:53.060
by 3 into 2 n plus 1 upon
n into n plus 1, after the common factors cancel. Since the numerator is
00:42:53.060 --> 00:43:05.630
linear in n, but the denominator is quadratic
in n, we know that its limit is 0, as n goes
00:43:05.630 --> 00:43:32.690
to infinity . Therefore, this is consistent
for mu .
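The algebra just carried out, Var equals sigma square times the sum of squared weights w i = i divided by n into n plus 1 by 2, and this sum simplifies to 2 by 3 into 2 n plus 1 upon n into n plus 1, can be verified exactly with rational arithmetic; this is a small check of the lecture's own derivation, not new material.

```python
from fractions import Fraction

def sum_sq_weights(n):
    """Sum of w_i^2 for the weighted estimator, w_i = i / (n(n+1)/2)."""
    denom = Fraction(n * (n + 1), 2)
    return sum((Fraction(i) / denom) ** 2 for i in range(1, n + 1))

def closed_form(n):
    """The simplified expression derived in the lecture: (2/3)(2n+1)/(n(n+1))."""
    return Fraction(2, 3) * Fraction(2 * n + 1, n * (n + 1))

# The two agree exactly for every n, and closed_form(n) -> 0 as n grows,
# confirming that the weighted estimator is consistent for mu.
```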
00:43:32.690 --> 00:43:51.880
We need to show that, given epsilon and eta
greater than 0, there exist n naught, such
00:43:51.880 --> 00:44:05.589
that for all n greater than n naught probability
modulus of T n minus theta less than epsilon
00:44:05.589 --> 00:44:15.500
is greater than 1 minus eta . So, this is
what we will have to prove, so we are given
00:44:15.500 --> 00:44:42.260
epsilon and eta . First thing that is given
is
00:44:42.260 --> 00:44:55.850
expected value of T n converges to theta,
as n goes to infinity that means, there exist
00:44:55.850 --> 00:45:12.560
n 1 such that for all n greater than equal
to n 1 modulus of expected value of T n minus
00:45:12.560 --> 00:45:27.650
theta less than epsilon by 2 . So, we have
been given an epsilon, we are trying to identify
00:45:27.650 --> 00:45:37.579
an n 1, such that for all n greater than equal
to n 1 . This difference expected value of
00:45:37.579 --> 00:45:59.320
T n minus theta less than epsilon by 2 .
Again, since variance of T n is going to 0
00:45:59.320 --> 00:46:27.630
. We know from Chebyshev's inequality, probability
modulus of T n minus expected value of T n
00:46:27.630 --> 00:46:41.870
less than epsilon by 2 is greater than equal
to 1 minus 4 by epsilon square into variance
00:46:41.870 --> 00:46:56.780
of T n . This is known
because of Chebyshev's inequality . Therefore,
00:46:56.780 --> 00:47:13.170
what we get is that, as variance of T n is
going to 0, this quantity
00:47:13.170 --> 00:47:43.750
can be made very, very small.
In particular, we can choose or we can find
00:47:43.750 --> 00:48:00.470
n 2, such that for all n greater than or equal
to n 2, 4 by epsilon square into variance of
00:48:00.470 --> 00:48:14.529
T n is less than eta . This is possible, because
this quantity is going to 0, and eta is fixed,
00:48:14.529 --> 00:48:25.130
that is given to us . Therefore, we can say
that
00:48:25.130 --> 00:48:36.040
for all n greater than equal to n 2 . Probability
modulus of T n minus expected value of T n
00:48:36.040 --> 00:48:48.190
less than epsilon by 2 is greater than equal
to 1 minus 4 by epsilon square into variance
00:48:48.190 --> 00:48:58.089
of T n, which is greater than equal to 1 minus
eta .
00:48:58.089 --> 00:49:09.400
Therefore, what have we got? We have an n 1,
such that for all n greater than equal to
00:49:09.400 --> 00:49:17.800
n 1, expected value of T n minus theta is
less than epsilon by 2 . There is an n 2,
00:49:17.800 --> 00:49:26.010
such that for all n greater than equal to
n 2, probability T n minus expected value
00:49:26.010 --> 00:49:32.190
of T n less than epsilon by 2 is greater than
1 minus eta .
00:49:32.190 --> 00:49:50.619
And we have to find n naught, such that probability
modulus of T n minus theta less than epsilon
00:49:50.619 --> 00:50:04.020
is greater than 1 minus eta . This is what
we have to find . Let n naught be maximum
00:50:04.020 --> 00:50:19.359
of n 1 and n 2 . Therefore, for all n greater
than equal to n naught, we have expected value
00:50:19.359 --> 00:50:35.450
of T n minus theta is less than epsilon by
2, also we have probability modulus of T n
00:50:35.450 --> 00:50:48.000
minus expected value of T n less than epsilon
by 2 is greater than equal to 1 minus eta
00:50:48.000 --> 00:50:56.000
.
Now, modulus of T n minus theta is equal to
00:50:56.000 --> 00:51:08.100
modulus of T n minus expected value of T n
plus expected value of T n minus theta, which
00:51:08.100 --> 00:51:19.349
is less than equal to modulus of T n minus
expected value of T n plus modulus of expected
00:51:19.349 --> 00:51:31.829
value of T n minus theta. The second term is less than
epsilon by 2 for all n greater than or equal
00:51:31.829 --> 00:51:47.369
to n naught. And the probability that the first term is
less than epsilon by 2 is greater than 1 minus
00:51:47.369 --> 00:51:54.730
eta for all n greater than or equal to n naught
.
00:51:54.730 --> 00:52:06.240
Therefore, from these two, we can see that
probability modulus of T n minus theta less
00:52:06.240 --> 00:52:20.190
than epsilon is greater than 1 minus eta for
all n greater than equal to n naught . So,
00:52:20.190 --> 00:52:33.180
this proves the sufficiency of the condition
that expected value of T n has to go to theta,
00:52:33.180 --> 00:52:43.180
and variance of T n has to go to 0, as n goes
to infinity; then T n is going to be a
00:52:43.180 --> 00:52:51.800
consistent estimator for the parameter theta
.
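For reference, the sufficiency argument just completed can be written compactly; this is only a restatement of the steps above, in the lecture's own notation.

```latex
% Recap of the sufficiency proof, same symbols as in the lecture.
\begin{align*}
&\text{Given } \epsilon,\eta>0,\ \exists\, n_1:\ n\ge n_1 \implies
  |E(T_n)-\theta|<\epsilon/2,\\
&\exists\, n_2:\ n\ge n_2 \implies
  P\big(|T_n-E(T_n)|<\epsilon/2\big)\ \ge\ 1-\tfrac{4\,\mathrm{Var}(T_n)}{\epsilon^2}
  \ \ge\ 1-\eta \quad\text{(Chebyshev)}.\\
&\text{For } n\ge n_0=\max(n_1,n_2):\
  |T_n-\theta|\le |T_n-E(T_n)|+|E(T_n)-\theta|,\\
&\text{so } |T_n-E(T_n)|<\epsilon/2 \implies |T_n-\theta|<\epsilon,
  \ \text{hence } P\big(|T_n-\theta|<\epsilon\big)\ge 1-\eta.
\end{align*}
```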
00:52:51.800 --> 00:53:22.660
Suppose, a coin is tossed n times, we have
seen that T n equal to x 1 plus up to x n by n is
00:53:22.660 --> 00:53:41.069
unbiased for p . And variance of T n is equal
to 1 by n square into variance of x 1 plus
00:53:41.069 --> 00:53:53.910
x 2 plus up to x n, which is equal to 1 by n square into
n p q . This we know, because it is a binomial
00:53:53.910 --> 00:54:02.050
random variable, its variance has to be
n p q; therefore this is equal to p q by
00:54:02.050 --> 00:54:19.089
n . Therefore, this goes to 0, as n goes to
infinity . Therefore, this sample mean is
00:54:19.089 --> 00:54:32.740
not only unbiased, it is consistent for estimating
p. Ok students, I stop here. In the next class,
00:54:32.740 --> 00:54:48.570
I shall examine some more properties of an
estimator.
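The closing example can be seen in action with a short simulation; this is a hedged sketch where p = 0.3 and the seed are arbitrary illustrative choices, not values from the lecture. Because Var(x bar) = pq/n shrinks like 1/n, the estimate from a large number of tosses sits very close to p.

```python
import random

def estimate_p(n, p=0.3, seed=42):
    """Sample mean of n simulated coin tosses with success probability p."""
    rng = random.Random(seed)
    return sum(1 if rng.random() < p else 0 for _ in range(n)) / n

# Var(x_bar) = p q / n, so the estimate at n = 100000 concentrates tightly
# around p, illustrating consistency of the sample mean.
```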
00:54:48.570 --> 00:55:02.250
Thank you.