WEBVTT
Kind: captions
Language: en
00:00:18.050 --> 00:00:29.769
Welcome students to the MOOC's lecture on
Statistical Inference. This is lecture number
00:00:29.769 --> 00:00:41.150
19. As I said at the end of the last lecture,
today what I am going to start is called
00:00:41.150 --> 00:00:54.579
Testing of Hypothesis.
At the very beginning I said that parametric
00:00:54.579 --> 00:01:20.060
inference has two forms: theory of estimation
and testing of hypothesis.
00:01:20.060 --> 00:01:28.110
Parametric means we have an idea of
the distribution; the only thing we want to know
00:01:28.110 --> 00:01:42.351
from the sample is the possible
values of the parameter of the distribution.
00:01:42.351 --> 00:01:52.090
In the theory of estimation, we have seen that we try
to obtain those values either as one specific
00:01:52.090 --> 00:02:00.460
value which we call point estimation where
we have learnt the method of moments and method
00:02:00.460 --> 00:02:09.200
of maximum likelihood and also, we have learnt
how to estimate a confidence interval, so
00:02:09.200 --> 00:02:20.480
that we are confident that the parameter
of the distribution will
00:02:20.480 --> 00:02:29.660
lie within this interval with very high probability,
say 98 percent or 99 percent at least.
00:02:29.660 --> 00:02:38.190
In this class, we have dealt with problems
associated with these two approaches. Testing
00:02:38.190 --> 00:03:07.260
of hypothesis is slightly different. Here we
do not estimate the value; rather, we come up
00:03:07.260 --> 00:03:29.780
with a hypothesis and check whether the sample
gives
00:03:29.780 --> 00:03:51.680
enough evidence
that the hypothesis can be accepted, or perhaps
00:03:51.680 --> 00:04:03.750
rejected if the sample does
not give enough evidence in support of
00:04:03.750 --> 00:04:21.900
its acceptance.
So, a statistical hypothesis is a statement
00:04:21.900 --> 00:04:54.210
related to some characteristics of the population
under study or alternatively we can say that
00:04:54.210 --> 00:05:17.300
hypothesis is a statement
about the probability
00:05:17.300 --> 00:05:46.949
distribution
characterizing a population which we want
00:05:46.949 --> 00:06:09.639
to verify
on the basis of the information available
00:06:09.639 --> 00:06:29.509
from a sample.
So, let me explain this. Suppose there is
00:06:29.509 --> 00:06:45.129
a coin and our hypothesis is that its probability
of obtaining a head is 0.5; we want to verify
00:06:45.129 --> 00:06:56.270
that. So, what we do? We toss the coin certain
number of times and we have already decided
00:06:56.270 --> 00:07:04.400
that if the number of heads shows certain
property, then we are going to accept the
00:07:04.400 --> 00:07:13.659
hypothesis that indeed that coin has the probability
of a head or probability of a success to be
00:07:13.659 --> 00:07:23.550
0.5 and otherwise, we are going to say that
the coin does not have the probability of
00:07:23.550 --> 00:07:43.589
getting a head with 0.5.
So, for example, suppose the coin is tossed
00:07:43.589 --> 00:08:06.729
100 times and we obtain 45 heads , we are
more likely
00:08:06.729 --> 00:08:37.380
to accept that the coin has a probability
of success equal to 0.5. On the other
00:08:37.380 --> 00:08:59.819
hand, suppose the number of heads is say 20.
Are we likely to accept
00:08:59.819 --> 00:09:17.990
that the probability of a head is equal to 0.5?
Very unlikely. Similarly, if the number of heads
00:09:17.990 --> 00:09:41.230
is equal to say 75, still we are not likely
to accept the hypothesis that probability
00:09:41.230 --> 00:09:55.990
of head is equal to 0.5 . So, the testing
of hypothesis is all about designing the scheme
00:09:55.990 --> 00:10:05.220
such that based on the sample evidence, we
can accept the hypothesis or we reject the
00:10:05.220 --> 00:10:24.990
hypothesis .
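To make the intuition above concrete, here is a short Python sketch (my own illustration, not part of the original lecture). For a fair coin, p = 0.5, it computes the probability of a head count at least as far from 50 as the one observed in 100 tosses: 45 heads is quite plausible, while 20 or 75 heads is extremely unlikely.

```python
# Sketch (illustration only): how plausible are 45, 20, or 75 heads
# in 100 tosses if the coin is fair (p = 0.5)?
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def tail_prob(k, n, p):
    """Two-sided tail: probability of a count at least as far from n*p as k."""
    dev = abs(k - n * p)
    return sum(binom_pmf(j, n, p) for j in range(n + 1)
               if abs(j - n * p) >= dev)

for heads in (45, 20, 75):
    print(heads, tail_prob(heads, 100, 0.5))
```

The tail probability for 45 heads is large (roughly a third), while for 20 and 75 heads it is vanishingly small, matching the accept/reject intuition in the lecture.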
So, typically we consider two types of hypotheses
00:10:24.990 --> 00:10:47.730
.
A hypothesis is said to be simple
00:10:47.730 --> 00:11:09.920
if, when it is accepted, the underlying distribution
00:11:09.920 --> 00:11:33.360
is completely known .
Example: Bernoulli p, and the hypothesis is p is
00:11:33.360 --> 00:11:51.509
equal to 0.45, or more generally p is equal
to p naught, which is fixed, so that if we accept
00:11:51.509 --> 00:12:09.439
that p is equal to p naught, then we know the
distribution completely. Similarly, say Poisson
00:12:09.439 --> 00:12:17.180
with mu and our hypothesis is mu is equal
to mu naught, where mu naught is some fixed
00:12:17.180 --> 00:12:26.189
value and if the sample evidence establishes
that mu is equal to mu naught can be accepted,
00:12:26.189 --> 00:12:40.579
then we know the distribution completely. When
we are looking at normal mu, sigma square,
00:12:40.579 --> 00:12:49.269
we can have a hypothesis like mu is equal to mu naught and
sigma square is equal to sigma naught square.
00:12:49.269 --> 00:13:01.529
So, these are all simple hypotheses, so that
if the hypothesis is accepted, then we know
00:13:01.529 --> 00:13:21.000
the distribution completely . On the other
hand, suppose we consider Bernoulli p and
00:13:21.000 --> 00:13:36.610
our hypothesis is p is greater than equal
to 0.55 or if it is a Poisson distribution
00:13:36.610 --> 00:13:50.339
with mu and our hypothesis is mu is less than
equal to 2 and normal mu, sigma square and
00:13:50.339 --> 00:13:58.790
we can have hypothesis like mu is equal to
mu naught sigma square is greater than equal
00:13:58.790 --> 00:14:11.819
to 4 or suppose we have mu less than equal
to say 2 and sigma square less than or equal
00:14:11.819 --> 00:14:24.319
to 10 or we can have something like mu not
equal to 5 and sigma square is equal to 10
00:14:24.319 --> 00:14:35.139
say. In all these cases, you can see that even
if the hypothesis is accepted, we do not get
00:14:35.139 --> 00:14:43.380
the distribution completely because there
will be a family of distributions each of
00:14:43.380 --> 00:14:51.670
which will actually fall into this category
satisfying this hypothesis.
00:14:51.670 --> 00:15:05.500
So, such a hypothesis is called a composite
hypothesis and a test of a statistical hypothesis
00:15:05.500 --> 00:15:54.390
is a two-way decision making problem. So,
on the basis of this obtained sample, one
00:15:54.390 --> 00:16:34.449
decides whether the hypothesis will be accepted
or rejected. So, to proceed further, the hypothesis
00:16:34.449 --> 00:17:04.650
that we test for possible acceptance, that
is, for which we seek support from the sample,
00:17:04.650 --> 00:17:26.570
is called the null hypothesis.
00:17:26.570 --> 00:17:39.010
It is denoted by H naught
00:17:39.010 --> 00:18:11.740
and the name was given by R.A. Fisher.
Typically the acceptance or rejection
00:18:11.740 --> 00:18:26.260
of the null hypothesis
00:18:26.260 --> 00:18:57.400
depends upon against which other hypothesis
it is being tested .
00:18:57.400 --> 00:19:42.370
For example, suppose we get 45 heads in 100
tosses of a coin. We want to test that P,
00:19:42.370 --> 00:20:08.931
the probability of head, is equal to
0.5. Then if the alternative is probability
00:20:08.931 --> 00:20:35.210
of head is equal to 0.45, we may reject
the null hypothesis
00:20:35.210 --> 00:21:05.660
H naught in favor of this alternative hypothesis.
On the other hand, if H1 is that the probability
00:21:05.660 --> 00:21:32.360
of head is equal to 0.6, then we may accept H
naught against the above H1. Therefore, in
00:21:32.360 --> 00:21:39.870
testing of hypothesis it is not only the null
hypothesis or H naught, one has to see what
00:21:39.870 --> 00:21:46.040
is the alternative hypothesis because the
acceptance and rejection will depend upon
00:21:46.040 --> 00:22:09.150
the alternative hypothesis as well.
Note that the roles of H naught and H1 are
00:22:09.150 --> 00:22:21.060
not symmetric. Our focus is on H naught and
we want to see if H naught is accepted or
00:22:21.060 --> 00:22:30.720
rejected when tested against H1, that is the
focus of testing of hypothesis and second
00:22:30.720 --> 00:22:56.750
point is that typically
one should ensure that both H naught and H1 are
00:22:56.750 --> 00:23:19.540
simple. If that is not possible, the
next best option is
00:23:19.540 --> 00:23:37.740
H naught is simple and H1 is composite
because if H naught is composite , then by
00:23:37.740 --> 00:23:46.610
accepting that we do not really learn much
about the characteristics of the population in
00:23:46.610 --> 00:24:03.220
question.
How to achieve this ? So, the whole purpose
00:24:03.220 --> 00:24:26.120
of testing of hypothesis
is to divide
00:24:26.120 --> 00:24:58.920
the total sample space into two parts . One
is called the critical region
00:24:58.920 --> 00:25:12.960
or rejection region
and we will denote it by
00:25:12.960 --> 00:25:29.280
W, and the other one is the acceptance region,
that is, W complement. We choose an appropriate statistic
00:25:29.280 --> 00:25:57.850
T(x1, x2, ..., xn) and consider its distribution.
00:25:57.850 --> 00:26:12.060
For example ,
consider
00:26:12.060 --> 00:26:47.220
the test of the hypothesis H naught: P is equal
to 0.5 and the experiment designed is to toss
00:26:47.220 --> 00:27:03.620
the coin 1000 times and count the number of
heads .
00:27:03.620 --> 00:27:35.450
Therefore, T is equal to the number of heads in an
obtained sample of 1000 tosses, and we decide
00:27:35.450 --> 00:27:51.530
that the null hypothesis
that is, H naught: P is equal to 0.5, will be
00:27:51.530 --> 00:28:19.660
accepted if T is greater than or equal to
400 against H1; let us assume a simple
00:28:19.660 --> 00:28:41.490
alternative that P is equal to 0.35.
What is the sample space ? Sample space is
00:28:41.490 --> 00:29:02.630
2 to the power 1000 points of strings made
of 0 and 1 because each toss ends up in either
00:29:02.630 --> 00:29:09.970
0 or 1 depending upon whether it is a tail
or a head. So, 1000 tosses means
00:29:09.970 --> 00:29:28.800
2 to the power 1000 points and suppose this
is my W, that means all those strings
00:29:28.800 --> 00:29:48.860
having the number of heads less than
400. So, if the number
00:29:48.860 --> 00:29:57.130
of heads obtained is less than 400, we are
going to reject the null hypothesis or if
00:29:57.130 --> 00:30:10.580
the number of heads is greater than or equal
to 400, then we are going to accept the null
00:30:10.580 --> 00:30:19.690
hypothesis.
So, this is called W complement. So, what
00:30:19.690 --> 00:30:42.570
is our T? T is equal to sum of values in the
obtained sample. What is the rejection criterion?
00:30:42.570 --> 00:30:50.540
The rejection criterion is that the number
has to be less than 400, otherwise we are
00:30:50.540 --> 00:30:58.730
going to accept it. Therefore, we shall reject
the null hypothesis if T is less than 400.
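The two error probabilities of this rule can be computed exactly from the Binomial distribution. The sketch below is my own illustration, not from the lecture, and assumes independent tosses.

```python
# Sketch of the lecture's rule: toss the coin n = 1000 times and
# reject H0: p = 0.5 whenever the head count T is below 400.
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n, cutoff = 1000, 400
# Type I error: reject (T < 400) although p = 0.5 is true.
alpha = binom_cdf(cutoff - 1, n, 0.5)
# Type II error: accept (T >= 400) although the alternative p = 0.35 is true.
beta = 1 - binom_cdf(cutoff - 1, n, 0.35)
print(alpha, beta)
```

Both probabilities come out tiny here because the cutoff 400 sits many standard deviations away from both means (500 under H naught and 350 under H1), so this particular rule separates the two simple hypotheses very well.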
00:30:58.730 --> 00:31:56.710
Now, any decision making process
will have two possible errors associated
00:31:56.710 --> 00:32:36.560
with it . What are these ? The hypothesis
is correct , but we have rejected it . For
00:32:36.560 --> 00:32:55.720
example, P is equal to 0.5 with respect to
the above example of tossing the coin 1000
00:32:55.720 --> 00:33:28.970
times and suppose the number of heads obtained
is 395 , then we are going to reject
00:33:28.970 --> 00:33:45.110
H naught even if P is equal to 0.5 is true
.
00:33:45.110 --> 00:34:24.490
The above type of error is called type I error
that is rejecting a hypothesis
00:34:24.490 --> 00:35:01.440
when it is true . The other type of error
is called type II error that is accepting
00:35:01.440 --> 00:35:20.330
the null hypothesis
when it is false .
00:35:20.330 --> 00:36:00.520
For example suppose the actual value of P
is 0.4 , but because of the sample obtained
00:36:00.520 --> 00:36:22.830
that has say 410 heads, we accept the null
hypothesis
00:36:22.830 --> 00:36:33.720
H naught P is equal to 0.5 because that is
our acceptance and rejection criterion although
00:36:33.720 --> 00:36:56.710
the null hypothesis is false.
So, you get this type of table: accept H naught,
00:36:56.710 --> 00:37:18.940
reject H naught. These are the two actions; the two states are
H naught is true, H naught is false. So, when
00:37:18.940 --> 00:37:29.450
H naught is true and we have accepted that
, then we are making a correct decision. When
00:37:29.450 --> 00:37:36.920
H naught is false and you are rejecting H
naught , we are making another correct decision
00:37:36.920 --> 00:37:49.000
. But if H naught is true and we are rejecting it,
we are committing a type I error. If H naught
00:37:49.000 --> 00:37:58.020
is false, but we are accepting, then we are
committing a type II error .
00:37:58.020 --> 00:38:25.490
Ideally
we would like to minimize both the errors. Why is that difficult?
00:38:25.490 --> 00:38:41.600
Because the acceptance region and the rejection
region are decided beforehand. So, suppose
00:38:41.600 --> 00:39:02.380
this is my sample space , this is the rejection
region, this is the acceptance region .
00:39:02.380 --> 00:39:10.400
We want to reduce the extent of type I error,
then what we will do is, we will make the
00:39:10.400 --> 00:39:19.480
rejection region smaller, so that probability
of rejecting a null hypothesis when it is
00:39:19.480 --> 00:39:35.440
true is less. Say for example, I was talking
about the rejection region
00:39:35.440 --> 00:39:48.700
being T less than 400. In order to reduce
type I error, suppose we make a new rejection
00:39:48.700 --> 00:40:10.310
criterion
that T is less than 375; that means if the number
00:40:10.310 --> 00:40:25.810
of heads is greater than or equal to 375, we are going
to accept that the coin is having 0.5 probability
00:40:25.810 --> 00:40:39.680
of getting a head. What is the effect ? The
effect is that
00:40:39.680 --> 00:40:57.700
we are increasing the size of acceptance region
, right. As you can see now the acceptance
00:40:57.700 --> 00:41:09.470
region is bigger .
So, when the null hypothesis is not correct
00:41:09.470 --> 00:41:17.880
, that is, if the actual value of P is not equal
to 0.5, now we have a higher chance of accepting
00:41:17.880 --> 00:41:29.200
the null hypothesis. Say for example, if the
number of heads is 390. In the earlier case, we
00:41:29.200 --> 00:41:37.289
would have rejected it, but now we are going
to accept it . I hope the concept is clear
00:41:37.289 --> 00:41:43.140
.
Therefore, what this suggests is that if we want
00:41:43.140 --> 00:41:51.310
to reduce the type I error, we are going to
increase the probability of type II error
00:41:51.310 --> 00:42:20.990
and vice versa . We denote the probability
of type I error by alpha and probability of
00:42:20.990 --> 00:42:54.720
type II error by beta. Ideally, both alpha and
beta are to be reduced simultaneously, but
00:42:54.720 --> 00:42:58.400
as we have just seen, that is very difficult.
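The trade-off just described can be computed directly for the coin example. The sketch below is my own illustration, not from the lecture: shrinking the rejection region from T less than 400 to T less than 375 lowers alpha but raises beta, with n = 1000, H naught: p = 0.5, and H1: p = 0.35.

```python
# Sketch (illustration only): moving the rejection cutoff from 400
# down to 375 trades type I error for type II error.
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n = 1000
# Cutoff 400: reject when T < 400.
a400 = binom_cdf(399, n, 0.5)        # alpha = P(T < 400 | p = 0.5)
b400 = 1 - binom_cdf(399, n, 0.35)   # beta  = P(T >= 400 | p = 0.35)
# Cutoff 375: reject when T < 375 (smaller rejection region).
a375 = binom_cdf(374, n, 0.5)
b375 = 1 - binom_cdf(374, n, 0.35)
print(a400, b400)
print(a375, b375)
```

As the lecture says, alpha shrinks with the smaller rejection region while beta grows to a few percent, so the two errors cannot be driven down simultaneously by moving the cutoff.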
00:42:58.400 --> 00:43:25.590
So, what do we do? We have to come to a compromise
. What is the compromise? The compromise is
00:43:25.590 --> 00:43:50.210
we want to put a bound on alpha say 5 percent.
That means that we have to design the rejection region
00:43:50.210 --> 00:44:21.040
in such a way that the probability of type
00:44:21.040 --> 00:44:34.250
I error is less than or equal to 5 percent
. That means, only in 5 percent of cases even
00:44:34.250 --> 00:44:46.460
if the null hypothesis is true, we are going
to reject that null hypothesis, in not more than
00:44:46.460 --> 00:45:03.100
5 percent of cases .
Now, given the size alpha
00:45:03.100 --> 00:45:33.030
there can be many rejection regions. Say this
is one rejection region of size alpha, suppose
00:45:33.030 --> 00:45:44.460
this is another rejection region of size alpha
and suppose this is another rejection region
00:45:44.460 --> 00:45:56.170
of size alpha. The question is which one of
these we should consider for ultimate decision
00:45:56.170 --> 00:46:09.190
making . So, all 3 are alpha and which one
we should take, that is the question.
00:46:09.190 --> 00:46:31.810
Here comes the role of type II error . Of
all the different
00:46:31.810 --> 00:47:08.460
rejection regions of size alpha, we choose
the one that has the least probability of
00:47:08.460 --> 00:47:38.340
committing type II error. That is, we consider all W
such that the size of W is less than or equal
00:47:38.340 --> 00:47:45.340
to alpha, that is, the size of the critical
region is less than or equal to alpha, meaning
00:47:45.340 --> 00:47:50.390
the probability of committing type I error
is less than or equal to alpha.
00:47:50.390 --> 00:48:26.530
We choose W naught such that the probability of type
II error for W naught is minimum over all
00:48:26.530 --> 00:48:48.970
W such that the size of W is less than or equal
to alpha or in other words, we want to minimize
00:48:48.970 --> 00:49:22.020
beta among all W's of size less than or equal
to alpha . In other words , we want to maximize
00:49:22.020 --> 00:49:54.400
1 minus beta among all the W's such that size
of W is less than equal to alpha . Hence 1
00:49:54.400 --> 00:50:33.500
minus beta associated with a critical region
is called the power of the test and our aim
00:50:33.500 --> 00:51:06.180
is to maximize the power while maintaining
the bound on type I error .
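Continuing the coin example, the sketch below (my own illustration, not from the lecture) shows why the power criterion is needed: a lower-tail region and an upper-tail region can have exactly the same size alpha, within the bound alpha at most 0.05, yet wildly different power against H1: p = 0.35.

```python
# Among critical regions of the same size alpha <= 0.05, we prefer
# the one with the larger power 1 - beta against H1: p = 0.35.
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, bound = 1000, 0.05

# Largest c with P(T <= c | p = 0.5) <= 0.05; region A rejects when T <= c.
cdf, c = 0.0, -1
while cdf + binom_pmf(c + 1, n, 0.5) <= bound:
    c += 1
    cdf += binom_pmf(c, n, 0.5)

alpha_A = cdf                                                # size of region A
power_A = sum(binom_pmf(k, n, 0.35) for k in range(c + 1))   # P(reject | H1)

# Region B rejects when T >= n - c; by symmetry of Binomial(n, 0.5)
# it has the same size alpha under H0.
alpha_B = sum(binom_pmf(k, n, 0.5) for k in range(n - c, n + 1))
power_B = sum(binom_pmf(k, n, 0.35) for k in range(n - c, n + 1))
print(c, alpha_A, power_A, alpha_B, power_B)
```

Region A rejects for small head counts, which is exactly where samples from p = 0.35 land, so its power is essentially 1; region B has the same type I error but almost no power against this alternative, so the power criterion picks region A.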
00:51:06.180 --> 00:51:25.181
Before proceeding any further, I give you
an example . Consider uniform 0 theta . Suppose
00:51:25.181 --> 00:51:48.230
H naught is theta is equal to 1.5 and H1 is
theta is equal to 2.0 and we take a single
00:51:48.230 --> 00:52:05.380
sample
x 1 . So, what is the situation ?
00:52:05.380 --> 00:52:15.700
We take a value from uniform 0 theta. Our
aim is whether to accept that theta is equal
00:52:15.700 --> 00:52:33.030
to 1.5 or we reject it in favor of theta is
equal to 2.0 . Obviously, if the sample falls
00:52:33.030 --> 00:52:58.840
in the interval 1.5 to 2.0, we are going
to reject H naught because under H naught,
00:52:58.840 --> 00:53:13.670
theta is 1.5. But what about some value,
say 1?
00:53:13.670 --> 00:53:48.710
We can obtain this sample under both H naught
and H1 . So, we decide as follows . Accept
00:53:48.710 --> 00:54:10.600
H naught if x 1 is less than 1.0 and reject
H naught if x 1 is greater than 1.0. That is,
00:54:10.600 --> 00:54:31.890
if this is 1, this is 1.5 and this is 2. So,
this is my rejection region
00:54:31.890 --> 00:54:44.210
and this is the acceptance region . So, what
are the probabilities of the errors?
00:54:44.210 --> 00:55:05.260
So, the probability of type I error is equal to
the probability of rejecting H naught when it is true, which is equal
00:55:05.260 --> 00:55:20.930
to the probability of getting
a value
00:55:20.930 --> 00:55:38.380
greater than 1 when theta is equal to 1.5.
Since the value of x must
00:55:38.380 --> 00:55:46.340
lie between 1 and 1.5, this probability is 0.5 upon 1.5, which is equal to
1 by 3.
00:55:46.340 --> 00:55:59.800
So, this is my alpha. What is the probability
of type II error? It is beta, which is
00:55:59.800 --> 00:56:17.920
equal to the probability of accepting H naught
when H1 is true, that is, the probability that x lies
00:56:17.920 --> 00:56:29.770
between 0 and 1 when theta is equal to
2, which is 1 upon 2, equal to half.
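The two calculations above can be checked numerically. The following sketch is my own illustration, not from the lecture; it evaluates the exact formulas and confirms them with a quick Monte Carlo simulation.

```python
# Uniform(0, theta) with a single draw X1: accept H0: theta = 1.5 when
# X1 < 1 and reject otherwise, against H1: theta = 2.0.
import random

# Exact error probabilities from the uniform density.
alpha = (1.5 - 1.0) / 1.5   # P(X1 > 1 | theta = 1.5) = 0.5 / 1.5 = 1/3
beta = (1.0 - 0.0) / 2.0    # P(0 < X1 < 1 | theta = 2.0) = 1 / 2

# Monte Carlo check of the same two probabilities.
random.seed(0)
N = 200_000
alpha_mc = sum(random.uniform(0, 1.5) > 1 for _ in range(N)) / N
beta_mc = sum(random.uniform(0, 2.0) < 1 for _ in range(N)) / N
print(alpha, alpha_mc, beta, beta_mc)
```

The simulated frequencies land close to the exact values 1/3 and 1/2, confirming the type I and type II error computations for this test.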
00:56:29.770 --> 00:56:43.099
So, I hope the concept of type I error and
type II error is well understood. In the next
00:56:43.099 --> 00:56:50.640
classes, I shall do some problems on testing
of hypothesis and also, I will conclude my
00:56:50.640 --> 00:56:58.380
lecture by focusing on an important theorem
with respect to testing of hypothesis which
00:56:58.380 --> 00:57:18.480
is called the Neyman-Pearson Lemma. Ok students,
thank you so much. See you in the next class.
00:57:18.480 --> 00:57:22.349
Thank you.