WEBVTT
00:17.250 --> 00:23.070
Hello and welcome to this lecture; this is basically the third
lecture of this module. And in this lecture,
00:23.070 --> 00:29.890
we will be discussing hypothesis testing. You
know that in the last lecture, we actually
00:29.890 --> 00:36.890
started that hypothesis testing. So, in this lecture, we will continue
00:38.370 --> 00:45.370
with the hypothesis concerning
one mean; one mean means here we are having
00:45.620 --> 00:52.620
one sample. So, we will do some testing; basically,
this hypothesis testing is drawing some inference probabilistically
00:55.649 --> 01:01.899
from the sample mean with respect to the population
mean, as we discussed in the last lecture.
01:01.899 --> 01:08.820
So, our outline for today's lecture is: hypothesis concerning
one mean, hypothesis concerning two means,
01:08.820 --> 01:14.289
hypothesis concerning one variance, and hypothesis
concerning two variances. And if time permits, we
01:14.289 --> 01:21.289
will also see the probability paper that we
have been talking about. Once the data is available,
01:21.810 --> 01:26.880
what have we done so far? We have
assumed that it is following some particular
01:26.880 --> 01:33.220
distribution. So, this probability paper is
basically the first step for a graphical
01:33.220 --> 01:37.799
inspection to see what distribution the data
could possibly follow.
01:37.799 --> 01:44.799
So, this is basically a test of the sample
data to see what distribution the population
01:44.830 --> 01:50.009
may follow; if time permits after
this hypothesis testing, we will do that one
01:50.009 --> 01:50.860
also.
01:50.860 --> 01:57.860
Well, so, hypothesis concerning one mean: let
us consider a population with mean mu; a sample of
01:58.759 --> 02:04.590
size n is selected randomly from the population. It
is to be tested whether this mu is equal
02:04.590 --> 02:11.180
to mu naught or not. This mu naught is
basically a threshold value that we need
02:11.180 --> 02:18.180
to test against. From the sample, the
mean that we are estimating using the estimation
02:19.790 --> 02:26.060
technique that we discussed earlier, whether
that has some relationship with a specific value, so that
02:26.060 --> 02:31.120
probabilistically, we can infer something about
it.
02:31.120 --> 02:38.120
So, that specific value is your mu naught. What
is our test, what is our goal? To test this population
02:41.000 --> 02:47.280
mean. So, as we saw in our last
lecture, we have to define some test statistic;
02:47.280 --> 02:52.510
here the statistic is Z equals
X bar minus mu naught divided by sigma by
02:52.510 --> 02:59.510
square root of n. This sigma by square root of n is
the standard deviation of the sampling distribution of
03:00.940 --> 03:07.739
the mean, this X bar is your sample mean, and
this mu naught is the value which we are expecting
03:07.739 --> 03:09.599
that the population mean might be.
03:09.599 --> 03:15.390
So, here you know that this is a reduced standard
variate now; this Z is a random variable
03:15.390 --> 03:22.159
which follows the standard normal distribution. And
the critical regions for testing this single
03:22.159 --> 03:27.900
mean: again, here the assumption is that
the population is normal and it is a large
03:27.900 --> 03:32.709
sample. Large means, here you can say,
that approximately if the sample size is more
03:32.709 --> 03:38.330
than 30, we can consider it a large
sample. So, the critical region is defined at the significance
03:38.330 --> 03:43.920
level alpha; the discussion on the significance
level we also covered in the last lecture.
03:43.920 --> 03:49.099
So, there are two possible cases: it can
be one sided or it can be two sided; for the
03:49.099 --> 03:54.790
one sided case, it can be the left hand side or
the right hand side. Reject the null hypothesis
03:54.790 --> 04:01.409
if this Z is less than minus Z alpha. In the
case when the alternative hypothesis is mu
04:01.409 --> 04:06.700
is greater than mu naught, reject if Z is greater
than Z alpha. And if it is a two sided test, mu
04:06.700 --> 04:13.700
is not equal to mu naught, then reject whether Z
is less than minus Z alpha by 2 or Z is greater
04:15.500 --> 04:21.620
than Z alpha by 2. This Z alpha by 2
is the quantile for which the remaining probability
04:21.620 --> 04:28.620
is your alpha by 2 on the right hand side,
as we have discussed while discussing
04:29.240 --> 04:31.250
the critical zone.
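The one mean z test and its critical regions described so far can be sketched in Python. This is a minimal illustration only; the function name and the demo numbers are my own, not from the lecture:

```python
import math
from statistics import NormalDist

def z_test_one_mean(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, reject) for H0: mu = mu0, assuming a large sample
    (or a normal population) with known sigma."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    q = NormalDist().inv_cdf          # standard normal quantile function
    if tail == "left":                # H1: mu < mu0 -> reject if z < -z_alpha
        return z, z < -q(1 - alpha)
    if tail == "right":               # H1: mu > mu0 -> reject if z > z_alpha
        return z, z > q(1 - alpha)
    # H1: mu != mu0 -> reject if |z| > z_{alpha/2}
    return z, abs(z) > q(1 - alpha / 2)
```

For example, with an illustrative sample mean of 52 against mu naught of 50 (sigma 5, n 100), z works out to 4.0, which falls in the two sided critical zone at alpha 0.05.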
04:31.250 --> 04:38.250
So, if the test statistic falls in this critical
zone, then we have to reject the null hypothesis. So,
04:39.870 --> 04:46.870
let us consider a normal population with
mean mu; the population variance sigma square
04:49.330 --> 04:54.600
is unknown, and a small sample of size n is selected randomly
from this population.
04:54.600 --> 04:59.979
Now, so far what we were
telling is for when the sample is large;
04:59.979 --> 05:05.300
now we are discussing if the sample is small. And
as I was telling, if the sample size
05:05.300 --> 05:12.300
is less than 30, then we have to consider
that this is a small sample. And here, the
05:13.080 --> 05:17.639
test statistic for the null hypothesis and
the alternative hypothesis, if we consider
05:17.639 --> 05:22.740
it to be the central one, or in fact not only
the central one, it can be one sided
05:22.740 --> 05:27.199
also.
So, this quantity now, which we were just estimating
05:27.199 --> 05:34.199
as a Z value, as we have seen: now this s
has replaced that sigma. So, sigma
05:35.460 --> 05:39.259
by square root of n: that sigma was the standard
deviation of the population. Here, it is a
05:39.259 --> 05:44.389
small sample, and the variance
is unknown - the population variance is unknown
05:44.389 --> 05:47.150
- so we have to estimate it again from
the sample.
05:47.150 --> 05:52.860
So, this s is basically the standard deviation
estimated from the sample, and so this statistic
05:52.860 --> 05:59.860
now is a random variable having a t distribution
with n minus 1 degrees of freedom. And for
06:01.240 --> 06:07.100
this case, the critical regions will be: for
the one sided test, t less than
06:07.100 --> 06:13.060
or equal to minus t alpha; and if it is the right
hand side, then t greater than t alpha,
06:13.060 --> 06:17.630
and for the central test, it is less than
minus t alpha by 2 or greater than t alpha
06:17.630 --> 06:21.259
by 2.
So, now, you can see that for the large sample,
06:21.259 --> 06:26.610
we are approximating it with the standard normal
distribution, and if it is a small sample,
06:26.610 --> 06:31.900
we have approximated it to the t distribution
with degrees of freedom n minus 1, where
06:31.900 --> 06:34.770
n is the sample size.
06:34.770 --> 06:41.770
Well, take up one problem here. The problem
states: the specifications for a certain construction
06:42.009 --> 06:47.400
project require steel rods having a mean
breaking strength of 4000 kg per centimeter
06:47.400 --> 06:54.400
square. If 6 steel specimens selected at random
have a mean breaking strength of 3750 kg per
06:56.530 --> 07:01.949
centimeter square with a standard deviation
of 200 kg per centimeter square, test whether
07:01.949 --> 07:08.949
the mean breaking strength is below 4000 kg per
centimeter square at the 0.01 level of significance. Assume
07:10.550 --> 07:16.050
that the population distribution is normal.
So, you see, this is a small sample: the sample
07:16.050 --> 07:23.050
size is 6, and it shows a mean strength
of 3750, which numerically
07:23.639 --> 07:30.580
is below this 4000 kg per centimeter square.
But for this difference, should we consider
07:30.580 --> 07:37.090
that the specimens being supplied
for that project are really below the specification,
07:37.090 --> 07:42.039
so statistically we have to infer about
that.
07:42.039 --> 07:48.740
So, here the sample size is small, the
population variance is unknown, and the sample is
07:48.740 --> 07:53.050
drawn from a normal population. So, the test
statistic, as we have discussed so far, is
07:53.050 --> 07:58.940
t equals X bar minus mu naught divided
by s by square root of n, which follows
07:58.940 --> 08:05.940
a t distribution with n minus 1 degrees of freedom; n equals
6 here, so 5 degrees of freedom. So, our
08:06.069 --> 08:10.800
null hypothesis is that the mean is greater
than or equal to this 4000 kg per centimeter
08:10.800 --> 08:16.199
square, which is our specification. So, we have
to test whether it is below our
08:16.199 --> 08:20.520
specification, so that is why in the alternative
hypothesis, we put that mu is less than 4000
08:20.520 --> 08:24.430
kg per centimeter square, and this alpha
equals 0.01.
08:24.430 --> 08:31.430
So, the criterion here depends
on the level of significance,
08:33.699 --> 08:40.650
so on this t alpha, and this is a one sided
test. So, this t alpha for 5 degrees of freedom
08:40.650 --> 08:47.650
from the standard table in a standard text
book, if you see, is 3.365. So, if the test
08:49.649 --> 08:55.220
statistic falls beyond this, then
we have to reject the null hypothesis. If we
08:55.220 --> 09:02.220
calculate this, the criterion is: reject if
t is less than minus 3.365. The test
09:05.950 --> 09:11.930
statistic now, from the data that is available (X
bar is 3750 and s equals 200), we are getting
09:11.930 --> 09:17.209
that it is minus 3.06.
So, as this minus 3.06 is greater than this
09:17.209 --> 09:24.209
minus 3.365, the null hypothesis cannot
be rejected at the significance level of 0.01. So,
09:26.870 --> 09:33.870
we cannot claim, we fail to prove, that this
mean is really less than the specification
09:34.250 --> 09:39.990
which is 4000 kg per centimeter square even
though the mean that we have seen for this
09:39.990 --> 09:42.930
6 specimens is 3750.
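The steel rod calculation above can be sketched numerically. A minimal stdlib illustration; the critical value 3.365 is the t table entry for 5 degrees of freedom at alpha equals 0.01 quoted in the lecture (hard-coded, since the standard library has no t quantile function):

```python
import math

# Steel rod example: H0: mu >= 4000, H1: mu < 4000 (left sided t test)
xbar, mu0, s, n = 3750.0, 4000.0, 200.0, 6

t = (xbar - mu0) / (s / math.sqrt(n))   # test statistic
t_crit = 3.365                          # t_{0.01}, n - 1 = 5 df (table value)

print(round(t, 2))    # -3.06
print(t < -t_crit)    # False: null hypothesis cannot be rejected
```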
09:42.930 --> 09:49.930
Well, now we will take the hypothesis concerning
two means. For two means, there are two
09:51.200 --> 09:57.089
samples and two sample means. So, many questions
09:57.089 --> 10:03.060
you can ask, such as whether both samples come
from populations having the same mean. Even though
10:03.060 --> 10:08.880
the sample means will not be exactly the same,
we can test that; and then we can test
10:08.880 --> 10:12.790
whether one is really greater than the
other; that also we can test.
10:12.790 --> 10:17.490
So, here what we are taking is basically
the difference between these two means; that
10:17.490 --> 10:22.380
is, X1 bar is the mean for the first sample
and X2 bar is the mean for the second sample. So,
10:22.380 --> 10:29.380
if we take this difference, then
basically it becomes a single random
10:30.290 --> 10:35.310
variable with mean delta. Now, if I put
delta equal to 0, that means both the
10:35.310 --> 10:39.820
means are equal; effectively we are testing
whether the means are the same or not. Or, depending
10:39.820 --> 10:45.230
on what value of delta we select,
we are testing
10:45.230 --> 10:50.850
whether the mean of the first population is greater
than that of the second, and so on.
10:50.850 --> 10:57.850
And this denominator, sigma of X1 bar
minus X2 bar, is the standard deviation of this difference,
11:01.390 --> 11:08.279
and this Z is now a random variable
which again follows the standard normal distribution,
11:08.279 --> 11:13.959
in case both n 1 and n 2, which
are the sample sizes, are large enough. You can
11:13.959 --> 11:20.149
say that if each is greater than 30, then
we can say that it follows a standard normal distribution.
11:20.149 --> 11:27.149
Once we get this, we can infer about
their population means. So, the test statistic
11:28.550 --> 11:34.519
here can be this: X1 bar minus
X2 bar minus delta, and the standard deviation
11:34.519 --> 11:40.029
that we are getting for this difference is
basically sigma 1 square by n 1 plus sigma
11:40.029 --> 11:45.550
2 square by n 2, the whole square root of which
is giving you the estimate for the standard
11:45.550 --> 11:49.910
deviation for this difference.
And for the large sample that I was telling
11:49.910 --> 11:56.690
that if they are greater than 30, then
also we can say that the population variances can easily be replaced
11:56.690 --> 12:00.990
by the sample standard deviations, which
again gives the standard normal distribution, we
12:00.990 --> 12:02.640
can say.
12:02.640 --> 12:08.040
This is for the large sample, as we have shown,
and after this we will also see what happens
12:08.040 --> 12:13.839
if it is a small sample. Now, as long as
this is a large sample, it will follow a standard
12:13.839 --> 12:18.100
normal distribution, and the critical regions
are the same as before; these are the different
12:18.100 --> 12:23.370
alternative hypotheses. Now,
this case is basically showing whether
12:23.370 --> 12:30.300
mu 1 is less than mu 2, this is showing
whether mu 1 is greater than mu 2, and
12:30.300 --> 12:36.570
this is the two sided test, whether mu 1 is not
equal to mu 2.
12:36.570 --> 12:41.889
So, these are the same critical regions that
we discussed earlier for the single sample mean, a few slides
12:41.889 --> 12:48.889
before. First, we
will take one problem on this where the sample
12:50.010 --> 12:56.450
size is large. So, a sample of n 1 equals
40 rods made of a certain alloy steel has a
12:56.450 --> 13:03.209
mean strength of 4400 kg per centimeter square
with a standard deviation of 450 kg per centimeter
13:03.209 --> 13:10.209
square. Another sample of n 2 equals 40
rods, made of a different kind of alloy
13:10.339 --> 13:17.100
steel which is having a mean strength of 4200
kg per centimeter square with a standard deviation
13:17.100 --> 13:22.310
of 650 kg per centimeter square.
Can it be claimed that the first kind of alloy
13:22.310 --> 13:27.839
is superior in terms of strength at the 0.05
level of significance? So, numerically, if
13:27.839 --> 13:34.389
I just take the means, I can say that
4400 is greater than 4200; but if we see
13:34.389 --> 13:39.380
their spread, their standard deviations, this one
is 450 and here it is 650. So, statistically
13:39.380 --> 13:46.380
we have to test whether the first alloy
steel is really superior to the second one.
13:47.410 --> 13:53.510
So, the null hypothesis is that
mu 1 minus mu 2 is less than or equal to 0; here we place
13:53.510 --> 13:58.540
delta equal to 0. And the alternative
hypothesis, basically what we are testing, is
13:58.540 --> 14:02.389
mu 1 minus mu 2 greater than 0; that
means we are testing whether mu 1 is greater
14:02.389 --> 14:08.649
than mu 2 or not, and the significance level
is 0.05. For the test statistic, the criterion
14:08.649 --> 14:15.649
at 0.05 is: you know that the Z
quantile is 1.645 at the 95 percent probability
14:16.160 --> 14:21.070
level, for this standard normal distribution
the value is 1.645.
14:21.070 --> 14:28.070
So, this is our critical value, and if we
get a Z greater than this value, then we
14:29.800 --> 14:36.800
should reject the null hypothesis; this is
the critical value of Z at 0.05. So, this Z now,
14:38.089 --> 14:44.430
if we just use the sample statistics,
we will get 1.6; and since 1.6
14:44.430 --> 14:50.660
is less than 1.645, the null hypothesis
cannot be rejected at the 0.05 level of significance.
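The large sample two mean test just worked through can be reproduced with the standard library. A minimal sketch; the variable names are my own, the numbers are the lecture's:

```python
import math
from statistics import NormalDist

# Two alloy samples: H0: mu1 - mu2 <= 0, H1: mu1 - mu2 > 0 (right sided)
x1, s1, n1 = 4400.0, 450.0, 40
x2, s2, n2 = 4200.0, 650.0, 40

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # std. deviation of the difference
z = (x1 - x2) / se                        # test statistic
z_crit = NormalDist().inv_cdf(0.95)       # about 1.645 at alpha = 0.05

print(round(z, 2))    # 1.6
print(z > z_crit)     # False: null hypothesis cannot be rejected
```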
14:50.660 --> 14:55.459
So, we cannot say that sample
one is really superior to sample two,
14:55.459 --> 15:00.320
because we have seen that we failed to reject
the null hypothesis. As I mentioned earlier also,
15:00.320 --> 15:06.870
we should not say that the null hypothesis is
accepted; generally we should say that either
15:06.870 --> 15:12.920
the null hypothesis is rejected or it cannot
be rejected with the sample data, because
15:12.920 --> 15:15.449
we may have to take a larger sample for further
testing.
15:15.449 --> 15:22.449
Well, so, regarding these two sample means
again: if for either of these two samples the sample
15:25.769 --> 15:30.930
size is less than 30, then it should be
considered a small sample, and this small sample,
15:30.930 --> 15:35.420
as we have seen for the single mean also,
follows a t distribution.
15:35.420 --> 15:42.420
So, here the statistic is X1 bar minus
X2 bar minus delta divided by sigma hat of X1 bar
15:43.310 --> 15:50.310
minus X2 bar; this is an estimate of the standard
deviation of the difference between the two means.
15:51.490 --> 15:57.570
This test statistic is a random variable
having a t distribution with n 1 plus n 2
15:57.570 --> 16:04.570
minus 2 degrees of freedom. And this estimate
for the standard deviation is the square root
16:05.130 --> 16:10.389
of an estimate of the variance, so this variance
is sigma 1 square by n 1 plus sigma 2 square by
16:10.389 --> 16:14.610
n 2 like this.
So, that is if the population variance is known; and
16:14.610 --> 16:21.610
if it is not known, then it should be estimated
by the pooled estimator; it is called the pooled
16:23.380 --> 16:30.380
standard deviation or pooled variance, S p square, which
you will get from this: the summation of X1 minus
16:31.139 --> 16:37.970
X1 bar squared plus the summation of X2 minus
X2 bar squared, divided by n 1 plus n 2 minus 2,
16:37.970 --> 16:43.310
which is basically n 1 minus 1 times
the variance of the first sample
16:43.310 --> 16:50.310
plus n 2 minus 1 times the variance of the second
sample, divided by n 1 plus n 2 minus 2, where
16:50.519 --> 16:57.139
those two quantities are the sums of the squared
deviations from the means of the respective
16:57.139 --> 16:59.029
samples.
16:59.029 --> 17:06.029
So, if we substitute the estimate of this
sigma square in the expression of this
17:07.070 --> 17:12.260
variance, then we get that t is
equal to X1 bar minus X2 bar minus delta
17:12.260 --> 17:18.579
divided by S p multiplied by the square root of 1 by
n 1 plus 1 by n 2, which is a random variable
17:18.579 --> 17:25.579
having a t distribution with n 1 plus n 2 minus
2 degrees of freedom. And this S p can be obtained
17:25.750 --> 17:32.750
as S p square equals n 1 minus 1 into
S1 square plus n 2 minus 1 into S2 square, divided
17:32.980 --> 17:37.710
by n 1 plus n 2 minus 2.
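The pooled variance formula just stated can be written directly in code. A minimal sketch; the function name is my own, and the demo numbers are the sample variances from the pollutant example later in the lecture:

```python
def pooled_variance(s1_sq, n1, s2_sq, n2):
    """S_p^2 = ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Sample variances 0.01575 (n1 = 5) and 0.01092 (n2 = 6), as in the
# pollutant example: the lecture rounds the result to 0.01306.
sp_sq = pooled_variance(0.01575, 5, 0.01092, 6)
print(sp_sq)
```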
17:37.710 --> 17:44.260
And the critical regions for that test
statistic are these: for the left sided test, that
17:44.260 --> 17:51.260
is, whether mu 1 is less than mu 2, reject
if t is less than minus t alpha;
17:52.480 --> 17:57.880
in the right sided case, if t is greater than t alpha; and
for the two sided test, if t is less than minus
17:57.880 --> 18:02.440
t alpha by 2 or t is greater than plus t alpha
by 2.
18:02.440 --> 18:09.440
So, we will take one problem on this small sample
test. It is like this: the random sample readings
18:14.490 --> 18:21.490
of the concentration of the pollutant in the
water at two locations A and B are as follows. So,
18:23.600 --> 18:30.600
there are two locations, one is location
A and the other is location B; and for
18:32.890 --> 18:39.570
location A we are having 5 samples, and
for location B we are having 6 samples. So,
18:39.570 --> 18:45.059
this problem is taken because, you see,
it is not required that we should have the
18:45.059 --> 18:51.320
same length of data; n 1 and
n 2 need not be the same.
18:51.320 --> 18:58.320
So, for sample one we are having 5 samples
and for sample two we are having 6 samples. So,
18:59.490 --> 19:06.490
use the 0.01 level of significance to test whether
the difference between the means of the concentrations
19:08.429 --> 19:14.490
at the two locations is significant or not.
19:14.490 --> 19:21.490
So, what do we have to do? We first formulate
the hypothesis. The null hypothesis
19:22.940 --> 19:29.080
concerns whether the difference is significant
or not. So, this is a two sided test, and for the
19:29.080 --> 19:34.990
null hypothesis we put mu 1 minus mu
2 equal to 0, because we are trying to
19:34.990 --> 19:41.820
test whether these are different or not. So,
mu 1 minus mu 2 not equal to 0, that is
19:41.820 --> 19:48.820
our test goal; this we have put in
the alternative hypothesis, and mu 1 minus
19:52.169 --> 19:59.169
mu 2 equal to 0 is your null hypothesis, and
here the level of significance alpha is 0.01.
20:01.679 --> 20:08.679
The criterion for rejection of the null hypothesis
is that t is less than minus 3.25
20:09.820 --> 20:16.820
or t is greater than 3.25. These two values
are basically the critical values at the
20:21.780 --> 20:28.780
significance level 0.01 for the t distribution
having degrees of freedom equal to the sample
20:30.220 --> 20:36.789
lengths: this is 5 and this is 6,
so 5 plus 6 equals 11, minus 2 is 9. So,
20:36.789 --> 20:43.789
with degrees of freedom 9 at significance
level 0.01, we get these two values:
20:45.630 --> 20:52.630
minus 3.25 and 3.25.
So, if our test statistic falls in this
region, then we should reject the null hypothesis. And
region, then we shouldreject the null hypothesis.And
the test statistic is X1 bar minus
21:00.700 --> 21:07.700
X2 bar minus delta divided by S p times the square root
of 1 by n 1 plus 1 by n 2. This X1 bar is
21:12.250 --> 21:16.909
the sample mean for the first sample, X2 bar
is the sample mean for the second sample,
21:16.909 --> 21:23.270
which we can get from whatever data we
are having. And this S p, that is, the pooled
21:23.270 --> 21:30.039
variance, sorry, the pooled standard deviation, we
should also calculate, as we have discussed
21:30.039 --> 21:30.419
so far.
21:30.419 --> 21:35.919
So, here are the calculations: this X1 bar,
which is the mean for the first sample,
21:35.919 --> 21:42.919
is 8.23, the mean for the second sample is
7.94, the variance for the first sample is
21:46.770 --> 21:53.770
0.01575, and the variance for the second sample
is 0.01092. These sample estimates we have
21:55.000 --> 22:01.570
already learned, so we can apply that
for whatever data is given, and we will
22:01.570 --> 22:07.720
get these sample estimates.
Now, this pooled variance is that n 1 minus
22:07.720 --> 22:14.150
1 times the standard deviation of the first
sample, sorry, the variance of the first sample,
22:14.150 --> 22:20.740
plus n 2 minus 1 times
the variance of the second sample, divided by
22:20.740 --> 22:25.950
n 1 plus n 2 minus 2. So, we will just use this;
for the first sample there are
22:25.950 --> 22:32.610
5 samples, so that is 4 multiplied by its variance. And
the second sample size is 6, so 6 minus 1 is 5
22:32.610 --> 22:39.610
times its variance, divided by 5 plus 6 minus
2 which is 9; so we will get 0.01306,
22:41.340 --> 22:48.340
so the S p here is equal to 0.1143.
Hence, t is equal to X1 bar minus X2 bar
22:50.870 --> 22:57.870
minus delta divided by S p times the square root of
1 by n 1 plus 1 by n 2; so if we put this in, we
22:57.970 --> 23:02.000
will get the statistic as 4.19.
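The whole pollutant calculation can be checked end to end. A minimal stdlib sketch using the lecture's numbers; the critical value 3.25 is the t table entry for 9 degrees of freedom at the two sided 0.01 level quoted in the lecture:

```python
import math

# Pollutant example: two small samples, H0: mu1 - mu2 = 0 (two sided)
x1bar, s1_sq, n1 = 8.23, 0.01575, 5
x2bar, s2_sq, n2 = 7.94, 0.01092, 6

# Pooled variance and standard deviation
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
sp = math.sqrt(sp_sq)

# Test statistic with delta = 0
t = (x1bar - x2bar) / (sp * math.sqrt(1 / n1 + 1 / n2))
t_crit = 3.25   # t_{0.005}, 9 df (table value)

print(round(t, 2))       # 4.19
print(abs(t) > t_crit)   # True: null hypothesis must be rejected
```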
23:02.000 --> 23:09.000
Now, this 4.19 is greater than
3.25; it is falling on the right side of the
23:09.470 --> 23:16.470
distribution, obviously greater
than the critical value. So, the null hypothesis must
23:16.990 --> 23:23.990
be rejected at the 0.01 level of significance.
The null hypothesis, you can say, stated
23:16.990 --> 23:23.990
that both the population means are the
same. So, we must reject this null
23:33.210 --> 23:40.210
hypothesis; that means we can infer that
the means from the two locations are not the same.
23:42.679 --> 23:49.679
Well, so far we have seen the tests
for one mean and two means, and now we will
23:58.090 --> 24:03.640
see the variance. First, we will start with
one variance, and then we will go
24:03.640 --> 24:10.640
to two variances. And let us consider
a sample of size n drawn from a normal population
24:11.120 --> 24:18.120
with variance sigma square, and the null hypothesis
H naught is that sigma square is equal
24:18.750 --> 24:25.120
to sigma naught square.
And the alternative hypothesis H1 can be,
24:25.120 --> 24:30.049
say, sigma square less than sigma naught square, or greater
than sigma naught square, or not
24:30.049 --> 24:34.529
equal to sigma naught square; any of these can
happen. And depending on this, the null hypothesis
24:34.529 --> 24:38.610
will also change; this is basically for
the alternative hypothesis when sigma square is not
24:38.610 --> 24:44.679
equal to this.
And in our last lecture, when we were doing the sampling
24:44.679 --> 24:51.679
distribution also, we have seen that
n minus 1, where n is the
24:52.429 --> 24:58.380
sample size, times the sample variance, divided
by sigma naught square, this ratio is basically
24:58.380 --> 25:04.200
following a chi square distribution. So, this
statistic we can use
25:04.200 --> 25:11.200
to test for the single sample variance; this
is the one variance case, and this sigma naught
25:12.059 --> 25:18.000
is some threshold value against which
we are testing the population. And
25:18.000 --> 25:24.980
this quantity, this statistic, has
a chi square distribution with n
25:24.980 --> 25:27.659
minus 1 degrees of freedom.
25:27.659 --> 25:34.659
Now, consider the critical region for testing this
one sample variance; here the assumption
25:37.970 --> 25:41.950
is that the population from which the sample
is drawn has a normal distribution. At
25:41.950 --> 25:48.679
the significance level alpha, if it is a one
sided test and it is the left sided test, then
25:48.679 --> 25:53.950
the chi square statistic that we have
discussed so far, that is, n minus 1 times the sample variance
25:53.950 --> 25:57.860
divided by the
population variance. That statistic, if it
25:57.860 --> 26:04.860
is less than the chi square value at 1 minus
alpha cumulative probability, if it is less
26:10.870 --> 26:16.260
than that, then we should reject the null
hypothesis. And if it is the right sided
26:16.260 --> 26:23.260
test, if it is greater than that chi square
alpha, then we will reject the null hypothesis. And
26:23.480 --> 26:29.470
for the two sided test, if it is less than
chi square 1 minus alpha by 2 or greater
26:29.470 --> 26:34.309
than chi square alpha by 2, then we should
reject this null hypothesis.
26:34.309 --> 26:40.809
If we quickly see the typical shape
of a chi square distribution, it looks like
26:40.809 --> 26:47.809
this: it is not symmetrical. So, here,
what we are referring to on the left side,
26:48.080 --> 26:54.080
this area is basically what we are indicating,
and this value is what we refer to as
26:54.080 --> 27:01.080
chi square 1 minus alpha; and similarly,
when we are doing the
27:02.010 --> 27:09.010
right hand side test, suppose
this area is alpha;
27:10.120 --> 27:16.760
then this value is what we call
chi square alpha. So, here chi square
27:16.760 --> 27:21.929
1 minus alpha means: the total area below
the curve is one, so the full area apart
27:21.929 --> 27:25.480
from the red one, starting from the
white area as well as the shaded area, is
27:25.480 --> 27:29.860
your 1 minus alpha.
And these are for the one sided test; now
27:29.860 --> 27:35.860
at the significance level alpha, if
we go for the two sided test, then obviously
27:35.860 --> 27:42.270
we are looking at 1 minus alpha by 2, which
will be a little smaller than
27:42.270 --> 27:49.270
this. So, this boundary and that one will shift
somewhat, and these two values are your
27:51.850 --> 27:58.850
chi square 1 minus alpha by 2 and this is
your chi square alpha by 2.
27:59.880 --> 28:04.940
And this area, the blue area I am now
shading here plus the blue
28:04.940 --> 28:11.940
area here, should be equal to alpha. And as
the chi square distribution is not symmetrical,
28:13.500 --> 28:20.340
these two critical values that we are looking
for will not be symmetrical,
28:20.340 --> 28:27.340
as we have seen so far in the case of the standard
normal distribution as well as the t
28:29.080 --> 28:34.809
distribution. So, these are the critical regions,
from the chi square
28:34.809 --> 28:38.679
distribution, for the one sample variance.
28:38.679 --> 28:45.679
Well, take one example here: the maximum permitted
population standard deviation sigma naught
28:46.659 --> 28:53.659
in the strength of a concrete cube is 5 kilo
newton per meter square; this is our requirement. Use
28:53.700 --> 28:59.960
the 0.05 level of significance to test whether
the population standard deviation is greater
28:59.960 --> 29:06.470
than this threshold value, that is, 5 kilo newton
per meter square, if the strengths of 15 randomly
29:06.470 --> 29:11.950
selected concrete cubes cured under a certain
process have a standard deviation equal to
29:11.950 --> 29:18.950
6.4 kilo newton per meter square. So, we have
randomly taken 15 samples, and the sample
29:19.179 --> 29:25.309
standard deviation is 6.4; if we just compare
6.4 and 5 kilo newton per meter square,
29:25.309 --> 29:26.880
this is obviously greater than that.
29:26.880 --> 29:33.880
Now, through this hypothesis testing for
one variance, we have to infer probabilistically
29:35.080 --> 29:38.980
whether this value is really
greater than our requirement. So,
29:38.980 --> 29:45.980
our null hypothesis is that this sigma
is less than or equal to 5 kilo newton per
29:49.330 --> 29:55.919
meter square, because we are targeting to
test whether the population standard deviation
29:55.919 --> 30:00.520
from which the sample is taken is really greater
than 5 kilo newton per meter square or not.
30:00.520 --> 30:07.520
So, the level of significance here is
0.05; the criterion for rejection of the null
30:08.520 --> 30:15.520
hypothesis is: chi square greater than
23.685. Now, where did we get this? 23.685 is
30:18.909 --> 30:25.909
the value for alpha 0.05 for the chi
square distribution having degrees of freedom
30:26.380 --> 30:29.559
equal to 15 minus 1, which is 14.
30:29.559 --> 30:35.809
So, now, if I say here that
this is the chi square distribution with
30:35.809 --> 30:42.809
14 degrees of freedom, then for this chi
square alpha that we are talking about, if we
30:45.000 --> 30:52.000
just say that this red area is 0.05, then
this value is 23.685, which you can get
30:56.820 --> 31:02.320
from any standard textbook where
these chi square tables are given. So,
31:02.320 --> 31:07.240
this is your critical zone, and if the test
31:07.240 --> 31:11.990
statistic falls in this zone, then we have
to reject the null hypothesis, and if it is
31:11.990 --> 31:18.149
below this, we cannot reject the null hypothesis
with the sample data that is available with
31:18.149 --> 31:19.080
us.
31:19.080 --> 31:26.080
So, this is why 23.685 is our critical
value; and if we calculate the test statistic
31:30.710 --> 31:37.710
with the sample data, it is now 15
minus 1 times 6.4 squared divided by 5
31:38.019 --> 31:43.880
squared; then we get 22.94, which is definitely
less than our critical value. So, the null
31:43.880 --> 31:48.590
hypothesis cannot be rejected at the 0.05 level
of significance; that means, we cannot say
31:48.590 --> 31:55.590
from whatever sample data we are
having, that the population standard
31:55.750 --> 32:02.140
deviation is greater than ours,what is our
target; that is,what is ourrequirement 5 kilo
32:02.140 --> 32:04.480
newton per meter square it is not greater
than that.
32:04.480 --> 32:11.480
Even though we have seen that, numerically,
the sample standard deviation is 6.4, which
32:12.950 --> 32:18.360
is clearly greater than 5, probabilistically
we cannot infer that.
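As a rough sketch in code of the one-variance chi-square test just worked out (the critical value 23.685 is read from a standard chi-square table, as in the lecture; the variable names are my own):

```python
# One-sided chi-square test on a single variance.
# H0: population standard deviation <= 5 kN per metre square,
# H1: it is greater than 5.
n = 15          # sample size
s = 6.4         # sample standard deviation
sigma0 = 5.0    # hypothesised population standard deviation

# The statistic (n - 1) * s^2 / sigma0^2 follows a chi-square
# distribution with n - 1 = 14 degrees of freedom under H0.
chi2_stat = (n - 1) * s**2 / sigma0**2

chi2_crit = 23.685   # chi-square table value for alpha = 0.05, 14 df

print(round(chi2_stat, 2))       # 22.94
print(chi2_stat > chi2_crit)     # False, so H0 cannot be rejected
```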
32:18.360 --> 32:25.360
Well, now we will take up hypotheses concerning
two variances; let us consider two samples of sizes
32:30.919 --> 32:37.919
n 1 and n 2 drawn from two normal populations;
so this is the requirement: whatever samples
32:38.220 --> 32:45.220
we are drawing, they are from normal populations
with variances sigma 1 square and sigma
32:46.470 --> 32:51.320
2 square.
So, here the null hypothesis can be like
32:51.320 --> 32:58.320
this: whether sigma 1 square and sigma 2 square
are equal, or whether one is
32:58.529 --> 33:05.529
less than the other, and so on. So, here the test
statistic depends on what your alternative
33:05.649 --> 33:09.370
hypothesis is, whether sigma 1 square is
less than sigma 2 square or sigma 1 square
33:09.370 --> 33:13.850
is greater than sigma 2 square, or they are
not equal. Depending on that, the test statistic
33:13.850 --> 33:20.850
is F equal to S 2 square by S 1 square, or F equal to S 1 square by
S 2 square, or F equal to S capital M square by
33:22.149 --> 33:28.159
S small m square, where the numerator is the
larger sample variance.
33:28.159 --> 33:33.169
So, you see, for the first test, when we are
trying
33:33.169 --> 33:37.539
to test whether sigma 1 square is less
than sigma 2 square, then we have to use
33:37.539 --> 33:43.039
the sample variance of the second sample
divided by the sample variance of the first sample;
33:43.039 --> 33:48.690
this is the statistic we should use.
On the other hand, we should use the
33:48.690 --> 33:52.260
variance for the first sample divided by variance
for the second sample, if the first sample
33:52.260 --> 33:58.559
is greater than the second one. And if we are
testing the two-sided one, then we have to
33:58.559 --> 34:05.559
find out which one is the greater. That
greater value should be in the numerator, and
34:05.690 --> 34:12.690
this F has an F distribution with n 1 minus
1 and n 2 minus 1 degrees of freedom; when we
34:16.140 --> 34:23.140
write this n here, there is a
typing mistake on the slide, and the first one
34:24.609 --> 34:30.109
is for the numerator.
So, in this statistic, whatever is
34:30.109 --> 34:34.690
there in the numerator, that should be your
first degrees of freedom, and whatever is there
34:34.690 --> 34:39.909
in the denominator, that should be your second
degrees of freedom. Now, if I just want to
34:39.909 --> 34:46.909
put all these things into a single statement:
whatever we are testing, the F
34:47.429 --> 34:52.159
statistic that we are calculating, we
have to keep in mind that it
34:52.159 --> 34:57.640
should always be greater than 1; that means,
in the numerator we always put the
34:57.640 --> 35:02.780
greater value, and in the denominator
we put the smaller value.
35:02.780 --> 35:09.780
So, if we can ensure that, then we can say that the
F statistic has an F distribution
35:12.030 --> 35:19.030
whose
first degrees of freedom is the larger
35:21.109 --> 35:26.510
sample size minus 1 and second degrees of
freedom is the smaller sample size minus 1;
35:26.510 --> 35:28.170
again, on the slide this is not 2, this is also 1.
35:28.170 --> 35:35.160
So, with these two degrees of freedom, that
is an F distribution; so there are again
35:35.160 --> 35:42.160
3 different cases and 3 different
test statistics. And if we are just taking it in
35:47.050 --> 35:51.660
this way, whether the first one is less
than the second one, or the first one is greater
35:51.660 --> 35:58.010
than the second one, or they are not equal,
then, depending on that, you have
35:58.010 --> 36:02.690
to find out what your critical region is here.
On the other hand, if you just want to follow
36:02.690 --> 36:08.000
what I just now mentioned, if you
simply want to use which one is your larger
36:08.000 --> 36:14.690
variance and which one is your smaller variance,
then for all these cases you can calculate
36:14.690 --> 36:21.690
this F statistic, which follows the F
distribution with n capital M minus 1 as the
36:23.060 --> 36:28.180
first degrees of freedom and n small m minus 1
as the second degrees of freedom, where n with
36:28.180 --> 36:34.420
subscript capital M is the larger sample size
and n with subscript small m is the smaller sample
36:34.420 --> 36:35.869
size.
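The rule just stated, always placing the larger sample variance in the numerator so that F is greater than 1, can be sketched like this (the function name and the example numbers are my own illustration):

```python
# Form the F statistic with the larger sample variance in the
# numerator, so F > 1; the degrees of freedom follow the
# numerator and denominator respectively.
def f_statistic(s1_sq, n1, s2_sq, n2):
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, n1 - 1, n2 - 1
    return s2_sq / s1_sq, n2 - 1, n1 - 1

# Illustrative (made-up) sample variances and sizes
F, df_num, df_den = f_statistic(4.0, 10, 9.0, 8)
print(round(F, 2), df_num, df_den)   # 2.25 7 9
```

Note that swapping the two samples in the call gives the same statistic and the same pair of degrees of freedom, which is the point of the convention.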
36:35.869 --> 36:42.869
Well, let us take up one example here: it is proposed
to determine whether there is less variability
36:44.089 --> 36:50.410
in the strength of concrete cubes cured under
process one, than those cured under process
36:50.410 --> 36:56.040
two. So, two different processes are
identified, and we have to test whether the
36:56.040 --> 37:03.040
strength of concrete in one process is different
from the other process; here we are testing
37:03.619 --> 37:07.750
whether in process one the variability of
the strength we get is less than in
37:07.750 --> 37:13.170
process two, before I can declare which process we should
follow for curing the concrete, so
37:13.170 --> 37:20.170
that we can achieve a more consistent strength.
So, sample data is taken: there
37:20.290 --> 37:26.460
are 12 randomly selected cubes
tested under each of the two processes, and it is found
37:26.460 --> 37:33.460
that s 1 is 3.5 kilo newton per meter
square and s 2 is 6.2 kilo newton per meter
37:34.660 --> 37:39.480
square. Test the null hypothesis that the two variances
are equal against the alternative hypothesis that
37:39.480 --> 37:46.480
the first one is less than the second one
at the level of significance 0.05. Here, the
37:48.790 --> 37:53.849
null hypothesis is as stated, and the alternative hypothesis is
that sigma 1 square is less than sigma 2 square;
37:53.849 --> 37:56.820
level of significance is 0.05.
37:56.820 --> 38:01.490
And the criterion for rejection of this null
hypothesis is that the statistic is greater
38:01.490 --> 38:08.490
than 2.82; so how are we getting this one? This
is the F distribution having the degrees of
38:11.750 --> 38:16.900
freedom based on the sample sizes; that is, in both
the cases the sample size is 12 here, so 12
38:16.900 --> 38:21.730
minus 1, so 11; first degrees of freedom is
11, second degrees of freedom is also 11.
38:21.730 --> 38:28.730
So, here F is equal to s 2 square by
s 1 square, and you have
38:35.670 --> 38:42.550
seen that s 2 is 6.2 and s 1 is
3.5, so s 2 is the greater one; so if we
38:42.550 --> 38:49.550
use this test statistic s 2 square by s 1
square; anyway, here in this case
38:50.890 --> 38:54.060
both the sample sizes are the same.
38:54.060 --> 39:01.060
Here, the test statistic comes to 3.14;
as 3.14 is greater than 2.82, the null
39:03.540 --> 39:10.540
hypothesis must be rejected at the 0.05 level
of significance.
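The concrete-cube example can be checked numerically as follows (the critical value 2.82 for the F distribution with 11 and 11 degrees of freedom at alpha = 0.05 is read from an F table, as in the lecture):

```python
# One-sided F test: H1 is sigma1^2 < sigma2^2, so the statistic is
# F = s2^2 / s1^2 with (n2 - 1, n1 - 1) = (11, 11) degrees of freedom.
s1, s2 = 3.5, 6.2     # sample standard deviations under the two processes
n1 = n2 = 12          # 12 randomly selected cubes under each process

F = s2**2 / s1**2
F_crit = 2.82         # F table value for alpha = 0.05 with (11, 11) df

print(round(F, 2))    # 3.14
print(F > F_crit)     # True, so H0 is rejected at the 0.05 level
```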
39:13.070 --> 39:19.390
So far in this hypothesis testing, what we have
seen is that we first found out, in the earlier
39:19.390 --> 39:26.390
lecture, what the sampling distributions
of those estimated values,
39:27.890 --> 39:33.550
the sample estimates, are. And in this hypothesis
testing, we are basically trying to infer
39:33.550 --> 39:38.470
something about the population from which the
sample is drawn. Suppose that we are having
39:38.470 --> 39:45.470
some sample of size n, and we estimate
its mean; and if we want to infer something
39:45.650 --> 39:52.040
from that sample estimate
regarding the population, then we
39:52.040 --> 39:57.109
have seen that we have tested for the single
mean, we have tested for the two means, and
39:57.109 --> 40:01.140
again, we have taken the single variance,
we have taken the two variances.
40:01.140 --> 40:06.920
Like this, we can test, and we have
used different examples from
40:06.920 --> 40:12.150
civil engineering. There are many
examples like this where you can use this
40:12.150 --> 40:18.970
theory to infer something about the population,
because the sample that we get is always limited. So,
40:18.970 --> 40:25.970
before we can infer something about the population,
we have to use these tests properly, so that
40:26.810 --> 40:31.410
we can judge properly.
Even though we get some numerical values from
40:31.410 --> 40:37.230
the sample, just the comparison of these
numerical values can show you one thing,
40:37.230 --> 40:43.180
but whether that is really significant: because
if you take one sample and then another
40:43.180 --> 40:48.170
sample from the same population, the statistics
may not be the same; the means of the two samples
40:48.170 --> 40:52.890
may not be exactly the same.
So, whether those differences are really
significant from the statistical point of
40:52.890 --> 40:56.770
view or not: to test that, we have to follow
the discussion we have done so far with respect
40:56.770 --> 41:02.440
to the single mean, two means, single variance,
41:02.440 --> 41:09.440
and two variances.
Now, we will take some time to spend on the next topic, because
41:11.510 --> 41:18.510
we have seen, as many times I have mentioned, that
this data is following some distribution. Now,
41:20.400 --> 41:27.400
we are having the data; now, with the data, how
can we say that this data is coming
41:28.900 --> 41:33.619
from a population which is following this
distribution? So, there are basically two things:
41:33.619 --> 41:37.420
one, first we will use a graphical
approach; and there are some statistical tests
41:37.420 --> 41:44.420
also, to infer whether
the population from which
41:45.880 --> 41:49.690
the sample is drawn is really following a
particular distribution or not.
41:49.690 --> 41:56.690
So, we will start with that, and the first thing
is the graphical representation, which
41:59.349 --> 42:03.960
we generally test through the probability paper. So,
the construction of this probability paper
42:03.960 --> 42:09.390
and how we test graphically whether the data is really
following a distribution, we will see
42:09.390 --> 42:12.880
now.
So, the empirical determination of the probability
42:12.880 --> 42:17.290
distribution of a random variable: in many
real-life scenarios, the actual probability
42:17.290 --> 42:22.780
distribution of a random process is unknown. So,
on the basis of the frequency distribution
42:22.780 --> 42:26.819
of the sample data that we are having,
determined from the observed data, which is
42:26.819 --> 42:33.240
the sample available to us, some probability
distribution may be assumed empirically.
42:33.240 --> 42:39.010
Probability papers are useful to check the
assumption of a particular probability distribution
42:39.010 --> 42:43.490
of a random variable. Say that I have a sample
and I am saying that this sample is taken from
42:43.490 --> 42:48.050
a population which is distributed normally. Now,
we have to use a normal probability paper
42:48.050 --> 42:55.050
and plot the data, and then we will discuss
how we can infer that, or how we can visually
42:55.640 --> 43:00.680
inspect whether it is really following
the normal distribution or not. And again, we
43:00.680 --> 43:05.310
will take up some statistical tests to probabilistically
infer whether that is really following the
43:05.310 --> 43:07.220
distribution or not.
43:07.220 --> 43:14.220
So, a probability paper is a specially constructed
plotting paper, where one of the axes,
43:16.480 --> 43:20.660
on which the random variable is plotted, is an
arithmetic axis, and the probability axis is
43:20.660 --> 43:26.930
distorted in such a way that the cumulative
probability distribution of the random variable
43:26.930 --> 43:33.930
plots as a straight line. So, on a given paper only one
distribution appears as a straight line;
43:34.829 --> 43:39.780
for the CDF, the cumulative distribution
function, of different probability distributions
43:39.780 --> 43:42.730
to plot as a straight line, separate probability
papers are needed.
43:42.730 --> 43:47.339
So, if I want to test whether it is following
a normal distribution, then I have to use
43:47.339 --> 43:52.020
a normal probability paper, and if I want
to test whether it is following an exponential
43:52.020 --> 43:59.020
distribution, we have to use a different paper. Now,
how these papers are constructed and how it
44:00.450 --> 44:02.680
is tested we will see now.
44:02.680 --> 44:09.680
So, first, we are taking the normal probability paper,
which is most widely used, to test
44:10.079 --> 44:17.079
whether the sample data belongs to a population
which is normally distributed. This normal
44:18.890 --> 44:23.420
probability paper is constructed on the basis
of standard normal probability distribution
44:23.420 --> 44:30.420
function. The random variable X is represented
on the horizontal or, in some cases, the vertical
44:30.930 --> 44:37.930
axis, but mostly this random variable
is generally represented on the horizontal
44:38.500 --> 44:45.500
axis, and that axis is an arithmetic scale. The
vertical axis, or horizontal if I reverse
44:46.280 --> 44:51.700
this, as I was telling, if X is
on the vertical axis, this one will be the
44:51.700 --> 44:57.230
horizontal; otherwise, in most cases,
it is the vertical axis. So, the vertical axis represents
44:57.230 --> 45:04.160
two scales: the standard normal variate Z
equal to X minus mu by sigma, and the cumulative
45:04.160 --> 45:09.480
probability values F(x) ranging from 0 to 1.
45:09.480 --> 45:16.480
Now, before I go further, let us see graphically
how this concept is arrived at;
45:17.750 --> 45:24.750
you know that if this is X, this is
the axis for the random variable, and
45:25.430 --> 45:29.910
if this is the cumulative probability
axis, which is F(x) for a specific value
45:29.910 --> 45:34.980
x, which is the cumulative distribution you
have seen earlier towards the beginning of
45:34.980 --> 45:40.800
this course. This distribution generally
looks like this, which is asymptotic to
45:40.800 --> 45:47.800
0 at minus infinity and asymptotic
to one at plus infinity. And we have also seen
45:48.400 --> 45:53.940
that, if it is a standard normal
distribution, then from minus 3
45:53.940 --> 45:59.650
to plus 3, almost all of the probability
is exhausted.
45:59.650 --> 46:04.460
Now, if I just take, say,
this standard normal distribution,
46:04.460 --> 46:11.460
then here the mean is
approximately 0, and this is minus 3 and
46:12.349 --> 46:18.750
this is plus 3; or in some computer
applications, sometimes we can even go from
46:18.750 --> 46:24.380
minus 5 to plus 5, where very close to all the probability
is exhausted.
46:24.380 --> 46:31.210
So, what we generally want to do, basically, with
whatever sample data we are having:
46:31.210 --> 46:36.400
if I just observe now, this is an
arithmetic scale and this is also an arithmetic
46:36.400 --> 46:41.619
scale. Now, if you plot whatever
data we are having with respect to
46:41.619 --> 46:45.450
its cumulative probability; how to get the
cumulative probability, we will discuss.
46:45.450 --> 46:50.640
So, if we can plot that, and if it follows
approximately this line, then we can infer
46:50.640 --> 46:55.819
that it is following this particular distribution;
but by eye inspection, it is very difficult
46:55.819 --> 47:00.950
to say whether this particular shape is followed
or not. Rather, if we can distort this
47:00.950 --> 47:07.800
axis in such a way that this will appear as
a straight line, then inspecting by eye
47:07.800 --> 47:11.829
whether it is following a straight line
or not is easier than comparing
47:11.829 --> 47:15.400
whether it is following this particular curvilinear
path or not.
47:15.400 --> 47:22.400
So, to make this axis distorted, what we generally
do is take a straight
47:22.560 --> 47:29.230
line between these. Now, let us create one
new axis here such that, suppose,
47:29.230 --> 47:36.230
I am starting from 0.25: I am starting
from this line, going up to this point, and then
47:36.900 --> 47:43.050
I am going to this straight line, and to
this point I am giving the name, giving
47:43.050 --> 47:47.980
the number as 0.25.
Similarly, suppose that I am starting from
47:47.980 --> 47:54.690
0.75 here; I will first come
to this straight line and then go here, so that
47:54.690 --> 48:01.690
I write 0.75 there. Similarly, for 0.05, 0.1, 0.15,
all these points, whatever is there,
48:04.109 --> 48:09.970
if I just go there and distort this axis;
basically, I am squeezing some
48:09.970 --> 48:15.119
part and expanding some part, in such a way that
this red curvilinear path is stretched
48:15.119 --> 48:20.620
into a straight line.
And in that axis - in that distorted axis
48:20.620 --> 48:26.200
- if I plot whatever data we are having,
and if it appears as a straight line, then
48:26.200 --> 48:31.359
we can conclude that it is following the normal
distribution. Now, keeping this x axis
48:31.359 --> 48:37.910
the same and using this distorted probability
axis, whatever paper we get,
48:37.910 --> 48:39.540
is known as a probability paper.
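The distortion just described can be sketched with Python's standard library: plotting each cumulative probability p at the height of the standard normal quantile of p makes any normal CDF a straight line (the population parameters here are my own illustration):

```python
from statistics import NormalDist

# On normal probability paper, probability p is plotted at height
# z = Phi^{-1}(p), the standard normal quantile. The normal CDF
# p = Phi((x - mu) / sigma) then plots as the line z = (x - mu) / sigma.
std = NormalDist()            # standard normal distribution
mu, sigma = 30.0, 4.0         # an illustrative normal population

for x in (26.0, 30.0, 34.0):
    p = NormalDist(mu, sigma).cdf(x)   # cumulative probability at x
    z = std.inv_cdf(p)                 # distorted-axis coordinate of p
    print(round(z, 4), round((x - mu) / sigma, 4))   # equal: collinear
```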
48:39.540 --> 48:43.680
Now, this example that I have shown
is for the normal distribution; similarly, this
48:43.680 --> 48:49.059
can be done for other distributions as
well. Now, the second question is, how will we get
48:49.059 --> 48:53.750
their cumulative probability? For that,
different plotting position formulas
48:53.750 --> 48:59.680
are available; for example, California gives
m by N, Hazen gives 2 m minus 1 by
48:59.680 --> 49:05.170
2 N, and Weibull gives the formula m by N plus 1,
where N is the total number of observations,
49:05.170 --> 49:10.760
that is, the sample size, and m is the rank
of the data point when the observed values
49:10.760 --> 49:16.920
are arranged in ascending order. Among these
plotting positions, generally the
49:16.920 --> 49:23.920
Weibull plotting position is mostly used; using
this one we get the cumulative probability
49:29.520 --> 49:33.690
values, and
then we plot them on the different probability
49:33.690 --> 49:37.550
papers. So, we can first plot the data on the normal
probability paper and check whether it is
49:37.550 --> 49:42.250
coming out to be a straight line or not, and
also try other probability papers; on whichever paper it
49:42.250 --> 49:48.040
is following a straight line, we should
conclude that the population of this
49:48.040 --> 49:52.030
sample is following that particular distribution.
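The three plotting-position formulas can be sketched directly; the small data set here is made up purely for illustration:

```python
# Plotting positions: m is the 1-based rank after sorting the
# observations in ascending order, N is the sample size.
def california(m, N):
    return m / N

def hazen(m, N):
    return (2 * m - 1) / (2 * N)

def weibull(m, N):          # the one most used in the lecture
    return m / (N + 1)

data = [27.1, 31.4, 24.9, 29.3]   # made-up strengths
ranked = sorted(data)
N = len(ranked)
for m, x in enumerate(ranked, start=1):
    print(x, round(weibull(m, N), 2))
# 24.9 0.2 / 27.1 0.4 / 29.3 0.6 / 31.4 0.8
```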
49:52.030 --> 49:58.490
Well, take one example here: the observed strength
of 30 concrete cubes is given below, that is,
49:58.490 --> 50:02.010
in the table on the next slide; check whether the
strength of the concrete cube follows the
50:02.010 --> 50:06.690
normal distribution or not, by plotting on
this normal probability paper.
50:06.690 --> 50:12.260
So, the strengths of these 30 samples
are given here; after using
50:12.260 --> 50:17.630
the Weibull formula for their plotting positions,
we arrange the values in
50:17.630 --> 50:23.390
ascending order, get their respective values,
and then plot them here. Now, you see here,
50:23.390 --> 50:29.410
this x axis is the actual strength; the pictorial
representation that I gave earlier was for
50:29.410 --> 50:35.150
the standard normal distribution.
Now, this axis can easily be adapted to any variable;
50:35.150 --> 50:41.730
here the actual axis
is shown, that is, the strength range. And
50:41.730 --> 50:48.550
you can see that this axis is now distorted
according to the probability values. Now,
50:48.550 --> 50:55.550
these blue plus signs are
the data, and now, by your judgment,
50:55.599 --> 51:02.599
you have to test whether these blue
plus dots are really following
51:03.430 --> 51:04.849
a straight line or not.
51:04.849 --> 51:10.559
So, this is just by your eye inspection, and there
are some statistical tests also, which
51:10.559 --> 51:17.559
we will see next. This was for the normal probability
paper; on the general probability paper
51:20.059 --> 51:25.230
and probability plot, the random variable
X is represented on the horizontal axis in
51:25.230 --> 51:29.690
an arithmetic scale. The vertical axis represents
the probability distribution in such a way that,
51:29.690 --> 51:34.589
if it follows a particular distribution for
which the probability paper is prepared, whether
51:34.589 --> 51:38.980
the probability paper is prepared for the
normal distribution or gamma distribution
51:38.980 --> 51:42.950
or exponential distribution, depending on that. If
it appears on that paper as a straight line
51:42.950 --> 51:46.750
then, we can say that this is following that
particular distribution.
51:46.750 --> 51:53.750
Thus, if the plotted data points give rise to
a straight line on the paper, then the data
51:56.740 --> 52:03.740
points belong to the particular probability
distribution for which the paper is constructed. Now,
52:05.410 --> 52:12.130
this was the general case; now, say,
the cumulative distribution that I have shown
52:12.130 --> 52:15.250
here, this was for the normal distribution.
52:15.250 --> 52:20.960
Now, I can take the exponential distribution,
and for the exponential distribution also,
52:20.960 --> 52:27.450
if I just take this exponential
distribution, its cumulative distribution
52:27.450 --> 52:34.280
looks like this; this one also
asymptotically approaches
52:34.280 --> 52:38.410
one. So, from there you have to draw
a straight line, and then you have this probability
52:38.410 --> 52:43.500
axis; you have to distort it by the
same method as I have discussed now.
52:43.500 --> 52:47.460
So, whatever new axis you get with
respect to the actual horizontal
52:47.460 --> 52:54.460
axis of the random variable gives you
the paper for the exponential distribution. And
52:54.970 --> 52:59.550
on that paper, if the data appears to be a straight
line, then we can say that it is following
52:59.550 --> 53:02.890
the exponential distribution.
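The same distortion idea for the exponential case can be sketched as follows: since the exponential CDF is p = 1 minus exp(minus lambda x), plotting p at the height minus ln(1 minus p) gives the straight line y = lambda x (the rate value here is my own illustration):

```python
import math

lam = 0.5                  # illustrative rate parameter of the exponential
for x in (1.0, 2.0, 4.0):
    p = 1 - math.exp(-lam * x)     # exponential CDF at x
    y = -math.log(1 - p)           # distorted-axis coordinate of p
    print(round(y, 6), round(lam * x, 6))   # equal, so a straight line
```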
53:02.890 --> 53:09.890
Now, one may ask a question; again, I am
going back to the problem we have discussed. Now,
53:10.470 --> 53:15.430
these blue stars, whether they really form a straight
line or not: whether you can
53:15.430 --> 53:19.720
say just by your eye inspection whether
it is really following the straight line or
53:19.720 --> 53:25.520
not. So, that is, basically, a personal
judgment; sometimes I can say that it is
53:25.520 --> 53:29.780
following the straight line, and sometimes we
can say that, no, it is not following the straight
53:29.780 --> 53:36.780
line.
So, what do we actually need? We need a probabilistic
53:37.630 --> 53:44.630
test with which, basically, we can
infer probabilistically, at a given significance
53:48.619 --> 53:55.619
level, whether the population from which this particular
data sample is drawn is following
53:56.000 --> 54:01.400
that distribution. So, there also we need
hypothesis testing; there are different test statistics
54:01.400 --> 54:08.400
and different tests, such as the chi
square test; through such a test we
54:09.069 --> 54:16.069
can infer which distribution the population
is following, and that we will take up in
54:17.290 --> 54:19.010
our next lecture; thank you.