WEBVTT
00:18.480 --> 00:22.890
Hello and welcome to today's lecture. We
are starting a new module.
00:22.890 --> 00:29.890
This module is on probability and statistics.
In the last couple of modules, we have
00:31.009 --> 00:36.750
discussed different theories related
to probability. Basically, whatever we
00:36.750 --> 00:42.370
have discussed is associated with the population.
Now, when we talk about statistics,
00:42.370 --> 00:49.370
statistics is related to a few samples
taken from this population. More precisely
00:50.000 --> 00:56.609
speaking, it involves random sampling.
So, the sample that we take should
00:56.609 --> 01:03.609
be a random sample from the population.
So, now, as we are using the words population and
01:04.250 --> 01:10.200
sample, at the start of this lecture
we should know what these actually mean. You
01:10.200 --> 01:16.920
know that when we are talking about the
parameters of those distributions that
01:16.920 --> 01:22.830
we have seen in the last
module, there are certain parameters
01:22.830 --> 01:29.280
associated with the standard
probability distributions that we have seen.
01:29.280 --> 01:34.410
So, basically we have to estimate those parameters.
To estimate those parameters, we should
01:34.410 --> 01:41.410
have some sample data. From
there, we should estimate those parameters. Now
01:42.009 --> 01:47.690
again, about the samples: when we are taking
random samples from the population,
01:47.690 --> 01:54.569
once we change the sample, obviously, the
estimates of those parameters may also
01:54.569 --> 01:58.700
change. Obviously, they will change, because two
samples will never be exactly identical. So,
01:58.700 --> 02:04.340
those estimates will themselves
have some distribution. Those are
02:04.340 --> 02:10.410
called sampling distributions. So, the
first lecture that we are going to take
02:10.410 --> 02:17.280
in this module is on sampling distributions
and parameter estimation.
02:17.280 --> 02:22.510
So, this parameter estimation that we
will do is based on the
02:22.510 --> 02:28.930
sampling distribution. You know that
whenever we talk about the mean or standard
02:28.930 --> 02:35.930
deviation or parameters of this type, which
are estimated from the samples, those also
02:36.450 --> 02:42.530
follow some distribution. So, we will see
what that distribution is, and then
02:42.530 --> 02:49.290
we can go for its estimation. Now, in
estimation theory, we can do this
02:49.290 --> 02:55.530
estimation in two different ways. One
is the point estimation. The
02:55.530 --> 03:00.249
point estimation means just one value I can
estimate from whatever information is
03:00.249 --> 03:06.219
available to me through the random samples. Or,
what I can do is look for an
03:06.219 --> 03:11.349
interval for this estimate,
so that there is an interval with a
03:11.349 --> 03:14.769
lower limit and an upper limit for
this particular parameter. So, it will
03:14.769 --> 03:21.379
be more logical to say, in the statistical
sense, that instead of giving a single value
03:21.379 --> 03:27.269
as the estimate, sometimes it is preferable
to give an interval. All
03:27.269 --> 03:28.989
these things we will discuss in this lecture.
03:28.989 --> 03:34.370
So, the outline of today's lecture is that
first we will discuss the population
03:34.370 --> 03:39.040
and sample, then random sampling and
point estimation. As I was mentioning, a
03:39.040 --> 03:45.730
single value we will just pick up and
estimate from the sample. Then,
03:45.730 --> 03:51.010
properties of this point estimator; there
are two methods for this estimation. One
03:51.010 --> 03:56.540
is called the method of moments and the other
is the method of maximum likelihood. Then, the
03:56.540 --> 04:03.540
interval estimation of the mean, then interval
estimation of the variance, and then estimation
04:04.680 --> 04:09.569
of the proportion.
So, this
04:09.569 --> 04:16.569
outline should not be a dead
end here. It should continue, and we should
04:17.650 --> 04:22.800
cover hypothesis testing also, because
that is closely related to whatever
04:22.800 --> 04:27.970
we are going to discuss. So, maybe one after
another, we will take up all these topics. So,
04:27.970 --> 04:34.970
for each point of the statistics
that we
04:36.190 --> 04:42.700
will be discussing, we will see what
and where these things are useful
04:42.700 --> 04:47.990
in handling some civil engineering problems.
That we will see. So to start with, we
04:47.990 --> 04:52.820
will start with the concept of
population and sample.
04:52.820 --> 04:59.820
So, the population is the complete set of
all values representing a particular random
05:00.030 --> 05:06.100
process. So, when we talk about all
values which represent a particular
05:06.100 --> 05:13.100
random process, that means the whole
possible range, whatever
05:17.560 --> 05:21.230
the possible range is; it should include
everything.
05:21.230 --> 05:27.690
So, the example I can give is the
stream flow in a certain stream over an infinite
05:27.690 --> 05:32.570
timeline. If we are collecting the time
series of the stream flow values,
05:32.570 --> 05:38.850
there should not be a finite time over which
we are looking at the data. It should be
05:38.850 --> 05:43.380
on the infinite timeline of whatever has
happened and whatever is going to happen, everything.
05:43.380 --> 05:50.380
So, that is what our population is. When
we talk about a sample, that sample is any
05:51.580 --> 05:57.660
subset of the entire population. The corresponding
example for the stream flow that we have
05:57.660 --> 06:03.140
used here is the stream flow in the stream
over the last 30 years. So, if I just take
06:03.140 --> 06:09.350
the last 30 years, then that particular data
set that is available to us is a sample
06:09.350 --> 06:13.960
from this population.
Now, you can see that whatever probability
06:13.960 --> 06:19.770
theory we have discussed earlier is
basically related to the entire set of
06:19.770 --> 06:26.770
possible values. That is, all possible values
of that random process; but in reality,
06:29.610 --> 06:36.610
we never have this entire set;
all the possible values
06:38.370 --> 06:45.210
we may not have.
So, we have to rely on some sample, and
06:45.210 --> 06:50.190
somehow we have to relate whatever we can
estimate from the sample; we have to relate
06:50.190 --> 06:57.190
it to its population. This is basically
the role of statistics, and with this,
06:57.750 --> 07:03.710
we will just see how we can estimate the parameters,
how we can estimate the required information,
07:03.710 --> 07:10.710
from the samples. Then we will infer something
about the population, because, as I told you,
07:11.580 --> 07:18.580
having the entire population is not possible
in almost all cases.
07:21.050 --> 07:26.240
So now, about samples: I was mentioning
random samples; when should we call
07:26.240 --> 07:33.240
a sample a random sample? As it is impractical
or uneconomical
07:33.690 --> 07:38.960
to observe the entire population, as I was
mentioning, a sample, that is, a subset,
07:38.960 --> 07:45.960
is selected from the population for analysis. A
sample is said to be a random sample when these
07:45.990 --> 07:52.990
two conditions are satisfied. One is that
it is representative of the population.
07:54.750 --> 08:01.750
So, whatever possible cases we can
see in the population, the sample should
08:02.840 --> 08:09.840
have that representation in it. The second one
is that probability theory can be
08:10.210 --> 08:15.190
applied to it, to infer the results that pertain
to the entire population.
08:15.190 --> 08:21.380
So, probability theory, as I was telling you,
basically relates to the population. Now,
08:21.380 --> 08:27.260
if we can apply probability theory to
the sample to infer the results that pertain
08:27.260 --> 08:33.050
to the population, then we can say that whatever
sample we are taking is a random
08:33.050 --> 08:39.000
sample. So, these are just the concepts, and
with these concepts, what we can do is that
08:39.000 --> 08:45.170
we can take whatever we have seen from
probability theory and apply it to
08:45.170 --> 08:48.709
these random samples.
08:48.709 --> 08:55.310
Now, the random sample for finite and infinite
populations. You know that there are certain
08:55.310 --> 09:02.310
cases where the population can be finite,
other cases where it can be infinite, and sometimes
09:03.129 --> 09:09.230
it can be countably infinite. Even though
I know that, obviously, there is some maximum
09:09.230 --> 09:14.839
number, we can sometimes
still say that it is countably infinite.
09:14.839 --> 09:17.730
So, in such cases, what should the random
sample be?
09:17.730 --> 09:24.730
So, first we will take the case of a finite
population. An observation set x_1, x_2,
09:25.589 --> 09:32.350
x_3, up to x_n, selected from a finite population
of size N, so the entire size of the population
09:32.350 --> 09:39.350
is N, is said to be a random sample if its
values are such that each element has the
09:43.379 --> 09:50.379
same probability of being selected. Now,
as we can see, the size
09:52.230 --> 09:59.230
of this population is finite; its maximum
number is N. So, for this x_1, x_2, x_3, we
09:59.689 --> 10:06.689
can say that it is a random sample if each
10:07.430 --> 10:14.430
element of this set has an equal probability
of being selected from the population.
10:15.959 --> 10:21.629
The second one: if the population is infinite,
an observation set, again x_1, x_2,
10:21.629 --> 10:27.699
x_3, ..., x_n, is selected from an infinite population
f(x). Obviously, when we talk about an
10:27.699 --> 10:32.360
infinite population, we generally denote it through
its probability density function; here it
10:32.360 --> 10:39.230
is f(x). The set is said to be a random sample if
its values are such that each x_i has
10:39.230 --> 10:46.230
the same distribution f(x) and the n random
variables are independent.
10:46.480 --> 10:52.680
So each observation, the first observation, second
observation, third, up to the n-th, these are also,
10:52.680 --> 10:59.339
we can say, random variables.
So, these variables should have the same distribution,
10:59.339 --> 11:06.339
which is f(x) and which is the same as the distribution
of the population. And if all these n elements
11:07.540 --> 11:14.230
of this set are independent of each other,
then we can say that this set is one random
11:14.230 --> 11:19.870
sample.
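As a small sketch of the finite-population case (the population and sample sizes here are hypothetical), Python's standard library `random.sample` draws n distinct elements such that every element of the population has the same chance of selection:

```python
import random

random.seed(42)  # fixed seed so the draw is reproducible

# Hypothetical finite population of size N = 100
population = list(range(1, 101))

# Draw a random sample of size n = 10 without replacement;
# random.sample gives each element the same probability of selection.
sample = random.sample(population, 10)
print(sample)
```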
11:19.870 --> 11:25.939
The classical approach to estimation of the
parameters. So, the classical approach to the
11:25.939 --> 11:32.110
estimation of the distribution parameters
is of two types. One is the point estimation
11:32.110 --> 11:37.110
and the other is the interval estimation.
So, point estimation means a single parameter
11:37.110 --> 11:44.110
value is estimated from the observed dataset,
that is, from the sample. So, from the sample
11:44.480 --> 11:49.559
a single parameter value is estimated.
Now, we will discuss how to estimate
11:49.559 --> 11:55.410
that parameter; that we will see. But here
I want to stress the word single. So, when
11:55.410 --> 12:00.040
a single parameter value is estimated, that is
known as point estimation. On the other
12:00.040 --> 12:05.999
hand, it will be interval estimation
when a certain interval is determined from
12:05.999 --> 12:12.930
the observed dataset. It can be said with a
definite confidence level that the parameter
12:12.930 --> 12:18.889
value will lie within that interval.
So, there might now be many new things
12:18.889 --> 12:24.269
here. One is the interval. When
a certain interval is what we are predicting,
12:24.269 --> 12:29.410
then it is interval estimation. Now, when
we say that this is interval estimation,
12:29.410 --> 12:36.069
then a certain confidence level comes in, so that
we can say: with this much confidence. This
12:36.069 --> 12:41.480
confidence is also in
the statistical sense. So, sometimes
12:41.480 --> 12:47.259
the general values that we use for these confidence
levels are 90 percent confidence, 95 percent
12:47.259 --> 12:53.619
confidence, and 99 percent confidence. So, this
confidence level can vary from 0 to 1
12:53.619 --> 13:00.230
basically, that means 0 percent to 100 percent.
So, whenever we say that it is an
13:00.230 --> 13:04.529
interval estimate, these interval estimates
are always associated with some confidence
13:04.529 --> 13:10.379
level. We will see all these things: how we
can declare that this is the confidence
13:10.379 --> 13:17.379
for this interval. But one thing that
should be made clear here now is about
13:17.730 --> 13:23.949
this confidence level. If I increase the confidence
level, then, as you can read from
13:23.949 --> 13:28.569
this sentence, it can be said with a
definite confidence level that the parameter
13:28.569 --> 13:33.480
value will lie within that interval.
Now, suppose that I take two cases; one is
13:33.480 --> 13:39.819
a 95 percent confidence level, and the other
is a 99 percent confidence level. Then,
13:39.819 --> 13:46.819
obviously, the 99 percent confidence level
is higher, so the chance that the
13:47.839 --> 13:52.759
exact parameter value will lie within that
interval, in the case of 99 percent confidence,
13:52.759 --> 13:58.160
should be more. So, obviously, the interval
estimate that we are making should be
13:58.160 --> 14:04.050
wider in the case of the 99 percent
confidence interval, compared to what we can
14:04.050 --> 14:11.050
estimate in the case of the 95 percent confidence
interval. So, the more the confidence level, the wider
14:11.399 --> 14:16.559
is the interval. That is what we can
see, at least at this stage. We will
14:16.559 --> 14:18.910
discuss all these things.
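As a sketch of this last point (the numbers and the function name here are hypothetical, and a normal population with known standard deviation sigma is assumed), the two-sided interval for the mean at confidence level c uses the normal quantile z at 0.5 + c/2, and the 99 percent interval indeed comes out wider than the 95 percent one:

```python
from statistics import NormalDist

def mean_interval(xbar, sigma, n, confidence):
    # Two-sided interval for the population mean, assuming a normal
    # population with known standard deviation sigma.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 percent
    half_width = z * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width

# Hypothetical sample summary: mean 120, sigma 15, sample size 36
lo95, hi95 = mean_interval(120, 15, 36, 0.95)
lo99, hi99 = mean_interval(120, 15, 36, 0.99)
print(hi95 - lo95)  # width of the 95 percent interval
print(hi99 - lo99)  # width of the 99 percent interval: wider
```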
14:18.910 --> 14:25.910
Now, the random sampling and point estimation.
So, first we will take point estimation
14:26.619 --> 14:32.399
out of these two types. As the parameters
of the distribution of a population are unknown
14:32.399 --> 14:38.499
and it is not feasible to obtain them by studying
the entire population, a random sample
14:38.499 --> 14:44.660
is generally selected. The quantities that are
computed for the parameters of the distribution, based on the
14:44.660 --> 14:50.509
analysis of sample values, are called the estimators
of the parameters.
14:50.509 --> 14:57.360
So, whatever quantity we use
as the estimate of that parameter is known
14:57.360 --> 15:03.470
as the estimator. So, there are two things:
one is the estimate and the other is
15:03.470 --> 15:09.519
the estimator. Through what
function, I can say, through what function
15:09.519 --> 15:13.619
I am trying to estimate that parameter, that
is known as the estimator.
15:13.619 --> 15:20.619
Thus, parameters correspond to the population,
while estimators correspond to the sample.
15:21.889 --> 15:26.990
So, this is what it is: when we are talking
about the parameters of a distribution, of
15:26.990 --> 15:32.209
a particular probability distribution,
say, those generally correspond to the
15:32.209 --> 15:38.220
population. When we talk about the estimator,
that corresponds to the sample. So, if I
15:38.220 --> 15:42.699
take the example of the normal distribution
that we have seen earlier,
15:42.699 --> 15:46.459
there are two parameters. One is
mu and the other is sigma square;
15:46.459 --> 15:49.839
mu is the mean and sigma square is the
variance.
15:49.839 --> 15:55.399
So, mu and sigma square, these
properties are
15:55.399 --> 16:00.369
associated with the population. Now,
how will we estimate that mean, that is, mu,
16:00.369 --> 16:06.519
and how will we estimate that variance, from
a sample? Those are the estimators, and
16:06.519 --> 16:11.009
those are associated with the sample. That
we will see now.
16:11.009 --> 16:17.959
So, when we say that there is a point estimator,
that point estimator, you know,
16:17.959 --> 16:24.300
should have some properties.
There are four different properties that a
16:24.300 --> 16:30.319
point estimator should have before we can
use it. Because, you see, there could
16:30.319 --> 16:37.319
be many functions that we can use to estimate
a particular parameter. So, we have
16:37.499 --> 16:44.499
to take the particular function
that satisfies all these properties,
16:45.220 --> 16:51.149
or at least the maximum number of
these properties should be satisfied by the
16:51.149 --> 16:57.079
estimator.
These four properties are unbiasedness,
16:57.079 --> 17:03.050
consistency, efficiency, and sufficiency. So,
we will take them one by one. Unbiasedness:
17:03.050 --> 17:10.050
the bias of an estimator is equal to the difference
between the estimator's expected value and
17:10.500 --> 17:17.230
the true value. For an unbiased estimator of
the parameter, the expected value should be equal
17:17.230 --> 17:20.699
to the true value.
Now, when we are talking about this true value,
17:20.699 --> 17:26.709
that means it is the value for the population,
for example mu. Now, the estimator
17:26.709 --> 17:33.709
is something we compute,
for which we should use the sample
17:33.980 --> 17:40.220
that is available with us. And, as I told
you, these estimators also have some distribution.
17:40.220 --> 17:45.320
So, as these estimators also have
distributions, we can also calculate
17:45.320 --> 17:51.470
the properties that we know from our earlier
lectures and modules; one is
17:51.470 --> 17:57.730
the expected value. So, if I take the expectation
of that estimator itself, that will give
17:57.730 --> 18:03.919
one value.
Now, the difference between this expected
18:03.919 --> 18:09.179
value of the estimator and its true value
is the bias, and the desirable thing is
18:09.179 --> 18:14.590
that it should be as small as possible.
So, the estimator
18:14.590 --> 18:19.480
should be unbiased, and this is the property
that is written here. At least, when we
18:19.480 --> 18:26.159
see that n tends to infinity, where
n is the number of samples that
18:26.159 --> 18:33.159
we take, the expectation of the estimator
18:33.230 --> 18:38.090
should be equal to the population parameter.
If that is the case, that is basically
18:38.090 --> 18:42.830
the check for unbiasedness.
18:42.830 --> 18:48.740
Second is consistency. It refers to the
asymptotic property whereby the error in
18:48.740 --> 18:55.740
the estimator decreases with an increase
in the sample size n. Thus, as n tends to
18:55.860 --> 19:02.860
infinity, the estimated value approaches the
true value of the parameter. So, this is
19:03.110 --> 19:07.690
consistency. That is, as we are
increasing the number of elements in the
19:07.690 --> 19:13.929
sample, the estimator should be such
that its value
19:13.929 --> 19:20.929
approaches the true value of
the parameter, that is, the parameter of the
19:22.019 --> 19:23.850
population.
19:23.850 --> 19:30.850
Third is efficiency. An estimator with
lesser variance is said to be more efficient
19:31.630 --> 19:38.630
compared to one with greater variance, keeping
other conditions the same. So here, you can see
19:40.390 --> 19:47.029
that suppose we have two estimators, and both
estimators satisfy the first two properties:
19:47.029 --> 19:54.029
both are unbiased and both are
consistent. So, if both
19:54.669 --> 20:01.669
are satisfying the first two conditions,
then we have to select the estimator
20:03.080 --> 20:09.309
which has the lesser variance.
So, this is basically what efficiency relates to: which
20:09.309 --> 20:15.539
one is the more efficient estimator?
The estimator having the lesser variance
20:15.539 --> 20:22.539
is more efficient in such cases. Sufficiency:
if a point estimator utilizes all the information
20:24.470 --> 20:31.049
that is available from the random sample,
then it is called a sufficient estimator.
20:31.049 --> 20:38.049
Now, we will discuss this through
the two commonly used methods of
20:38.090 --> 20:45.090
point estimation. These apply to the
known distributions and can easily be handled
20:48.149 --> 20:55.149
through hand calculation,
as we will see. So, basically the two commonly
20:55.750 --> 21:01.590
used methods for point estimation
of the parameters are
21:01.590 --> 21:05.380
the method of moments and the method
of maximum likelihood.
21:05.380 --> 21:12.380
So, first we will take the method of moments. The
method of moments is based on the fact that
21:13.960 --> 21:20.960
the moments of a random variable have some
relationship with the parameters of the distribution.
21:22.059 --> 21:28.179
So, from whatever sample we have,
we will calculate the moments of
21:28.179 --> 21:35.179
that random variable, and those should have some
relationship with the parameters of the distribution. We
21:35.779 --> 21:39.809
have discussed this: the first moment, the second
moment with respect to the origin.
21:39.809 --> 21:44.769
You know that the first moment with respect to
the origin is the mean. The second moment
21:44.769 --> 21:50.690
with respect to the origin we
can take, but we generally do not; from
21:50.690 --> 21:57.399
the second moment onwards, we take the moments with
respect to the mean, and that gives the expression
21:57.399 --> 22:03.299
for the spread of the distribution. All these
things we have discussed earlier, and we have
22:03.299 --> 22:08.200
also discussed that the first moment with
respect to the mean is zero. So, that concept
22:08.200 --> 22:13.809
we have discussed earlier. Here we will
use that concept in this method of
22:13.809 --> 22:18.500
moments.
So, if a probability distribution has m
22:18.500 --> 22:25.450
parameters, then the first m moments of
the distribution are equated to the first
22:25.450 --> 22:32.450
m sample moments. The resulting m
equations can then be solved to determine
22:33.010 --> 22:40.010
the m parameters. So, we will take
two examples. One is, say for example,
22:40.080 --> 22:45.279
the exponential distribution. The exponential
distribution has one parameter, that
22:45.279 --> 22:50.240
you know: lambda. One parameter only.
So, the first moment of
22:50.240 --> 22:55.309
the distribution should be equated
to the first sample moment. From whatever sample
22:55.309 --> 23:00.139
we have, we can calculate
this first moment, with respect to the
23:00.139 --> 23:05.480
origin of course, and equate it
to get the estimate of this
23:05.480 --> 23:09.870
lambda.
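As a minimal sketch of this (the data values are hypothetical): for the exponential distribution, E[X] = 1/lambda, so equating the first distribution moment to the first sample moment gives lambda hat = 1 / x bar:

```python
from statistics import mean

def exponential_lambda_mom(sample):
    # Method of moments for the exponential distribution:
    # E[X] = 1/lambda, so equating it to the sample mean
    # gives lambda_hat = 1 / xbar.
    return 1.0 / mean(sample)

# Hypothetical observations (e.g. times between events)
data = [0.8, 1.3, 0.4, 2.1, 0.9, 1.5]
print(exponential_lambda_mom(data))  # 1 / (7.0 / 6) = 6/7, about 0.857
```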
We take another example: say the
23:09.870 --> 23:16.010
normal distribution or the gamma distribution;
both have two parameters. So, for the normal
23:16.010 --> 23:21.659
it is mu and sigma square, and for the gamma, you
know, it is alpha and beta. So, with two
23:21.659 --> 23:26.399
parameters there, the first two moments,
the first moment and the second moment, should
23:26.399 --> 23:33.059
be equated to the first two moments of the
sample, and these should be equated to those
23:33.059 --> 23:38.210
parameters. That is, the first one should
be equated to mu in the case of the normal distribution,
23:38.210 --> 23:44.950
and the second one should be equated to sigma
square in the case of the normal distribution. So,
23:44.950 --> 23:49.620
there are two unknowns now and there are two
equations also, which we can solve to get
23:49.620 --> 23:52.080
the estimates of those parameters.
23:52.080 --> 23:59.080
Now, for a sample of size n, the sample mean
and sample variance that we use as the
24:02.110 --> 24:09.039
estimators are as follows. x bar = (1/n) Σ x_i, the
sum of all the sample values we have. Here,
24:09.039 --> 24:13.730
the sample size is n, as mentioned here.
So, we sum up all the sample values and divide
24:13.730 --> 24:20.730
by n; that is basically the arithmetic mean. And
we can show that this estimator
24:22.509 --> 24:29.509
of the mean satisfies all
the properties that we have discussed, that
24:30.700 --> 24:37.700
is, the four requirements for an estimator.
Then this is s square, which is the sample
24:38.700 --> 24:44.759
variance: s square = (1/n) Σ (x_i - x bar)².
Again, this x bar is the mean
24:44.759 --> 24:51.759
estimated through the previous equation, and the
squared deviations are summed from i equals
24:51.779 --> 24:57.039
1 to n. Now, this is the point estimator
for the variance.
24:57.039 --> 25:04.039
Now, we can show that if
we use this one, it will not be unbiased. Now,
25:07.299 --> 25:14.299
to make this one unbiased, that is, as I was
telling, so that as n tends to infinity it
25:14.320 --> 25:20.360
reaches the actual value, what
do we have to use? We have to use 1/(n
25:20.360 --> 25:26.269
minus 1). So, if we use that n minus 1,
in many texts what you will see is that the estimator
25:26.269 --> 25:33.269
for this variance has a 1/(n minus 1).
So, that minus 1 is basically to make the
25:33.460 --> 25:39.830
estimator unbiased. And you know that this
minus 1 that we are talking about is basically,
25:39.830 --> 25:46.820
we can say, a matter of degrees
of freedom. So, one degree of freedom is lost
25:46.820 --> 25:53.820
here; that is why we need
that n minus 1. Why it is lost is that we
25:55.419 --> 26:01.539
are using this mean, which is also estimated
from the sample itself.
26:01.539 --> 26:07.730
So, that is why one degree of freedom
is lost here, and so we have to write that
26:07.730 --> 26:13.419
n minus 1. Now, if somehow you know the
population mean and you can replace
26:13.419 --> 26:18.630
this x bar with the population
mean, that is, use (x_i - mu) squared,
26:18.630 --> 26:24.940
and if we use this quantity, then there
is no need for that n minus 1; then
26:24.940 --> 26:31.559
1/n is sufficient.
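A small numerical sketch of this bias (the population parameters and sample sizes here are hypothetical): averaging the two estimators over many simulated samples, dividing by n systematically underestimates the variance, while dividing by n minus 1 does not:

```python
import random
from statistics import mean

def variance_biased(xs):
    # Method-of-moments form: divide by n (biased when x bar is estimated)
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)

def variance_unbiased(xs):
    # Divide by n - 1: one degree of freedom is lost to estimating x bar
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

# Draw many samples of size 5 from a normal population with true variance 4,
# and average each estimator over the repetitions.
random.seed(0)
samples = [[random.gauss(0, 2) for _ in range(5)] for _ in range(20000)]
avg_biased = mean(variance_biased(s) for s in samples)
avg_unbiased = mean(variance_unbiased(s) for s in samples)
print(avg_biased)    # settles near 4 * (5 - 1) / 5 = 3.2, below the true value
print(avg_unbiased)  # settles near the true value 4
```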
So, thus this x bar and s square are the point
26:31.559 --> 26:36.059
estimates of the population mean and the population
variance. So, these are the point estimates
26:36.059 --> 26:41.230
for those quantities, obtained from the
sample, and the parameters of the distribution
26:41.230 --> 26:48.230
can be determined from these. So, these are
the first two;
26:48.659 --> 26:55.659
they are the estimates for the population mean and
population variance. So if needed, other higher
26:56.110 --> 27:02.539
order sample moments can also be obtained,
to calculate all the parameters.
27:02.539 --> 27:06.870
So that means we can go to the skewness: with the
sample estimate of the skewness, we can go
27:06.870 --> 27:10.929
to the sample estimate of the kurtosis, and
so on.
27:10.929 --> 27:16.700
So now, for the relation between the parameters
of some common distributions and the moments,
27:16.700 --> 27:20.750
say for example, here we are taking the
normal distribution first. The two parameters
27:20.750 --> 27:27.620
that you know, one is mu and the other
is sigma square, are equal to the mean
27:27.620 --> 27:34.519
and variance directly. So, the expectation of
X is equal to the mean, and the variance is equal
27:34.519 --> 27:40.129
to sigma square.
So, what you can see is that we can directly
27:40.129 --> 27:44.889
use whatever estimates
we have made; we can put them here to
27:44.889 --> 27:48.840
get the population mean and the population
variance.
27:48.840 --> 27:55.840
Now, in the case of the gamma distribution,
the parameters are alpha and beta, which
27:56.080 --> 28:02.259
relate to the mean and variance as follows:
you know that the expectation of X in the case
28:02.259 --> 28:07.690
of the gamma distribution is alpha times beta,
and the variance is alpha times beta square.
28:07.690 --> 28:12.850
So, all these things we have discussed in the
earlier modules; you can refer to those
28:12.850 --> 28:18.809
lectures. Here, we are just using what
we have seen in the earlier modules.
28:18.809 --> 28:25.809
So, fine. So, to conclude this: basically,
the mean you can
28:27.879 --> 28:32.620
estimate from the sample, and this variance also you can
estimate from the sample. Now, if you equate
28:32.620 --> 28:36.500
these with the two parameters, that is
what we are saying, we have
28:36.500 --> 28:42.960
two unknowns here, alpha and beta, and there
are two equations. So, these two can
28:42.960 --> 28:48.899
be solved to get the estimates of
alpha and beta. And for the normal it is straightforward,
28:48.899 --> 28:53.919
because the expectation mu
should be directly equal to this
28:53.919 --> 28:58.850
x bar, as you have seen, and the variance
should be that sigma square, as you have
28:58.850 --> 29:05.289
seen in this slide. Obviously, if you are
using this x bar, which is also estimated
29:05.289 --> 29:09.679
from the sample, then instead of 1/n
it should be 1/(n minus 1).
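As a sketch of this solving step (the data values are hypothetical), for the gamma distribution the two moment equations x bar = alpha·beta and s² = alpha·beta² can be rearranged to beta hat = s²/x bar and alpha hat = x bar²/s²:

```python
from statistics import mean

def gamma_mom(sample):
    # Method of moments for the gamma distribution (mean alpha*beta,
    # variance alpha*beta**2). Solving the two moment equations gives
    #   beta_hat  = s2 / xbar
    #   alpha_hat = xbar**2 / s2
    # The unbiased (n - 1) sample variance is used, as discussed above.
    n = len(sample)
    xbar = mean(sample)
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return xbar ** 2 / s2, s2 / xbar  # (alpha_hat, beta_hat)

# Hypothetical positive-valued observations (e.g. annual rainfall depths)
alpha_hat, beta_hat = gamma_mom([2.1, 3.4, 1.8, 2.9, 4.2, 2.5])
print(alpha_hat, beta_hat)  # by construction alpha_hat * beta_hat = x bar
```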
29:09.679 --> 29:16.679
Now, the second method is the method of maximum
likelihood. The method of maximum likelihood
29:17.549 --> 29:24.549
can be used to obtain the point estimators
of the parameters of a distribution
29:26.179 --> 29:31.039
directly.
So, there are some shortcomings of this method
29:31.039 --> 29:37.799
of moments, which we generally suppose that,
sometimes the estimator that is using the
29:37.799 --> 29:44.799
method of moments, what we get is that, sometimes
the solutions are once it is solved, we get
29:44.929 --> 29:50.860
that the, it is not within the within the
range of these parameters. So, sometimes this
29:50.860 --> 29:56.950
kind of things are observed. So, that is the
criticism over the method of moments. So there,
29:56.950 --> 30:01.970
this method of likelihood has found to be
moremore effective.Because, here directly
30:01.970 --> 30:07.340
the distribution we are using and we are we
are developing a likelihood function and that
30:07.340 --> 30:14.070
likelihood function is maximized to estimate
thatthat particular parameter.Now, you know
30:14.070 --> 30:18.320
how to maximize that likelihood function. First
of all, we will see what the likelihood
30:18.320 --> 30:23.970
function is and then we will maximize it. Now,
for as many parameters as there are, we have
30:23.970 --> 30:29.409
to maximize with respect to all those parameters,
so that we will get that many equations to
30:29.409 --> 30:35.549
solve, to get those estimates.
So, suppose that the sample values of a random
30:35.549 --> 30:41.950
variable x, with density function f x and
parameter theta, are x 1, x 2, up to x
30:41.950 --> 30:48.950
n. Then, the maximum likelihood method is aimed
at finding that value of theta which maximizes
30:49.519 --> 30:54.080
the likelihood of obtaining the set of observations
x 1, x 2, up to x n.
30:54.080 --> 31:00.039
So, what we have to do is this: the likelihood
of obtaining a particular sample x
31:00.039 --> 31:07.039
i is proportional to the value of
the pdf at x i. So, this x i means, from
31:07.769 --> 31:13.940
these x 1, x 2, x 3, we have to calculate
that likelihood function. So, the likelihood
31:13.940 --> 31:20.940
function for obtaining the set of observations
x 1, x 2, up to x n is given by the values of this
31:21.690 --> 31:28.690
distribution at each point. That is, the value
of that function at x 1, at x 2,
31:29.029 --> 31:36.029
at x 3 and so on, and their multiplication.
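The likelihood function just described, the product of the pdf values at each observation, can be sketched as follows. The exponential pdf and the trial parameter value here are only illustrative assumptions, and taking logs anticipates the remark below about easier calculation.

```python
import math

def likelihood(pdf, data, theta):
    """Likelihood of observing `data`: the product of pdf values at each x_i."""
    L = 1.0
    for x in data:
        L *= pdf(x, theta)
    return L

def log_likelihood(pdf, data, theta):
    """Log-likelihood: the sum of log pdf values (numerically safer than the product)."""
    return sum(math.log(pdf(x, theta)) for x in data)

# assumed pdf for illustration: exponential with mean theta, f(x) = (1/theta) exp(-x/theta)
pdf = lambda x, theta: (1.0 / theta) * math.exp(-x / theta)
data = [2.2, 4.0, 7.3, 1.1, 6.2, 3.4, 8.1]   # the interarrival times used later in the lecture
L = likelihood(pdf, data, 5.0)               # 5.0 is an arbitrary trial value of theta
```

The log of the product equals the sum of the logs, which is why maximizing the log-likelihood gives the same estimator.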
So, differentiating the likelihood function
31:37.230 --> 31:43.690
with respect to theta now and equating it
to 0, this is basically how we are
31:43.690 --> 31:50.230
maximizing the likelihood function. We are
finding where this likelihood function will
31:50.230 --> 31:57.230
be maximized. We get the value of that
estimate, that is, theta hat,
32:01.679 --> 32:08.169
which is the maximum likelihood
estimator of this parameter theta. That is,
32:08.169 --> 32:14.309
we will equate this one to 0 and then we will get
that estimate of theta hat.
32:14.309 --> 32:20.100
The solution can also be obtained by maximizing
the logarithm of this likelihood function.
32:20.100 --> 32:25.879
So, if we take the log also, sometimes
it becomes simpler. So far as mathematical
32:25.879 --> 32:32.309
calculation is concerned, it may become
easier: we will take the log of this likelihood
32:32.309 --> 32:36.570
function and differentiate with respect
to the parameters. If there are m number
32:36.570 --> 32:41.179
of parameters of the distribution, then the
likelihood function is like this. So, there
32:41.179 --> 32:46.190
are theta 1 theta 2 up to theta m, these are
the parameters of the distribution and this
32:46.190 --> 32:52.929
is, you know, the multiplication
of these at each sample point x i.
32:52.929 --> 32:58.379
The maximum likelihood estimators are obtained
by solving the following simultaneous equations.
32:58.379 --> 33:04.080
So, we have to take the differentiation
with respect to each parameter theta j, where j
33:04.080 --> 33:11.080
can vary from 1 up to m. And we can take
these partial derivatives
33:12.490 --> 33:18.289
equated to 0. So, we will get m equations
to solve, to get the estimates of these
33:18.289 --> 33:21.240
m parameters.
33:21.240 --> 33:28.240
Now, we will take one example using both the
methods that we have seen just now: the
33:29.080 --> 33:34.039
method of moments and the method
of maximum likelihood.
33:34.039 --> 33:37.529
So, here we have taken an example of
the exponential distribution. You know the
33:37.529 --> 33:44.360
exponential distribution; here, the example
is on the interarrival time of vehicles
33:44.360 --> 33:51.029
on a certain stretch of a highway is expressed
by an exponential distribution, where this
33:51.029 --> 33:55.409
f t equals 1 by lambda, e power minus
t by lambda.
33:55.409 --> 34:01.919
Now, in some places, or even earlier in this
lecture, we might have used
34:01.919 --> 34:06.389
some other form, that is, lambda e power minus
lambda t. So, it does not matter; there
34:06.389 --> 34:12.120
the parameter is just taken as 1 by lambda here.
The form is not changing,
34:12.120 --> 34:18.629
only the parameter is represented in a different
way. So, if we use the other form
34:18.629 --> 34:25.590
also, that is, lambda e power minus lambda t,
then also we can get the same result
34:25.590 --> 34:29.460
that we see now.
So, obviously, here this t is greater
34:29.460 --> 34:36.230
than or equal to 0. Now, some samples have
been taken; that is, the time between the
34:36.230 --> 34:42.710
successive arrivals of the vehicles was observed
as 2.2 seconds, 4 seconds, 7.3 seconds, 1.1
34:42.710 --> 34:49.710
seconds, 6.2 seconds, 3.4 seconds and 8.1 seconds.
So, this is the sample that we have collected. Now,
34:51.510 --> 34:58.510
determine the mean interarrival time, that
is, lambda; or, if I do not even want
34:59.030 --> 35:03.940
to mention that, I just want
to estimate the parameter lambda
35:03.940 --> 35:08.500
for this distribution by two methods. One is
the method of moments and the other is the
35:08.500 --> 35:11.599
method of maximum likelihood.
35:11.599 --> 35:17.339
So, first, the method of moments. We have
said that we should take the moment with
35:17.339 --> 35:22.809
respect to the origin and equate it
with the mean. So, if we take this moment,
35:22.809 --> 35:25.869
you know that this is the first moment, with
respect to the origin, that we have discussed in
35:25.869 --> 35:30.520
the earlier classes: that is, t multiplied
by the density function, taking the integration
35:30.520 --> 35:35.869
over the entire support of the distribution;
and here, the exponential distribution has
35:35.869 --> 35:40.859
the support 0 to infinity. So, we will take
this integration and if we solve this one,
35:40.859 --> 35:47.290
we can see that this mu equals lambda
here. So, this is mu. So, lambda equals
35:47.290 --> 35:51.510
mu, which we can obtain from the
sample as x bar, which is the estimator
35:51.510 --> 35:56.520
for the mean. So, it is 1 by 7 times the summation
of all the t i; that is, the arithmetic mean.
35:56.520 --> 36:01.920
So, 6.04 seconds is the estimate for lambda.
Now, if we use the other form of the exponential
36:01.920 --> 36:08.920
distribution, that is, lambda e power
minus lambda t, then you can see that this
36:09.490 --> 36:15.140
mu will become 1 by lambda. So, there
lambda will be equal to 1 by x bar, and
36:15.140 --> 36:22.140
this will be 1 by 6.04 seconds. So, it depends
on the way the parameter is used here.
36:26.670 --> 36:33.670
And the other one, that is, the use of the
maximum likelihood function. That
36:35.780 --> 36:40.390
is, assuming random sampling, the
likelihood function of the observed values
36:40.390 --> 36:45.780
is, that is, of t 1, t 2, t 3 up to t 7. So, all
these samples we will take, and we will get what
36:45.780 --> 36:51.349
is the likelihood function first.
So, it is 1 by lambda e power minus
36:51.349 --> 36:57.880
t i by lambda. So, at each observation,
we are getting this value and we are multiplying
36:57.880 --> 37:04.040
them with each other. So, lambda power
minus 7 times the exponential of this sum, we will
37:04.040 --> 37:08.500
get and the estimator can now be obtained
by differentiating the likelihood function
37:08.500 --> 37:11.980
L with respect to the parameter, that is, lambda.
37:11.980 --> 37:18.640
So, if we do this derivative,
taking it with respect to lambda, then we
37:18.640 --> 37:25.250
will get this form. If we just equate it to
0, then after solving this form, you will
37:25.250 --> 37:27.720
get that, again, lambda equals 6.04
seconds.
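A minimal sketch of both estimators for this form of the exponential distribution. For f(t) = (1/lambda) e power minus t by lambda, both the method of moments and the maximum likelihood method reduce to the sample mean; since the transcribed sample values may differ slightly from the slide, the code only checks the structural fact that the two estimates coincide, not the quoted figure.

```python
def lambda_mom(times):
    """Method of moments: equate the first moment E[T] = lambda to the sample mean."""
    return sum(times) / len(times)

def lambda_mle(times):
    """Maximum likelihood for f(t) = (1/lambda) exp(-t/lambda):
    setting d/d(lambda) of log L = -n/lambda + sum(t_i)/lambda**2 to zero
    again gives lambda = mean of the t_i."""
    return sum(times) / len(times)

# the interarrival times read out in the lecture (seconds)
times = [2.2, 4.0, 7.3, 1.1, 6.2, 3.4, 8.1]
mom, mle = lambda_mom(times), lambda_mle(times)  # both equal the sample mean
```

With the alternative parametrization lambda e power minus lambda t, both estimators would instead return one over the sample mean, as the lecture notes.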
37:27.720 --> 37:34.720
So, for this example that we have
shown, for both the methods, the parameter
37:34.760 --> 37:40.910
that we have estimated is the same:
lambda equals 6.04 seconds for
37:40.910 --> 37:46.109
both the methods, that is, the method of moments
and the method of maximum likelihood. But sometimes,
37:46.109 --> 37:53.109
in some problems, in some distributions, these
two estimates may not be the same.
37:55.670 --> 38:02.290
So, next we will take up interval estimation.
As I was telling, in point estimation,
38:02.290 --> 38:08.599
we generally get a single value. That is what
we have seen in the previous example also;
38:08.599 --> 38:14.559
in both the methods, method of moments and
method of maximum likelihood, we get a single
38:14.559 --> 38:18.430
value. That is the lambda value that you have
seen, 6.04 seconds. So, only a single value
38:18.430 --> 38:24.710
that you have seen in the point estimate.
Now, for the interval estimate, we generally
38:24.710 --> 38:29.420
look for an interval in which, with some confidence,
the actual value of the parameter should
38:29.420 --> 38:36.089
lie. So, this is what interval estimation is.
So, in the case of a point estimate, the chances
38:36.089 --> 38:43.030
are very low that the true value of the parameter
will exactly coincide with the estimated value.
38:43.030 --> 38:49.260
So, as the sample is finite, there will always
be some error. Hence, sometimes it
38:49.260 --> 38:56.260
is desirable or it is useful to specify an
interval within which the parameter is expected
38:56.440 --> 39:00.530
to lie.
The interval is associated with certain confidence
39:00.530 --> 39:05.940
level; that is, it can be stated with a certain
degree of confidence that the parameter will
39:05.940 --> 39:10.960
lie within that interval. So, this is what
we were discussing a few minutes ago:
39:10.960 --> 39:17.569
whenever we say
that this is the interval,
39:17.569 --> 39:24.569
that interval must be associated
with some confidence level, in the statistical sense.
39:27.869 --> 39:34.869
Well. So, the confidence interval of the mean
with known variance. So, whatever the estimator
39:38.790 --> 39:43.829
that we have seen for the mean, as we
are taking it from the sample, that will
39:43.829 --> 39:50.390
also have some sampling distribution; it should
have. And if we somehow know the variance;
39:50.390 --> 39:57.390
and known variance means we know the population
variance. If we know that one, then how we
39:58.170 --> 40:04.940
can get the confidence interval for the mean.
So, for a large sample n. Now, large sample
40:04.940 --> 40:10.750
is again a subjective word. So, generally,
we have seen that if n is greater
40:10.750 --> 40:15.500
than or equal to 30, then you can say; in the
case of the mean only, not in all general
40:15.500 --> 40:19.800
cases; in the case of the mean, if the sample size
is greater than 30, then we can say that it
40:19.800 --> 40:26.800
is a large sample. So, in such a case, if x
bar is the calculated sample mean and sigma
40:27.589 --> 40:33.280
square is the known variance of the population.
So, sigma square I know exactly; it
40:33.280 --> 40:39.280
is the actual value for this population. How we
know it is a separate issue, but we know
40:39.280 --> 40:44.500
the variance
of the population. The only thing we are interested
40:44.500 --> 40:49.339
in knowing is the interval for this sample
mean.
40:49.339 --> 40:56.339
Then, it can be shown that this x bar
follows a normal distribution with mean equal
40:59.809 --> 41:06.390
to mu, which is the population mean and the
variance of this x bar, that is, this sample
41:06.390 --> 41:13.390
mean, variance of the x bar is sigma square
by n or the standard deviation of this x bar
41:13.589 --> 41:18.309
is sigma by square root n.
So, this I want to repeat once again:
this sigma square is the variance of
that this sigma square is the variance of
the population for the random variable x. Now,
what we are talking about is this x bar. This
what we are talking about is this x bar.This
x bar is again another random variable, which
41:32.250 --> 41:39.250
is normally distributed having the same mean
of the population, which is mu and its standard
41:40.920 --> 41:47.380
deviation is sigma by square root n.
Now, you know from our earlier lecture,
41:47.380 --> 41:53.119
that if we just take this quantity,
that is, the random variable minus
41:53.119 --> 41:58.770
its mean divided by its standard deviation,
it is a standard normal distribution.
41:58.770 --> 42:05.190
So, that is why this x bar minus mu, by sigma
by square root n, is a standard normal variate.
42:05.190 --> 42:10.040
So, now, once we know that this is the
quantity and it follows a standard normal
42:10.040 --> 42:16.329
distribution, we can calculate whatever
confidence interval we are looking
42:16.329 --> 42:23.329
for. So, the confidence interval of the mean
mu is given by this, that is, x bar. Basically,
42:27.799 --> 42:32.470
we are just equating it with the two sides of this
standard normal distribution.
42:32.470 --> 42:39.470
Now, if you see this one here. So, if
this is your standard normal distribution,
42:42.290 --> 42:49.290
then basically that quantity, that is,
x bar minus mu by sigma by square root n,
42:50.390 --> 42:55.329
should lie as follows. The
42:55.329 --> 43:00.410
distribution that I have drawn is a standard
normal distribution. So, this should lie between
43:00.410 --> 43:07.410
these two values here, in such a way
that this area should be whatever
43:07.880 --> 43:14.880
the confidence limit that you wish to specify.
Now, this is the confidence level
43:15.960 --> 43:22.450
that the estimate should have. Now, if
I just say that this confidence level
43:22.450 --> 43:29.450
is at some level, say the 95 percent
confidence level, then whatever is remaining
43:30.470 --> 43:37.470
here is your 0.025 and whatever is remaining
here is again 0.025.
43:38.420 --> 43:45.420
So, we have to find out these two values. Suppose
I just denote this one z alpha by 2
43:46.859 --> 43:52.020
and this one minus z alpha by 2; obviously,
this will be on the negative side, and this is 0
43:52.020 --> 43:57.420
for the standard one. So, this
quantity should lie between this minus z alpha by
43:57.420 --> 44:04.420
2 and this plus z alpha by 2, where this alpha
by 2 is basically the cumulative probability
44:05.670 --> 44:12.670
up to that point.
So, this x bar minus mu divided by sigma
44:13.900 --> 44:20.900
by square root n should lie between these two,
minus z alpha by 2 and plus
44:23.400 --> 44:30.400
z alpha by 2. So, z is the variate,
the reduced variate for this standard
44:32.770 --> 44:39.770
normal distribution. And, this alpha by 2 is
this part here. So, the confidence level,
44:39.849 --> 44:46.849
if I just want to relate it to the confidence
level, it will be 1 minus alpha
44:47.400 --> 44:54.400
multiplied by 100 percent confidence level. So, this
is the confidence level, we can say. So,
44:54.640 --> 45:01.290
if this area, this white area, is the
alpha by 2, then this 95 percent, that is, here in
45:01.290 --> 45:07.220
the case of 0.025 here, can be related to this one.
So, this is the confidence interval
45:07.220 --> 45:13.160
or, the confidence level, that
we are talking about here, once we
45:13.160 --> 45:20.160
put the limits as this z alpha by 2. Now,
if I just do some arithmetic rearrangement, then
45:22.040 --> 45:27.970
it will come like this: that is, mu should
be bounded by x bar plus z alpha
45:27.970 --> 45:34.970
by 2 sigma by square root n on one side and, here,
x bar minus z alpha by 2 sigma by square root
45:39.589 --> 45:42.780
n.
So, remember, about this alpha by
45:42.780 --> 45:49.780
2, when we are talking about it: this z alpha
by 2, automatically, when it is coming
45:49.890 --> 45:56.540
to the negative side, has this negative
value. So, do not get confused: this will have
45:56.540 --> 46:00.589
the negative value, and again
another negative sign is here. So, this will
46:00.589 --> 46:06.619
not become a plus. So, this is a single value
that we are using here for
46:06.619 --> 46:10.990
this particular quantity. From the symmetry,
both the values will be the same numerically;
46:10.990 --> 46:14.619
only this one will be positive and this one
will be negative.
46:14.619 --> 46:20.619
So, this is what is mentioned here: that is,
x bar minus z alpha by 2 sigma by square root
46:20.619 --> 46:27.619
n, less than mu, less than x bar plus z alpha by
2 sigma by square root n. So, you know that
46:28.520 --> 46:32.510
for a continuous distribution, less than
and less than or equal to are the same.
46:32.510 --> 46:37.920
Here, this 1 minus alpha into 100 percent
is the quantity which is the degree
46:37.920 --> 46:44.119
of confidence. And here, this minus or plus z
alpha by 2 is the value of the standard normal
46:44.119 --> 46:50.020
variate, at the cumulative probability level
alpha by 2 and 1 minus alpha by 2.
46:50.020 --> 46:56.690
So, when you are taking the minus sign, minus
z alpha by 2, this is at the probability level
46:56.690 --> 47:03.309
at this alpha by 2. So, if it is 95 percent, if you
put the confidence level at 0.95, then obviously,
47:03.309 --> 47:10.309
this whole is equal
to 0.95, then alpha will become 0.05 and this
47:11.799 --> 47:18.799
alpha by 2 will become 0.025.
Now, at this 0.025, this z alpha by 2 will
47:19.010 --> 47:26.010
be: if it is 0.95, you know that this
value will be 1.96. So, this will be x bar
47:26.500 --> 47:33.460
minus 1.96 multiplied by sigma by square root
n and here also, plus 1.96 sigma by square
47:33.460 --> 47:38.329
root n, this will come.
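As a hedged illustration of what this confidence level means, a small simulation (with an assumed population mean of 10 and standard deviation of 2, both hypothetical) checks how often the interval x bar plus or minus 1.96 sigma by square root n actually contains mu:

```python
import math
import random

random.seed(42)
MU, SIGMA, N, TRIALS = 10.0, 2.0, 30, 10_000   # assumed population and sample size

# half-width of the 95 percent interval: z_{alpha/2} = 1.96, sigma known
half_width = 1.96 * SIGMA / math.sqrt(N)

covered = 0
for _ in range(TRIALS):
    # draw one sample of size N and compute its mean
    xbar = sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
    # does the interval around x bar contain the true mean?
    if xbar - half_width <= MU <= xbar + half_width:
        covered += 1

coverage = covered / TRIALS   # should come out close to 0.95
```

This is the statistical sense of the confidence level: about 95 percent of such intervals, constructed over repeated samples, contain the true mean.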
47:38.329 --> 47:44.079
Now, as we were discussing, if the sample
is large, if it is
47:44.079 --> 47:51.079
more than 30. Now, in the other case, if the sample
size is small, say less than 30, and
47:51.369 --> 47:57.030
if x bar is the calculated sample mean
and s square is the calculated
47:57.030 --> 48:04.030
sample variance, then the random variable
x bar minus mu divided by s by square root of
48:04.359 --> 48:08.240
n follows a t distribution with n minus 1
degrees of freedom.
48:08.240 --> 48:15.240
Here you see, the variance we
do not know; that is the unknown variance.
48:15.880 --> 48:22.180
So, this one also we have to calculate from
the sample itself. So, here also,
48:22.180 --> 48:28.200
again, when you take this x bar minus
the population mean divided by the sample
48:28.200 --> 48:33.339
standard deviation divided by square root n.
So, this one instead of following this standard
48:33.339 --> 48:40.339
normal distribution, it will follow the t
distribution with n minus 1 degrees of freedom.
48:40.359 --> 48:45.339
So, this t distribution and all we have discussed
earlier. The only thing is that, if the sample
48:45.339 --> 48:49.609
size is small, this quantity will follow the
t distribution. If the sample size is more
48:49.609 --> 48:54.960
and the variance is known, then it
will follow a normal distribution.
48:54.960 --> 49:01.960
So, here in this case, the confidence
interval will be x bar minus t alpha by 2
49:03.069 --> 49:09.280
s by square root n and x bar plus t alpha
by 2 s by square root n. So, this is a
49:09.280 --> 49:15.049
t distribution with n minus 1 degrees of freedom.
And, you can see even from a standard
49:15.049 --> 49:22.049
textbook on the t distribution that, when
n goes beyond 30, the values of
49:22.280 --> 49:25.680
this t alpha by 2 and the z alpha by 2 are
essentially the same.
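A minimal sketch of the small-sample interval just described; the data here are hypothetical, and the value t alpha by 2 = 2.365 for 7 degrees of freedom is taken from a standard t-table:

```python
import math
import statistics

# hypothetical small sample (n = 8 < 30), so the t distribution applies
sample = [23.1, 24.8, 22.5, 25.2, 23.9, 24.4, 22.8, 25.0]
n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)   # uses the n - 1 divisor, as the lecture requires

# t_{alpha/2} for a 95 percent interval with n - 1 = 7 degrees of freedom (table value)
t_crit = 2.365
half_width = t_crit * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
```

Because 2.365 is larger than the normal quantile 1.96, the resulting interval is wider, reflecting the extra uncertainty of estimating the variance from a small sample.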
49:25.680 --> 49:32.680
So, where again, this 1 minus alpha into
100 percent is the degree of confidence, and
49:35.970 --> 49:41.490
minus or plus t alpha by 2 is the value of the standard
t distribution variate at cumulative probability
49:41.490 --> 49:47.970
again, that alpha by 2 and this 1 minus alpha
by 2. So, basically the difference between
49:47.970 --> 49:54.369
the t and the standard normal distribution
is at small sample sizes, where you will
49:54.369 --> 50:00.819
get a wider estimate of the interval because,
when the sample size is less, the
50:00.819 --> 50:06.569
uncertainty is more.
Though it is assumed that the sample is drawn
50:06.569 --> 50:13.569
from a normal population, the expression applies
roughly for non-normal populations also. So,
50:13.599 --> 50:19.539
basically, when the population is normally distributed,
this is a very well accepted method. But,
50:19.539 --> 50:24.099
even when it is non-normal, this method
we can apply.
50:24.099 --> 50:31.099
So, we will take one example of whatever
we have seen: 30 concrete cubes are prepared
50:34.510 --> 50:41.510
under certain conditions. The sample mean of
these cubes is found to be 24 kilonewton
50:43.569 --> 50:49.230
per meter cube. If the standard deviation
is known to be 4 kilonewton per meter cube,
50:49.230 --> 50:53.869
determine the 99 percent and 95 percent confidence
intervals of the mean strength of the concrete
50:53.869 --> 50:58.130
cube.
So, this 4 kilonewton per meter cube is known;
it is from the population. And this one, when
it is from the population.And this one, when
we say this one, is obtained from the
51:07.680 --> 51:08.510
sample.
51:08.510 --> 51:13.579
So, if we want to solve this one, then,
first of all, we will get
51:13.579 --> 51:19.000
what should be the quantile value for this
z alpha by 2, and this is for the 99 percent confidence
51:19.000 --> 51:26.000
interval, from the standard normal table. You
can see that it is 2.575, which is the z alpha
51:26.119 --> 51:32.410
by 2 value. So, this sigma by square root n
multiplied by z alpha by 2 equals 1.88,
51:32.410 --> 51:33.940
from the data available.
51:33.940 --> 51:40.940
So, the 99 percent confidence interval will
be the mean minus that quantity 1.88 and mean
51:41.819 --> 51:47.630
plus 1.88. So, the confidence interval is
22.12 to 25.88 kilonewton per meter cube.
51:47.630 --> 51:54.020
To determine the 95 percent confidence interval,
51:54.020 --> 52:00.760
again we have to find out the z alpha by 2
and this is your 1.96, earlier it was 2.575.
52:00.760 --> 52:03.099
So, it is now 1.96.
52:03.099 --> 52:10.099
If we calculate this one, it will become 1.43
and the 95 percent confidence interval of
52:10.530 --> 52:17.530
the mean strength of this concrete will be
24 minus 1.43 and 24 plus 1.43. So 22.57 and
52:23.130 --> 52:30.130
25.43 kilonewton per meter cube. Sorry, this
52:30.730 --> 52:35.020
is the density, so the unit is kilonewton
per meter cube.
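The two intervals above can be reproduced in a few lines; the quantiles 2.575 and 1.96 are the standard normal table values quoted in the lecture:

```python
import math

xbar, sigma, n = 24.0, 4.0, 30    # sample mean, known population std dev, sample size
se = sigma / math.sqrt(n)         # standard error of the sample mean, sigma / sqrt(n)

for label, z in [("99%", 2.575), ("95%", 1.96)]:
    lower, upper = xbar - z * se, xbar + z * se
    print(f"{label} CI: ({lower:.2f}, {upper:.2f})")
# prints: 99% CI: (22.12, 25.88) and 95% CI: (22.57, 25.43), as in the lecture
```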
52:35.020 --> 52:41.960
But, what we should observe here is that there
are two confidence intervals. We have determined
52:41.960 --> 52:45.720
one, the 99 percent confidence interval,
and the other one, the 95 percent
52:45.720 --> 52:49.660
confidence interval.
So, the 95 percent confidence interval is
52:49.660 --> 52:56.660
22.57 to 25.43, whereas the 99 percent confidence
interval is 22.12 to 25.88. So, you can see
52:59.640 --> 53:06.430
that this 99 percent confidence interval is
wider, because it is more likely that the
53:06.430 --> 53:12.589
larger interval will contain the mean value
than the smaller one. Hence, the 99 percent
53:12.589 --> 53:18.490
confidence interval is larger
53:12.589 --> 53:18.490
as compared
to the 95 percent confidence interval.
So, in this example that we have used,
we have taken the variance as known.
we we we have taken that a variance is known.
So, once we have decided the variance is
known, we have used the standard normal distribution.
known, we have used the standard normal distribution.
But, in the other case, we have seen that
sometimes the sample size is small and we have
sometimes the sample size is less and we have
to use, in that case, the
t distribution. From
use that, what is the t distribution.From
the t distribution, we have to obtain
that interval.
that one.
So, maybe we will take up the same example,
but that time we will just declare that
but, in that time we will just declare that
whatever standard
deviation we got, it is not from the
deviation that we got, it is not from the
population, but from the sample. So, that
example we will basically start in the next
example we will basically start from the next
lecture, and after that we will take up
the estimation for the other parameters
be the, that estimation for the other parameters
like the variance, proportion and all. And,
we will also see about
we will also relate, we will also see about
the test of hypothesis, which is obviously
essential when we are comparing the means
essential when we are comparing the mean of
from two different samples, or the variance
or proportion. So, when it is related
or proportion from. So, when it is related
to two different samples, then we have to
go for such tests. So, we will start from
go for those testing. So, we will start from
this point in the next lecture. Thank you.