WEBVTT
Kind: captions
Language: en
00:00:17.930 --> 00:00:27.410
Welcome, students, to the MOOC series of lectures
on Statistical Inference. This is the
00:00:27.410 --> 00:00:43.840
9th lecture of the series. If you remember
in the previous classes, we have seen sampling
00:00:43.840 --> 00:01:08.860
distributions of
x bar and s square: x bar is the sample mean,
00:01:08.860 --> 00:01:25.100
and s square is the sample variance. We have seen
that x bar is unbiased for the population mean.
00:01:25.100 --> 00:01:38.310
But s square is not unbiased for sigma square, which is
00:01:38.310 --> 00:01:46.420
the population variance; however, a constant multiple
of the sample variance gives an unbiased estimator
00:01:46.420 --> 00:01:56.180
for sigma square.
So, if you observe, we see that the above
00:01:56.180 --> 00:02:29.090
statistics are arithmetic functions
of the sample observations, x 1, x 2 up to
00:02:29.090 --> 00:03:01.709
x n. But, we often need to estimate some other
population parameters.
00:03:01.709 --> 00:03:27.919
For example, the minimum or the maximum. So, if
we take a sample, then based on that can we estimate
00:03:27.919 --> 00:03:34.090
the minimum value in the population of the attribute
we are interested in, or say
00:03:34.090 --> 00:03:52.350
the maximum value of the attribute in the
population, or the median, or in fact any other
00:03:52.350 --> 00:04:09.550
quantile. For example, what is the 10th percentile
of the population, or what is the 25th percentile,
00:04:09.550 --> 00:04:18.900
which is the first quartile of the population?
We may also like to estimate something like the
00:04:18.900 --> 00:04:34.230
range of the values: what is the expected
range of the attribute in the population, that
00:04:34.230 --> 00:04:54.460
is the maximum minus the minimum.
So, these parameters do not really depend
00:04:54.460 --> 00:05:32.180
on some arithmetic operations. In fact, they
depend on some relative ordering of the values
00:05:32.180 --> 00:05:59.210
in the sample. So, basically you have observed
n samples x 1, x 2, x n. And if we find a
00:05:59.210 --> 00:06:09.060
relative ordering among those values to some
extent, we can feel that they may give us
00:06:09.060 --> 00:06:18.770
a clue about what can possibly be the expected
value of these parameters that we have just
00:06:18.770 --> 00:06:30.219
mentioned.
So, this leads to what is called order statistic.
00:06:30.219 --> 00:06:56.800
As the name suggests, it is not only one statistic,
but there are different statistics involved.
00:06:56.800 --> 00:07:08.779
And they are based on the relative ordering
of the values in the sample. So, definition
00:07:08.779 --> 00:07:30.150
suppose
x 1, x 2, ..., x n is a random sample
00:07:30.150 --> 00:08:04.680
of size n from a univariate distribution with
pdf f x.
00:08:04.680 --> 00:08:29.069
So, since we are talking about a pdf, we
are assuming that f is a continuous distribution, a
00:08:29.069 --> 00:08:42.279
continuous density function. What does that
mean? It means that the probability that x i is equal
00:08:42.279 --> 00:08:54.570
to x j, for i not equal to j, is equal to 0. So,
we have x 1, x 2, ..., x n, n observations from a
00:08:54.570 --> 00:09:13.370
continuous density function f x. Suppose,
we arrange them in
00:09:13.370 --> 00:09:35.040
increasing order; so, we are arranging them
in increasing order of magnitude.
00:09:35.040 --> 00:10:01.589
Let the arrangement be X (1) less than X (2) up to X (n). I am using strictly
less than, because we assumed that two sample
00:10:01.589 --> 00:10:08.560
values cannot be equal that probability is
0. Therefore, we can get a relative order
00:10:08.560 --> 00:10:20.440
of the sample. So, what is X (1)? Remember
this parenthesis; X (1) is not necessarily equal to x 1,
00:10:20.440 --> 00:10:50.509
the first observed sample value; rather, it is the
smallest of all the observed values. X (2) is
00:10:50.509 --> 00:11:14.060
similarly not necessarily equal to x 2; X
(2) is the second smallest observed value, etcetera.
00:11:14.060 --> 00:11:30.670
And X (n) is the maximum
of the values, ok.
00:11:30.670 --> 00:12:02.540
So, let me give you an illustration. Suppose,
we have taken 5 observations from uniform
00:12:02.540 --> 00:12:18.880
0, 1; that means, from the real line interval
0 to 1. We have taken 5 values: let x 1 be
00:12:18.880 --> 00:12:46.820
0.81, x 2 is equal to 0.15, x 3 is equal to
say 0.29, x 4 is equal to 0.75, and x 5 is
00:12:46.820 --> 00:13:00.160
equal to say 0.52. As you can see, they
are not in any sorted order.
00:13:00.160 --> 00:13:26.560
So, from this sample, we get X (1) is equal
to 0.15, X (2) is equal to 0.29, X (3) is equal
00:13:26.560 --> 00:13:50.480
to 0.52, X (4) is equal to 0.75, and X (5) is
equal to 0.81. Therefore, you can understand
00:13:50.480 --> 00:14:02.490
that once we sort these values, we get a particular
arrangement of the observed values. So, this
00:14:02.490 --> 00:14:17.089
is the order statistic
00:14:17.089 --> 00:14:28.600
generated from this sample. In fact, this
is not the only sample that generates this.
00:14:28.600 --> 00:14:35.009
In fact, since there are five observations,
we can have factorial 5, that is 120, different
00:14:35.009 --> 00:14:42.910
orderings of the observation each of which
will give rise to the same order statistic.
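As a quick aside (Python is not part of the lecture; this is just an illustrative sketch), the sorting step above can be written in a few lines, using the five values from this example:

```python
import itertools
import math

# Observed sample from Uniform(0, 1), in the order drawn (values from the lecture).
sample = [0.81, 0.15, 0.29, 0.75, 0.52]

# The order statistic is simply the sorted arrangement X_(1) <= ... <= X_(n).
order_stat = sorted(sample)
print(order_stat)  # [0.15, 0.29, 0.52, 0.75, 0.81]

# Any of the 5! = 120 permutations of the sample yields the same order statistic.
perms = set(itertools.permutations(sample))
assert len(perms) == math.factorial(5)
assert all(sorted(p) == order_stat for p in perms)
```

This also makes the last remark concrete: every permutation of the observed values sorts to the same sequence.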
00:14:42.910 --> 00:14:54.310
Therefore, what is an order statistic? Given
a sample of n observations from a distribution
00:14:54.310 --> 00:15:04.890
function, say f x, we arrange the observed
values in increasing order, where X subscript
00:15:04.890 --> 00:15:14.190
bracket i is the ith smallest in the arranged
order. Then the sequence X (1), X (2), ..., X (n) is
00:15:14.190 --> 00:15:35.110
the order statistic generated from that sample.
Question: how do we get the distribution of the
00:15:35.110 --> 00:15:57.800
order statistics? We
may like to know the sampling distribution
00:15:57.800 --> 00:16:27.810
of X (1), that is, the minimum. Therefore,
if we get the distribution of X (1), the first
00:16:27.810 --> 00:16:38.410
order statistic, then we can
infer about the distribution of the minimum
00:16:38.410 --> 00:17:20.780
of the sample. Similarly, X (n) gives the maximum.
We may also be interested in the joint distribution
00:17:20.780 --> 00:17:50.180
of two order statistics, e.g., X (r) and X (s), where
r is less than s. Say, for example, how are the 3rd
00:17:50.180 --> 00:18:02.060
order statistic and the 8th order statistic
in a sample of size, say, 10 jointly distributed?
00:18:02.060 --> 00:18:10.050
We will see the applications of such distributions,
but let me first illustrate with some simpler
00:18:10.050 --> 00:18:36.630
examples.
So, let us consider uniform 0, 1 distribution,
00:18:36.630 --> 00:18:54.521
and we have observed n samples from this distribution.
Suppose, we are interested in the distribution
00:18:54.521 --> 00:19:20.950
of
X (n), the nth order statistic. So, what we will
00:19:20.950 --> 00:19:28.900
do first is calculate the cumulative
distribution function of the nth order statistic.
00:19:28.900 --> 00:19:42.950
So, this is the cdf of X (n), which we denote
by F n. This is the probability that
00:19:42.950 --> 00:19:58.190
the nth order statistic is less than or equal
to x. Suppose this is the interval 0, 1 and this is x,
00:19:58.190 --> 00:20:07.320
and we have taken n samples. We want the
probability that the maximum, suppose this is the maximum,
00:20:07.320 --> 00:20:12.370
is less than or equal to x; what is that
probability?
00:20:12.370 --> 00:20:32.920
We know when the event X (n) less than
or equal to x happens. It happens when
00:20:32.920 --> 00:20:42.850
the maximum is less than or equal to x; that means
that all the n samples are actually less than
00:20:42.850 --> 00:21:05.350
or equal to x, right. Therefore, the probability
that X (n) is less than or equal
00:21:05.350 --> 00:21:18.650
to x is equal to the probability that all of x 1, x 2,
..., x n are less than or equal to x. And what is the
00:21:18.650 --> 00:21:32.220
probability that any one of them is less than or
equal to x? That is F x. Similarly, what is
00:21:32.220 --> 00:21:49.160
the probability that x 2 is less than or equal to x? That is also F x. So,
F x into F x into ... into F x is equal to F x to the power n, right.
00:21:49.160 --> 00:21:56.440
Therefore, the maximum will be less than or equal
to x if all the observations are less than
00:21:56.440 --> 00:22:04.400
or equal to x, and that probability is F x, where
F is the parent distribution function, whole to the
00:22:04.400 --> 00:22:31.290
power n.
Therefore, if we differentiate the
00:22:31.290 --> 00:22:39.850
cumulative distribution with respect to x,
we will get the pdf, which we denote as f n
00:22:39.850 --> 00:23:02.220
x, that is, the pdf of the nth order statistic,
which is equal to d dx of F x to the power n, which
00:23:02.220 --> 00:23:21.170
is equal to n times F x to the power n minus
1 d F x dx, which is equal to n into F x to
00:23:21.170 --> 00:23:33.240
the power n minus 1 into f x. So, once we know
the parent distribution, that is, the parent density
00:23:33.240 --> 00:23:47.750
function and the corresponding cdf, then if
we take n samples, we can get the pdf of the maximum
00:23:47.750 --> 00:23:56.410
like this.
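As a small aside (not part of the lecture), the relation F n of x equal to F x to the power n can be checked numerically for uniform 0, 1, where F x is x. The sketch below compares the formula against a simulated maximum:

```python
import random

random.seed(0)
n = 10          # sample size
x = 0.7         # point at which the cdf is evaluated

# Theoretical cdf of the maximum: F_n(x) = F(x)^n = x^n for Uniform(0, 1).
theory = x ** n

# Empirical estimate: fraction of trials where the max of n uniforms is <= x.
trials = 200_000
hits = sum(max(random.random() for _ in range(n)) <= x for _ in range(trials))
empirical = hits / trials

assert abs(empirical - theory) < 0.01
```

The empirical fraction agrees with x to the power n up to simulation noise.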
So, example
00:23:56.410 --> 00:24:16.500
as I said, I will take n samples from uniform 0, 1;
F x is equal to x. Therefore, f n x is equal
00:24:16.500 --> 00:24:33.320
to n times x to the power n minus 1 into f
x; since f x is equal to 1, this is n into
00:24:33.320 --> 00:25:10.840
x to the power n minus 1. So, suppose we have
taken 10 samples from uniform 0, 1. Therefore,
00:25:10.840 --> 00:25:20.060
f 10 at x is equal to 10 into x to the power
9 right.
00:25:20.060 --> 00:25:45.500
In other words, if we take 10 samples from uniform
0, 1, then the expected value of the maximum is
00:25:45.500 --> 00:26:02.440
the expected value of X (10), which is the integral from
0 to 1 of 10 x to the power 9 multiplied by x
00:26:02.440 --> 00:26:20.710
dx, equal to the integral from 0 to 1 of 10 x to the power 10
dx, equal to 10 upon 11 times x to the power 11
00:26:20.710 --> 00:26:39.420
evaluated from 0 to 1, which is equal to 10 upon 11.
Can you guess from here what will be the
00:26:39.420 --> 00:27:03.100
expected value of X (n) in general? Obviously,
it is the integral from 0 to 1 of n x to the power n minus 1 times
00:27:03.100 --> 00:27:23.240
x dx, and it is n upon n plus 1.
What does that say? As n increases,
00:27:23.240 --> 00:27:37.141
the expected value of the maximum
converges to 1, because it is n upon n plus 1. If
00:27:37.141 --> 00:27:42.890
you take n to be large, it comes closer and
closer to 1.
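Again as an aside (an illustrative sketch, not part of the lecture), the result E of X (n) equal to n upon n plus 1 is easy to confirm by simulation:

```python
import random

random.seed(1)
n = 10
trials = 100_000

# Average the sample maximum over many independent replications.
avg_max = sum(max(random.random() for _ in range(n)) for _ in range(trials)) / trials

theory = n / (n + 1)  # = 10/11 for n = 10
assert abs(avg_max - theory) < 0.01
```

The simulated mean of the maximum sits close to 10 upon 11, as derived above.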
00:27:42.890 --> 00:27:59.530
Another example:
x 1, x 2, ..., x n are from
00:27:59.530 --> 00:28:19.100
exponential with parameter lambda; they are
independent, each from exponential lambda.
00:28:19.100 --> 00:28:42.480
So, what is the pdf of X (n)? We know that f
of x is equal to lambda e to the power minus
00:28:42.480 --> 00:28:57.790
lambda x. And what is capital F of
x? It is 1 minus e to the power minus lambda x. Therefore,
00:28:57.790 --> 00:29:20.000
f n of x is equal to n into 1 minus e to the
power minus lambda x whole to the power n
00:29:20.000 --> 00:29:33.851
minus 1 into lambda e to the power minus lambda
x. So, like that we can get the distribution
00:29:33.851 --> 00:29:44.830
of the maximum of n samples from an exponential
random variable.
00:29:44.830 --> 00:29:59.420
Now, let us consider
00:29:59.420 --> 00:30:21.860
the distribution of
X (1). So, what is the probability that the
00:30:21.860 --> 00:30:45.250
smallest of the observations is less than or equal
to x? Now, the smallest will be less than or
00:30:45.250 --> 00:31:29.690
equal to x; think of it as the complement, right?
So, we consider uniform 0, 1, and this is x.
00:31:29.690 --> 00:31:38.750
If any one of them is less than x, then the
minimum has to be less than or equal to x. But
00:31:38.750 --> 00:31:45.480
if two of them are less than x, still the
minimum is less than or equal to x. And if all
00:31:45.480 --> 00:31:53.660
n are less than x, then also the minimum is
less than or equal to x. Therefore, the probability
00:31:53.660 --> 00:32:01.810
that the minimum is less than or equal to x is the complement
of the event that all the observations are greater than x; that is,
00:32:01.810 --> 00:32:22.850
1 minus the probability that all the observations are
greater than x.
00:32:22.850 --> 00:32:32.950
And therefore, this is 1 minus 1 minus F x
whole to the power n, because this is the
00:32:32.950 --> 00:32:48.940
probability
that an observation is greater than x. Therefore,
00:32:48.940 --> 00:33:15.470
what is f 1 x, the pdf of X (1)? It is equal to d dx
of 1 minus, 1 minus F x whole to the
00:33:15.470 --> 00:33:31.960
power n is equal to n into 1 minus F x whole
to the power n minus 1. This minus will give
00:33:31.960 --> 00:33:49.861
a minus, and the minus from d dx of 1 minus F x cancels it; so it is equal
to n into 1 minus F x whole to the power n
00:33:49.861 --> 00:34:24.190
minus 1 into f x.
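As before, a small aside (an illustrative sketch, not part of the lecture): the cdf of the minimum, 1 minus, 1 minus F x whole to the power n, can be checked by simulation for uniform 0, 1:

```python
import random

random.seed(2)
n = 10
x = 0.1

# Theoretical cdf of the minimum: 1 - (1 - F(x))^n = 1 - (1 - x)^n for Uniform(0, 1).
theory = 1 - (1 - x) ** n

# Empirical estimate: fraction of trials where the min of n uniforms is <= x.
trials = 200_000
hits = sum(min(random.random() for _ in range(n)) <= x for _ in range(trials))
empirical = hits / trials

assert abs(empirical - theory) < 0.01
```

The simulated fraction matches 1 minus 1 minus x whole to the power n up to sampling noise.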
Now, an example: x 1, x 2, ..., x n are from
00:34:24.190 --> 00:34:43.419
uniform 0, 1, independent. Therefore, f 1 x
is equal to n times, and what is F x? F x is x
00:34:43.419 --> 00:34:52.109
for uniform 0, 1, so it is 1 minus x
whole to the power n minus 1, into f x, which is equal
00:34:52.109 --> 00:35:02.750
to 1. Therefore, this is n into 1 minus x
whole to the power n minus 1. Therefore, expected
00:35:02.750 --> 00:35:11.779
value of X 1 is equal to integration 0 to
1 n into 1 minus x whole to the power n minus
00:35:11.779 --> 00:35:26.529
1 into x dx. How do you integrate that? This
is n into the integral from 0 to 1 of 1 minus x whole to the power
00:35:26.529 --> 00:35:38.579
n minus 1 x to the power 2 minus 1 dx. And
this comes under our familiar beta integral
00:35:38.579 --> 00:35:45.599
right.
We know that the integral from 0 to 1 of x to the power
m minus 1 into 1 minus x whole to the power
m minus 1 into 1 minus x whole to the power
n minus 1 dx is equal to beta m comma n is
00:35:59.349 --> 00:36:05.720
equal to gamma m gamma n upon gamma m plus
n.
00:36:05.720 --> 00:36:25.250
Therefore, the expected value of X 1 is equal
to n times gamma 2 gamma n upon gamma n plus
00:36:25.250 --> 00:36:43.150
2, which is equal to n into 1 into factorial n minus
1, upon factorial n plus 1, that is, factorial n
00:36:43.150 --> 00:37:08.910
upon factorial n plus 1, which is equal to 1 upon
n plus 1. Therefore, the expected value of X (1)
00:37:08.910 --> 00:37:29.369
depends upon the number of samples. And as
n increases, this quantity 1 upon n plus 1
00:37:29.369 --> 00:37:38.750
converges to 0, which is expected, because
if you are sampling from uniform 0, 1, then
00:37:38.750 --> 00:37:46.180
the minimum is expected to go to 0. And as
we have observed, the maximum is expected
00:37:46.180 --> 00:38:00.319
to converge to 1.
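One more aside (an illustrative sketch, not part of the lecture): the beta-integral computation of E of X (1) can be reproduced with the gamma function, and confirmed by simulation:

```python
import math
import random

random.seed(3)
n = 10

# Beta-integral result: E[X_(1)] = n * Gamma(2) * Gamma(n) / Gamma(n + 2) = 1 / (n + 1).
theory = n * math.gamma(2) * math.gamma(n) / math.gamma(n + 2)
assert abs(theory - 1 / (n + 1)) < 1e-12

# Monte Carlo check: average of the sample minimum over many replications.
trials = 100_000
avg_min = sum(min(random.random() for _ in range(n)) for _ in range(trials)) / trials
assert abs(avg_min - theory) < 0.01
```

Both routes agree: the expected minimum of 10 uniform samples is 1 upon 11.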
Now, suppose we take n independent samples
00:38:00.319 --> 00:38:40.690
from exponential lambda. What is the pdf of
the minimum? We know that f 1 x is equal to
00:38:40.690 --> 00:38:51.109
n into 1 minus F x whole to the power n minus
1 into f x. So, in this case, it is going
00:38:51.109 --> 00:39:04.500
to be n into 1 minus 1 minus e to the power
minus lambda x whole to the power n minus
00:39:04.500 --> 00:39:19.750
1 times lambda e to the power minus lambda
x. The 1 minus 1 cancels, so this is n into e to the
00:39:19.750 --> 00:39:32.440
power minus lambda into n minus 1 x into lambda
e to the power minus lambda x.
00:39:32.440 --> 00:39:45.210
This is equal to n lambda e to the power
minus lambda into n minus 1 plus 1 into x, which is equal
00:39:45.210 --> 00:40:07.200
to n lambda e to the power minus n lambda x. Therefore,
the minimum
00:40:07.200 --> 00:40:29.150
of n samples from exponential lambda is distributed
as exponential with n lambda.
00:40:29.150 --> 00:41:01.660
Suppose, now we take n samples from
different exponential distributions. Say x
00:41:01.660 --> 00:41:14.609
1 is from lambda 1 e to the power minus lambda
1 x. x 2 is from lambda 2 e to the power minus
00:41:14.609 --> 00:41:32.470
lambda 2 x. x n is from lambda n e to the
power minus lambda n x. The question is
00:41:32.470 --> 00:41:51.359
what is the distribution of
the minimum? This does not exactly come under
00:41:51.359 --> 00:41:57.420
order statistics, but the idea that I gave
will tell you how to handle it.
00:41:57.420 --> 00:42:19.930
Let G x denote the probability that the minimum of
X 1, X 2, ..., X n is less than or equal to x. This is equal
00:42:19.930 --> 00:42:43.249
to 1 minus the probability that all of X 1, X 2, ..., X n are greater
than x, which is 1 minus the product of the probabilities
00:42:43.249 --> 00:42:55.019
that X i is greater than x, for i equal to 1 to n; that is,
1 minus the product over i equal to
00:42:55.019 --> 00:43:10.440
1 to n of 1 minus, 1 minus e to the power minus
lambda i x.
00:43:10.440 --> 00:43:23.779
This is equal to 1 minus the product over i equal to
1 to n of e to the power minus lambda i x,
00:43:23.779 --> 00:43:34.150
which is 1 minus e to the power minus, lambda
1 plus lambda 2 plus up to lambda n, whole
00:43:34.150 --> 00:43:45.390
into x; that is, 1 minus e to the power
minus sigma lambda i into x.
00:43:45.390 --> 00:43:54.849
Therefore, we see that the actual result is
more general. Even if lambda 1, lambda
00:43:54.849 --> 00:44:05.339
2, up to lambda n are all different, the
cdf is going to be 1 minus e to the power
00:44:05.339 --> 00:44:14.789
minus sigma lambda i into x. In the order statistics
case we found that if all the lambdas are equal,
00:44:14.789 --> 00:44:30.900
this came out to be n lambda. So,
in this case the minimum
00:44:30.900 --> 00:44:46.930
will be distributed as exponential with sigma
lambda i.
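As an aside (an illustrative sketch, not part of the lecture), the more general result, that the minimum of independent exponentials with rates lambda i is exponential with rate sigma lambda i, checks out numerically as well:

```python
import random

random.seed(5)
lams = [1.0, 2.0, 3.0]   # distinct rates lambda_i, chosen here just for illustration
trials = 200_000

# Average the minimum of one draw from each Exponential(lambda_i) over many replications.
total = sum(min(random.expovariate(l) for l in lams) for _ in range(trials))
avg_min = total / trials

# The minimum is Exponential(sum of lambda_i), so its mean is 1 / sum(lams).
theory = 1 / sum(lams)  # = 1/6 here
assert abs(avg_min - theory) < 0.005
```

With lambda 1 plus lambda 2 plus lambda 3 equal to 6, the simulated mean of the minimum is close to 1 upon 6, as the cdf 1 minus e to the power minus sigma lambda i into x predicts.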
00:44:46.930 --> 00:45:14.039
Now, let me give an interesting example. Suppose
x 1, x 2, ..., x n are independent samples
00:45:14.039 --> 00:45:52.779
from uniform 0, 1; what is the expected value
of the range? What is the range? It is the
00:45:52.779 --> 00:46:08.339
difference, the maximum minus the minimum. We
know that if these are the samples, then the
00:46:08.339 --> 00:46:18.349
range is X (n) minus X (1).
Suppose, you want to find out the expected
00:46:18.349 --> 00:46:40.749
value of the range. By linearity of expectation,
it is the expected value of X (n)
00:46:40.749 --> 00:46:54.509
minus the expected value of X (1), which is n
upon n plus 1 minus 1 upon n plus 1, equal
00:46:54.509 --> 00:47:09.680
to n minus 1 upon n plus 1. Therefore, if
we take n samples, the expected value of the range is
00:47:09.680 --> 00:47:15.690
n minus 1 upon n plus 1, which is obviously
less than 1, because we are taking samples
00:47:15.690 --> 00:47:23.950
from uniform 0, 1.
But what is going to happen as n increases?
00:47:23.950 --> 00:47:34.619
This value converges to 1, right. And this
is expected because we are talking about samples
00:47:34.619 --> 00:47:43.019
from 0 to 1. Therefore, as more and more samples
are taken, we expect that the entire range
00:47:43.019 --> 00:47:48.980
will be covered, the entire span from 0 to
1. Therefore, the expected
00:47:48.980 --> 00:47:57.480
value of the range is going to be very
close to 1 as n increases.
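A final aside (an illustrative sketch, not part of the lecture): the expected range n minus 1 upon n plus 1 is also easy to verify by simulation:

```python
import random

random.seed(6)
n = 10
trials = 100_000

# Average the sample range max - min over many independent replications.
total = 0.0
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    total += max(xs) - min(xs)   # the range X_(n) - X_(1)
avg_range = total / trials

theory = (n - 1) / (n + 1)  # = 9/11 for n = 10
assert abs(avg_range - theory) < 0.01
```

Note that this uses only linearity of expectation, exactly as in the lecture; no joint distribution of X (1) and X (n) is needed for the mean.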
00:47:57.480 --> 00:48:11.390
Note that
00:48:11.390 --> 00:48:41.140
we have not actually derived the distribution
of the range; instead, we have used the linearity
00:48:41.140 --> 00:48:58.240
property
of expectation to compute the expected value
00:48:58.240 --> 00:49:19.150
of range from expected value of the maximum
and expected value of the minimum.
00:49:19.150 --> 00:49:45.599
To get the actual distribution of the range, we
need to know the joint distribution of X (1)
00:49:45.599 --> 00:49:58.900
and X (n). From there we shall first find
00:49:58.900 --> 00:50:03.950
out the distribution of the range, and from
there we shall calculate the expected value
00:50:03.950 --> 00:50:22.299
of the range.
Also to compute the distribution of different
00:50:22.299 --> 00:50:54.440
quantiles, we need to know the pdf of the
rth order statistic for different r. What
00:50:54.440 --> 00:51:09.789
is the rth order statistic? We know that X (1) is less
than X (2), and so on. So, the rth order statistic is the
00:51:09.789 --> 00:51:28.710
rth smallest value in the sample.
In the next class, I shall look at the distribution
00:51:28.710 --> 00:51:39.430
of the rth order statistic for different r,
from 1 to n. And I shall also look at the
00:51:39.430 --> 00:51:51.359
joint distribution of the rth and sth order statistics
in the next class.
00:51:51.359 --> 00:52:02.510
Thank you.