WEBVTT
Kind: captions
Language: en
00:00:17.960 --> 00:00:29.420
Welcome, students, to the 4th lecture in the
MOOC series on statistical inference.
00:00:29.420 --> 00:00:56.170
If you remember, in the last class I was talking
about estimating p, where p is the proportion
00:00:56.170 --> 00:01:25.280
of items in a universe U, say, having an
attribute A. So, this was the problem that
00:01:25.280 --> 00:01:28.750
we did towards the end of the last class.
00:01:28.750 --> 00:01:47.950
And what we found is that if the universe is
X1, X2, ..., XN, of which a proportion p
00:01:47.950 --> 00:02:22.370
has attribute A, and we take a sample x1,
x2, ..., xn, and if f is the sample proportion of items
00:02:22.370 --> 00:02:45.360
with attribute A, then the expected value of
f is equal to p. This we found out in
00:02:45.360 --> 00:02:52.780
the last class.
Today I start from that with a slightly more
00:02:52.780 --> 00:03:22.480
complicated problem. Suppose there are two machines,
M1 and M2, and both produce
00:03:22.480 --> 00:03:46.061
items with proportion of defectives equal
to p. So, M1 produces items and M2 produces
00:03:46.061 --> 00:04:05.680
items, they all come to a common pool,
and the manufacturer
00:04:05.680 --> 00:04:22.690
makes packets of 100 items.
So, the items produced by machine 1
00:04:22.690 --> 00:04:30.810
and the items produced by machine 2 are
coming to the common pool, and from there
00:04:30.810 --> 00:04:40.699
the manufacturer takes 100 items at a time and
makes packets. He wants to know what is the
00:04:40.699 --> 00:05:15.349
distribution of the number of defective items
in a packet; a packet means a packet of 100.
00:05:15.349 --> 00:05:29.590
Suppose we want to find out this distribution.
The problem is we do not know
00:05:29.590 --> 00:05:57.810
how many items are there from machine 1,
and how many from machine 2. The only thing that
00:05:57.810 --> 00:06:34.300
we know is that if m items are from machine M1 and
n items are from M2, then m plus n is equal
00:06:34.300 --> 00:06:56.069
to 100. This much we know.
So, we want to calculate the
00:06:56.069 --> 00:07:20.159
probability that the number of defectives in a packet
is equal to k. Let us note that the total
00:07:20.159 --> 00:07:45.020
number of defectives
can be k in k plus 1 ways: 0 from M1 and
00:07:45.020 --> 00:08:12.729
k from M2, 1 from M1 and k minus 1 from
M2, up to k from M1 and 0 from M2.
00:08:12.729 --> 00:08:25.169
So, the probability that there are k
defective items in a packet can be computed
00:08:25.169 --> 00:08:35.320
by adding the probabilities of these k plus 1 individual events,
and some of them may have probability 0. So,
00:08:35.320 --> 00:08:58.930
let us see how to work that out. Therefore,
if x denotes the number of defective items
00:08:58.930 --> 00:09:19.370
in a packet, then probability x is equal
to k is: the probability of getting 0 defectives
00:09:19.370 --> 00:09:44.970
from machine 1 into the probability of getting
k defectives from machine 2, plus the
00:09:44.970 --> 00:09:56.340
probability of getting 1 defective out of m
from machine 1 multiplied by the probability of
00:09:56.340 --> 00:10:32.710
getting k minus 1 defectives from
machine 2, up to mCk p to the power k q to the power m minus k into q to the power n.
00:10:32.710 --> 00:10:47.950
This we can write as sigma, x equal to
0 to k, of mCx p to the power x q to the power
00:10:47.950 --> 00:11:14.130
m minus x multiplied by nC(k minus x) p to
the power k minus x into q to the power n
00:11:14.130 --> 00:11:26.910
minus k plus x. Now, if you look
at it, the powers of p combine to p to the power
00:11:26.910 --> 00:11:38.050
k, which we can take out of the summation.
The powers of q combine to q to the power m plus
00:11:38.050 --> 00:11:47.440
n minus k, so that is also independent of
x. So, we can take it out of the summation as well,
00:11:47.440 --> 00:12:06.810
and then what is left is sigma, x equal to 0 to
k, of mCx into nC(k minus x). So, that is the expression
00:12:06.810 --> 00:12:17.330
that we get.
So now we need to find out this term. What
00:12:17.330 --> 00:12:29.350
is the summation over x from 0 to k of mCx multiplied
by nC(k minus x)? To do that, let us consider
00:12:29.350 --> 00:12:47.010
1 plus x whole to the power m into 1 plus
x whole to the power n. So, this product is
00:12:47.010 --> 00:13:00.410
1 plus x whole to the power m plus n.
Now, let us expand both of them binomially.
00:13:00.410 --> 00:13:43.010
Consider the coefficient of x
to the power 0 in the first factor, which is mC0,
00:13:43.010 --> 00:13:57.160
multiplied by the coefficient of x to the power k
in the second, that is nCk; plus mC1 into the coefficient of x
00:13:57.160 --> 00:14:08.290
to the power k minus 1 in the second, which is
nC(k minus 1); and so on, up to the coefficient
00:14:08.290 --> 00:14:17.620
of x to the power k in the first multiplied by the coefficient
of x to the power 0 in the second. That last term
00:14:17.620 --> 00:14:30.170
is mCk multiplied by nC0.
So, this is precisely the summation that we
00:14:30.170 --> 00:14:41.190
are talking about. If you look at it, it
is mC0 into nCk plus mC1 into
00:14:41.190 --> 00:14:54.790
nC(k minus 1) up to mCk into nC0. So, this is the summation
that we get. And what is it? It is the
00:14:54.790 --> 00:15:04.590
coefficient of x to the power k in 1 plus x whole
to the power m plus n.
00:15:04.590 --> 00:15:29.400
Therefore, (m plus n)Ck is equal to mC0 into
nCk plus mC1 into nC(k minus 1) plus, up to,
00:15:29.400 --> 00:15:51.250
mCk into nC0. So, this is the value of the sum
00:15:51.250 --> 00:16:02.420
that remained in our expression for
the probability. Therefore, the final probability
00:16:02.420 --> 00:16:16.970
x is equal to k is equal to (m plus n)Ck multiplied
by p to the power k q to the power m plus
00:16:16.970 --> 00:16:27.300
n minus k.
Since m plus n is a constant equal to 100, we
00:16:27.300 --> 00:16:39.730
can write it as 100Ck p to the power k q
to the power 100 minus k. And this is true
00:16:39.730 --> 00:16:57.970
for k equal to 0, 1, up to 100. Therefore,
what we get is that
00:16:57.970 --> 00:17:21.430
the number of defectives in each packet
of 100 follows a binomial distribution with parameters 100 and p. So, we
00:17:21.430 --> 00:17:29.270
get the distribution of the number of defectives
in each packet.
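As a quick numerical check of the summation argument above, here is a minimal Python sketch (not part of the lecture). The split m = 37, n = 63 and the defect rate p = 0.05 are hypothetical values chosen only for illustration, and the function names are my own.

```python
from math import comb


def binom_pmf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def packet_pmf(m, n, p, k):
    """P(k defectives in the packet), summing over x from M1 and k - x from M2."""
    return sum(binom_pmf(m, p, x) * binom_pmf(n, p, k - x) for x in range(k + 1))


def vandermonde_sum(m, n, k):
    """Sum over x of mCx * nC(k - x); math.comb returns 0 when the lower index exceeds the upper."""
    return sum(comb(m, x) * comb(n, k - x) for x in range(k + 1))


# Hypothetical split of the 100 items between the two machines.
m, n, p = 37, 63, 0.05
max_gap = max(
    abs(packet_pmf(m, n, p, k) - binom_pmf(m + n, p, k)) for k in range(m + n + 1)
)
print(vandermonde_sum(m, n, 10) == comb(m + n, 10), max_gap < 1e-12)
```

Changing m and n while keeping m plus n equal to 100 should leave the packet distribution unchanged, which is the point of the result: the unknown split does not matter when both machines share the same p.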
00:17:29.270 --> 00:17:48.809
Thus we observe the following:
00:17:48.809 --> 00:18:30.350
we may need to infer about the sum of 2 random
variables, and we want to know its probability
00:18:30.350 --> 00:18:47.909
distribution.
In particular, if X1 is a random variable
00:18:47.909 --> 00:19:01.820
which is binomial with parameters m and p
and X2 is a random variable which is binomial
00:19:01.820 --> 00:19:23.119
with parameters n and p, then we find that
X1 plus X2 is binomial with parameters m plus n and p,
00:19:23.119 --> 00:19:47.090
when X1 and X2 are independent.
So, this is one result we have established.
00:19:47.090 --> 00:19:57.761
We can also prove it in a slightly tricky way
if we make use of the moment generating function
00:19:57.761 --> 00:20:19.269
.
We know that if X1 is binomial with parameters m and
00:20:19.269 --> 00:20:35.139
p, then the MGF of X1 at t is, if you remember,
in the first class I gave you the formula,
00:20:35.139 --> 00:20:56.080
q plus p e to the power t whole to the
power m. Similarly, if X2 follows binomial
00:20:56.080 --> 00:21:14.950
with parameters n and p, then the MGF of X2 at t is equal
to q plus p e to the power t whole to the
00:21:14.950 --> 00:21:22.090
power n.
Now, you must have seen 2 results in your
00:21:22.090 --> 00:21:40.070
probability class. The first result is
00:21:40.070 --> 00:22:04.749
that if X and Y are independent, then the moment
generating function of X plus Y at t is equal
00:22:04.749 --> 00:22:21.510
to the product of their individual moment
generating functions at t. And the second result
00:22:21.510 --> 00:22:39.529
is that the moment generating function, if it exists,
is unique.
00:22:39.529 --> 00:22:51.100
There are distributions for which the moment generating
function may not exist, but if it exists then
00:22:51.100 --> 00:23:10.360
it is unique. What does it mean? It means
that if the MGF of X at t is equal to the MGF of Y at t
00:23:10.360 --> 00:23:32.830
for all t, then X and Y have the same distribution.
00:23:32.830 --> 00:23:43.610
That is, if X and Y are 2 random variables
having the same moment generating function
00:23:43.610 --> 00:23:52.129
for all t, then they have the same distribution.
00:23:52.129 --> 00:24:20.629
By applying these 2 theorems, we can see that the
moment generating function of X1 plus X2
00:24:20.629 --> 00:24:39.009
at t is equal to the moment generating function
of X1 at t into the MGF of X2 at t, and when
00:24:39.009 --> 00:24:48.919
they are binomial, then we can write it as
q plus p e to the power t whole to the power
00:24:48.919 --> 00:25:01.379
m multiplied by q plus p e to the power t
whole to the power n, which is equal to q plus p
00:25:01.379 --> 00:25:09.369
e to the power t whole to the power m plus
n, with respect to the problem that I was dealing
00:25:09.369 --> 00:25:17.570
with.
And this is precisely the moment generating
00:25:17.570 --> 00:25:32.690
function
of a binomial with parameters m plus n and p. So, from here also
00:25:32.690 --> 00:25:43.279
we can prove that if X1 is a binomial
random variable telling you the number of
00:25:43.279 --> 00:25:54.840
defectives in a sample of size m from machine M1
and X2 is the number of defectives in a sample
00:25:54.840 --> 00:26:05.139
of size n from machine M2, then X1 plus
X2 will have a distribution which is binomial
00:26:05.139 --> 00:26:13.360
with parameters m plus n and p. So, the result that I showed
you earlier by explicit summation can also
00:26:13.360 --> 00:26:22.059
be proved using the moment generating function.
Subsequently, in many examples I may use
00:26:22.059 --> 00:26:44.730
these results directly.
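The moment generating function argument can also be checked numerically. The following is a small Python sketch, not something shown in the lecture; the values m = 4, n = 6, p = 0.3 and t = 0.7 are hypothetical, and the helper names are my own.

```python
from math import comb, exp, isclose


def binom_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Binomial(n, p)."""
    q = 1 - p
    return [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]


def mgf_from_pmf(pmf, t):
    """E[exp(t X)] computed directly from a pmf on the values 0, 1, 2, ..."""
    return sum(prob * exp(t * k) for k, prob in enumerate(pmf))


def binom_mgf(n, p, t):
    """Closed form (q + p e^t)^n quoted in the lecture."""
    return (1 - p + p * exp(t)) ** n


# Hypothetical parameters chosen only for illustration.
m, n, p, t = 4, 6, 0.3, 0.7
product = mgf_from_pmf(binom_pmf(m, p), t) * mgf_from_pmf(binom_pmf(n, p), t)
print(isclose(product, binom_mgf(m + n, p, t)))
```

The check only compares the two sides at one value of t; the uniqueness theorem needs equality for all t, which here follows from the closed forms.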
Now, let us examine the example.
00:26:44.730 --> 00:27:07.200
What is the catch? It is that both M1 and M2
have the same proportion of defectives.
00:27:07.200 --> 00:27:34.439
If M1 has proportion p1 of defectives
and M2 has a different proportion p2 of defectives
00:27:34.439 --> 00:27:45.519
,
00:27:45.519 --> 00:28:18.059
then we cannot get such a closed form. Therefore,
when we deal with sums of random variables,
00:28:18.059 --> 00:28:26.609
we will try to look at cases where we can
get a closed form for the distribution of
00:28:26.609 --> 00:28:53.960
the sum of the random variables.
Another point is independence. When we are
00:28:53.960 --> 00:29:01.799
collecting samples from 2 different machines,
as we have done in the previous example, we
00:29:01.799 --> 00:29:14.639
know that they are going to be independent
anyway. But suppose
00:29:14.639 --> 00:29:45.489
we are looking at sampling
from a population. If the sampling is with
00:29:45.489 --> 00:30:25.359
replacement, then effectively each
selection of an item is independent of the others.
00:30:25.359 --> 00:31:13.429
But if it is without replacement, obviously
the selections made up to the i-th unit chosen
00:31:13.429 --> 00:31:33.940
will govern the probability of selection at
the (i plus 1)-th stage. This is very clear,
00:31:33.940 --> 00:31:41.080
because it is without replacement. Therefore,
given the items that I have chosen till the i-th
00:31:41.080 --> 00:31:47.389
stage, when I am selecting the (i plus
1)-th element, obviously it will depend upon
00:31:47.389 --> 00:32:03.489
the items that I have selected so far. Therefore,
they are not
00:32:03.489 --> 00:32:31.669
truly independent.
But if the universe is very large,
00:32:31.669 --> 00:33:15.489
the probabilities of selecting a sample
with replacement and without replacement
00:33:15.489 --> 00:33:26.539
are very close.
So, for your benefit I have calculated some
00:33:26.539 --> 00:33:39.389
of these probabilities.
00:33:39.389 --> 00:34:06.529
Consider N equal to 100, and I am choosing
3 elements from the population. So, when
00:34:06.529 --> 00:34:24.869
N is equal to 100, under SRSWR, that is simple
random sampling with replacement, the probability
00:34:24.869 --> 00:34:41.530
of a triplet is 1 by 100 into 1 by 100 into
1 by 100, which
00:34:41.530 --> 00:35:00.770
is equal to 0.000001. Under SRSWOR, that is simple random sampling without replacement, this is
going to be 1 by 100 into 1 by 99 into 1 by
00:35:00.770 --> 00:35:22.750
98. This probability is going to be 0.00000103.
So, if we look at their difference, we find
00:35:22.750 --> 00:35:32.960
that the difference is coming only in the
8th place of decimals.
00:35:32.960 --> 00:36:00.130
Similarly, for N equal to 50, under SRSWR we get
0.000008, and under SRSWOR it is going to be 0.0000085.
00:36:00.130 --> 00:36:16.779
So, again the
difference is coming at the 7th decimal place.
00:36:16.779 --> 00:36:36.599
If we make N even smaller, say 30, then what we are
getting for SRSWR is 0.000037 and for
00:36:36.599 --> 00:36:54.240
SRSWOR it is 0.000041.
So, even when the size of the population is
00:36:54.240 --> 00:37:02.099
as small as 30, we find that the probabilities
are very, very close; they are differing only
00:37:02.099 --> 00:37:13.490
in the fifth decimal place. So, that tells
us something. It tells us that if the
00:37:13.490 --> 00:37:25.600
population is even reasonably large, we may
consider each draw from the
00:37:25.600 --> 00:37:31.870
population to be independent of the other
draws.
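The probabilities quoted above can be reproduced with a few lines of Python. This is only an illustrative sketch; the function names are my own, and the population sizes 100, 50 and 30 with an ordered sample of 3 units are the ones used in the lecture.

```python
def prob_with_replacement(N, r):
    """Probability of one specific ordered sample of r units under SRSWR."""
    return (1 / N) ** r


def prob_without_replacement(N, r):
    """Probability of one specific ordered sample of r distinct units under SRSWOR."""
    prob = 1.0
    for i in range(r):
        # At draw i + 1 there are N - i units left in the population.
        prob /= N - i
    return prob


for N in (100, 50, 30):
    wr = prob_with_replacement(N, 3)
    wor = prob_without_replacement(N, 3)
    print(f"N={N}: SRSWR={wr:.8f}  SRSWOR={wor:.8f}  difference={wor - wr:.8f}")
```

The gap between the two columns shrinks as N grows relative to the sample size, which is the sense in which the draws may be treated as approximately independent.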
00:37:31.870 --> 00:37:44.130
So, what is the advantage? The advantage is
that we can use independence,
00:37:44.130 --> 00:37:51.890
multiplying the individual probabilities
to obtain the probability of selection of
00:37:51.890 --> 00:38:06.869
a particular sample of size n.
Another important point is that
00:38:06.869 --> 00:38:28.589
given a random variable x, we may like to obtain
the distribution
00:38:28.589 --> 00:38:53.799
of a function of x. For example, suppose x is a
random variable with mean 0. Then
00:38:53.799 --> 00:39:07.670
the variance of x is equal to the expected
value of x squared. So, given the distribution
00:39:07.670 --> 00:39:18.069
of x, we are trying to find out the expectation
of a function, namely the square of the random
00:39:18.069 --> 00:39:21.510
variable. For the discrete case this may be simple.
00:39:21.510 --> 00:39:53.380
For example, consider x which takes values
minus 2, minus 1, 0, 1 and 2, with probabilities
00:39:53.380 --> 00:40:25.510
1 by 4, 1 by 4, 1 by 6, 1 by 6, 1 by 6. Then
how is x squared distributed?
00:40:25.510 --> 00:40:40.930
X squared can take only 3 possible values:
0, 1 and 4. X squared is 0 only when x is equal
00:40:40.930 --> 00:40:49.339
to 0; therefore, its probability is 1 by 6.
X squared equal to 1
00:40:49.339 --> 00:40:56.859
we can get in 2 different ways: when x is minus
1, whose probability is 1 by 4, and when x is plus
00:40:56.859 --> 00:41:03.240
1, whose probability is 1 by 6. Therefore,
probability x squared is equal to 1 is 1 by
00:41:03.240 --> 00:41:24.800
4 plus 1 by 6, which is equal to 5 by 12.
Similarly, probability x squared is equal
00:41:24.800 --> 00:41:37.050
to 4 comes out to be 1 by 4 plus 1 by 6, that is 5 by 12. Therefore,
from the distribution of x, we can get the
00:41:37.050 --> 00:41:49.140
distribution of x squared to be 1 by 6 for 0, 5 by
12 for 1 and 5 by 12 for 4. But now suppose
00:41:49.140 --> 00:42:01.140
I ask you: given this distribution of x squared,
can you find the distribution of x? Then you
00:42:01.140 --> 00:42:09.500
see that you cannot. Because this tells me
that probability x is equal to minus 1 plus
00:42:09.500 --> 00:42:18.609
probability x is equal to plus 1, that sum,
is 5 by 12. But there is no way I can know the
00:42:18.609 --> 00:42:25.520
decomposition of this into the 2 values which
are the respective probabilities of x equal
00:42:25.520 --> 00:42:37.020
to minus 1 and x equal to plus 1.
The question is why we cannot do it in this
00:42:37.020 --> 00:42:49.460
case, while we can do it in some other case. Say,
for example, x takes values minus 4, minus
00:42:49.460 --> 00:43:05.150
3, 0, 1, 2, and suppose their probabilities
are 1 by 4, 1 by 4, 1 by 6, 1 by 6, 1 by
00:43:05.150 --> 00:43:18.849
6. Then x squared can take values 0, 1, 4, 9
and 16 with probabilities 1 by 6, 1 by 6,
00:43:18.849 --> 00:43:31.550
1 by 6, 1 by 4 and 1 by 4.
In this case, I can get the probabilities
00:43:31.550 --> 00:43:55.130
of the different values of x. This we can do
because the mapping
00:43:55.130 --> 00:44:13.099
from x to y, where y is the
function of x,
00:44:13.099 --> 00:44:26.650
is unique, that is, one-to-one on these values. Given x I know x squared, and
given x squared I know which value of x has
00:44:26.650 --> 00:44:37.020
produced this value of x squared. Thus, when
the mapping is unique, it helps us to compute
00:44:37.020 --> 00:44:51.930
the pdf of a function H of x of the random variable
x from the distribution of the original random
00:44:51.930 --> 00:45:01.289
variable, namely x.
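The two discrete examples above can be worked through mechanically. The sketch below is an illustration only (the helper names are my own, not from the lecture); it uses exact fractions so the results can be compared directly with the values 1 by 6 and 5 by 12 computed by hand.

```python
from collections import defaultdict
from fractions import Fraction as F


def pmf_of_function(pmf, h):
    """Distribution of h(X): add up P(X = x) over all x with the same h(x)."""
    out = defaultdict(F)
    for x, prob in pmf.items():
        out[h(x)] += prob
    return dict(out)


def is_one_to_one(pmf, h):
    """True when no two support points of X are mapped to the same value."""
    values = [h(x) for x in pmf]
    return len(set(values)) == len(values)


def square(x):
    return x * x


# First example: -1 and +1 (and -2 and +2) collapse to the same square.
pmf1 = {-2: F(1, 4), -1: F(1, 4), 0: F(1, 6), 1: F(1, 6), 2: F(1, 6)}
# Second example: every support point has a different square.
pmf2 = {-4: F(1, 4), -3: F(1, 4), 0: F(1, 6), 1: F(1, 6), 2: F(1, 6)}

print(pmf_of_function(pmf1, square), is_one_to_one(pmf1, square))
print(pmf_of_function(pmf2, square), is_one_to_one(pmf2, square))
```

Only in the second case can the distribution of x be read back from that of x squared, because each value of x squared comes from exactly one value of x.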
I appeal to the following theorem, which many
00:45:01.289 --> 00:45:31.960
of you must have seen. Let X be a continuous
random variable with pdf f of x, and suppose
00:45:31.960 --> 00:45:56.349
f of x is greater than 0 on an interval (a, b).
Now, consider Y is equal to H of X, that is, Y
00:45:56.349 --> 00:46:20.020
is a function of X, where
H is strictly monotonic. So, from X we are
00:46:20.020 --> 00:46:45.240
mapping into a random variable Y through a function which
is strictly monotonic. Assume H of x is differentiable,
00:46:45.240 --> 00:47:10.390
and hence continuous, for all x. Then the pdf,
the probability density function, of Y equal
00:47:10.390 --> 00:47:31.920
to H of X is given by
g of y is equal to f of x multiplied by the mod value of dx
00:47:31.920 --> 00:47:48.380
dy, with x equal to H inverse of y, that is, expressed in terms
of y.
00:47:48.380 --> 00:48:00.440
Now, why did we take H strictly monotonic, whether
increasing or decreasing? Suppose this is
00:48:00.440 --> 00:48:11.579
x and this is a strictly monotonic function
of x. Then, given y, what is the probability
00:48:11.579 --> 00:48:21.410
that the random variable Y is less than or equal
to a particular value y? That probability
00:48:21.410 --> 00:48:37.470
will come from the probability of X. When H is increasing,
that probability will be given by F of H inverse
00:48:37.470 --> 00:48:49.950
of y. The strict monotonicity allows us to
take the inverse at y and get the value
00:48:49.950 --> 00:48:58.680
of x, and therefore we can apply the cumulative
distribution function of X on that one.
00:48:58.680 --> 00:49:33.530
So, let me give a very quick proof of the
above statement. Suppose first that H is an increasing
00:49:33.530 --> 00:49:49.460
function. Then G of y, which is probability
Y less than or equal to y, is the same as probability
00:49:49.460 --> 00:49:59.880
H of X less than or equal to y, and since H
is strictly monotonic we can get the inverse
00:49:59.880 --> 00:50:28.549
at y uniquely. And therefore, this is probability X less than or equal to H inverse y, that is F at
H inverse y. Therefore, what is the pdf of
00:50:28.549 --> 00:50:40.349
Y?
00:50:40.349 --> 00:50:44.200
We get it by differentiating the cumulative
distribution function.
00:50:44.200 --> 00:51:17.150
So, this is d dy of F of H inverse y, which
is f at H inverse y
00:51:17.150 --> 00:51:36.599
multiplied by d H inverse y dy, where x is
equal to H inverse y. Therefore, we get
00:51:36.599 --> 00:51:53.529
that g of y is equal to f of x into dx dy, where x
is equal to H inverse y, and this we express
00:51:53.529 --> 00:52:13.880
in terms of y.
If H is monotonically decreasing, then what
00:52:13.880 --> 00:52:27.109
will happen? G of y is equal to probability
Y less than or equal to y, which is equal to probability
00:52:27.109 --> 00:52:38.420
H of X less than or equal to y, which is equal to probability
X greater than or equal to H inverse y; this
00:52:38.420 --> 00:52:44.380
is because H is decreasing,
and therefore the inequality is reversed. This is
00:52:44.380 --> 00:52:57.079
equal to 1 minus probability X less than or equal
to H inverse y, which is equal to 1 minus F at H
00:52:57.079 --> 00:53:21.140
inverse y. Therefore, d G y dy is equal to
minus f at H inverse y into d H
00:53:21.140 --> 00:53:29.230
inverse y dy, and we know that if H is a
decreasing function, then this derivative
00:53:29.230 --> 00:53:36.660
is going to be negative. Therefore, that multiplied
by this minus sign will give you the modulus
00:53:36.660 --> 00:53:59.650
of the derivative, right? So we can write
it as f of x into mod of dx dy.
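To see the formula g of y equal to f of x into mod of dx dy at work, here is a minimal Python sketch. It is not from the lecture: the density f of x equal to 2x on (0, 1) and the two transformations are hypothetical examples, and the derivative of H inverse is approximated by a central difference rather than computed analytically.

```python
import math


def transformed_pdf(f, h_inverse, y, eps=1e-6):
    """g(y) = f(x) * |dx/dy| with x = h_inverse(y), for strictly monotonic H."""
    x = h_inverse(y)
    # Central-difference approximation of d h_inverse / dy (an assumption of this sketch).
    dx_dy = (h_inverse(y + eps) - h_inverse(y - eps)) / (2 * eps)
    return f(x) * abs(dx_dy)


def f(x):
    """Hypothetical pdf of X: f(x) = 2x on the interval (0, 1)."""
    return 2 * x if 0 < x < 1 else 0.0


# Increasing case: Y = X squared on (0, 1), so x = sqrt(y) and g(y) should be 1.
g_increasing = transformed_pdf(f, math.sqrt, 0.3)

# Decreasing case: Y = 1 - X, so x = 1 - y and g(y) should be 2 * (1 - y).
g_decreasing = transformed_pdf(f, lambda y: 1 - y, 0.3)

print(g_increasing, g_decreasing)
```

In the decreasing case dx dy is negative, and taking the absolute value is exactly the step where the minus sign from 1 minus F is absorbed.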
00:53:59.650 --> 00:54:12.880
So, if H is monotonically increasing or monotonically
decreasing, we can get the inverse. Suppose that is
00:54:12.880 --> 00:54:24.849
not the case. Say, for example, H of x is like
this; then given a y I can get 2 different
00:54:24.849 --> 00:54:34.180
values of x, and we cannot take the inverse
at y. In a similar way, if it is something
00:54:34.180 --> 00:54:43.329
like this, where it is increasing but on
this range the value is fixed, then also for
00:54:43.329 --> 00:54:52.059
these values of y we cannot actually get the
inverse of the H function.
00:54:52.059 --> 00:55:01.849
And therefore, in these cases we have to apply
some tricks to partition it into non-overlapping
00:55:01.849 --> 00:55:10.109
segments, and from there we will have to calculate
the pdf of the function of x. With that I
00:55:10.109 --> 00:55:21.789
stop here. In the next class I will use this
theorem, and I will show a case of this type
00:55:21.789 --> 00:55:33.930
to show how we can obtain the distribution
of some functions, particularly of a normal variable.
00:55:33.930 --> 00:55:50.779
Ok. So, with that I stop here; see you
in the next class.
00:55:50.779 --> 00:55:58.739
Thank you.