WEBVTT
Kind: captions
Language: en
00:00:17.630 --> 00:00:26.329
Welcome, students, to my MOOC online lecture
on statistical inference. I am planning to
00:00:26.329 --> 00:00:34.769
have about 20 lectures on this topic. And
this is the very first lecture of that series.
00:00:34.769 --> 00:00:45.600
It is assumed that the listeners of this course
have some background in basic statistics and
00:00:45.600 --> 00:00:52.570
basic probability distributions. In this course
of course, I will revise the probability part
00:00:52.570 --> 00:01:01.040
very quickly, and I will touch upon only those
aspects that will be used in the course
00:01:01.040 --> 00:01:10.490
of my lectures in this series. This course,
I expect, will help undergraduate students
00:01:10.490 --> 00:01:18.049
of statistics, maths and computing, computer
science etcetera and also basic science students
00:01:18.049 --> 00:01:25.859
at honors level to understand the basics of
statistical inference .
00:01:25.859 --> 00:01:35.130
Before I go into the topic, let me first explain
what statistics is. As per Wikipedia, it is
00:01:35.130 --> 00:01:43.710
a branch of mathematics dealing with collection,
analysis, interpretation, presentation and
00:01:43.710 --> 00:01:55.259
organization of data. So, one thing is very
clear to us that statistics is something where
00:01:55.259 --> 00:02:07.409
we deal with data. This has become very important
in this era of big data, when data is in abundance
00:02:07.409 --> 00:02:20.750
and we need to learn from this data .
So, historically the term statistics was used
00:02:20.750 --> 00:02:40.200
first in English by Sir John Sinclair, who
was a Scottish politician and a prolific
00:02:40.200 --> 00:03:10.330
writer; he wrote the 21-volume Statistical
Account of Scotland around the time
00:03:10.330 --> 00:03:27.180
1791 to 1799. So, it is more than 200 years
that the word statistics has been used in
00:03:27.180 --> 00:03:32.250
English.
Originally, people think that the word has
00:03:32.250 --> 00:04:12.490
come from the German word Statistik. So, at the state level, the government was collecting
00:04:12.490 --> 00:04:33.560
data which was used by government and administrative
bodies
00:04:33.560 --> 00:04:47.820
So that was around the middle of the 18th century,
in particular around 1749. And the
00:04:47.820 --> 00:05:12.650
term was coined by Gottfried . So, it
is more than 250 years that the term statistics,
00:05:12.650 --> 00:05:18.270
or something related to it,
has been in use.
00:05:18.270 --> 00:05:36.419
By the 18th century,
statistics more or less
00:05:36.419 --> 00:06:04.060
stood for the systematic collection of demographic
and economic data
00:06:04.060 --> 00:06:23.330
by states .
The basic purposes were taxation and the military;
00:06:23.330 --> 00:06:28.800
so that is the basic historical background
of statistics.
00:06:28.800 --> 00:06:45.980
So, statistics essentially has two parts;
one is descriptive statistics
00:06:45.980 --> 00:07:04.540
which is basically to provide a summary of
the data .
00:07:04.540 --> 00:07:10.430
I am not going to discuss descriptive statistics
as I said, this is not the very first course
00:07:10.430 --> 00:07:20.699
of statistics. I assume people know some basics
of descriptive statistics, which includes data
00:07:20.699 --> 00:07:32.700
visualization
which is very important for practical
00:07:32.700 --> 00:07:42.770
purposes, because when you see the data on a graph
in 2D or 3D, one can get a much better intuitive
00:07:42.770 --> 00:07:57.680
idea of the data. And we have the scatter plot;
you must know all these things: bar chart,
00:07:57.680 --> 00:08:17.919
pie chart, histogram, box-plot
these are basic data visualization techniques.
00:08:17.919 --> 00:08:35.210
Also if you go for multivariate data, then
00:08:35.210 --> 00:08:50.200
one can visualize using many techniques, some
of them are constellation graph, one can think
00:08:50.200 --> 00:09:08.800
of the bi-plot, one can think of Chernoff
faces. Also, apart from visualization, one
00:09:08.800 --> 00:09:31.940
can think of some basic properties such as
mean, median and mode. These are measures of central
00:09:31.940 --> 00:09:55.810
tendency. Similarly, one can think of range,
variance, quartile deviation etcetera as a
00:09:55.810 --> 00:10:08.301
study of dispersion. And similarly,
one can think of higher-order moments like
00:10:08.301 --> 00:10:19.370
skewness, kurtosis etcetera. These are the
techniques any statistician should learn for
00:10:19.370 --> 00:10:27.480
dealing with data, because these are the basic
steps in processing the data to understand what
00:10:27.480 --> 00:10:35.560
is going on there.
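As a quick aside, these descriptive measures can be computed with a few lines of Python (the language and the tiny dataset here are my own illustration, not part of the lecture):

```python
import statistics as st

# A small made-up sample, purely for illustration.
data = [12, 15, 15, 18, 21, 21, 21, 24, 30]

# Central tendency
mean = st.mean(data)
median = st.median(data)
mode = st.mode(data)

# Dispersion
rng = max(data) - min(data)
var = st.pvariance(data)  # population variance

# Higher-order moments: skewness and kurtosis from central moments
n = len(data)
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
m4 = sum((x - mean) ** 4 for x in data) / n
skewness = m3 / m2 ** 1.5
kurtosis = m4 / m2 ** 2

print(mean, median, mode, rng, var, round(skewness, 3), round(kurtosis, 3))
```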
But statistics has another purpose, that
00:10:35.560 --> 00:10:58.990
is inferential statistics. We often want to
study
00:10:58.990 --> 00:11:27.070
some population parameters . For example,
you may like to know the average income of
00:11:27.070 --> 00:11:47.520
the population of a city, state or country.
00:11:47.520 --> 00:12:06.430
We can think of
agricultural productivity, total water
00:12:06.430 --> 00:12:23.470
resources etcetera. How do you study these
when the population is huge, when the
universe is huge? It is not possible to check
universe is huge, it is not possible to check
each and every unit of it, and measure the
00:12:30.600 --> 00:12:40.330
relevant properties to come to an overall
figure with respect to the universe.
00:12:40.330 --> 00:13:16.839
Here comes the utility of statistical inference:
it is all about learning various parameters
00:13:16.839 --> 00:13:46.310
of a population. If the population is small,
one can actually study each and every individual
00:13:46.310 --> 00:14:05.870
unit and come to a conclusion about the population
typically that is called complete enumeration.
00:14:05.870 --> 00:14:12.911
So, you are looking
at all the members of the population, you
00:14:12.911 --> 00:14:19.950
are measuring the parameter that you are looking
for which may be a weight, which may be height,
00:14:19.950 --> 00:14:29.240
which may be income which may be age. Similarly,
you can think of the total volume of forestry
00:14:29.240 --> 00:14:36.430
in a country etcetera. If a population is
small then it can be done very easily.
00:14:36.430 --> 00:15:12.459
But if the population is very large, then
complete enumeration is not possible or it
00:15:12.459 --> 00:15:37.760
is time consuming. For example, if you look
at census data where surveyors actually go
00:15:37.760 --> 00:15:45.839
from household to household, and collect information
about individuals and the household; together
00:15:45.839 --> 00:15:59.279
it is expensive and time-consuming, and that
is the reason the census is conducted once in
00:15:59.279 --> 00:16:22.959
10 years . And the processing takes much more
time
00:16:22.959 --> 00:16:40.110
to publish the results. In practice, that is
not always affordable; you cannot afford 10
00:16:40.110 --> 00:16:50.110
years, 15 years to complete your study because
a lot of planning, economic or otherwise, has
00:16:50.110 --> 00:17:14.830
to be done in a much shorter span of time.
Here comes the role of statistics. So, what
00:17:14.830 --> 00:17:35.169
statistics will do is
take a sample from the population, process
00:17:35.169 --> 00:18:08.840
it, and use it to estimate population parameters
. So, instead of considering a huge population
00:18:08.840 --> 00:18:14.320
completely, we will take a representative
sample out of it and process it. And from
00:18:14.320 --> 00:18:25.350
the results obtained after the processing
we will try to infer about the whole population.
00:18:25.350 --> 00:18:34.320
This is the science of statistics, and in this
series of lectures I will look into this aspect
00:18:34.320 --> 00:18:38.620
of statistics which is called statistical
inference .
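The sample-versus-complete-enumeration idea can be sketched in Python; the income "population" below is synthetic, generated only for the demonstration:

```python
import random

random.seed(0)

# A synthetic "population" of incomes (made up for the demo).
population = [random.gauss(50_000, 12_000) for _ in range(100_000)]
true_mean = sum(population) / len(population)

# Complete enumeration is replaced by a representative random sample.
sample = random.sample(population, 2_000)
estimate = sum(sample) / len(sample)

print(round(true_mean), round(estimate))
```

With a sample of only 2,000 out of 100,000, the estimate typically lands within a fraction of a percent of the true mean.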
00:18:38.620 --> 00:19:01.380
There are two basic approaches to statistical
inference:
00:19:01.380 --> 00:19:18.110
parametric and non-parametric .
In this series, I will be focusing on parametric
00:19:18.110 --> 00:19:28.530
inference; non-parametric inference I am not going to
cover in this series of lectures. Parametric
00:19:28.530 --> 00:20:04.360
means the distribution pattern of the
property of interest is known; and our job
00:20:04.360 --> 00:20:25.280
is to estimate
00:20:25.280 --> 00:20:33.650
the parameters . So, here comes the concept
of probability, you must have studied different
00:20:33.650 --> 00:20:43.700
probability distributions, and you must have
that background; I will assume
00:20:43.700 --> 00:20:50.660
that much knowledge from your side.
But to make it complete, I will first talk
00:20:50.660 --> 00:21:12.960
about some popular probability distributions.
As you all know, they can be of two types,
00:21:12.960 --> 00:21:27.110
discrete and continuous; there can be mixed
distributions also, but I am not considering those. When the random
00:21:27.110 --> 00:21:35.860
variable takes discrete values, then we call
it a discrete random variable and corresponding
00:21:35.860 --> 00:21:43.690
distribution is a discrete probability distribution.
Otherwise, if it ranges continuously along
00:21:43.690 --> 00:21:51.610
the real line, then we call it a continuous
random variable. With a discrete random
00:21:51.610 --> 00:22:11.430
variable we associate a probability mass function;
in short we call it pmf(x), where x is one
00:22:11.430 --> 00:22:19.400
possible value that the random variable can
take. So, the basic properties of a pmf are that
00:22:19.400 --> 00:22:41.559
pmf(x) is greater than or equal to 0 for all
x, and the sum of the values over all x is equal
00:22:41.559 --> 00:23:00.270
to 1. So, any set of discrete values which are greater
than or equal to 0 and sum up to 1, we can
00:23:00.270 --> 00:23:12.730
in principle consider that to be a probability
mass function. It does not mean that any arbitrary
00:23:12.730 --> 00:23:22.210
selection of values which satisfy these properties
models some natural phenomenon,
00:23:22.210 --> 00:23:30.400
but for mathematical treatment we can consider
that to be a valid pmf.
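The two defining properties of a pmf can be checked mechanically; this small Python sketch (an illustration, not part of the lecture) validates a fair-die pmf and rejects a set of values that does not sum to 1:

```python
def is_valid_pmf(pmf, tol=1e-9):
    # A valid pmf has p(x) >= 0 for all x, and the values sum to 1.
    return all(p >= 0 for p in pmf.values()) and abs(sum(pmf.values()) - 1) < tol

die = {x: 1 / 6 for x in range(1, 7)}  # fair six-sided die
bad = {0: 0.5, 1: 0.6}                 # sums to 1.1, so not a pmf

print(is_valid_pmf(die), is_valid_pmf(bad))
```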
00:23:30.400 --> 00:23:40.559
For a continuous random variable,
we assume that it is spread over the entire
00:23:40.559 --> 00:23:55.890
real line minus infinity to plus infinity.
And therefore, it does not make sense to assign
00:23:55.890 --> 00:24:07.730
a probability to each value, because the total
probability has to be 1. So, in this case,
00:24:07.730 --> 00:24:26.490
we talk about probability density function;
in short pdf(x), for x belonging to R. So, if
00:24:26.490 --> 00:24:40.179
f(x) is a function which is a probability density
function, then f(x) has to be greater than or equal to 0
00:24:40.179 --> 00:24:49.460
everywhere on R. And if
00:24:49.460 --> 00:24:58.870
you integrate it from minus infinity to infinity,
that has to be 1.
00:24:58.870 --> 00:25:36.169
As you know in case of continuous distribution,
it does not make sense to assign probability
00:25:36.169 --> 00:25:47.520
to any x. In fact, when we talk about continuous
random variable or a continuous distribution,
00:25:47.520 --> 00:26:04.940
we look at F(x), equal to the probability
that the random variable
00:26:04.940 --> 00:26:13.340
is less than or equal to x, which is obtained by
integrating the probability density function
00:26:13.340 --> 00:26:33.850
from minus infinity to x. And this is called
the cumulative distribution function.
00:26:33.850 --> 00:26:42.299
And this makes sense, this gives you the probability
00:26:42.299 --> 00:26:51.659
that the random variable takes a value less
than or equal to x.
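The relation between a pdf and its cdf can be checked numerically; the sketch below integrates the standard normal density with a simple trapezoidal rule, truncating the lower limit at -8 where the density is negligible:

```python
import math

def phi(t):
    # Standard normal pdf: (1 / sqrt(2*pi)) * exp(-t^2 / 2)
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def cdf(x, lo=-8.0, n=20_000):
    # F(x) = integral of phi from -infinity to x, approximated
    # by the trapezoidal rule on [lo, x].
    h = (x - lo) / n
    s = 0.5 * (phi(lo) + phi(x)) + sum(phi(lo + i * h) for i in range(1, n))
    return s * h

print(cdf(0.0))  # by symmetry, F(0) should be close to 0.5
```

The result agrees with math.erf, which gives the normal cdf in closed form up to a change of variable.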
00:26:51.659 --> 00:27:03.260
Corresponding to each random variable, we
can assign the expected value of x, which is the sum
00:27:03.260 --> 00:27:20.549
over x of x times p(x), where p(x) is the corresponding probability
mass function of x; or it can be written as the integral from
00:27:20.549 --> 00:27:32.601
minus infinity to infinity of x times f(x) dx,
where f(x) is the corresponding pdf of x. As
00:27:32.601 --> 00:27:43.980
you all know variance of x is defined as expected
value of X minus expected value of X whole
00:27:43.980 --> 00:27:52.710
square which can be written as the expected
value of X square minus expected value of
00:27:52.710 --> 00:28:00.850
X whole square .
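Both formulas can be verified on a small discrete example, a fair six-sided die:

```python
# E[X] = sum of x * p(x), and Var(X) = E[X^2] - (E[X])^2,
# computed for a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

ex = sum(x * p for x, p in pmf.items())        # expected value
ex2 = sum(x * x * p for x, p in pmf.items())   # second moment
var = ex2 - ex ** 2

print(ex, var)  # 3.5 and 35/12
```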
Also you know there is something called moment
00:28:00.850 --> 00:28:13.740
generating function
in short the MGF of x; for real values t, it
00:28:13.740 --> 00:28:26.390
is equal to the expected value of e to the power
t X. It is called the moment generating function
00:28:26.390 --> 00:28:36.740
because from here we can generate all the
moments of the random variable. For example,
00:28:36.740 --> 00:28:45.669
the first moment is the expectation of X, the second moment
is the expectation of X squared, and so on.
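Numerically differentiating an MGF at t = 0 recovers the moments; here is a sketch for a Bernoulli variable with parameter p, whose MGF is q + p e^t (the value p = 0.3 is an arbitrary illustration):

```python
import math

p, q = 0.3, 0.7

def mgf(t):
    # MGF of a Bernoulli(p) variable: E[e^(tX)] = q + p * e^t
    return q + p * math.exp(t)

h = 1e-5
# Central finite differences at t = 0 approximate the derivatives,
# i.e. the first and second moments (both equal p for Bernoulli).
first_moment = (mgf(h) - mgf(-h)) / (2 * h)
second_moment = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h ** 2

print(first_moment, second_moment)
```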
00:28:45.669 --> 00:29:08.270
Now, let me revise some well known distributions
that all of you know. But if you have forgotten
00:29:08.270 --> 00:29:15.299
something, then try to recap. Also, I am
not going to deal with all the discrete distributions
00:29:15.299 --> 00:29:23.429
that you might have studied, but it will be
good if you have a revision of those random
00:29:23.429 --> 00:29:37.730
variables as well . The first one that I look
at is binomial. It has two parameters n comma
00:29:37.730 --> 00:29:51.710
p. If X is a random variable
which is binomial with parameters n comma
00:29:51.710 --> 00:30:06.809
p then the possible values for x are 0, 1,
2 up to n. And the probability that X is equal to x,
00:30:06.809 --> 00:30:18.350
that is, the probability mass function at x,
is equal to nCx times p to the power x times 1 minus
00:30:18.350 --> 00:30:27.840
p whole to the power n minus x. All of this
you should know. And you also know that
00:30:27.840 --> 00:30:39.059
the expected value of X is equal to np, and the variance
of X is equal to np into 1 minus p.
00:30:39.059 --> 00:31:00.500
And if x is binomial n comma p then its MGF
at t is equal to q plus p e to the power t
00:31:00.500 --> 00:31:22.610
whole to the power n, where q is equal to
1 minus p. Here n can be any integer greater
00:31:22.610 --> 00:31:34.659
than equal to 1, 0 less than p less than 1.
So, the binomial distribution is defined
00:31:34.659 --> 00:31:45.091
for all integers n greater than or equal to 1,
and for any value of p between 0 and 1. As
00:31:45.091 --> 00:31:53.690
you know, the binomial distribution is used
to obtain the distribution of the number
00:31:53.690 --> 00:32:06.169
of heads of a coin where probability of getting
a head is p. And if the coin is tossed n times
00:32:06.169 --> 00:32:20.309
what is the probability that one will obtain
x many heads. So, probability of x heads in
00:32:20.309 --> 00:32:31.570
n tosses, that probability is going to
be ncx p to the power x q to the power n minus
00:32:31.570 --> 00:32:46.190
x; so that is one model. In reality, when
some experiment is going on n number of times
00:32:46.190 --> 00:32:54.150
where the probability of success is p, the binomial
distribution gives you the probability of
00:32:54.150 --> 00:33:04.320
obtaining a certain particular value.
For example, if a machine is producing some
00:33:04.320 --> 00:33:14.881
items say nut bolts, they can be defective
or they can be ok. Suppose, the probability
00:33:14.881 --> 00:33:26.110
of getting a non-defective nut bolt is p, and
the machine has produced 10,000 nut
00:33:26.110 --> 00:33:34.490
bolts in a day. So, as a producer or manufacturer,
one may like to know how many of them are
00:33:34.490 --> 00:33:42.570
defective or how many of them are non-defective.
Of course, nothing is guaranteed; it
00:33:42.570 --> 00:33:50.059
is a random event, and
the probability can be estimated using binomial
00:33:50.059 --> 00:34:11.359
model.
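A small Python sketch of the binomial model (the numbers n = 10, p = 0.4 are arbitrary illustrations) confirms that the pmf sums to 1 and that E[X] = np and Var(X) = np(1 - p):

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = nCx * p^x * (1-p)^(n-x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.4
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * pr for x, pr in enumerate(pmf))
var = sum((x - mean) ** 2 * pr for x, pr in enumerate(pmf))

print(round(mean, 6), round(var, 6))  # np = 4, np(1-p) = 2.4
```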
The next one is Poisson distribution .
00:34:11.359 --> 00:34:24.020
This is also a discrete distribution and it
takes values 0, 1, 2, 3 up to infinity that
00:34:24.020 --> 00:34:49.220
means, it can take any non-negative integral
values. Poisson distribution can be used to
00:34:49.220 --> 00:35:00.359
model the number of arrivals when there is
a flow of incoming things. For example, the
00:35:00.359 --> 00:35:07.569
number of cars passing or say suppose there
is a conveyor belt which is carrying the material
00:35:07.569 --> 00:35:18.279
the items produced by a machine, and then
suppose the defective items are coming at
00:35:18.279 --> 00:35:27.859
a rate say 2 per minute, then what is going
to be the expected number of defective items
00:35:27.859 --> 00:35:37.540
if the machine runs for half an hour.
Again this is a random variable, it is not
00:35:37.540 --> 00:35:45.739
fixed, but the number of defective items that
are coming will take
00:35:45.739 --> 00:35:56.539
different integral values and the probabilities
can be modeled using a Poisson random variable,
00:35:56.539 --> 00:36:03.950
which has one parameter lambda. If we
know the lambda then we should be able to
00:36:03.950 --> 00:36:12.049
know everything about the distribution. The probability
that X is equal to x is equal to e to the power
00:36:12.049 --> 00:36:22.249
minus lambda times lambda to the power x upon factorial
x, for x equal to 0, 1 etcetera, with lambda greater
00:36:22.249 --> 00:36:30.150
than 0.
If you recall, then you will know
00:36:30.150 --> 00:36:44.549
that the expected value of x is equal to lambda;
variance of x is equal to lambda; and the
00:36:44.549 --> 00:36:51.410
moment generating function of x at the point
t is equal to e to the power lambda into e
00:36:51.410 --> 00:37:00.650
to the power t minus 1. They are very easy
to compute; as this is not a first course on
00:37:00.650 --> 00:37:07.589
probability, I am not computing them, but it
will be good if you revise these things. Another
00:37:07.589 --> 00:37:16.559
interesting point to note is that for a binomial
random variable, variance was npq or np into
00:37:16.559 --> 00:37:24.920
1 minus p. Since 1 minus p is less than 1,
we knew that the variance of a binomial
00:37:24.920 --> 00:37:34.279
random variable is less than the expected
value. In this case, we can see that the expected
00:37:34.279 --> 00:37:40.180
value and the variance are both the same.
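The Poisson pmf and the equality of mean and variance can be checked the same way; the rate lambda = 2 echoes the conveyor-belt example:

```python
import math

def poisson_pmf(x, lam):
    # P(X = x) = e^(-lambda) * lambda^x / x!
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2.0  # e.g. defective items arriving at 2 per minute
pmf = [poisson_pmf(x, lam) for x in range(60)]  # tail beyond 60 is negligible

mean = sum(x * p for x, p in enumerate(pmf))
var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))

# Over half an hour the count is Poisson with lambda = 2 * 30 = 60,
# so 60 defective items are expected.
print(round(mean, 6), round(var, 6))  # both equal lambda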
Now, I will consider another discrete random
00:37:40.180 --> 00:37:50.869
variable which is called geometric random
variable .
00:37:50.869 --> 00:38:05.119
It also has one parameter, p; like binomial,
here also we look at tossing a coin. And
00:38:05.119 --> 00:38:21.259
p is the probability of getting a head . So,
with a geometric random variable, we want to study
00:38:21.259 --> 00:38:34.329
how long one has to wait to get one head.
So, the probability that X is equal to x is equal to
00:38:34.329 --> 00:38:53.739
q to the power x into p, where x is equal
to 0, 1, 2 and so on, and it is 0 otherwise;
00:38:53.739 --> 00:39:02.049
that means, suppose the geometric random variable
takes the value two, that means you have to make
00:39:02.049 --> 00:39:11.229
two tosses before getting a head. So, in the
first toss, you got a tail whose probability
00:39:11.229 --> 00:39:18.720
is q; in the second toss, you get another
tail whose probability is q. So, you have
00:39:18.720 --> 00:39:25.940
to wait for two tosses to get the head and
its probability is p. So, overall probability
00:39:25.940 --> 00:39:35.690
is q square into p for x is equal to 2.
These are the basics of the geometric random variable,
00:39:35.690 --> 00:39:45.910
and it is used for modeling the waiting time.
And as before, I am giving you the values:
00:39:45.910 --> 00:39:55.979
expectation of X is equal to q by p; variance
of X is equal to q by p square. And for the moment
00:39:55.979 --> 00:40:11.999
generating function of x, I suggest that you
verify these results; that will give you some
00:40:11.999 --> 00:40:18.170
practice in working on examples. In course
of time I will give you some assignments,
00:40:18.170 --> 00:40:25.449
where you will need some practice in
solving problems. And by solving these on your
00:40:25.449 --> 00:40:33.829
own you will get that required practice.
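For practice, the geometric waiting-time model is easy to simulate; this sketch counts tails before the first head and compares the average with q/p:

```python
import random

random.seed(1)
p, q = 0.25, 0.75  # arbitrary head probability for the illustration

def tails_before_head():
    # Count failures (tails) before the first success (head),
    # matching P(X = x) = q^x * p for x = 0, 1, 2, ...
    x = 0
    while random.random() >= p:  # a tail occurs with probability q
        x += 1
    return x

trials = [tails_before_head() for _ in range(200_000)]
avg = sum(trials) / len(trials)

print(round(avg, 3))  # should be near E[X] = q/p = 3
```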
Now, let me look at some continuous random
00:40:33.829 --> 00:41:11.089
variables; the simplest one is perhaps the uniform on
(a, b): on the real line there are two points
00:41:11.089 --> 00:41:24.480
a and b. And the distribution of the random
variable is uniform, meaning all values
00:41:24.480 --> 00:41:45.069
are equally likely. Therefore, if we call the density f(x),
then f(x) is basically constant on (a, b)
00:41:45.069 --> 00:42:03.700
and 0 otherwise.
So, if the constant is c, then integration
00:42:03.700 --> 00:42:21.680
a to b c dx is equal to 1 or c into b minus
a is equal to 1 or c is equal to 1 upon b
00:42:21.680 --> 00:42:33.130
minus a. Therefore, a uniform distribution
on an interval a to b will have a constant
00:42:33.130 --> 00:42:42.750
density function equal to 1 upon b minus
a. You may be wondering why I am integrating
00:42:42.750 --> 00:42:49.930
only from a to b, why not from minus infinity
to infinity; that is because from minus infinity
00:42:49.930 --> 00:43:01.559
to a, in this region, f(x) is 0, and also from b to
infinity, in this region, f(x) is 0. Therefore,
00:43:01.559 --> 00:43:08.549
when you integrate over these regions, they
do not contribute to the overall integration.
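The normalization c = 1/(b - a), together with the uniform mean (a + b)/2 and variance (b - a) squared over 12, can be spot-checked with a midpoint-rule integration (a and b below are arbitrary choices):

```python
a, b = 2.0, 7.0
c = 1 / (b - a)  # constant density on (a, b)

# Midpoint rule on a fine grid over [a, b].
n = 100_000
h = (b - a) / n
xs = [a + (i + 0.5) * h for i in range(n)]

total = sum(c * h for _ in xs)                    # integral of c dx = 1
mean = sum(x * c * h for x in xs)                 # (a + b) / 2 = 4.5
var = sum((x - mean) ** 2 * c * h for x in xs)    # (b - a)^2 / 12

print(round(total, 6), round(mean, 6), round(var, 6))
```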
00:43:08.549 --> 00:43:31.540
So, what is the mean of a uniform random variable?
The mean is equal to b plus a by 2; since all
00:43:31.540 --> 00:43:43.390
the points are weighted uniformly, it is the midpoint
of the interval. The variance is equal to
00:43:43.390 --> 00:43:58.680
b minus a whole square upon 12 . And the moment
generating function of x at t is equal to
00:43:58.680 --> 00:44:11.930
e to the power bt minus e to the power at
upon t into b minus a, when t is not equal
00:44:11.930 --> 00:44:24.900
to 0. So, those are the basic properties of the uniform
distribution. The most well known continuous
00:44:24.900 --> 00:44:46.219
random variable is the normal distribution.
When we have mean 0 and variance equal
00:44:46.219 --> 00:45:20.719
to 1, it is called the standard normal distribution;
its pdf at x is equal to 1 over root over 2 pi
00:45:20.719 --> 00:45:30.449
e to the power minus x square by 2, for minus infinity
less than x less than infinity. So, you see
00:45:30.449 --> 00:45:42.920
that this is one continuous random variable
that is defined over the entire real line
00:45:42.920 --> 00:45:52.309
. And if it is centered at 0 with variance
equal to 1, it is called the standard normal
00:45:52.309 --> 00:46:00.349
distribution. Many of you are familiar with
a curve of this type which typically is used
00:46:00.349 --> 00:46:11.190
for standard normal.
The MGF of x at t is equal to e to the power
00:46:11.190 --> 00:46:41.719
t square by 2. However, if a
normal distribution has mean equal to mu and
00:46:41.719 --> 00:46:55.089
variance equal to sigma square, then the pdf is
equal to 1 over root over 2 pi sigma into
00:46:55.089 --> 00:47:03.999
e to the power minus x minus mu whole square
upon 2 sigma square ok. Mu can be any real
00:47:03.999 --> 00:47:09.599
number and sigma square being a variance of
course has to be positive. And this is going
00:47:09.599 --> 00:47:19.719
to give you the pdf . In this case, the MGF
is going to be
00:47:19.719 --> 00:47:36.119
e to the power mu t plus half sigma square
t square. If you put mu is equal to 0, and
00:47:36.119 --> 00:47:43.079
sigma square is equal to 1, you get the moment
generating function for the standard normal
00:47:43.079 --> 00:47:49.160
random variable .
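A numerical check of the N(mu, sigma square) density and its MGF is straightforward (mu and sigma below are arbitrary choices, not from the lecture):

```python
import math

mu, sigma = 1.5, 2.0

def pdf(x):
    # N(mu, sigma^2) density
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

def integrate(g, lo=-30.0, hi=30.0, n=60_000):
    # Trapezoidal rule; the limits are wide enough that the
    # truncated tails are negligible here.
    h = (hi - lo) / n
    s = 0.5 * (g(lo) + g(hi)) + sum(g(lo + i * h) for i in range(1, n))
    return s * h

total = integrate(pdf)  # should be 1

t = 0.3
mgf_numeric = integrate(lambda x: math.exp(t * x) * pdf(x))
mgf_closed = math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)

print(round(total, 6), round(mgf_numeric, 4), round(mgf_closed, 4))
```

Setting mu = 0 and sigma = 1 in the closed form recovers e to the power t square by 2, the standard normal MGF quoted above.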
Our next focus is on exponential distribution
00:47:49.160 --> 00:48:13.659
. This is also used for modeling arrivals,
where lambda is typically the arrival rate
00:48:13.659 --> 00:48:23.930
. So, if x is a random variable which is following
exponential distribution with parameter lambda
00:48:23.930 --> 00:48:33.199
where lambda is greater than 0 then f of x
is equal to lambda e to the power minus lambda
00:48:33.199 --> 00:48:47.029
x for x belonging to 0 to infinity. It can
be proved, or rather I will ask you to prove,
00:48:47.029 --> 00:49:01.319
that expectation of X is equal to 1 upon lambda;
variance of X is equal to 1 upon lambda square.
00:49:01.319 --> 00:49:13.619
And moment generating function of X at t is
equal to lambda upon lambda minus t, of course
00:49:13.619 --> 00:49:21.650
it will be valid if lambda is greater than
t .
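One way to do the suggested verification is by simulation; random.expovariate draws from the density lambda e to the power minus lambda x (lambda = 0.5 is an arbitrary illustration):

```python
import random

random.seed(42)
lam = 0.5

# Draw a large sample from the Exponential(lambda) distribution.
xs = [random.expovariate(lam) for _ in range(200_000)]

mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)

print(round(mean, 3), round(var, 3))  # near 1/lambda = 2 and 1/lambda^2 = 4
```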
00:49:21.650 --> 00:49:29.650
Before I close, I will give you one more distribution
that will be used often in this course that
00:49:29.650 --> 00:49:56.489
is called gamma distribution . It has two
parameters; lambda greater than 0 and alpha
00:49:56.489 --> 00:50:07.969
greater than 0. And f of x is defined as lambda
power alpha upon gamma alpha e to the power
00:50:07.969 --> 00:50:23.140
minus lambda x, times x to the power alpha minus
1, for 0 less than x. So, it is defined for non-negative
00:50:23.140 --> 00:50:33.690
x which is going from 0 to infinity; and this
is going to be the corresponding pdf.
00:50:33.690 --> 00:50:41.349
If we integrate this quantity e to the power
minus lambda x into x to the power alpha minus
00:50:41.349 --> 00:50:58.269
1 in the range 0 to infinity, then we can
write it as 0 to infinity e to the power minus
00:50:58.269 --> 00:51:06.609
lambda x, lambda x to the power alpha minus
1 into 1 upon lambda to the power alpha minus
00:51:06.609 --> 00:51:23.410
1 dx; I have multiplied and divided by lambda to the power alpha minus
1, so it cancels. Now put lambda x is
00:51:23.410 --> 00:51:36.869
equal to z, therefore, dz dx is equal to lambda
therefore, dx is equal to dz upon lambda.
00:51:36.869 --> 00:51:46.049
So, now I can rewrite it: as x goes
from 0 to infinity, z also goes from 0
00:51:46.049 --> 00:51:51.839
to infinity as lambda is positive.
So, it is 0 to infinity e to the power minus
00:51:51.839 --> 00:52:05.900
z, times z to the power alpha minus 1, times 1 upon lambda
to the power alpha minus 1, times dz upon lambda, which is equal
00:52:05.900 --> 00:52:13.640
to 1 upon lambda power alpha integration 0
to infinity e to the power minus z z to the
00:52:13.640 --> 00:52:26.849
power alpha minus 1 dz. If you remember your
mathematics, this is the famous gamma integral,
00:52:26.849 --> 00:52:35.709
and it is actually gamma of alpha.
Therefore, the whole integration boils down
00:52:35.709 --> 00:52:45.609
to gamma alpha upon lambda power alpha. So,
this is what we have obtained when I am integrating
00:52:45.609 --> 00:52:54.019
0 to infinity e to the power minus lambda
x x to the power alpha minus 1 dx . Therefore,
00:52:54.019 --> 00:53:06.209
if we multiply this by lambda power alpha
upon gamma alpha, then we get this quantity
00:53:06.209 --> 00:53:19.150
which integrates to 1. Therefore, lambda power
alpha upon gamma alpha e to the power minus
00:53:19.150 --> 00:53:26.579
lambda x x to the power alpha minus 1 is a
valid pdf.
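The gamma integral used in this derivation can be verified numerically against math.gamma (the values of lambda and alpha are arbitrary illustrations):

```python
import math

lam, alpha = 1.5, 3.0

def integrand(x):
    # e^(-lambda x) * x^(alpha - 1), the kernel of the gamma pdf
    return math.exp(-lam * x) * x ** (alpha - 1)

# Trapezoidal rule on [0, 60]; the integrand is negligible beyond that.
n, hi = 600_000, 60.0
h = hi / n
integral = (0.5 * (integrand(0.0) + integrand(hi))
            + sum(integrand(i * h) for i in range(1, n))) * h

# Claimed closed form: Gamma(alpha) / lambda^alpha
closed_form = math.gamma(alpha) / lam ** alpha

print(round(integral, 6), round(closed_form, 6))
```

Since the integral equals Gamma(alpha)/lambda^alpha, multiplying by lambda^alpha/Gamma(alpha) indeed yields a density that integrates to 1.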
00:53:26.579 --> 00:53:37.920
So, what are the mean and the variance? The
expectation of X is equal to alpha upon lambda
00:53:37.920 --> 00:53:48.789
. Variance of X is equal to alpha upon lambda
square. And the moment generating function of
00:53:48.789 --> 00:54:00.369
X at t is equal to lambda upon lambda minus t
whole to the power alpha, where t is less than
00:54:00.369 --> 00:54:09.099
lambda. So, these are some of the basic
probability distributions or probability mass
00:54:09.099 --> 00:54:20.439
functions that I will be using during the
course. Also I will be using some distributions
00:54:20.439 --> 00:54:29.349
like chi square, t, f. In some of the subsequent
lectures, I will derive those distributions
00:54:29.349 --> 00:54:35.849
and explain what they mean, because I will be using
them in the later part of statistical inference,
00:54:35.849 --> 00:54:43.109
when I do testing of hypotheses.
Thank you for your attention to this course.
00:54:43.109 --> 00:54:43.400
Thank you .