WEBVTT
00:16.640 --> 00:25.340
Now, I given you this homework while discussing
the conjugate directions method in some of
00:26.970 --> 00:31.500
the, in one of the previous lectures. Now
I will solve this part of this homework this
00:31.500 --> 00:38.500
part, this part, but I will keep this for
you to try out. So, today I will begin by
00:38.870 --> 00:50.850
trying to solve. So, what I have to prove
is the conjugate directions and the gradients
00:50.850 --> 00:52.530
are perpendicular to each other.
00:52.530 --> 00:59.530
But, so what I have to really prove is a following
is g k, d i at any k is equal to 0 for all
01:05.470 --> 01:12.470
i for, for all i which is bigger than equal
to 0 or less than or equal to k strictly.
01:14.070 --> 01:21.070
What we do is that we use induction. So, by
induction what we show is a following that
01:27.140 --> 01:34.140
we show that let us assume g k, d i is equal
to 0 for all and to prove g k plus 1 d i is
01:55.610 --> 02:02.610
equal to 0 for, so that is exactly what we
have to do. So, you we have assume this now
02:06.500 --> 02:13.500
let us start working it out; let us observe
this fact. Now this one is very simple since
02:22.260 --> 02:29.260
g k is nothing but, H x k plus b. So, g k
plus 1 would be nothing but H x k plus 1 plus
02:32.220 --> 02:36.269
b, so b would get cancelled out and this is
what you will have.
02:36.269 --> 02:43.269
Now of course, you know that x k plus 1 is
x k plus alpha that is how you update using
02:46.860 --> 02:53.860
the conjugate gradient direction that is how
you update alpha k d k. So, then g k plus
02:56.730 --> 03:03.730
1 minus g k, now I can write this as alpha
k d k, so it will become alpha k H d k. Now
03:06.050 --> 03:13.050
I multiply both sides by take the inner product
both sides by d i right for i. So, then I
03:14.269 --> 03:21.269
will get g k plus 1, d i is equal to g k,
d i plus alpha k, d i H d k. Now I have to
03:32.510 --> 03:39.069
prove this fact. So, basically I have to prove
that for all 0 for all i equal from 0 to i
03:39.069 --> 03:46.069
equal to k this thing holds. So, first put
i is equal to k; imply that
03:58.129 --> 04:05.129
plus alpha k d k H d k. You know what is alpha
k, alpha k is already you have solved out
04:08.080 --> 04:13.500
alpha k. Now what we have to do is now here
I have to put the value of alpha k.
04:13.500 --> 04:20.500
So, alpha k is already known to me and that
is minus g k, d k, d k, H d k. So, if you
04:31.950 --> 04:38.950
do that, so then g k plus 1, d k is nothing
but g k, d k minus orientation, so that can
04:46.599 --> 04:53.599
following it - g k, d k. This and now here
also we have d k H d k which cancels up now
05:05.880 --> 05:10.419
H is positive definite matrix, so it will
this d k is non-zero, so these are all positive.
05:10.419 --> 05:17.419
So, this will cancel up and so we will be
left with g k, d k. Now once you have done,
05:27.969 --> 05:33.169
this is 0. So, you have proved for k, now
you have to prove for anything other than
05:33.169 --> 05:40.169
k. Now take consider i is strictly less than
k then for that g k plus 1, d i is g k, d
05:50.550 --> 05:57.550
i plus alpha k I think alpha k you will come
here d I, H d k. So, this i is strictly less
06:06.460 --> 06:12.300
than k now, so our d i is not i is not equal
to k. So, then by the fact that these are
06:12.300 --> 06:17.649
conjugate directions this would become 0 and
the fact that we have already assumed when
06:17.649 --> 06:22.830
we started the proof then this would also
become 0.
06:22.830 --> 06:29.830
So ultimately, so this proves the fact. Now
what is important to know that all these things
06:45.229 --> 06:48.940
that we have done, all the conjugate direction
or conjugate gradient method have been really
06:48.940 --> 06:55.940
applicable for convex problem that to with
H a convex quadratic problem with H positive
06:57.310 --> 07:03.680
definite is really strongly convex or strictly
convex quadratic problems. What about handling
07:03.680 --> 07:10.680
it for non-quadratic problems or anything
it is a for any sort of convex problems for
07:14.399 --> 07:17.219
example, can we do something with that.
07:17.219 --> 07:22.969
So, for that Fletcher reeves introduced a
method called the Fletcher Reeves method.
07:22.969 --> 07:29.969
We are just going to outline this method here.
Now let us write down the Fletcher Reeves
07:37.320 --> 07:44.320
method step by step. This can work for non-quadratic
problems also. Actually the shift from linear
08:00.159 --> 08:06.589
to non-linear is quite a difficult shift methods
you know they are not once you have non convex
08:06.589 --> 08:11.240
problems you have very less things available
to you. You just make a trial with whatever
08:11.240 --> 08:14.719
algorithms you have in your hand and see what
you have that that is the only thing that
08:14.719 --> 08:21.719
you can possibly do. And recent research there
are efforts to take those points and try to
08:22.279 --> 08:27.529
give some quantitative justification about
their behavior, but that is absolutely at
08:27.529 --> 08:30.959
the frontier of research.
So, we would not get into that issue at all,
08:30.959 --> 08:36.200
but I just want to recall and remind you that
non convex optimization by the way is very
08:36.200 --> 08:40.010
very hard, of course, there could be convex
optimization problems which are also not so
08:40.010 --> 08:45.970
easy to handle, but there are algorithms which
will support them. But for non convex problems,
08:45.970 --> 08:51.960
you just do not know in large cases what to
do. We will learn about sequential quadratic
08:51.960 --> 08:57.810
programming method methodology as we go along,
but you see that that has its own drawbacks
08:57.810 --> 09:04.810
as we come to it.
Now in Fletcher reeves method, so step one
09:07.660 --> 09:14.660
is to initialize your starting point
x 0 and tolerance epsilon. Step two is to
set k is equal to 0; step two, you should
09:39.270 --> 09:46.270
compute g 0 that is the gradient at x 0 and
set the first conjugate direction that is
09:49.040 --> 09:56.040
set. Step three - it was the most idealistic
stuff find alpha greater than zero, which
10:08.970 --> 10:15.970
minimizes
of course this minimization is done in a approximate
way. So, here you have one dimensional minimization,
10:22.620 --> 10:28.060
we have not spoken much about one dimensional
minimization in this course, possibly at the
10:28.060 --> 10:34.960
very end of this study about unconstraint
optimization, we will take up a one day to
10:34.960 --> 10:40.130
explain some very important one dimensional
minimization techniques. Because when you
10:40.130 --> 10:44.780
want to solve this problem where x k and d
k are fixed then this is nothing but a function
10:44.780 --> 10:51.770
in alpha. So, then basically you are talking
about one dimension minimization minimization
10:51.770 --> 10:56.200
of a one of a real function in real variables,
so f from r to r.
10:56.200 --> 11:03.200
Now once we have done that, so that is alpha
k. So, basically find alpha k I should say
11:08.590 --> 11:15.590
which minimizes this set your next iterate
to x k plus 1 and x k plus 1 is x k plus alpha
11:18.030 --> 11:25.030
k d k is that I cannot I really do not know
the solution. But when the distance between
11:30.730 --> 11:37.730
x k plus 1 and x k - these distance that distance
comes becomes very very small that is we are
11:40.820 --> 11:45.360
basically coming near the solution. Like if
you had thought about steepest descent, so
11:45.360 --> 11:52.360
there was this is your this is your level
curves, suppose say what would happen level
12:01.330 --> 12:06.420
curves means these are the function values
say f x y equal to c, because we are in two
12:06.420 --> 12:12.800
dimensional set up you can see it very well.
So, you are here, so you take one direction
12:12.800 --> 12:17.420
and you moved here then you took another direction
and you moved here then you are moving there
12:17.420 --> 12:22.670
then you are moving there and then you are
moving there and then here then here then
12:22.670 --> 12:29.670
here then here. So, as go near the solution,
these distance also decreases.
12:30.620 --> 12:34.960
Let me also give an explanation that so if
you thing is actually taking you to the solution
12:34.960 --> 12:41.960
then this is what should happen that is if
x k is going to the solution x star then this
12:42.590 --> 12:47.300
is what should happen. Because then what I
can do is that I can write x k plus 1 minus
12:47.300 --> 12:54.300
x star this is equal to plus x star minus
x k, and this is nothing but less than x k
12:58.810 --> 13:05.810
plus 1 minus x star plus x star minus x k.
And this all also this whole thing now goes
13:08.029 --> 13:15.029
to 0. So, if I can show that the distance
between these two consecutives one’s becomes
13:15.440 --> 13:20.810
very very small, basically I am trying to
show that it is a essentially in some sense.
13:20.810 --> 13:25.620
So, if I can show that then I know that it
will converge that it will actually go to
13:25.620 --> 13:32.620
towards the solution. So, for sufficient for
k sufficiently large, we can show that this
13:38.880 --> 13:45.880
is true this distance is very very small then
I am almost near the solution then I can stop
13:46.010 --> 13:53.010
there.
So, what I do now x k plus 1 minus x k is
13:54.410 --> 14:01.410
equal to alpha k d k, and norm x k plus 1
minus x k is equal to alpha k d k. So, basically
14:04.670 --> 14:09.589
you really do not have to bother about x k
plus 1 at very first, you just take alpha
14:09.589 --> 14:16.589
k d k check if norm alpha k d k is strictly
less than epsilon - step 4. So, there are
14:25.960 --> 14:32.960
two answers to it yes and no. So, if it is
yes, stop and take x k plus 1 as the solution
14:46.930 --> 14:53.930
approximately, as the Approximate solution
I should write. If it is no, then do the following
15:09.690 --> 15:16.690
there comes step five. If k is equal to n
minus 1, set x 0 is equal to x k plus 1 and
15:37.779 --> 15:44.779
go back to step 2. If k is equal to n minus
1, so we have not reached our solution in
15:57.110 --> 16:04.110
n minus 1 steps then we have to again restart
the procedure. If not else compute g k plus
16:09.580 --> 16:16.580
1 and beta k is equal to g k plus 1 g k plus
1 g k g k that is nice one and then you compute
16:32.550 --> 16:38.899
the next direction d k plus 1 because from
there the x k plus 1 now you have to go to
16:38.899 --> 16:45.120
x k plus 2. So, you are computing the d k
plus 1 is equal to as minus g k plus 1 in
16:45.120 --> 16:51.170
the same way we have done for the power this
hastiness conjugate gradient method plus b
16:51.170 --> 16:55.170
k to d k.
And you see now once you have that what you
16:55.170 --> 17:02.170
can do is set k is equal to k plus 1 and go
back to step three
or rather assign k equal to k plus 1. So,
17:16.579 --> 17:22.920
this is an approach see why this k equal to
n minus 1, because in the quadratic case when
17:22.920 --> 17:29.920
you have positive definite hessian in n minus
1 steps you are basically getting the solution.
17:33.550 --> 17:39.380
So, in n iterations basically n iterations
you are getting the solution, and here I am
17:39.380 --> 17:44.000
in n minus 1 eth step and I am yet to get
the solution. So, I have to reset x naught
17:44.000 --> 17:51.000
as x one and taking up the starting point
and start the whole procedure again. Because
17:53.820 --> 18:00.260
at k is equal to n minus 1, I have not got
the solution this is some sort of slight restriction
18:00.260 --> 18:07.179
that is there, but k equal to n, I am suppose
to get the solution and I started from x 0.
18:07.179 --> 18:14.179
So, I have x 0 then in n steps, so x n minus
1 would be my just a moment…
18:20.960 --> 18:27.020
So, as we have already seen in our previous
studies that in n steps in the case of when
18:27.020 --> 18:34.020
H is positive definite, in n steps we are
coming to the solution. So, this we can come
18:35.270 --> 18:42.270
to the solution in just n steps. We start
with x 0, and x n would be x star. So, here
18:48.160 --> 18:55.160
when k is you can say, why k should not be
n minus 1, but I have started with k equal
18:55.590 --> 19:02.590
to 0 and I have come to k equal to n minus
1 which I have, so I have taken n steps right
19:02.920 --> 19:07.450
0 th step first step second step third step
n eth step. n minus 1 is the n eth step in
19:07.450 --> 19:14.010
this case. So, I have reached the n eth step.
So, even if I have reached the n eth step,
19:14.010 --> 19:21.010
so x n should be my solution. So, here I have
reached the n eth step, but I have not got
19:22.380 --> 19:27.570
the solution that that is the if k is that
that is the whole idea if k is equal to n
19:27.570 --> 19:32.200
minus 1. So, I have already taken n eth k
is from 0 to n minus 1, I have taken n step,
19:32.200 --> 19:39.050
this n eth step and 0 th iteration the first
iteration 0 th iteration first iteration second
19:39.050 --> 19:43.059
iteration k n minus 1.
So, basically I have taken n steps I have
19:43.059 --> 19:50.059
got my come to my n minus 1 th iteration in
next iteration, I am suppose to get the solution.
19:50.230 --> 19:57.230
But at k equal to n minus 1, if I do not get
the solution then I really have to restart
19:59.350 --> 20:04.900
the procedure that because we are handling
non convex problems, we are not sure whether
20:04.900 --> 20:10.100
at nth step we are going to get a solution.
We have non-quadratic problem with we do not
20:10.100 --> 20:14.800
know anything about the of functions nature.
So, then a little bit of extra caution is
20:14.800 --> 20:19.580
kept here by putting k is equal to n minus
1 and then again restarting the whole procedure
20:19.580 --> 20:26.580
from x not equal to x k plus 1. Now we are
going to end our discussion of conjugate directions
20:28.740 --> 20:33.710
and conjugate gradient method which is very
important class and then we will go into in
20:33.710 --> 20:40.710
a very slightly interesting problem called
the least square problem. Now what does this
20:48.549 --> 20:55.549
least square problem means? So, least square
problems is how optimization can help you
20:55.770 --> 21:01.020
in solving, how optimization can help you
in solving equations.
21:01.020 --> 21:08.020
And let us just see what can we do about it
one to I have a m cross n matrix, and I want
21:10.049 --> 21:17.049
to solve this equation. So, A is m cross n
matrix, and x is a n vector, and b is in R
21:32.220 --> 21:39.220
m . So, now, this need not have unique solution,
it depend on the relation between m and n.
21:40.490 --> 21:45.000
So, in general, it can have many solutions,
and it is not see I do not know that is a
21:45.000 --> 21:48.820
there is a because m is not equal to n, and
I have no idea about the inevitability in
21:48.820 --> 21:53.280
this case. And, so I cannot really figure
out what is the solution so easily, it is
21:53.280 --> 21:59.350
not so easily to figure out one solution even.
If b is 0 then x can be 0 then one of the
21:59.350 --> 22:06.350
solution; if b is non 0 I do not know. How
do I try to attempt to solve this sort of
22:06.530 --> 22:13.530
system of equation, because these things come
of very much in applications. See what I can
22:13.740 --> 22:20.740
prefer to do instead of trying to solve it,
I construct this function
22:32.350 --> 22:39.350
is called the residual function. So, if I
take n x, I take any x in R n and put here
22:42.590 --> 22:49.590
in and multiply it with A, then if A x is
not equal to b if it is not the solution then
22:50.179 --> 22:56.950
A x minus b this vector is a non zero vector,
and then the norm of that would be non zero.
22:56.950 --> 23:03.950
So, if x is not a solution, then
23:20.880 --> 23:27.880
so this quantity is called a residual, this
quantity sometimes for the more terminology
23:30.940 --> 23:37.940
loving b person is called a residual. So,
x is not a solution then
this will happen. So, if x is a solution A
23:44.960 --> 23:50.660
x minus b is would be equal to 0, and then
that x would actually be the minimum of this
23:50.660 --> 23:56.600
problem, because this is always greater than
equal to 0. For any x for which A x minus
23:56.600 --> 24:03.600
b is a x is A x is equal to b that x must
be a solution of the minimization of this
24:03.610 --> 24:10.610
problem over R n. So, if x star solves A x
equal to b, if and only if x star solves the
24:36.620 --> 24:43.620
problem minimize r x, x element in R n where
r x is in this case now how do you… So,
25:02.429 --> 25:09.390
if I want to solve this problem, I can actually
solve the this minimization problem, but how
25:09.390 --> 25:16.390
do I prove that. If x star solves A x equal
to b then naturally A x star is equal to b
25:28.130 --> 25:35.130
and r x star is equal to 0, hence r x is greater
because r x is greater than equal to 0 for
25:39.970 --> 25:46.970
all x, so r x star, so x star minimizes r
x. Now if x star minimizes r x, how do I show
26:04.049 --> 26:10.419
that you have this as a solution, if x star
minimizes r x then that x star would solve
26:10.419 --> 26:14.700
A x star equal to b.
26:14.700 --> 26:21.700
So, if x star minimizes suppose x star minimizes
26:44.470 --> 26:51.470
r x then what would happen, I can write that
the gradient
of this problem must be 0. So, I have already
27:00.760 --> 27:07.760
know that x star is minimizing r x. So gradient
of f x star, so that would give me A transpose
27:14.679 --> 27:21.679
A x star minus b is equal to 0 or A transpose
A x star is equal to A transpose b that is
27:44.780 --> 27:51.780
what it gives me. What I have found here is
actually a critical point, so if this happens
27:52.590 --> 27:59.590
then this is what will happen, then can I
say here that A x star is equal to b from
28:06.289 --> 28:13.289
this can I say that A x star is equal to b
which looks quite clear. Now suppose A is
28:17.070 --> 28:24.070
of full rank, A is of full column rank then
A transpose A is actually invertible then
28:30.030 --> 28:37.030
you can write x star if a is of rank n that
is a is of full column rank if a is of rank
28:37.450 --> 28:44.450
n
28:50.480 --> 28:57.480
then x star is equal to
29:05.929 --> 29:12.929
A transpose b.
Now if I can get this, so if x star solves
29:21.620 --> 29:28.620
this problem then grad f x star must be equal
to 0 and x star must have this form. Then
29:29.970 --> 29:36.250
if A is of rank n then x star must have this
form then what I have if a x star then A of
29:36.250 --> 29:43.250
x star is equal to A of A transpose A inverse
A transpose b. See because A is of rank n,
29:53.450 --> 29:57.000
A transpose A becomes a positive definite
matrix that is very important and that is
29:57.000 --> 30:04.000
why it is invertible. So, if A is of rank
n
30:11.500 --> 30:18.500
is a positive definite is a positive definite
matrix. So, we prove this in the homework,
30:20.610 --> 30:27.610
so this is your homework. Now what is A x
star. So, let me see what happens. So, now,
30:47.600 --> 30:54.600
this can be written as A A inverse A transpose
inverse where A b inverse is b inverse A inverse.
30:57.679 --> 31:04.150
So, this is A inverse A transpose inverse
A transpose inverse b, this is identity and
31:04.150 --> 31:09.820
this is identity. So, you have A x star is
equal to b. So, I will now modify what I have
31:09.820 --> 31:12.190
written.
So, this is how you discover things in mathematics
31:12.190 --> 31:17.380
by trying it out. So, I have not written it
purposefully, but what I have now done what
31:17.380 --> 31:24.380
I have written, I have written that if A is
of rank n, let so my result is let a be of
31:29.470 --> 31:36.470
rank n of full column rank. Then x star solves
A x equal to b if and only if x star solves
31:41.419 --> 31:48.419
this problem. Then so you see then now we
have the full result now this is my result.
31:55.510 --> 32:02.510
So, basically, now if I want to solve this
problem, what I will first do is I will first
32:07.030 --> 32:14.030
find a point like this, and then if A is of
rank n, and if I am now trying to minimize
32:19.030 --> 32:26.030
this problem. If I find a critical point of
this problem, any critical the critical point
32:27.539 --> 32:33.870
of x star if A is of rank n critical point
of r x I have written should be r x critical
32:33.870 --> 32:38.630
point of r x is of this form. And we want
to show that this critical point is actually
32:38.630 --> 32:45.630
a minimum this critical point is actually
a minimum that is very very important to show.
32:45.730 --> 32:52.730
So, r of x is norm A x minus b whole square.
So, take any x in R n, now once you know this,
33:22.220 --> 33:29.220
you can write this as x star minus b. Now
this term would be 0, because of this fact
34:13.220 --> 34:20.220
because we know that if x star is a critical
point, so the critical point of this least
34:26.159 --> 34:31.639
square problem if I minimize this if I find
this critical point when a rank of A is n
34:31.639 --> 34:38.639
then that critical point is actually a solution
of this problem. And we have shown that that
34:41.279 --> 34:48.279
critical point is actually solving the original
problem A x star equal to b. So, it is very
34:48.940 --> 34:53.409
nice that I can actually in an exact way I
can find I mean exact analytic expression
34:53.409 --> 34:57.859
for this and that is a beautiful part which
you do not always have in optimization. So,
34:57.859 --> 35:04.859
if x star is a critical point in then what
you have, so the derivative I have again made
35:04.910 --> 35:09.999
a small mistake in my writing, but please
forgive me for this, because this is something
35:09.999 --> 35:13.509
you can figure out because here instead of
A it should be r, because we are trying to
35:13.509 --> 35:19.319
find the critical point of r.
35:19.319 --> 35:26.319
So, then grad r x star is equal to 0 and again
I will leave it as a homework for you to figure
35:30.749 --> 35:37.749
out that the derivative of r x that is it
is your homework to figure out that the gradient
35:37.799 --> 35:44.799
of r x is nothing but A transpose A x minus
b. So, this implies A transpose A x star minus
36:02.589 --> 36:09.589
b is equal to 0. Now if you look at the expression
2 or A x minus A x star. So, let me just forgot
36:14.049 --> 36:21.049
out this two. So A x star minus b and this
is nothing but A into x minus x star while
36:25.529 --> 36:32.529
matrix property and A x star minus b and that
is equal to x minus x star A transpose A x
36:37.009 --> 36:44.009
star minus b and that is you already know
it is 0, so this is 0 . So, now, this is equal
36:44.180 --> 36:51.180
to 0 and this is greater than equal to 0,
so what we have proved that we have proved
36:55.859 --> 37:02.019
that r x is greater than equal to r x star
is this part.
37:02.019 --> 37:09.019
So, this is your r x star. So, hence x star
is the minimum of r x. So, any critical point,
37:18.900 --> 37:25.680
there is one critical point, the critical
point of this function is actually minimum
37:25.680 --> 37:32.680
of this and hence and is also a solution of
the equation A x star A x equal to b provided
37:34.089 --> 37:39.349
A is of rank n. So, this is a very very important
requirement of rank n. So, it is some little
37:39.349 --> 37:43.749
bit of extra things that we have if you put
in some little assumption things flow in a
37:43.749 --> 37:50.749
much more interesting way. So tomorrow we
will start in the next class by talking to
37:51.759 --> 37:56.680
you about the Gauss Newton method and talking
to you about the least square problem in a
37:56.680 --> 38:01.359
more general way. This is just an example
to show you the importance of the least square
38:01.359 --> 38:07.029
problem, how optimization can be used even
to solve a equation this linear equation.
38:07.029 --> 38:13.119
So, and then we will discuss a bit about Gauss
Newton method and after that we will start
38:13.119 --> 38:19.309
talking about the Quasi-Newton method and
then return to theory for a while for and
38:19.309 --> 38:24.999
then get into return to the theory for a while
for the unconstrained case and get into the
38:24.999 --> 38:29.019
study of the celebrated Kuhn Tucker conditions.
Thank you.