WEBVTT
00:20.240 --> 00:27.240
This is a course on intelligent control. Today,
we will have lecture 1 of module 1, neural
00:28.270 --> 00:35.270
networks. What is a neural network? What you
are looking at is the human brain, the structure
00:38.080 --> 00:45.080
of the human brain. It may appear at the very
outset that it is a very homogeneous
00:48.500 --> 00:55.500
substance. But in reality, it is highly
diverse.
00:56.510 --> 01:03.510
Each section of the brain functions differently.
They have different functionalities and the
01:10.760 --> 01:17.740
field of neural networks, the artificial neural
network, although it may not have any connection
01:17.740 --> 01:24.740
with the real neural network in the real sense,
nonetheless, the field of artificial neural
01:30.979 --> 01:37.979
network is inspired by the studies in real
neural network and real neural networks are
01:40.060 --> 01:47.060
sitting in this brain. There are very complex
connections among the neurons of this brain
01:51.899 --> 01:58.899
that you are looking at. Researchers over
time have developed various models of the
02:03.899 --> 02:10.899
neural networks just by studying this human
brain.
02:13.370 --> 02:20.370
Look at the complex structure of the connections
among neurons. We can see, these are all neurons
02:23.100 --> 02:30.100
and each neuron is connected to other neurons
through these internal connections which we
02:32.570 --> 02:39.570
will call dendrites, and the whole structure
is very complex.
02:40.860 --> 02:47.860
What is a neuron? A neuron is the basic processing
unit in the neural network sitting in our brain.
02:53.820 --> 03:00.820
We can see that this is the structure of a neuron
where this is the nucleus, this is the cell
03:02.560 --> 03:09.560
body and these are all dendrites. This is
the axon. The axon is like an output node
03:13.060 --> 03:20.060
of the neural network and these are all inputs.
They carry signals from other neurons. Such
03:23.570 --> 03:30.570
neurons, when they are connected with other
neurons, give rise to the processing of information
03:40.800 --> 03:47.800
in a unique way inside the brain. When one
neuron is connected with another neuron, they
03:49.050 --> 03:52.580
are connected through synaptic junctions.
03:52.580 --> 03:59.580
Again the dynamics of this synaptic junction
is complex. We can see the signal inputs from
04:03.070 --> 04:10.070
the axon of a neuron and, through the synaptic
junction an output is actuated which is carried
04:15.660 --> 04:22.660
over through dendrites to another neuron.
Here, these are the neurotransmitters. We
04:30.039 --> 04:37.039
learned from our experience that these synaptic
junctions are either reinforced or weakened; that is,
04:41.499 --> 04:48.499
they behave in such a way that the output
of a synaptic junction may excite a neuron or
04:54.280 --> 05:01.280
inhibit the neuron. This reinforcement of
the synaptic weight is a concept that has
05:03.370 --> 05:07.460
been taken to artificial neural model.
05:07.460 --> 05:14.460
The brain that we showed consists of 10 billion
nerve cells or neurons. Each neuron is connected
05:18.439 --> 05:25.439
to other neurons through about 10,000 synapses.
It may not be an exact figure, but an estimate.
05:32.550 --> 05:39.550
While saying so, let us look at some of the
properties of the brain or neural networks
05:41.289 --> 05:48.289
inside the brain. First, brain function tends to degrade
gracefully under partial damage. A part of
05:51.430 --> 05:58.430
the brain, as we saw before, may become
dysfunctional; then another
06:04.020 --> 06:11.020
part of the brain takes over the function
of this. Damage to this part of the brain
06:11.930 --> 06:18.930
will not completely destroy the total functionality
of the brain. The brain will continue to function
06:22.930 --> 06:29.930
and while functioning, the healthy part of
brain tries to take over the functionality
06:31.270 --> 06:38.270
of the damaged part. This is a very important
property of the real neural network or a real
06:38.309 --> 06:45.309
brain. Second, it can learn from experience;
it is one of the key properties of the brain
06:47.189 --> 06:54.189
that through synaptic junction reinforcement,
the brain learns. As I said, healthy portions
07:02.229 --> 07:09.219
of brain learn to take over the functions
previously carried out by the damaged part.
07:09.219 --> 07:14.990
The other property is that it performs massively
parallel computations extremely efficiently.
07:14.990 --> 07:21.990
For example, complex visual perception occurs
within less than 100 milliseconds. Imagine
07:24.159 --> 07:31.159
an artificial machine or a computer trying
to remember or trying to recollect an image
07:36.680 --> 07:43.680
or trying to recognize the image. It takes
a long time, whereas the brain does this job
07:47.539 --> 07:54.539
within an order of 100 milliseconds. This
is something unique to the brain as to how
07:56.900 --> 08:03.900
massively parallel computational activities
are carried out. As I said earlier, these
08:09.330 --> 08:14.460
massively parallel computations are because
of this complex, parallel connection amongst
08:14.460 --> 08:21.460
neurons. The final property is that it supports our
intelligence and self-awareness.
08:22.080 --> 08:29.080
Nobody knows the actual basis of intelligence
and self-awareness. All of you must be very
08:35.419 --> 08:42.419
aware of your own experience. When you
try to solve a problem, at some point of time
08:42.690 --> 08:49.690
you solve the problem and at some other point
of time, you fail to solve it. From where
08:50.040 --> 08:57.040
does the intelligence come to solve the problem?
This is something we have still not answered.
08:58.180 --> 09:05.180
Although scientists believe this brain is
the source of intelligence, we are not
09:07.610 --> 09:14.610
very clear where this intelligence sits. Similarly,
self-awareness: when I solve this problem,
09:16.079 --> 09:23.079
I am very much aware of the problem that
I am going to solve, or when I see an image
09:23.550 --> 09:30.550
I am very clear that I perceive the image.
These are the issues that have not been answered
09:36.769 --> 09:43.769
even in a real neural network. Nobody knows
where intelligence comes from and where self-awareness
09:44.630 --> 09:46.350
comes from.
09:46.350 --> 09:53.350
Now, I have given a comparison between a brain
and computer. As I said, the brain has 10
09:59.980 --> 10:06.980
to the power 10, approximately, nerve cells
or neurons and each neuron is connected to
10:09.610 --> 10:16.610
other neurons through around 10 to the power
4 synaptic weights. In that way, the total
10:18.440 --> 10:24.019
number of synaptic weights sitting in our
brain is approximately 10 to the power of
10:24.019 --> 10:31.019
14. This is an approximate, not an exact, figure;
whereas in our computer, we have transistors
10:31.560 --> 10:38.560
whose order is 10 to the power 8. The element
size of synapses is 10 to the power minus
10:39.850 --> 10:46.850
6 whereas, in a computer, the transistor size is
also 10 to the power minus 6 units. The energy
10:51.860 --> 10:58.860
that is being consumed by our brain is 30
watts, whereas the CPU in a computer also
10:59.310 --> 11:06.310
consumes 30 watts. But the contrast is the
processing speed. The normal frequency of
11:09.790 --> 11:16.790
the neuron, the operational frequency of neurons
is 100 hertz, whereas the operation of the
11:21.550 --> 11:28.550
CPU is of the order of gigahertz; pretty fast.
But as I said, although a traditional computer has such
11:31.550 --> 11:38.550
fast speed,
when it comes to certain processing like pattern
11:41.959 --> 11:47.440
recognition and language understanding, the
brain is very fast.
11:47.440 --> 11:54.440
Although present-day computers have almost
taken over many human operations in terms
11:57.810 --> 12:04.810
of multiplication, arithmetic calculation
and problem solving, there are certain fields,
12:07.269 --> 12:14.269
like pattern recognition, language understanding,
where the brain is very fast. When you compare
12:14.449 --> 12:21.449
in terms of learning, the ability to learn
is very much there in the human brain. We have
12:23.800 --> 12:30.800
very little idea about how we learn. We have
some ideas, but not in completeness, like
12:34.720 --> 12:41.720
when a small baby grows the way she learns
things fast. Have we now developed
12:44.220 --> 12:51.220
a robot, or any artificial machine, that learns
like a little baby as it grows and learns?
12:53.420 --> 13:00.209
In that sense, if you compare, we have not
developed a machine that really learns like
13:00.209 --> 13:07.209
the way we learn. As I said, intelligence
and self-awareness, first of all we do not
13:09.720 --> 13:16.720
understand these subjects as to how they come
about although they are there with us. But
13:19.459 --> 13:22.810
these features are absent in an artificial
machine.
13:22.810 --> 13:29.810
We talked about the brain. The real human
brain is not homogeneous. You know
13:31.410 --> 13:38.120
it has various regions. The cortex, midbrain,
brain stem and cerebellum are the various
13:38.120 --> 13:45.120
parts of the brain. Each part of the brain
has many regions. If we take the cortex, I think
13:45.870 --> 13:51.949
this is a major processing area in the brain.
In the cortex, you have the visual cortex,
13:51.949 --> 13:58.529
the auditory cortex and like that there are
various regions. In each region there are
13:58.529 --> 14:05.529
many areas. For example, one of the regions
of the cortex is the visual cortex. It has
14:06.540 --> 14:13.540
been studied in detail by researchers and
they have found out around 10 to 11 processing
14:16.420 --> 14:23.420
stages in this visual cortex. This is the
most studied region in the human brain, because
14:26.819 --> 14:33.290
visual processing has attracted the attention
of the researchers at large.
14:33.290 --> 14:40.290
They have identified these stages. There are
10 stages and they found that there are connections
14:42.759 --> 14:49.759
called feed forward. For example: stage 1,
stage 2, stage 3 and stage 4 and so on. We
14:50.709 --> 14:56.240
have been able to identify that there are
connections in a feed forward manner; stage
14:56.240 --> 15:02.699
1 to stage 2, stage 2 to stage 3 and so on
and also in a feedback manner; stage 3 to
15:02.699 --> 15:09.699
stage 1, stage 3 to stage 2, like that. What
we gave in this lecture until now is some
15:13.230 --> 15:20.230
idea as to what a real neural network is. As
I said, the brain is very complex.
15:22.800 --> 15:29.800
We have very little idea about the brain,
although we have understood some of the properties
15:30.899 --> 15:37.899
of the brain through studies during the last
few centuries. Based on the studies, researchers
15:46.670 --> 15:53.670
have tried to develop an artificial neural
system. The objective is to create an artificial
15:56.329 --> 16:03.329
machine, and these artificial neural networks
are motivated by certain features that are
16:06.470 --> 16:13.470
observed in the human brain, like, as we said earlier,
parallel distributed information processing.
16:15.499 --> 16:22.499
The brain consists of 10 to the power of 10
nerve cells approximately and they are connected
16:23.980 --> 16:30.980
among each other in a very complex manner.
Each neuron is a processing unit and each
16:32.889 --> 16:39.889
processing unit is connected to other processing
units. Simultaneously, one unit is connected
16:40.389 --> 16:47.389
to about 10 to the power 4 other units.
That gives the connection structure of the
16:53.910 --> 17:00.910
real neural network - a parallel
and distributed structure.
17:01.079 --> 17:08.079
Information processing is carried out inside
the brain in parallel and also in distributed
17:08.319 --> 17:15.319
fashion. It is not centralized like the CPU
that you see in an artificial computer in
17:18.750 --> 17:25.750
which the computation takes place centrally
in the central processing unit, which is different
17:27.329 --> 17:34.329
in its architecture from that of the brain's
neural network. A high degree of connectivity
17:38.490 --> 17:45.490
among the basic units, connections are modifiable
based on experience, learning is a constant
17:46.260 --> 17:53.260
process and usually unsupervised, learning
is based only on local information and performance
17:53.600 --> 18:00.600
degrades gracefully, if some units are removed.
These are some of the properties that we have
18:02.130 --> 18:09.130
seen in real neural network. Now, can we bring
these properties while developing an artificial
18:13.890 --> 18:20.110
neural network model? That is the question.
18:20.110 --> 18:27.110
Now, being biologically inspired, we are talking
about a field called artificial neural network
18:30.390 --> 18:37.390
and we have set some agenda on how to build
a thinking machine or a learning machine which
18:41.679 --> 18:48.679
has the capability to mimic the
abilities of a biological organism like a human
18:55.130 --> 19:02.130
being. Now, let us go to the most basic computational
unit in an artificial neural network. Obviously,
19:09.179 --> 19:10.070
it has to be an artificial neuron.
19:10.070 --> 19:17.070
This artificial neuron has three basic elements:
nodes, weights and activation function. This
19:20.970 --> 19:27.970
is a node and these are input signals. Between
input nodes and output nodes, there are synaptic
19:32.909 --> 19:39.909
weights w1, w2, w3, w4, w5 and w6. There can
be as many weights and these weights are multiplied
19:43.659 --> 19:50.659
with the signal as they reach the output unit,
where the output is simply the sum of the signals
19:59.279 --> 20:06.279
multiplied by the weights, and then this
output goes to an activation function; this
20:07.260 --> 20:08.909
is f.
20:08.909 --> 20:15.909
What we are talking about is the artificial
neuron- a simple basic processing unit. As
20:24.149 --> 20:31.149
I showed you in the slide, we have an output
node. At this output node, signals reach.
20:40.710 --> 20:47.710
Let my signals be x1, x2, ..., xn and then I put
an activation function here and I have y.
20:59.510 --> 21:06.510
As you can see here, I can easily write down;
these are weights w1, w2, ..., wn. The total
21:11.750 --> 21:18.750
signal that is reaching here is the summation
being wixi, i is equal to 1 to n. This is
21:23.279 --> 21:30.279
your total input reaching here and you activate
this total input by a function f. That is
21:35.500 --> 21:42.500
your output. What you are seeing is actually
a nonlinear map from input vector xi to output
21:47.470 --> 21:54.470
y. Here, in this case, we have a single output,
and we could also have multiple outputs if we take more
21:54.870 --> 21:59.210
nodes here. What we are talking about is a
single neuron. A single neuron has single
21:59.210 --> 22:06.210
output but multiple inputs. Inputs are multiple
for a single neuron and the output is unique,
22:06.870 --> 22:13.870
y and this y and the input bear a nonlinear
relationship through this f. Neural networks can
22:22.370 --> 22:29.370
be built using this single neuron. We can
use the single neuron and build neural networks.
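The single-neuron computation just described, y = f(sum of wi xi), can be sketched in a few lines. The particular weights, inputs, and the choice of tanh as the activation f below are illustrative assumptions, not values from the lecture:

```python
import math

def neuron_output(x, w, f=math.tanh):
    """Single artificial neuron: the weighted sum of the inputs is passed
    through an activation function f, i.e. y = f(sum_i w_i * x_i)."""
    total = sum(wi * xi for wi, xi in zip(w, x))
    return f(total)

# Example with three inputs and three weights (made-up values):
x = [0.5, -1.0, 2.0]
w = [0.1, 0.4, 0.2]
y = neuron_output(x, w)   # tanh(0.05 - 0.4 + 0.4) = tanh(0.05)
```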
22:31.350 --> 22:38.350
Today, we will discuss the linear neural networks.
What is a linear neural network? I have a neuron
22:45.830 --> 22:52.830
whose output is simply sigma wi xi, i equal to
1 to n. I have this expression. That is I
22:58.110 --> 23:05.110
do not have a nonlinear activation function.
That is, my y has a linear relationship with the
23:06.880 --> 23:13.880
weights. All of you know, if I write y is equal
to mx plus c in a two-dimensional plane, this
23:15.450 --> 23:22.450
is a linear equation. This is two-dimensional.
Similarly, this is an equation whose input x
23:32.309 --> 23:39.309
is n dimensional and whose output adds one more; this
is the equation of a plane in n plus 1 dimensions.
23:48.429 --> 23:55.429
When I say neural
network, kindly remember, it does not have
24:07.860 --> 24:14.860
connection with real neural network. There
is no connection, whereas when I say neural
24:23.789 --> 24:30.789
network in this class, it always means artificial.
The only connection between the artificial
24:35.890 --> 24:42.890
neural network and real neural network is
that these artificial neural networks are
24:48.620 --> 24:55.620
biologically inspired; that is the only
connection.
24:56.100 --> 25:03.100
Let us see how we develop a linear neural
network. All of you want to study in this
25:10.169 --> 25:17.169
course- intelligent systems and control. It
will be nice, if I take an example of a control
25:17.669 --> 25:24.669
system and then explain to you what a linear
neural network is. All of you know that in a
25:26.460 --> 25:33.460
control system we always talk about models.
Let us talk about a discrete time model. y
25:45.669 --> 25:52.669
k is equal to a1 y k minus 1 plus a2 y k minus
2 plus b1 u k minus 1. If I have such a dynamical
26:05.750 --> 26:12.750
model and I want to fit this dynamical model
to a system about which I have some apriori
26:15.779 --> 26:22.779
knowledge that the system is second order.
This knowledge is a priori
26:36.630 --> 26:43.630
and, using this a priori knowledge, I put forth
a model. But what I do not know is a1, a2
26:45.600 --> 26:52.419
and b1. These parameters I do not know and
I want to identify. What do I do? I observe
26:52.419 --> 26:59.419
the input and output from the system and try
to fit this model using this data and try
27:00.730 --> 27:04.000
to identify a1, a2 and b1.
27:04.000 --> 27:10.710
The objective is how do I solve this problem
using a neural network? Since this model is
27:10.710 --> 27:17.710
linear, I will use a linear neural network.
What I will do in my neural network is I will
27:18.950 --> 27:25.950
have a single neuron, because this is only
a single output. I do not need a large number
27:26.440 --> 27:33.440
of neurons here, only a single neuron and
I have three inputs. As I said before, the
27:36.260 --> 27:43.260
inputs are y k minus 1, y k minus 2 and u k minus
1. These are the inputs; I put them here. u
27:44.190 --> 27:51.190
k minus 1, y k minus 1, y k minus 2; these are
three inputs. I connect these three inputs
27:59.059 --> 28:06.059
through three weights w1, w2 and w3 and I
add them. My y is w1 u k minus 1 plus w2 y
28:24.029 --> 28:31.029
k minus 1 plus w3 y k minus 2, whereas my
actual system is supposed to be a1 y k minus
28:43.700 --> 28:50.700
1 plus a2 y k minus 2 plus b1 u k minus 1.
a1, a2, b1 represent the system parameters.
29:01.600 --> 29:08.600
I would like to know the parameters a1, a2,
b1. What I do is I keep this model and this
29:11.210 --> 29:18.210
model is sitting in my computer and to the
computer, I feed this data u k minus 1 y k
29:18.720 --> 29:25.720
minus 1 y k minus 2 at every instant to the
model that is sitting in my computer and what
29:26.470 --> 29:33.470
is the actual output of the system? y, given
y k minus 1, y k minus 2, u k minus 1. I give
29:36.679 --> 29:43.679
this data to this model and I have to develop
a methodology to update my weights w1, w2,
29:46.049 --> 29:53.049
w3 in such a manner that finally w1 becomes
b1, w2 becomes a1 and w3 becomes a2.
29:57.769 --> 30:04.769
The objective is to find a learning rule which
you can also call a weight update rule such
30:20.059 --> 30:27.059
that w1 converges to b1, w2 converges to a1
and w3 converges to a2. These are actual system
30:42.049 --> 30:49.049
parameters and these are neural network weights.
I hope the problem is very clear to all of
30:57.510 --> 30:57.980
you.
30:57.980 --> 31:04.980
Now, I want to identify the weights w1, w2
and w3. How do I do it?
31:21.490 --> 31:28.490
This learning rule is derived using the popular
gradient descent rule. What is gradient descent
31:29.090 --> 31:36.090
rule? When I try to learn the weights, the
objective is always
to minimize a cost function. That is the objective.
31:53.570 --> 32:00.570
I want to minimize a cost function. If I can
minimize this, now your y is equal to sigma wi xi
32:07.070 --> 32:12.870
and in this case i is equal to 1 to 3, for
this example. But generally, this i is equal
32:12.870 --> 32:19.870
to 1 to n; you have n weights. But in our
case the example that we took for a second
32:20.019 --> 32:27.019
order model, you have only three parameters
to identify; that is, wi. What are you given?
32:29.130 --> 32:36.130
You are given a training set.
32:39.389 --> 32:46.389
What is a training set? Now I have y k minus
1, y k minus 2, u k minus 1 and y k. This is
32:54.250 --> 33:01.250
my input and this is my output y k. I do some
experiments on my system. I give various inputs.
33:12.580 --> 33:19.580
You can see here I can define k equal to 2,
k equal to 3, k equal to 4 and so on. What
33:23.440 --> 33:30.440
am I doing? I generate some random input or
some input I generate here and correspondingly,
33:32.590 --> 33:39.590
I have some initial values of y k minus 1 and
some initial values of y k minus 2. When I
33:39.840 --> 33:46.840
put in those values and give
u k minus 1, my system gives y k and I go on
33:49.009 --> 33:56.009
giving to the system various inputs and I
get the various outputs here. I know also
33:59.539 --> 34:06.360
the previous outputs; y k minus 1 and y k minus
2. Once I know what y k is that my system
34:06.360 --> 34:13.360
is actuating or the system output is, I know
also y k minus 1 and y k minus 2. This data
34:15.600 --> 34:22.600
I am getting from my system. This is called
system data.
34:24.730 --> 34:31.730
This system data also can be obtained from
simulation if you can assign a model to the
34:52.720 --> 34:59.720
system. This also you can do. That is in computer,
what I will do is given y k equal to a1 y
35:06.710 --> 35:13.710
k minus 1 plus a2 y k minus 2 plus b1 u k minus
1, I will generate some random input here,
35:17.940 --> 35:24.940
u k minus 1, say between 0 and 1, and I give this input
to the system. I have some values a1, a2 and
35:30.240 --> 35:37.240
b1, because I have some idea about the system.
I know the system a1, a2 and b1; so, I generate
35:43.280 --> 35:50.280
data by just simulating this system. I just
put the value k equal to 2, 3, 4, 5. Then
35:50.369 --> 35:57.369
I will compute what are y2, y3 and y4.
So, this point is very clear to you.
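The data-generation step just described can be sketched as follows. The parameter values a1, a2, b1 and the number of samples below are illustrative assumptions, not values given in the lecture:

```python
import random

# Hypothetical "true" second-order system, for illustration only:
# y(k) = a1*y(k-1) + a2*y(k-2) + b1*u(k-1)
a1, a2, b1 = 0.5, -0.3, 0.8

random.seed(1)
y = [0.0, 0.0]                              # initial conditions y(0), y(1)
u = [random.random() for _ in range(10)]    # random inputs u(k) in [0, 1]

# Simulate the system for k = 2, 3, 4, ...
for k in range(2, 10):
    y.append(a1 * y[k - 1] + a2 * y[k - 2] + b1 * u[k - 1])

# Each training pattern pairs the inputs (y(k-1), y(k-2), u(k-1)) with y(k)
training_set = [((y[k - 1], y[k - 2], u[k - 1]), y[k]) for k in range(2, 10)]
print(len(training_set))  # 8 patterns
```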
35:58.099 --> 36:05.099
The first part is to generate the training
data. Our model is sigma wi xi. I am generating
36:21.030 --> 36:28.030
data y and x. Once I have this data, I define
a cost function. What is the cost function?
36:38.960 --> 36:45.960
The cost function E is normally given as sigma of yp
d minus yp, whole square, over patterns
37:02.970 --> 37:09.970
p equal to 1 to N. What is the meaning of
that? My actual system gives me the output
37:15.079 --> 37:22.079
y d p and the neural network gives yp. Again,
to remind you, my yp is sigma wi xi
37:31.349 --> 37:38.349
p. This is my p th input pattern, and the p th input
pattern gives me output yp using the neural
37:42.089 --> 37:49.089
network weights wi. These weights do not correspond
to the actual parameters. We want to identify these
37:51.440 --> 37:58.440
weights, and the system provides an output y
p d, the actual system. In our case, the actual
38:03.900 --> 38:10.900
system was: y p d at k is a1 y k minus 1
plus a2 y k minus 2 plus b1 u k minus 1. This
38:28.089 --> 38:35.089
is my actual system and this actual system
gives me a value y p d k and the neural network
38:39.790 --> 38:46.790
provides me yp, corresponding to these values
y k minus 1, y k minus 2, u k minus 1 and a specific
38:47.750 --> 38:54.750
constant k. What is the objective? The objective
is that I give my neural network, with these weights,
39:00.990 --> 39:07.990
the various input patterns. Let us look at
what the neural network output yp is and I
39:09.950 --> 39:16.950
compute this output yp for p equal to 1 to
N and I find what E is. The objective is that E
39:21.630 --> 39:28.329
should be minimum.
39:28.329 --> 39:35.329
This E I am writing as sigma yp d minus yp
whole square. I missed this half earlier.
39:43.700 --> 39:50.010
This is customary because, normally the energy
function is written as half mv square. From
39:50.010 --> 39:57.010
that this half is coming here; so, half yp
d minus yp whole square, p equal to 1 to N patterns.
40:03.700 --> 40:10.490
This is actually a sum over p equal to 1 to
N and this is a constant, because this comes
40:10.490 --> 40:17.490
from the system yp d, system provides this
output and my actual output is a function
40:17.940 --> 40:24.940
of the weights; that is, the weight vector wi. This
I can write as sigma wi xi. In essence, the cost
40:42.420 --> 40:49.420
function
solely depends on neural network weights.
I hope you understand now that the cost function
41:08.319 --> 41:15.319
solely depends on neural network weights.
Why? Because the given input and output everything
41:16.990 --> 41:23.990
is known except the weights of the neural
network. The cost functions because yp d is
41:24.579 --> 41:31.579
known, it is given; xi is known, because it
is given. What else is left unknown here?
41:32.359 --> 41:39.359
Only weights. How do I update these weights
such that my cost function is minimized? What would you
41:43.690 --> 41:50.690
like? You would like, given input xi, that my yp
should be exactly the same as the yp d.
41:52.390 --> 41:59.390
That is my neural network output should exactly
follow the desired output of the system.
42:01.410 --> 42:03.980
That is the objective.
42:03.980 --> 42:10.980
The objective
42:18.760 --> 42:25.760
is this: I have the neural network output,
yp from p equal to 1 to N. This is neural
42:33.760 --> 42:40.760
network output and my system output is yp
d, p equal to 1 to N. This is my system output
42:54.240 --> 43:00.740
and the neural network output should follow
the system output. This implies my E as I
43:00.740 --> 43:07.740
defined earlier, equal to half sigma, p equal to 1
to N, of yp d minus yp whole square. This
43:14.599 --> 43:21.599
function should be minimized. I hope you understand
this. This is the key. You must know before
43:30.240 --> 43:35.540
designing a neural network, what you are going
to do? That is the most important point and
43:35.540 --> 43:42.540
here, what I am saying is this yp the neural
network output, should follow
the system output yp d. When this is
43:51.510 --> 43:58.510
achieved, the cost function that I have defined
is minimized. How do you do it?
43:58.990 --> 44:05.990
Question is how do we minimize E? We follow
gradient descent rule. What is gradient descent
44:20.040 --> 44:27.040
rule? For the gradient descent rule, I plot E with
respect to W. I already told you that E is
44:35.589 --> 44:42.589
solely a function of weight. This cost function
is solely a function of weight and I want
44:44.450 --> 44:51.450
to see, how this E varies with W and what
are we considering now? Remember we are only
44:52.920 --> 44:59.730
considering a linear neural network and for
a linear neural network, there is a normal kind of
44:59.730 --> 45:06.730
curve that you will find for any kind of linear
system, E versus W. This we will also discuss
45:07.270 --> 45:14.270
later in detail, but for now, just understand;
you can also try: take some linear function
45:15.309 --> 45:20.510
and define E in terms of W; do a computer simulation
and plot E versus W.
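The suggested exercise takes only a few lines. Here is a sketch for a one-dimensional weight; the data values below are made up for illustration (generated with a true weight of 2.0), and they show E(w) falling to a single minimum and rising again:

```python
# One-dimensional illustration of E(w) = 1/2 * sum_p (yd_p - w*x_p)^2.
# The data are made-up values generated with a true weight of 2.0.
xs = [0.5, 1.0, 1.5, 2.0]
yds = [1.0, 2.0, 3.0, 4.0]          # yd = 2.0 * x

def cost(w):
    return 0.5 * sum((yd - w * x) ** 2 for x, yd in zip(xs, yds))

# Evaluate E on a grid of w values; the minimum sits at w = 2.0
grid = [w / 10.0 for w in range(0, 41)]     # w from 0.0 to 4.0
costs = [cost(w) for w in grid]
best = grid[costs.index(min(costs))]
print(best)  # 2.0
```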
45:20.510 --> 45:27.510
You will find a kind of convex function having
one global minimum. I hope this is clear to
45:28.940 --> 45:35.940
you. This is a normal curve that I get when
I plot E versus W. W is N dimensional, because
45:41.329 --> 45:48.329
I have N dimensional weight and that means
this whole plot will be N plus 1 dimensional.
45:49.599 --> 45:56.599
But I cannot visualize things in N plus 1 dimensions.
For the sake of clarity, I am showing you
45:58.099 --> 46:05.099
how E is a function of w in two dimensions;
only in two dimensions. Let us see the weight.
46:13.480 --> 46:19.770
What I am doing here is at this point, I am
drawing a slope and at this point I am drawing
46:19.770 --> 46:26.770
a slope. You can see that if I draw a slope
here, this is a positive slope and this is
46:28.700 --> 46:35.700
a negative slope; a positive slope and a negative
slope. The objective is how do I update my
46:45.000 --> 46:52.000
weight so that I can find this weight. This
is the global minimum and this global minimum,
46:53.250 --> 46:58.210
this weight I have to find out. How do I find it?
46:58.210 --> 47:03.160
Imagine that you have a neural network. You
start from blind; you are completely blind.
47:03.160 --> 47:10.160
What are the weights? You may be anywhere
here or you may be anywhere here; either to
47:10.950 --> 47:17.950
the right or left of the actual minimum weight
that you are looking for. The objective is
47:19.140 --> 47:24.980
this: what is the method by which you can come
from any position you are situated in the
47:24.980 --> 47:30.230
beginning to the actual position, the minimum
weight.
47:30.230 --> 47:37.230
The gradient descent rule says if I update
my weight, w new equal to w old minus eta del E by del w, in this
47:44.470 --> 47:51.470
manner if I do it, then I can easily come
to this particular point. How do I do it?
47:51.970 --> 47:58.440
You just look at this learning rule. What
is this learning rule? I am now situated here.
47:58.440 --> 48:05.440
According to this learning rule, del E by
del w is a positive quantity here. Because
48:06.260 --> 48:12.869
it is a positive quantity, whatever is my
old w new w will be less, because you will
48:12.869 --> 48:19.869
subtract. If I am here, my old w is here and
I have negative slope. Because of that, this
48:24.819 --> 48:29.240
weight update algorithm will add something
positive to the old point, because of the
48:29.240 --> 48:35.150
negative slope. So, this is negative. The
negative sign becomes positive.
48:35.150 --> 48:41.250
What is happening is whether you are here,
if you are here, you are moving in this direction;
48:41.250 --> 48:48.250
if you are here, you are moving in this direction.
Using this algorithm, you always tend to come
48:48.809 --> 48:55.809
to a point or towards the direction where
my optimal weights are there.
48:56.809 --> 49:03.809
Now, the simple question is how do I derive
my algorithm? This is my algorithm.
49:10.220 --> 49:17.220
I gave you the principle. Now, I will derive.
Given E equal to half yp d minus yp whole
49:20.160 --> 49:27.160
square p equal to 1 to N. What will I do?
del E by del w. I put i here, because this
49:40.559 --> 49:47.559
is a vector. Del E by del wi I write
like this: del Ep by del yp, times del yp over
50:02.910 --> 50:09.910
del wi and this has to be summed. I have a
specific pattern p, I have a term and I differentiate
50:16.660 --> 50:23.660
that term with respect to wi; that is del
Ep by del yp times del yp by del wi, because yp d
50:25.000 --> 50:32.000
is a constant term and I have N such terms.
Naturally, I will try to compute this term
50:33.369 --> 50:40.369
for p equal to 1, 2, and so on. I can write this one
simply as E1. So, this
50:49.790 --> 50:56.790
term I can simply call Ep, p is
equal to 1 to N. This is my term. E is equal
50:58.790 --> 51:05.790
to the sum of my error squares over N patterns. When I
differentiate the total E with respect to
51:08.230 --> 51:12.970
wi, I get a term like this. Hope you are clear.
51:12.970 --> 51:19.970
If I do that I find out del yp by del wi is
simply xi. That is because yp is equal to
51:33.059 --> 51:40.059
sigma wi xi. Then for del Ep by del yp, the half
is there; then the 2 will come from the square
51:52.420 --> 51:59.420
giving yp d minus yp, and with respect to
yp, you have to multiply by minus. So, finally
52:05.839 --> 52:12.760
you land with a term like this.
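Written out compactly, the derivation just described is (restating the lecture's symbols, with yp d the desired output and xi p the i-th input of pattern p):

```latex
E = \frac{1}{2}\sum_{p=1}^{N}\bigl(y_p^d - y_p\bigr)^2,
\qquad
y_p = \sum_{i=1}^{n} w_i x_{ip}

\frac{\partial E}{\partial w_i}
  = \sum_{p=1}^{N} \frac{\partial E_p}{\partial y_p}\,
                   \frac{\partial y_p}{\partial w_i}
  = \sum_{p=1}^{N} \bigl[-(y_p^d - y_p)\bigr]\, x_{ip}

w_i^{\text{new}}
  = w_i^{\text{old}} - \eta\,\frac{\partial E}{\partial w_i}
  = w_i^{\text{old}} + \eta \sum_{p=1}^{N} (y_p^d - y_p)\, x_{ip}
```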
52:12.760 --> 52:19.760
The final equation is wi new is wi old plus
eta times the following term. You remember this becomes plus because
52:30.900 --> 52:37.900
I have a negative sign; yp d minus yp into
xi, and this pattern index will be there. This is
52:46.359 --> 52:53.359
the p th pattern, summed with sigma over p equal to 1 to
N. This is the total learning algorithm
53:02.569 --> 53:04.020
that I talked about.
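As a sketch, the batch rule just derived can be written in a few lines of Python with NumPy; the names batch_update, X, y_d and eta are my own, not from the lecture:

```python
import numpy as np

def batch_update(w, X, y_d, eta):
    """One batch gradient-descent step for a linear neuron yp = w . xp.

    w   : weight vector, shape (n,)
    X   : all N input patterns stacked as rows, shape (N, n)
    y_d : desired outputs, shape (N,)
    eta : learning rate
    """
    y = X @ w                       # network output for every pattern
    delta = y_d - y                 # error (yp_d - yp) for each pattern
    # wi_new = wi_old + eta * sum_p (yp_d - yp) * xi_p
    return w + eta * (X.T @ delta)
```

Repeating this step moves w towards the minimum of the total cost E.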
53:04.020 --> 53:11.020
Let me summarize what we discussed today.
We had a cost function of this form. This
53:12.690 --> 53:19.690
is the individual cost corresponding to each
pattern. This is the total cost for N patterns
53:25.859 --> 53:31.010
and we use the gradient descent rule. I explained
to you, how using the gradient descent rule,
53:31.010 --> 53:38.010
wherever you may be in the weight space, you
will always go to the global minimum and we
53:38.119 --> 53:45.119
derived the learning rule. That is for batch
update, wi new equal to wi old plus eta into
53:48.140 --> 53:55.140
delta. Delta is yp d minus yp, and this is
the input xi p. This is my general
54:01.520 --> 54:08.520
form. Here is a mistake: there should be a sigma
over p. When I make it an instantaneous update, that
54:11.900 --> 54:18.900
means I simply use this del Ep by del wi,
then I have this simple rule. That is my instantaneous
54:21.319 --> 54:23.089
update.
54:23.089 --> 54:27.760
What is the difference between instantaneous
update and batch update? Instantaneous update
54:27.760 --> 54:32.270
is often much faster, especially when the
training set is redundant, that is, it contains many
54:32.270 --> 54:37.150
similar data points. Instantaneous update
can be used when there is no fixed training
54:37.150 --> 54:40.839
data; new data keeps coming.
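A minimal sketch of the instantaneous (per-pattern) version, which processes one sample at a time as it arrives; the function and variable names are my own:

```python
import numpy as np

def instantaneous_update(w, x, y_d, eta):
    """One instantaneous gradient-descent step using a single pattern.

    Uses only the current sample (x, y_d), so it needs no fixed
    training set and can track a slowly changing system.
    """
    y = np.dot(w, x)                # current network output
    return w + eta * (y_d - y) * x  # wi_new = wi_old + eta*(y_d - y)*xi
```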
54:40.839 --> 54:46.349
Also, instantaneous update is better at tracking
non-stationary environments. Instantaneous
54:46.349 --> 54:52.000
update introduces noise into the update process,
and this noise in the gradient can help it to
54:52.000 --> 54:59.000
escape from local minima. Let us see how to
select this eta. If you take small eta, you
55:00.539 --> 55:07.390
can see, in this surface you slowly go towards
the global minimum. This is called slow convergence.
55:07.390 --> 55:14.390
On the contrary, if you take a very high value
of eta, you diverge. Eta has to be chosen carefully
55:18.440 --> 55:25.440
in such a way that your convergence speed
is optimal as well as you go towards the global
55:25.780 --> 55:28.079
minimum; that is the objective.
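The effect of eta can be seen on the simplest one-dimensional cost, E(w) = ½(w − w*)², for which the gradient step is w ← w − eta·(w − w*); this toy illustration is my own, not from the lecture:

```python
def descend(w, w_star, eta, steps):
    """Run gradient descent on E(w) = 0.5*(w - w_star)**2."""
    for _ in range(steps):
        w = w - eta * (w - w_star)  # dE/dw = (w - w_star)
    return w

# Small eta: slow but steady progress towards the minimum w* = 0.8.
slow = descend(0.0, 0.8, 0.01, 100)
# Moderate eta: fast convergence.
fast = descend(0.0, 0.8, 0.5, 100)
# Too-large eta (here, greater than 2): the iterates diverge.
diverged = descend(0.0, 0.8, 2.5, 100)
```

Each step multiplies the distance to the minimum by (1 − eta), so convergence requires that factor to have magnitude less than one.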
55:28.079 --> 55:35.079
Now, we will end this lecture by taking an
example. We will take a first order system.
55:36.440 --> 55:43.440
y k equal to 0.8 y k minus 1 plus 0.2 u k minus 1.
This is my actual system. I generate data
55:44.089 --> 55:49.359
using this system. You can do a simulation
experiment using this data from here. Then
55:49.359 --> 55:54.910
I have this neural network where W1 and W2
are unknown. In the beginning, I take the
55:54.910 --> 56:01.910
initial values to be random, very small values
between 0 and 0.1, and then, if you use the
56:02.829 --> 56:09.289
gradient descent rule, you finally reach W1
equal to 0.8 and W2 equal to 0.2 and you can
56:09.289 --> 56:16.289
see that W1 equal to 0.8 matches here, and W2 equal
to 0.2 matches here. That is, you have exactly
56:18.539 --> 56:25.539
identified the system. Now, the responses of your
actual system and the neural network are the same.
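The whole experiment can be reproduced in a short script; this is my own sketch of the simulation described, assuming a random input u(k) uniform in [-1, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the actual first-order system y(k) = 0.8*y(k-1) + 0.2*u(k-1)
N = 200
u = rng.uniform(-1.0, 1.0, N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = 0.8 * y[k - 1] + 0.2 * u[k - 1]

# Linear neuron yhat(k) = W1*y(k-1) + W2*u(k-1); start with small random
# weights between 0 and 0.1, then train with instantaneous gradient descent.
w = rng.uniform(0.0, 0.1, 2)
eta = 0.5
for epoch in range(100):
    for k in range(1, N):
        x = np.array([y[k - 1], u[k - 1]])  # inputs to the neuron
        e = y[k] - w @ x                    # instantaneous error
        w = w + eta * e * x                 # weight update

# w should now be close to the true parameters [0.8, 0.2]
```

Because the data are noise-free and the model structure matches the plant, the weights converge to the true coefficients, exactly identifying the system.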
56:28.940 --> 56:35.940
Let us see: if you plot E versus W, you will
see you have a bowl-shaped function here
56:37.240 --> 56:44.240
and the minimum is exactly at 0.8, when
I plot E versus the weight which is the coefficient
56:46.799 --> 56:53.799
of y k minus 1. What did we do in this class
today? We explained how the artificial neural
56:57.910 --> 57:04.910
network models are developed, inspired
by biology. Today, we discussed
57:06.359 --> 57:13.160
linear neural network. We discussed linear
neural network in the context of system identification,
57:13.160 --> 57:19.680
because this is a course on intelligent control
and I hope you are familiar with system identification
57:19.680 --> 57:20.910
problem.
57:20.910 --> 57:27.789
I will give you two assignments in this lecture.
Problem one: consider two one-dimensional
57:27.789 --> 57:33.240
classes that have a common variance equal
to 1. Their mean values are as follows: mu1
57:33.240 --> 57:40.210
equal to minus 10 and mu2 equal to plus 10.
The two classes are linearly separable. Design
57:40.210 --> 57:45.020
a classifier that separates these two classes.
This is the first problem.
57:45.020 --> 57:51.309
The second problem is the extension of the
example that we discussed today. A second
57:51.309 --> 57:56.700
order discrete time system is given by the
following equation. You simulate this equation;
57:56.700 --> 58:03.380
generate data, input data and output data.
Then identify the following neural network
58:03.380 --> 58:10.380
model where W1, W2, W3 have to be identified
and please solve this problem. Till then,
58:35.490 --> 58:42.490
good-bye!