WEBVTT
00:19.770 --> 00:26.770
Today, we will discuss a different type of
neural network. We have discussed feed-forward
00:28.240 --> 00:34.910
networks, the back-propagation network, which is
normally a multilayer network, and also the radial basis
00:34.910 --> 00:40.040
function network. Similarly, another kind
of network we discussed is the recurrent network,
00:40.040 --> 00:46.730
but today's network will be different. We
will talk about the self-organizing map and specifically,
00:46.730 --> 00:52.829
we will only focus on the Kohonen feature map,
the Kohonen self-organizing map.
00:52.829 --> 00:59.829
Today, we will be discussing
the self-organizing map.
We will be considering only the
01:07.970 --> 01:14.970
Kohonen network. We will discuss what a Kohonen
SOM learning algorithm is. We will simulate
01:20.280 --> 01:27.280
some examples, we will also show some actual
simulation, and we will talk about clustering
01:30.069 --> 01:35.850
in visual-motor coordination, and then give a summary.
01:35.850 --> 01:42.850
Self-organizing map is motivated by certain
features in the brain, as usual. In this case,
01:49.410 --> 01:56.280
the neurons are organized in one or multi-dimensional
lattices – 1-D lattice, 2-D lattice, and
01:56.280 --> 02:02.530
3-D lattice. We can think of even higher-dimensional
lattices also, because in engineering applications,
02:02.530 --> 02:08.830
we can become a little more abstract. The
neurons compete among themselves to be activated
02:08.830 --> 02:14.450
according to a competitive learning scheme.
There are many neurons in a lattice. They
02:14.450 --> 02:21.450
are excited by the same input vector or input
feature or input, whatever the data is. All
02:24.610 --> 02:31.610
the neurons in the space, in the lattice,
are excited simultaneously and there is
02:31.640 --> 02:38.640
a competition. Only the weight vector associated
with the winning neuron is updated, and
02:41.630 --> 02:44.300
the scheme is called ‘winner takes all’.
02:44.300 --> 02:51.300
For example, if I have a two-dimensional lattice,
imagine I have many neurons placed in this.
02:54.910 --> 03:01.910
When an input excites this lattice, one of
these neurons becomes the winner. Then, in
03:11.550 --> 03:18.550
the ‘winner takes all’ scheme, only the
weight associated with this neuron is updated,
03:22.260 --> 03:29.260
whereas another scheme that is being employed
is the soft-max rule, where not only the winning
03:29.960 --> 03:36.960
neuron but also the nearest neighboring neurons
get updated. The weights associated with
03:41.480 --> 03:48.480
those neurons also take part in the decision-making
process. That is the principle.
03:52.140 --> 03:59.140
This is in general, in a self-organizing map,
but Kohonen proposed a novel neighborhood
04:00.350 --> 04:07.350
concept, where the topology of the input data
space can be learnt through SOM. For example,
04:08.100 --> 04:15.100
the kind of geometry the data is coming from
can be understood, or captured, through
04:19.470 --> 04:26.470
SOM, self-organizing map. In this scheme,
a neural lattice can be one or multi-dimensional
04:30.199 --> 04:37.199
as usual and a neighborhood concept among
individual neurons in a lattice is a priori
04:40.550 --> 04:47.090
embedded. As neurons update their weights
upon competition, a meaningful coordinate
04:47.090 --> 04:53.249
system for different input features over the
lattice is developed. We will soon see how
04:53.249 --> 04:56.999
this happens in a Kohonen network.
04:56.999 --> 05:03.999
This is a two-dimensional Kohonen lattice,
where you are seeing the neurons all spread
05:07.259 --> 05:14.259
over. This is your input vector x, which is
an N-dimensional vector. This excites all
05:17.770 --> 05:24.770
the neurons. As the data comes, all the neurons
are excited. As usual, each neuron is associated
05:27.159 --> 05:34.159
with a weight vector wi, which is also N-dimensional.
A specific neuron wins based on a distance
05:44.599 --> 05:51.599
measure, which is some function of the distance
between x and the weight vector associated
05:57.009 --> 06:04.009
with that specific neuron. Based
on that, a winner is declared. Normally, when
06:07.249 --> 06:14.249
this distance measure is minimum for a specific
neuron, that neuron is declared the
06:15.830 --> 06:17.639
winner.
06:17.639 --> 06:24.639
Once we understand the principle, now we will
talk about the algorithm. Obviously, as you
06:26.279 --> 06:33.279
saw, we can have a one-dimensional neural
lattice or two-dimensional or three-dimensional
06:34.270 --> 06:39.139
and all the neurons are associated with a
weight vector, which has the same dimension
06:39.139 --> 06:46.139
as that of the input vector. Before we start
the learning process, these weights are all
06:47.059 --> 06:54.059
initialized with some random weight vector.
This random weight vector has nothing to do
06:54.180 --> 07:01.180
with the actual input data; it can be anything
from anywhere.
07:03.770 --> 07:10.770
Three things are involved in the SOM algorithm,
self-organizing map learning algorithm. The
07:10.990 --> 07:17.990
first step is competition, that is, we find
the winner and then cooperation, that is,
07:25.990 --> 07:32.990
the winner selects its neighborhood, and finally,
weight update.
07:38.699 --> 07:45.699
In competition, we take some kind of distance
function, a function of a distance measure
07:48.150 --> 07:55.150
and for all neurons, we compute this function.
The neuron for which this function is minimum
07:57.389 --> 08:04.389
is declared as the winner. Let m be the dimension
of the input (data) space and weight vector,
08:07.499 --> 08:14.499
that is, the weight vector and input data
vector have m dimensions. Let a randomly chosen
08:17.259 --> 08:24.259
input pattern be x. We select this input pattern
randomly. It is given by this vector x, which
08:27.469 --> 08:30.509
is m-dimensional as you can easily see.
08:30.509 --> 08:37.509
The synaptic weight vector for neuron j, which
is wj, is wj1, wj2 up to wjm. The best match
08:44.300 --> 08:50.790
of the input vector x with a synaptic weight
vector wj occurs where the Euclidean distance
08:50.790 --> 08:56.520
is minimum. This is just one measure, and it is not the
only option. There are many SOM algorithms that use
08:56.520 --> 09:00.400
different distance measures, but today we will
use the Euclidean distance, because we are all aware
of it.
We compute the Euclidean distance between the
09:05.970 --> 09:12.970
input vector and the weight vector, and the
neuron for which this Euclidean
09:15.600 --> 09:22.600
distance is minimum is declared
the winner. Let i be the index to identify
09:23.490 --> 09:30.490
the neuron that best matches x, that is, i(x)
is the winning neuron index.
09:31.000 --> 09:38.000
Obviously, i(x) is the argmin over
j of the Euclidean distance
09:41.220 --> 09:48.220
||x - wj||: this distance is minimum
for a specific j, and that j is actually i(x),
10:00.780 --> 10:05.520
the index of the winner.
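The competition step just described can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's actual simulation code; the array shapes and example values are my assumptions.

```python
import numpy as np

def find_winner(x, W):
    """Return the index i(x) of the neuron whose weight vector is
    closest to the input x in Euclidean distance.
    W has one row per neuron; each row is a weight vector wj."""
    distances = np.linalg.norm(W - x, axis=1)  # ||x - wj|| for every neuron j
    return int(np.argmin(distances))           # winner = smallest distance

# Example: 4 neurons with 2-D weight vectors, excited by one input vector.
W = np.array([[0.1, 0.1], [0.9, 0.9], [0.5, 0.4], [0.0, 1.0]])
x = np.array([0.45, 0.5])
print(find_winner(x, W))  # neuron 2 has the smallest distance, so it wins
```

All neurons see the same input simultaneously; only the index of the minimum-distance neuron is returned as the winner.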
10:05.520 --> 10:11.500
The winning neuron selects its neighbors according
to a pre-defined neighborhood function. This
10:11.500 --> 10:16.770
is called cooperation, that is, as I told
in the beginning, the ‘winner takes all’
10:16.770 --> 10:23.000
scheme, where only the winner takes the final
decision, whereas in soft-max rule, not only
10:23.000 --> 10:30.000
does the winner decide; the winner is cooperative,
that is, it allows its neighbors also to participate
10:30.280 --> 10:37.280
in decision making. The winning neuron selects
its neighbor according to a pre-defined neighborhood
10:37.460 --> 10:44.460
function. Let hj, i denote the topological
neighborhood centered on the winning neuron
10:45.020 --> 10:47.070
i.
10:47.070 --> 10:54.070
This is my winning neuron i, this is my typical
neuron j, and the neighborhood value between them is hj,
10:58.550 --> 11:05.550
i, for any neuron j around this winning neuron.
di, j denotes the lateral distance
11:06.180 --> 11:13.140
between the winning neuron i and the excited
neuron j. So there are two quantities: hj, i
11:13.140 --> 11:20.140
denotes the topological neighborhood centered
11:29.180 --> 11:34.690
on the winning neuron i, and di, j denotes the
lateral distance between the winning neuron
11:34.690 --> 11:38.410
i and the excited neuron j.
11:38.410 --> 11:45.410
That is, for example, I have a lattice here,
this is my winning neuron i and this is my
11:47.010 --> 11:54.010
j th neuron. They have a lattice distance.
Within the lattice, the distance is di, j,
11:59.950 --> 12:06.950
whereas the topological neighborhood, that
is, how close this neuron is with this neuron
12:11.630 --> 12:18.630
is defined by a function hj, i. Obviously,
hj, i is a function of di, j or dj, i (whatever
12:23.280 --> 12:30.120
you say is all right). Hopefully, you are
very clear about the lateral distance, that
12:30.120 --> 12:37.120
is, the absolute distance between any two
neurons i and j, whereas hj, i defines the
12:38.410 --> 12:45.410
notion of neighborhood, that is, how close
is this neuron to this neuron compared to
12:45.800 --> 12:52.800
other neurons. This is the kind of a function.
So this is a function of di, j.
12:57.110 --> 13:02.000
The topological neighborhood hj, i is symmetric
about the maximum point, defined by dij = 0,
13:02.000 --> 13:09.000
that is, when distance is 0, lateral distance,
that is, when the neuron for which we are
13:11.230 --> 13:16.250
updating the weights is the same as the winning
neuron, obviously, the distance between winning
13:16.250 --> 13:23.250
neuron and itself is 0. For this, hj, i
has to be maximum, and we can say that is 1,
13:25.160 --> 13:32.160
and as the lateral distance increases, the
function hj, i tends to 0.
13:36.500 --> 13:43.500
In other words, it attains the maximum value
at the winning neuron i, for which the distance
13:45.400 --> 13:51.380
dj, i is 0. The amplitude of the topological
neighborhood hj, i decreases monotonically
13:51.380 --> 13:58.380
with increasing lateral distance dj, i, decaying
to 0 as dj, i tends to infinity, which is
13:58.680 --> 14:05.680
a necessary condition for convergence. I hope
you understood that if di, j = 0, then hj,
14:11.540 --> 14:18.540
i is maximum and as di, j tends to infinity,
hj, i becomes 0.
14:27.800 --> 14:32.750
A typical neighborhood function hj, i is a
Gaussian function. There can be other functions
14:32.750 --> 14:39.640
also, but one of the very typical ones is
a Gaussian function. When dj, i = 0, its maximum
14:39.640 --> 14:46.640
value is 1 and as it goes to this direction
and the other direction, its value finally
14:49.170 --> 14:56.170
decreases to 0. Normally, this function hj,
i is exponential of minus dj, i squared by 2 sigma
15:01.300 --> 15:08.300
squared of n, where sigma of n is defined as shown,
decaying with the learning step n.
15:08.570 --> 15:15.570
What is the meaning of this? If you can see
this, when N is the specific learning step….
15:16.230 --> 15:23.230
In the beginning, N = 1. So you can see when
N = 1, this value becomes almost like 1. So,
15:23.690 --> 15:29.400
in the beginning, sigma N is sigma 0, the
initial value. But, when N increases and goes
15:29.400 --> 15:36.400
to higher values like 10,000 or 50,000 iterations,
then this value will become almost 0. So,
15:38.740 --> 15:40.820
sigma N goes to 0.
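The Gaussian neighborhood and its shrinking width can be sketched as follows. This is a minimal illustration; the initial width sigma0 and the time constant tau are assumed values, not numbers given in the lecture.

```python
import numpy as np

def neighborhood(d, n, sigma0=10.0, tau=1000.0):
    """Gaussian neighborhood h = exp(-d^2 / (2 sigma(n)^2)),
    where the width sigma(n) = sigma0 * exp(-n / tau)
    shrinks as the learning step n grows."""
    sigma = sigma0 * np.exp(-n / tau)
    return np.exp(-d**2 / (2.0 * sigma**2))

# At the winner itself (d = 0) the value is always the maximum, 1.
print(neighborhood(0.0, n=1))        # 1.0
# Early in learning, a neuron 5 units away is still a strong neighbor...
print(neighborhood(5.0, n=1))
# ...but after many iterations the neighborhood has shrunk to nothing.
print(neighborhood(5.0, n=10000))
```

This captures the behavior described above: in the beginning almost the whole lattice is the winner's neighborhood, and as n reaches thousands of iterations, sigma(n) goes to 0 and the neighborhood collapses onto the winner.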
15:40.820 --> 15:47.280
What you are actually trying to do,
using this kind of function, is this:
15:47.280 --> 15:54.280
in the beginning, the neural lattice
16:01.420 --> 16:08.420
knows nothing about the input data. In essence,
almost all the neurons in the lattice are
16:10.250 --> 16:17.250
considered as a neighbor for the winning neuron
but as learning progresses, the neighborhood
16:19.680 --> 16:26.680
decreases. That is, as learning progresses, the
16:36.680 --> 16:43.680
neighborhood shrinks – that is the idea
of this neighborhood function.
16:48.860 --> 16:55.860
Now, I will give you some example of what
is a lateral distance. For example, this is
16:56.230 --> 17:03.230
a 1-D neural lattice. In this lattice, this
black is the winning neuron and I want to
17:10.880 --> 17:17.689
compute the lateral distance between this and
the red one at position 2. Obviously, this is the
17:17.689 --> 17:24.059
absolute magnitude between 3 and 2, which
is 1. Similarly, the distance between 5 and
17:24.059 --> 17:31.059
3, we can say this distance measure, the lateral
distance, is 5 minus 3 and the absolute value
17:33.740 --> 17:35.020
is 2. This is an example of 1-D lattice.
17:35.020 --> 17:40.559
Similarly, we can see the example of 2-D lattice
also. In a 2-D lattice, we talk in terms of
17:40.559 --> 17:47.559
position vector. The example is like this.
This is a 2-D lattice. You can easily see
17:47.840 --> 17:54.840
this is the winning neuron and this is the
neuron for which we want to compute the lateral
17:55.370 --> 18:02.370
distance. Obviously, the position of this
neuron is 2, 3 and this is 4, 2. Obviously,
18:05.290 --> 18:11.020
the index of the winning neuron is 2, 3 and
the index of the neighborhood neuron, the
18:11.020 --> 18:18.020
red one, is 4, 2 (this is the red-colored
neuron). The distance between these two is
18:19.260 --> 18:26.260
4 minus 2 whole square and 2 minus 3 whole
square. This 4 minus 2 whole square is.…
18:33.000 --> 18:40.000
Actually, this should be distance like this.
So 4 minus 2 whole square, that is, 2 square
18:40.860 --> 18:47.860
is 4 and 2 minus 3 whole square is 1. So,
root over 5 is the lateral distance.
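The two lateral-distance examples above can be checked with a short sketch; positions are lattice indices, as in the examples.

```python
import numpy as np

def lateral_distance(p, q):
    """Euclidean lateral distance between two lattice positions."""
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(q, float)))

# 1-D lattice: winner at index 3, red neuron at index 2 -> |3 - 2| = 1
print(lateral_distance([3], [2]))        # 1.0
# ...and between indices 5 and 3 -> |5 - 3| = 2
print(lateral_distance([5], [3]))        # 2.0
# 2-D lattice: winner at (2, 3), red neuron at (4, 2)
# -> sqrt((4 - 2)^2 + (2 - 3)^2) = sqrt(5)
print(lateral_distance([2, 3], [4, 2]))
```

In 2-D we talk in terms of position vectors, so the same function covers both cases.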
18:52.380 --> 18:59.380
Once we understood this preliminary concept,
then.… Kohonen proposed this algorithm.
19:00.930 --> 19:07.930
This is the Kohonen algorithm. What you are
seeing is that this is the weight for a specific
19:20.080 --> 19:27.080
neuron that is being updated at any instant.
This is the learning rate. This is the neighborhood
19:29.990 --> 19:36.990
function of the j th neuron corresponding
to the winning neuron, which is i(x), and this
19:40.850 --> 19:47.850
is the input vector, input data x, and this
is the weight associated with the j th neuron.
19:49.730 --> 19:56.730
You can easily see if this is 1 for the winning
neuron, then what is actually happening is
19:57.480 --> 20:04.480
that w is actually closing towards x, that
is, that specific weight is moving towards
20:09.000 --> 20:16.000
the specific input vector. This is the
standard rule. The weights associated with the
20:19.020 --> 20:23.700
winning neuron and its neighbors are updated
as per the neighborhood function. The winning neuron
20:23.700 --> 20:28.090
is allowed to be maximally benefited from
this weight update, while the neuron that
20:28.090 --> 20:35.090
is farthest from the winner is minimally benefited.
That is the idea of this neighborhood function.
20:36.130 --> 20:43.130
We also make sure that the learning rate in
the beginning is at its maximum value and as
20:46.420 --> 20:53.420
the learning progresses, its rate becomes
very small.
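Putting the three steps together, the update rule narrated above is commonly written as wj(new) = wj + eta(n) * h(j, i(x); n) * (x - wj). A minimal single-step sketch follows; the decay schedules for eta and sigma are my assumptions, not values from the lecture.

```python
import numpy as np

def som_step(x, W, positions, n, eta0=0.5, sigma0=3.0, tau=1000.0):
    """One Kohonen SOM learning step, updating W in place.
    positions[j] is the lattice position of neuron j."""
    i = int(np.argmin(np.linalg.norm(W - x, axis=1)))     # competition: winner
    d = np.linalg.norm(positions - positions[i], axis=1)  # lateral distances
    sigma = sigma0 * np.exp(-n / tau)                     # shrinking width
    h = np.exp(-d**2 / (2.0 * sigma**2))                  # cooperation
    eta = eta0 * np.exp(-n / tau)                         # decaying learning rate
    W += eta * h[:, None] * (x - W)                       # weight update
    return i
```

Every weight vector moves toward x, but the winner (h = 1) benefits maximally, while a neuron far from the winner is minimally benefited, exactly as described above.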
21:01.190 --> 21:07.250
We already talked about this. This equation
that we talked about is our normal Kohonen
21:07.250 --> 21:12.929
learning algorithm. This equation will move
the weight of winning neuron wi towards the
21:12.929 --> 21:18.440
input vector x – this is very important.
This equation will move the weight of the
21:18.440 --> 21:25.059
winning neuron wi toward the input vector
x maximally, whereas its neighbors will
21:25.059 --> 21:30.640
go towards x according to the distance from
the winning neuron.
21:30.640 --> 21:37.640
The objective is that.… Normally, the input
data, input space is infinite. So the objective
21:39.270 --> 21:46.270
of SOM or self-organizing map is how can I
represent a voluminous data set or an infinite number
21:46.780 --> 21:53.780
of data points using a finite number of samples? This
is the basic idea behind this competitive
21:55.179 --> 22:02.179
network and in classical terminology, this
specific thing is known as how to form clusters.
22:07.429 --> 22:13.720
We have already learned clusters are used
in pattern recognition, in image coding, in
22:13.720 --> 22:20.720
image processing and we used clusters in many
other things, specifically, pattern recognition.
22:29.090 --> 22:34.990
What we understood until now is Kohonen lattice.
That lattice can be of any dimension – 1-D,
22:34.990 --> 22:41.990
2-D, 3-D and there is a competition once they
are excited – one is winning and others
22:42.740 --> 22:48.730
are its neighbors. In the beginning, the neighborhood
is very large. As learning progresses, the
22:48.730 --> 22:55.730
neighborhood shrinks, that is the idea and
finally, the objective is that Kohonen shows
22:59.320 --> 23:05.770
that such a scheme preserves the topology
of the input space. We will see that now.
23:05.770 --> 23:12.770
Here, what you are seeing is that a 1-D SOM
learns 2-D topology, that is, I have a 1-D
23:13.190 --> 23:20.190
neural lattice.
For example, if my neurons are placed in a
23:22.960 --> 23:29.960
line, this is 1-D. Now, I will excite this
1-D lattice with data from a 2-D structure.
23:37.290 --> 23:44.290
That 2-D structure can be a square, can be
L shaped or can be anything, even a triangle.
23:44.290 --> 23:51.290
Now, can I say that this 1-D lattice can actually
preserve the topology? After I form the clusters
23:54.990 --> 24:01.990
using this 1-D lattice from the data that
is coming from a 2-D topology or 2-D geometry,
24:02.799 --> 24:09.799
then looking at these weight vectors of 1-D
lattice, can I conclude something about the
24:15.429 --> 24:22.429
structure of the 2-D input space? This is
what the question is.
24:26.170 --> 24:33.170
My input space is a square. I take a perfect
square. This is a perfect square of 1 meter
24:34.960 --> 24:41.960
and 1 meter. My data is coming randomly from
this square; I do not know the order. I excite
24:45.620 --> 24:52.620
this 1-D lattice with this
data. Now, looking at the weight vectors
24:54.250 --> 25:00.450
of this 1-D lattice, can I conclude that my
data is coming from a 2-D region, which is
25:00.450 --> 25:03.429
a square structure? This is the question.
25:03.429 --> 25:10.010
Now, we will do that SOM simulation to let you
know what actually a Kohonen lattice is, a
25:10.010 --> 25:16.690
Kohonen SOM. A 1-D Kohonen lattice with 100
neurons is selected. I selected a 1-D lattice
25:16.690 --> 25:23.690
having 100 neurons. Each data point is two-dimensional
because any data here has x coordinate and
25:25.340 --> 25:32.340
y coordinate and obviously, the weight associated
with each neuron of this lattice will also
25:33.570 --> 25:40.570
be two-dimensional. So, x is two-dimensional
and wj is also two-dimensional.
25:52.710 --> 25:59.710
Training is done for 6,000 iterations. Now
for this, the dimension is m = 2, x is a two-dimensional
26:01.799 --> 26:08.620
vector and each weight vector is two-dimensional.
Now, the weights are all randomly initialized
26:08.620 --> 26:15.620
for the lattice. The first element of all
weights lies between −0.4 and 0.4, and
26:19.190 --> 26:26.190
so also, the second one is −0.4 to 0.4 and
randomly distributed. The input x is uniformly
26:29.360 --> 26:36.360
distributed in the region 0 to 1 and 0 to
1, that is, my input space is actually the unit
26:43.220 --> 26:47.290
square [0, 1] × [0, 1].
26:47.290 --> 26:54.290
This is my input space and my data is coming
from this space, whereas my weights are initialized.
26:55.320 --> 27:02.320
This is −0.4 to +0.4, and likewise this is −0.4
to 0.4. My weights
27:24.500 --> 27:31.500
are all confined to this zone, the weights
of the 1-D lattice, whereas my data is actually
27:31.940 --> 27:36.850
coming from here. Now, we will see what is
happening.
27:36.850 --> 27:43.850
This is my input space, random data
and this is the initialization of the weight.
You can easily see this is from −0.04 to 0.04
27:52.929 --> 27:59.929
and −0.04 to 0.04. It is a very small zone
around the origin and all the weights are
28:00.559 --> 28:07.559
all initialized there. Obviously, the weights carry
no information yet. There is a mistake here, which I
28:09.820 --> 28:12.440
showed you.
28:12.440 --> 28:19.440
This is not 0.4; this is 0.04. To let you know
why we have taken a very small value: it is just
28:23.429 --> 28:30.429
to show that you can start from any initial
value for the weight and you still capture
28:31.220 --> 28:35.280
the topology using this Kohonen lattice.
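The whole experiment described above (a 1-D lattice of 100 neurons, 2-D inputs drawn from the unit square, weights initialized in a tiny zone near the origin, 6,000 training iterations) can be reproduced with a short script. This is a sketch of the simulation, not the original lecture code; the decay schedules for the neighborhood width and learning rate are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_neurons, n_iters = 100, 6000
# Weights start in a tiny zone around the origin (roughly -0.04 to 0.04).
W = rng.uniform(-0.04, 0.04, size=(n_neurons, 2))
pos = np.arange(n_neurons, dtype=float)  # 1-D lattice positions 0..99

for n in range(n_iters):
    x = rng.uniform(0.0, 1.0, size=2)             # random point in unit square
    i = np.argmin(np.linalg.norm(W - x, axis=1))  # competition
    sigma = 50.0 * np.exp(-n / 1000.0) + 0.5      # shrinking neighborhood
    eta = 0.5 * np.exp(-n / 2000.0) + 0.01        # decaying learning rate
    h = np.exp(-(pos - pos[i])**2 / (2.0 * sigma**2))
    W += eta * h[:, None] * (x - W)               # cooperative update

# After training, the weights have spread out from the origin to cover
# the unit square; plotting them in index order shows the space-filling
# curve seen in the lecture.
print(W.min(), W.max())
```

Replacing the input sampler with one that rejects points outside an L-shaped region reproduces the second experiment with no other changes.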
28:35.280 --> 28:42.280
After 6,000 iterations, surprisingly, if you
plot the weights sequentially, that is, for
28:54.549 --> 29:01.549
example, this is my sequence of 100 neurons
and I plot this one, then this one is this
29:03.280 --> 29:09.870
one, this one is this one, this one is this
one and so on. If you look at it, it has developed
29:09.870 --> 29:16.870
a structure in such a way that by looking
at it, I can easily see that this 1-D lattice
29:18.799 --> 29:25.799
captures a 2-D topology. You can see that
this 1-D lattice, which was confined to a
29:29.480 --> 29:36.480
very narrow zone around the origin, expanded
itself, the neurons moving in such a way that
29:41.720 --> 29:48.720
the weights were so organized that they captured
the 2-D lattice. I will show you now in a
29:52.590 --> 29:59.590
movie style how this actually looks.
30:02.020 --> 30:09.020
You see how this 1-D lattice, which was
one line in the beginning, is slowly
30:12.020 --> 30:18.190
developing into a very nice structure and
telling us that the data is actually coming
30:18.190 --> 30:25.190
from a square. I hope you could appreciate
that. Again just to let you know and for those
30:28.130 --> 30:35.130
who could not follow it, I will again repeat
it. You see that in the beginning, if you
30:39.890 --> 30:46.890
look at the structure, the lattice is actually
taking the shape or it is spreading to let
30:51.620 --> 30:58.620
us know that the topology is a 2-D square.
31:01.250 --> 31:08.250
That was a square. You may say that is nothing
surprising, but now, we will take an L-shaped
31:09.809 --> 31:16.809
input space. This is my L shape. My data is
not coming from a square but it is coming
31:21.580 --> 31:28.580
from an L shape. Now if I train, if I update
my weight of a 1-D lattice, that is, again,
31:32.280 --> 31:39.280
I have 100 neurons in 1-D and they are again
in a small zone. Again, this is not 0.4; it
31:46.830 --> 31:53.830
is 0.04. The weights are randomly initialized.
Can I say once again…. Once they are trained,
31:56.010 --> 32:02.080
they will give me the information that
the input space actually has a topology
32:02.080 --> 32:06.049
of L shape. Just looking at the weights of
these neurons, can I tell?
32:06.049 --> 32:13.049
In fact, this is also true here. This is my
input space. The data is all randomly generated
32:14.929 --> 32:21.929
from this L shape. These are my small weights,
initial weights – plus or minus 0.04 is
32:25.710 --> 32:32.710
the range. Then, you see that after 6,000
training steps, if you look at the weights,
32:37.320 --> 32:44.320
they have exactly L shape. You can easily
see the topology of the input space merges
32:46.620 --> 32:53.620
to be L shape. Looking at the data, you can
infer. Again, I can show you a video on this.
32:55.450 --> 33:02.450
This is an L shape, the input data, and unlike
the other one, which was rectangular, now
33:11.549 --> 33:16.780
it is L shape and you see how surprisingly,
things are moving in a different manner. So,
33:16.780 --> 33:23.780
this is L shape. Finally, for those who could
not capture it, I show again how the neurons
33:28.919 --> 33:35.360
self-organize themselves in such a way that
we can say that the input data is coming actually
33:35.360 --> 33:41.520
from an L shape using 1-D lattice. You see
that the first part is already… this part
33:41.520 --> 33:48.520
is L and again, suddenly, this is spreading
to say that this is L shape. That is something
33:52.330 --> 33:53.780
nice to see.
33:53.780 --> 34:00.250
You can all do it by sitting in front of your
computer. Write the program, a simple program.
34:00.250 --> 34:05.730
You can write in C or MATLAB and you can have
fun with this kind of structure to learn how
34:05.730 --> 34:12.460
topology of the input space can be actually
recognized using this self-organizing map
34:12.460 --> 34:19.460
learning algorithm, specifically the Kohonen
lattice. Earlier, we used a 1-D lattice. Now,
34:23.579 --> 34:29.070
we will use 2-D lattice and again, we will
take the same input space and we will see
34:29.070 --> 34:32.230
how it varies in the 2-D case.
34:32.230 --> 34:39.230
Again, the square input space – the structure
of the input space is a square and we take
34:39.470 --> 34:46.470
again a 2-D Kohonen lattice, so, 10 rows and
10 columns. With that, you see that the weight
34:48.099 --> 34:55.099
space is initialized in very small values.
This is my input space and this is my initial
34:58.640 --> 35:05.640
weight, very small weights around origin.
After training is over, we can easily see
35:10.580 --> 35:17.580
that they are very regularly arranged and
these neurons in 2-D in such a way that you
35:18.490 --> 35:21.380
can easily see the topology is a square.
35:21.380 --> 35:28.380
These neurons we have. This is neuron 1, neuron
2, 3, 4, 5, 6, 7, 8, 9. 1, 2, 3, 4, 5, 6,
35:31.550 --> 35:38.550
7, 8, 9, 10. Similarly, 1, 2, 3, 4, 5, 6,
7, 8, 9, 10. You can easily see that: 1, 2,
35:39.640 --> 35:46.640
3, 4, 5, 6, 7, 8, 9, 10. What you are seeing
is that we have plotted all the neurons in
35:47.290 --> 35:54.290
the neural lattice and they exactly occupy
a space, perfectly distributed in such a way
35:54.369 --> 35:58.869
that the topology is now regular and it looks
like a square lattice.
35:58.869 --> 36:04.810
This square lattice will look even better
if I increase the number of neurons, 10 neurons
36:04.810 --> 36:11.810
in one lattice. I have taken a 2-D lattice
and I have placed 10 neurons in this axis
36:14.980 --> 36:21.980
and 10 neurons in this axis. If you increase
these neurons to be 100 and 100, this regularization
36:22.869 --> 36:29.869
of this structure will be much more pronounced.
You can do that experiment also.
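For the 2-D lattice, only the neuron positions change from the 1-D script: each of the 10 × 10 neurons now has a 2-D position vector, and the lateral distance is computed between position vectors. A sketch, under the same assumed decay schedules as before:

```python
import numpy as np

rng = np.random.default_rng(0)

side = 10  # 10 rows and 10 columns of neurons
# 2-D lattice positions (row, column) for all 100 neurons.
pos = np.array([(r, c) for r in range(side) for c in range(side)], dtype=float)
W = rng.uniform(-0.04, 0.04, size=(side * side, 2))  # tiny initial weights

for n in range(6000):
    x = rng.uniform(0.0, 1.0, size=2)                 # square input space
    i = np.argmin(np.linalg.norm(W - x, axis=1))      # competition
    d = np.linalg.norm(pos - pos[i], axis=1)          # lateral distances
    sigma = 5.0 * np.exp(-n / 1000.0) + 0.5           # shrinking neighborhood
    eta = 0.5 * np.exp(-n / 2000.0) + 0.01            # decaying learning rate
    h = np.exp(-d**2 / (2.0 * sigma**2))              # cooperation
    W += eta * h[:, None] * (x - W)                   # weight update

# The 10 x 10 grid of weights now spreads regularly over the square.
print(W.min(), W.max())
```

Increasing `side` makes the recovered grid look even more regular, as noted above.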
36:31.080 --> 36:38.080
We can easily again see another video. You
see that earlier, the neurons were randomly placed
36:40.119 --> 36:45.560
and they had no idea about the input space.
Slowly, you see that the neurons are exactly
36:45.560 --> 36:52.560
taking the shape of a square lattice. For
those who did not see it properly, again,
36:56.060 --> 37:03.060
for their benefit, I am repeating the same
video. You can easily see how the neurons
37:03.760 --> 37:09.820
did not have any idea in the beginning and
they again take the shape of the actual input
37:09.820 --> 37:16.820
space. We go to the other example.
37:19.550 --> 37:26.550
Just like we took the square space, now we
take an L space but again 2-D lattice. So,
37:32.210 --> 37:39.210
2-D SOM learns 2-D topology. Again, we will
do it in the same manner. This is L space.
37:42.109 --> 37:49.109
This is the L-shaped structure and this is the
input data.
37:52.440 --> 37:59.440
This is the initial weight vector of the 2-D
lattice for capturing the topology of an L
38:00.920 --> 38:07.920
shape input space. We want to capture the
topology of L-shaped input space. This is
38:24.369 --> 38:29.670
the final structure after learning and you
can easily see the topology of the neurons
38:29.670 --> 38:34.330
is like an L shape here.
38:34.330 --> 38:41.330
You can easily see again how this happens
in the movie. What you are seeing is how neurons
38:45.500 --> 38:52.210
in a 2-D lattice are learning L shape in a
topology space. You can easily see this is
38:52.210 --> 38:59.210
L and this is the other half. For those who
could not see it, for them, I repeat it. You
39:03.550 --> 39:10.550
can easily see
how this structure develops. The structure
of the input space was L shape and that has
39:25.540 --> 39:32.540
been nicely recognized. As I told you, we
can always improve this by using a larger number
39:34.869 --> 39:37.400
of neurons in each dimension.
39:37.400 --> 39:44.400
What we learnt until now is that we can use
Kohonen lattice to learn the topology of the
39:48.240 --> 39:55.240
input space. From which geometry is the data
coming? Is it 3-D, 2-D or is it coming from
39:55.590 --> 40:02.590
various shapes – circle, sphere, L shape,
triangular shape, prism shape? We can actually
40:04.599 --> 40:10.599
learn the geometry of the space
from where the data is coming. That
40:10.599 --> 40:17.599
is a very nice identification of the feature
space, which we will be learning later. We
40:20.180 --> 40:25.320
will come back to it, because this is a
very difficult problem. We will learn this in the
40:25.320 --> 40:31.869
course, but now I will just introduce to you
what the visual-motor coordination problem is.
40:31.869 --> 40:38.869
Visual-motor coordination. For example, I
want to manipulate anything, I want to write
40:43.470 --> 40:50.470
something on the blackboard. In a biological
organism, there is a very nice correlation
40:51.589 --> 40:58.310
between the visual feedback and the movement
of hand or movement of leg. There is nice
40:58.310 --> 41:05.160
hand-eye coordination that we have particularly
in biological organisms, specifically for
41:05.160 --> 41:12.160
us humans. Some animals also have very high
degree of coordination between hand and eye,
41:13.050 --> 41:20.050
the visual information and the movement of
their organs.
41:21.830 --> 41:28.830
Can we do this kind of things in machines?
One example here is that this is a robot manipulator.
41:29.359 --> 41:36.220
You can easily see the three-link robot manipulator
and you can see that there are two cameras,
41:36.220 --> 41:43.220
camera 1 and camera 2. They give the information
about the target, where these end effectors
41:44.609 --> 41:50.510
will go and based on this feedback, where
is the target? This end effector has to reach
41:50.510 --> 41:57.510
this point. How can that happen through a learning
mechanism?
41:58.070 --> 42:05.070
This is the camera input. I have a control
algorithm that will formulate a map between
42:07.700 --> 42:14.700
Cartesian space to joint space. Joint space
means theta1, theta2, and theta3. If I give
42:15.859 --> 42:22.000
a specific angle set point theta1, the base
will go to that specific angle, this first
42:22.000 --> 42:29.000
link will go to specific angle theta2 and
the second link will go to theta3. Once I
42:30.720 --> 42:36.640
know what is the Cartesian space, based on
that information, how do I actuate theta1,
42:36.640 --> 42:43.119
theta2 and theta3 such that this end effector
reaches this point? That is the problem of
42:43.119 --> 42:43.480
visual-motor coordination.
42:43.480 --> 42:50.480
The more difficult problem is that if this
target is continuously moving…. If it is
42:52.540 --> 42:58.030
static target, the problem is simple, but
if the target is moving continuously, how
42:58.030 --> 43:03.839
will this go? If it is here now, it will take
a step in this direction and suddenly, this
43:03.839 --> 43:10.839
has gone somewhere else. How can this actually
follow a moving target? This is the problem.
43:12.859 --> 43:17.119
Two cameras provide visual feedback to the
control algorithm and thus help in positioning
43:17.119 --> 43:24.119
the end effectors to the relevant position.
That is the visual-motor coordination.
43:25.810 --> 43:32.810
This is our robot manipulator, three-link
manipulator. The idea is actually that we
43:34.810 --> 43:41.650
want to learn a map from Cartesian space,
which is actually now four-dimensional, because
43:41.650 --> 43:48.650
you see that here, we used the two camera
system, which is a stereo vision system. If
43:50.650 --> 43:57.650
I have only one camera, even if
I am looking at a three-dimensional object,
44:03.220 --> 44:09.210
in the camera plane, it is only a 2-D point,
a 2-D projection of the three-dimensional
44:09.210 --> 44:15.619
object. In that sense, the depth perception
using a single camera may not be properly
44:15.619 --> 44:22.619
done. That is why we talk about stereo vision.
We have two cameras to make sure that we have
44:22.950 --> 44:29.950
a proper depth perception of the 3-D object
and hence, we can exactly locate where the
44:32.060 --> 44:36.160
3-D point is. That is why we are taking two
cameras.
44:36.160 --> 44:43.160
What is the objective? I will tell you how
we solve this problem using a learning framework
44:43.920 --> 44:50.920
or the kind of neural-network learning
that we have talked about. I have a Cartesian space
44:52.040 --> 44:59.040
here. This is the Cartesian space and this
is my joint space. To simplify, I say this
45:07.210 --> 45:13.750
is a point x, y, z in the space coordinate
and these are the joint coordinates, that is,
45:13.750 --> 45:20.320
the joint angles of the robot manipulator: theta1, theta2 and
theta3.
45:20.320 --> 45:26.930
How do I learn the map from Cartesian space
to joint space in such a way that the end
45:26.930 --> 45:31.770
effector of the robot manipulator reaches
exactly the target position through visual
45:31.770 --> 45:38.770
feedback? That is the objective. What we do
is that from the Cartesian space, we create
45:41.200 --> 45:48.200
small clusters and in each cluster, we define
a linear mapping from the Cartesian space
45:48.730 --> 45:55.730
to joint space. In essence,
what I am saying is that theta is a function
45:58.290 --> 46:05.290
of x, y and z, or in this context, it is f
of u. Here, u is the four-dimensional input given by the cameras.
46:09.030 --> 46:13.230
Two cameras are there and each camera's output
is two-dimensional. So, the two cameras
46:13.230 --> 46:20.150
will give us four-dimensional output and that
is u.
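As a rough sketch, the 4-D input u can be formed by stacking the two 2-D image points from the cameras. Everything below (the pinhole camera matrices P1 and P2, the target point) is an illustrative assumption, not the actual camera setup used in the lecture:

```python
import numpy as np

# Hypothetical sketch: each camera projects the 3-D target to a 2-D image
# point (pinhole model); stacking both projections gives the 4-D input u.
def project(P, x):
    """Project a 3-D point x with a 3x4 camera matrix P."""
    ph = P @ np.append(x, 1.0)   # homogeneous projection
    return ph[:2] / ph[2]        # 2-D image coordinates

# Two illustrative camera matrices (identity intrinsics, shifted viewpoints).
P1 = np.hstack([np.eye(3), [[0.0], [0.0], [5.0]]])
P2 = np.hstack([np.eye(3), [[1.0], [0.0], [5.0]]])

target = np.array([0.3, -0.2, 1.0])   # a 3-D point in the workspace
u = np.concatenate([project(P1, target), project(P2, target)])
print(u.shape)  # (4,)
```

Because the two viewpoints differ, the pair of projections together carries the depth information that a single camera would lose.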
46:20.150 --> 46:27.150
Given u, how do I go to theta? What happens
is I divide my Cartesian space into small
46:28.290 --> 46:35.290
clusters and within each cluster, I define
a linear relationship using Taylor series
46:36.280 --> 46:43.280
expansion, that is, theta(u) = theta_s
+ A_s (u - w_s), where u
46:45.230 --> 46:52.230
is the input from the camera, w_s is the weight
associated with a specific neuron, that is,
46:53.890 --> 47:00.270
what I am trying to do is this: since the Cartesian
space is a 3-D space, I take a 3-D neural
47:00.270 --> 47:06.430
lattice, and each neuron represents a discrete
cell in the Cartesian space.
cell in the Cartesian space.
47:09.010 --> 47:15.450
Then, within this discrete cell, I define
a linear relationship using Taylor series
47:15.450 --> 47:22.450
first-order expansion. The zero-order term
is theta_s. This theta_s corresponds to the
47:26.109 --> 47:33.109
specific discrete cell. The center of the
discrete cell is correlated to thetas in the
47:34.030 --> 47:40.770
joint space, A_s is the Jacobian matrix,
u is the input from the camera, and w_s is the
47:40.770 --> 47:45.070
weight associated with the specific neuron
that we are talking about, that discrete cell.
47:45.070 --> 47:50.950
So, s indexes the s-th neuron, theta_s is
the zero-order term, w_s is the neuronal weight,
and A_s is the Jacobian matrix. This is the
and s is the Jacobian matrix. This is the
idea of forming a map between input and output.
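The local first-order map just described can be sketched in code as follows. The lattice size, random weights w_s, offsets theta_s and Jacobians A_s here are placeholder assumptions; in practice they come out of the training discussed next:

```python
import numpy as np

# Sketch of theta(u) = theta_s + A_s (u - w_s): winner-take-all lookup
# followed by a first-order (Taylor) correction. Sizes are illustrative:
# 4-D camera input u, 3 joint angles, a small flattened 3x3x3 lattice.
rng = np.random.default_rng(0)
n_neurons = 27
w = rng.random((n_neurons, 4))       # w_s: input-space weight of each neuron
theta0 = rng.random((n_neurons, 3))  # theta_s: zero-order joint-angle term
A = rng.random((n_neurons, 3, 4))    # A_s: Jacobian matrix of each neuron

def joint_angles(u):
    """Find the winning neuron s, then apply its local linear map."""
    s = np.argmin(np.linalg.norm(w - u, axis=1))  # nearest weight wins
    return theta0[s] + A[s] @ (u - w[s])          # theta_s + A_s (u - w_s)

u = rng.random(4)        # 4-D input: two cameras, one 2-D image point each
theta = joint_angles(u)
print(theta.shape)       # (3,) -> theta1, theta2, theta3
```

Note that when u sits exactly at a neuron's weight w_s, the correction term vanishes and the map returns theta_s, the joint angles associated with the center of that discrete cell.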
48:00.609 --> 48:05.109
Now, what we are trying to learn is a cluster.
48:05.109 --> 48:10.710
We will construct a 3-D neural lattice and
this 3-D neural lattice, first of all, must
48:10.710 --> 48:17.710
capture the robot workspace. Obviously, the
robot workspace is a 3-D workspace, not necessarily
48:23.960 --> 48:30.960
exactly cubical. There are two ways we can do the
clustering. In one case, we can form clusters
48:36.020 --> 48:41.730
only in the Cartesian space and
the other case is the Cartesian and joint
48:41.730 --> 48:48.730
space together – we can also do that. After
doing that, we can actually see,
given a target position, how the robot moves,
48:56.109 --> 49:03.040
that is, how the end effector
moves to the target position through learning.
49:03.040 --> 49:10.040
That is what we will now learn.
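The two clustering options above amount to a choice of training samples for the lattice; the sample counts and the rescaling of the joint angles below are illustrative assumptions, not values from the lecture:

```python
import numpy as np

# Option 1: cluster on Cartesian samples x alone.
# Option 2: cluster on concatenated (x, theta) samples, so the lattice
# organizes itself in the Cartesian and joint spaces together.
rng = np.random.default_rng(2)

n = 500
x = rng.random((n, 3))                       # Cartesian workspace samples
theta = rng.uniform(-np.pi, np.pi, (n, 3))   # corresponding joint samples

samples_cartesian = x                        # option 1: Cartesian only
samples_joint = np.hstack([x, theta / np.pi])  # option 2: both spaces;
# the joint angles are rescaled to [-1, 1] so that neither space
# dominates the distance metric used to pick the winning neuron
print(samples_cartesian.shape, samples_joint.shape)  # (500, 3) (500, 6)
```

Either sample set can then be fed to the same Kohonen learning procedure; only the dimension of the neuron weights changes.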
49:35.460 --> 49:42.460
What you are seeing is how a 3-D neural lattice
is learning a Cartesian workspace. You can
49:43.630 --> 49:49.359
easily see that the data is coming from Cartesian
workspace. For those who could not follow
49:49.359 --> 49:56.359
this, I will do it again. You can easily see
how
the topology of the Cartesian workspace is
50:04.940 --> 50:11.940
being learnt by a 3-D neural lattice. This
is the first one. Now, I will combine the
50:12.030 --> 50:19.030
Cartesian space and joint space and again,
we will try it. We will see how that learning
50:23.020 --> 50:30.020
takes place. You can easily see that the 3-D
neural lattice learns the workspace of a robot
50:30.589 --> 50:36.930
manipulator very accurately and this is much
better actually in this case when we combine
50:36.930 --> 50:43.930
the Cartesian space and joint space and finally,
using this notion, I will show you how a robot
50:47.170 --> 50:51.800
will actually capture the target.
50:51.800 --> 50:58.800
This is my target and this is my end effector
and it moves around it and it goes until it
51:06.320 --> 51:12.140
exactly converges. You see that it has exactly
converged now. For those who did not follow,
51:12.140 --> 51:19.140
I again start this. You see that this is my
end effector and it is going to try to reach
51:21.690 --> 51:28.690
this target. It
51:34.609 --> 51:39.740
goes very close and finally, it has reached
exactly. You can see now that this has exactly
51:39.740 --> 51:46.740
reached. Before I summarize, let me mention
51:59.000 --> 52:04.790
a few more things.
52:04.790 --> 52:11.790
Using a Kohonen lattice, we can do many things.
One of the things that I will teach you in
52:16.220 --> 52:23.220
this course is how to do system identification.
We can easily do system identification and
52:27.680 --> 52:34.680
some of the difficult tasks like visual-motor
coordination. We will take up these two topics
52:37.589 --> 52:44.589
exclusively when we go into detailed application
of neural networks in intelligent control.
52:45.000 --> 52:52.000
Let us now summarize what we discussed today.
For a Kohonen SOM, first what we have to do
52:52.200 --> 52:59.200
is initialize. We select a specific lattice,
a Kohonen lattice. It can be 1-D, 2-D, 3-D
53:00.390 --> 53:07.390
or even higher dimensional lattice. Then,
we assign the associated weight vector with
53:09.180 --> 53:16.180
each neuron of the lattice randomly. Usually,
the associated vector of each neuron has the
53:18.859 --> 53:25.859
same dimension as that of the input data or
input vector. You draw a sample x from the
53:26.790 --> 53:32.460
input space randomly and then excite all the
neurons, find the winning neuron.
53:32.460 --> 53:38.859
Then after you find the winner, you do the
weight update. This is actually the Kohonen
53:38.859 --> 53:44.720
learning algorithm; this is the sum and substance.
It contains a very important item called neighborhood
53:44.720 --> 53:50.420
function h_j,i. As I told you, at the beginning
of the learning, the neighborhood function
53:50.420 --> 53:57.200
is adjusted in such a way that almost all
the neurons in the lattice take part in decision-making
53:57.200 --> 54:03.740
and slowly as learning progresses, very few
in the nearest neighborhood of the winning
54:03.740 --> 54:10.740
neuron take part in the decision-making. Then,
we repeat steps 2 to 4 until the
54:15.390 --> 54:21.770
training is over. Finally, when training is
over, we can easily see that a Kohonen lattice
54:21.770 --> 54:28.770
has actually learnt the topology of the workspace.
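The steps just summarized can be sketched as follows. This is a minimal Kohonen SOM on a 2-D lattice; the lattice size, learning-rate schedule and neighborhood schedule are illustrative choices, not the ones used in the lecture's simulations:

```python
import numpy as np

rng = np.random.default_rng(1)

rows, cols, dim = 8, 8, 2
# Step 1: initialize. Each lattice neuron gets a random weight vector with
# the same dimension as the input data.
w = rng.random((rows, cols, dim))
# Lattice coordinates of every neuron, used by the neighborhood function.
grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                            indexing="ij"), axis=-1)

def train(data, epochs=20, eta0=0.5, sigma0=4.0):
    for t in range(epochs):
        eta = eta0 * np.exp(-t / epochs)      # decaying learning rate
        sigma = sigma0 * np.exp(-t / epochs)  # shrinking neighborhood radius
        for x in rng.permutation(data):
            # Steps 2-3: draw a sample, excite all neurons, find the winner.
            d = np.linalg.norm(w - x, axis=-1)
            i = np.unravel_index(np.argmin(d), d.shape)
            # Step 4: Gaussian neighborhood function h_j,i around the winner.
            h = np.exp(-np.sum((grid - grid[i]) ** 2, axis=-1)
                       / (2 * sigma**2))
            # Weight update: neurons near the winner move toward x.
            w[:] += eta * h[..., None] * (x - w)

data = rng.random((200, dim))  # samples drawn from a square workspace
train(data)
```

Because sigma starts large, nearly the whole lattice follows each sample early on; as it shrinks, only the winner's close neighbors keep adapting, and the lattice settles into the topology of the workspace.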
Thank you very much.