Destiny Project Notebook, Folio 8

Changes to roads
letter from Chris to Bill

14 February 1999

The old road layer allowed each map cell to have either zero or one road. Because roads were organized by material, this was not as restrictive as it might seem: a given square meter of road can certainly be characterized as macadam, concrete, gravel, unimproved, etc. It doesn't work as well on a larger scale, however. Because all macadam cells were grouped into one entity, all concrete cells into another, and so on, there was no way to record that a certain set of cells was a named entity (e.g. Highway 85, as distinct from Highway 101). The main purpose of the new road design is to designate road entities by their identity rather than their material.

Having done that, it was no longer possible to assume that a road was all of one material, or that it provided the same movement advantage along its length. To accommodate changes in these properties while keeping things convenient for those less detail-oriented, I provided "usual" values for material and advantage, which are in effect for all cells of the road for which exceptions are not specified. As a consequence, I can represent a macadam highway in which several cells have been torn up, or a bridge has washed out.
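The usual-value-with-exceptions idea can be sketched in a few lines. The field names and cell coordinates below are hypothetical illustrations, not actual Destiny data structures:

```python
# Hypothetical road record: "usual" properties hold for every cell of the
# road unless an exception is recorded for that specific cell.
road = {
    "name": "Highway 85",
    "usual": {"material": "macadam", "advantage": 2.0},
    "exceptions": {
        (14, 7): {"material": "unimproved", "advantage": 0.5},  # torn-up cell
    },
}

def cell_property(road, cell, prop):
    """Look up a property for one cell, falling back to the usual value."""
    override = road["exceptions"].get(cell, {})
    return override.get(prop, road["usual"][prop])
```

With this layout, only the damaged cells cost any storage; every other cell of the road answers with the usual values.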

I took the opportunity to add a few properties that hadn't been in the original: width and height above (or below) grade. The former will be more useful as we move from cell-based representation to scale-less representation. The latter can be used to determine water depth on a road given the depth of flooding on the terrain (even though the terrain may be flooded, the road may still be passable). The last thing I added was a knowledge subject id, so that the associative memory that serves as the "almanac" we discussed several months ago can be easily accessed. This is the subject id of the road entity itself in the associative memory.
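The height-above-grade calculation might look like this. A minimal sketch; the function name and sign convention are my own assumptions:

```python
def water_depth_on_road(flood_depth, height_above_grade):
    """Depth of water over the road surface, given terrain flood depth.

    A road raised above grade sheds the difference; one below grade
    (negative height) collects correspondingly more water.
    """
    return max(0.0, flood_depth - height_above_grade)
```

For example, a road half a meter above grade stays dry under 0.3 m of flooding, while a sunken road 0.2 m below grade ends up under half a meter of water.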


Perceptibility of Events

16 February 1999

A renegade wizard casts a spell which causes immense devastation within a city.

One person might feel the gathering or release of magical power. Another the "tremor in the force" as citizens are slain (disturbance in life energy fields). Another may see the flash while, further away, the flash can not be seen directly but is reflected off of low-hanging clouds. Still others may hear the roar, feel the wind, feel the earth vibrate under their feet, feel the heat of the fire, smell or taste the acrid smoke, or perhaps feel only pain, then death.

Others may detect variations in electromagnetic fields or notice effects on electromagnetic communication as charged particles are released into the atmosphere. An explorer might discover a crater where once the city stood. Still others may perceive the event only as a brightening of auroral effects a month later, or as a global cooling after a year or two, or detect its subtle effects in fossil layers many millions of years later.

One gifted with premonition might even perceive the event before it takes place, and may even be able to persuade someone to do something to prevent it or to mitigate its effects.

In short, provided the observer is endowed with a suitable sensing apparatus, there is no boundary in time or space where we can say "beyond this point, the event can not be perceived", except in the usual sense of a shell expanding outward from the planet at the speed of light (and even that presumes no premonition).


Testing an entity's ability to respond to its environment

17 February 1999

What should be in the test environment

What the tester should be able to do

What the entity should be able to do


Semantic categorization rules and neural networks
letter from Bill to Chris

28 February 1999

Do you think you could dig up some of your old neural-network research? I've been playing with this idea of using neural networks for categorization, and it's got me to thinking about different types of neural networks. In particular, I've been playing with a concept of using a network that has a stronger biological basis than standard back-prop, but may have many of its advantages.


'Perfect' neural network?
letter from Bill to Chris

9 March 1999

I think that I've found a way of automatically setting up a neural network so that it will _always_ properly map the training set. I don't know yet how well it degrades with inputs that don't match the training set.

The idea comes in two parts: the creation of a "perfect" Kohonen feature map and the mapping of this feature map into a target set of outputs.

The "perfect" Kohonen map I am going to rename to a PKMap (it's easier to write :)). The process starts by defining the training set T of input vectors. The first thing that has to happen is to reformat the training set so that its inputs are normalized without losing the magnitude information. (If the magnitude information were lost, it would no longer be the same feature space.) This is actually a two-step process, because in order for the PKMap to work properly, the inputs must all be normalized. So this is how it works:

Given vector V1, produce a pair {V2, magnitude} where V2 = V1/|V1| and magnitude = |V1|. Create vector V3 from the pair {V2, magnitude} such that the first element of V3 equals the first element of V2, V3.secondElement = V2.secondElement, and so forth, and finally the last element of V3 = magnitude, i.e. |V1| (note that |V2| is always 1, so it carries no information). Finally, create vector V4 = V3/|V3|. This transformation is possible so long as |V1| > 0. If that can not be guaranteed, the feature space should be re-defined.
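The V1-to-V4 transformation can be sketched directly, reading the appended element as the saved magnitude |V1| (since |V2| is 1 by construction):

```python
import math

def pkmap_normalize(v1):
    """Normalize v1 without discarding its magnitude: append |V1|, renormalize."""
    mag = math.sqrt(sum(x * x for x in v1))
    if mag == 0:
        raise ValueError("zero-magnitude vector; re-define the feature space")
    v2 = [x / mag for x in v1]       # V2 = V1/|V1|
    v3 = v2 + [mag]                  # V3 carries the magnitude as a new element
    mag3 = math.sqrt(sum(x * x for x in v3))
    return [x / mag3 for x in v3]    # V4 = V3/|V3|, a unit vector
```

For instance, pkmap_normalize([3.0, 4.0]) yields a three-element unit vector whose last component encodes the original length of 5.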

So, apply the above normalization process to all input vectors in the training set. Then, for each vector of the training set, create a PKMap element with input weights exactly matching the inputs from the training vector. (E.g., if the input training vector is {0.6, 0.5656, 0.5656}, create a PKMap element with input weights = {0.6, 0.5656, 0.5656}.)

When firing the PKMap, take the dot product of the input vector and the weights vector for each element. The resulting value of each neuron will provide "interesting" feature map results. (A winner-take-all strategy could be used, but for the purposes of the rest of this approach, winner-take-all is not recommended.)
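Seeding and firing, as described, amount to only a few lines. This is a sketch of my reading of the letter, not actual project code:

```python
def seed_pkmap(training_vectors):
    """One PKMap element per training vector, weights copied verbatim."""
    return [list(v) for v in training_vectors]

def fire_pkmap(pkmap, x):
    """Activation of each element is the dot product of its weights with x."""
    return [sum(w * xi for w, xi in zip(weights, x)) for weights in pkmap]
```

For the normalized vector {0.6, 0.5656, 0.5656} from the example, firing the seeded map on that same vector gives an activation of (almost exactly) 1.0 for its own element, higher than for any other element, which is the property the dot-product matching relies on.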

So there's the first part: the PKMap. Note that its strength lies in the fact that the dot product A*B of normalized vectors is greatest when A == B. Note also that this will not be able to "recognize" the XOR problem; in fact, {-1,1} and {1,-1} would be mapped onto opposite sides of the vector space. That is okay, and we can deal with XOR in the second part of this problem.

The second part of this problem is doing the mapping from the PKMap to the desired output results. I'm running short on time at the moment, so I'll discuss the second part later. (Got to keep you guessing :)). The brief description, though, is to realize that back-prop neurons are effectively AND/OR gates, so I can map the output to the input using AND/OR logic. I know that the output of the first layer (the PKMap layer) is going to be strongest for the neuron that most closely resembles the input vector (I'm detecting which region of the feature map I am in). Using AND/OR logic, I can map the strongest output from the PKMap to the correct output for the training set. (I know this is a bit fuzzy; I'll describe it in more detail later.)

Again, I'm not sure how well my method performs on input that doesn't match the training set. However, the nice thing about this is that it is always easy to add more neurons and map them appropriately, thus increasing the reliability of the network. Given feedback, this network will always get better at mapping the inputs, as long as you're willing to spend more memory on it.

I'll talk to you about this more later. Anyway, I think this may be very helpful in the categorization problem.


'Perfect' neural network from perceptrons
letter from Bill to Chris

25 March 1999

Unbelievable though it may sound, I have found a way of creating a neural network from perceptrons that will always perform at least as well as can be expected from the training set. No hill-climbing procedures, no guesswork as to what the network is doing, and no questions about how many layers to create or how many nodes to put in any given layer. It just works. Always.

However, I proved it mathematically while in Hawaii, without access to my computer; now I actually have to implement it!

Here's an outline of the proof. The first part of it, I believe, is known as the "Perceptron Convergence Theorem" (??), which has already been proven (Rosenblatt? Papert and Minsky?). It really is nothing new. The second part, the "Perceptron Training Theorem," is where I discuss the implications for training a perceptron network, and I believe this is new and unique. Ironically, it follows very simply from the original proof, and I am absolutely amazed that I have not seen it before.

Perceptron Convergence Theorem

A perceptron can act either as a cutting plane in N-dimensional space (where N is the number of inputs to the perceptron), or as a logical gate of its N boolean inputs.

A perceptron as a cutting plane:

Given input vector I and weight vector W and threshold T, the output for the perceptron will be: sign(W*I - T)

Example: number of inputs == 3, Input1 labeled x, Input2 labeled y, Input3 labeled z

f(input) = sign(W1*x + W2*y + W3*z - T)

So, if the inequality W1*x + W2*y + W3*z - T > 0 holds true, the output will be 1; otherwise the output will be -1 (occasionally 0, but the activation function "sign" can be modified to assign a specific non-zero value when W*I - T == 0). Notice that the inequality describes a plane in three-dimensional space with normal vector W.
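The cutting-plane behavior is a one-liner in code. A sketch; here sign(0) is mapped to -1 for simplicity:

```python
def perceptron(weights, threshold, inputs):
    """Output 1 on the positive side of the hyperplane W*I - T = 0, else -1."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation - threshold > 0 else -1
```

With the three-input example, weights (1, 1, 1) and T = 1.5 classify points by which side of the plane x + y + z = 1.5 they fall on.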

A perceptron as a logical element:

Consider the case where all of the inputs into the perceptron are in the set {-1,1} (optionally the set {0,1} or even {-1,0,1}). The perceptron can perform some degree of boolean logic on the input. The precise degree of boolean logic it can perform is a bit beyond what's necessary for our purposes at the moment, but suffice it to say that it can at least act as an AND gate or an OR gate with optional NOTs at the input (each input can be negated independently). I'll save the proof of this for later. I expect that you may have already seen this proof anyway.
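Over inputs in {-1, 1}, AND, OR, and input NOTs fall out of the weight and threshold choices alone. The particular weights below are my own picks, one sketch among many that work:

```python
def gate(weights, threshold, inputs):
    """A perceptron restricted to inputs in {-1, 1}, acting as a logic gate."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else -1

def AND(a, b):          # fires only when both inputs are 1
    return gate([1, 1], 1.5, [a, b])

def OR(a, b):           # fires unless both inputs are -1
    return gate([1, 1], -1.5, [a, b])

def NOT_A_AND_B(a, b):  # NOT applied to the first input: (NOT a) AND b
    return gate([-1, 1], 1.5, [a, b])
```

Negating a weight is all it takes to put a NOT on that input, which is why each input can be handled independently.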

A perceptron layer:

A single-layer perceptron network can act in such a way as to "carve up" the N-dimensional input space. Each perceptron creates a cutting plane, and collectively a set of cutting planes, if their number is allowed to grow without bound, can carve up the space to an arbitrary degree.

Alternatively, a single-layer perceptron network can act as a boolean function of its inputs. Two successive boolean-function perceptron layers can produce any possible boolean function of their inputs (or multiple such functions).

Given a space-carving perceptron layer and given a pair of boolean-function layers after it, any arbitrary input space can be arbitrarily classified.

Multi-layer Perceptron Training Theorem

The Perceptron Convergence Theorem describes a three-layer perceptron network with a particular architecture: Plane-Cutting Layer, Boolean Function Layer, Boolean Function Layer. The first part of the Perceptron Training Theorem is that ALL three-layer perceptron networks have an architecture identical to the one described by the Perceptron Convergence Theorem. I won't go too far into this here, but suffice it to say that once the plane-cutting is done, all that is left is a set of boolean results. The latter two layers can exert no more power over the initial input space than performing boolean recombination with a boolean output result. No more plane-cutting over the original input space to the first layer can occur; only recombination of the outputs.

The second part of the theorem is that since the last two layers can not divide up the input space any more than the first layer already has, all necessary space-carving MUST occur on the first layer. So, for a given set of training patterns, any two patterns with desired results in different output regions must map into different bit-patterns.

Example:

Trainset:

    Input Pattern    Desired Output
    A                1
    B                2
    C                1
    D                2
    E                3

Perceptron outputs:

    Input Pattern    P1   P2   P3
    A                0    0    0
    B                0    0    0
    C                0    1    0
    D                0    0    0
    E                1    0    0
In the sample above, patterns B and D map into the same bit-pattern output. This is the usual case for patterns in the same region, and is expected. Patterns A and C map into different regions. This is also okay since we can use boolean-recombination to say that if the output is (0,0,0) OR (0,1,0) then we have desired-output 1. However, pattern A and pattern B both map into the same bit-pattern, so later network layers can not distinguish between them. This is not good! This is also true for patterns A and D. To fix this problem, a new perceptron P4 needs to be added that will distinguish A from B. If required, a new perceptron P5 can be added to distinguish A from D.
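The fix-up rule in the example (add a perceptron whenever two patterns with different desired outputs share a bit-pattern) can be detected mechanically. A simplified sketch over the example's data, which compares each pattern only against the first holder of its bit-pattern:

```python
def find_collisions(bit_patterns, desired):
    """Pairs of patterns that share a bit-pattern but want different outputs;
    each pair calls for a new cutting perceptron to separate it."""
    first_seen = {}
    clashes = []
    for name, bits in bit_patterns.items():
        key = tuple(bits)
        if key in first_seen and desired[first_seen[key]] != desired[name]:
            clashes.append((first_seen[key], name))
        first_seen.setdefault(key, name)
    return clashes

bit_patterns = {"A": (0, 0, 0), "B": (0, 0, 0), "C": (0, 1, 0),
                "D": (0, 0, 0), "E": (1, 0, 0)}
desired = {"A": 1, "B": 2, "C": 1, "D": 2, "E": 3}
```

Run on the example, this reports the (A, B) and (A, D) clashes; B and D share a bit-pattern too, but since they also share desired output 2, no new perceptron is needed for them.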

The key to the multi-layer perceptron training theorem is that the problem can be broken down into two portions: the training of the cutting layer, and the boolean recombination of its parts.

This is taking far too long to describe, so I'll have to cut this short and describe the rest of the training algorithm in person. (As well as describing some optimizations). Still, I'd like to share some results.

The algorithm takes, at most, order N-squared steps in the number of training patterns. The algorithm will be very heavily dependent on the training set, and I believe that tools can be created to test the quality of the training set as well as to suggest what kinds of new training elements should be added (though this part may provide strictly qualitative results, not quantitative).

It will still take an explosive number of neurons to perform parity checking. This is because, in order to have parity checking on 15 bits, you need 2^15 training patterns. This is an inherent problem with neural networks in general, and I don't think that it can be solved without adding more to the model. What should be added? Well, I'm glad you asked: it's called feedback! I have discovered a 3 (4? 5? I don't remember; need my notes) perceptron network that uses feedback and can do parity checking over an arbitrary number of bits, provided it's able to look at the pattern a single bit at a time. I have not looked at training algorithms for networks with feedback yet; I probably won't until I implement my current solution. Still, I'm confident that they can be built.
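The letter doesn't give the feedback network itself, but the behavior it describes (parity over a stream, one bit at a time) can be sketched with three logic-gate perceptrons, feeding the running parity back in as one of the inputs. This is my reconstruction, not necessarily Bill's network:

```python
def gate(weights, threshold, inputs):
    """A perceptron over inputs in {-1, 1} with a sign-style activation."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else -1

def xor(a, b):
    """XOR over {-1, 1}, built from three perceptrons."""
    h1 = gate([1, -1], 1.5, [a, b])      # a AND (NOT b)
    h2 = gate([-1, 1], 1.5, [a, b])      # (NOT a) AND b
    return gate([1, 1], -1.5, [h1, h2])  # h1 OR h2

def stream_parity(bits):
    """Parity of arbitrarily many bits, seen one at a time via feedback."""
    state = -1                   # -1 encodes "even so far"
    for b in bits:               # bits are in {-1, 1}; 1 is a set bit
        state = xor(state, b)    # feedback: the state re-enters each step
    return state                 # 1 means odd parity
```

The fixed three-perceptron core handles any stream length, which is exactly what the purely feed-forward construction can't do without 2^N training patterns.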


'Perfect' neural network
letter from Chris to Bill

29 March 1999

I'm having some trouble understanding why you're proposing to do what you are.

In the first place, I'm unclear as to why you need normalized vectors. Either you believe that Kohonen Feature Map weights must be normalized and are trying not to lose the magnitude information, or you're trying to exploit the properties of normalized vectors in some way that I failed to grasp. As to the first, Kohonen Feature Map weights aren't normalized, so they already take the vector magnitude into account during the matching process. (It's the Kohonen LVQ-1 weights that are normalized.)

Secondly, it doesn't seem that seeding the Fmap with input vectors will have any advantage over the usual technique of initializing the Fmap with random vectors. Seeding the input vectors into the Fmap will assure a good match for the first case of the first cycle only. After that, assuming the neighborhood is set sufficiently large to do some good, all the weights will have been adjusted according to how close they were to the first match. By the end of the first training cycle, any effect of the seeding should be totally submerged. (If the neighborhood is not set sufficiently large, then the network won't self-organize.)

I'm attracted to the notion of taking the raw activation of each Fmap element and using that as the input to a backpropagation network for classification. For a decent sized Fmap, that's a lot of backprop inputs, though. (For a 10-class problem, I typically use anywhere between a 20 x 20 Fmap and a 40 x 40 Fmap.) I suspect that you could do just as well by inputting the x, y of the winning Fmap cell to a 2-layer backpropagation network. The first layer slices the space using hyperplanes (though in this 2-dimensional case they're just lines), so you'll need enough first-layer neurons to define enough lines to separate every Fmap blob from every other Fmap blob [say, 6 to 8 times as many neurons as the number of classes, corresponding to bounding each class using a 6- to 8-sided polygon (in practice, though, the training of the Bprop will allocate more boundaries to those classes that need more and fewer to those that need less)]. Add to that a second (classifier) layer with as many cells as classes. This is assuming you have classification truth data to train from.
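Extracting the winning cell's (x, y) for the backprop input is straightforward. A sketch; the row-major grid layout and Euclidean distance metric are my assumptions:

```python
def winner_xy(fmap_weights, input_vec, width):
    """Coordinates of the best-matching cell in a row-major grid of the
    given width, judged by squared Euclidean distance from the input."""
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, input_vec))
    best = min(range(len(fmap_weights)), key=lambda i: dist2(fmap_weights[i]))
    return (best % width, best // width)
```

For a 20 x 20 Fmap this reduces 400 raw-activation inputs to just two coordinates, at the cost of discarding how strongly the runner-up cells responded.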

If you don't have truth data, especially if you don't know how many classes to expect, it becomes a far more interesting problem.

By now you're probably frustrated that I didn't understand your intent at all (I feel like one of us has missed something crucial and that it's probably me), so I'll stop trying to comment on something that I don't understand.

I would like to leave you with something good, however. Perhaps if I just mention that I've spent the last month developing a CBuilder interface to the NNITE neural network testbed? I've even got most of the plots working. [No progress on Destiny, however, except that I'm continuing to work my way through Metamagical Themas.]

