Destiny Project Notebook, Folio 9

PKMap neural network
letter from Bill to Chris

30 March 1999

Let me start out by saying that my final version of a "perfect" neural network doesn't rely on a Kohonen Feature Map at all, or even anything like it. However, it was where I got the initial idea from. There's still a way to use the PKMap (though it is not really a "perfect" Kohonen map; it's something else entirely. I will continue to call it a PKMap because I don't know what else to call it).

However, the networks arising from the use of the PKMap will tend to be a lot more strict about the data it sees than the final solution that I sent you (see your work e-mail; the message there has my "preferred" solution).

Why normalize the inputs??

What I was attempting to do was to take advantage of the following property:

If "I" and "W" are vectors, then I*W = |I|*|W|*Cos(angle between I and W). In the Kohonen map, "I" is the input vector, and "W" is the weight vector. If I can normalize the weights and inputs then |I|*|W| is simply a constant whereas Cos(angle between I and W) is what I would be measuring. The closer together I is to W, the greater the value of Cos(angle between I and W). I normalize the data to include the original magnitude because otherwise, I will lose a crucial piece of information. The normalization process is guaranteed to retain all of its original information. However, a vector of 0 magnitude can not be normalized. This is a pretty tough restriction on the inputs.

What to do with these PKMap elements??

The goal of setting up the PKMap is not to use in a Kohonen-style feature mapping process. Really, the PKMap elements grabbed from the training set are intended as the first layer of a three-layer pre-trained perceptron network. I won't call it a back-prop network anymore, because back-propogation through an error-space is not necessary. The map is set up to match the training set the first time, and there will not exist a "better" configuration regardless of training processes. The caveat here is that it will only perform as well as its training set. Also, if any of the training elements have contradictory information, the network will give some pretty strange results (as to be expected: the input information was contradictory!!).

Another reason to use this PKMap instead of a true Kohonen feature map is because of the size problem on the number of inputs. While it may be okay to use these extra feature-nodes that exist in a Kohonen map, they don't really buy us that much.

There's an additional step that I have found useful in creating the PKMap that I didn't tell you earlier. (I figured this out in Hawaii). That additional step is to turn the PKMap into thresholding unit. The threshold value should be set so that only units that can map into the same desired output space will get turned on in the event of receiving a perfectly matched input. (You should not accidentally turn on a unit that maps to a different desired output space, but if it turns on a unit that maps into the same output space, that's okay). Once you've turned the PKMap elements into thresholding units, doing boolean logic on the resulting data becomes straight-forward.

So, here's how to create an XOR perceptron network using the PKMap idea:

Identify Training Set:
Training Set
Input
Desired Output
{-1,-1}
-1
{-1,1}
1
{1,-1}
1
{1,1}
-1
Identify Normalized Training Set:

(NOTE: The input set is already normalized, though it has a magnitude > 1. This is okay. We will use the existing training set without transforming it. If we had to transform it, the first training set input would be: {-0.5, -0.5, 0.707}

Create PKMap:
PKMap
Element
Mapping
0
{-1,-1}
1
{-1,1}
2
{1,-1}
3
{1,1}
Add Threshold values:

Find Threshold for Element 0:

dot-product of {-1,-1} with all elements gives:

PKMap
Element
Dot Product
0
2
1
0
2
0
3
-2

Identify element closest to Element 0 that maps into a space different than that of Element 0.

Element 0 should map to -1 according to the training set. Elements 1 and 2 both map to the output 1, which is not equal to -1, and they tie for their distance from Element 0. Their dot-product is 0. Set threshold so that it bisects the angle between element 0 and 1. (We need to use the cosine-half angle formula here, as we need to bisect the angle between the training pattern and the closest non-matching element. The math may look a little strange, but it does work). Threshold = |I|*|W|*sqrt((1 + ((dot-product for element)/(|I|*|W|))/2)) |I|*|W| = 2 because of normalization. = 2 * sqrt((1 + (0)/2)/2) = 2 * sqrt(1/2) = 1.414 It turns out that all of the training elements will have a threshold of 1.414.

Define first layer of Perceptron network: (First element is threshold value. It's separated by a "|" symbol to make it stand out from the others).

Define second layer AND-NODES for trainset input: (Threshold value 3.9 chosen because it falls in range (3 < x < 4). Any value for x in that range will do fine). And-Node for first :

OR-NODE for OUTPUTs

We will of course, only pay attention to the output for OUTPUT 1, but this shows the general pattern of how these things are created. This process is extensible to defining as many output patterns as desirable. In general, there will be 1 output node per possible output pattern.

One strongly limiting factor with this idea is that there will be a rather large class of patterns that will simply not be recognized. This is not true for the XOR problem, but it will be true for patterns with more than two inputs.

Hopefully, I haven't confused you more with this example. The short answer to your first question is, "yes, I'm using properties of normalized vectors in a strange way." My final "mathematically proven" version will do something very similar to the stuff shown above, but will not require normalization of the inputs. Instead, it draws new dividing hyperplanes as needed. I don't know how to apply this approach to the problem without truth data. Perhaps a Kohonen map can be used to devise the categories and then another neural network can be used to find out the number of categories and yet another neural network can be used to classify them??

-----------------------------------------------------------------

Excellent stuff on the NNite testbed! I was tempted to do it myself, but the lure of doing work in Java made me write my code using JBuilder. Actually, I'm tempted to convert the NNite stuff into Java so that it may be further developed there.


Turing Machines are Recurrent Neural Networks
letter from Bill to Chris

12 April 1999

I thought that you would find this interesting.

http://www.uwasa.fi/stes/step96/step96/hyotyniemi1/

By the way, I found some work that is pretty similar to what I have done. My new nodes are very similar to nodes known in the neural net world as "Radial-Basis-Function" nodes. (RBF for short). Somebody else (I forget who at the moment) even uses a scheme for creating the first layer of nodes in a manner very similar to how I do it. In fact, my method is just a special case of their method (though they may not realize it).

I found a couple of other resources dealing with recurrent neural networks doing natural-language-processing. The results achieved did not look very promising, but I believe I know why they failed.


Parsing Natural Language with Neural Networks
letter from Bill to Chris

20 April 1999

I've done quite a bit of web-browsing recently and found some pretty amazing stuff out there. My parsing and diagramming of English is going to build off of work previously done by other researchers. They achieved a 70% success rate in classifying sentences as grammatically correct/incorrect. I am quite certain that I can improve upon their work; there was a basic flaw in their training process that they did not recognize which I intend to correct. The problem with my intended approach is that it will require a human being to parse and diagram a _lot_ of English sentences; in particular, a lot of incomplete English sentences! I suspect that it may actually require less work to perform some of the training interactively. (A kind of "student-teacher" mode).

http://www.neci.nj.nec.com/homepages/lawrence/papers/nl-tkde98/latex.html


Spreading Activation Model of Recall

1 May 1999

While Bill has been thinking about natural language, I have started pondering a truly associative mode of recalling information from the associative memory. Rather than a highly managed algorithm for information retrieval, as was developed for the query knowledge assembly problem, this should be more of a general recall algorithm: given two or more subjects, identify the knowledge items that connect them.

Two of the possible approaches seem appealing: path finding and spreading activation. Path finding relies on threading a path through the knowledge space from item to item until the connections between the desired subjects is sufficiently defined. Spreading activation, on the other hand, is more of a neural network approach: by stimulating the desired subjects, spread their energy through the knowledge space according to the relationship between the knowledge items until the most highly activated knowledge items form a path between the subjects.

Experimental implementations of spreading activation. A neural network is configured and built according to the contents of an associative memory. I am inclined to use knowledge subjects as nodes, starting the process by activating (or "illuminating") nodes corresponding to the subjects of a statement or query. During processing, the network would divide the energy of each node into equal portions (could later try apportioning by the connection weight) and feed back along each connection as inputs to connected nodes for the next processing cycle. Don't know yet whether I chould hold the initial illumination high, or treat it as an impulse; inclined to try the impulse and see where the energy ends up in the network. Suspect a stable pattern will emerge, or perhaps a collection of meta-stable patterns (like cyclic patterns in a game of Life, but a whole bunch of them with differing periods superimposed). Will need to develop measures to know when the network is done. The most-illuminated nodes would be the determining factor for which knowledge subjects to extract from the associative memory and assemble in a working memory.

Another possibility, though, is to illuminate relationships initially (input lines in the previous model). Ideally we would like to get relationships back out of the network, rather than subjects, and this seems to point the way toward making the relationships (knowledge items) the nodes instead of the subjects. Inputs to other nodes would be established by commonality of subject, and activation energy would be distributed to each subject. In this latter case, energy would not be conserved unless one took into consideration the number (and possibly weight based on usefulness or relevance) of relationships involving each of the subjects. While there's no evidence I'm aware of that indicates that energy of activation (or its equivalent) is actually conserved in normal brain function, one doesn't want the energy to be amplified endlessly either.

Spreading Activation Recall, cont'd

5 May 1999

To smooth out cyclical relationships, an idea I had was for each node to release half of its accumulated "energy" in each cycle to the nodes connected to it. One nice thing about this model is that it permits near-concurrent events to influence each other more readily. One can imagine, in a sufficiently complex network, that many "thoughts" could be processed essentially at the same time, with subtle influences on one another (or not so subtle, if the "thoughts" are closely related).

Some notion of loss can also be incorporated, or a minimum threshold for the energy to be passed along, to dissipate energy in the network. With some kind of loss mechanism, the initial stimulus could be "held high" without fear that the background energy of the netowrk would be unduly raised.

Then, too, there's the idea of limiting the total energy that a node can distribute in any one cycle (or limiting the per-output energy that a node can distribute), or limiting the total energy that a node can accept as input.

Bill suggests:

Try something like Dijkstra's algorithm as a model for spreading activation.

Try a diffusion model for spreading activation. This will avoid cyclic activation patterns because energy will flow only from high reservoirs to low reservoirs.

Incorporate recurrency (time-delayed feedback).

Spreading Activation Recall, cont'd

12 May 1999

Spreading Activation Recall, cont'd

13 May 1999

What if all outputs were reservoirs feeding back into the no-longer-quite-strictly recurrent inputs?

Recurrent self-feedback is somewhat like a reservoir implementation.

If neurons are subjects then subjects will be what is recalled. If items are the neurons, then items (facts, beliefs, suppositions) will be what is recalled.

If neurons are items, the basis of connectivity is commonality of subject. The strength of a connection might reasonably be influenced by the overall usefulness of an item as well as the number of competing items (or perhaps number of similar and competing items?).

Simplified reservoir neuron
Mapping output to neuron inputs

Another simplified reservoir neuron

Derivation of Wout

The entire fanning-out process is characterized by Wout, the ratio between the final output for an item and the total output that was to have been fanned out.


Spreading Activation Recall, cont'd

22 May 1999

Could reverse so that weight & connection information applies to the input instead of the output. Not as intuitively clear, however. Each neuron's input weight for a given source would depend on how many other neurons the source neuron is connected to.

Sparsely connected feedback slab: different way of storing weights & connection information, more efficient than fully connected neuron slab since number of valid connections (usually from 3 to 30) is much less than the number of possible connections (230).

A further-simplified reservoir neuron

Alternative computation of Wout: number of common subjects for this connection, divided by the number of common subjects for all connections.

For each neuron: a set of input connections and weights or a set of output connections and weights. Weights total to one.

Routing info

Aha! Subjects as inputs, but items as nodes: two layer network (if you want to think of it that way).

Two layer flow:

  1. Initial input
  2. Subject input (input)
  3. Subject reservoir (collection)
  4. Subject output, flow restricted (output)
  5. Subject-to-item distribution (mapping to next layer)
  6. Item input (input)
  7. Item Reservoir (collection)
  8. Item output, flow restricted (output)
  9. Item-to-subject distribution (mapping to first layer)
  10. Holding tank (one cycle), then back to step 2.

Initially, make this a lossless reservoir system, testing response to a single combination of inputs. Any amount of energy can be deposited initially in the subject reservoirs.

Later, will want to be lossy because memory will constantly be stimulated with an ever-changing set of inputs. Makes possible context-sensitive recall. Very important.

Each of the two layers can be stimulated as sparsely-connected slabs. Eventually, the reservoirs can be moved directly into the associative memory itself, but there is a bit of experimentation to do before then, assuming one wishes to use the neural net simulator to do the experimentation. I suppose one question is whether it is easier to teach the existing neural network testbed about subjects and knowledge items or to teach the existing memory testbed about neural networks.


Conversion to Borland CBuilder 3

22 May 1999

Converted the Destiny code base to CBuilder 3 (from CBuilder1).

Borland has changed some of the includes included in their includes. Internal structures, notably the trees used in the map implementation, have changed their namespace (They didn't fix the bug that requires the code to explicitly mention these hidden structures, they just changed the namespace.)

Changes

Note

I was not able to convert the NNITE neural network testbed project into CBuilder3. It compiled and linked OK, but gave some runtime error 14 levels down inside Borland's library code. I have lost faith in Borland's products; they used to be first out on conformance to the C++ standard, and I was willing to tolerate a somewhat buggy development environment, but now the generated code itself seems to be fatally buggy. CBuilder 4 is even worse: the standard template library code supplied by Borland does not even compile with their own compiler!


Spreading Activation Test Interface

24 May 1999
8:20 PM

Interface elements required for spreading activation test interface:

Processing functions required:


Back to the Destiny documents

Back to the Previous Folio of the Notebook

Onward to the Next Folio of the Notebook