Destiny Project Notebook, Folio 6

Variable Pool Trees

22 November 1998

Structure is like that of the query tree, for each variable mentioned in the query. Each pool tree is then pruned to remove all nodes which do not contain a mention of its particular variable.

 

Query Tree
Pool Trees

The purpose of a pool tree is to represent the semantics of satisfying a particular variable of the query. As a result, it is a model for propagation of information from one part of the query to another.

To facilitate the propagation of information, we define pools (lists of subject ids which are candidates for satisfying the variable in question) at each node of the pool tree. Because information can flow both down a branch, then up the tree, it is necessary to define several pools at each node.

"In" Pool

The in pool holds information that is being propagated toward the leaves of its branch. This information (variable candidates) has been developed by some other branch of the pool tree and is being made available for testing by this branch. The in pool may be empty: if so, information developed at the leaves of this branch is as yet unconstrained.

"Out" Pool

The out pool holds the result of node processing which is to be propagated towards the root of the tree.

"Base" Pool

The base pool holds interim results from children of the node. During processing, the base pool of an OR node would hold the candidate subjects from one of the child expressions of the OR node, while the out pool would hold the union of the candidates from all children processed.

Because nodes are connected via the tree, the base pool of one node may be the same as the out pool of a different node. A node either owns its own pool of a particular type or refers to a pool in some other node. Among the references, some can be fixed when the pool tree is formed, while others must be varied as processing procedes.

Node Type
Pool
IN
OUT
BASE
AND
own (for efficiency, may remove candidates as they are disqualified by child branches) own (need to form intersection of child results) variable reference to child out pool
OR
own (for efficiency, may remove candidates as they are qualified by child branches) own (need to form union of child results) variable reference to child out pool
NOT
reference to parent in pool own (need to form negation of child results) reference to child out pool
LEAF
reference to parent in pool reference to item out pool reference to item out pool (not really needed)
ITEM
own (may expand a copy of the leaf pool to accommodate is-a relationships) own (need to form union of matching knowledge item subjects) not applicable

Query Item

A query item is an ordered list of subjects, at least one of which is usually a variable. [If a query item has no variables, there is no need to query the source memory for matching knowledge items, and the item is treated as a supposition.] For example <%2><is a><car> is a query item referring to variable 2.

Each subject in the query item has an out pool and, optionally, a non-empty in pool. If the in pool is not empty, it represents a constraint of matching source memory knowledge items to those with one of the candidate subjects from the pool in that position. An empty in pool indicates a lack of constraint on matches to that subject.

The out pool for each subject is the union set of subjects in that position within each knowledge item that matches the query item.


Musings: Propagating variable pool information through the tree

When the out pool for a variable in a query item is defined

  1. Find the leaf to push the information to (mapped by item number, subject number or by item number, variable number)
  2. Invoke the leaf's parent to pull information through the leaf into its base pool, then process it.
  3. If all children of the node are satisfied, invoke the node's parent to pull information through.

The above seems pretty stupid; the process should be parent driven, but because processing a query item can result in information being revised for more than one variable at a time, the process must incorporate a data push model of propagation.

Here's a notion: instead of priority queueing query items, start with queueing all the leaf nodes with their respective cost estimates. When a leaf is satisfied, the cost estimate of other children of its parent are revised. When all children of a node are satisfied, it pushes its result to its parent.

Want to prevent evaluation of an expensive NOT when its cheap underlying condition is evaluated.

Perhaps we could queue only non-leaf nodes (start with just the nodes with leaf children?) or check nodes pulled from the queue to ensure that their cost estimate hasn't increased, or both?


Algorithm for efficient resolution of general Boolean queries

  1. Estimate costs for all nodes based on best association cost of query items. The cost of a query item is estimated as the cost of the lowest-cost association that can be formed from the item. The cost of an AND node is estimated as the sum of the estimated costs of its children, plus one. The cost of an OR node is estimated as the cost of its lowest-cost child, plus one. The cost of a NOT node is estimated as the cost of its child, subtracted from the total number of knowledge subjects known to the memory, plus one.
  2. Queue all nodes based on their estimated costs. (Use a priority queue.)
  3. Pop a node from the queue.
  4. If the node is already satisfied, discard the node.
  5. If the node's current cost is greated than the cost with which it was queued, discard the node. There is guaranteed to be another entry for that node in the queue with the more recent cost estimate.
  6. Satisfy the node's unsatisfied children, lowest-cost first.
  7. Perform node processing: this is any processing done by the node after all its children are satisfied. Only one node type (NOT) needs this kind of processing: it must compute O = I - I union B (if I is defined) or O = U - B (if I is not defined), where O is the output pool, I is the input pool, B is the base pool, and U is the universe of subjects known to the memory. We want to keep this computation separate so that it is triggered based on retrieval of the NOT node (and its cost) from the queue, not based on the satisfaction of the child node, because processing the NOT might be very expensive.
  8. Tell the node's parent to process this node. This is where the main action for AND and OR nodes takes place.
  9. Tell the node's parent to perform an estimate of its own cost, which will force the evaluation of new estimates for its remaining unsatisfied children. Requeue any nodes for which the cost estimate is changed by this step.
  10. If the queue is not empty, return to step 3.

A possible data flow for the reference query

 


Toward a tree-driven query evaluation order

Each query item, at any given point in time, has a best (lowest) cost estimate and a corresponding best association. After the initial determination, which tests only associations with the concrete (non-variable) in pools of the query item, the best cost and best association can only change in response to the re-definition of an in pool associated with a variable.

The pool tree is the representation that guides the propagation of variable pool re-definitions. The question is whether the pool tree can be used to good advantage to direct the processing order for query items, or whether the present technique of processing the query items in strict order of estimated cost is more advantageous (or the two techniques may turn out to be equivalent).

In the case of an AND of LEAFs, the two approaches are equivalent. The lowest-cost child is evaluated (= lowest-cost leaf), the information obtained is used to re-evaluate the cost of the remaining children, and the lowest-cost un-processed child is evaluated, etc.

In the case of a NOT of LEAF, the two approaches are also equivalent; there is only one query item to evaluate.


Implementation notes

Don't forget:

Find common ground between AddAllSubclasses and AugmentVariableOutPools. Have the new function take an initial IdList as an input. Need a way to specify the kind of reference (subject or variable) to create.

There are hard-coded 3s, 4s, etc. for number of subjects to be considered.


Emotion and Memory Research Results
letter from Chris to Bill

1 December 1998

Here's the result of my poking about in a psychology text over the weekend.

As cited in McNeil & Rubin "The Psychology of Being Human".

Emotion

More detail on Bridges (1932) work on emotional development (differentiation of emotion) in infants. General Excitement is the first "emotion", detectable pretty much from birth. Within the first month, Distress becomes distinguishable from General Excitement. By the second month, Delight can also be distinguished from General Excitement. Thereafter, other emotions are seen to derive from either Distress or Delight: Anger from Distress at month 4, Disgust from Distress at month 5, Elation from Delight at month 6, Fear from Distress at month 7, Affection for Adults at month 10, Affection for Children as a mutation of Affection for Adults at month 15, and Jealousy from Distress, also at month 15.

Ekman (1975) studied the universality of facial expressions of certain emotions. His work established that the facial expressions of happiness, sadness, anger, disgust, surprise, and fear were reliably identified by people from various mainstream cultures in Europe, the Americas, Africa, and Asia. To discount the notion that commonality of interpretation of facial expressions of emotion were due to contamination by Western culture, he extended his study to an obscure and hitherto isolated culture in New Guinea, among the denizens of which only fear and surprise were frequently confused. While this tends to establish that humans are hardwired to produce and understand these selected emotions in a common way, it does nothing to show whether these are the only emotions so favored.

Memory

Atkinson & Shiffrin (1968) proposed a model of memory in which incoming information is routed to sensory storage, from which it is either quickly lost (= not noticed) or passed through a preliminary sorting process and placed in short term memory. Short term memory is characterized as having a capacity of 5 to 9 "items", poor recall after 12 seconds, and no recall after 20 seconds. Items are maintained in short term memory by a "refresh cycle" called rehearsal, in which the items are repeated to one's self frequently. Items in short term memory are either maintained there by rehearsal, forgotten, or consolidated (coded and linked to other items in long term memory) into long term memory. (cf. Bower (1972) on the subject of coding.) Atkinson & Shiffrin noted that certain types of images (e.g. faces) went directly into long term memory, bypassing short term memory. Another assertion of their model was that items recalled from long term memory, when they were to be used in reasoning or planning, were placed into short term memory. This last I have trouble believing, except as a side effect of such activities, because short term memory (as specified in the model) does not have sufficient capacity to conduct planning or complex reasoning.

Ebbinghaus (1850-1909) spent most of his professional life memorizing lists of meaningless "words" under various conditions to test conjectures about the learning and forgetting processes. He concluded that forgetting occurs rapidly at first, then slows greatly with the passage of time. He conducted experiments showing that slow learners and fast learners forget at the same rate. His experiments also showed that when material is well learned, its retention is not dependent on the difficulty nor complexity of the material. Three specific kinds of remembering have been identified: Recall, the recounting or reproduction of learned material, vanishes most quickly with the passage of time. Recognition, the ability to identify learned material, lingers longer. Relearning differential, in which the difference between the time required to relearn material and the time required to learn it the first time are measured, can be detected long after recall and recognition have ceased to function.

Underwood (1957) developed the idea of interference to explain why some material is learned or retained better. The basic idea is that further inputs (e.g. distractions) can interfere with the learning process. He also identified two ways in which learning can interfere with itself. In proactive interference, items learned earlier prevent learning a new item. In retroactive interference, a newly-learned item interferes with the recall of a previously-learned item.

Jenkins & Dallenbach (1924) and Ekstrand (1967) found greater retention of learned material among those who slept between learning and being tested for retention, presumably because sleep exposed them to fewer distractions, subjecting them to less interference.

McNeil & Rubin (op. cit.) cite "recent research" to suggest that after a certain point, long term memory does not decay at all over time. In some cases, the material cannot be retrieved due to interference or due to not wanting to recall it ("motivated forgetting"). Classical repression is a related form of forgetfulness, in this case induced to protect one's self concept.

Memory does not seem to be highly localized (at least in rats); large chunks of memory could be removed before learned material was noticeably degraded.

From a few minutes to one hour after experiencing an event is a consolidation phase in which the memory becomes encoded in long term memory. Shocks or blows to the head during this time can induce forgetfulness, whereas similar shocks or blows after this period are unable to induce forgetfulness.

Intelligence

Thurstone (1938) identifies 8 components of intelligence: verbal comprehension, perceptual speed, numerical ability, rote memory, word fluency, spatial visualization, inductive reasoning, and deductive reasoning.


Notes from Jackendoff (1983)

19 December 1998

Cognitive elements

Common actions

Relations between things


Specification of Paths per Jackendoff (1983)

 

Specification by end point

Specification by direction (i.e. not necessarily arriving at <PLACE>)

Specification by waypoint (intermediate point)

Specification by distance

Specification by path

A <THING> can take the place of a <PLACE> in the above specification syntax. When it does, the meaning is <PLACE OF <THING>>.


How Paths are Used per Jackendoff (1983)

19 December 1998

 

<ACTION> Meaning Usage
GO Path traversal <THING> traverses <PATH>
GOext Extent / range <THING> extends over <PATH>
ORIENT Orientation <THING> is oriented along <PATH>

Words having to do with paths, and their relation to Jackendoff's path specifications

Here, I try to fit common words associated with places and paths into Jackendoff's scheme of things, but it is apparent that several path concepts, especially that of "direction", must be extended to accommodate the range of usage.

Places: OP <THING> or OP <PLACE>

at <THING> the location of the <THING>
in / inside (of) <THING> inside the extent of the <THING>
on <THING> on the boundary of the extent of the <THING>
outside (of) <THING> outside the extent of the <THING>
under <THING> a <PLACE> beneath the <THING>
above <THING> a <PLACE> above the <THING>
near <PLACE> defines a <PLACE> in the intermediate vicinity of another <PLACE>

Paths: OP <PLACE> or OP <PATH>

up / down direction or via ("up / down the road" really means "via the road", whereas "up / down the hill" really means "toward the top / base of the hill".
toward <PLACE> / away from <PLACE> direction
through <PLACE> via
east / west / north / south direction (also, "toward the East")
to <PLACE> final terminus
from <PLACE> initial terminus
into <PLACE> equivalent to TO <IN <PLACE>>
out from <PLACE> initial terminus
out of <PLACE> via ("threw it out of the window")
around <PLACE> Defines either a <PLACE> or a <PATH> in intermediate proximity to a another <PLACE>. Often used in a via <PLACE> or along <PATH> specification.
off (of) <PLACE> initial terminus
on (to) <PLACE> final terminus
forward / backward direction with respect to orientation
ahead / behind direction with respect to orientation
left /right direction with respect to orientation
upward / downward direction with respect to local vertical
along <PATH> along ("along the path", but also "along the cliff"; in the latter the <THING> defines an <EXTENT> which defines a <PATH>)
past <PLACE> via
between <PLACE> and <PLACE> defines an extent <PATH> from one <PLACE> to another <PLACE>
across <PATH> from <PLACE> direction with respect to both a location and a path ("across the road from the barber's shop")

As before, a <THING> in the place of a <PLACE> carries the meaning AT <THING>.

Direction

19 December 1998

Jackendoff has direction with respect to a place, but there are other direction specifiers.

Jackendoff has "upward" = "toward the sky" to make it with respect to a thing, but "upward" and other planetary directions are really more akin to directions with respect to an orientation than to directions with respect to things. Only in the last two centuries have we been able to define up and down with respect to a place (local center of gravity), so our minds almost certainly use the planetary reference as a reference orientation.

Distance, direction, and along

Distance is preferentially assumed to be on the along path rather than exactly in the specified direction when both a direction and an along path are specified. The sentence "Y is six miles north of X along highway 101" means that Y is six miles from X when the indicated path is taken, not that the lattitude of Y is six miles north of the lattitude of X.

Azimuth + Elevation, Heading + Altitude

Both of these pairs are directional specifiers with respect to the planetary reference, but at a higher resolution than up, down, east, etc. Azimuth, of course, is an angular measure from local north in a plane parallel to a plane tangent to the mean local surface; elevation being an angular measure above or below that plane. Heading and altitude assume travel at near-constant altitude; heading is equivalent to azimuth, and altitude specifies a <PLACE> described as a thin shell at a certain altitude from the nominal planetary surface.

The "o'clock" directions, as in "bogie at one o'clock" are a somewhat coarser direction with respect to some sensor plane (such as that of a radar display).

"Counterclockwise" and "clockwise" are also directions. They are especially suited to "around" paths, but can also be used as in "The boomerang sped in a counterclockwise arc toward the hapless victim's neck."

This brings up a type of path that Jackendoff didn't mention: the shape. His analysis shows "straight" as an undifferentiated <PROPERTY> in the phrase "straight down the road," but he misses the broader class of "shapes describing paths" to which it belongs. Other members of this class include "arc", "circle", "spiral", and "el" (L), to mention a few.

To this we might have to add changes in orientation as components of a <PATH>, as in "North on Highway 85 to El Camino, El Camino east to Fair Oaks, turn left on Fair Oaks, proceed one block, and park."

Also, it is clear (as in the last example) that Jackendoff's path elements can be concatenated to describe more complex or more detailed paths.


Back to the Destiny documents

Back to the Previous Folio of the Notebook

Onward to the Next Folio of the Notebook