A Learning Circuit
On this page I'll discuss the basic circuit I developed to experiment with robot learning. While the circuit shown here uses analog circuits common to BEAM robotics, it could easily be done using digital elements or a hybrid of digital and analog. This circuit will learn which one of four possible outcomes it should choose when it is activated. That's enough choices to illustrate the learning principle, yet the circuit will fit on a double breadboard and not be too expensive to make. Later I'll show how several of these simple learning circuits can be combined to learn more complicated behaviours.
One important element in the circuit is the learning selector. A pseudo-random
selector can be made by chaining together several Pulse Delay Circuits (Nv neurons) in
a loop. Each neuron is on for a short period of time and as it turns off, the next
neuron switches on. This provides a single pulse circulating around the loop of neurons.
When the learning circuit is triggered, the currently active neuron in the loop
is used to select the agent for that trial.
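The selection step can be sketched in a few lines of Python. This is only a behavioural model, not a description of the hardware: it assumes the trigger arrives at a moment unrelated to the loop's phase, so each neuron's chance of being caught "on" is proportional to its on-time. The function names and the example delays are made up for illustration.

```python
import random

def select_output(delays, rng):
    """Return the index of the neuron that is active when the trigger arrives.

    delays: on-time of each neuron in the loop (any consistent time unit).
    rng:    a random.Random instance supplying the arbitrary trigger phase.
    """
    period = sum(delays)
    t = rng.uniform(0.0, period)  # assumed: trigger phase is uniformly random
    elapsed = 0.0
    for index, delay in enumerate(delays):
        elapsed += delay
        if t < elapsed:
            return index
    return len(delays) - 1  # guard for the floating-point edge case t == period

def selection_probabilities(delays):
    """Fraction of each loop period that each neuron spends switched on."""
    period = sum(delays)
    return [d / period for d in delays]

if __name__ == "__main__":
    rng = random.Random(0)
    delays = [1.0, 1.0, 1.0, 5.0]  # hypothetical on-times
    print(selection_probabilities(delays))
    print([select_output(delays, rng) for _ in range(10)])
```

With equal delays every output is equally likely; lengthening one neuron's delay raises its share of the loop period and therefore its chance of being selected.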
A second essential element in the circuit is the short-term memory. When the
learning circuit is triggered, it must "remember" which of the learning selector
outputs was "on" until after the learning cycle is complete.
A simple short-term memory element can be made by connecting the output of a tri-state
inverter to the input of a neutrally-biased Nv neuron. As long as the tri-state
inverter is enabled, the output of the circuit will be exactly the same as the input,
with a delay of a few nanoseconds. When the tri-state inverter is disabled, its
output goes to a high-impedance state, effectively disconnecting it. The biasing
resistors (R6 & R7) will pull the voltage at the Schmitt inverter input to a point
midway between its high and low thresholds, so the inverter cannot change state. In
other words, the inverter will remain indefinitely in whatever condition it was in
when the enable line was turned off. When one of these memory elements is connected
to each of the output lines from the learning selector, the status of the selector
can be "remembered" by pulsing the enable lines of all the tri-state inverters at
the same time.
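Behaviourally, each memory element acts like a transparent latch: it follows its input while enabled and holds its last state while disabled. The Python sketch below models only that logic (no voltages or timing). The class and function names are hypothetical, and the shared enable is modelled as a single pulse applied to every cell.

```python
class MemoryElement:
    """Logic-level model of a tri-state inverter feeding a neutrally biased
    Schmitt inverter."""

    def __init__(self, initial=False):
        self.output = initial

    def update(self, data_in, enabled):
        if enabled:
            # Two inversions in series, so the output follows the input.
            self.output = bool(data_in)
        # When disabled, the tri-state output floats and the bias resistors
        # hold the Schmitt input between its thresholds: the state is kept.
        return self.output

def capture(selector_outputs, cells):
    """Pulse the shared enable line: latch every selector output, then hold."""
    for cell, level in zip(cells, selector_outputs):
        cell.update(level, enabled=True)
    return [cell.output for cell in cells]

if __name__ == "__main__":
    cells = [MemoryElement() for _ in range(4)]
    print(capture([False, True, False, False], cells))
    # The selector keeps circulating, but the disabled cells ignore it.
    print([c.update(True, enabled=False) for c in cells])
```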
A third component of the learning circuit is the long-term memory which is
used to store the biasing voltages (Vr) of the Nv neurons in the learning selector.
The simplest way to "remember" a voltage is to use a large capacitor (C3). However, if
the capacitor is connected directly to the neuron, the neuron will alter the voltage
on the capacitor each time it fires and will gradually alter the "memory". To prevent
this unwanted feedback, an LM324 op-amp is used to isolate the capacitor. The circuit at
the left shows how the charge (voltage) on C3 can be changed by sending short pulses
into the "Fail" and "Succeed" lines. A high-going pulse on the succeed line increases
the voltage slightly, while a low-going pulse on the fail line decreases the voltage.
Resistors R3 and R4 determine how much the voltage changes with each pulse. This
arrangement is not perfect. The op-amp has an input impedance of around 400 megohms
and will act as a 400 megohm resistor connected to Vcc. This is still much better than
connecting C3 directly to the Nv neuron in the selector.
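To get a feel for the numbers, the Python sketch below treats each pulse as a simple RC charge (or discharge) of C3 through one resistor, and treats the op-amp leakage as a single resistor across the capacitor. Both are simplifying assumptions, and every component value in the example is hypothetical, since the actual values are not given in this text.

```python
import math

def pulse_step(v_cap, v_drive, r_ohms, c_farads, pulse_s):
    """Capacitor voltage after being driven toward v_drive through r_ohms
    for pulse_s seconds (ideal first-order RC response)."""
    return v_drive + (v_cap - v_drive) * math.exp(-pulse_s / (r_ohms * c_farads))

def leak_time_constant(r_leak_ohms, c_farads):
    """Time constant, in seconds, of the memory drifting through the leakage path."""
    return r_leak_ohms * c_farads

if __name__ == "__main__":
    # Hypothetical values: 5 V supply, 1 megohm resistor, 100 uF capacitor, 1 ms pulse.
    v = 2.0
    print(pulse_step(v, 5.0, 1e6, 100e-6, 1e-3))  # "succeed" pulse nudges v up
    print(pulse_step(v, 0.0, 1e6, 100e-6, 1e-3))  # "fail" pulse nudges v down
    print(leak_time_constant(400e6, 100e-6) / 3600.0, "hours")
```

Under these assumed values each pulse moves the stored voltage by only a few tens of microvolts, and the 400 megohm leakage gives a time constant of roughly 11 hours. A larger resistor or a shorter pulse gives smaller learning steps.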
The Complete Learning Circuit
The learning circuit is assembled from the sub-circuits discussed above, plus a few extra elements. When triggered it will select one of 4 outputs which can be used to activate various agents. After several successes, the circuit will only select the output that led to the previous successes.
The learning selector is in the upper left area of the diagram. It contains 4 variable delay neurons connected in a loop. The diodes (D3) ensure that the circuit starts with only one output active at a time. The optional LEDs don't contribute to the actual functioning of the circuit but they are very useful for debugging. They also show the changing delays as the circuit learns. A single current-limiting resistor is used for the 4 LEDs and must be sized for the power supply voltage and the LED current rating.
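Sizing that resistor is ordinary Ohm's-law arithmetic. The Python sketch below assumes that only one LED conducts at a time (since only one selector output is active), so the shared resistor carries a single LED's current, and it ignores any voltage drop in the driving gate. The supply voltage, forward voltage and current in the example are hypothetical.

```python
def led_resistor_ohms(v_supply, v_forward, i_led_amps):
    """Series resistance that limits one LED to i_led_amps."""
    if v_supply <= v_forward:
        raise ValueError("supply voltage must exceed the LED forward voltage")
    if i_led_amps <= 0:
        raise ValueError("LED current must be positive")
    return (v_supply - v_forward) / i_led_amps

if __name__ == "__main__":
    # Hypothetical: 5 V supply, 2 V forward drop, 10 mA LED current.
    print(led_resistor_ohms(5.0, 2.0, 0.010))  # about 300 ohms
```

In practice you would round up to the next standard resistor value so that the LED current stays at or below its rating.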
The short-term memory elements are shown in the upper right, one connected to each of the learning selector neurons. The 74HC240 tri-state inverter in the memory neurons uses a single enable line to control all 4 devices.
The 4 LEDs in the lower right of the diagram show which output has been selected. They can be used for debugging and also for "teaching" the circuit. Choose which LED is the "goal" and each time it lights up, press the "success" button (see below). Eventually the circuit will learn to light only the "goal" LED.
Four identical long-term learning memory cells are shown in the lower left part of the diagram. One of these is exploded to show the details. Each memory cell contains a long-term memory element as described above, plus two neurons and an inverter to send fail/succeed pulses to the memory. A single pair of fail/succeed lines is connected to all 4 memory units. However, the neurons that create the pulses which change the cell's memory are normally disabled. Only the memory cell that is connected to the active output is enabled, so it is the only one that can receive a fail or succeed signal.
The behaviour of this long-term memory cell suits the learning mechanism particularly well. As the circuit experiences a few successes, the charge on C3 rises slowly. At first this has only a very slight effect on the learning selector. Although the charge is steadily increasing, the less successful agents have almost as much chance of being selected. Finally, however, the charge on C3 begins to approach the threshold of the selector neuron. This lengthens the neuron's delay so much that the successful agent becomes far more likely to be chosen.
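The slow-then-sudden effect follows from the shape of an RC decay. As a rough sketch, assume the neuron's input node starts at some voltage above the falling threshold and decays exponentially toward the bias voltage stored on C3. The time to cross the threshold is then tau * ln((v_start - v_bias) / (v_threshold - v_bias)), which grows without limit as the bias nears the threshold. This idealised model and all the voltages below are assumptions, not measurements from the circuit.

```python
import math

def nv_delay(v_bias, v_start, v_threshold, tau):
    """Time for a node decaying from v_start toward v_bias (time constant tau)
    to fall to v_threshold. Requires v_bias < v_threshold < v_start."""
    if not v_bias < v_threshold < v_start:
        raise ValueError("need v_bias < v_threshold < v_start")
    return tau * math.log((v_start - v_bias) / (v_threshold - v_bias))

if __name__ == "__main__":
    # Hypothetical: node starts at 5 V, falling threshold at 2 V, tau = 1.
    for v_bias in (0.0, 1.0, 1.5, 1.9, 1.99):
        print(v_bias, round(nv_delay(v_bias, 5.0, 2.0, 1.0), 3))
```

In this model the first volt of bias barely changes the delay, while the last tenth of a volt below the threshold changes it a great deal, which matches the behaviour described above.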
Special care needs to be taken with this circuit. Bypass capacitors (between ground and Vcc) must be located close to the power pins of every IC. As the charge on C3 approaches the falling threshold of the selector neuron, the neuron becomes extremely sensitive to electronic noise caused when other neurons in the circuit fire. The 0.1 uF bypass capacitors (shown in the second circuit diagram) reduce the effect of this noise.
The following table lists the component values I used in my test circuit. These can be obtained from most electronic supply firms. Solarbotics does not carry many of the components I used so I have provided a list of substitutes that ought to work. At the bottom of the page I provide some formulae for calculating other values in case you don't have exactly what is listed here.
Trigger and Feedback Functions
To experiment with the learning circuit a simple interface circuit is useful. This will provide the trigger, fail, and succeed inputs to the learning circuit and can also start an agent after it has been selected.
In the upper left of the diagram is the "start" input. A high pulse at this input will start a learning attempt. For experimenting, a pushbutton can be used to provide the pulse. The start pulse fires Nv1, which limits the pulse duration to a millisecond or so. Nv1 also triggers Nv2, which is a filter neuron. This neuron is held active through diodes D6 until the learning cycle has finished. During this period, any extra "start" signals received by Nv1 will be blocked. The input labelled "wait" can also be used to block a "start" signal. This is useful if a robotic device has completed a learning attempt but has to return to a starting position before the next attempt can begin.
Nv3 fires a few nanoseconds after Nv2 and does three things. It sends the "trigger" signal to input (A) in the learning circuit, it sends a signal that will start the selected agent operating, and it triggers Nv4 which has a very long delay. Nv4 must remain active at least as long as it will take to complete the learning attempt. When Nv4 times out, it triggers Nv5 which sends a pulse to the "fail" input (B) of the learning circuit. This is the default condition -- if the attempt didn't succeed by the time Nv4 times out, it's assumed to have failed. The LED connected to the output of Nv4 indicates that an attempt is in progress. When the LED goes out, the attempt is finished and another attempt can be made at any time.
The input at the middle left signals a successful attempt. It can be a high pulse, or a "teacher" can press a button any time an attempt succeeds. The pulse fires Nv6 which sends a pulse to the "succeed" input (C) of the learning circuit. Nv6 also cuts short the delay on Nv4 through diode D4 and resistor R14. At the same time, it holds the "fail" neuron (Nv5) in an "off" condition through diode D5. This prevents Nv4 from triggering Nv5, so the "fail" signal is automatically blocked by a "succeed" signal. R14 is used to introduce a tiny delay before Nv4 is turned off which ensures that Nv5 is securely turned off before Nv4 can trigger it.
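The fail-by-default logic can be summarised in a small Python sketch. It models only the decision, not the neurons themselves: an attempt fails when the Nv4 timeout expires unless a success signal arrives first, in which case the attempt ends early and the fail pulse is suppressed. The function name is hypothetical, and ignoring a success signal that arrives after the timeout is an assumption of this sketch.

```python
def attempt_outcome(timeout_s, success_time_s=None):
    """Return (signal, end_time) for one learning attempt.

    timeout_s:      how long Nv4 stays active if nothing interrupts it.
    success_time_s: when the "succeed" input fired, or None if it never did.
    """
    if success_time_s is not None and 0.0 <= success_time_s < timeout_s:
        # Success cuts the Nv4 delay short and blocks the fail pulse.
        return ("succeed", success_time_s)
    # Default condition: Nv4 times out and Nv5 sends the fail pulse.
    return ("fail", timeout_s)

if __name__ == "__main__":
    print(attempt_outcome(10.0, success_time_s=3.0))
    print(attempt_outcome(10.0))
```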
The following chart shows the values of the various components used in the original test circuit, with Solarbotics substitutes.
Component selection
The tables give the values of the components I used in my test circuit, plus some substitute values available from Solarbotics. The following formulae will let you calculate other values which you may already have.
Circuit operation
"Teaching" the circuit is the best way to test it and to see how it works. Power it up and press the "start" button once to reset all the neurons. Then begin helping the circuit to learn.
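Before building anything, it can help to run a teaching session in software. The Python sketch below is a toy model under several assumptions: selection probability is proportional to each neuron's delay, delay follows an idealised RC formula, each succeed or fail pulse moves the stored bias by a fixed step, and a teacher rewards only the "goal" output. All constants and names are hypothetical and are not taken from the test circuit.

```python
import math
import random

V_START, V_THRESHOLD = 5.0, 2.0  # hypothetical node start and falling threshold (volts)
STEP = 0.05                      # hypothetical bias change per pulse (volts)
V_MIN, V_MAX = 0.0, 1.95         # keep the bias below the threshold

def delay(v_bias):
    """Idealised neuron delay (in units of its RC time constant)."""
    return math.log((V_START - v_bias) / (V_THRESHOLD - v_bias))

def teach(goal, trials, rng, n_outputs=4, v_initial=1.0):
    """Run `trials` attempts, rewarding only `goal`; return the final biases."""
    bias = [v_initial] * n_outputs
    for _ in range(trials):
        weights = [delay(v) for v in bias]
        chosen = rng.choices(range(n_outputs), weights=weights)[0]
        if chosen == goal:
            bias[chosen] = min(V_MAX, bias[chosen] + STEP)  # "succeed" pulse
        else:
            bias[chosen] = max(V_MIN, bias[chosen] - STEP)  # default "fail" pulse
    return bias

if __name__ == "__main__":
    final = teach(goal=2, trials=300, rng=random.Random(1))
    weights = [delay(v) for v in final]
    print([round(w / sum(weights), 2) for w in weights])  # selection probabilities
```

In this toy model the goal output's share of the loop period grows as its bias climbs toward the threshold, while the other outputs shrink. The real circuit's learning rate depends on its actual component values.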