Learning Robots

A Learning Circuit



On this page I'll discuss the basic circuit I developed to experiment with robot learning. While the circuit shown here uses analog circuits common to BEAM robotics, it could easily be done using digital elements or a hybrid of digital and analog. This circuit will learn which one of four possible outcomes it should choose when it is activated. That's enough choices to illustrate the learning principle, yet the circuit will fit on a double breadboard and not be too expensive to make. Later I'll show how several of these simple learning circuits can be combined to learn more complicated behaviours.

One important element in the circuit is the learning selector. A pseudo-random selector can be made by chaining together several Pulse Delay Circuits (Nv neurons) in a loop. Each neuron is on for a short period of time and as it turns off, the next neuron switches on. This provides a single pulse circulating around the loop of neurons. When the learning circuit is triggered, the currently active neuron in the loop is used to select the agent for that trial.

Figure: Variable delay neuron.

A loop of standard Nv neurons can select randomly from several choices but can't provide a biased selection. To allow successful outcomes to influence the selection process, the neuron must have a variable delay that can be adjusted by other circuits. One way of accomplishing this is shown in the figure. Instead of grounding resistor R5, the resistor is connected to a variable voltage source (Vr). When Vr=0, the neuron has a standard delay. As Vr rises above zero the delay increases until Vr reaches the falling threshold voltage of the inverter, at which point the neuron remains turned on permanently. Several variable neurons of this type connected in a loop, each with a separately-controlled Vr connection, will give a circuit that can make either random or biased choices.
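
To get a feel for how Vr biases the loop, here is a minimal Python sketch. It assumes an idealized neuron whose input jumps to Vcc when it fires and then decays toward Vr through R5 and C4, switching off at the inverter's falling threshold. The 6 V supply and 2 V threshold are illustrative guesses rather than measured values, and the function names are made up for this example. If the loop is sampled at a random moment, the chance of catching a given neuron is roughly its share of the total loop time.

```python
import math

VCC = 6.0      # supply voltage in volts (assumed)
V_FALL = 2.0   # falling threshold of the Schmitt inverter in volts (assumed)

def nv_delay(vr, r_ohms=220e3, c_farads=0.22e-6):
    """Idealized on-time in seconds of one neuron whose resistor returns to vr."""
    if vr >= V_FALL:
        return math.inf  # the input never decays below the threshold: stays on
    return r_ohms * c_farads * math.log((VCC - vr) / (V_FALL - vr))

def selection_chances(vr_values):
    """Chance of finding each neuron active when the loop is sampled at random."""
    delays = [nv_delay(vr) for vr in vr_values]
    if any(math.isinf(d) for d in delays):
        raise ValueError("a neuron is latched on; the pulse no longer circulates")
    total = sum(delays)
    return [d / total for d in delays]

if __name__ == "__main__":
    # Raise Vr on the first neuron only and watch its share of the loop grow.
    for vr in (0.0, 0.5, 1.0, 1.5, 1.9):
        chances = selection_chances([vr, 0.0, 0.0, 0.0])
        print(f"Vr = {vr:.1f} V  delay = {nv_delay(vr) * 1000:6.1f} ms  "
              f"chance = {chances[0]:.2f}")
```

In this model the delay grows slowly at first and then very steeply as Vr nears the threshold, which matches the behaviour described above. A real inverter's thresholds vary from chip to chip, so treat the numbers as rough.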

A second essential element in the circuit is the short-term memory. When the learning circuit is triggered, it must "remember" which of the learning selector outputs was "on" until after the learning cycle is complete.

Figure: Short-term memory.

A simple short-term memory element can be made by connecting the output of a tri-state inverter to the input of a neutrally-biased Nv neuron. As long as the tri-state inverter is enabled, the output of the circuit will be exactly the same as the input, with a delay of a few nanoseconds. When the tri-state inverter is disabled, its output goes to a high-impedance state, effectively disconnecting it. The biasing resistors (R6 & R7) will pull the voltage at the Schmitt inverter input to a point midway between its high and low thresholds, so the inverter cannot change state (see Notes). In other words, the inverter will remain indefinitely in whatever condition it was in when the enable line was turned off. When one of these memory elements is connected to each of the output lines from the learning selector, the status of the selector can be "remembered" by pulsing the enable lines of all the tri-state inverters at once.
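
The follow-then-hold behaviour described above can be sketched in a few lines of Python. This is a purely logical model with made-up names (`ShortTermMemory`, `capture`); it ignores propagation delay and analog effects, and only shows what the shared enable line does.

```python
class ShortTermMemory:
    """Behavioural model of one memory element: follow while enabled, hold otherwise."""

    def __init__(self, initial=False):
        self.output = initial

    def update(self, selector_output, enabled):
        if enabled:
            # The tri-state inverter is driving, so the output tracks the input.
            self.output = selector_output
        # When disabled, the bias resistors leave the Schmitt inverter where it was.
        return self.output

def capture(selector_outputs, memories):
    """Pulse the shared enable line once, then read back the held states."""
    for mem, level in zip(memories, selector_outputs):
        mem.update(level, enabled=True)
    # With enable off, whatever the selector does next is ignored.
    return [mem.update(False, enabled=False) for mem in memories]

if __name__ == "__main__":
    cells = [ShortTermMemory() for _ in range(4)]
    print(capture([False, True, False, False], cells))
```

The four cells share one enable call here for the same reason the 74HC240 uses a single enable line: all selector outputs have to be frozen at the same instant.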

A third component of the learning circuit is the long-term memory which is used to store the biasing voltages (Vr) of the Nv neurons in the learning selector.

Figure: Long-term memory.

The simplest way to "remember" a voltage is to use a large capacitor (C3). However, if the capacitor is connected directly to the neuron, the neuron will alter the voltage on the capacitor each time it fires and will gradually alter the "memory". To prevent this unwanted feedback, an LM324 op-amp is used to isolate the capacitor. The circuit in the figure shows how the charge (voltage) on C3 can be changed by sending short pulses into the "Fail" and "Succeed" lines. A high-going pulse on the succeed line increases the voltage slightly, while a low-going pulse on the fail line decreases the voltage. Resistors R3 and R4 determine how much the voltage changes with each pulse. This arrangement is not perfect. The op-amp has an input impedance of around 400 megohms and will act as a 400 Meg resistor connected to Vcc. This is still much better than connecting C3 directly to the Nv neuron in the selector.
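
To see how slowly the memory drifts, the sketch below takes the description above at face value and treats the op-amp input as a 400 Meg resistor from C3 to Vcc. The 6 V supply is an assumption, the 100 uF value for C3 is the one from my test circuit, and `drift_voltage` is just an illustrative name.

```python
import math

def drift_voltage(v_start, seconds, vcc=6.0, r_leak=400e6, c3=100e-6):
    """Voltage on C3 after `seconds`, pulled toward vcc through r_leak."""
    tau = r_leak * c3  # RC time constant in seconds
    return vcc - (vcc - v_start) * math.exp(-seconds / tau)

if __name__ == "__main__":
    print(f"time constant: {400e6 * 100e-6 / 3600:.1f} hours")
    for minutes in (1, 10, 60):
        print(f"after {minutes:3d} min: {drift_voltage(0.0, minutes * 60):.3f} V")
```

With these values the time constant is 400 Meg x 100 uF = 40,000 seconds, or about 11 hours, so the drift over a teaching session of a few minutes is small. A smaller C3 shortens that time in direct proportion.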

The Complete Learning Circuit

The learning circuit is assembled from the sub-circuits discussed above, plus a few extra elements. When triggered it will select one of 4 outputs which can be used to activate various agents. After several successes, the circuit will only select the output that led to the previous successes.

Figure: Learning circuit.

The learning selector is in the upper left area of the diagram. It contains 4 variable delay neurons connected in a loop. The diodes (D3) ensure that the circuit starts with only one output active at a time. The optional LED's don't contribute to the actual functioning of the circuit but they are very useful for debugging. They also show the changing delays as the circuit learns. A single current-limiting resistor is used for the 4 LED's and must be sized for the power supply voltage and the LED current rating.
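
As a rough worked example of sizing that resistor: only one neuron in the loop is on at a time, so the shared resistor should only ever carry one LED's current. The sketch below uses the usual series-resistor calculation. The 6 V supply and 2.0 V LED forward voltage are assumptions (check your LED's data sheet), the 10 mA figure is the LED rating from my test circuit, and the voltage lost in the inverter output is ignored.

```python
def led_resistor_ohms(vcc, led_forward_v, led_current_a):
    """Series resistance that sets the LED current, ignoring the driver's output drop."""
    if vcc <= led_forward_v:
        raise ValueError("supply voltage must exceed the LED forward voltage")
    return (vcc - led_forward_v) / led_current_a

if __name__ == "__main__":
    # (6.0 V - 2.0 V) / 10 mA
    print(f"{led_resistor_ohms(6.0, 2.0, 0.010):.0f} ohms")
```

Pick the next standard value above the result; a slightly larger resistor only dims the LED a little.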

The short-term memory elements are shown in the upper right, one connected to each of the learning selector neurons. The 74HC240 tri-state inverter in the memory neurons uses a single enable line to control all 4 devices.

The 4 LED's in the lower right of the diagram show which output has been selected. They can be used for debugging and also for "teaching" the circuit. Choose which LED is the "goal" and each time it lights up, press the "success" button (see below). Eventually the circuit will learn to light only the "goal" LED.

Four identical long-term learning memory cells are shown in the lower left part of the diagram. One of these is exploded to show the details. Each memory cell contains a long-term memory element as described above, plus two neurons and an inverter to send fail/succeed pulses to the memory. A single pair of fail/succeed lines is connected to all 4 memory units. However, the neurons that create the pulses which change the cell's memory are normally disabled. Only the memory cell that is connected to the active output is enabled, so it is the only one that can receive a fail or succeed signal.

The behaviour of this arrangement for the long-term memory cell is particularly suitable to the learning mechanism. As the circuit experiences a few successes, the charge on C3 rises slowly. At first this has only a very slight effect on the learning selector. Although the charge is steadily increasing, the less successful agents have almost as much chance of being selected. Finally, however, the charge on C3 begins to approach the threshold of the selector neuron. This increases the delay to the point where it greatly increases the chances of the successful agent being chosen.

Special care needs to be taken with this circuit. Bypass capacitors (between ground and Vcc) must be located near all the IC power lines. As the charge on C3 approaches the falling threshold of the selector neuron, the neuron becomes extremely sensitive to electronic noise caused when other neurons in the circuit fire. The 0.1 uF bypass capacitors (shown in the second circuit diagram) reduce the effect of this noise.

The following table lists the component values I used in my test circuit. These can be obtained from most electronic supply firms. Solarbotics does not carry many of the components I used so I have provided a list of substitutes that ought to work. At the bottom of the page I provide some formulae for calculating other values in case you don't have exactly what is listed here.

Component    Qty   Test value   Solarbotics part
R1           4     120 k        680 k
R2           4     120 k        680 k
R3           4     4.7 k        1.0 k
R4           4     22 k         2.2 k
R5           4     220 k        220 k
R6           4     470 k        470 k
R7           4     360 k        360 k
C1           4     0.22 uF      0.10 uF
C2           4     0.22 uF      0.10 uF
C3           4     100 uF       1000 uF
C4           4     0.22 uF      0.22 uF
C5           4     0.10 uF      0.10 uF
Bypass Cap   6     0.10 uF      0.10 uF
D1-D3        10    1N4148       1N914
LED          8     10 mA        T1-3/4 Ultra-Bright
IC           2     74HC14       74AC14
IC           3     74HC240      74AC240
IC           1     LM324        n/a

Trigger and Feedback Functions

To experiment with the learning circuit a simple interface circuit is useful. This will provide the trigger, fail, and succeed inputs to the learning circuit and can also start an agent after it has been selected.

Figure: Learning circuit - feedback & trigger.

In the upper left of the diagram is the "start" input. A high pulse at this input will start a learning attempt. For experimenting, a pushbutton can be used to provide the pulse. The start pulse fires Nv1, which limits the pulse duration to a millisecond or so. Nv1 also triggers Nv2, which is a filter neuron. This neuron is held active through diodes D6 until the learning cycle has finished. During this period, any extra "start" signals received by Nv1 will be blocked. The input labelled "wait" can also be used to block a "start" signal. This is useful if a robotic device has completed a learning attempt but has to return to a starting position before the next attempt can begin.

Nv3 fires a few nanoseconds after Nv2 and does three things. It sends the "trigger" signal to input (A) in the learning circuit, it sends a signal that will start the selected agent operating, and it triggers Nv4 which has a very long delay. Nv4 must remain active at least as long as it will take to complete the learning attempt. When Nv4 times out, it triggers Nv5 which sends a pulse to the "fail" input (B) of the learning circuit. This is the default condition -- if the attempt didn't succeed by the time Nv4 times out, it's assumed to have failed. The LED connected to the output of Nv4 indicates that an attempt is in progress. When the LED goes out, the attempt is finished and another attempt can be made at any time.

The input at the middle left signals a successful attempt. It can be a high pulse, or a "teacher" can press a button any time an attempt succeeds. The pulse fires Nv6 which sends a pulse to the "succeed" input (C) of the learning circuit. Nv6 also cuts short the delay on Nv4 through diode D4 and resistor R14. At the same time, it holds the "fail" neuron (Nv5) in an "off" condition through diode D5. This prevents Nv4 from triggering Nv5, so the "fail" signal is automatically blocked by a "succeed" signal. R14 is used to introduce a tiny delay before Nv4 is turned off which ensures that Nv5 is securely turned off before Nv4 can trigger it.
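
The logic of the interface (ignoring the analog details) can be summarized as a small state machine. The Python sketch below is only a behavioural model with made-up names. It assumes a 3 second timeout for Nv4, and it assumes that a "success" arriving when no attempt is in progress is simply ignored, which the real circuit may or may not do.

```python
class TriggerInterface:
    """Behavioural model: start an attempt, then report 'succeed' or a default 'fail'."""

    def __init__(self, timeout_s=3.0):
        self.timeout_s = timeout_s   # stands in for the long delay of Nv4 (assumed)
        self.deadline = None         # None means no attempt is in progress

    def start(self, now, wait=False):
        """Begin an attempt unless one is running or the 'wait' input is active."""
        if wait or self.deadline is not None:
            return False             # extra start signals are blocked
        self.deadline = now + self.timeout_s
        return True

    def success(self, now):
        """A success during an attempt ends it early and blocks the fail signal."""
        if self.deadline is None or now >= self.deadline:
            return None              # assumption: ignored outside an attempt
        self.deadline = None
        return "succeed"

    def tick(self, now):
        """Call periodically; returns 'fail' once the attempt has timed out."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            return "fail"
        return None

if __name__ == "__main__":
    ui = TriggerInterface()
    ui.start(0.0)
    print(ui.success(1.2))   # teacher pressed the button in time
    ui.start(5.0)
    print(ui.tick(8.5))      # nobody pressed it, so the attempt fails by default
```

The key design point carried over from the circuit is that failure is the default: nothing has to detect a failure, it is just what happens when the timer runs out without a success.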

The following chart shows the values of the various components used in the original test circuit, with Solarbotics substitutes.

Component    Qty   Test value       Solarbotics part
R8           1     10 k             4.7 k
R9           1     100 k            100 k
R10          1     1.0 k            1.0 k
R11          1     3.0 Meg          3.6 Meg
R12          1     10 k             4.7 k
R13          1     10 k             4.7 k
R14          1     220 ohm          470 ohm
C6           1     0.10 uF          0.10 uF
C7           1     0.10 uF          0.10 uF
C8           1     0.10 uF          0.10 uF
C9           1     1.0 uF           1.0 uF
C10          1     0.10 uF          0.10 uF
C11          1     0.10 uF          0.10 uF
Bypass Cap   1     0.10 uF          0.10 uF
D4-D6        4     1N4148           1N914
LED          1     10 mA            T1-3/4 Ultra-Bright
IC           1     74HC14           74AC14
Switch       2     SPST momentary   SWT5

Component selection

The tables give the values of the components I used in my test circuit, plus some substitute values available from Solarbotics. The following formulae will let you calculate other values which you may already have.

  • The most important relationships concern the long-term memory components. While you can play around with the values, there is one very important restriction: C3 must be large! Use an electrolytic cap at least 47 uF in size.
                  R3 x C3
       N = 0.60 x -------
                  R1 x C1
    
       Where N is the approximate number of successes required to finish learning.
       
    When you're "teaching" the circuit, if you make N too high you will spend a lot of time pressing the "success" button. If you make N too low, the circuit becomes a very complicated switch. I suggest a starting value of between 6 and 12. Ideally all 4 long-term memory cells will have identical values. But if you're short a couple of components, the important thing is to get the value of N approximately the same in each cell.

  • It's a good idea to require at least two failures to "undo" a success. That way if you don't press the "success" button in time, the resulting fail signal won't completely undo the last success. There are two ways to work this out. The first is easiest; the second one is useful if you don't have 8 sets of identical components.
       R2 = R1
       C2 = C1
       R4 >= 2 x R3
    
       -- or --
    
                 R2 x C2
       R4 >= 2 x ------- x R3
                 R1 x C1
       

  • The values of R5 and C4 are very flexible. The idea is to get a loop that "runs" fairly quickly; otherwise it tends to lose its randomness. On the other hand, for testing purposes the loop should not be too fast or the optional LED's won't stay lit long enough to see. As a rough guide, use
       R5 x C4 < 0.05
    
       where R5 is in megohms and C4 is in microfarads.
       

  • C5 can be any small value from 0.01 uF to 0.22 uF. R6 and R7 are selected so the voltage at the inverter input lies between the upper and lower thresholds of the Schmitt inverter. These numbers will vary from chip to chip, but a rough guide is
       R6/R7 = 1.4  (between 1.1 and 1.8 should be safe)
       
    Keep in mind that R6 & R7 are connected in series across the supply lines, so current will be flowing through them all the time. The values I chose will draw 7 microamps per memory cell from a 6 volt supply -- insignificant compared to the other current draws in the circuit. Try to keep R6 above 22 kilohms to avoid excess current draw.

  • In the trigger circuit the values are very flexible. You can be off by a factor of 2 or 3 either way and still be in good shape. Smaller is better than bigger. The one value you may need to pay attention to is the delay of Nv4. When you connect this circuit to a robotic device, the delay of Nv4 must be greater than the longest time it will take the mechanism to succeed.
       R11 x C9 > maximum delay time (in seconds)
    
       where R11 is in megohms and C9 is in microfarads.
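
If you want to check a set of substitute parts against the formulae above, a short script saves some arithmetic. The sketch below plugs in the two component sets from the tables. The function and variable names are made up for this example, and the 6 volt supply is the one assumed in the bias-current estimate above.

```python
K, MEG, UF = 1e3, 1e6, 1e-6  # unit helpers: kilohms, megohms, microfarads

def learning_count(r1, c1, r3, c3):
    """Approximate successes needed to finish learning: N = 0.60 x (R3 x C3)/(R1 x C1)."""
    return 0.60 * (r3 * c3) / (r1 * c1)

def min_r4(r1, c1, r2, c2, r3):
    """Smallest R4 that still takes at least two failures to undo one success."""
    return 2 * (r2 * c2) / (r1 * c1) * r3

TEST_VALUES = dict(r1=120 * K, c1=0.22 * UF, r2=120 * K, c2=0.22 * UF,
                   r3=4.7 * K, r4=22 * K, c3=100 * UF)
SOLARBOTICS = dict(r1=680 * K, c1=0.10 * UF, r2=680 * K, c2=0.10 * UF,
                   r3=1.0 * K, r4=2.2 * K, c3=1000 * UF)

def report(name, p):
    n = learning_count(p["r1"], p["c1"], p["r3"], p["c3"])
    r4_floor = min_r4(p["r1"], p["c1"], p["r2"], p["c2"], p["r3"])
    verdict = "ok" if p["r4"] >= r4_floor else "too small"
    print(f"{name}: N = {n:.1f}, R4 = {p['r4']:.0f} ohms "
          f"(needs >= {r4_floor:.0f}): {verdict}")

if __name__ == "__main__":
    report("test values", TEST_VALUES)
    report("Solarbotics", SOLARBOTICS)
    # Selector loop: R5 (megohms) x C4 (microfarads) should stay below 0.05.
    print(f"R5 x C4 = {0.22 * 0.22:.4f}")
    # Short-term memory bias: ratio near 1.4, and the standing current at 6 V.
    print(f"R6/R7 = {470 / 360:.2f}, "
          f"bias current = {6.0 / (470 * K + 360 * K) * 1e6:.1f} uA")
    # Nv4 timeout: R11 (megohms) x C9 (microfarads) gives seconds.
    print(f"R11 x C9 = {3.0 * 1.0:.1f} s")
```

Both sets land inside the suggested range of 6 to 12 for N, and both satisfy the R4 rule, though the Solarbotics R4 clears it with little margin.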
       

IC pinouts

Figure: IC pinouts.

Circuit operation

"Teaching" the circuit is the best way to test it and to see how it works. Power it up and press the "start" button once to reset all the neurons. Then begin helping the circuit to learn.
  1. Choose one of the output LED's to be the "proper" output.
  2. Press the "start" button.
  3. If the "proper" output lights up, press the "success" button.
  4. Otherwise, wait until the "Learning in Progress Indicator" LED goes out.
  5. Repeat starting at step 2.
If you installed the Optional Monitor LED's, you can watch your circuit learn.
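
The teaching procedure above can also be tried out in software before building anything. The following Python sketch is a crude behavioural simulation, not a model of the actual electronics. It assumes an idealized RC delay for each selector neuron, a 6 V supply with a 2 V falling threshold, equal-sized voltage steps (the real circuit's steps follow an RC curve), and a failure worth half a success. All names are made up for this example.

```python
import math
import random

VCC, V_FALL = 6.0, 2.0   # supply and falling threshold in volts (assumed)
N_SUCCESSES = 10         # successes needed to finish learning (assumed)
FULL = 2 * N_SUCCESSES   # memory level counted in half-steps

def delay(level):
    """Relative on-time of a selector neuron whose memory holds `level` half-steps."""
    vr = V_FALL * level / FULL
    return math.inf if vr >= V_FALL else math.log((VCC - vr) / (V_FALL - vr))

def select(levels, rng):
    """Pick the neuron that happens to be active at a random moment."""
    delays = [delay(level) for level in levels]
    for i, d in enumerate(delays):
        if math.isinf(d):
            return i             # a latched neuron is always the one selected
    return rng.choices(range(len(delays)), weights=delays)[0]

def teach(goal=0, trials=200, seed=1):
    """Run the teaching loop; returns final memory levels and the choices made."""
    rng = random.Random(seed)
    levels = [0, 0, 0, 0]
    history = []
    for _ in range(trials):
        chosen = select(levels, rng)                       # press "start"
        if chosen == goal:                                 # press "success"
            levels[chosen] = min(FULL, levels[chosen] + 2)
        else:                                              # let the attempt time out
            levels[chosen] = max(0, levels[chosen] - 1)
        history.append(chosen)
    return levels, history

if __name__ == "__main__":
    levels, history = teach()
    print("first 20 choices:", history[:20])
    print("last 20 choices: ", history[-20:])
```

Early on the choices should look random; once the goal cell's level reaches the threshold, the simulated selector picks the goal output every time.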



Notes:
  1. John A. DeVries II first brought this type of memory "neuron" to my attention. I didn't see how to separate the neuron from its input until I hit on the idea of preceding it with a tri-state inverter.



Copyright © 2002 Bruce N. Robinson. Last updated 2002-08-21.