Learning Robots

Introduction



Learn - Acquire knowledge, skill, etc.

What do we mean when we say someone learns something? Can a robot learn? And if it can, does it need a computer to do the learning? These are some of the issues I'll discuss in this article. I'll also present a simple, general purpose circuit that learns, and I'll discuss the winning entry in a contest I held to promote learning circuits for robots.

An Operational Definition of Learning

Did Mark Tilden's walking robot, Walkman, "learn" to deal with obstacles in its path, as some people claim? Or did it just adapt to different physical surroundings? Some will argue that it didn't really learn because it didn't use reasoning to deal with the obstacle. Others will argue that it didn't remember how it dealt with an obstacle so it didn't really learn. These are weak arguments at best. Reasoning is a good problem solving tool but it's not an essential part of learning. Many animals learn without reasoning. The memory argument is a more compelling one, but how many people remember everything they learned in school? While memory plays an important role in learning, we can't very well insist that something that was "learned" must be remembered forever. Where do we draw the line?

The definition at the top of this page describes just one of many different kinds of learning. Unfortunately, dictionary definitions can only tell us how words are used. They don't tell us, for example, what is involved in "learning". Nor are they concerned with how to recognize the difference between actual learning and the appearance of learning. Those issues are left to "experts" -- who often can't agree among themselves.

A useful way to explore these issues is to examine the process of learning. Then when a debate arises about whether a robot is actually learning or just appearing to learn, we can look inside the robot and see if it contains the required process. More important, the process description can help us design a learning circuit or procedure. Here are the essential elements in the learning process (a code sketch of the complete loop follows the list).

  1. A goal.
  2. An agent [1], not yet discovered, that will act to reach the goal.
  3. A way to select which agent to try, and a way to start the attempt.
  4. A way to measure success or failure.
  5. A way to influence the selection process, so the next time it is called upon it is more likely to use a successful agent.
  6. A way to accumulate successful repetitions so that eventually the "best" agent is chosen every time.
  7. Repetition of the process (from step 3) until the way to reach the goal has been learned.
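
For readers who prefer code to prose, here is a minimal sketch of the whole loop in Python. The agents, the success test, and the exponential weighting scheme are placeholders of my own choosing, not part of any particular circuit; the numbered steps are marked in the comments.

    import random

    def learn(agents, attempt, succeeded, trials=100):
        weights = {a: 1.0 for a in agents}          # selection bias (step 5)
        for _ in range(trials):                     # repetition (step 7)
            agent = random.choices(list(weights),   # biased selection (step 3)
                                   weights=list(weights.values()))[0]
            result = attempt(agent)                 # act toward the goal (step 2)
            if succeeded(result):                   # measure success (step 4)
                weights[agent] *= 1.2               # favour this agent next time (step 5)
            else:
                weights[agent] *= 0.9               # discourage a failed agent
        return max(weights, key=weights.get)        # the accumulated "best" agent (step 6)

    # The goal (step 1) is unknown to the learner; only the success
    # test "knows" it, like the teacher in the example below.
    print(learn(["a", "b", "c"], lambda a: a, lambda r: r == "b"))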

The goal can be the solution to a problem, a physical motion, or a skill that has to be acquired. The learner doesn't actually need to know what the goal is, or even that there is one. The notion that a goal might be unknown to the learner bothers some people. How can people (or machines) learn if they don't know what it is they're trying to achieve? Consider the pupils in an elementary school, reciting the multiplication tables. The goal is to get each one of 55 different product combinations correct. But how can they tell if an answer is correct when they haven't learned the answer yet? They must take guidance from the teacher, who does know what the goal is.

There could be just one agent that is able to reach the goal. Or there might be several possible agents; which is best? It may even be that there is no way to reach the goal. Discovering that fact is also a form of learning. Some critics object to the notion that people already possess the capability to reach all the goals they aspire to. Why bother with learning at all? If all the neurons and muscles and bones that a baby needs to walk are already in place, why should the baby have to go through so much effort to learn to walk? Why not just "hardwire" humans so they can walk at birth? There are some important reasons why we need to learn something like walking. One is the fact that we grow. Our size and shape change. We have to deal with more momentum as we grow heavier. If we had walking skills programmed into us at birth, we would need thousands of different sets of those skills to handle all the different possible combinations of size and shape we might experience from birth to adulthood. Learning only the skills we need is more efficient. Yet all the fundamental pieces we need to learn those skills were provided to us at birth.

Selection methods vary tremendously. When human babies first learn how to move, they wave their arms and legs around randomly. This is the simplest form of selection -- random choice. Later, as they acquire a few basic movement skills, they will select from skills that already work in other situations. If existing skills don't work, they will select skills similar to those that almost worked. And sometimes they will have to construct new skills from basic building blocks that they already possess.

Measuring success can be simple; was the goal reached, yes or no? It can also involve deciding how close the result was to the goal. But what about when the learner doesn't know what the goal is? How is it possible to measure success then? The answer: whoever "knows" the goal can provide information about success.

Influencing the selection process involves two things. First, when the goal is reached the learner has to remember which agent he/she/it used. Second, the learner has to alter the selection process so the chance of selecting the successful agent again increases. "Remembering" implies some kind of short-term memory. Why not permanent memory? Because learning frequently demands a great deal of repetition. Imagine how much permanent memory would be needed to remember every single result of every single attempt ever made to learn something. Most of that information is useless anyway, once a skill is learned.

To accumulate successes, some kind of long-term memory is needed. Human learning usually takes place over an extended period of time and there's no reason to expect robot learning to be different. In any event, when a skill has finally been learned, the learner has to remember it. (Humans don't consciously remember how to move skillfully, but they do remember. In the sports world this is often called "muscle memory".)

Why do humans need so much repetition to learn something? True, if the first few attempts to reach a goal fail we have to keep trying. But once the goal is reached, why not stop there? It certainly seems like it would be more efficient. There will be times when we only need to succeed once to learn something. However, for many skills, especially physical ones, a great deal of repetition seems to be desirable. Why?

When we want to learn precise movement skills, there is a good chance that several different agents will "come close" but not be very precise. How could we know if another attempt would give a better result, unless we tried again? Another important reason for having to obtain several successes before learning something has to do with the learning environment. What if, by chance, something happened in the environment that caused us to be successful? Pure luck! If we "learned" with that one success, we could never repeat the success unless the freak event occurred again. Even more likely, in a complicated environment some random event might confuse our measurement of success. Then we'd think we were successful when we weren't. Once again, learning with just one successful attempt leads us astray. In a contrived environment we might need only a few successes to learn something. But humans live (and always have) in a very complicated environment. We seem to be designed to require a great deal of repetition to learn things, especially physical skills.

This discussion of the learning process has laid the groundwork for developing a system of robotic learning. The following diagram shows the structure of a learning circuit.

Learning structure.

Robotic Learning

What sort of learning might a robot need to do? Is the learning process described above general enough to deal with all types of learning? It's very likely that a robot must learn to recognize patterns of sound and vision, but this is a very complex application, better suited to microprocessors. It might also be useful for a robot to learn to move skillfully. That should be a little more within the scope of the hobbyist.

At first, it seems that the reasons humans have to learn so many movement skills won't apply to robots. We have to re-learn how to move ourselves as we grow, but robots don't grow. Would it not be easier to program a robot to walk, right from the start? Or to build in all the various combinations of movements a particular type of robot might have to perform?

If we were building identical robots to perform identical tasks in a very constrained environment, this would work. It's been done, in fact. But what about general-purpose robots working in a variety of environments? Trying to program them to deal with every possible demand would involve huge amounts of hardware. Even compact microprocessors might not be up to the job. An alternative is to provide plug-in modules that allow a robot to perform specific tasks. This would allow the robot to be configured to do only those chores it was needed for. But is this easier than having a robot learn the tasks? Wouldn't it be better to build in some general patterns of movement and then have the robot learn specific skills, starting with a few approximate patterns?

Most of the elements of the learning process can be created using microprocessors, or discrete digital components, or even analog circuits. And while different types of learning may need some specialized features, the general outline can serve as a model for special learning circuits. There is really only one part of the process that presents serious difficulties -- measuring success. Or to put it another way, observing the result and comparing it to the goal. To a very large extent, this is a sensory problem. It's interesting that so many people believe a microprocessor is a prerequisite for robot learning, yet the greatest difficulty in creating a learning robot is to acquire and process sensory information. That problem exists regardless of what hardware is used to learn.

The simplest way to measure success is to have a "teacher" press a button whenever a successful attempt is made. This gets pretty boring in a hurry (for the teacher; the robot doesn't care). Going a step further, a robot might be physically constructed so a successful attempt pushes the button automatically. That eliminates the teacher. From there the complexity of recognizing success rises dramatically. To some extent the robot designer can control the type of sensory information needed to measure success. Ingenuity, rather than technology, is the key here. For initial experiments with learning circuits, the best approach is the simple one -- keep the task itself simple so it can be easily measured.

Circuits

This is the latest version of the original circuit which I developed to show that a simple analog learning circuit was possible. It's still quite primitive and there is lots of room for improvement. The circuit can easily be emulated on a small processor, which is exactly what the contest winner did. To begin with, here are some of the elements used in the learning circuit.

One important element is the learning selector. A pseudo-random selector can be made by chaining together several Pulse Delay Circuits (Nv neurons) in a loop. Each neuron is on for a short period of time and as it turns off, the next neuron switches on. This provides a single pulse circulating around the loop of neurons. When a response is called for the currently active neuron in the loop is used to select the agent for that trial.
Variable delay neuron.

A loop of standard Nv neurons can select randomly from several choices but can't provide a biased selection. To allow successful outcomes to influence the selection process, the neuron must have a variable delay that can be adjusted by other circuits. One way of accomplishing this is shown in the figure. Instead of grounding resistor R5, the resistor is connected to a variable voltage source (Vr). When Vr=0, the neuron has a standard delay. As Vr rises above zero the delay increases, until Vr reaches the falling threshold voltage (Vt-) of the inverter, at which point the neuron remains turned on permanently. Several variable neurons of this type connected in a loop, each with a separately-controlled Vr connection, will give a circuit that can make either random or biased choices.
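
A rough software analogue of this selector, assuming idealized neurons (real Nv delays drift with component tolerances and noise), samples the circulating pulse at a random instant; a neuron's chance of being caught "on" is proportional to its delay:

    import random

    def select(delays):
        """delays[i] is how long neuron i holds the pulse; raising a
        neuron's Vr lengthens its delay and so its odds here."""
        t = random.uniform(0, sum(delays))   # freeze the loop at a random moment
        for i, d in enumerate(delays):
            t -= d
            if t <= 0:
                return i

    print(select([1.0, 1.0, 1.0, 1.0]))   # unbiased: each output ~25% likely
    print(select([1.0, 1.0, 3.0, 1.0]))   # biased: output 2 now ~50% likely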

A second essential element in the circuit is the short-term memory. When the learning circuit is triggered, it must "remember" which of the learning selector outputs was "on" until after the learning cycle is complete.

Short-term memory.

A simple short-term memory element can be made by connecting the output of a tri-state inverter to the input of a neutrally-biased Nv neuron. As long as the tri-state inverter is enabled, the output of the circuit will be exactly the same as the input, with a delay of a few nanoseconds. When the tri-state inverter is disabled, its output goes to a high-impedance state, effectively disconnecting it. The biasing resistors (R6 & R7) will pull the voltage at the Schmitt inverter input to a point midway between its high and low thresholds, so the inverter cannot change state [2]. In other words, the inverter will remain indefinitely in whatever condition it was in when the enable line was turned off. When one of these memory elements is connected to each of the output lines from the learning selector, the status of the selector can be "remembered" by pulsing the enable lines of all the tri-state inverters at once.
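
In software terms each element behaves like a transparent latch. A tiny sketch (the class and method names are mine, purely illustrative):

    class ShortTermMemory:
        """Follows its input while enabled; freezes when enable drops."""
        def __init__(self):
            self.state = 0

        def update(self, value, enable):
            if enable:               # tri-state inverter enabled: track the input
                self.state = value   # disabled: the biasing holds the last state
            return self.state

    stm = ShortTermMemory()
    stm.update(1, enable=True)           # selector output captured
    print(stm.update(0, enable=False))   # still 1: the "memory" holds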

A third component of the learning circuit is the long-term memory, which is used to store the biasing voltages (Vr) of the Nv neurons in the learning selector.

Long-term memory.

The simplest way to "remember" a voltage is to use a large capacitor (C3). However, if the capacitor is connected directly to the neuron, the neuron will lower the voltage on the capacitor each time it fires, and this will gradually alter the "memory". Originally I used a voltage follower to isolate the capacitor. This worked but it introduced new problems. In the current version of the circuit I use a drain resistor and a feedback resistor from the neuron (R3 & R4 respectively) to gradually draw the capacitor voltage toward an equilibrium point. Learning signals from the "Fail" and "Succeed" lines upset this equilibrium. A high-going pulse on the succeed line increases the voltage slightly, while a low-going pulse on the fail line decreases the voltage. Resistor R2 and the length of the pulse determine how much the voltage changes. A 74HC4066 analog switch directs the succeed/fail pulse to the appropriate memory unit.
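
In effect each memory cell is a leaky integrator. A sketch of that behaviour, with arbitrary step and leak rates standing in for R2, R3, and R4 (the real rates depend on the components chosen; the 1.2 volt equilibrium is described under "Circuit operation" below):

    class LongTermMemory:
        """Leaky integrator standing in for C3 and its resistors."""
        def __init__(self, equilibrium=1.2, leak=0.02, step=0.2):
            self.v = equilibrium            # stored voltage Vr
            self.equilibrium = equilibrium  # set by R3 & R4
            self.leak = leak                # drain rate toward equilibrium
            self.step = step                # set by R2 and the pulse length

        def tick(self):                     # runs continuously between trials
            self.v += self.leak * (self.equilibrium - self.v)

        def succeed(self):                  # high-going pulse on the succeed line
            self.v += self.step

        def fail(self):                     # low-going pulse on the fail line
            self.v -= self.step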

The Complete Learning Circuit

The learning circuit is assembled from the sub-circuits discussed above, plus a few extra elements. When triggered it will select one of 4 outputs which can be used to activate various agents. After several successes, the circuit will only select the output that led to the previous successes.

Learning circuit.

The learning selector is in the upper left area of the diagram. It contains 4 variable delay neurons connected in a loop. The diodes (D3) ensure that the circuit starts with only one output active at a time. The output of each neuron is directed through an inverter to produce the proper feedback signal and the inverter outputs then go to four short-term memory cells.

The short-term memory elements are shown in the upper right, one connected to each of the learning selector neurons. The 74HC240 tri-state inverter in the memory neurons uses a single enable line to control all 4 devices. When the learning selector loop operates at relatively high speed the short term memory can be affected by circuit noise. Adding 0.1 uF bypass capacitors (C6) solves this problem.

The 4 LED's in the middle right of the diagram show which output has been selected. They can be used for debugging and also for testing the circuit. Choose which LED is the "goal" and each time it lights up, press the "success" button (see below). Eventually the circuit will learn to light only the "goal" LED.

Four identical long-term learning memory cells are shown in the middle left part of the diagram. One of these is exploded to show the details. Each memory cell is connected to a common success/fail line and also has its own selector line from a short-term memory cell. An inverted feedback signal comes from the Nv neuron and the memory output line is connected to the delay resistor (R5) of the same Nv neuron.

To the lower left is one example of how a success/fail signal can be generated. There are many possible ways of producing these signals -- what is important here is that the pulse length (determined by C1 x R1 and C2 x R1a) and the output resistor (R2) produce an appropriate change in the long term memory. It's also very important to prevent simultaneous success and fail signals from occurring, as this will cause a direct short circuit through diodes D1 and D2. In the example circuit, diode D5 blocks the fail signal whenever the success neuron is active.

Finally, a set of optional LED's is shown at the lower right. These are very useful for debugging as they show how the delays in the selection loop change as the circuit "learns". Just one current-limiting resistor is required for all four LED's.

The behaviour of this arrangement for the long-term memory cell is particularly suitable to the learning mechanism. As the circuit experiences a few successes, the charge on the "successful" C3 rises slowly. A failed attempt will lower the charge on the "unsuccessful" C3. At first this has only a very slight effect on the learning selector. The less successful agents have almost as much chance of being selected as the successful agent. Finally, however, the charge on one C3 capacitor begins to approach the threshold of the selector neuron. This increases the delay to the point where the successful agent is far more likely to be chosen.
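
A simulation makes this gradual shift easy to see. The numbers below are idealized, not measured from the circuit: each success lengthens the goal neuron's delay, each failure shortens the chosen neuron's delay, and the drain pulls everything slowly back toward equilibrium.

    import random

    delays = [1.0, 1.0, 1.0, 1.0]     # selector loop, one delay per output
    GOAL, STEP, LEAK = 2, 0.5, 0.02   # goal output, pulse size, drain rate

    def sample(delays):
        """Freeze the circulating pulse at a random instant."""
        t = random.uniform(0, sum(delays))
        for i, d in enumerate(delays):
            t -= d
            if t <= 0:
                return i

    for trial in range(60):
        choice = sample(delays)
        if choice == GOAL:
            delays[choice] += STEP       # success raises Vr: longer delay
        else:                            # failure lowers Vr: shorter delay
            delays[choice] = max(0.2, delays[choice] - STEP)
        for i, d in enumerate(delays):   # R3/R4 drain between trials
            delays[i] = d + LEAK * (1.0 - d)

    print(delays)   # the goal's delay now dominates the selection loop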

Special care is needed with this circuit. Bypass capacitors (between ground and Vcc) must be located near all the IC power lines. As the charge on C3 approaches the falling threshold of the selector neuron it becomes extremely sensitive to electronic noise caused when other neurons in the circuit fire. The 0.1 uF bypass capacitors (shown in the second circuit diagram) reduce the effect of this noise.

The component values I used are shown on the diagram. Guidelines for changing these values are given later on this page.

Trigger and Feedback Functions

To experiment with the learning circuit a simple interface circuit is useful. This will provide the trigger, fail, and succeed inputs to the learning circuit and can also start an agent after it has been selected.

Learning circuit - feedback & trigger.

In the upper left of the diagram is the start input. A low pulse at this input will start a learning attempt. For experimenting, a pushbutton can be used to provide the pulse. The start pulse fires Nv2, which immediately triggers Nv3 in turn. Nv2 is a filter neuron which prevents additional start signals from interrupting the learning sequence. It is held active through diodes D6 until the learning cycle has finished. The input labelled wait allows external agents to block a start signal as well. This is useful if a robotic device has completed a learning attempt but has to return to a starting position before the next attempt can begin.

Nv3 fires a few nanoseconds after Nv2 and does three things. It sends the "trigger" signal to input (A) in the learning circuit, it sends a signal that will start the selected agent operating, and it triggers Nv4 which has a very long delay. Nv4 must remain active at least as long as it will take to complete the learning attempt. When Nv4 times out, it triggers Nv5 which sends a pulse to the fail input (B) of the learning circuit. This is the default condition -- if the attempt didn't succeed by the time Nv4 times out, it's assumed to have failed. The LED connected to the output of Nv4 indicates that an attempt is in progress. When the LED goes out, the attempt is finished and another attempt can be made at any time.

The input at the middle left signals a successful attempt. It can be a high pulse, or a "teacher" can press a button any time an attempt succeeds. The pulse fires Nv6 which sends an inverted pulse to the "succeed" input (C) of the learning circuit. Nv6 also cuts short the delay on Nv4 through diode D4 and resistor R14. At the same time, it holds the "fail" neuron (Nv5) in an "off" condition through diode D5. This prevents Nv4 from triggering Nv5, so the "fail" signal is automatically blocked by a "succeed" signal. R14 is used to introduce a tiny delay before Nv4 is turned off, which ensures that Nv5 is securely turned off before Nv4 can trigger it.
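
The net effect is a simple timeout with an early exit: every attempt ends in exactly one of "succeed" or "fail". A sketch of the equivalent logic (the function names are mine; the three-second default matches R11 x C9 from the chart below):

    import time

    def run_attempt(start_agent, succeeded, timeout=3.0):
        start_agent()                        # Nv3: start the selected agent
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:   # Nv4: attempt in progress
            if succeeded():                  # Nv6: success pulse arrived...
                return "succeed"             # ...and blocks the fail pulse (D5)
        return "fail"                        # Nv5: default when Nv4 times out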

The following chart shows the values of the various components used in the current trigger circuit.

Component   Qty  Value             Comment
R9          1    120 k             Not critical.
C7          1    0.10 uF
R10         1    1.0 k             Not critical.
C8          1    0.10 uF
R11         1    3.0 Meg           Large enough to complete a learning attempt.
C9          1    1.0 uF
R14         1    220 ohm           Not critical.
R1 & R1a    2    220 k             R x C should be constant.
C1 & C2     2    0.10 uF
D4 - D6     4    1N4148 or 1N914

Component selection

The following formulae offer some guidelines for calculating other component values. I suggest you start with the ones shown on the diagram if they're available.

  • The most important relationships concern the long-term memory components. While you can play around with the values, there is one very important restriction: C3 must be large! Use an electrolytic cap at least 47 uF in size.
                  R2 x C3
       N = 0.60 x -------
                  R1 x C1
    
       Where N is the approximate number of successes required to finish learning.
       
    If N is too high you will spend a lot of time pressing the "success" button when you're testing the circuit. If N is too low, the circuit is just a switch. A starting value of around 6 seems reasonable. N is also affected by the feedback and drain resistors (R3 & R4) and the interval between trials, so it isn't an exact number. Ideally all 4 long-term memory cells will have identical values but this is not absolutely essential. A worked example follows.
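
    Assuming the trigger-chart values R1 = 220 k and C1 = 0.10 uF, the minimum C3 of 47 uF, and a hypothetical R2 of 4.7 k (R2 is whatever you choose to tune N):

       R1, C1 = 220e3, 0.10e-6   # succeed-pulse components, from the chart
       C3 = 47e-6                # minimum recommended electrolytic
       R2 = 4.7e3                # hypothetical; pick R2 to set N
       print(0.60 * (R2 * C3) / (R1 * C1))   # -> about 6 successes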

  • The drain and feedback resistors (R3 and R4) have a complex relationship with the other memory variables. If they are too small, the interval between learning trials must be very small or the circuit will never learn. A good guideline is to make R3 as large as practical -- at least 5 megohms and preferably 10 megohms. R4 must be no larger than R3; a good value is about half of R3.

  • The values of R5 and C4 are very flexible. The idea is to get a loop that "runs" fairly quickly; otherwise it tends to lose its randomness. On the other hand, for testing purposes the loop should not be too fast or the optional LED's won't stay lit long enough to see. As a rough guide, use
       R5 x C4 < 0.05
    
       where R5 is in megohms and C4 is in microfarads.
       
    You may want to temporarily make C4 much larger than the planned value so you can watch the circuit operate and possibly take voltage measurements. I initially used 0.22 uF capacitors instead of 0.01 uF caps. The circuit will behave exactly the same but will be much, much slower.

  • I had to add the bypass capacitors (C6) when I sped the selection loop up to its final speed. As a rough guide, they should be approximately the same size as C5 -- certainly no larger. C5 can be any small value from 0.01 uF to 0.22 uF. R6 and R7 are selected so the voltage at the inverter input lies between the upper and lower thresholds of the Schmitt inverter. These numbers will vary from chip to chip, but a rough guide is
       R7/R6 = 1.45  
       
    In fact, any adjacent pair of values from the standard series 1, 1.5, 2.2, 3.3, 4.7, 6.8, 10 will be close enough. Keep in mind that R6 & R7 are connected in series across the supply lines, so current will be flowing through them all the time. I originally used values in the hundreds of kilohms but this made the short-term memory sluggish and unreliable. The present values work quite well and don't draw too much current.

  • In the trigger circuit the values are very flexible. You can be off by a factor of 2 or 3 either way and still be in good shape. Smaller is better than bigger. The one value you may need to pay attention to is the delay of Nv4. When you connect this circuit to a robotic device, the delay of Nv4 must be greater than the longest time it will take the mechanism to succeed.
       R11 x C9 > maximum delay time.
    
       where R11 is in megohms and C9 is in microfarads.
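
    With the chart values above (R11 = 3.0 Meg, C9 = 1.0 uF) the window works out to about three seconds:

       R11, C9 = 3.0, 1.0   # megohms and microfarads, from the chart
       print(R11 * C9)      # -> 3.0: attempts may take up to ~3 seconds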
       

IC pinouts

IC pinouts.

Circuit operation

"Teaching" the circuit is the best way to test it and to see how it works. Power it up and press the "start" button once to reset all the neurons. The circuit will actually perform a little better if you let it run freely for at least 15 minutes before starting to teach it. This will allow the charge on the memory capacitors (C3) to rise to their equilibrium value of about 1.2 volts. This is not essential -- it just allows failed learning attempts to have some influence on the selection loop.

To test the circuit

  1. Choose one of the output LED's to be the "goal".
  2. Press the "start" button.
  3. If the "goal" lights up, press the "success" button.
  4. Otherwise, wait until the "Learning in Progress Indicator" LED goes out.
  5. Wait 5 or 10 seconds and repeat starting at step 2.
If you installed the Optional Monitor LED's, you can watch your circuit learn.



Notes:
  1. I borrowed the term "agent" from Marvin Minsky. In the context of this article, an agent is something that the learning circuit uses to try to reach its goal. An agent can be a circuit, a motor, a motorized limb, or even a completely independent robot.
  2. John A. DeVries II first brought this type of memory "neuron" to my attention. I didn't see how to separate the neuron from its input until I hit on the idea of preceding it with a tri-state inverter.



Copyright © 2002 Bruce N. Robinson. Last updated 2002-12-15.