Learning Robots
Introduction
Learn - Acquire knowledge, skill, etc.
What do we mean when we say someone learns something? Can a robot learn? And if it can, does it need a computer to do the learning? These are some of the issues I'll discuss in this article. I'll also present a simple, general purpose circuit that learns, and I'll discuss the winning entry in a contest I held to promote learning circuits for robots.
An Operational Definition of Learning
Did Mark Tilden's walking robot, Walkman, "learn" to deal with obstacles in its path, as some people claim? Or did it just adapt to different physical surroundings? Some will argue that it didn't really learn because it didn't use reasoning to deal with the obstacle. Others will argue that it didn't remember how it dealt with an obstacle so it didn't really learn. These are weak arguments at best. Reasoning is a good problem solving tool but it's not an essential part of learning. Many animals learn without reasoning. The memory argument is a more compelling one, but how many people remember everything they learned in school? While memory plays an important role in learning, we can't very well insist that something that was "learned" must be remembered forever. Where do we draw the line?
The definition at the top of this page describes just one of many different kinds of learning. Unfortunately, dictionary definitions can only tell us how words are used. They don't tell us, for example, what is involved in "learning". Nor are they concerned with how to recognize the difference between actual learning and the appearance of learning. Those issues are left to "experts" -- who often can't agree among themselves. A useful way to explore these issues is to examine the process of learning. Then when a debate arises about whether a robot is actually learning or just appearing to learn we can look inside the robot and see if it contains the required process. More important, the process description can help us design a learning circuit or procedure. Here are the essential elements in the learning process.
The goal can be the solution to a problem, a physical motion, or a skill that has to be acquired. The learner doesn't actually need to know what the goal is, or even that there is one. The notion that a goal might be unknown to the learner bothers some people. How can people (or machines) learn if they don't know what it is they're trying to achieve? Consider the pupils in an elementary school, reciting the multiplication tables. The goal is to get each one of 55 different product combinations correct. But how can they tell if an answer is correct when they haven't learned the answer yet? They must take guidance from the teacher, who does know what the goal is.
There could be just one agent that is able to reach the goal. Or there might be several possible agents; which is best? It may even be that there is no way to reach the goal. Discovering that fact is also a form of learning.
Some critics object to the notion that people already possess the capability to reach all the goals they aspire to. Why bother with learning at all? If all the neurons and muscles and bones that a baby needs to walk are already in place, why should the baby have to go through so much effort to learn to walk? Why not just "hardwire" humans so they can walk at birth? There are some important reasons why we need to learn something like walking. One is the fact that we grow. Our size and shape change. We have to deal with more momentum as we grow heavier. If we had walking skills programmed into us at birth, we would need thousands of different sets of those skills to handle all the different possible combinations of size and shape we might experience from birth to adulthood. Learning only the skills we need is more efficient. Yet all the fundamental pieces we need to learn those skills were provided to us at birth.
Selection methods vary tremendously. When human babies first learn how to move, they wave their arms and legs around randomly.
This is the simplest form of selection -- random choice. Later, as they acquire a few basic movement skills, they will select from skills that already work in other situations. If existing skills don't work, they will select skills similar to those that almost worked. And sometimes they will have to construct new skills from basic building blocks that they already possess.
Measuring success can be simple: was the goal reached, yes or no? It can also involve deciding how close the result was to the goal. But what about when the learner doesn't know what the goal is? How is it possible to measure success then? The answer: whoever "knows" the goal can provide information about success.
Influencing the selection process involves two things. First, when the goal is reached the learner has to remember which agent he/she/it used. Second, the learner has to alter the selection process so the chance of selecting the successful agent again increases. "Remembering" implies some kind of short-term memory. Why not permanent memory? Because learning frequently demands a great deal of repetition. Imagine how much permanent memory would be needed to remember every single result of every single attempt ever made to learn something. Most of that information is useless anyway, once a skill is learned.
To accumulate successes, some kind of long-term memory is needed. Human learning usually takes place over an extended period of time and there's no reason to expect robot learning to be different. In any event, when a skill has finally been learned, the learner has to remember it. (Humans don't consciously remember how to move skillfully, but they do remember. In the sports world this is often called "muscle memory".)
Why do humans need so much repetition to learn something? True, if the first few attempts to reach a goal fail we have to keep trying. But once the goal is reached, why not stop there? It certainly seems like it would be more efficient.
There will be times when we only need to succeed once to learn something. However, for many skills, especially physical ones, a great deal of repetition seems to be desirable. Why? When we want to learn precise movement skills, there is a good chance that several different agents will "come close" but not be very precise. How could we know if another attempt would give a better result, unless we tried again?
Another important reason for having to obtain several successes before learning something has to do with the learning environment. What if, by chance, something happened in the environment that caused us to be successful? Pure luck! If we "learned" with that one success, we could never repeat the success unless the freak event occurred again. Even more likely, in a complicated environment some random event might confuse our measurement of success. Then we'd think we were successful when we weren't. Once again, learning with just one successful attempt leads us astray.
In a contrived environment we might need only a few successes to learn something. But humans live (and always have) in a very complicated environment. We seem to be designed to require a great deal of repetition to learn things, especially physical skills.
This discussion of the learning process has laid the groundwork for developing a system of robotic learning. The following diagram shows the structure of a learning circuit.
Robotic Learning
What sort of learning might a robot need to do? Is the learning process described above general enough to deal with all types of learning? It's very likely that a robot must learn to recognize patterns of sound and vision but this is a very complex application, better suited to microprocessors. It might also be useful for a robot to learn to move skillfully. That should be a little more within the scope of the hobbyist.
At first, it seems that the reasons humans have to learn so many movement skills won't apply to robots. We have to re-learn how to move ourselves as we grow, but robots don't grow. Would it not be easier to program a robot to walk, right from the start? Or to build in all the various combinations of movements a particular type of robot might have to do? If we were building identical robots to perform identical tasks in a very constrained environment, this would work. It's been done, in fact. But what about general-purpose robots working in a variety of environments? Trying to program them to deal with every possible demand would involve huge amounts of hardware. Even compact microprocessors might not be up to the job.
An alternative is to provide plug-in modules that allow a robot to perform specific tasks. This would allow the robot to be configured to do only those chores it was needed for. But is this easier than having a robot learn the tasks? Wouldn't it be better to build in some general patterns of movement and then have the robot learn specific skills, starting with a few approximate patterns?
Most of the elements of the learning process can be created using microprocessors, or discrete digital components, or even analog circuits. And while different types of learning may need some specialized features, the general outline can serve as a model for special learning circuits. There is really only one part of the process that presents serious difficulties -- measuring success.
Or to put it another way, observing the result and comparing it to the goal. To a very large extent, this is a sensory problem. It's interesting that so many people believe a microprocessor is a prerequisite for robot learning, yet the greatest difficulty in creating a learning robot is to acquire and process sensory information. That problem exists regardless of what hardware is used to learn.
The simplest way to measure success is to have a "teacher" press a button whenever a successful attempt is made. This gets pretty boring in a hurry (for the teacher; the robot doesn't care). Going a step further, a robot might be physically constructed so a successful attempt pushes the button automatically. That eliminates the teacher. From there the complexity of recognizing success rises dramatically. To some extent the robot designer can control the type of sensory information needed to measure success. Ingenuity rather than technology is the point. For initial experiments with learning circuits, the best approach is the simple one -- keep the task itself simple so it can be easily measured.
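As a rough software illustration of the learning process outlined earlier (select an agent, measure success, remember the choice, bias future selection), here is a minimal Python sketch. It is not the circuit presented in this article. The function name `learn`, the `step` size, the trial count, and the use of numeric weights as the long-term memory are all assumptions made for the example. The `is_success` callback plays the role of the "teacher" pressing the button.

```python
import random


def learn(agents, is_success, trials=200, step=1.0, rng=None):
    """Reinforce whichever agent the teacher reports as successful.

    All names and numbers here are illustrative assumptions.
    """
    rng = rng or random.Random()
    # Long-term memory: one selection weight per agent, all equal at first.
    weights = {agent: 1.0 for agent in agents}
    for _ in range(trials):
        # Selection: a weighted random choice among the agents.
        # Short-term memory: `chosen` is held until the result is known.
        chosen = rng.choices(list(weights), weights=list(weights.values()))[0]
        # Measuring success: the learner itself never needs to know the goal.
        if is_success(chosen):
            # Influence selection: make the successful agent more likely.
            weights[chosen] += step
    return weights


if __name__ == "__main__":
    # The "teacher" knows that agent C is the goal; the learner does not.
    result = learn(["A", "B", "C", "D"], lambda agent: agent == "C",
                   rng=random.Random(1))
    print(result)
```

Because each success only nudges the weights, a single lucky attempt has little effect, which is in keeping with the argument for repetition made above.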
Circuits
This is the latest version of the original circuit which I developed to show that a simple analog learning circuit was possible. It's still quite primitive and there is lots of room for improvement. The circuit can easily be emulated on a small processor, which is exactly what the contest winner did. To begin with, here are some of the elements used in the learning circuit.
One important element is the learning selector. A pseudo-random
selector can be made by chaining together several Pulse Delay Circuits (Nv neurons) in
a loop. Each neuron is on for a short period of time and as it turns off, the next
neuron switches on. This provides a single pulse circulating around the loop of neurons.
When a response is called for, the currently active neuron in the loop
is used to select the agent for that trial.
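The selector loop can be mimicked in software to see why it behaves like a pseudo-random chooser. The sketch below is an idealised model, not a description of the actual circuit: it assumes each neuron stays on for a fixed delay and that the trigger arrives at an unrelated moment. The names `active_neuron` and `selection_shares` and all the numbers are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def active_neuron(t, delays):
    """Return the index of the neuron that is "on" at time t (t >= 0).

    delays[i] is how long neuron i stays on before handing the pulse
    to the next neuron in the loop.
    """
    ends = list(accumulate(delays))   # time at which each neuron turns off
    phase = t % ends[-1]              # position within one trip round the loop
    return bisect_right(ends, phase)


def selection_shares(delays, samples=10000, rng=None):
    """Estimate how often each neuron is caught "on" by a random trigger."""
    rng = rng or random.Random()
    counts = [0] * len(delays)
    for _ in range(samples):
        counts[active_neuron(rng.uniform(0.0, 1000.0), delays)] += 1
    return [count / samples for count in counts]


if __name__ == "__main__":
    print(selection_shares([1, 1, 1, 1]))   # equal delays: roughly equal shares
    print(selection_shares([1, 1, 6, 1]))   # a longer delay is chosen more often
```

In this model a neuron's chance of being selected is proportional to its share of the total loop time, so lengthening one delay is enough to bias the choice.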
A second essential element in the circuit is the short-term memory. When the
learning circuit is triggered, it must "remember" which of the learning selector
outputs was "on" until after the learning cycle is complete.
A third component of the learning circuit is the long-term memory which is
used to store the biasing voltages (Vr) of the Nv neurons in the learning selector.
The Complete Learning Circuit
The learning circuit is assembled from the sub-circuits discussed above, plus a few extra elements. When triggered it will select one of 4 outputs which can be used to activate various agents. After several successes, the circuit will only select the output that led to the previous successes.
The learning selector is in the upper left area of the diagram. It contains 4 variable delay neurons connected in a loop. The diodes (D3) ensure that the circuit starts with only one output active at a time. The output of each neuron is directed through an inverter to produce the proper feedback signal and the inverter outputs then go to four short-term memory cells.
The short-term memory elements are shown in the upper right, one connected to each of the learning selector neurons. The 74HC240 tri-state inverter in the memory neurons uses a single enable line to control all 4 devices. When the learning selector loop operates at relatively high speed the short-term memory can be affected by circuit noise. Adding 0.1 uF bypass capacitors (C6) solves this problem.
The 4 LEDs in the middle right of the diagram show which output has been selected. They can be used for debugging and also for testing the circuit. Choose which LED is the "goal" and each time it lights up, press the "success" button (see below). Eventually the circuit will learn to light only the "goal" LED.
Four identical long-term learning memory cells are shown in the middle left part of the diagram. One of these is exploded to show the details. Each memory cell is connected to a common success/fail line and also has its own selector line from a short-term memory cell. An inverted feedback signal comes from the Nv neuron and the memory output line is connected to the delay resistor (R5) of the same Nv neuron.
To the lower left is one example of how a success/fail signal can be generated. There are many possible ways of producing these signals -- what is important here is that the pulse length (determined by C1 x R1 and C2 x R1a) and the output resistor (R2) produce an appropriate change in the long term memory. It's also very important to prevent simultaneous success and fail signals from occurring, as this will cause a direct short circuit through diodes D1 and D2.
In the example circuit, diode D5 blocks the fail signal whenever the success neuron is active.
Finally, a set of optional LEDs is shown at the lower right. These are very useful for debugging as they show how the delays in the selection loop change as the circuit "learns". Just one current-limiting resistor is required for all four LEDs.
The behaviour of this arrangement for the long-term memory cell is particularly suitable to the learning mechanism. As the circuit experiences a few successes, the charge on the "successful" C3 rises slowly. A failed attempt will lower the charge on the "unsuccessful" C3. At first this has only a very slight effect on the learning selector. The less successful agents have almost as much chance of being selected as the successful agent. Finally, however, the charge on one C3 capacitor begins to approach the threshold of the selector neuron. This increases the delay to the point where the successful agent is far more likely to be chosen.
Special care is needed with this circuit. Bypass capacitors (between ground and Vcc) must be located near all the IC power lines. As the charge on C3 approaches the falling threshold of the selector neuron it becomes extremely sensitive to electronic noise caused when other neurons in the circuit fire. The 0.1 uF bypass capacitors (shown in the second circuit diagram) reduce the effect of this noise. The component values I used are shown on the diagram. Guidelines for changing these values are given later on this page.
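One way to see why the stored charge has little effect at first and a large effect later is to treat a selector neuron's delay as a simple RC decay toward its bias voltage. This first-order model is an assumption made for illustration and is not taken from the circuit diagram. The function name `nv_delay`, the 5 V start level, the 2.5 V threshold, and the resistor and capacitor values are all hypothetical.

```python
import math


def nv_delay(r_ohms, c_farads, v_start, v_threshold, v_bias):
    """Time for an RC node to decay from v_start to v_threshold when the
    resistor pulls it toward v_bias (assumed first-order model)."""
    if not v_bias < v_threshold < v_start:
        raise ValueError("need v_bias < v_threshold < v_start")
    return r_ohms * c_farads * math.log(
        (v_start - v_bias) / (v_threshold - v_bias)
    )


if __name__ == "__main__":
    # Hypothetical values, not taken from the article's diagram.
    for v_bias in (0.0, 1.2, 2.0, 2.4, 2.49):
        delay = nv_delay(1e6, 0.1e-6, 5.0, 2.5, v_bias)
        print(f"bias {v_bias:4.2f} V -> delay {delay * 1000:6.1f} ms")
```

Under this model the delay grows only gradually while the bias is well below the threshold and then climbs steeply as the two converge. The same steepness would also explain the sensitivity to noise near the threshold: a small disturbance in voltage there corresponds to a large change in delay.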
Trigger and Feedback Functions
To experiment with the learning circuit, a simple interface circuit is useful. This will provide the trigger, fail, and succeed inputs to the learning circuit and can also start an agent after it has been selected.
In the upper left of the diagram is the start input. A low pulse at this input will start a learning attempt. For experimenting, a pushbutton can be used to provide the pulse. The start pulse fires Nv2, which immediately triggers Nv3 in turn. Nv2 is a filter neuron which prevents additional start signals from interrupting the learning sequence. It is held active through diodes D6 until the learning cycle has finished. The input labelled wait allows external agents to block a start signal as well. This is useful if a robotic device has completed a learning attempt but has to return to a starting position before the next attempt can begin.
Nv3 fires a few nanoseconds after Nv2 and does three things. It sends the "trigger" signal to input (A) in the learning circuit, it sends a signal that will start the selected agent operating, and it triggers Nv4 which has a very long delay. Nv4 must remain active at least as long as it will take to complete the learning attempt. When Nv4 times out, it triggers Nv5 which sends a pulse to the fail input (B) of the learning circuit. This is the default condition -- if the attempt didn't succeed by the time Nv4 times out, it's assumed to have failed. The LED connected to the output of Nv4 indicates that an attempt is in progress. When the LED goes out, the attempt is finished and another attempt can be made at any time.
The input at the middle left signals a successful attempt. It can be a high pulse, or a "teacher" can press a button any time an attempt succeeds. The pulse fires Nv6 which sends an inverted pulse to the "succeed" input (C) of the learning circuit. Nv6 also cuts short the delay on Nv4 through diode D4 and resistor R14. At the same time, it holds the "fail" neuron (Nv5) in an "off" condition through diode D5. This prevents Nv4 from triggering Nv5, so the "fail" signal is automatically blocked by a "succeed" signal.
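For anyone emulating the interface on a small processor, the sequence above amounts to a timeout with a default "fail" result. The following Python sketch shows that logic only. The class name `TriggerInterface`, its methods, and the time values are hypothetical and do not correspond to any part of the circuit beyond the roles noted in the comments.

```python
class TriggerInterface:
    """Software stand-in for the start / wait / succeed / fail logic."""

    def __init__(self, attempt_timeout):
        self.attempt_timeout = attempt_timeout  # plays the role of Nv4's long delay
        self.deadline = None                    # None means no attempt in progress
        self.wait = False                       # external "wait" input

    def start(self, now):
        """Begin an attempt unless one is running or "wait" is asserted."""
        if self.deadline is not None or self.wait:
            return False                        # extra start signals are ignored
        self.deadline = now + self.attempt_timeout
        return True

    def succeed(self, now):
        """Report success; this ends the attempt early and blocks "fail"."""
        if self.deadline is None or now >= self.deadline:
            return None                         # no attempt, or already timed out
        self.deadline = None
        return "succeed"

    def tick(self, now):
        """Call periodically; returns "fail" once the attempt times out."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            return "fail"
        return None


if __name__ == "__main__":
    iface = TriggerInterface(attempt_timeout=5.0)
    iface.start(0.0)
    print(iface.succeed(3.0))   # success arrives before the timeout
    iface.start(7.0)
    print(iface.tick(12.0))     # no success reported, so the attempt fails
```

Only one of "succeed" or "fail" can be produced per attempt here, which mirrors the requirement that the two signals never occur together.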
R14 is used to introduce a tiny delay before Nv4 is turned off, which ensures that Nv5 is securely turned off before Nv4 can trigger it. The following chart shows the values of the various components used in the current trigger circuit.
Component selection
The following formulae offer some guidelines for calculating other component values. I suggest you start with the ones shown on the diagram if they're available.
IC pinouts
Circuit operation
"Teaching" the circuit is the best way to test it and to see how it works. Power it up and press the "start" button once to reset all the neurons. The circuit will actually perform a little better if you let it run freely for at least 15 minutes before starting to teach it. This will allow the charge on the memory capacitors (C3) to rise to their equilibrium value of about 1.2 volts. This is not essential -- it just allows failed learning attempts to have some influence on the selection loop.
To test the circuit