DPRG: Sbasic , gambling , and probability

Subject: DPRG: Sbasic , gambling , and probability
From: Kipton Moravec kmoravec at airmail.net
Date: Sun Jan 11 12:26:41 CST 1998

Corey Hansen wrote:
> 
> Hi,
> 
> Alright here's the thing:
> 
> I'm building a robot that will learn, or at least I hope it will. To do
> this I am attempting it with Sbasic. My plan is to have a simple
> platform: wood, wheels, motors, caster, batteries, brain, and some
> stray electronics. The brain is HC11 based. The robot, for now, will have
> 8 inputs: 3 IR, 4 bumper switches, and a CdS cell set to be tripped at a
> certain light level. The motors will be driven by a classic H-bridge.
> 
> OK, there are 8 inputs, which can be represented by a one-byte-wide
> variable. Since there are 256 different combinations of sensor input, there
> will be 256 different situations for the robot to learn what is best to do.
> To know what is best to do, the robot needs a goal, which will be to stay
> away from things. To learn, the robot needs to experiment, with some
> randomness mixed with probability. To do this, each of the combinations has
> 7 variables to represent motor output. Each variable needs to hold a
> number for its effectiveness, starting at, say, 128.
> 
> The robot needs to choose which of the motor outputs to use, according to
> probability and randomness. For this to work the robot needs to stage a
> 'gambling' effect with the motor outputs. At the start all the variables
> have the same weight (128) of being chosen; here's where randomness is
> needed. I need the robot to choose which output would work best by
> experience. Whichever output is chosen must be tested for effectiveness.
> After the output is executed the robot needs to wait for the inputs to
> change; when they do, it needs to determine whether more sensors went on
> or off. Since the goal is avoiding things, the lower the byte-wide variable
> representing inputs is, the better, or vice versa. Now that it has
> determined whether that move was good or bad, the robot needs to change
> the 7 variables. If the move was good, then the robot needs to add 1 to the
> variable representing the move, and subtract 1 from the rest. Now the cycle
> repeats.
> 
> What I'm having a problem with is 'gambling' for the right motor output
> while paying attention to the necessary probability. If anyone would like to
> help, it would be greatly appreciated.
> 
> :David
> davehansen at geocities.com

Interesting. There are a couple of ways you could do this, and I could
make it one degree of difficulty harder by also considering the
movement it just completed as another input.  Or even the two prior
movements may be additional inputs.

For example, if you look only at the sensor inputs, you would choose a
movement based on the pattern. Say it says go forward. It goes forward
and hits something. Based on the new sensor inputs, the response is to
go backwards, and it ends up in the same position it was in when the
sensors told it to go forward. Now it is stuck: forward, back,
forward, back... Adding the previous two movements to the input gives it
the extra information to decide not to go forward again and repeat the
cycle.
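
One way to fold the previous movements into the input can be sketched
like this. It is in Python rather than Sbasic, just to show the
arithmetic. The 7 movements come from the original post; the name
state_index and the numbering of movements as 0 to 6 are assumptions
made for illustration.

```python
NUM_MOVEMENTS = 7  # the original post allows 7 motor outputs


def state_index(sensor_byte, prev_move, prev_prev_move):
    """Combine the sensor byte and the two prior movements into one index.

    sensor_byte is 0..255; each movement is assumed to be numbered 0..6.
    """
    index = sensor_byte
    index = index * NUM_MOVEMENTS + prev_move
    index = index * NUM_MOVEMENTS + prev_prev_move
    return index


# Number of distinct situations the robot would then have to learn.
NUM_STATES = 256 * NUM_MOVEMENTS * NUM_MOVEMENTS  # 12544
```

Note that the table grows from 256 situations to 12544, each with its
own 7 scores, so it is worth checking that against the memory you have.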

What I think you are getting at is to have a list of possible
movements, with a score for each movement, for each combination of
inputs.

For example, if the sensors report the way is clear, then the score for
go forward is, say, 10, the score for turn right is 7, and the score for
turn left is 7.  If you always take the best choice, you would always go
forward (because 10 is the highest number). If you wanted to add some
randomness to it, you would generate a uniform random integer between 1
and the sum of all the scores for that set of inputs (in this simple
case 10 + 7 + 7 = 24).  If the random number is between 1 and 10 you go
forward, if it is between 11 and 17 you go right, and if it is between
18 and 24 you go left.
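
The selection step above can be sketched as follows, in Python rather
than Sbasic. The function names and the use of a dictionary to hold the
scores are assumptions for illustration; the scores are the ones from
the example.

```python
import random


def movement_for_roll(scores, roll):
    """Map a roll in 1..sum(scores) to a movement by walking the running total."""
    running = 0
    for movement, score in scores.items():
        running += score
        if roll <= running:
            return movement
    raise ValueError("roll is larger than the sum of the scores")


def choose_movement(scores, rng=random):
    """Pick a movement with probability proportional to its score."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive number")
    roll = rng.randint(1, total)  # uniform integer, both ends included
    return movement_for_roll(scores, roll)


clear_path = {"forward": 10, "right": 7, "left": 7}
print(choose_movement(clear_path))  # forward about 10 times out of 24
```

Splitting the roll from the lookup makes it easy to check the ranges by
hand: a roll of 10 gives forward, 11 gives right, and 18 gives left.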

You should preload the scores with what you would guess they should be,
or preload them with random numbers, or preload them all with the same
score.  If the movement was successful you could add one to the score
for that response for that set of inputs.  If the movement was not
successful you could subtract one from the score for that response for
that set of inputs.
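
A minimal sketch of the preload and the update, again in Python rather
than Sbasic. The names are made up for illustration, and the floor of 1
is my own assumption (not something stated above), added so that no
movement ever drops to zero and becomes impossible to pick.

```python
MOVEMENTS = ["forward", "right", "left"]  # illustrative subset of the 7 outputs


def make_table(num_inputs=256, start_score=128):
    """Preload every movement for every input pattern with the same score."""
    return [{m: start_score for m in MOVEMENTS} for _ in range(num_inputs)]


def update_score(table, inputs, movement, success, floor=1):
    """Add one on success, subtract one on failure, for this input pattern only."""
    scores = table[inputs]
    delta = 1 if success else -1
    # Assumption: clamp at `floor` so the score never reaches zero.
    scores[movement] = max(floor, scores[movement] + delta)
    return scores[movement]


table = make_table()
update_score(table, 0b00000000, "forward", success=True)   # now 129
update_score(table, 0b00000001, "forward", success=False)  # now 127
```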

The hard part is determining what is a successful movement and what is
not.  As a first attempt, I would base it on whether the robot is able
to keep moving for a certain period of time.  (Say 1 or 2 seconds; you
may have to play with it to see what makes sense.) Maybe you or someone
else has a different idea of what makes a movement successful or not.
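
One possible reading of that test, sketched in Python: sample the
sensor byte at regular intervals during the 1 or 2 second window and
call the movement successful if no bumper closed. The bit layout
(bumpers on the low 4 bits) and the function name are assumptions for
illustration, not your actual wiring.

```python
BUMPER_MASK = 0b00001111  # assumption: the 4 bumper switches are bits 0..3


def moved_freely(samples, bumper_mask=BUMPER_MASK):
    """Return True if no bumper bit was set in any sample from the window.

    samples: sensor bytes read at regular intervals while the robot moved.
    """
    return all((sample & bumper_mask) == 0 for sample in samples)


print(moved_freely([0x00, 0x10, 0x00]))  # True: only a non-bumper bit was set
print(moved_freely([0x00, 0x01, 0x00]))  # False: a bumper closed mid-window
```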

This feedback of what is good or what is not good is important in any
learning system. The system can only learn if there is a decision as to
what constitutes a good/bad (pain/pleasure) move.

Regards,
Kip
