As I discussed in an earlier post, wandering behavior is almost always a necessary component of AI for any virtual, persistently populated environment.
To briefly summarize that post: the resource control (RC) model is a very popular scheme for controlling a virtual population. The gist of RC is that agents need resources, and their need for them increases over time. When a need crosses a threshold, the agent seeks out something in the environment that provides that resource. In this way, RC provides computationally cheap, reasonably realistic-looking behavior. The essential problem, then, is: how do agents find out about the things in the world that satisfy resources?
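Before getting to that question, here is a minimal sketch of the RC loop itself. The `Agent` class, the growth rates, and the threshold are my own illustrative inventions, not from any particular engine:

```python
class Agent:
    """Minimal resource control (RC) loop: needs grow until a threshold trips."""

    GROWTH_RATES = {"food": 0.4, "sleep": 0.1}  # need gained per unit time
    THRESHOLD = 10.0                            # need level that triggers seeking

    def __init__(self):
        self.needs = {resource: 0.0 for resource in self.GROWTH_RATES}

    def tick(self, dt):
        # Each need grows over time; crossing the threshold triggers seeking.
        for resource, rate in self.GROWTH_RATES.items():
            self.needs[resource] += rate * dt
            if self.needs[resource] >= self.THRESHOLD:
                self.seek(resource)

    def seek(self, resource):
        # Placeholder: locate and travel to something providing `resource`.
        print(f"seeking a source of {resource}")
        self.needs[resource] = 0.0
```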
There are two broad options:

- Seeking: You could pre-load all the information about the world into a universally accessible dictionary structure of some sort. This way, when an agent needs some food, he queries the world for the nearest burger joint and goes there.
- Exploring/Wandering: Alternatively, you could tell the agents nothing, and have them randomly walk around the virtual world finding things. Eventually, they find out where everything is, and from then on they act just as they do in the first option.
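Here is a sketch of both options side by side; the `world` dictionary, the coordinates, and the function names are all hypothetical:

```python
import math
import random

# Option 1 (Seeking): a universally accessible dictionary of the world.
world = {"burger joint": [(3, 4), (10, 2)], "hotel": [(0, 0)]}

def seek(position, resource):
    # Query the world for the nearest provider of `resource`.
    return min(world[resource], key=lambda spot: math.dist(position, spot))

# Option 2 (Exploring/Wandering): know nothing, take random steps,
# and remember whatever you bump into along the way.
def wander(position, discovered):
    dx, dy = random.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
    position = (position[0] + dx, position[1] + dy)
    for resource, spots in world.items():
        if position in spots:
            discovered.setdefault(resource, []).append(position)
    return position
```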
For those of you who are into reinforcement learning, the second option should sound a lot like model-free Q-learning. In Q-learning, the agent starts off knowing little to nothing about the world and, by randomly sampling the environment, figures out what and where the “rewarding” things are. At some point, the agent stops exploring and starts exploiting its knowledge of the environment. If new information is constantly appearing in the world, however, you can never turn that exploring component off. Thus, if the resource-bearing environment is always gaining, losing, or altering resource values, wandering is always necessary. Wandering, in other words, is the solution to the constant appearance of new information (unless you rely entirely upon Seeking; see this earlier post).
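For illustration, here is textbook one-step Q-learning with ε-greedy action selection; this is the standard algorithm, not anything specific to the RC model. The point is the constant ε: in a stationary world you could decay it toward zero, but in a world that keeps changing, exploration never turns off:

```python
import random

ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # held constant: in a changing world, exploration never stops

Q = {}  # (state, action) -> estimated long-run value

def choose_action(state, actions):
    # Epsilon-greedy: explore (wander) with probability EPSILON, else exploit.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def update(state, action, reward, next_state, next_actions):
    # Standard one-step Q-learning backup.
    best_next = max((Q.get((next_state, a), 0.0) for a in next_actions), default=0.0)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
```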
The constant appearance of new information introduces two problems:
- Finding out about the new information
- Planning and acting based on changing data
When I say an agent performs “wandering,” I mean that he has a (reasonably efficient) solution to these two problems.
I’ll consider the first problem here, and the second in a later post. To see how an agent should find out about a bunch of new information, let’s work through an example. You’re at a hotel, α, and you want to find a restaurant in the area. You don’t know where anything is, so you decide to take a stroll, stopping when you find something interesting. Let’s represent this scenario like so:
To “find out about new information”, one has to visit all the nodes, or some subset of them (until you’ve found a nice enough place to eat, or something sufficiently distracting). The question is: is there an “optimal” way to visit them, and if so, what is it?
First, the above picture can be resolved into the following graph to indicate which nodes (restaurants) are adjacent to each other:
In this graph, a solid line means the nodes are adjacent to each other (i.e., it makes sense for someone to visit F after visiting E). A dotted line means the nodes are technically adjacent, but whether someone would actually visit one after the other is up for debate. For example, L is adjacent to E, but it’s across the street, so a person might not really go from E straight to L.
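Since the figure doesn’t survive in text, here is one possible encoding of it. The exact edge list is my reconstruction from the paths discussed below, so treat it as illustrative:

```python
# Hypothetical reconstruction of the graph. Solid and dotted edges are kept
# separate so an agent can weigh the debatable adjacencies differently.
SOLID = {
    frozenset(e) for e in [
        ("α", "D"), ("D", "E"), ("E", "F"), ("E", "B"), ("B", "C"),
        ("B", "H"), ("H", "I"), ("L", "H"), ("L", "K"),
    ]
}
DOTTED = {frozenset(("E", "L"))}  # adjacent, but across the street

def neighbors(node, include_dotted=True):
    # All nodes sharing an edge with `node`.
    edges = SOLID | DOTTED if include_dotted else SOLID
    return sorted({n for edge in edges if node in edge for n in edge} - {node})
```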
The first algorithm off the top of your head might be to visit the nodes in depth-first search (DFS) order, the outcome of which is:
Is that the order you would visit the buildings in? I didn’t think so.
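For concreteness, here is a plain recursive DFS over the reconstructed graph. The exact order depends on how neighbors are listed, but any such order happily hops across the street and back:

```python
# Same hypothetical graph as above, flattened into an adjacency dict.
ADJ = {
    "α": ["D"], "D": ["α", "E"], "E": ["D", "F", "B", "L"],
    "F": ["E"], "B": ["E", "H", "C"], "C": ["B"],
    "H": ["B", "L", "I"], "I": ["H"], "L": ["E", "H", "K"], "K": ["L"],
}

def dfs(node, visited=None, order=None):
    # Plain recursive depth-first search; returns nodes in visitation order.
    if visited is None:
        visited, order = set(), []
    visited.add(node)
    order.append(node)
    for nxt in ADJ[node]:
        if nxt not in visited:
            dfs(nxt, visited, order)
    return order

print(dfs("α"))  # ['α', 'D', 'E', 'F', 'B', 'H', 'L', 'K', 'I', 'C']
```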
Clearly, not just any path of visitation is satisfactory. One criterion to capture is the “naturalness” of the path of visitation. There are many possible definitions of a “natural” path, but for this post I will hypothesize that a natural path of visitation is one that minimizes deviation from the initial heading. Slightly more formally, a path becomes less natural the more zig-zagging you do. More precisely:
The deviation of a path of visitation P (in the depicted graph) is the number of horizontal edges traversed.
Thus, the deviation of a path is how “zigzaggy” it is. Just count the number of left arrows and right arrows in a given graph traversal.
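As a sketch, label each edge horizontal or vertical and count the horizontal ones along the path. The orientation labels below are reverse-engineered from the table at the end of this post, so they are a guess at the figure’s layout:

```python
# Edges guessed to be horizontal (left/right arrows) in the depicted graph.
HORIZONTAL = {frozenset(e) for e in [("E", "F"), ("E", "B"), ("B", "C"), ("L", "H")]}

def deviation(path):
    # Deviation = number of horizontal edges among consecutive node pairs.
    return sum(frozenset(pair) in HORIZONTAL for pair in zip(path, path[1:]))

print(deviation(list("αDELK")))  # 0: straight ahead the whole way
print(deviation(list("αDEF")))   # 1: one sidestep to reach F
```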
Given this definition, a visitation path of minimum deviation (and maximum naturalness) is:
which has a deviation of 0.
P0 is definitely natural, although it does not allow you to visit very many nodes. You might have a certain threshold number of buildings that you insist on visiting before making a decision. In this way, there is a second criterion to keep in mind: the number of nodes visited.
Note that, for a path P, the number of nodes visited by P and the deviation of P rise together: increasing the zig-zagging of P means that you will hit more nodes.
The agent designer must therefore solve the following problem: given a certain minimum number of nodes the agent must visit, what are the paths of minimum deviation?
For the graph above, these paths are (a brute-force sketch that reproduces this table follows below):
| Nodes agent must visit | Paths of minimum deviation (cost) |
| --- | --- |
| 4 | αDEF (1) |
| 5 | αDELK (0) |
| 6 | αDEBHI, αDELHI (1) |
| 7 | αDELHBC, αDEBHLK (2) |
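A brute-force search reproduces this table, under one interpretive assumption: the candidate paths are walks that continue until the agent is stuck (every neighbor already visited) and that visit exactly the required number of nodes. The graph and the orientation labels are the same guesses as in the earlier sketches:

```python
# Same hypothetical graph and horizontal-edge labels as the earlier sketches.
ADJ = {
    "α": ["D"], "D": ["α", "E"], "E": ["D", "F", "B", "L"],
    "F": ["E"], "B": ["E", "H", "C"], "C": ["B"],
    "H": ["B", "L", "I"], "I": ["H"], "L": ["E", "H", "K"], "K": ["L"],
}
HORIZONTAL = {frozenset(e) for e in [("E", "F"), ("E", "B"), ("B", "C"), ("L", "H")]}

def deviation(path):
    # Count horizontal (zig-zag) edges along the path.
    return sum(frozenset(pair) in HORIZONTAL for pair in zip(path, path[1:]))

def maximal_paths(path):
    # Yield every walk extending `path` that continues until the agent is stuck.
    extensions = [n for n in ADJ[path[-1]] if n not in path]
    if not extensions:
        yield path
    for nxt in extensions:
        yield from maximal_paths(path + [nxt])

def min_deviation_paths(n):
    # Among stuck walks visiting exactly n nodes, keep the least zigzaggy.
    candidates = [p for p in maximal_paths(["α"]) if len(p) == n]
    best = min(deviation(p) for p in candidates)
    return best, sorted("".join(p) for p in candidates if deviation(p) == best)

for n in (4, 5, 6, 7):
    cost, paths = min_deviation_paths(n)
    print(f"{n} nodes: {', '.join(paths)} (deviation {cost})")
```

One note on that assumption: under a literal “at least n nodes” reading, the longer zero-deviation path αDELK would dominate the four-node row, so the stuck-walk interpretation appears to be the one the table intends.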
More on the second problem, planning and acting based on changing data, in a later post.