As I talked about in an earlier post, wandering behavior is a nearly indispensable component of the AI for any virtual, persistently populated environment.

Why Wandering is Important

To briefly summarize the earlier post: the resource control (RC) model is a very popular scheme for controlling a virtual population. The gist of RC is that agents need resources, and their need for them increases over time. When a need crosses a threshold, the agent seeks out something in the environment that provides that resource. In this way, RC provides computationally cheap, reasonably realistic-looking behavior. The essential problem, then, is: how do agents find out about the things in the world that satisfy their resources?

  • Seeking: You could pre-load all the information about the world into a universally accessible dictionary structure of some sort. This way, when an agent needs some food, he queries the world for the nearest burger joint and goes there.

  • Exploring/Wandering: Alternatively, you could tell the agents nothing, and have them randomly walk around the virtual world finding things. Eventually they find out where everything is, and from then on the agents act just as they do in the first option.

For those of you who are into reinforcement learning, the second option should sound a lot like model-free Q-learning. In Q-learning, the agent starts off knowing little to nothing about the world and, by randomly sampling the environment, figures out what and where the “rewarding” things are. At some point, the agent stops exploring and starts exploiting its knowledge of the environment. If new information is constantly appearing in the world, however, you can never turn off that exploring component. Thus, if the resource-bearing environment is always gaining, losing, or altering resource values, wandering is always necessary. Wandering, in other words, is the solution to the constant appearance of new information (if not relying entirely upon Seeking; see this earlier post).
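To make the analogy concrete, here is a minimal sketch of tabular Q-learning on a toy one-dimensional world. Everything here is illustrative rather than taken from any particular engine: the agent first wanders (acts at random) to sample the world, then exploits what it learned.

```python
import random

random.seed(0)  # deterministic, for the sake of the example

# Toy world: states 0..4 on a line; reward only at state 4 (the "burger joint").
N_STATES = 5
ACTIONS = [-1, +1]  # step left / step right

def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

# Q-table: learned estimate of the long-run value of each (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
ALPHA, GAMMA = 0.5, 0.9

# Exploration phase: the agent wanders and learns from whatever it stumbles into.
for episode in range(300):
    state = 0
    for _ in range(30):
        action = random.choice(ACTIONS)
        next_state, reward = step(state, action)
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# Exploitation phase: the greedy policy now points toward the reward from every state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

Note that the sketch cleanly ends the wandering phase; the whole point of the post is that in a world where resources keep appearing and disappearing, you would instead keep a nonzero exploration rate forever.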

The Wandering Problem

The constant appearance of new information introduces two problems:

  1. Finding out about the new information
  2. Planning and acting based on changing data

When I say an agent performs “wandering”, it is equivalent to saying he has a (reasonably efficient) solution to these two problems.

The Wandering Solution

I’ll consider the first problem here, and the second in a later post. To consider how an agent should find out about a bunch of new information, let’s consider an example. You’re at a hotel, α, and you want to find a restaurant in the area. You don’t know where anything is, so you decide to take a stroll, and you’ll stop when you find something interesting. Let’s represent this scenario like so:

[Figure: wandering_fig1.png]

To “find out about new information”, one has to visit all the nodes, or a subset of them (until you’ve found a nice enough place to eat or something sufficiently distracting). The question is, is there an “optimal” way to visit them, and if so what is it?

First, the above picture can be resolved into the following graph to indicate which nodes (restaurants) are adjacent to each other:

[Figure: wandering_fig2.png]

In this graph, a solid line means the nodes are adjacent to each other (i.e., it makes sense for someone to visit F after visiting E). The dotted lines mean the nodes are technically adjacent, but whether someone would actually visit one after the other is up for debate. For example, L is adjacent to E, but it’s across the street, so a person might not really go from E straight to L.

The first algorithm off the top of your head might be to visit them in depth-first search (DFS) order, the outcome of which is:

α D E L K H J C I M B F G A

Is that the order you would visit the buildings in? I didn’t think so.
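For reference, a depth-first visit can be sketched as follows. The adjacency dictionary below is a purely hypothetical fragment, since the actual graph lives in the figure:

```python
def dfs_order(adjacency, start):
    """Return nodes in depth-first visitation order."""
    order, stack, seen = [], [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbors in reverse so they are visited in listed order.
        stack.extend(reversed(adjacency.get(node, [])))
    return order

# Hypothetical fragment of the restaurant graph (the real one is in the figure).
adjacency = {
    "alpha": ["D"],
    "D": ["E"],
    "E": ["L", "B", "F"],
    "L": ["K", "H"],
    "H": ["J", "I"],
}
visit = dfs_order(adjacency, "alpha")
```

The trouble, as the example order in the post shows, is that DFS happily backtracks across the whole map: it optimizes nothing about how a visit *looks*.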

Clearly, not just any path of visitation is satisfactory. One criterion to capture is the “naturalness” of the path of visitation. There are many possible definitions of a “natural” path, but for this post I will hypothesize that a natural path of visitation is one that minimizes deviation from the initial heading. Slightly more formally, a path becomes less natural the more zig-zagging you do. More precisely:

A path of visitation for a graph G = (V, E) is a string of nodes P = v1 v2 ... vn such that each vi ∈ V, vi and vi+1 appear next to each other only if they are adjacent in G, and vn is a node at the periphery of G.
The deviation of a path of visitation P (in the depicted graph) is the number of horizontal edges traversed.

Thus, the deviation of a path is how “zigzaggy” it is. Just count the number of left arrows and right arrows in a given graph traversal.

Given this definition, a visitation path of minimum deviation (and maximum naturalness) is:

P0 ≡ α D E L K

which has a deviation of 0.

P0 is definitely natural, although it does not let you visit very many nodes. You might have a certain threshold number of buildings that you insist on visiting before making a decision. So there is a second criterion to keep in mind: the number of nodes visited.

Note that, for a path P, the number of nodes visited by P and the deviation of P tend to rise together: increasing the zig-zagging of P means that you will hit more nodes.

The agent designer must therefore solve the following problem. Given a certain minimum number of nodes the agent must visit, what are the paths of minimum deviation?

For the graph above, these paths are:

Nodes agent must visit    Path(s) of minimum deviation (cost)
4                         αDEF (1)
5                         αDELK (0)
6                         αDEBHI, αDELHI (1)
7                         αDELHBC, αDEBHLK (2)
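Here is one way such a table could be computed by brute force. The adjacency and the edge labels below (“h” for horizontal edges, which cost deviation, and “v” for vertical ones, which are free) are illustrative stand-ins for the figure’s graph, and the end-at-the-periphery condition is omitted for brevity:

```python
# Illustrative graph: edges labeled "h" (horizontal, costs deviation) or "v" (vertical).
EDGES = {
    ("α", "D"): "v", ("D", "E"): "v", ("E", "L"): "v", ("L", "K"): "v",
    ("E", "F"): "h", ("E", "B"): "h", ("B", "H"): "v", ("H", "I"): "h",
    ("L", "H"): "h", ("B", "C"): "v",
}

def neighbors(node):
    for (a, b), kind in EDGES.items():
        if a == node:
            yield b, kind
        elif b == node:
            yield a, kind

def min_deviation_paths(start, k):
    """Brute force: enumerate simple paths of exactly k nodes, keep the least deviant."""
    best_cost, best_paths = None, []
    def extend(path, cost):
        nonlocal best_cost, best_paths
        if len(path) == k:
            if best_cost is None or cost < best_cost:
                best_cost, best_paths = cost, ["".join(path)]
            elif cost == best_cost:
                best_paths.append("".join(path))
            return
        for nxt, kind in neighbors(path[-1]):
            if nxt not in path:
                extend(path + [nxt], cost + (1 if kind == "h" else 0))
    extend([start], 0)
    return best_cost, best_paths

cost, paths = min_deviation_paths("α", 5)
```

Exhaustive enumeration is fine for a handful of restaurants; for a whole city the designer would want something smarter, but the objective being minimized stays the same.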

More on the second problem regarding planning and acting based on changing data later.

Posted by ddini

When you’re creating AI to control a population of thousands of visualized entities, and it’s all supposed to run on one machine, your options can often be limited. Now, I know I said population up there, and a lot of you read that as crowd, and you would be wrong, wrong, wrong. So here comes a brief but informative aside. Creating realistic crowd behavior and creating realistic population behavior are two completely separate problems:

  • Crowds: Smaller in scope, both in time (duration of gathering) and in size (i.e. hours or days, 10s, 100s, 1000s of people). Actions do not have lasting consequences (see below). An example of this is a bunch of people assembled around a car accident, or in a stadium to witness a rock concert.

  • Populations: Large in scope, both in time and size (days, months, years, forever; population size on the order of 10^3, 10^4, or 10^5). In a population, one agent’s life is totally independent of another agent’s life: two agents in the same place at the same time are most probably not there for the same reason, whereas the opposite is true for crowds. The most significant difference between a population and a crowd, however, is that a population itself has a personality, and actions have lasting consequences.

By this I mean the following. When modeling a population, you have to think about modeling time-of-day events, such as rush hour in the morning and evening, or food vendors getting busier during the lunch hour.

In addition, if a traumatic event, such as an explosion, occurs at location L, you now have to model the effect it has on the population. When people pass by L, they’re creeped out by it a bit, or perhaps they steer clear of it completely. Now the city has a history, which the population itself makes and reacts to.


Now that that’s out of the way, we can talk about modeling populations. Actually, I want to talk about just one model in particular, one that is pretty popular and successful: the resource control model, and some of the implications of using it for population control. Used in such places as The Sims and in Shao and Terzopoulos, the resource model essentially creates agents that look for things when they run out of them.

In the resource control model, each agent maintains a list of resources, including things such as hunger or satisfaction, each often represented simply as a number. A resource value can be thought of as the “urge for” something. For whatever reason, the value eventually increases beyond a threshold, and the agent has to do something about it. For example, when the hunger resource crosses its threshold, the agent examines the world for things satisfying hunger, i.e., places providing food.
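A minimal sketch of such an agent might look like this. The resource names, growth rates, and threshold are all invented for illustration:

```python
# Minimal resource-control agent: each resource is an urge that grows over time.
# Resource names, growth rates, and the threshold are illustrative.
class Agent:
    def __init__(self):
        self.resources = {"hunger": 0.0, "fatigue": 0.0}
        self.growth = {"hunger": 0.08, "fatigue": 0.03}  # urge gained per tick
        self.threshold = 1.0

    def tick(self):
        """Advance one timestep; return the most urgent over-threshold resource, if any."""
        for name in self.resources:
            self.resources[name] += self.growth[name]
        urgent = max(self.resources, key=self.resources.get)
        if self.resources[urgent] >= self.threshold:
            return urgent  # caller now seeks/wanders for something satisfying this urge
        return None

    def satisfy(self, name):
        self.resources[name] = 0.0

agent = Agent()
need, ticks = None, 0
while need is None:
    need = agent.tick()
    ticks += 1
```

Once the need is met, `agent.satisfy(need)` resets the urge and the cycle repeats, which is what produces the cheap, plausible-looking daily rhythm.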

Of course, the success of this framework hinges upon the agents being able to find what they’re looking for in the environment. Two options span the space:

  • Directory Lookup: An agent can query the system for the nearest food stand. The system returns the location of the food stand.
  • Wandering: There is no directory facility. To find out where things are, agents have to wander around the city, noting whatever interesting things they find.

Any population control scheme using resource control will most probably use a combination of these two. If the agents start out wandering and there’s no directory service, then as they note things, the information eventually accumulates into a sort of directory. Agents that use only directory lookup become excessively mechanical, and if the population only cares about things it already knows about, new information is difficult to propagate.
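One way to sketch that combination: wanderers publish what they find into a shared directory, and later lookups hit the directory first. The grid world and resource placement below are made up for the example:

```python
import random

random.seed(1)

# Shared directory populated by wanderers: a hybrid of the two options above.
# The 10x10 grid world and the resource locations are illustrative.
WORLD = {(3, 4): "food", (7, 2): "coffee"}
directory = {}  # resource name -> location, filled in as agents discover things

def wander_for(resource, start=(0, 0), max_steps=20000):
    """Random-walk until the resource is found, noting everything seen on the way."""
    x, y = start
    for _ in range(max_steps):
        found = WORLD.get((x, y))
        if found is not None:
            directory[found] = (x, y)  # publish the discovery for everyone
            if found == resource:
                return (x, y)
        dx, dy = random.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
        x, y = max(0, min(9, x + dx)), max(0, min(9, y + dy))
    return None

def locate(resource):
    """Directory lookup first; fall back to wandering on a miss."""
    if resource in directory:
        return directory[resource]
    return wander_for(resource)

spot = locate("food")    # the first agent has to wander
cached = locate("food")  # later agents hit the directory
```

The first call pays the full wandering cost; every call after it is a cheap lookup, which is roughly how a wandering population bootstraps its own directory.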

So, any combination of the above two axes has wandering as a component.

The question is: How do you do good wandering behavior?

Let me roll that one over a little.
