Artificial Intelligence Homework 2

Ed Smart

CSC 375

2.1 Suppose that the performance measure is concerned with just the first T time steps of the environment and ignores everything thereafter. Show that a rational agent’s action may depend not just on the state of the environment but also on the time step it has reached.

My first thought regarding this question was that a rational agent could be involved in some sort of time-limited event or activity, such as a sports game. In such a scenario, the agent would be required to play against an opponent and accumulate more points than the opponent within a predetermined amount of time, so the duration of the game corresponds to the T time steps that the performance measure is concerned with. In a game of soccer, for example, a goal scored after the final whistle does not count toward the result. The agent's style of play or tactics may therefore change during the game depending on the score relative to the remaining time. If the agent has a comfortable lead over its opponent with little time remaining, it may choose to simply play defensively for the remainder. On the other hand, if the agent has a lower score than its opponent and there is a considerable amount of time remaining, it may choose to play more offensively in order to gain points. The same state of the game thus calls for different actions at different time steps.

Another situation in which a rational agent's actions may depend on the time step it has reached is taking a long exam within a given amount of time. If the agent has 60 minutes to complete 30 exam problems and has finished 20 problems after 45 minutes, it may start skipping the problems it finds more difficult or time-consuming and focus on the easier or shorter ones in order to maximize the number of problems it can answer, instead of completing them in the order they are given and possibly running out of time for some of the easy questions. A sketch of such a time-aware strategy follows.
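As a rough Python sketch of this strategy (the Problem class, the time estimates, and the 1.5-minutes-per-problem threshold are my own illustrative assumptions, not anything specified in the question), the key point is that the chosen action depends on the time remaining, not only on which problems remain:

```python
from dataclasses import dataclass

@dataclass
class Problem:
    number: int
    estimated_minutes: float  # the agent's own estimate of the time to solve it

def exam_agent(remaining: list[Problem], minutes_left: float) -> Problem:
    """Pick the next problem to attempt; the choice depends on minutes_left
    (the time step), not just on the set of remaining problems (the state)."""
    # Assumed rule: once the average time available per problem drops below
    # 1.5 minutes, switch to attempting the quickest problems first.
    if minutes_left / max(len(remaining), 1) < 1.5:
        return min(remaining, key=lambda p: p.estimated_minutes)
    return remaining[0]  # plenty of time: work in the given order

problems = [Problem(i, m) for i, m in enumerate([3.0, 1.0, 5.0], start=1)]
print(exam_agent(problems, minutes_left=60))  # same state, early: Problem 1
print(exam_agent(problems, minutes_left=4))   # same state, late: Problem 2
```

The two calls at the bottom show the point of the exercise: with the identical set of remaining problems, the rational choice changes as the time step advances.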

2.2 Let us examine the rationality of various vacuum-cleaner agent functions.
a. Show that the simple vacuum-cleaner agent function described in Figure 2.3 is indeed rational under the assumptions listed on page 38.

The first assumption listed on page 38 states that the performance measure awards 1 point for each clean square at each time step over a lifetime of 1000 time steps. Based on this, the agent behaves rationally, as it moves back and forth between the squares, cleaning any dirt it finds and moving to the other square if none is found. If the performance measure included any sort of penalty for movement between squares, however, the agent would not be considered rational, as it keeps moving back and forth needlessly once the dirt has been cleaned.

The second assumption is that the geography of the environment is known, but the dirt distribution and the agent's initial location are not. It also states that cleaned squares remain clean and that sucking cleans the current square. In this case, the agent also behaves rationally, because it oscillates from one square to the other, checking for dirt and cleaning any that it finds. It is also assumed that the Left and Right actions move the agent left or right respectively, unless doing so would move the agent out of the environment, in which case it remains at its current location. Because there are only two squares, this assumption resolves the problem of the agent not initially knowing its location: if it starts at square A without knowing it is at square A and tries to move left, it remains where it is and has no other choice but to move right and go to square B.

The third assumption is that the only available actions are Left, Right, and Suck. According to this, the agent is rational, as these are the only actions it performs. It checks its current square for dirt, sucks it up if any is found, and then moves left or right to the other square and repeats the process.

The last assumption is that the agent correctly perceives its location and whether that location contains dirt. Looking back at the second assumption, it is clear that the agent is capable of perceiving its location and whether or not that location contains dirt.
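Putting these assumptions together, the agent function described in Figure 2.3 can be sketched in Python as a simple reflex program (assuming each percept arrives as a (location, status) pair, which the fourth assumption suggests):

```python
def reflex_vacuum_agent(percept):
    """Sketch of the Figure 2.3 agent function: suck if the current
    square is dirty, otherwise move to the other square."""
    location, status = percept            # e.g. ('A', 'Dirty')
    if status == 'Dirty':
        return 'Suck'                     # clean the current square
    return 'Right' if location == 'A' else 'Left'  # check the other square
```

Note that the program keeps no internal state, which is exactly why it keeps oscillating once both squares are clean.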

b. Describe a rational agent function for the case in which each movement costs one point. Does the corresponding agent program require internal state?

For the case in which each movement costs one point, the agent should stop after checking both squares A and B and removing any dirt found, in order to avoid needless movements and loss of points. The corresponding agent program would require internal state in this case, as it would need to remember having checked both squares before it stops performing its task. Suppose it finds dirt at square A, cleans it, then moves to square B and finds no dirt: if it does not remember whether it checked square A, it will check A again, then B again, and so on, continuing to make unnecessary movements and losing points. A sketch of such a program follows.
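A minimal sketch of such a program, with the set of squares known to be clean as its internal state (the NoOp "do nothing" action is my own assumption, since the listed actions are only Left, Right, and Suck):

```python
def make_stateful_vacuum_agent():
    """Sketch of an agent program with internal state for part b."""
    known_clean = set()  # internal state: squares verified (or made) clean

    def agent(percept):
        location, status = percept
        known_clean.add(location)  # after this step, the square is clean either way
        if status == 'Dirty':
            return 'Suck'          # sucking cleans it, and cleaned squares stay clean
        if known_clean == {'A', 'B'}:
            return 'NoOp'          # assumed extra action: stop to avoid the movement cost
        return 'Right' if location == 'A' else 'Left'

    return agent
```

Without the remembered known_clean set, the agent would oscillate forever, paying one point per move, which is exactly the failure described above.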

If the squares did not permanently remain clean, however, the agent could stop for some fixed amount of time after checking and cleaning both squares and repeat its task. It could work at some time interval, checking both squares, cleaning any dirt found, stopping for 1 hour or so, and then repeating the process. This would allow the agent to minimize movements and point loss while keeping the squares clean if they were to become dirty again.

c. Discuss possible agent designs for the cases in which clean squares can become dirty and the geography of the environment is unknown. Does it make sense for the agent to learn from its experience in these cases? If so, what should it learn? If not, why not?

For the case in which clean squares can become dirty, the agent could work at some fixed time interval: check and clean both squares A and B, stop for a fixed amount of time, then start up and repeat the process, keeping the squares clean while minimizing needless moves and point loss. If the geography of the environment is unknown, the agent would need to explore the environment instead of simply oscillating back and forth between squares A and B. I would say it makes sense for the agent to learn from its experience in these cases. It could learn the geography of the environment by exploring it, as well as which squares it has checked and cleaned, so that it may temporarily stop once it has finished checking and cleaning all of them.
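As a rough sketch of such a learning agent (the four-direction action set, the random exploration policy, and the dirt-frequency counter are all my own illustrative assumptions):

```python
import random

def make_exploring_vacuum_agent(moves=('Left', 'Right', 'Up', 'Down')):
    """Sketch for part c: unknown geography, squares can become dirty again."""
    visited = set()   # learned geography: squares seen so far
    dirt_counts = {}  # learned model: how often each square was found dirty

    def agent(percept):
        location, status = percept
        visited.add(location)
        dirt_counts[location] = dirt_counts.get(location, 0) + (status == 'Dirty')
        if status == 'Dirty':
            return 'Suck'
        # No map of walls is kept here, so the agent explores by trying moves;
        # a fuller design would also remember which moves turned out to be blocked.
        return random.choice(moves)

    return agent
```

The visited set and dirt_counts represent what the agent learns: the geography of the environment, and how often each square tends to become dirty, which it could use to decide how long to pause between cleaning passes.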
