(1)

BABEŞ-BOLYAI UNIVERSITY, CLUJ-NAPOCA
FACULTY OF MATHEMATICS AND COMPUTER SCIENCE

SPECIALIZATION: COMPUTER SCIENCE (ROMANIAN)

DIPLOMA THESIS

Machine Learning-based gameplay

Scientific supervisor

Prof. univ. dr. Czibula Gabriela

Graduate Cîmpean Alexandru

2012

(2)

Table of contents

Introduction ... 3

I. Game perspective in Artificial Intelligence ... 5

1.1 A general game description... 5

1.2 The context of video games ... 5

1.3 Artificial intelligence in video games ... 13

II. Learning to play ... 17

2.1 Software agents: autonomous programs ... 17

2.2 Machine learning: making computers act ... 22

2.3 Embedding machine learning in computer games ... 23

2.3.1 The history of machine learning and games... 23

2.3.2 What do the games have to learn? ... 23

2.3.3 Creating a Learning Agent ... 25

2.3.4 Problems with Learning ... 28

III. Relevant applications of machine learning in games ... 30

3.1 TD-Gammon ... 30

3.2 Samuel's checkers program ... 31

3.3 NeuroChess... 32

3.4 SAL... 32

IV. “A Tale of Two Catapults” – machine learning techniques applied in games... 34

4.1 Problem statement ... 34

4.2 Motivation ... 34

4.3 Analysis and design... 35

4.3.1 Problem analysis ... 35

4.3.2 Solution’s design ... 38

4.4 Implementation ... 42

4.5 User manual ... 49

4.6 Results obtained ... 51

4.7 Future work... 53

(3)

Introduction

Since the beginning of computer games, whether they were inspired by traditional games such as chess, backgammon and checkers, or whether they exploited the new opportunities of virtual worlds and created new and innovative genres, the problem of an artificial player has existed. The players of those video games wanted a program that knows how to play a game and will play it in collaboration with them or against them, using skills similar to their own.

The first games had a component that would play against the human player, but it proved to be very poor and limited compared to the players' demands. Some games also had a collaborative component which helped the player in situations that mostly could not be handled alone. To build such programs the developers drew inspiration from a classical branch of artificial intelligence, namely game theory. With this approach the programs used different algorithms to choose the best moves based on various heuristics and classical moves. The game theory approach proved to be a good one until its limits were discovered: it requires a well established strategy, and when the state space is large such programs run very slowly. Another shortcoming of this method is its inability to adapt to new situations, which makes radical changes of the game parameters difficult or even impossible.

Later, the concept of an agent, which we will explain in this work, introduced new possibilities for programs that play games. Artificial intelligence in video games has thus grown according to the need for better non-human players and has split into multiple directions: traditional game playing, mimicking intelligence and human player behavior, and approaches that adapt methods from another branch of artificial intelligence, namely machine learning. Machine learning is a concept we will explain further in this work; applied to games, it means that agents try to learn a game without knowing its rules and without being given a strategy, based only on the feedback they receive after playing repeated games against themselves.

(4)

In the first chapter of this work we will present theoretical aspects of video games in general and of artificial intelligence in video games, as well as different traditional techniques for building a game-playing program.

In the second chapter we will present other techniques of artificial intelligence. First we will describe agents and their characteristics: we will see what an agent needs in order to be considered intelligent and what features are required, classify agents according to different criteria and give examples of some of them. Further in this chapter we will present the concept of machine learning, describe its subtypes and characteristics, and present the steps needed to build a learning algorithm. We will also describe the concept of learning to play, give examples of learning methods used in game play and see how they work.

In the third chapter of this work we will talk about the recent work in this domain and analyze some of the most relevant learning games together with their characteristics.

The fourth chapter of this work is a practical application of machine learning in games. The game is called "A Tale of Two Catapults": the player controls a catapult and, in order to win, has to hit the opponent's catapult. The player can choose the angle and initial velocity with which to shoot a projectile toward the opponent, who is situated at some distance.

Furthermore the wind can blow from any direction and influence the trajectory of the projectile.

In order to win the game the player has to hit the opponent three times. The opponent is in fact a program designed to learn to shoot after playing many games against itself. In the fourth chapter we will present all the steps in developing such a program, including the design, the algorithms and methods used, the difficulties met and the results obtained.

(5)

I. Game perspective in Artificial Intelligence

1.1 A general game description

Below we have some definitions of games from different theoreticians and specialists in games:

• "A game is a system in which players engage in an artificial conflict, defined by rules, that results in a quantifiable outcome." (Katie Salen and Eric Zimmerman)[1]

• "A game is a form of art in which participants, termed players, make decisions in order to manage resources through game tokens in the pursuit of a goal." (Greg Costikyan)[2]

• "A game is an activity among two or more independent decision-makers seeking to achieve their objectives in some limiting context." (Clark C. Abt)[3]

• "At its most elementary level then we can define game as an exercise of voluntary control systems in which there is an opposition between forces, confined by a procedure and rules in order to produce a disequilibrial outcome." (Elliot Avedon and Brian Sutton-Smith)[4]

• "A game is a form of play with goals and structure." (Kevin J. Maroney)[5]

1.2 The context of video games

There are two principal reasons to continue to do research on games ... First, human fascination with game playing is long-standing and pervasive. Anthropologists have catalogued popular games in almost every culture....Games intrigue us because they address important cognitive functions....The second reason...is that some difficult games remain to be won, games that people play very well but computers do not. These games clarify what our current approach lacks. They set challenges for us to meet, and they promise ample rewards [6]. - Susan L. Epstein, Game Playing: The Next Moves

(6)

Computer game designer Chris Crawford attempted to define the term game using a series of dichotomies:

• Creative expression is art if made for its own beauty, and entertainment if made for money.

• A piece of entertainment is a plaything if it is interactive. Movies and books are cited as examples of non-interactive entertainment.

• If no goals are associated with a plaything, it is a toy. (Crawford notes that by his definition, (a) a toy can become a game element if the player makes up rules, and (b) The Sims and SimCity are toys, not games.) If it has goals, a plaything is a challenge.

• If a challenge has no "active agent against whom you compete," it is a puzzle; if there is one, it is a conflict. (Crawford admits that this is a subjective test. Video games with noticeably algorithmic artificial intelligence can be played as puzzles; these include the patterns used to evade ghosts in Pac-Man.)

• Finally, if the player can only outperform the opponent, but not attack them to interfere with their performance, the conflict is a competition. (Competitions include racing and figure skating.) However, if attacks are allowed, then the conflict qualifies as a game.[7]

Video Games encompass a range of electronic games, including coin-operated arcade games, console and cartridge games, various handheld electronics, and games on diskette and CD-ROM ranging in sophistication from simple geometric shapes to virtual reality programs with movielike qualities. [8]

Most games fall within a particular category. Some span different gaming styles and, thus, could appear under more than one category simultaneously. And others pioneer new approaches to electronic entertainment; those often fall outside of any preconceived genre. If such a title becomes popular enough that others try to duplicate the experience, a new genre comes into being.[9]

Below is a list of some of the most common video game genres with a brief description and some examples of each; in total there are

(7)

dozens of genres and/or sub-genres. This is an attempt to give a broader perspective of types of video games.[9]

Shooter:

One of the oldest genres of video game is the classic shooter. It has roots in the early 60s with Steve Russell's Spacewar! Shooters are games that require the player to blow away enemies or objects in order to survive and continue gameplay. They usually fall into one of two categories:

1) horizontal, or 2) vertical. However, like Spacewar!, Star Castle, and Asteroids, there are shooters that are neither horizontal nor vertical. These involve moving around the screen and shooting in whatever direction necessary to keep from being destroyed. Other classic examples include Defender, Galaga, R-Type, Phoenix, Space Invaders, Tempest, Xevious, and Zaxxon.

First-Person-Shooter (or FPS):

This is an example of a sub-genre that has grown enough to become its own genre. In fact, because of the prevalence of these games, many people use the term "shooter" to refer to first-person shooters. These games are real-time, fast-paced action games in which the player navigates an environment from a first-person perspective and, usually, blows everything and everyone away whenever possible. Though Wolfenstein 3D is regarded as the first successful example of this genre, it wasn't until the release of Doom that people began to recognize the true potential of this type of gaming. Doom enabled multiple game players to share in the same game simultaneously via modem and LAN. This would become the standard of this genre, opening the game format up to multi-player deathmatches that would become so important to the format that some put little effort into story and the single-player experience in general (i.e., Unreal Tournament and Quake III). Though this is a relatively new genre (since the early 1990s), it has grown in popularity. Examples of first-person-shooter franchises include Wolfenstein 3D, Doom, Duke Nukem 3D, Descent, Marathon, GoldenEye, Halo, Quake, and Time Splitters.

(8)

Adventure:

Another of the first video game genres, especially on computer platforms, was the adventure game. These were initially text-based games like Will Crowther's Colossal Cave and the original Zork games. However, as the power of the gaming systems grew, developers tried to tap into the visual capabilities of each consecutive platform. The Atari VCS offered a game entitled Adventure. Roberta Williams began developing the King's Quest series for Sierra Online in an attempt to add interactive graphics and point-and-click functionality to the more puzzle-oriented traditional text-based adventure. There has always been a strong following for this genre because of the challenge of puzzle-solving and the general lack of violence. This has also made it popular for many non-traditional gaming demographics. In recent years, LucasArts and Cyan have been known for their contributions to the adventure genre. Other examples of adventure franchises include Gabriel Knight, Indiana Jones, Maniac Mansion, Monkey Island, Myst, Police Quest, and Syberia.

Platform:

It is believed that the platform genre began in 1981 with the release of the games Donkey Kong and Space Panic. Games within this genre are usually identified by navigating environments that require timing and jumping in order to reach a destination while avoiding and/or disposing of enemies. Many of these, like Donkey Kong, have a series of screens, each with its own individual pattern of challenges. As companies began to develop platform games for home consoles and computers instead of arcade machines (i.e. Super Mario Bros. for the Famicom and Nintendo Entertainment System), they took advantage of the evolving processors and greater memory capacity by transcending individual screens and utilizing actively side-scrolling worlds.

This evolutionary step in platform games moved them closer to immersive stories rather than challenging puzzles. Platform video games continued to evolve as gaming became more 3D. One of the greatest 3D platform games was introduced with the launch of the Nintendo 64 and was called Super Mario 64. Examples of 2D screen-based platform franchises include Bubble Bobble, Burgertime, Donkey Kong, Lode Runner, Mario Bros., and Space Panic. Examples of

(9)

Super Mario Bros., and Vectorman. Examples of 3D platform franchises include Crash Bandicoot, Pac-Man World, Spyro the Dragon, and the aforementioned Super Mario 64.

Role-Playing Games (RPGs):

Evolving from pen-and-paper games like Dungeons and Dragons, RPGs are a special type of adventure game that usually incorporates three major elements: 1) a specific quest, 2) a process for evolving a character through experience to improve his/her ability to handle deadlier foes, 3) the careful acquisition and management of inventory items for the quest (i.e., weapons, armor, healing items, food, and tools). Having said that, these games still have many variations and appearances.

Puzzle:

In many ways, puzzle video games are not dissimilar from traditional puzzles. What they offer are unique environments that are not as easily introduced in one's living room. For example, Wetrix enables the player to build up a series of walls that would be able to contain a deluge of water when it falls. Successful completion of a level involves capturing enough water. Other examples include Tetris, Intelligent Qube, Puzzle Bobble, Puyo Puyo, Devil Dice, and Mercury.

Simulations:

By their nature, simulations are attempts to accurately re-create an experience. These can be in the form of management simulations like SimCity and Theme Hospital, or more hands-on like Microsoft Flight Simulator or Gran Turismo.

Strategy/Tactics:

Like simulations, strategy/tactics games attempt to capture a sense of realism for the game player to experience. However, these titles are often turn-based as opposed to realtime and they give the player a greater sense of specific control over a situation. Franchises that fall into this genre include Ogre Tactics, Command and Conquer, Final Fantasy Tactics, and Worms.

(10)

Sports:

As you can imagine, sports games are those that simulate the playing of sports. Many of these have incorporated novel aspects beyond the games themselves. For example, most football video games like the Madden series enable the player to create and customize teams and play them for an entire season. Furthermore, many sports games include management elements beyond the games themselves. There is quite a bit of variety in this genre for fans of the games, the players, and the behind-the-scenes responsibilities of owning a team.

Fighting:

These titles pit player against player (usually 2 players head-to-head) and involve one triumphing over the other. Many of these games include a single player mode, but the real draw to this genre is the ability to demonstrate one's gaming prowess against a friend. Examples of franchises in this genre include Street Fighter, Soul Calibur, Mortal Kombat, Tekken, Virtua Fighter, Dead or Alive, King of Fighters, and Bloody Roar.

Dance/Rhythm:

Dance Dance Revolution is probably the single largest franchise in this genre. Of the rest, many require a specialized controller like DDR, but several don't. This grouping of games is differentiated by the timed elements usually synched to music somehow. Other good examples of this form include Parappa the Rapper, Bust a Groove, Gitaroo Man, Space Channel 5, Frequency, Beatmania, Para Para Paradise, Donkey Konga, and Eyetoy Groove.

Survival Horror:

As the name suggests, these titles are an interactive evolutionary step of the horror genre. The main gameplay mechanic in these is to "survive" the environment that includes fantastic or supernatural elements that are very frightening and often disturbing. Many of these titles are

(11)

Hybrids:

It's important to recognize that many games are not limited to a single genre. Some are the combination of two or more game types. In fact, as gaming evolves, we see lines blurred between genres more frequently than not. Since the introduction of 3D gaming, the action/adventure genre has grown dramatically. It is practically a catch-all category that incorporates 3D games with real-time combat and puzzle-solving in a fairly cohesive storyline.

Many of these games are also first-person-shooters. Some are 3D platform titles. And most survival horror titles qualify as Action/Adventure games too. Another example of a hybrid is Myst. It is both an adventure game and a puzzle game. However, it is most certainly not an Action/Adventure game. [9]

Below we have three examples of games that show the evolution of video games throughout the decades:

Figure 1 – OXO (Noughts And Crosses) – made in 1952 by Alexander S. Douglas

Figure 2 – Super Mario Bros. – made in 1985 by Nintendo Co., Ltd

Figure 3 – Assassin's Creed – made in 2007 by Ubisoft Entertainment S.A.

(12)

Figure 2 - Super Mario Bros.

Figure 1 - OXO

Figure 3 - Assassin's Creed

(13)

1.3 Artificial intelligence in video games

Artificial intelligence in video games can be implemented using two types of algorithms:

machine learning, which we will discuss in the next chapter, and search algorithms. Next we will give a classification of search algorithms and describe them.

Search in artificial intelligence

Search plays a major role in solving many Artificial Intelligence (AI) problems. Search is a universal problem-solving mechanism in AI. In many problems the sequence of steps required to solve them is not known in advance, but must be determined by systematic trial-and-error exploration of alternatives.

The problems that are addressed by AI search algorithms fall into three general classes:

single-agent path-finding problems, two-player games, and constraint-satisfaction problems. [10]

Path-finding problems

Classic examples of path-finding problems in the AI literature are sliding-tile puzzles, Rubik's Cube and theorem proving. The sliding-tile puzzles are common test beds for research in AI search algorithms as they are very simple to represent and manipulate. Real-world problems include the traveling salesman problem, vehicle navigation, and the wiring of VLSI circuits. In each case, the task is to find a sequence of operations that map an initial state to a goal state.

Two-player games

Two-player games here are two-player perfect information games. Chess, checkers, and Othello are some examples of such games.

(14)

Constraint-satisfaction problems

The Eight Queens problem is the best example. The task is to place eight queens on an 8*8 chessboard such that no two queens are on the same row, column or diagonal. Real-world examples of constraint satisfaction problems are planning and scheduling applications.

Problem space

A problem space is a set of states together with the connections between them, used to represent the problem. A problem space graph is used to represent a problem space: the states are represented by the nodes of the graph, and the operators by the edges between nodes. Although most problem spaces correspond to graphs with more than one path between a pair of nodes, for simplicity they are often represented as trees, where the initial state is the root of the tree. The cost of this simplification is that any state that can be reached by two different paths will be represented by duplicate nodes, thereby increasing the tree size. The benefit of using a tree is that the absence of cycles greatly simplifies many search algorithms.

One feature that distinguishes AI search algorithms from other graph-searching algorithms is the size of the graph involved. For example, the entire chess graph is estimated to contain over 10^40 nodes. Even a simple problem like the twenty-four puzzle contains almost 10^25 nodes. As a result, the problem-space graphs of AI problems are never represented explicitly by listing each state, but rather are implicitly represented by specifying an initial state and a set of operators to generate new states from existing states. Moreover, the size of an AI problem is rarely expressed as the number of nodes in its problem-space graph. The two parameters of a search tree that determine the efficiency of various search algorithms are its branching factor and its solution depth. The branching factor is simply the average number of children of a given node. For example, in the eight-puzzle problem the average branching factor is the square root of 3, or about 1.732.

The solution depth of a problem instance is the length of a shortest path from the initial state to a goal state or the length of a shortest sequence of operators that solve the problem. [10]
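As a rough worked illustration (our own example, not taken from [10]): a brute-force search of a tree with branching factor b and solution depth d examines on the order of b^d nodes. For the eight-puzzle, with b ≈ 1.732:

    b^d ≈ 1.732^20 ≈ 3^10 ≈ 59,000 nodes for a solution of depth 20
    b^d ≈ 1.732^30 ≈ 3^15 ≈ 14,000,000 nodes for a solution of depth 30

so it is the branching factor and the solution depth, rather than the raw size of the problem-space graph, that determine how expensive a search will be.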

(15)

Classes of search algorithms

For virtual search spaces

Algorithms for searching virtual spaces are used in constraint satisfaction problems, where the goal is to find a set of value assignments to certain variables that will satisfy specific mathematical equations and inequations. They are also used when the goal is to find a variable assignment that will maximize or minimize a certain function of those variables. Algorithms for these problems include the basic brute-force search (also called "naïve" or "uninformed" search), and a variety of heuristics that try to exploit partial knowledge about the structure of the space, such as linear relaxation, constraint generation, and constraint propagation.

An important subclass are the local search methods, that view the elements of the search space as the vertices of a graph, with edges defined by a set of heuristics applicable to the case; and scan the space by moving from item to item along the edges, for example according to the steepest descent or best-first criterion, or in a stochastic search. This category includes a great variety of general metaheuristic methods, such as simulated annealing, tabu search, A-teams, and genetic programming, that combine arbitrary heuristics in specific ways.

This class also includes various tree search algorithms, that view the elements as vertices of a tree, and traverse that tree in some special order. Examples of the latter include the exhaustive methods such as depth-first search and breadth-first search, as well as various heuristic-based search tree pruning methods such as backtracking and branch and bound. Unlike general metaheuristics, which at best work only in a probabilistic sense, many of these tree-search methods are guaranteed to find the exact or optimal solution, if given enough time.

Another important sub-class consists of algorithms for exploring the game tree of multiple-player games, such as chess or backgammon, whose nodes consist of all possible game situations that could result from the current situation. The goal in these problems is to find the move that provides the best chance of a win, taking into account all possible moves of the opponent(s).

Similar problems occur when humans or machines have to make successive decisions whose outcomes are not entirely under one's control, such as in robot guidance or in marketing, financial or military strategy planning. This kind of problem - combinatorial search - has been

(16)

extensively studied in the context of artificial intelligence. Examples of algorithms for this class are the minimax algorithm, alpha-beta pruning, and the A* algorithm.[11]
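To make the game-tree idea concrete, below is a minimal sketch of minimax with alpha-beta pruning, written in Python over a hypothetical game-state interface; the names is_terminal, successors and evaluate are our own assumptions for illustration and do not come from [11].

    # Minimal minimax with alpha-beta pruning over an abstract game state.
    # The state interface (is_terminal, successors, evaluate) is hypothetical.
    import math

    def alphabeta(state, depth, alpha, beta, maximizing):
        # Return the minimax value of `state`, searching `depth` plies ahead.
        if depth == 0 or state.is_terminal():
            return state.evaluate()              # static evaluation of the position
        if maximizing:
            value = -math.inf
            for child in state.successors():
                value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:                # beta cut-off: the opponent avoids this line
                    break
            return value
        else:
            value = math.inf
            for child in state.successors():
                value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
                beta = min(beta, value)
                if beta <= alpha:                # alpha cut-off
                    break
            return value

    # The best move is the successor with the highest value:
    # best = max(state.successors(),
    #            key=lambda s: alphabeta(s, depth - 1, -math.inf, math.inf, False))

The pruning steps are what make such algorithms usable on the very large game trees described above, since whole subtrees that cannot influence the final decision are never generated.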

For sub-structures of a given structure

The name combinatorial search is generally used for algorithms that look for a specific sub- structure of a given discrete structure, such as a graph, a string, a finite group, and so on. The term combinatorial optimization is typically used when the goal is to find a sub-structure with a maximum (or minimum) value of some parameter. (Since the sub-structure is usually represented in the computer by a set of integer variables with constraints, these problems can be viewed as special cases of constraint satisfaction or discrete optimization; but they are usually formulated and solved in a more abstract setting where the internal representation is not explicitly mentioned.)

An important and extensively studied subclass are the graph algorithms, in particular graph traversal algorithms, for finding specific sub-structures in a given graph — such as subgraphs, paths, circuits, and so on. Examples include Dijkstra's algorithm, Kruskal's algorithm, the nearest neighbour algorithm, and Prim's algorithm.

Another important subclass of this category are the string searching algorithms, that search for patterns within strings. Two famous examples are the Boyer–Moore and Knuth–Morris–Pratt algorithms, and several algorithms based on the suffix tree data structure. [11]

(17)

II. Learning to play

2.1 Software agents: autonomous programs

There have been many debates about how a software agent should be defined, and a universally accepted definition has not been given. Below are some definitions of software agents from the point of view of theoreticians and other people in this domain.

Software agents are computational systems that inhabit some complex dynamic environment, sense and act autonomously in this environment, and by doing so realize a set of goals or tasks for which they are designed. (P. Maes) [12]

Russell and Norvig point out: "the notion of an agent is meant to be a tool for analyzing systems, not an absolute characterization that divides the world into agents and non-agents" [14]. We can say that 'software agent' or 'intelligent agent' can best be seen as an umbrella term for programs that to some extent display attributes commonly associated with agency. (Nwana H.) [13]

Franklin and Graesser (1996) give their own definition of what an agent is supposed to be: "an autonomous agent is a system situated within and part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future."[18]

(18)

Agent characteristics

There are some characteristics that define a software agent. In the section below we will enumerate and describe each characteristic as defined and accepted in the literature.

Reactive

In order for an agent to function autonomously in any given environment it must be able to perceive its environment and act in a timely fashion upon changes that occur in it. A software agent may employ any type and number of sensors to sense its environment. The agent can react to sensory input using its actuators. We can differentiate between various degrees of reactivity, ranging from purely reactive software agents on the one hand, to software agents that deliberate extensively before reacting on the other hand. [15]

Pro-active and goal-oriented

Pro-activity is a more specialized form of autonomy. When an agent is said to be pro-active, it does not simply act in response to its environment, but it will exhibit goal-directed behavior and take initiative to attain its goals or design objectives [15].

Figure 4 -Franklin and Graesser’s (1996) agent taxonomy

(19)

Deliberative

More sophisticated agents are not merely reactive (i.e., operating in a stimulus response manner) but are able to reason about their actions. The ability to reason enables agents to act pro-actively and perform more difficult tasks in complex environments. [16]

Continual

In order for a software agent to accomplish its goals, it must have temporal continuity. The agent must be able to function over a certain period of time with persistence of identity and state.

Agents that have an episodic memory are able to learn from previous experiences. [17]

Adaptive

Making agents adaptive is one way of attaining flexibility. Adaptivity can range from (1) being able to adapt flexibly to short-term, smaller changes in the environment, to (2) dealing with more significant and long-term (lasting) changes in the environment [12]. Software agents that are able to deal with long-term changes are able to improve themselves and their performance over time by storing the knowledge of past experiences within their internal state and taking this knowledge into account when executing (similar) actions in the future. [16]

Communicative

Agents should be able to communicate with other software agents and even humans in order to complete their tasks and help other agents complete theirs. [15] Especially in multi-agent systems the ability to communicate with other agents is important. Agents communicate with other agents using a common agent language, such as FIPA ACL or KQML. When agents communicate with humans they must communicate using natural language. [16]

Mobile

Although mobility is neither a necessary nor a sufficient condition for agency, many scholars (Gilbert et al. 1995; Nwana 1996; Brazier et al. 2003) include mobility when describing agent characteristics. It is oftentimes better for an agent to interact with a remote system at the location of the remote system than to do it over a distance. Several reasons for this preferred form of

(20)

interaction can be specified. A first reason is efficiency. Network traffic can be reduced when the agent and the remote system are at the same location. For instance, when an agent queries a remote database, data has to be sent back and forth between the remote database and the agent.

This communication can be kept local when the agent and the remote system are at the same location, thereby reducing the strain on external networks such as the Internet. A second reason is that data need not be exchanged over (public) networks but can be handled locally. It also means that agents can operate more securely. A third reason is that the remote system may only allow agents to operate locally, thereby forcing the agent to migrate to the remote location.[16]

Agent typologies

In the next section we will investigate different agent typologies and try to place different agents into classes. A typology refers to the study of types of entities. There are several dimensions along which existing software agents can be classified.

Firstly, agents may be classified by their mobility, i.e. by their ability to move around some network. This yields the classes of static or mobile agents. [13]

Secondly, they may be classed as either deliberative or reactive. Deliberative agents derive from the deliberative thinking paradigm: the agents possess an internal symbolic reasoning model and they engage in planning and negotiation in order to achieve coordination with other agents. Work on reactive agents originates from research carried out by Brooks (1986) and Agre & Chapman (1987). These agents, on the contrary, do not have any internal, symbolic models of their environment, and they act using a stimulus/response type of behavior by responding to the present state of the environment in which they are embedded (Ferber, 1994). Indeed, Brooks has argued that intelligent behavior can be realized without the sort of explicit, symbolic representations of traditional AI (Brooks, 1991b). [13]

(21)

Thirdly, agents may be classified along several ideal and primary attributes which agents should exhibit. We have identified a minimal list of three: autonomy, learning and cooperation. We appreciate that any such list is contentious, but it is no more or no less so than any other proposal. Hence, we are not claiming that this is a necessary or sufficient set. Autonomy refers to the principle that agents can operate on their own without the need for human guidance, even though this would sometimes be invaluable. Hence agents have individual internal states and goals, and they act in such a manner as to meet their goals on behalf of their users. A key element of their autonomy is their proactiveness, i.e. their ability to take the initiative rather than acting simply in response to their environment (Wooldridge & Jennings, 1995a). Cooperation with other agents is paramount: it is the raison d'être for having multiple agents in the first place, in contrast to having just one. In order to cooperate, agents need to possess a social ability, i.e. the ability to interact with other agents and possibly humans via some communication language (Wooldridge & Jennings, 1995a). Having said this, it is possible for agents to coordinate their actions without cooperation (Nwana et al., 1996). Lastly, for agent systems to be truly smart, they would have to learn as they react and/or interact with their external environment. In our view, agents are (or should be) disembodied bits of intelligence. Though we will not attempt to define what intelligence is, we maintain that a key attribute of any intelligent being is its ability to learn. The learning may also take the form of increased performance over time. [13]

Figure 5 - Typology based on Nwana’s (Nwana 1996) primary attribute dimension

(22)

2.2 Machine learning: making computers act

Tom M. Mitchell is one of the most prominent theoreticians in the field of machine learning, and he gave a widely quoted definition of the term: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E" [19].

Machine learning algorithms are organized into a taxonomy, based on the desired outcome of the algorithm. Common algorithm types include:

• Supervised learning - where the algorithm generates a function that maps inputs to desired outputs. One standard formulation of the supervised learning task is the classification problem: the learner is required to learn (to approximate the behavior of) a function which maps a vector into one of several classes by looking at several input-output examples of the function.

• Unsupervised learning - which models a set of inputs: labeled examples are not available.

• Semi-supervised learning - which combines both labeled and unlabeled examples to generate an appropriate function or classifier.

• Reinforcement learning - where the algorithm learns a policy of how to act given an observation of the world. Every action has some impact in the environment, and the environment provides feedback that guides the learning algorithm.

• Transduction - similar to supervised learning, but does not explicitly construct a function: instead, it tries to predict new outputs based on training inputs, training outputs, and new inputs.

• Learning to learn - where the algorithm learns its own inductive bias based on previous experience. [19]

Of course, the list above can be extended with other types of machine learning and all kinds of hybrid algorithms.

(23)

2.3 Embedding machine learning in computer games

2.3.1 The history of machine learning and games

The history of the interaction between machine learning and computer game-playing started at the same time Artificial Intelligence appeared. Arthur Samuel was the first to apply machine learning techniques in games, working on his checkers-playing program, and was thus a pioneer of both machine-learning and game-playing techniques (Samuel, 1959, 1967). He facilitated the evolution of both fields, and since then the intersection of the two can regularly be found in conferences in their respective fields and has evolved through research over the years.

In recent years, the computer games industry has discovered AI as a necessary element to make games more entertaining and challenging and, vice versa, AI has discovered computer games as an interesting and rewarding application area. The industry's perspective is witnessed by a multitude of recent books on introductions to AI techniques for game programmers and a series of edited collections of articles. AI research on computer games began to follow developments in the games industry early on, but it has grown considerably since John Laird's keynote address at the AAAI 2000 conference, in which he advocated Interactive Computer Games as a challenging and rewarding application area for AI (Laird & van Lent, 2001) [20], which demonstrates the growing importance of game-playing applications for Artificial Intelligence.[21]

2.3.2 What do the games have to learn?

The need for AI in computer games is growing, and a good user experience must come with an artificial intelligence component that is just as good. These needs can be recast as a call for new practical and theoretical tools to help with [21]:

learning to play the game: Game worlds provide excellent test beds for investigating the potential to improve agents’ capabilities via learning. The environment can be constructed with varying characteristics, from deterministic and discrete as in classical

(24)

board and card games to non-deterministic and continuous as in action computer games.

Learning algorithms for such tasks have been studied quite thoroughly. Probably the best-known instance of a learning game-playing agent is the Backgammon-playing program TD-Gammon (Tesauro, 1995) [23].

learning about players: Opponent modeling, partner modeling, team modeling, and multiple team modeling are fascinating, interdependent and largely unsolved challenges that aim at improving play by trying to discover and exploit the plans, strengths, and weaknesses of a player's opponents and/or partners. One of the grand challenges in this line of work is games like Poker, where opponent modeling is crucial to improve over game-theoretically optimal play (Billings et al., 2002) [22].

behavior capture of players: Creating a convincing avatar based on a player's in-game behavior is an interesting and challenging supervised learning task. For example, in Massively Multiplayer Online Role-Playing Games (MMORPGs) an avatar that is trained to simulate a user's game-playing behavior could take its creator's place at times when the human player cannot attend to his game character. First steps in this area have been made in commercial video games such as Forza Motorsport (Xbox), where the player can train a "Drivatar" that learns to go around the track in the style of the player by observing and learning from the driving style of that player and generalizing to new tracks and cars.

model selection and stability: Online settings lead to what is effectively the unsupervised construction of models by supervised algorithms. Methods for biasing the proposed model space without significant loss of predictive power are critical not just for learning efficiency, but for interpretive ability and end-user confidence.

optimizing for adaptivity: Building opponents that can just barely lose in interesting ways is just as important for the game world as creating world-class opponents. This requires building highly adaptive models that can substantively personalize to adversaries or partners with a wide range of competence and rapid shifts in play style. By introducing a very different set of update and optimization criteria for learners, a wealth of new research targets are created [21].

model interpretation: “What’s my next move” is not the only query desired of models in a game, but it is certainly the one which gets the most attention. Creating the illusion of

(25)

decision to decision enables queries that provide the foundation for a host of social actions in a game such as predictions, contracts, counter-factual assertions, advice, justification, negotiation, and demagoguery. These can have as much or more influence on outcomes as actual in-game actions [21].

performance: Resource requirements for update and inference will always be of great importance. The AI does not get the bulk of the CPU or memory, and the machines driving the market will always be underpowered compared to typical desktops at any point in time [21].

2.3.3 Creating a Learning Agent

We will now see how to implement a learning system into a game, given the need for games to be adaptable. The example we will use is given by Nick Palmer in the article Machine Learning in Games Development and is the example of a team-strategy based paintball game.

The aim of the program is for a team of seven agents to capture the opponent's flag and bring it back to their starting position. They must do this without being hit by the opposing team's paintballs. For a start, we shall exclude the team strategy from our discussions, and concentrate on the individual agents - it is no good having a perfect strategy if the agents aren't smart enough to survive on their own [24].

We must consider the factors which will influence an agent's performance in the game. Terrain is an obvious starting point, as this is all that stands between the two teams, so it must be used to the agent's advantage. Secondly, there must be an element of stealth behind the behavior of each agent, as otherwise it will be simple to undermine any tactics used during the game by simply picking off the naïve agents one by one [24].

A learning agent is composed of a few fundamental parts: a learning element, a performance element, a curiosity element (or 'problem generator'), and a performance analyzer (or 'critic').

The learning element is the part of the agent which modifies the agent's behavior and creates improvements. Tom M. Mitchell names this component The Generalizer in his book "Machine Learning" [19]. The performance element is responsible for choosing external actions based

(26)

on the percepts it has received (percepts being information that is known by the agent about its environment). To illustrate this, consider that one of our agents is in the woods playing paintball.

He is aware of an opposing paintballer nearby. This would be the percept that the agent responds to, by selecting an action - moving behind a tree. This choice of action is made by the performance element. This component is called The Performance System by Tom M. Mitchell.

The performance analyzer judges the performance of the agent against some suitable performance measure. The performance must be judged on the same percepts as those received by the performance element - the state of affairs 'known' to the agent. When the analysis of performance has been made, the agent must decide whether or not a better performance could be made in the future, under the same circumstances. This decision is then passed to the learning element, which decides on the appropriate alteration to future behavior, and modifies the performance element accordingly. This component is called The Critic in Tom M. Mitchell’s book [19][24].

Below we have an illustrated example of the four components above working together as mentioned in Tom M. Mitchell’s book.

(27)

How do we make sure that the agent advances in its learning, and doesn't merely confine itself to previously observed behavior? This is dealt with by the curiosity element (so-called because it searches for a better solution), which has a knowledge of the desirable behavior of the agent (i.e.

it knows that being shot is not desirable, and that finding the opponent's flag is). To achieve optimal performance, this element will pose new challenges to the agent in an attempt to prevent bad habits developing. This component is called The Experiment Generator by Tom Mitchell [19]. To understand the benefits of this, we will consider a paintballer who is hiding behind a tree. From his past experience, he knows that he is safe to stay where he is, and this would result in an adequate performance. However, the curiosity element kicks in, and suggests that he makes a break from his cover and heads to a nearby tree which is closer to the enemy flag. This may result in the agent ultimately being shot at, but could also achieve a more desirable goal. It is then up to the performance analyzer and the learning element to consider whether there is a benefit to this change in strategy [24].
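As a rough sketch of how these four components could be wired together (our own illustration in Python, not code from [19] or [24]; all class and method names are assumed for the example):

    class LearningAgent:
        # Skeleton of the four-component learning agent: the performance element
        # plays, the critic (performance analyzer) scores the play, the learning
        # element (generalizer) updates the behavior, and the curiosity element
        # (experiment generator) proposes new situations to try.
        def __init__(self, performance_element, critic, learning_element, curiosity_element):
            self.performance_element = performance_element
            self.critic = critic
            self.learning_element = learning_element
            self.curiosity_element = curiosity_element

        def play_one_round(self, environment):
            # The curiosity element may perturb the task so the agent explores new behavior.
            task = self.curiosity_element.propose(environment)
            # The performance element chooses actions from percepts and records the trace.
            trace = self.performance_element.play(task)
            # The critic turns the trace into training examples (how well did the agent do?).
            training_examples = self.critic.evaluate(trace)
            # The learning element modifies the performance element for future rounds.
            self.learning_element.update(self.performance_element, training_examples)

Each concrete game supplies its own implementations of the four collaborators; the loop itself stays the same.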

This style of learning is known as reinforcement learning, which means that the agent can see the result of its actions, but is not told directly what it should have done instead. This means that the agent must use what is essentially trial and error to evaluate its performance and learn from its mistakes. The advantage of this is that there is no limitation on the behavior, other than the limit to alterations suggested through the curiosity element. If, after each action, the agent were told what its mistake was and how it should correct its behavior, then the desired behavior would already have to be understood [24].

Figure 7 - Learning architecture proposed by Nick Palmer

(28)

As the learning agent is ultimately part of a game, it must not be left simply to work out for itself how to play. The agents must be imparted with a fair degree of prior knowledge about the way to behave. In the case of paintball, this could include methods for avoiding fire by using cover, which may later be adapted during the learning process [24].

2.3.4 Problems with Learning

Although machine learning can bring a very big improvement in gameplay, there are certain problems that can occur when trying to implement such an agent [24]:

Mimicking Stupidity - When teaching an AI by copying a human player's strategy, we may find that the computer is taught badly. This is more than likely when the player is unfamiliar with a game. In this situation, a reset function may be required to bring the AI player back to its initial state, or else a minimum level must be imposed on the computer player to prevent its performance dropping below a predetermined standard.

Overfitting - This can occur if an AI agent is taught only a certain section of a game, and then expected to display intelligent behavior based on its experience. Using a FPS as an example, an agent which has learnt from its experience over one level will encounter problems when attempting a new level, as it may not have learnt the correct 'lessons' from its performance.

Local Optimality - When choosing a parameter on which the agent is to base its learning, be sure to choose one which has no dependency on earlier actions. As an example, we will take a snow-boarding game. The agent learns, through the use of an optimization algorithm, the best course to take down the ski slope, using its rotation as a parameter.

(29)

Set Behavior - Once an agent has a record of its past behavior and the resulting performance analysis, will it stick to the behavior which has been successful in the past, or will it try new methods in an attempt to improve? This is a problem which must be addressed or else an agent may either try to evaluate every possible behavior, or else stick to one without finding the optimal solution [24].

(30)

III. Relevant applications of machine learning in games

In this chapter we will present the most relevant games with machine learning techniques and try to analyze their structure and algorithms used.

3.1 TD-Gammon

One of the most impressive recent applications of reinforcement learning is Gerry Tesauro's work on backgammon (Tesauro, 1992, 1994, 1995). Tesauro's program, TD-Gammon, required little backgammon knowledge, yet it learned to play very well, close to the level of the world masters. TD-Gammon's learning algorithm was a simple combination of the TD(λ) algorithm and nonlinear function approximation using a multilayer neural network trained by backpropagating the TD errors. [23]

To apply the learning rule, a source of backgammon games was needed. Tesauro obtained an unending sequence of training games by having the agent play against itself. To choose moves, TD-Gammon considered each of the approximately 20 ways it could play the dice roll and the corresponding positions that would result. The neural network was consulted to estimate the value of each of them. The move that led to the best estimated value was selected.

Continuing in this way, it was possible to easily generate a large number of backgammon games. Each game was treated as an episode, with the TD rule applied to the sequence of positions s0, s1, s2, ... after each move.

The network's weights were initially set to small random values, so the initial evaluations were completely arbitrary. Since moves were selected based on these evaluations, the initial moves were inevitably weak, and early games often lasted hundreds or thousands of moves before either side won. After a few dozen games, however, performance improved rapidly.
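As a simplified illustration of this self-play training loop (our own sketch in Python, using a linear value function in place of TD-Gammon's neural network; the helper names are assumptions, not Tesauro's code):

    def state_value(state, weights, features):
        # Linear approximation V(s) = sum(w_i * x_i) over a feature vector x(s).
        return sum(w * x for w, x in zip(weights, features(state)))

    def self_play_td(new_game, legal_successors, features, weights,
                     episodes=10000, alpha=0.01):
        # Play repeated games against itself, always choosing the successor position
        # with the highest estimated value, and nudge V(s) toward V(successor) (TD(0)).
        for _ in range(episodes):
            state = new_game()
            while not state.is_over():
                nxt = max(legal_successors(state),
                          key=lambda s: state_value(s, weights, features))
                target = (nxt.final_reward() if nxt.is_over()
                          else state_value(nxt, weights, features))
                error = target - state_value(state, weights, features)
                for i, x in enumerate(features(state)):
                    weights[i] += alpha * error * x   # move the estimate toward the target
                state = nxt
        return weights

TD-Gammon replaces the linear estimate with a multilayer neural network and uses TD(λ) rather than plain TD(0), but the self-play structure is the same.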

(31)

After playing about 300,000 games against itself, TD-Gammon 0.0, as described above, learned to play as well as the best previous backgammon computer programs. This was a striking result, because all previous high-performance implementations had used extensive backgammon knowledge.

For example, the champion at that time was undoubtedly Neurogammon, another program written by Tesauro, which used neural network learning but not TD. The network used by Neurogammon was trained on examples provided by a backgammon expert and, moreover, began with a set of features specifically designed for backgammon. TD-Gammon 0.0, on the other hand, was constructed with essentially zero backgammon knowledge. The fact that TD-Gammon was as good as Neurogammon and the other previous programs demonstrates the potential of learning algorithms in games [23].

3.2 Samuel's checkers program

An important precursor of TD-Gammon was part of Arthur Samuel's work (1959, 1967) on building learning programs for playing checkers. Samuel was one of the first to effectively use search heuristics and what we now call temporal-difference learning. Samuel wrote a checkers program for the IBM 701 in 1952. Its learning component was completed in 1955 and was shown on television in 1956. Later versions acquired the ability to play well, but not at expert level [25].

Samuel used two main methods of learning, the simplest being called rote learning. It consisted simply in saving a description of each board position encountered during the game, along with its value determined by the Min-Max back-up procedure. The result was that, if a previously seen position appeared again as a terminal position of a search tree, the effective search depth was increased, because the value of that position had already been cached.

Rote learning produced slow but continuous improvement, which was most effective for the opening and the end of the game. Samuel's program became a "better than average" program, having learned from many games against itself, from a broad array of human opponents, and in a supervised learning mode from examples in books [25].

(32)

3.3 NeuroChess

NeuroChess is a program created by Sebastian Thrun that learned to play chess. It learns from a combination of self-play and the observation of experts. The search algorithm is taken from GNU Chess, but the evaluator is a neural network.

NeuroChess is an application of "explanation-based neural networks" (EBNN). EBNN adapts explanation-based learning to neural networks. NeuroChess represents a chess position as a hand-designed feature vector of 175 values. It has two neural networks, V and M. V is the evaluation function, which takes a feature vector as input and produces an assessment used to play. M is a model of chess, which takes a feature vector as input and predicts the feature vector two half-moves later. The output of M is used in training V [26].

After training M on 120,000 master-level games and V on another 2,400 games, NeuroChess defeated GNU Chess in about 13% of the games. With the same preparation, a version not using the model M won only about 10% of its games against GNU Chess. EBNN thus improves performance. Of course, these percentages grow with the number of games played.

Learning only from games against itself failed. However, learning from experts at first introduced some artifacts into the program's evaluations. A clear example is that the program observed that the probability of winning is greater when experts move their queen toward the middle of the board, without realizing that the queen is moved there only when threatened.

Therefore the best results are obtained by combining self-play with learning from experts [26].

3.4 SAL

SAL is a system developed by Michael Gherrity that learns to play multiple games.

SAL can apparently learn any two-player board game of perfect information, provided the game can be played on a rectangular board and uses fixed types of pieces [27].

(33)

It works in the following way:

The user supplies a move generator, written in C, and some constants that declare the size of the board and the number and type of pieces used in the game. This is SAL’s only knowledge of the rules of the game.

SAL chooses its moves with a search, like any chess program. It uses a two-ply, full-width alpha-beta search, plus a consistency search for quiescence.

SAL’s evaluation function is a backpropagation neural network (actually, it uses separate evaluators for the two sides of the game). The inputs to the network are features representing the board position, the number of pieces of each type on the board, the type of piece just moved, the type of piece just captured, and a bunch of ad hoc features having to do with pieces and squares under attack and threats to win the game [27] .

The evaluator of the neural network is trained by temporal difference methods to estimate the outcome of the game based on the current state.

SAL has been tested on the games tic-tac-toe, Connect Four, and chess.

• In tic-tac-toe, SAL learned to play perfectly after 20,000 games, which means it is a bit slow.

• At Connect Four, a more complex game, after 100,000 games SAL learned to beat its opponent about 80% of the time.

In chess, SAL played 4,200 games against GNU Chess, with GNU Chess set to play at a rate of one move per second. SAL won eight games and lost others. The first game shows random play by SAL, since it hadn't learned anything yet. A drawn game shows how much SAL improved, though it still plays very weakly. The graphs show that SAL continues to improve [27].

(34)

IV. “A Tale of Two Catapults” – machine learning techniques applied in games

4.1 Problem statement

The application is meant to be a Windows Phone application, a turn-based game with catapults.

The player controls a catapult and must hit the opponent (in our case an AI opponent). In order to win, the player needs to hit the opponent three times directly. If the player is hit three times by the opponent, then the game is lost. The logic of the application is the essential part, because the AI opponent is not an ordinary one. The application is, in essence, an example of how machine learning techniques can be used in video games, and more than that: the central idea of this project is not the game itself but the design of a framework, an agent-oriented architecture that can be applied to any turn-based two-player game.

4.2 Motivation

Our motivation for this application is the growing need of computer game players for a better experience and a more capable AI. Over the last decades games have evolved drastically due to the expanding capabilities of graphics processors. Along with this evolution came the evolution of the video gamer, whose need for a more competent opponent is still unmet today. The problem with recent games is that their AI component is too weak, because it is static and limited.

There have been some very good examples of machine learning techniques used in video games that have shown us the great potential of machine learning, but unfortunately they are only a few.

Inspired by Tom M. Mitchell’s book “Machine Learning” we have decided to make a computer game with a learning component and to show the results obtained.

(35)

4.3 Analysis and design

4.3.1 Problem analysis

Being a game that can only be played on the computer, we do not have any human expert to teach our program different strategies; besides, the game itself is pretty simple. So we have chosen a learning technique that doesn't require direct feedback. The most suitable learning strategy for this kind of task is reinforcement learning, or any other learning method that works on the same principle, namely playing repeated games against itself in order to learn.

That being said, we have chosen gradient descent learning for our program, a technique in which the player has to learn a function, called the target function, which will reflect the mathematical function that calculates the distance travelled by a projectile shot with a given initial velocity and angle. In order to evaluate a state of the game we give every state a value assigned by the target function. For the final game states this is very easy: we decide that the value of the target function is V(s) = 2000 if the game is won and V(s) = -2000 if the game is lost, s being the state to which we have to assign a value. For the intermediate states we need an approximation function that reflects the real value of the state as accurately as possible.

The mathematical function the program has to learn is D = (v0² / g) · sin(2θ), where v0 is the initial velocity, g is the gravitational acceleration (9.8 m/s²) and θ is the angle formed between the ground and the catapult arm which fires, given in degrees. Because the above function is too complex to be learned in that form, it will be expressed in an approximate form, as a linear combination of the features of the environment in a given state.
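
As an illustration, the ideal range can be computed directly. The following C# sketch is ours (the class and method names are hypothetical, not taken from the actual implementation) and shows the function the agent tries to approximate:

using System;

static class Ballistics
{
    // Ideal projectile range: D = (v0^2 / g) * sin(2 * theta), ignoring wind and drag.
    public static double Range(double v0, double angleDegrees, double g = 9.8)
    {
        double theta = angleDegrees * Math.PI / 180.0;  // convert degrees to radians
        return (v0 * v0 / g) * Math.Sin(2.0 * theta);
    }
}

For example, Ballistics.Range(50, 45) returns roughly 255 m, the maximum range for an initial velocity of 50 m/s when no wind is present.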

We consider the following features of the environment to be relevant for a state:

 x1 – the number of hits player one can take;

 x2 – the number of hits player two can take;

 x3 – the distance between the two catapults (in meters);

 x4 – the intensity of the wind (in m/s) and its direction;

 x5 – the initial velocity of the projectile;

 x6 – the angle at which the projectile is shot, in degrees.

To calculate the approximation function correctly we assign to each feature a weight, represented by a floating-point number, plus a bias. These weights are changed throughout the learning process, making the function converge to the desired one. This being said, the approximation function can be expressed in the following way:

Ṽ(s) = w0 + Σi (wi · xi)

where xi represents the values of the features, wi represents the values of the weights and w0 represents the bias.
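
A minimal C# sketch of this evaluation, assuming the feature values x1, ..., x6 have already been extracted into an array (the method name and array layout are our own, not necessarily those of the implementation):

// V~(s) = w0 + sum of wi * xi; weights[0] is the bias w0 and weights[i + 1] pairs with features[i].
static double Evaluate(double[] weights, double[] features)
{
    double value = weights[0];                  // the bias term w0
    for (int i = 0; i < features.Length; i++)
        value += weights[i + 1] * features[i];  // wi * xi
    return value;
}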

In order to learn properly, the agent needs some kind of feedback from a critic. In our case that feedback is given indirectly by the value of a function called the training function. For every state s, the training function can be expressed in the following way:

Vtrain(s) = Ṽ(Successor(s))

The Successor(s) function returns the state that came right after s in the list of training examples provided by the game's performance system. We need just one more adjustment to make the algorithm work properly: the way the weights are recalculated every turn. For that we have chosen the LMS (least mean squares) algorithm, in which every turn we have to minimize the value of E, the error given by the sum of squared differences between the target function and the training function. In the ideal situation, in which the agent has learned the function perfectly, the target function is the same as the training function. In this way, each turn, for every single training example (made from a state and the associated value), every weight is recalibrated according to the following rule:

wi = wi + η · (Vtrain(s) - Ṽ(s)) · xi

where η is a very small constant called the learning rate, and the other values have been explained above.
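
The update can be sketched in C# as follows, reusing the Evaluate method from the previous sketch and assuming the usual LMS convention x0 = 1 for the bias (both are our assumptions, not a verbatim excerpt from the implementation):

// One LMS step for a single training example: the feature vector of state s and its Vtrain value.
static void UpdateWeights(double[] weights, double[] features, double vTrain, double eta)
{
    double error = vTrain - Evaluate(weights, features);  // Vtrain(s) - V~(s)
    weights[0] += eta * error;                            // the bias is updated with x0 = 1
    for (int i = 0; i < features.Length; i++)
        weights[i + 1] += eta * error * features[i];      // wi = wi + eta * error * xi
}

With a small learning rate (for example η = 0.01) the weights change slowly enough for the estimates to stabilize.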

After repeated simulations the learned function should approximate the desired function more and more closely, ideally converging to it.

As stated before, our learning model was inspired by Tom M. Mitchell’s book “Machine Learning” [19] and can be described using the following components:

Figure 8 - Catapult Learning Model: the Experiment Generator provides a new initial state to the Performance System, which produces the game history; the Critic turns the history into training examples, and the Generalizer produces the updated target function.
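
The four components of the figure can be expressed as simple interfaces. The sketch below uses hypothetical C# names chosen by us to mirror Figure 8; it is not the project's actual API:

using System.Collections.Generic;

public sealed class TrainingExample
{
    public double[] Features;  // the state, described by its feature values x1..x6
    public double Value;       // the associated Vtrain value
}

public interface IExperimentGenerator { double[] NewInitialState(); }                      // proposes a new starting state
public interface IPerformanceSystem { IList<double[]> PlayGame(double[] initialState); }   // plays one game and returns its history
public interface ICritic { IList<TrainingExample> MakeExamples(IList<double[]> history); } // Vtrain(s) = V~(Successor(s))
public interface IGeneralizer { void Update(IList<TrainingExample> examples); }            // LMS weight updates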

4.3.2 Solution’s design

Use case diagram

Being a simple application, it leads to a simple use case diagram. The player can start the game, play the game or exit the game. The characteristics of the game will be detailed in future sections.

Figure 9 - Use case diagram

Class diagram

Because the application has many classes, including the UI made with the XNA Game Studio 4.0 framework and the logic of the application, which includes the abstract classes representing the agent architecture as well as the implementations of those classes with the algorithms specific to this game, we will show only the class diagram for the abstract agent model, in order to clarify the relationships between the classes.

 The diagrams were made using the CASE instrument StarUML.

Description of classes

In this section we will describe every component of the model, the way it works and its relationships with the other classes. The main classes of the agent architecture are: Play, Simulation, Player, AI, Action, Environment and State. Further, the machine learning part is given by the classes Learner, Critic, Generalizer, TargetFunction and PerformanceSystem.

Play – this class is practically the core of the model. It has only one abstract method, called “Start”, which takes an initial state as parameter. It also has as class members the two players and the environment in which the game takes place. From this class two abstract classes are derived: Simulation and Game.

Figure 10 - Class diagram

Simulation – this class is as important as its base class Play because it implements the actual logic of the game at an abstract level, independent of the implementation of the involved classes. It also has a function “IsComplete”, which checks the environment to determine whether the game is over. In the implementation section we will present a code snippet from the algorithm. The class also has two components that facilitate machine learning: the Learner and the History, which will be discussed later.
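
As a preview, the turn loop can be sketched in a self-contained form. The delegate parameters below stand in for Player.SelectAction together with Action.Execute and for the end-of-game test on the Environment; the names are ours, not the project's:

using System;
using System.Collections.Generic;

static class SimulationSketch
{
    // Runs one game and returns the history of states, which is later handed to the Learner.
    public static List<double[]> Run(
        double[] initialState,
        Func<int, double[], double[]> playTurn,  // given the player index and the current state, returns the next state
        Func<double[], bool> isComplete)         // true when one of the catapults has taken three hits
    {
        var history = new List<double[]> { initialState };
        double[] state = initialState;
        int turn = 0;
        while (!isComplete(state))
        {
            state = playTurn(turn % 2, state);   // the two players alternate
            history.Add(state);
            turn++;
        }
        return history;
    }
}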

Player – this class represents the abstract form of a player. It has as a member its number in the turn-based game. This class is further specialized in two classes: AI and Human. It has only one method, “SelectAction”, which chooses the best action from the current state in order to win the game.

AI – the representation of an AI player in a game, having all the components required to play it properly. Like its parent class it has the “SelectAction” method, but in addition it also has two new elements which are part of the machine learning system: the classes TargetFunction and PerformanceSystem are specific to a player, and each of them has its role in the learning process. Details about how these two classes work are discussed later in this chapter.
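
Following Mitchell's checkers example, one plausible way to implement SelectAction is to evaluate, with the current target function, the state each legal action would lead to and pick the most promising one. The sketch below reuses the Evaluate method from section 4.3.1; whether the game uses exactly this greedy strategy is an assumption on our part:

using System.Collections.Generic;

// Greedy selection: return the index of the candidate action whose successor state has the highest V~ value.
static int SelectAction(IList<double[]> successorStates, double[] weights)
{
    int best = 0;
    double bestValue = double.NegativeInfinity;
    for (int i = 0; i < successorStates.Count; i++)
    {
        double v = Evaluate(weights, successorStates[i]);  // V~ of the state reached by action i
        if (v > bestValue) { bestValue = v; best = i; }
    }
    return best;
}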

Action – this class represents a basic action an agent can do when is his turn. It has only one method “Execute”, which applies the effect of the selected action of the player over the current state.

Environment – this class represents the environment in which the game takes place. Usually it has a current state and a method to update it. The environment is an important element in the model because it’s one of the key elements in the interaction process. The State class is a property of the environment that gives a clear description of a state in the whole process as well as the parameters needed for evaluation.

Learner – this is the main component of the learning mechanism, consisting of the Critic and the Generalizer and combining the two processes using a history of the game as input. The History is created during the simulation, and based on it the learner adjusts the target function.
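
Putting the pieces together, the Learner can be sketched as a single pass over the game history. The exact split of responsibilities between the Critic and the Generalizer shown here is our own simplification, reusing the Evaluate and UpdateWeights sketches from section 4.3.1:

using System.Collections.Generic;

// finalValue is +2000 if the game was won and -2000 if it was lost (the values chosen in section 4.3.1).
static void Learn(IList<double[]> gameHistory, double finalValue, double[] weights, double eta)
{
    for (int t = 0; t < gameHistory.Count - 1; t++)
    {
        // Critic: Vtrain(s) = V~(Successor(s)); for the state just before the end we use the known final value.
        double vTrain = (t == gameHistory.Count - 2)
            ? finalValue
            : Evaluate(weights, gameHistory[t + 1]);
        UpdateWeights(weights, gameHistory[t], vTrain, eta);  // Generalizer: one LMS step per training example
    }
}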
