Creating AIs for 4×4 TicTacToe – Part 5 – Monte-Carlo Simulation

The MinMax AI we implemented last time already performs much better than the heuristic-driven AI! But is there a way to boost the performance even more? Yes there is! One possible approach is to improve the MinMax search itself (e.g. using Alpha-Beta pruning). Since that is essentially MinMax with a few optimizations, we will not dive deeper into it.

Instead we will tackle the biggest problem of the MinMax algorithm: the heuristic! With a bad heuristic we can look five turns into the future and still lose horribly. Finding an "OK" heuristic is easy, but finding an extremely good one is very hard. It would therefore be amazing if we could eliminate the dependence on a hand-crafted heuristic, and this is exactly what we are going to do using Monte-Carlo simulation. Eliminating hand-crafted heuristics also makes it possible to create an AI for almost any game, no matter how well or badly you as a human play it. Our final AI will be based on Monte-Carlo search, so first we introduce the underlying idea, called Monte-Carlo simulation. It can be summed up as:

Do something many times until you know the probabilities of outcome(s)

This is quite general, but nearly everyone has used this principle at least a few times in their life without knowing it. Imagine someone gives you a die where one side tends to come up more often than the others, and asks you: "Which side occurs more frequently than the other sides?"

Answering this question is quite easy, but kind of exhausting. All you have to do is throw the die a few thousand times and count how often each side occurred. If you are not sure which side is manipulated, you may throw the die a few thousand times more. After tens of thousands of throws you are able to determine which side is more likely to occur than the others (this is the "law of large numbers" at work). Even though some uncertainty remains, you will nearly always find the manipulated side.
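The dice experiment above can be sketched in a few lines of Python. This is just an illustration; `loaded_die` and `find_biased_side` are made-up names, and the bias of the simulated die is an arbitrary assumption:

```python
import random

def find_biased_side(roll, num_rolls=50_000):
    """Estimate which side of a die is biased by rolling it many times
    and counting how often each side comes up (law of large numbers)."""
    counts = {side: 0 for side in range(1, 7)}
    for _ in range(num_rolls):
        counts[roll()] += 1
    # The side with the highest relative frequency is the likely biased one.
    return max(counts, key=counts.get)

# Hypothetical loaded die: side 6 comes up far more often than the others.
def loaded_die():
    return random.choice([1, 2, 3, 4, 5, 6, 6, 6])

print(find_biased_side(loaded_die))  # prints 6 (the loaded side)
```

With 50,000 rolls the relative frequencies are so close to the true probabilities that the loaded side is identified essentially every time, mirroring the "throw it a few thousand times more if unsure" advice above.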

We apply the same concept to AIs. Starting from the current state, you play thousands of rounds to the very end. For the very first branch, however, you remember/save the following things:

  • How often did you visit this branch?
  • How often did you win after choosing this branch?
  • How often did you lose after choosing this branch?
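The bookkeeping listed above can be captured in a small record per first-move branch. This is only a sketch; the class and field names are my own, not from this series:

```python
from dataclasses import dataclass

@dataclass
class BranchStats:
    """Statistics collected for one first-move branch."""
    visits: int = 0  # how often this branch was visited
    wins: int = 0    # playouts through this branch that we won
    losses: int = 0  # playouts through this branch that we lost

    @property
    def draws(self) -> int:
        # Every visit ends in a win, a loss, or a draw.
        return self.visits - self.wins - self.losses

    @property
    def win_rate(self) -> float:
        return self.wins / self.visits if self.visits else 0.0

stats = BranchStats(visits=1000, wins=420, losses=380)
print(stats.win_rate)  # 0.42
print(stats.draws)     # 200
```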

This approach differs from the MinMax approach in that you do not have to go through all states/actions, only through a few thousand, to estimate the win rate. The idea behind this concept is: if you start in the best branch/action, you will win more often even without knowing all states, because a random playout is more likely to end in a victory when the branch overall has a higher probability of leading to one. If you play a thousand games, each time randomly choosing one possible next action, you may end up in the following situation:

You can see that we always chose a random action, even in the first branch, and counted the results per first branch. In the example above we can now determine the best next action: the one with the highest win ratio (or win+draw ratio).
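The whole procedure can be sketched as one function. Note the assumptions: `legal_moves(state)`, `apply_move(state, move)` and `winner(state)` are hypothetical helpers for the game at hand (for 4×4 TicTacToe or any other turn-based game), where `apply_move` returns a new state and `winner` returns the winning player, `'draw'`, or `None` while the game is still running:

```python
import random
from collections import defaultdict

def monte_carlo_move(state, legal_moves, apply_move, winner, me,
                     num_playouts=5000):
    """Pick a move via random playouts: for each playout, choose a random
    first move, play randomly to the end, and credit the result to that
    first move. Return the first move with the best win(+draw) ratio."""
    visits = defaultdict(int)
    wins = defaultdict(float)
    for _ in range(num_playouts):
        # Remember which first branch this playout went through.
        first = random.choice(legal_moves(state))
        s = apply_move(state, first)
        # Play completely randomly until the game is over.
        while winner(s) is None:
            s = apply_move(s, random.choice(legal_moves(s)))
        visits[first] += 1
        result = winner(s)
        if result == me:
            wins[first] += 1.0
        elif result == 'draw':
            wins[first] += 0.5  # count a draw as half a win
    # Best action = highest win(+draw) ratio among the first branches.
    return max(visits, key=lambda m: wins[m] / visits[m])
```

Counting a draw as half a win is one way to implement the "win+draw ratio" mentioned above; you could also rank purely by wins. The next post builds the final AI on this foundation.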

Today we will not build the full AI; the final AI, based on these basics, follows in the next post.
