Last time we walked through the MinMax algorithm. This algorithm on its own, without any adaptations, would already be enough if we want to create an AI that plays perfectly: the approach is guaranteed to find the action that leads to the best possible outcome, assuming both players play perfectly. This also holds true for complex games.
However, there is one major drawback: the algorithm finds the best next action, but it does so by enumerating all future states. Just imagine doing this for the very first “X” on an empty 4×4 board. Then the algorithm would have to go through
- 16 initial states (one for each empty cell)
- 15 states that can be reached from each initial state (→ 16 × 15)
Overall this ends in 16 × 15 × 14 × 13 × … (or 16!) states to check, which are about 20,922,789,888,000 states. Checking all of these takes practically forever (in reality there are fewer states, because four X/O in a row/column/diagonal are already enough to end a whole branch of the game tree). However, if you want an opponent that does not “think” for two hours about the next best action, this is for sure not the way to go.
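As a quick sanity check, the state count above is just 16!, which a one-liner can confirm:

```python
import math

# Every game on an empty 4x4 board fixes an order in which the 16 cells
# are filled: 16 choices for the first mark, 15 for the second, and so on.
print(math.factorial(16))  # 20922789888000, i.e. roughly 20.9 trillion
```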
Instead you want to stop at a depth that is good enough for the AI but still results in good performance. Simply cutting the branch after a specific number of turns will not solve this problem by itself. Just imagine you never look more than exactly two turns into the future. This means you may get a game tree which looks like this:
It basically works, but a new challenge occurs: in the very beginning there are only a few situations in which you are able to calculate the whole game tree (until victory/draw/loss) within just two turns. And if you cannot calculate the full tree, you are also not able to find the biggest/smallest next number. That is why we use heuristics to approximate the quality of a state.
Whenever we reach a state at the maximum depth without having won or lost, we use the heuristic to estimate the value of this state. We already saw that heuristics are only approximations of the real world. Therefore using heuristics for state rating may lead to suboptimal actions, even though in most cases they work reasonably well.
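A minimal sketch of this cutoff rule, assuming the game logic is supplied via callbacks (`children`, `is_terminal` and `value` are hypothetical helpers for illustration, not code from this series):

```python
def minimax(state, depth, maximizing, *, children, is_terminal, value):
    """Depth-limited MinMax: instead of expanding the full game tree,
    stop at depth 0 and rate the state with the heuristic."""
    if is_terminal(state) or depth == 0:
        return value(state)  # exact result for terminal states, heuristic otherwise
    scores = [minimax(child, depth - 1, not maximizing,
                      children=children, is_terminal=is_terminal, value=value)
              for child in children(state)]
    return max(scores) if maximizing else min(scores)

# A tiny hand-built tree to show the behaviour: inner nodes map to their
# children, and every node has a (heuristic or exact) value.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
values = {"root": 0, "a": 1, "b": 0, "a1": 3, "a2": 5, "b1": -2, "b2": 9}

search = dict(children=lambda s: tree.get(s, []),
              is_terminal=lambda s: s not in tree,
              value=lambda s: values[s])

print(minimax("root", 2, True, **search))  # 3 -> full tree: max(min(3,5), min(-2,9))
print(minimax("root", 1, True, **search))  # 1 -> cut off after one turn: max(1, 0)
```

Note how the depth-1 search returns a different (heuristic-based) result than the full depth-2 search: this is exactly the trade-off between speed and accuracy described above.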
We now apply the following heuristic to estimate our board value:
- If the enemy has three O in a row/diagonal/column, the board value is decreased by 10 (for each such line)
- The more X you have in a row/diagonal/column with at least one empty cell, the better the state (increase by one for each X in such a line)
- If the enemy has four O in a row/diagonal/column, the board value is decreased by 1000
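The three rules above could be sketched like this (the board encoding as a 4×4 list of `"X"`/`"O"`/`""` strings is my assumption, as is the exact reading of “three O in a line” as a simple count per line):

```python
def lines(board):
    """Yield all rows, columns and the two diagonals of a 4x4 board."""
    for r in range(4):
        yield [board[r][c] for c in range(4)]
    for c in range(4):
        yield [board[r][c] for r in range(4)]
    yield [board[i][i] for i in range(4)]
    yield [board[i][3 - i] for i in range(4)]

def heuristic(board):
    score = 0
    for line in lines(board):
        o, x, empty = line.count("O"), line.count("X"), line.count("")
        if o == 4:        # enemy completed a line: the branch is lost
            score -= 1000
        elif o == 3:      # enemy threatens to complete the line
            score -= 10
        if empty >= 1:    # our marks only count while the line is still open
            score += x
    return score

threat = [["O", "O", "O", ""],
          ["", "", "", ""],
          ["", "", "", ""],
          ["", "", "", ""]]
print(heuristic(threat))  # -10: one open three-in-a-row for the enemy
```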
This heuristic gives boards in which the enemy has at most two O in a row/column/diagonal the highest value. As a result the AI tries to block traps and similar threats as early as possible. Applying this heuristic to the situation mentioned earlier, we end up with the following MinMax situation:
In the next step we will create an AI that is even harder to beat than our current MinMax AI. But to achieve this we first have to go through a few basics regarding Monte Carlo simulation in the next post.
This is a similar AI that uses the same MinMax approach but a slightly different heuristic: