Creating AIs for 4×4 TicTacToe – Part 6 – Hardcore AI

In this last post of our series we implement an even more intelligent AI, which also can be used for any other game (the basic principle can be applied to any game). Last time we saw, that Monte-Carlo Simulation enables us to get rid of the handmade Heuristics. But there is one little detail left, which will improve Monte-Carlo Simulation. If we look at our gametree of last time, we did every starting turn about the same time, because we always took a random move. However, we are able to add two more strategies:

  • Use a handcrafted Heuristic to help finding “good” turns
  • The amount of victories determines how good a branch is, even while simulation

It seems strange to add a handcrafted Heuristic after we used Monte-Carlo to remove the dependence on Heuristics. The difference is, that now we are using a Heuristic to give Monte-Carlo “hints” of where some good branches may be. We simulate like before from the very beginning to the very end, but this time we use in a few turns the Heuristic as used by our Heuristic-Driven AI to determine the next action to do. In about 10% of all actions we use the heuristic in the other 90% we use random moves as before. As Pseudo-Code it basically looks like:

This will enable our simulation to find good actions even faster and with less simulations. Therefore our AI is getting overall faster to a good approximation. The second point is kind of obvious, but we can use this idea even more than we currently do.

The best branch after 1000 games is also very likely to be the best branch after 10000 games or even 100000 games. However, maybe two branches differ only by a few wins. So what we want to know is, which of these two competing branches is better. By doing more random simulations we may find out this, too. However, it is better to concentrate on only these two branches and exclude every other branch. On this way we get a really good approximation of the two win-ratios and therefore are able to say with confidence, that branch A or B is better.

Even though we expect the other branches to be always much worse, we should not exclude them completely, maybe we were just lucky with these two branches or unlucky with the actions taken in the other branches. This can be done by randomly chosing if the best action should be taken or not.


As the Win-Game-Ratio is for a low amount of simulations very unprecise we at first do a few hundred simulations to get a rough approximation of the victory/lose distribution.

In the following we implemented a Monte-Carlo-AI which uses the Win-Ratio-Move, but not the Heuristics. The code can be easily extended to use Heuristics.

Open in new window

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.