Django 10 – Implementing TicTacToe with AI

NOTE: This article was initially posted on my Substack, at https://andresalvareziglesias.substack.com/

Hi all!

The Tic Magical Line experiment is approaching its end. In the previous articles, we learned how to build a full-stack Django version of the TicTacToe game, inside a containerized environment with the help of Docker.

Our TicTacToe is a (sort of) MMORPG. Each player can battle against other players… but also against the CPU, disguised as a dragon.

Let’s make the dragon’s brain and play a bit with the mysterious world of AI and Machine Learning…


Articles in this series

Chapter 1: Let the journey start

Chapter 2: Create a containerized Django app with Gunicorn and Docker

Chapter 3: Serve Django static files with NGINX

Chapter 4: Adding a database to our stack

Chapter 5: Applications and sites

Chapter 6: Using the Django ORM

Chapter 7: Users login, logout and register

Chapter 8: Implementing the game in Player vs Player

Chapter 9: Scheduled tasks

CPU player without Machine Learning

TicTacToe is a simple game, and the CPU player logic can be really simple too. We can do something like this:

import random

class DragonPlay:
    def __init__(self, board, type="ai"):
        self.board = board
        self.type = type

    def chooseMovement(self):
        if self.type == "simple":
            return self.simpleMovement()
        else:
            raise Exception("Not implemented yet!")

    def getEmptyPositions(self):
        # The board is a 9-character string; "E" marks an empty cell
        emptyPositions = []

        for i in range(0, 9):
            if self.board[i] == "E":
                emptyPositions.append(i)

        return emptyPositions

    def simpleMovement(self):
        emptyPositions = self.getEmptyPositions()
        if len(emptyPositions) == 0:
            print("No empty position to play!")
            return -1

        if random.choice([True, False]):
            # Choose the first empty position and play there
            return emptyPositions[0]
        else:
            # Choose a random empty position and play there
            return random.choice(emptyPositions)

This simple agent makes random movements in a very dumb way… but it allows a player to play against the CPU. Very useful for testing the game logic of our Django application so far… but a bit boring in the end.
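For example, we can exercise this class directly from a Django shell or a test. A minimal sketch, assuming the board is the same 9-character string our game model uses ("E" marks an empty cell):

board = "XOEEXEEEO"  # X in cells 0 and 4, O in cells 1 and 8, the rest empty

play = DragonPlay(board, type="simple")
position = play.chooseMovement()  # An index from 0 to 8, always an empty cell
print(f"The dragon plays at position {position}")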

We need a smarter dragon…

CPU player with Machine Learning

Make easy things hard, just for fun. Let’s create the same CPU player, but using a bit of AI and Machine Learning this time:

import random
import os

from tensorflow.keras.models import load_model

from game.tictactoe.dragonagent import DragonAgent

class DragonPlay:
    def __init__(self, board, type="ai"):
        self.board = board
        self.type = type

    def chooseMovement(self):
        if self.type == "simple":
            return self.simpleMovement()
        else:
            return self.aiMovement()

    def getEmptyPositions(self):
        emptyPositions = []

        for i in range(0, 9):
            if self.board[i] == "E":
                emptyPositions.append(i)

        return emptyPositions

    def simpleMovement(self):
        emptyPositions = self.getEmptyPositions()
        if len(emptyPositions) == 0:
            print("No empty position to play!")
            return -1

        if random.choice([True, False]):
            # Choose the first empty position and play there
            return emptyPositions[0]
        else:
            # Choose a random empty position and play there
            return random.choice(emptyPositions)

    def aiMovement(self):
        emptyPositions = self.getEmptyPositions()
        if len(emptyPositions) == 0:
            print("No empty position to play!")
            return -1

        # Load the pre-trained model, if we have one
        agent = DragonAgent()
        if os.path.exists('/game/tictactoe/model/dragon.keras'):
            agent.model = load_model('/game/tictactoe/model/dragon.keras')

        # Ask the agent again until it proposes an empty cell
        validMove = False
        position = -1

        while not validMove:
            position = agent.start(self.boardToState(self.board))
            if self.board[position] == "E":
                validMove = True

        return position

    def boardToState(self, board):
        # Translate the board string into a numeric state:
        # empty = 0, X = 1, O = -1
        state = []

        for cell in board:
            if cell == 'E':
                state.append(0)
            elif cell == 'X':
                state.append(1)
            elif cell == 'O':
                state.append(-1)

        return state

This code loads an agent class and a Machine Learning model. The agent class is a TensorFlow-based agent using the Q-Learning algorithm, a reinforcement learning algorithm that learns by playing:

import numpy as np
import tensorflow as tf

class DragonAgent:
    def __init__(self, alpha=0.5, discount=0.95, exploration_rate=1.0):
        self.alpha = alpha                        # Learning rate for the optimizer
        self.discount = discount                  # How much future rewards are worth now
        self.exploration_rate = exploration_rate  # Probability of playing a random move
        self.state = None
        self.action = None

        # A small neural network: 9 board cells in, 9 Q-values out (one per cell)
        self.model = tf.keras.models.Sequential([
            tf.keras.layers.Dense(32, input_shape=(9,), activation='relu'),
            tf.keras.layers.Dense(32, activation='relu'),
            tf.keras.layers.Dense(9)
        ])

        self.model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=alpha), loss='mse')

    def start(self, state):
        self.state = np.array(state)
        self.action = self.get_action(state)
        return self.action

    def get_action(self, state):
        # Explore (random move) or exploit (best predicted move)
        if np.random.uniform(0, 1) < self.exploration_rate:
            action = np.random.choice(9)
        else:
            q_values = self.model.predict(np.array([state]))
            action = np.argmax(q_values[0])
        return action

    def learn(self, state, action, reward, next_state):
        # Q-Learning update: reward plus the discounted best value of the next state
        q_update = reward
        if next_state is not None:
            q_values_next = self.model.predict(np.array([next_state]))
            q_update += self.discount * np.max(q_values_next[0])

        q_values = self.model.predict(np.array([state]))
        q_values[0][action] = q_update

        self.model.fit(np.array([state]), q_values, verbose=0)

        # Explore a little less after each update
        self.exploration_rate *= 0.99

    def step(self, state, reward):
        action = self.get_action(state)
        self.learn(self.state, self.action, reward, state)
        self.state = np.array(state)
        self.action = action
        return action

It looks a bit confusing; we need to see how we use this agent to understand it. It will all make sense in the end, believe me 🙂
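One way to demystify it: the heart of the agent is the q_update line inside learn(). A quick worked example with made-up numbers (not from a real training run):

# Suppose the agent gets reward = 0 and, for the next state, the network
# predicts q_values_next[0] = [0.1, 0.8, -0.2, ...]. Then:
q_update = 0 + 0.95 * 0.8  # reward + discount * best next Q-value = 0.76

# learn() writes 0.76 over the Q-value of the action actually taken and
# fits the network one step towards that target. Meanwhile exploration_rate
# decays by 1% per update (1.0, 0.99, 0.9801, ...), so the dragon slowly
# starts trusting its network more than random moves.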

How to train your dragon

In this line, the previous code loaded a pre-trained model:

load_model('/game/tictactoe/model/dragon.keras')

But how can we train this model? We can teach a couple of dragons how to play TicTacToe, rewarding them for each victory and punishing them for each defeat. The dragons can then play one game, and another, and another, and another… You get the idea.

How can we implement this? Simple: take a TicTacToe board, create a couple of DragonAgent instances and let the play begin:

import os
import random
import sys

import tensorflow
from tensorflow.keras.models import load_model

from dragonagent import DragonAgent
from tictactoe import TicTacToe

def boardToState(board):
    # Translate the board string into a numeric state:
    # empty = 0, X = 1, O = -1
    state = []

    for cell in board:
        if cell == 'E':
            state.append(0)
        elif cell == 'X':
            state.append(1)
        elif cell == 'O':
            state.append(-1)

    return state

def agentPlay(prefix, name, game, agent, symbol):
    validMove = False
    while not validMove:
        if game.freeBoardPositions() > 1:
            position = agent.get_action(boardToState(game.board))
        else:
            position = game.getUniquePossibleMovement()

        validMove = game.makeMove(symbol, position)
        if validMove:
            print(f"{prefix} > {name}: Plays {symbol} at position {position} | State: {game.board}")

    # Return the chosen position, so the caller can pass it to agent.learn()
    return position

def agentStart(prefix, name, game, agent, symbol):
    validMove = False
    while not validMove:
        position = agent.start(boardToState(game.board))

        validMove = game.makeMove(symbol, position)
        if validMove:
            print(f"{prefix} > {name}: Plays {symbol} at position {position} | State: {game.board}")

    return position

def playGame(prefix, agent, opponent):
    emptyBoard = "EEEEEEEEE"

    game = TicTacToe(emptyBoard)

    # Choose who starts the game
    agentIsO = random.choice([True, False])
    print(f"{prefix} > NOTE: In this game the agent is {'O' if agentIsO else 'X'}")

    agentInitialized = False
    opponentInitialized = False

    while not game.checkGameOver() and not game.noPossibleMove():
        if agentIsO:
            # Give an immediate reward of +1 if the agent wins
            if agentInitialized:
                position = agentPlay(prefix, "Agent", game, agent, 'O')
            else:
                position = agentStart(prefix, "Agent", game, agent, 'O')
                agentInitialized = True

            if game.checkGameOver():
                print(f"{prefix} > Agent wins! Agent's reward is: +1")
                agent.learn(boardToState(game.board), position, 1, None)
                break

            # Give an immediate penalty of -1 if the opponent wins
            if opponentInitialized:
                position = agentPlay(prefix, "Opponent", game, opponent, 'X')
            else:
                position = agentStart(prefix, "Opponent", game, opponent, 'X')
                opponentInitialized = True

            if game.checkGameOver():
                print(f"{prefix} > Opponent wins! Agent's reward is: -1")
                agent.learn(boardToState(game.board), position, -1, None)
                break

        else:
            # Give an immediate penalty of -1 if the opponent wins
            if opponentInitialized:
                position = agentPlay(prefix, "Opponent", game, opponent, 'O')
            else:
                position = agentStart(prefix, "Opponent", game, opponent, 'O')
                opponentInitialized = True

            if game.checkGameOver():
                print(f"{prefix} > Opponent wins! Agent's reward is: -1")
                agent.learn(boardToState(game.board), position, -1, None)
                break

            # Give an immediate reward of +1 if the agent wins
            if agentInitialized:
                position = agentPlay(prefix, "Agent", game, agent, 'X')
            else:
                position = agentStart(prefix, "Agent", game, agent, 'X')
                agentInitialized = True

            if game.checkGameOver():
                print(f"{prefix} > Agent wins! Agent's reward is: +1")
                agent.learn(boardToState(game.board), position, 1, None)
                break

        # If no one wins, give a reward of 0
        agent.step(boardToState(game.board), 0)

    print(f'{prefix} > Game over! Winner: {game.winner}')
    game.dumpBoard()

    if (agentIsO and game.winner == 'O') or (not agentIsO and game.winner == 'X'):
        return 1
    elif game.winner == 'D':
        return 0
    else:
        return -1

# Reopen the trained model if available
agent = DragonAgent()
if os.path.exists('/game/tictactoe/model/dragon.keras'):
    agent.model = load_model('/game/tictactoe/model/dragon.keras')

# The opponent must be more exploratory; set exploration_rate to 1.0
# to always choose random actions (it goes from 0.0 to 1.0)
opponent = DragonAgent(exploration_rate=0.9)

# We can optionally set the number of games from the command line
try:
    numberOfGames = int(sys.argv[1])
except (IndexError, ValueError):
    numberOfGames = 10

# Disable Keras training messages; comment this out to see them again
tensorflow.keras.utils.disable_interactive_logging()

# Play each game
wins = 0
draws = 0
losses = 0

for numGame in range(numberOfGames):
    prefix = f"{numGame+1}/{numberOfGames}"

    print(f"Playing game {prefix}…")
    result = playGame(prefix, agent, opponent)

    if result == 1:
        wins += 1
    elif result == 0:
        draws += 1
    else:
        losses += 1

    # Save the trained model after each game
    agent.model.save('/game/tictactoe/model/dragon.keras')

    print(f'{prefix} > Training result until now: {wins} wins, {losses} losses, {draws} draws')
    print()

I’m sure that there is a better way of doing this, but remember, we are still learning: start with something that (sort of) works and improve it later 🙂
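If we save this script as, for example, traindragon.py (the filename here is my choice, pick whatever fits your project), we can launch a training session from the container's shell:

# Play 500 training games; the model is saved after every game
python traindragon.py 500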

The script plays any number of AI battles, learning along the way and storing the training result in a model file. Later, we can use this model file in the Tic Magical Line application.
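How could the application use it? Here is a minimal sketch of a Django view that plays the dragon's turn; the Game model, its fields and the assumption that the dragon plays "O" are hypothetical, adapt them to your own project:

from django.http import JsonResponse

from game.models import Game  # Hypothetical model holding the 9-character board
from game.tictactoe.dragonplay import DragonPlay

def dragon_turn(request, game_id):
    game = Game.objects.get(pk=game_id)

    # Ask the dragon for its move, using the trained model
    play = DragonPlay(game.board, type="ai")
    position = play.chooseMovement()

    # Place the dragon's symbol and persist the new board
    game.board = game.board[:position] + "O" + game.board[position + 1:]
    game.save()

    return JsonResponse({"position": position, "board": game.board})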

Not very useful… but fun!

What we have learned so far

This experiment has been, from beginning to end, an excuse to learn how to build a Django application inside a Dockerized environment. Everything else (the TicTacToe part, the dragons and the machine learning) is just a bit of spice to make the learning more fun.

We have learned so far that Django is awesome. It's full of functionality, very well organized, and has a ton of plugins and extensions. Very, very useful.

Now we can use this fantastic framework to build more useful applications.


About the list

Among the Python and Docker posts, I will also write about other related topics (always tech and programming topics, I promise… fingers crossed), like:

Software architecture
Programming environments
Linux operating system
Etc.

If you find an interesting technology, programming language or whatever, please let me know! I'm always open to learning something new!

About the author

I’m Andrés, a full-stack software developer based in Palma, on a personal journey to improve my coding skills. I’m also a self-published fantasy writer with four published novels to my name. Feel free to ask me anything!