What LLMs Can and Cannot Do in Small Game Creation


In a previous article [1], I wrote about the difficulties of using LLMs (large language models) like ChatGPT to create small games.

While there are several methods to conceive game ideas using ChatGPT, getting it to think up and implement a new game is challenging. At least the current version of ChatGPT presents the following issues:

It’s difficult to get ChatGPT to come up with new game ideas detailed enough to be implemented.
ChatGPT tends to struggle with implementing new, unprecedented algorithms into source code.

Will a Future GPT-∞ Solve These Issues?

It’s unclear.

Given the rapid advancements in image generation AI, these issues might be resolved in about six months. Whether these challenges are surmountable with larger models or if the approach is fundamentally flawed remains to be seen at this point.

With the emergence of Claude 3, which outperforms GPT-4 in some tasks, it has become somewhat possible to predict to what extent these problems can be solved by future LLMs.

I previously attempted to create a one-button action mini-game using ChatGPT (GPTs) [2]. I tried the same thing with Claude 3 Opus. The results were:

The quality of the generated ideas improved, and it became able to generate rules that effectively utilize the characteristics of one-button operation.
The source code implementing those rules was also of decent quality from the start, though still incomplete.
It was mostly unable to set appropriate difficulty levels, score based on risks and rewards, or define game over conditions.

There were areas of improvement and areas that remained largely unchanged.

Using this as a guide, we can imagine what more advanced LLMs will and will not be able to do in small game creation:

What they can do

Create ideas for game motifs, rules, etc.
Generate source code corresponding to the ideas.

What they cannot do

Judge the quality of the games they create.
Improve the bad parts of the games.

The quality of games here refers to the following:

Is the game viable? Are there any absurd game over conditions or perpetual scoring patterns?
Does it have appropriate risks and rewards? Do scores and play time increase based on the player’s skill?
Is it fun? Is the gameplay intuitive and does it provide a sense of exhilaration?

Examples of improvements to address the above issues might include:

Improving the rules. Changing game over conditions and scoring systems.
Adjusting parameters. Changing the speed and appearance frequency of the player and enemies.
Improving the controls. Changing to controls that are easy for the player to understand and do not cause stress.

It seems unlikely that LLMs alone can handle these. To judge the quality of a created game, the computer needs to play the implemented game itself and analyze the results. However, outside the scope of just LLMs, AIs that play games already exist, so it may be possible to some extent by combining such mechanisms. Honestly, it’s unclear to what extent computers can understand human sensations like exhilaration.
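To make the idea of automated quality checks concrete, here is a minimal sketch of a playtest harness. The toy "jump over obstacles" game and the thresholds are my own stand-ins, not any actual implemented game: the point is only that comparing a random agent against a do-nothing agent can flag degenerate rules (no risk, or perpetual scoring) without any human playing.

```python
import random

def play_toy_game(press_probability, seed=0, max_ticks=1000):
    """Simulate a toy one-button game: the player jumps over obstacles.

    Returns (score, survival_ticks). Everything here is a stand-in
    for an actual implemented game.
    """
    rng = random.Random(seed)
    score, player_y, obstacle_x = 0, 0, 30
    for tick in range(max_ticks):
        if player_y == 0 and rng.random() < press_probability:
            player_y = 3                      # start a jump
        player_y = max(0, player_y - 1)       # gravity pulls the player down
        obstacle_x -= 1                       # obstacle approaches
        if obstacle_x <= 0:
            if player_y == 0:
                return score, tick            # collision: game over
            score += 1                        # cleared the obstacle
            obstacle_x = rng.randint(20, 40)  # next obstacle appears
    return score, max_ticks

def playtest_report(num_runs=50):
    """Compare a random-press agent against a do-nothing agent.

    If doing nothing scores as well as acting, or no agent ever dies,
    the rules are likely degenerate.
    """
    random_scores = [play_toy_game(0.3, seed=s)[0] for s in range(num_runs)]
    idle_scores = [play_toy_game(0.0, seed=s)[0] for s in range(num_runs)]
    return {
        "random_agent_mean": sum(random_scores) / num_runs,
        "idle_agent_mean": sum(idle_scores) / num_runs,
    }
```

A harness like this can catch "viability" failures mechanically; judging intuitiveness or exhilaration is a different matter entirely.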

Having computers improve the bad parts of games seems very difficult. If an AI can judge the quality of a game, that judgment might serve as a reward in a reinforcement learning-like mechanism. However, can AI appropriately improve rules and controls to make games better, and properly tune the extremely wide range of in-game parameters?

So at this point, it seems better to limit the scope of what LLMs do and have them assist in small game creation. What we can have LLMs do through prompts is then:

Create several game ideas based on a theme provided by the user.
Create rules corresponding to the game idea selected by the user.
Elaborate the rules into properties, shapes, behaviors, collision events, etc. for each in-game object.
Create skeleton code that includes the above rules as in-program comments.
Provide library knowledge to implement the skeleton code and obtain the implemented code.
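The prompt sequence above can be sketched as a simple pipeline in which each step's output feeds the next step's prompt. The template wordings and the `ask_llm` callable below are illustrative assumptions, not my actual prompts; in practice the user also reviews and selects between steps, which is omitted here for brevity.

```python
# Hypothetical prompt templates, one per pipeline step; `ask_llm` is any
# callable that sends a prompt to an LLM and returns its text reply
# (e.g. a thin wrapper around an API client).
STEP_PROMPTS = [
    "Propose several one-button game ideas based on this theme: {input}",
    "Write the rules for this selected game idea: {input}",
    "Elaborate the rules into properties, shapes, behaviors, and "
    "collision events for each in-game object: {input}",
    "Write skeleton code that includes these rules as comments: {input}",
    "Implement the skeleton code using this game library: {input}",
]

def run_pipeline(ask_llm, theme):
    """Chain the steps: each LLM reply becomes the next prompt's input."""
    current = theme
    transcript = []
    for template in STEP_PROMPTS:
        prompt = template.format(input=current)
        current = ask_llm(prompt)
        transcript.append((prompt, current))
    return transcript
```

Keeping the steps separate like this also makes it easy to rerun a single stage when its output is unusable, instead of regenerating the whole game.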

This is what LLMs can do. The resulting code and game are incomplete, so the following is done manually:

Fixing bugs in the code and rules that are not as intended.
Improving rules and code to make the game viable.
Adding game over conditions and scoring systems that take risks and rewards into account.
Adjusting rules and parameters to make the game fun.
Implementing difficulty increase methods and adjusting the difficulty curve.
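As one concrete (and entirely hypothetical) illustration of the last point, a common small-game pattern is to ramp a difficulty value with elapsed time and use it to scale speeds and spawn rates. The ramp shape and numbers below are illustrative, not taken from any particular game:

```python
def difficulty(ticks, ramp_ticks=3600):
    """Difficulty grows linearly from 1, doubling after ramp_ticks
    (e.g. one minute at 60 ticks per second)."""
    return 1 + ticks / ramp_ticks

def enemy_speed(ticks, base_speed=1.0):
    # Scale enemy speed with difficulty so the game gets harder over time.
    return base_speed * difficulty(ticks)

def spawn_interval(ticks, base_interval=60):
    # Spawn more often as difficulty rises, with a floor so the
    # screen never becomes impossibly crowded.
    return max(10, int(base_interval / difficulty(ticks)))
```

Tuning constants like the ramp length and the spawn floor is exactly the kind of adjustment that, for now, still has to be done by hand.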

Considering this, using LLMs doesn't seem to reduce the effort of creation much. However, interesting behaviors sometimes emerge from the code LLMs implement, so even now they are usable for creating toys that serve as the basis for game ideas. As model accuracy improves, they will become less likely to generate completely useless source code.

I’ve put the prompts for performing the above actions with Claude, along with some games that I want to improve and complete using them, in [3]. When new LLMs come out, I want to continue utilizing them in small game creation while improving the prompts.

[1] Can AI Chatbots Create New Games?

[2] one-button-game-builder

[3] Claude’s One-Button Game Creation
