VQGAN Caveats

VQGAN assumes you want “low” quality images if you don’t include “unreal engine” in your prompts. Engine bias? Example:

Prompt: Busy medieval tavern

Prompt: Busy medieval tavern unreal engine

Menu on the right.. below the name of the tavern. IINEDU? INEOU? Innuendo? Hmm..

I obviously need to increase the resolution of the images it creates as well, but the difference is obvious. Also, I noticed that while it generated this image the tavern was “built” first and then it started adding people. If you compare one of the first generated images to this one you’ll see what I mean:

Guess they haven’t opened yet

I’m going to let this prompt run its course and I’ll post the result later. Also, I’m running this in Colab and I want to see how well my new computer performs in comparison.


I’m working on another experiment using AI to generate game content. I’m attempting to use GPT-3 to create a text-based adventure game with actual graphics provided by VQGAN + CLIP. Here’s my first attempt at a “hand-made” POC adventure (original prompt stolen from AI Dungeon):

You are Jack, a wizard living in the kingdom of Larion. You have a staff and a spellbook. You finish your long journey and finally arrive at the ruin you’ve been looking for. You have come here searching for a mystical spellbook of great power called the book of essence. You look around and see several skeletons and ghosts of the dead. This is going to be dangerous. You need to find the book before you can leave, but you have to be careful, or you may end up a skeleton too.

Using another GPT-3 prompt I wrote to extract the setting from the text above, I got this:

An enchanted ruin in the kingdom of Larion

I then fed that into a Colab notebook which, after a few minutes, generated this:

An enchanted ruin in the kingdom of Larion

Let’s see if we can keep this up. I need to tell GPT-3 what I’m going to do next. In order to keep generating the main storyline, the perspective has to be reversed, so rather than saying something like:

I explore the ruins looking for a library.

Instead, you say:

You explore the ruins looking for a library.
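A minimal sketch of that perspective flip, assuming a small hand-written word table is good enough for short player actions. The function name and the table are made up for illustration, and it does not fix verb agreement beyond “I am”:

```python
import re

# First-person to second-person swaps. This table is an assumption for
# illustration and only covers the simplest cases.
SWAPS = {
    "i am": "you are",
    "i'm": "you're",
    "i’m": "you’re",
    "i": "you",
    "my": "your",
    "mine": "yours",
    "myself": "yourself",
    "me": "you",
}

# Longer alternatives come first so "I am" is matched before "I".
PATTERN = re.compile(r"\b(i am|i['’]m|i|my|mine|myself|me)\b", re.IGNORECASE)


def to_second_person(action):
    """Rewrite a first-person player action in the second person."""
    text = PATTERN.sub(lambda m: SWAPS[m.group(0).lower()], action)
    # Restore the capital letter at the start of each sentence.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


print(to_second_person("I explore the ruins looking for a library."))
```

The rewritten line is what would get appended to the story before asking GPT-3 for the next sequence.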

Here’s the next sequence from GPT-3:

Suddenly, a ghost appears. It says, “I am the ghost of the old wizard and I will not let you pass. I will stop you with my magic.”

Now, we need to see what my setting extraction prompt does with this:

The ruins of a wizard’s tower in the kingdom of Larion.

Not exactly what we wanted… I need to somehow rewrite my prompt so that it favors the latest actions of the players. After some experimentation, I got this:

A wizard searching for a magical spellbook in a ruined ruin called the Book of Essence.

This would have been a better prompt to use when we generated the first image, but what we want is a simple summary of what just happened in our story, so something like:

The ghost of the old wizard appearing in an enchanted ruin in the kingdom of Larion.

I was able to get GPT-3 to focus on the current events in the story by simply providing it with examples of this in the setting extraction prompt, and I managed to generate this:

A ghost appearing to a wizard in a kingdom of Larion.

That’s more like it, and although it does not include the true setting, which in this case should be a ruin in the kingdom of Larion, we now have a better idea of the type of prompt we need to do this. Here’s the result from the image generator using the new prompt from GPT-3:

A ghost appearing to a wizard in a kingdom of Larion.
Ohh-kayyy. I don’t know whether to quit now or hang this on the fridge. Wow. And please ignore the ghost king in the foreground.

Despite the third-grader aesthetic, the novelty is undeniable. Both of these images seem relevant enough to the story to make me think that this experiment could be made into an actual game. Given that the sequence above was generated manually, my plan now is to attempt to automate this process and see where I can take it, but, unless you like waiting 10 minutes for each frame of gameplay to be generated, it won’t be featured on Steam any time soon. 😉
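For the automation, the setting extraction step could be sketched as a few-shot prompt builder. This is only a guess at the shape of such a prompt, not the one actually used: the “Story:”/“Scene:” labels, the function name, and the second example pair are all made up for illustration, and the call to GPT-3 itself is left out.

```python
# (story excerpt, one-line scene) pairs shown to the model before the real
# story. The first pair reuses text from the walkthrough above; the second
# is a made-up example, not from the original experiment.
EXAMPLES = [
    (
        "Suddenly, a ghost appears. It says, \"I am the ghost of the old "
        "wizard and I will not let you pass.\"",
        "The ghost of the old wizard appearing in an enchanted ruin in the "
        "kingdom of Larion.",
    ),
    (
        "You push open the heavy door and step into a dusty library.",
        "A wizard entering a dusty library inside a ruin.",
    ),
]


def build_scene_prompt(story_so_far, examples=EXAMPLES):
    """Build a few-shot prompt that ends where the model should continue."""
    blocks = [f"Story: {story}\nScene: {scene}" for story, scene in examples]
    # The final block is left open so the completion is the scene summary.
    blocks.append(f"Story: {story_so_far}\nScene:")
    return "\n\n".join(blocks)


print(build_scene_prompt("You explore the ruins looking for a library."))
```

The completion that comes back would then be passed straight to the VQGAN + CLIP notebook as the image prompt.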

Quick Update

Added some terrain:

Also added a new discriminator that attempts to judge the traversable area of a generated structure, but it doesn’t really work. The generator cares not for the terrain, so I’ve basically placed the structure itself in the world by hand. I’m still trying to figure out how to give the generator some terrenity.

The next steps are adding openings (windows/doors) to the generation schema and then I want to use a pathfinding algorithm to favor structures with proper doors and windows.
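A rough sketch of what the pathfinding score could look like, simplified to a single 2D floor slice. Everything here is an assumption for illustration: the character codes (`#` wall, `.` interior floor, `D` door), the function name, and the choice of a plain breadth-first search.

```python
from collections import deque


def reachable_fraction(plan):
    """Fraction of interior floor cells that can be walked to from a door."""
    rows = [list(row) for row in plan]
    height, width = len(rows), len(rows[0])
    doors = [
        (y, x) for y in range(height) for x in range(width) if rows[y][x] == "D"
    ]
    floor_total = sum(row.count(".") for row in rows)
    if not doors or floor_total == 0:
        return 0.0  # no way in, or nothing to reach

    seen = set(doors)
    queue = deque(doors)
    reached = 0
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            inside = 0 <= ny < height and 0 <= nx < width
            if inside and (ny, nx) not in seen and rows[ny][nx] in ".D":
                seen.add((ny, nx))
                if rows[ny][nx] == ".":
                    reached += 1
                queue.append((ny, nx))
    return reached / floor_total


open_room = ["#####", "#...#", "#...D", "#####"]
split_room = ["#####", "#.#.#", "D.#.#", "#####"]
print(reachable_fraction(open_room), reachable_fraction(split_room))
```

A structure with a door into every room scores 1.0, while one with walled-off rooms or no door at all scores lower, so the value could be used directly as a discriminator term.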

Latest and Greatest

I’m still set on figuring out how to generate random POIs for voxel-based games like 7D2D or Minecraft. Here’s my latest research project that will hopefully get me closer to my goal:

Github here: https://github.com/newcarrotgames/wirearchy

I’m using an extremely crude GAN-ish style of procedural generation that uses something similar to an evolutionary algorithm to build structures, and then that structure is scored by a discriminator for usefulness. The generation code has been fairly simple, but the discriminator is proving to be a bit complicated. I tend to complicate things on my own, so I’m also dealing with my own insecurities during this process.. free therapy, right? Here’s some shots of what I’ve been able to do so far:

Asking the network to generate large structures.
The discriminator used here favors structures with high resource cost (iron/stone > wood).
Just added terrain using simplex noise but then I realized matching the POI to the terrain won’t be as easy as I thought.
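To make the generate-then-score idea concrete, here is a toy sketch. It is not the code from the repo: the material costs, the flat list standing in for a voxel grid, and the simple keep-if-not-worse loop are all assumptions for illustration.

```python
import random

# Assumed resource costs, chosen so that iron/stone outrank wood.
COST = {"air": 0, "wood": 1, "stone": 2, "iron": 3}
MATERIALS = list(COST)


def score(structure):
    """Toy discriminator: total resource cost of the structure."""
    return sum(COST[block] for block in structure)


def mutate(structure, rng):
    """Copy the structure and change one randomly chosen block."""
    child = list(structure)
    child[rng.randrange(len(child))] = rng.choice(MATERIALS)
    return child


def evolve(size=64, generations=500, seed=0):
    """Keep a mutation only when the discriminator does not score it lower."""
    rng = random.Random(seed)
    best = ["air"] * size  # a flattened voxel grid, starting empty
    for _ in range(generations):
        candidate = mutate(best, rng)
        if score(candidate) >= score(best):
            best = candidate
    return best


print(score(evolve()))
```

With a score this simple the result just drifts toward solid iron, which is why the interesting part of the project is the discriminator rather than the generator.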

It “works”

The results are promising, although it’s hard to tell that at first:

Side view
Top-down view

So, despite the fact that it’s still an unrecognizable blob, it does show that this network is better suited for three dimensional data. It also shows that I have quite a lot to do if I ever want this thing to produce useful output.

The corner “wall” feature is also puzzling. I expected to see something like this though considering the training data I’ve used:

One of the cabin prefabs used for training.

Most of the training data I’ve used has the base you see there at the bottom, but judging by the generated model I think I’m messing up the orientation of the data as it goes through the system. To me it looks like the generated models are actually upside down and may need to be flipped over:

Hopefully I can figure the rest of this out!

Quick Update 5/29/2020

My original assumption about generative adversarial networks was wrong, but the work I did got me down the right path. Now I have plenty of tools going forward, and I have spent the past few weeks researching GAN systems and other machine learning techniques for content generation. I finally realized that Tensorflow already had what I’ve been looking for.

The system I’ve created does a fair job considering how little I know about what’s going on under the hood of these systems, but it’s obvious that it’s not able to understand the spatial data of voxels, which was expected because the system I’m using was built to work with two-dimensional image data. Here’s an example of the voxel data produced by the current GAN using the slightly modified handwriting generator example:

Example output from current implementation: notice the divisions between the different “walls”

Just realized I need to fix the SSL certificate on this site… anyway, you can see that the GAN is generating 3D data, but there are clear divisions between the walls of the structure because it’s really just generating an image, and there’s no way for it to know that each “frame” in the image corresponds to a 3D feature. After I realized this I started trying to design a network that could work with 3D data, and from this research I learned that image GANs typically use convolutional networks to generate content. After I (sort of) understood what that means I thought, well.. maybe they’ve already thought of this, and, of course, they have. Hopefully, I can use the built-in 3D convolutional layer that Keras already provides. The code I’m working on now is in the GitHub repo if you want to check it out. When/if I get it to work, I’ll post the results.
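A small illustration of why the image-based GAN can’t see the 3D structure, assuming the slices of a prefab are laid side by side in one image (the exact layout used in the project may differ, and both function names are made up):

```python
def tile_volume(volume):
    """Lay the z-slices of volume[z][y][x] side by side in one 2D image."""
    depth, height = len(volume), len(volume[0])
    return [
        [value for z in range(depth) for value in volume[z][y]]
        for y in range(height)
    ]


def image_position(x, y, z, width):
    """(row, column) of voxel (x, y, z) inside the tiled image."""
    return (y, z * width + x)


# A 2x2x2 volume where each value encodes its own coordinates as z, y, x.
volume = [
    [[z * 100 + y * 10 + x for x in range(2)] for y in range(2)]
    for z in range(2)
]
for row in tile_volume(volume):
    print(row)

# Neighbours along z end up a full slice width apart in the image.
print(image_position(0, 0, 0, width=2), image_position(0, 0, 1, width=2))
```

Voxels that touch along the z axis land a whole slice width apart, while the last column of one slice sits right next to the first column of the next even though those voxels are not neighbours. A 2D convolution sees only the image neighbours, which matches the visible seams between “walls”.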

Update 4/24/2020

Just to update my millions of followers (that’s a joke), I’ve realized that the GAN wants all of its training data to be the same size, so that means the prefabs have to be resized before they can be used for training. This does allow for some tricks to increase the amount of training data available by taking the same prefab and making it slightly smaller or larger to produce a unique image for training. Scaling the 3D array of voxels is not as easy as I thought it would be, as the existing array resize methods are not built for this specific task, or at least that’s what I’m telling myself so I can justify writing my own code to do this. My first attempts were failures, and I’ve found it is difficult for a visual thinker like myself to comprehend the process, so I’ve made this WebGL prefab viewer so I can see exactly what my code is doing:

webgl prefab viewer

If you have the game installed it will put the prefabs in the upper right corner so you can check them out. The web app uses Flask on the backend so I can use all the Python code I have so far. Currently, there are no optimizations, so the larger prefabs are slow to both load and view. I know how to fix it, but until it becomes a real blocker I’m not going to worry about it. Ok, I’m off to work on my resizing code.
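One way the resizing could work is nearest-neighbour resampling, sketched below on plain nested lists indexed as volume[z][y][x]. This is an assumption about the approach, not the code from the repo, and the function name is made up:

```python
def resize_voxels(volume, new_shape):
    """Nearest-neighbour resize of volume[z][y][x] to (depth, height, width)."""
    old_d, old_h, old_w = len(volume), len(volume[0]), len(volume[0][0])
    new_d, new_h, new_w = new_shape
    return [
        [
            [
                # Map each target index back to the source voxel it falls in.
                volume[z * old_d // new_d][y * old_h // new_h][x * old_w // new_w]
                for x in range(new_w)
            ]
            for y in range(new_h)
        ]
        for z in range(new_d)
    ]


prefab = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
bigger = resize_voxels(prefab, (4, 4, 4))
print(bigger[0][0], bigger[3][3])
```

Because block IDs are categories rather than intensities, copying the nearest source voxel avoids the blended in-between values that an interpolating resize would produce.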

7D2D GAN Project Update


That’s an in-game screenshot of the first prefab generated by the current code, and it’s ok to be confused. Let’s be honest, that’s not going to help you survive a horde anytime soon. What this really means is:

A scaled-up example of the training data. The actual images are only 28×28 pixels.

The next goal post is to convert all the available game prefabs to training data images like the one above, re-train, and see what we get. Most of the challenges with this project still lie ahead, and to be honest, I am sure this project will not generate anything that even resembles a finished product for a long time, and that’s not TensorFlow’s fault. However, I am planning on taking whatever reasonable action I can to make this a reality, which probably just means emailing someone who actually knows how it works. If that’s all you have to do, why the heck not?

My super long long-term fuzzy warm feeling goal is creating a website called thisprefabdoesnotexist.com which works like this one: https://www.thispersondoesnotexist.com/. Good luck to me!

I failed

I was hyper-focused on releasing CarCoder for a while, but after a work-sponsored hackweek where my team built a mobile app that does real-time video object detection using TensorFlow, I had another idea that I could not ignore.

In between Rocket League binges, I’ve been playing another game called “7 Days To Die” which is a first person/base building/survival/open world/crafting/tower defense/post zombie apocalypse game that I think is cool because of the following reasons:

  • It uses marching cubes instead of basic cube rendering for voxel map data
  • The game world is randomly generated using a neat blend of actual random landscapes/biomes with these prefab buildings/structures called Points Of Interest (POIs)

Marching cubes is well established by now, but in short it’s a way of rendering a voxel-based environment in a more realistic manner than just rendering a bunch of cubes (think Minecraft).

The prefab POIs are cool because they add these playable miniquests to the random world, so there’s not just the typical sandbox experience like Minecraft where you don’t have any real incentive to do much until nightfall. The only problem is that once you’ve played one, if you see it again somewhere else, you already know everything about it. They are static elements that are always the same no matter how many times they get regenerated into a new world.

One of the game’s randomly generated cities. Each building/sign is a separate prefab which is then arranged by the world generator as shown above.

The idea I had was to change this, so that we could randomly generate POIs using machine learning so the player has an almost infinite amount of content. It’s a bold strategy and it will probably fall short of the goal, but there’s so much that can be done easily with tools that exist right now that I can’t think of a good reason not to pursue this project. I’m also using this to build my overall software engineering design/architecture skills so I can at least claim the work done here won’t be a complete waste ;).