So, I really love playing Spellbook, but something I have always wanted to add since I first considered building it is AI-generated music; specifically, old-school 8-bit video game music that matches the current mood of the story. It may be possible to take the data from this site and create new music the same way the game creates images. I have no idea how to train CLIP, but I recently learned how to train a transformer for a work project, so hopefully I can make this happen. I'll let you know if I make any progress. Hope I don't break the internet.
My latest foray into the mysterious world of generative adversarial networks has yielded interesting results:
As illustrated (pun intended) in this post, it seemed possible to create an interactive story writer with actual graphics using GPT-3 and VQGAN+CLIP, and the screenshot above proves that maybe I'm not as lazy as I thought I was. Hmm... anyway, as a game it's simply ahead of its time, but as an experiment it's frickin' cooler than dolphins with lasers. The prompts are hard-coded, and so far I can only get it to continue the story for one or two additional prompts before it devolves into madness, but it is sooo very, very neat that I can just hit refresh and see a new opening passage along with a relevant image (after a few minutes, of course). I can't help but wonder how long it will be until content like this becomes realistic and performant enough to replace the handmade games we have now.
VQGAN assumes you want “low” quality images if you don’t include “unreal engine” in your prompts. Engine bias? Example:
Prompt: Busy medieval tavern
Prompt: Busy medieval tavern unreal engine
I still need to increase the resolution of the images it creates, but the difference is obvious. Also, I noticed that while it generated this image, the tavern was "built" first and then it started adding people. If you compare one of the first generated images to this one, you'll see what I mean:
I'm going to let this prompt run its course, and I'll post the result later. Also, I'm running this in Colab, and I want to see how well my new computer performs in comparison.
I’m working on another experiment using AI to generate game content. I’m attempting to use GPT-3 to create a text-based adventure game with actual graphics provided by VQGAN + CLIP. Here’s my first attempt at a “hand-made” POC adventure (original prompt stolen from AI Dungeon):
You are Jack, a wizard living in the kingdom of Larion. You have a staff and a spellbook. You finish your long journey and finally arrive at the ruin you’ve been looking for. You have come here searching for a mystical spellbook of great power called the book of essence. You look around and see several skeletons and ghosts of the dead. This is going to be dangerous. You need to find the book before you can leave, but you have to be careful, or you may end up a skeleton too.
Using another GPT-3 prompt I wrote to extract the setting from the text above, I got this:
An enchanted ruin in the kingdom of Larion
I then fed that into a Colab notebook which, after a few minutes, generated this:
Let's see if we can keep this up. I need to tell GPT-3 what I'm going to do next. To keep generating the main storyline, the perspective has to be switched from first person to second person, so instead of writing something like:
I explore the ruins looking for a library.
you write:
You explore the ruins looking for a library.
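Switching the perspective by hand is fine for a test, but an automated version needs something that rewrites the player's command. Here's a naive sketch, assuming short one-sentence commands. `to_second_person` and its pronoun table are hypothetical helpers, not part of the original experiment, and they only fix "am"; other verbs are left alone, so "I was" would come out as "you was".

```python
import re

# Hypothetical helper (not from the original project): a naive first-person to
# second-person rewrite so a player's command matches the story's "You ..." voice.
# Only the words below are handled; verbs other than "am" are left untouched.
PRONOUN_MAP = {
    "i": "you",
    "i'm": "you're",
    "am": "are",
    "me": "you",
    "my": "your",
    "mine": "yours",
    "myself": "yourself",
}

# Longest alternatives first so "i'm" wins over "i" (the apostrophe is a word boundary).
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(w) for w in sorted(PRONOUN_MAP, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def to_second_person(command: str) -> str:
    """Rewrite 'I explore the ruins' as 'You explore the ruins'."""
    rewritten = _PATTERN.sub(lambda m: PRONOUN_MAP[m.group(0).lower()], command)
    # Restore sentence-initial capitalization lost by the lowercase replacements.
    if command[:1].isupper():
        rewritten = rewritten[:1].upper() + rewritten[1:]
    return rewritten


print(to_second_person("I explore the ruins looking for a library."))
# You explore the ruins looking for a library.
```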
Here’s the next sequence from GPT-3:
Suddenly, a ghost appears. It says, “I am the ghost of the old wizard and I will not let you pass. I will stop you with my magic.”
Now, we need to see what my setting extraction prompt does with this:
The ruins of a wizard’s tower in the kingdom of Larion.
Not exactly what we wanted… I need to somehow rewrite my prompt so that it favors the player's latest actions. After some experimentation, I got this:
A wizard searching for a magical spellbook in a ruined ruin called the Book of Essence.
This would have been a better prompt to use when we generated the first image, but what we want is a simple summary of what just happened in our story, so something like:
The ghost of the old wizard appearing in an enchanted ruin in the kingdom of Larion.
I was able to get GPT-3 to focus on the current events in the story by simply providing it with examples of this in the setting extraction prompt, and I managed to generate this:
A ghost appearing to a wizard in a kingdom of Larion.
That’s more like it, and although it does not include the true setting, which in this case should be a ruin in the kingdom of Larion, we now have a better idea of the type of prompt we need to do this. Here’s the result from the image generator using the new prompt from GPT-3:
Despite the third-grader aesthetic, the novelty is undeniable. Both of these images seem relevant enough to the story to make me think that this experiment could be made into an actual game. Given that the sequence above was generated manually, my plan now is to attempt to automate this process and see where I can take it, but, unless you like waiting 10 minutes for each frame of gameplay to be generated, it won’t be featured on Steam any time soon. 😉
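Automating the sequence above mostly means chaining the two prompts. A rough sketch of the wiring, assuming a `complete(prompt) -> str` function that wraps whatever GPT-3 client you use (hypothetical here; no real API is called) and made-up example pairs for the few-shot scene-extraction prompt, since the actual prompt wasn't published:

```python
from typing import Callable, List, Tuple

# Made-up few-shot examples: each pairs a passage with a caption that describes
# only the latest event, which is what steered the model toward current events.
SCENE_EXAMPLES: List[Tuple[str, str]] = [
    ("You enter the tavern. The bard stops playing and glares at you.",
     "A bard glaring at a wizard in a busy medieval tavern."),
    ("You climb the tower stairs. A raven lands on the windowsill.",
     "A raven landing on the windowsill of a stone tower."),
]


def build_scene_prompt(passage: str,
                       examples: List[Tuple[str, str]] = SCENE_EXAMPLES) -> str:
    """Few-shot prompt asking for an image caption of the latest event only."""
    parts = ["Describe the latest event in each passage as an image caption.\n"]
    for text, caption in examples:
        parts.append(f"Passage: {text}\nScene: {caption}\n")
    parts.append(f"Passage: {passage}\nScene:")  # left open for the model to finish
    return "\n".join(parts)


def play_turn(story: str, command: str,
              complete: Callable[[str], str]) -> Tuple[str, str]:
    """Append the player's second-person command, continue the story, and
    return the updated story plus a caption for the image generator."""
    story = f"{story}\n> {command}\n"  # "> " is an arbitrary marker for player actions
    passage = complete(story)
    scene = complete(build_scene_prompt(passage)).strip()
    return story + passage, scene
```

The returned `scene` string is what would be handed to the VQGAN+CLIP notebook as its text prompt, which is still where the ten-minutes-per-frame cost comes from.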
Added some terrain:
I also added a new discriminator that attempts to judge the traversable area of a generated structure, but it doesn't really work. The generator cares not for the terrain, so I've basically placed the structure in the world by hand. I'm still trying to figure out how to give the generator some terrenity.
The next steps are adding openings (windows/doors) to the generation schema, and then using a pathfinding algorithm to favor structures with proper doors and windows.
I'm still set on figuring out how to generate random POIs (points of interest) for voxel-based games like 7D2D (7 Days to Die) or Minecraft. Here's my latest research project that will hopefully get me closer to my goal:
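One cheap way to score "proper doors and windows" before bringing in a full pathfinding algorithm is a flood fill that counts how much of a structure's empty space can be reached from outside. A sketch, assuming a NumPy array where 0 is air and any other value is a solid block. `reachable_air_fraction` is a hypothetical name, and it ignores gravity, headroom, and the terrain, so it's a reachability check rather than real pathfinding like A*.

```python
from collections import deque

import numpy as np

AIR = 0  # assumption: 0 is an empty voxel, anything else is solid
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def reachable_air_fraction(grid: np.ndarray) -> float:
    """Share of the structure's air voxels reachable from outside through
    6-connected air (i.e. through doors, windows, or open sides)."""
    # Surround the structure with a one-voxel shell of air to start from.
    padded = np.pad(grid, 1, constant_values=AIR)
    seen = np.zeros(padded.shape, dtype=bool)
    seen[0, 0, 0] = True
    queue = deque([(0, 0, 0)])
    while queue:  # breadth-first flood fill through air
        x, y, z = queue.popleft()
        for dx, dy, dz in NEIGHBORS:
            n = (x + dx, y + dy, z + dz)
            if (all(0 <= c < s for c, s in zip(n, padded.shape))
                    and not seen[n] and padded[n] == AIR):
                seen[n] = True
                queue.append(n)
    air = grid == AIR
    total = int(air.sum())
    if total == 0:
        return 0.0  # a solid lump has no traversable space at all
    reached = seen[1:-1, 1:-1, 1:-1] & air  # drop the shell, keep structure air
    return int(reached.sum()) / total
```

A sealed room scores 0.0 and the same room with one door cut into it scores 1.0, so a score like this could be folded into the discriminator to favor structures with openings.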
Github here: https://github.com/newcarrotgames/wirearchy
I'm using an extremely crude, GAN-ish style of procedural generation: something similar to an evolutionary algorithm builds structures, and each structure is then scored by a discriminator for usefulness. The generation code has been fairly simple, but the discriminator is proving to be a bit complicated. I tend to complicate things on my own, so I'm also dealing with my own insecurities during this process... free therapy, right? Here are some shots of what I've been able to do so far:
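The generate-and-score loop described above can be sketched as a tiny evolutionary algorithm with elitism. This is an illustration, not the code in the wirearchy repo: `score_fn` stands in for the discriminator and is just any function that maps a voxel grid to a number, and the mutation rate and population sizes are arbitrary.

```python
from typing import Callable, List, Tuple

import numpy as np


def mutate(grid: np.ndarray, rng: np.random.Generator, rate: float = 0.02) -> np.ndarray:
    """Return a copy of `grid` with a small random fraction of voxels toggled."""
    flips = rng.random(grid.shape) < rate
    return np.where(flips, 1 - grid, grid)


def evolve(score_fn: Callable[[np.ndarray], float],
           shape: Tuple[int, int, int] = (8, 8, 8),
           population: int = 32, elite: int = 4,
           generations: int = 50, seed: int = 0) -> Tuple[np.ndarray, List[float]]:
    """Evolve binary voxel grids (1 = block, 0 = air) toward a higher score."""
    rng = np.random.default_rng(seed)
    pop = [(rng.random(shape) < 0.5).astype(np.int8) for _ in range(population)]
    history: List[float] = []
    for _ in range(generations):
        ranked = sorted(pop, key=score_fn, reverse=True)  # the "discriminator" pass
        history.append(score_fn(ranked[0]))
        parents = ranked[:elite]  # elitism: the best survive unchanged
        children = [mutate(parents[i % elite], rng)
                    for i in range(population - elite)]
        pop = parents + children
    return max(pop, key=score_fn), history
```

Because the top `elite` structures survive unchanged, the best score never gets worse from one generation to the next. All of the interesting behavior lives in `score_fn`, which fits the experience that the discriminator is the hard part.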
The results are promising, although that's hard to tell at first:
So, despite the fact that it's still an unrecognizable blob, it does show that this network is better suited for three-dimensional data. It also shows that I have quite a lot to do if I ever want this thing to produce useful output.
The corner "wall" feature is also puzzling. I expected to see something more like this, though, considering the training data I've used:
Most of the training data I've used has the base you see there at the bottom, but judging by the generated model, I think I'm messing up the orientation of the data as it goes through the system. To me, it looks like the generated models are actually upside down and may need to be flipped over:
Hopefully I can figure the rest of this out!
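If the models really are upside down, the fix is a single `np.flip` along the vertical axis. A hedged sketch of an automatic check, assuming voxels are indexed `[x, y, z]` with y pointing up (the axis order actually used in the pipeline may differ) and using "more solid blocks in the lower half" as a stand-in for "has its base at the bottom". `orient_base_down` is a hypothetical helper.

```python
import numpy as np

UP_AXIS = 1  # assumption: arrays are indexed [x, y, z] with y pointing up


def orient_base_down(voxels: np.ndarray, up_axis: int = UP_AXIS) -> np.ndarray:
    """Flip a model vertically if most of its solid blocks sit in the upper half."""
    solid = voxels != 0  # assumption: 0 means air
    other_axes = tuple(a for a in range(solid.ndim) if a != up_axis)
    per_layer = solid.sum(axis=other_axes)  # solid-block count per height level
    height = per_layer.shape[0]
    lower = per_layer[: height // 2].sum()
    upper = per_layer[(height + 1) // 2:].sum()  # middle layer (odd heights) ignored
    return np.flip(voxels, axis=up_axis) if upper > lower else voxels
```

Running every training sample and every generated model through the same check would at least make the orientation consistent, even if the heuristic guesses wrong for an odd top-heavy prefab.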
My original assumption about generative adversarial networks was wrong, but the work I did got me down the right path. Now I have plenty of tools going forward, and I have spent the past few weeks researching GAN systems and other machine learning techniques for content generation. I finally realized that TensorFlow already had what I'd been looking for.
The system I've created does a fair job considering how little I know about what's going on under the hood, but it's obviously not able to understand the spatial data of voxels, which was expected, because the system I'm using was built to work with two-dimensional image data. Here's an example of the voxel data produced by the current GAN using the slightly modified handwriting generator example:
Just realized I need to fix the SSL certificate on this site… anyway, you can see that the GAN is generating 3D data, but there are clear divisions between the walls of the structure because it's really just generating an image, and there's no way for it to know that each "frame" in the image corresponds to a 3D feature. After I realized this, I started trying to design a network that could work with 3D data. From this research I learned that GANs use convolutional networks to generate content, and after I (sort of) understood what that means, I thought, well... maybe they've already thought of this, and, of course, they have. Hopefully, I can use the built-in 3D convolutional layer that Keras already provides. The code I'm working on now is in the GitHub repo if you want to check it out. When (or if) I get it to work, I'll post the results.
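For anyone else wondering what a 3D convolution does differently: a 3D kernel slides along depth as well as height and width, so one output value can combine voxels from neighboring layers, which a 2D kernel looking at separate "frames" of an image can't do. Keras exposes this as `tf.keras.layers.Conv3D`. The plain NumPy sketch below doesn't use Keras; it only shows the core operation for a single channel, stride 1, no padding, and no bias (like Keras, it computes a cross-correlation and doesn't flip the kernel).

```python
import numpy as np


def conv3d_valid(volume: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 3D cross-correlation with stride 1 and no padding."""
    kd, kh, kw = kernel.shape
    out_shape = tuple(v - k + 1 for v, k in zip(volume.shape, kernel.shape))
    out = np.zeros(out_shape)
    for d in range(out_shape[0]):
        for h in range(out_shape[1]):
            for w in range(out_shape[2]):
                window = volume[d:d + kd, h:h + kh, w:w + kw]
                out[d, h, w] = np.sum(window * kernel)
    return out


# A pillar of blocks running straight through five depth slices...
volume = np.zeros((5, 5, 5))
volume[:, 2, 2] = 1
# ...and a kernel that only responds to blocks stacked along the depth axis.
kernel = np.zeros((3, 3, 3))
kernel[:, 1, 1] = 1
response = conv3d_valid(volume, kernel)
print(response[:, 1, 1])  # [3. 3. 3.]: each window sees three stacked voxels
```

A trained Conv3D layer learns many such kernels at once, which is what should let the generator treat a wall as one continuous feature instead of a stack of unrelated frames.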
Just to update my millions of followers (that's a joke): I've realized that the GAN wants all of its training data to be the same size, which means the prefabs have to be resized before they can be used for training. This does allow for a trick to increase the amount of training data available: take the same prefab and make it slightly smaller or larger to produce a unique sample for training. Scaling a 3D array of voxels is not as easy as I thought it would be, since the existing array resize methods are not built for this specific task (or at least that's what I'm telling myself so I can justify writing my own code to do this). My first attempts were failures, and I've found it difficult for a visual thinker like myself to comprehend the process, so I've made this WebGL prefab viewer so I can see exactly what my code is doing:
If you have the game installed, it will put the prefabs in the upper-right corner so you can check them out. The web app uses Flask on the backend so I can use all the Python code I have so far. Currently there are no optimizations, so the larger prefabs are slow to both load and view. I know how to fix it, but until it becomes a real blocker I'm not going to worry about it. OK, I'm off to work on my resizing code.
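A minimal sketch of the resizing step, under a few assumptions: prefabs are NumPy arrays of integer block IDs with 0 as air, resizing uses nearest-neighbor sampling so block IDs are copied rather than blended into meaningless in-between values, and every sample is then centered in one fixed training shape. `resize_nearest`, `fit_to_shape`, and `scaled_variants` are hypothetical names, not functions from the project.

```python
from typing import Iterator, Sequence, Tuple

import numpy as np


def resize_nearest(voxels: np.ndarray, new_shape: Sequence[int]) -> np.ndarray:
    """Nearest-neighbor resize of a 3D voxel array. Block IDs are copied, never
    interpolated, so no invalid in-between block types appear."""
    index = [np.minimum((np.arange(new) * old / new).astype(int), old - 1)
             for old, new in zip(voxels.shape, new_shape)]
    return voxels[np.ix_(*index)]


def fit_to_shape(voxels: np.ndarray, target: Sequence[int]) -> np.ndarray:
    """Center the array in a fixed-size volume, padding with air (0) or cropping."""
    out = np.zeros(tuple(target), dtype=voxels.dtype)
    src, dst = [], []
    for have, want in zip(voxels.shape, target):
        n = min(have, want)
        src.append(slice((have - n) // 2, (have - n) // 2 + n))
        dst.append(slice((want - n) // 2, (want - n) // 2 + n))
    out[tuple(dst)] = voxels[tuple(src)]
    return out


def scaled_variants(voxels: np.ndarray, target: Sequence[int],
                    factors: Tuple[float, ...] = (0.9, 1.0, 1.1)) -> Iterator[np.ndarray]:
    """Augmentation: slightly smaller/larger copies, all with the same final shape."""
    for f in factors:
        shape = tuple(max(1, round(s * f)) for s in voxels.shape)
        yield fit_to_shape(resize_nearest(voxels, shape), target)
```

Centering on every axis leaves small prefabs floating mid-volume; anchoring the vertical axis at 0 instead would keep every base on the floor, which may matter given the orientation problems mentioned earlier.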
BEHOLD! THE FUTURE IS HERE!
That's an in-game screenshot of the first prefab generated by the current code, and it's OK to be confused. Let's be honest, that's not going to help you survive a horde anytime soon. What this really means is:
- 7D2D prefabs are a collection of proprietary binary and text file formats, and now the code can generate these files, and, more importantly, the game can ACTUALLY READ THEM. (Shoutout to hal9000 and pille over at https://forums.7daystodie.com/forum/-7-days-to-die-pc/game-modification/prefabs)
- The ripped TensorFlow GAN tutorial code should be able to generate something interesting once it has actual training data. To create this prefab, a script generated thousands of images of random blue squares, and these images were used as the initial training data.
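A script like the one described can be tiny. This sketch assumes 28x28 RGB images (the size of the MNIST digits that TensorFlow's DCGAN tutorial trains on) with one solid blue square at a random position and size; the sizes, seed, and function names are arbitrary choices, since the original script wasn't published.

```python
import numpy as np


def random_blue_square(rng: np.random.Generator, size: int = 28,
                       min_side: int = 4) -> np.ndarray:
    """One RGB image (uint8) with a single solid blue square on black."""
    image = np.zeros((size, size, 3), dtype=np.uint8)
    side = int(rng.integers(min_side, size // 2 + 1))  # upper bound is exclusive
    x, y = rng.integers(0, size - side + 1, size=2)  # top-left corner
    image[y:y + side, x:x + side, 2] = 255  # channel 2 = blue
    return image


def make_dataset(n: int = 5000, size: int = 28, seed: int = 0) -> np.ndarray:
    """Stack `n` random blue-square images into one (n, size, size, 3) array."""
    rng = np.random.default_rng(seed)
    return np.stack([random_blue_square(rng, size) for _ in range(n)])
```

If the training code keeps the tutorial's normalization to [-1, 1], the images would be scaled with `(images.astype("float32") - 127.5) / 127.5` before training.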
The next goalpost is to convert all the available game prefabs to training-data images like the one above, retrain, and see what we get. Most of the challenges with this project still remain, and, to be honest, I'm sure this project won't generate anything that even resembles a finished product for a long time, and that's not TensorFlow's fault. However, I am planning on taking whatever reasonable action I can to make this a reality, which probably just means emailing someone who actually knows how it works. If that's all you have to do, why the heck not?
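Converting a prefab to a training-data image could look something like this: each horizontal layer of the voxel array becomes one tile in a 2D grid, and the inverse function turns a generated image back into voxels. It's a sketch under assumptions (voxels indexed `[x, y, z]` with y up, block IDs stored directly as pixel values, tiles laid out row by row), not necessarily the layout the project uses.

```python
import math

import numpy as np


def voxels_to_tiles(voxels: np.ndarray, cols: int = 0) -> np.ndarray:
    """Lay each horizontal layer (axis 1, assumed 'up') side by side in a 2D image."""
    x_size, height, z_size = voxels.shape
    cols = cols or math.ceil(math.sqrt(height))  # default: roughly square grid
    rows = math.ceil(height / cols)
    image = np.zeros((rows * x_size, cols * z_size), dtype=voxels.dtype)
    for layer in range(height):
        r, c = divmod(layer, cols)
        image[r * x_size:(r + 1) * x_size, c * z_size:(c + 1) * z_size] = voxels[:, layer, :]
    return image


def tiles_to_voxels(image: np.ndarray, shape, cols: int = 0) -> np.ndarray:
    """Inverse of voxels_to_tiles: cut the tiles back out into layers."""
    x_size, height, z_size = shape
    cols = cols or math.ceil(math.sqrt(height))
    voxels = np.zeros(shape, dtype=image.dtype)
    for layer in range(height):
        r, c = divmod(layer, cols)
        voxels[:, layer, :] = image[r * x_size:(r + 1) * x_size, c * z_size:(c + 1) * z_size]
    return voxels
```

Vertically adjacent layers end up in different tiles, sometimes far apart in the image, which is exactly why a 2D network has no way to know they belong together.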