I took the prompt of “Stone-Age Home Entertainment” and attempted to get Bing’s AI (powered by DALL-E 3) to generate the image. The scene I wanted to see was this: a couple of Neanderthals are watching a big stone television perched on the tusks of an unamused woolly mammoth. I avoided specifics in my description because I’ve used this generator before and I know that it gets finicky.
Each prompt generates four images, which I’ll include because they’re fun. The first prompt I gave the AI:
Two neanderthals watch a stone-age tv perched on the tusks of a wooly mammoth. The wooly mammoth is not amused.
So the idea of having the TV on the tusks was just being missed entirely. I assumed it was because of my word choice: “perched” is a weirdly specific word that apparently brought birds to mind. I also wanted to make the results more cartoony and less realistic, so I added my favorite Neanderthal cartoonist to the prompt.
In the style of a farside cartoon. Two neanderthals watch a stone-age tv. The TV is sitting on the tusks of a wooly mammoth. The wooly mammoth is not amused.
What the fuck? This is clearly a form of high art that I’m just too dumb to understand.
Stylistically closer to what I wanted, but definitely not The Far Side. Though I should probably be glad they aren’t stealing from Gary Larson. The TV is still not on the tusks. I have at this point fully given up on the idea that it would somehow show the mammoth’s face through the frame. I decided to add a perspective to the prompt to try to get that over-the-shoulder shot I wanted from the beginning. I also changed “not amused” to “annoyed” to try to get a more intense expression on the mammoth.
In the style of a farside cartoon. A stone-age television is being held up by the tusks of a wooly mammoth. The wooly mammoth is annoyed. The camera is looking at this over the shoulders of two neanderthals.
Okay, well, that thing is kind of a mammoth, and it is holding the TV. The over-the-shoulder shot was surprisingly successful, although throwing the word “camera” into the mix added some freaky hallucinations in the first image.
Overall, I’ve found in the past that Bing (or DALL-E) does best with open-ended prompts that don’t get too specific. If you come to the generator with a specific image in mind, you’re going to be disappointed. As a little experiment, I just fed it “Stone-Age Home Entertainment” and got this frustratingly nice image:
Clearly this technology has a ways to go, and I have to work on the prompts I feed it. Too often a bit of language I used (“perched,” “camera”) became an unintended element of the cartoon. However, as a way of quickly iterating on ideas and generating inspiration, it’s a very useful tool.
Also, I know Midjourney and some of the more powerful paid image generators can create some pretty impressive stuff. I can imagine those tools would be incredibly useful for generating textures and images for designers to manipulate further. I just wish corporations weren’t jumping at the chance to replace real people with AI.