A tour of generative AI tools
This is starting to get interesting
We have entered a new hype cycle phase for AI (artificial intelligence), one that has perhaps already exceeded past waves of interest in such technology. So what is different about these new apps and tools?
The technology behind these new generative AI tools is extremely complex, but it’s worth trying to gain a high-level understanding of what is going on here. For that I would recommend reading a walk-through like this one on the Wolfram web site. It talks about neural nets and how these are created and used with chat bots. For AI images, this post looks at the technology used in the Stable Diffusion image generator.
While the technical details are interesting, the questions that many people have center around what these tools can be used for, the potential disruptive effects of their use, and the ethical implications of these new technologies.
These tools can do a wide array of tasks, but many of the currently popular tools fall into four broad categories: text chat (LLMs), image generation, coding assistance (like GitHub Copilot), and writing help (like the AI creation tools in Notion or Office 365). Let’s take a deeper look at the first two of these now.
Text Chat Apps
OpenAI chat bots
Many of the recent headlines around AI chat have been garnered by tools created by the startup OpenAI. These include ChatGPT and GPT-4, along with OpenAI-powered bots like Dragonfly and Sage that appear in third-party apps. In addition, technologies from OpenAI are being used in products from other companies—including the new Bing Chat from Microsoft, which we will cover later.
The capability of tools like ChatGPT can be quite impressive. While ChatGPT warns that it can make mistakes and that it doesn’t know about events after 2021, it works pretty well for making up stories and other creative tasks, and can really excel at technical challenges like coming up with web layouts or database queries. Its answers are sometimes incorrect, but often they just need a quick edit or two to work.
ChatGPT can currently be tried out on the OpenAI web site as a “research preview,” which lets the user have open-ended chats that are then saved in the left sidebar. OpenAI has also added a “Plus” tier where the user can pay for upgraded features. Other ways to use ChatGPT include some third-party apps (like Poe, which we will cover later on) that allow users to interact with ChatGPT and the newer GPT-4 easily from their phone.
Microsoft is an investor in OpenAI and has been working on integrating their technology into Bing’s search offerings. A big part of that so far has been adding an AI chat mode to Bing. So now, for certain invited users of Microsoft’s Edge browser, Bing has a “CHAT” tab at the top of the page—while the typical web search remains the default under the “SEARCH” tab. (For invited users the chat feature can also be accessed in the Bing app. 🍎)
Bing Chat comes with the ability to answer technical questions and generate creative text content (just like ChatGPT, which makes sense). But another feature Bing offers is in line with its mission as a search tool—recommendations. So when I have asked Bing for cool coffee shops in Santa Fe or good places to eat local food in Albuquerque, Bing has scoured the web (or at least its own index of the web) for answers and brought back a list of possibilities, along with links to web pages.
Bard from Google
Google’s AI work has long fueled speculation and discussion: last year a story broke about engineer Blake Lemoine suggesting that Google’s LaMDA model was conscious (he appears to hold a minority view on the issue, for now). Although LaMDA has been talked about for a while, for a long time the only major public preview of its abilities was the Google AI Test Kitchen app, which offered a few chat modes that were all highly constrained in subject matter and capability. That has all changed with the (currently invite-required) release of Bard.
While noting right in the logo that it is an experiment and announcing prominently at the bottom that it may offer some “inaccurate or offensive information,” Bard is the most general purpose chat bot yet offered to the public by Google. But where do its capabilities rank? It is still early but it seems to deliver better in some areas than others. Its answers to front-end code questions have so far been a little lacking in quality; it does better with creative writing prompts but seems inclined to discursive tangents. So far, I have found Bard to be capable of the kind of recommendations that Bing Chat can do, like “good coffee shops in Flagstaff,” but unlike Bing Chat it currently does not include any links. One currently unique feature is that it will share multiple drafts of the same answer.
Claude from Anthropic
Another entry in the chat bot race, Claude (and related chat bots like Claude+) comes from a company called Anthropic, which has put its product out in another limited, invite-only preview. It differs from the others in some ways—for one thing, Claude+ is a bit of a scold. Want information about cannabinoids? NO. Ask a lighthearted question about good pickup lines for someone with the astrological sign Aries? It will simply recommend against pickup lines (a good position in some contexts, probably, but a Puritanical and rigid one in others).
However, Claude+ can be informative in the way Bing and Bard are when asked practical questions like where to fly a drone or get a cup of coffee. For creative writing tasks, Claude+ shows a strange mix of surprising creativity and occasional lost context, and the results read pretty well overall as first drafts.
One app offering access to Claude+ (limited; the paid version offers more), along with ChatGPT, GPT-4, and a few others, is the already-mentioned Poe 🍎. It currently offers these chat bots both in its app and on the web.
Image Generation Tools
DALL-E Mini and DALL-E 2
The image generator that probably started the recent wave of interest was DALL-E Mini, a version of the DALL-E model from OpenAI that became available to the public. So how does someone use DALL-E Mini? Type in a text description (a “prompt”) and you get back an image (or several) that, with varying degrees of quality and accuracy, portrays what you described.
As an early entrant to this space, DALL-E Mini often creates distorted and messy images. It’s hard to predict: some prompts return fairly coherent pictures, while others go way off. Impressive as it was on its own, DALL-E Mini has largely been superseded by other models, both from within OpenAI (like DALL-E 2) and outside (like Stable Diffusion). However, DALL-E Mini remains freely accessible (at sites like Craiyon) and easy to use, and people continue to call on it to generate images.
Now available to try from OpenAI (which allows a certain number of free images per month for invited users, after which a premium account is required), DALL-E 2 can both create original images from prompts, and generate alternate versions of images that are uploaded to it. Much improved from its predecessor, DALL-E 2 can often create pretty good birds, dogs, and other animals; for people it does better in art styles than in realism; it can create sparse images but often the output has a sketchy, drawn quality that can leave it looking busy but incomplete.
Prompts for abstract ideas (e.g. eyeball planet) output images that can be less detailed than the same prompt when given to Midjourney. Overall DALL-E 2 is best when working with its strengths—which include prompts that require understanding art style requests, making a quick simple sketch of a visual idea, or cartoon images that evoke a colorful 90s aesthetic.
Bing Image Creator
Microsoft has been quickly adding AI capabilities to their offerings, and one of the latest examples is the new Bing Image Creator preview, rolling out now with limited availability.
Another example of Microsoft partnering with OpenAI, the Bing Image Creator notes right at the top of the page that it is “powered by DALL-E.” So while there could well be some changes to the model, this tool will likely output images that are similar to images from DALL-E 2. However from a small sample set of generated images, the Bing Image Creator seems to use a more detailed, painted kind of style than DALL-E 2, while showing similar capabilities overall.
Midjourney
A very powerful image generation tool, Midjourney has a lush, detailed, painted style that can make many types of images, from simple logos all the way to complex simulated photographs. It can weave complex images with ambitious levels of detail and realism—but in all that detail there are often flaws. To make the image that they are after, users can type in a prompt, pick the best image out of the first four, then easily iterate on that one with the “Make Variations” tool, along with a few different upscaling options that often smooth out some details.
Midjourney can be used to create fantastic images—or to modify existing ones. In addition to being able to generate based on a text prompt, Midjourney can also generate a new image based on an existing image plus a text prompt. So you could, for example, upload a picture of a parrot and then prompt “as a dragon” and you just might end up with an image of a dragon-bird. Or upload an image of an existing building and prompt “as a log cabin” and see what that might look like.
Midjourney allows a brief free trial and is then available as a paid subscription with a few tiers. The interface for Midjourney is actually a Discord server, where a bot generates and returns images when users type the “/imagine” command in certain channels. New users can generate a few free images in one of the newcomer rooms, but to keep generating they must sign up on the Midjourney web site before returning to Discord and creating in one of the “General Image Gen” or other rooms.
Stable Diffusion
Based on code that Stability AI has released to the public, Stable Diffusion is another powerful image generator that can output realistic images in vivid detail. As with other current generators, certain details like the number of digits or legs on a person can sometimes be a bit off—and as in Midjourney, several rounds of variations may be needed to get close to the desired image.
Like other tools, Stable Diffusion can create images from text prompts or from text plus image prompts. Its approach is a bit different, though: the user sets what percentage the uploaded image should weigh in the new image. Selecting 100% will basically just copy the image, while 0% will ignore the image and generate from the text prompt alone.
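One way to picture how that image-weight slider could work: in typical diffusion image-to-image pipelines, a “strength” value controls how much of the denoising process is applied on top of the uploaded image. The sketch below shows one plausible mapping in Python; the function names and the simple linear mapping are illustrative assumptions for explanation, not Stable Diffusion’s actual implementation.

```python
def weight_to_strength(image_weight_pct: float) -> float:
    """Map an image-weight slider (0-100) to a denoising 'strength' (0.0-1.0).

    Illustrative assumption: 100% weight preserves the source image
    (no denoising), while 0% ignores it (full denoising, prompt only).
    """
    if not 0 <= image_weight_pct <= 100:
        raise ValueError("image weight must be between 0 and 100")
    return 1.0 - image_weight_pct / 100.0


def denoising_steps(image_weight_pct: float, total_steps: int = 50) -> int:
    """Number of denoising steps an img2img pipeline would actually run,
    assuming it runs strength * total_steps steps (a common convention)."""
    return round(weight_to_strength(image_weight_pct) * total_steps)


# 100% weight: the image is essentially copied; 0%: text-only generation.
print(denoising_steps(100))  # 0
print(denoising_steps(0))    # 50
print(denoising_steps(50))   # 25
```

Under this model, the slider is really choosing how far into the noise-and-denoise cycle the source image is pushed before the prompt takes over.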
Giving each of these image generators effective prompts seems to come with a bit of a learning curve, and I will say that Stable Diffusion seems just a little more unpredictable in its output than Midjourney—sometimes smoothly detailed, sometimes chaotic. The way it interprets the grammar of prompts seems different—it seems just as likely to mash two things up (very imaginatively at times) as to depict them next to one another. But this might be something I can get better at as my “prompt engineering” for Stable Diffusion improves with experience.
Stable Diffusion can be tried out online at their Dream Studio website (which gives out a limited number of free “credits” for image generation to new users) and on open source AI sites like Hugging Face.
Disruption and Ethics
There is no definitive way to tell that a snippet of text was generated by an AI like ChatGPT rather than a human writer. A teacher using the unethical tool Turnitin has already wrongly flagged a student for using ChatGPT on an essay she had actually written herself—and this kind of situation will become more and more common as educators continue to rely on outdated methodologies while the world changes around them.
The image generators add various identifying marks to their output: images generated with DALL-E 2 have a multi-colored logo in the bottom right; Bing images have a “b” in the bottom left; generated images are also likely to carry additional hidden metadata describing things like which version of which algorithm generated the image.
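That kind of hidden metadata can be inspected programmatically. As a sketch, here is how text metadata can be written into a PNG and read back with the Pillow library; the `generator` key and its value are made-up stand-ins for whatever fields a given tool actually embeds, which vary by service.

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

# Write a PNG carrying a text chunk (a stand-in for generator-added metadata)
meta = PngInfo()
meta.add_text("generator", "example-model-v1")  # hypothetical key and value
img = Image.new("RGB", (8, 8), "white")
img.save("tagged.png", pnginfo=meta)

# Read the file back; PNG text chunks appear in the image's .info dict
reloaded = Image.open("tagged.png")
print(reloaded.info.get("generator"))  # example-model-v1
```

Note that this metadata is fragile: re-saving, screenshotting, or stripping the file removes it, which is one reason it cannot serve as proof of AI origin.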
New technology trends are often met with misinformed resistance and generative AI is no exception. Copyright fans are having an especially difficult time—these image models may use copyrighted artwork as training data (an influence) but they do not usually output images that violate copyright under current standards. (If your art evokes Chuck Close, and you have been to the Pace Gallery and seen his paintings—so what?!) Some corporations that are heavily invested in commercializing art like Getty Images have launched ambitious lawsuits attempting to re-define the meaning of copyright. The question is, will this lead to another land-grab by the “intellectual property” lobby which is comprised mostly of large media corporations, law firms, art galleries, and publishing houses?
As far as the chat bots, a moral panic seems to be swelling in some areas, with a dubious group that includes Elon Musk having recently signed a letter demanding a “pause” on AI development. The Biden administration’s Commerce Department has recently raised the terrifying prospect of over-regulating the industry with a bureaucratic “certification process.” Such thinking is not only stupid but ultimately futile, likely at best to push cutting-edge research outside the US. And stopping misinformation is a separate issue; if politicians were serious about that, there would have been a crackdown on Facebook by now.
Chat bots and image generators reflect their training data, and this data includes biases and misconceptions present in the world. Obviously training data needs to get better, as do the algorithms. But seeing these tools as inherently problematic because they reflect issues in society is simplistic. These tools could be used for a wide range of creative purposes, and seeing them as inherently without value is dumb reactionary Luddite nonsense.
Many of the people warning about AI are the people who sit on economic bottlenecks that don’t serve the overall system. Google has seen an enormous industry rise around its search capability—SEO and ads services have become key selling points for many web hosting companies, since being found on the web has long revolved around Google search and ads visibility. But if new chat-based search tools start to displace traditional web search, much of the specific expertise that people have built up around Google products and services would suddenly become much less useful in the marketplace. Fear is growing in some quarters of an AdWords Rust Belt emerging as Microsoft and some startups end Google’s decades of dominance.
Many visual artists on the web were still celebrating their rejection of NFTs when suddenly they felt a threat from AI image generators displacing them. Instead of commissioning them to create an original artwork, people might just go to an image generation app. Work becomes scarce, they argue, as the machines do all the creating.
But I don’t think those scenarios are likely to play out that way. Google has stubbornly held on to a large piece of the search market for a long time: no one seems to think social media will threaten its dominance now, but there was a time of panic about that possibility inside Google—remember Google+?
As for artists, the story they tell about less work sounds a lot like the (false) story media giants told about piracy—as it turned out, piracy often increased interest in the work, and sales went up. The same type of effect could happen here, with more people becoming interested in art as it becomes more accessible.
The business uses for these tools will probably be very specific, at least at first: they can gather together info, throw together drafts, offer ideas—but they are still too unpredictable and error-prone to offer the final word on many things. Most likely businesses will adapt to them: AIs will check AIs, with occasional human audit; conventions around “human readability” of data may change as the AI is increasingly trusted to parse the data—what will the charts and data visualizations look like when they are created by AI? What might code projects look like as the human-readable style of breaking projects into many files and relying on compilers and build steps could give way to real-time translation back and forth from machine code?
So much data is already being created daily that it’s worth asking whether these AIs can be used to infer meaning from this enormous storehouse of information. Share a paragraph from literature with ChatGPT, for example, and ask a comprehension question, and you will often get an astute answer. Maybe the job of reader is just as important as the job of writer where AI is concerned, as these tools could help people get better insights into their lives and the world around them.
This is just a peek at some of the prominent projects in the generative AI space right now, but there are many more. Aside from Microsoft (MSFT) and Google (GOOGL) being very active in this space, other tech giants are conducting their own extensive research on AI and several open source chat bots are being worked on right now.
I recommend that anyone who is interested in these technologies try some of them out. I personally attempt to log my interactions with these AIs by downloading many of the images I generate and capturing most of my chat conversations in my notes app (some of these services save your data on their servers, but many don’t). Going forward I might share some stories about those interactions—a chat bot being able to do a certain task and then suddenly refusing to, a random glitchy and incorrect answer that looks like it was written offline by a simplistic algorithm, and even surprising similarities in answers to similar creative prompts posed at the same time to different chat bots.
Disclosure: Through personal holdings that I control and through investment LLCs in which I am a partner, I buy, hold, and sell stocks, bonds, ETFs, options, NFTs, and cryptocurrencies, not limited to but including some of those discussed in this newsletter.
A version of DALL-E Mini can be downloaded from GitHub and after some fairly technical installation steps on a Mac (and likely Windows etc. as well) the software is able to generate images at the command line using Python.
The standard UI for many of these image generators is to create a group of four different images, and then the user usually picks the best one out of the four.