daydreams about AI art (or: Sora radicalized me into learning animation)
Despite being an AI researcher (standard disclaimers[1]), I tend to dislike "AI art". I have little tolerance for people who prompt an image generator and call themselves artists or claim it as their own work. Studio Ghibli-style slop avatars make me cringe a little. I've spent almost ~~US$6,000~~ ~~US$7,000~~ US$8,000 commissioning artists since I started keeping track, and $0 on AI image generators.
Still, Sora's release was big enough that I felt I should try it, if only to be able to talk about it over lunch.
...It was a bit of an excuse. There were scenes I wanted to generate, not to post or claim as my own, but just to look at. I sympathize with people using AI image generators insofar as I feel like sometimes you just have a fleeting fancy and want to look at some image you're imagining a little bit – not enough to commission an artist, nor to scrounge up a few extra months or years with which to learn to draw. Life is short. All else equal, it would be nice for you to get to look at the image! Of course, all else is not equal; there's no way to offer the technology only to people like you, and not to people who want it for their business and might otherwise have hired an artist, much less people who want to make deepfake porn or erode society's sense of shared reality.
Alas, Sora disappointed me. I like colorful glowy things and intricate patterns of motion, among other things, so I asked for some videos of juggling. Several prompts later, I gave up. The balls kept appearing in the juggler's hands out of nowhere and arcing offscreen, never to return, all while the juggler obliviously kept juggling empty air.
Maybe the next generation of models will be better.
I'm sure different people reject AI art for different reasons, and would draw different boundaries for what counts as "AI art". For example, Ted Chiang argues that AI can't make art, starting from the definition:
art is something that results from making a lot of choices.
When drawing a picture, an artist makes choices about composition, perspective, color, texture, shading, and so on, which adds up to more art, by this metric, than typing a few words into an image or video generator. But Chiang says he would be willing to call somebody an artist if, for example, they used tens of thousands of words to finely control a generator. He approvingly cites an art exhibit that features only 20 of the more than 100,000 images the artist, Bennett Miller, generated with DALL·E 2. So if choices in the text prompt count as much as choices in the actual graphical medium, perhaps it's helpful to view the user who prompts Sora with a short sentence as making, say, 0.01% art – like drawing the first paintbrush stroke of the 10,000 that would make up an artist's picture.
The sad thing about Sora and all the most well-known AI image generators, then, is that they're not designed to let you accumulate your 0.01%s of choices into a 100% piece. On Sora, each prompt is a brand new video. If you don't like one aspect of a video, all you can do is tweak your prompt and pray for the regenerated clip to miraculously keep all the aspects you did like. In theory, if you're talking to Gemini or ChatGPT and have had it generate an image for you, you can have it revise the image through followup conversation, but the connection between your words and the actual revision is still brittle. While trying out their functionalities for this post, I asked ChatGPT to make my dragon's eyes green instead of yellow, and in the same turn that it did so, it turned a delightful glance into a wide-eyed stare.
To learn more, though, I sought out some articles and videos from people deeper into the AI art space. Defenders of AI art emphasized the human effort that goes into a good generation – finding good input images, tweaking parameters and seeds, and combing through the results until the right output image pops out. In one tutorial, the instructor interactively highlighted small regions of a photograph in Photoshop to regenerate them, then explained how he evaluated the three candidate generations Photoshop gave him to choose the best one. Another YouTuber used a tool he built himself to revise and compose Midjourney generations, copying cropped pieces of his generated image between the tools; at one point he opened ChatGPT to generate a specific design that he copied back into Midjourney to have it transfer the insignia into a part of the image.
A bunch of things jumped out at me. Firstly, these people were iterating on their image and making choices! Fewer than an artist drawing an image from scratch makes, surely, but more than the one-sentence prompter we started out considering. Might even a skeptic be willing to grant that they're making 1% art? 10%?
Secondly, the choices they had to make benefited from the same skills and experience a traditional (non-AI) artist might have! I'm not sure whether they did a good job, but they all had some idea for what they wanted out of each image that they were iterating towards, and they had to notice undesirable aspects of color, texture, shading, and all the rest to judge which parts to regenerate and when.
While watching these people work, I would occasionally think to myself, the thing you're doing would be a lot easier if you actually drew on or manipulated the image directly. The Photoshopper did that at times, erasing part of the layer mask on a patch of generated pixels and going out of his way to disable the generative AI on the erase tool for tackling a specific region. The Midjourney guy, I don't know whether he had simplified his workflow in the video to demonstrate his tools, but I did feel like he was sometimes spending more effort repeatedly regenerating some areas than he would have needed to paint them with a normal paintbrush in a normal graphics editor.
But ultimately, none of this would lead to me feeling comfortable using these tools and sharing the resulting art, or recommending that my artist friends try them, because I can anticipate their glaring objection: it hardly matters how you interact with the image generator when it's still trained on stolen art, no? Even Adobe's image generation model Firefly, which Adobe says is only trained on content it has permission to use, and whose contributors it has apparently compensated, got plenty of heat over how contributors couldn't opt out or negotiate the terms. To which the AI defender would respond, well, a model being trained on an image is just like an artist looking at another artist's art for inspiration; the artist isn't stealing, so the model isn't either.
Unfortunately I must cop out of clearly taking one side of this argument. Without getting too technical, I hope you understand that training an image generation model on an image does not store the image in a database that the model can refer to later and copy out; rather, it perturbs the billions of numbers in the model in some inscrutable way that inches it closer to associating the image with its label. Strangely, after billions of these perturbations, the model can generate apparently novel images and videos at least some of the time. But some other times, models regurgitate images[2], not verbatim but still superhumanly precisely; so training them is clearly not analogous to taking inspiration either. I could speculate all day about what's going on in these models and how it can or can't be analogized to an artist's process, and about whether regurgitation is a fixable bug or a fundamental part of how the models work; but nobody really knows[3], and in any case, what counts as stealing isn't a purely technical question with an objective answer.[4]
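If it helps, here is a toy sketch of the "perturbs billions of numbers" picture – purely illustrative, nothing like how any real image model is actually built or trained, and every name in it is made up. The "model" is just a short list of numbers, and one training step nudges each number a tiny amount in the direction that makes the model associate more strongly with the training example. The example itself is never stored anywhere; it only influences the direction of the nudge.

```python
import random

def score(weights, example):
    """How strongly this toy 'model' associates with an example
    (a simple dot product standing in for something far messier)."""
    return sum(w * x for w, x in zip(weights, example))

def training_step(weights, example, lr=0.01):
    """One toy training step: perturb every weight slightly in the
    direction that increases score(weights, example). For a dot
    product, that direction is just the example's own values."""
    return [w + lr * x for w, x in zip(weights, example)]

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(8)]  # "billions", in spirit
example = [random.uniform(-1, 1) for _ in range(8)]  # one training image

before = score(weights, example)
weights = training_step(weights, example)
after = score(weights, example)
print(after > before)  # → True: each step inches the association upward
```

After billions of such nudges, from billions of examples, nobody can point at any one number and say which image put it there – which is exactly why both the "it's a copy" and the "it's just inspiration" analogies feel unsatisfying.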
(There's another, independent debate about the environmental harms, for which I'd implore you to read Andy Masley's explanation of why using ChatGPT isn't bad for the environment in any meaningful way – with the caveat that video generation might in fact be bad, and for whatever it's worth, I plan to donate a good chunk to environmental charities this year for all the videos I generated in spite of myself.)
Still, abstaining is clearly ethically safer, so I don't think I'll be regularly using AI-generated images anytime soon. And yet, I like to daydream. I believe that in principle AI could help us make art rather than displace it. If there were an AI tool that I felt comfortable using to draw – if, somehow, there were teams as well-resourced as the ones behind any frontier generative AI model that dedicated themselves to producing an ethical, creatively augmentative model instead – what might it look like?
- What if it could autocomplete my paint strokes the way an LLM-integrated code editor might autocomplete my typing? What if, as I start drawing a repetitive pattern or hatching a region, it suggests how to continue the rest, using the same paintbrush I'm using, to produce paint strokes that I can individually accept or reject?
- What if it could criticize my art as I drew it? What if it could compare my drawing and a reference image and tell me things like, I drew the eyes too low or too close together – opinions that I can contemplate and learn from to improve my craft, or that I can nevertheless reject, either because it's wrong or because I'm deliberately breaking the conventions for some creative goal?
- What if it could do first drafts of coloring in and block-shading my line art? I would still be in charge of picking the colors and light source and specifying roughly what goes where, but what if the AI could take a stab at filling in all the regions just based on that? If it gets a few regions wrong I can re-fill or re-shade just those ones (and if it gets a lot of regions wrong, I can undo its work and fill everything exactly how I would before).
- What if it leaned into its memorization abilities, and helped me find properly attributed reference images based on my work-in-progress sketch that I might be able to combine?
I'm drawing inspiration from coding, where I do find LLMs helpful, because code is code whether it's written by the LLM or by me; I can revise and debug and test it in the same ways. Even when coding, I think it's valid to be concerned that an LLM will unknowingly plagiarize in its generations, but I don't worry about it in my day-to-day usage because I'm asking the LLM to revise parts of a codebase that had never existed before. I don't think it has enough degrees of freedom to meaningfully plagiarize anything. Likewise with writing – I can ask Claude to research a topic and then incorporate its sources, or use it to brainstorm and workshop different ways to phrase a tricky sentence; a link is a link, a word is a word. I have to confess: that Ted Chiang article I referenced earlier? Claude found it for me, in a vague back-and-forth about AI art critics, but I ended up taking nothing from the conversation other than the link. All the commentary (and the flaws within) is mine alone. So too did Claude help me review the literature on interpreting diffusion models; although I thought its own summary was flawed, it definitely did a better job finding papers than I did (I tried).
I can't use an image generator as a subroutine or building block in the same way, because it generates images differently than I do – synthesizing pixels wholesale, with composition/structure/color/shading all rolled up into an inseparable mass. If it generates some eyes too high, I can't just lasso select and move them without massive pains to reconstruct the surroundings. If it colors the eyes wrong, I can't just paint bucket in a different color.
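For contrast, the paint bucket I'm wishing for is a trivially local operation when you work on pixels directly: a flood fill recolors one connected region and touches nothing else. A minimal sketch, with a hypothetical grid of color codes standing in for an image:

```python
from collections import deque

def paint_bucket(image, x, y, new_color):
    """Recolor the connected region containing (x, y), leaving the
    rest of the image untouched -- the kind of local, reversible
    edit that is easy on raw pixels but hard to ask of a generator."""
    old = image[y][x]
    if old == new_color:
        return image
    queue = deque([(x, y)])
    while queue:
        cx, cy = queue.popleft()
        in_bounds = 0 <= cy < len(image) and 0 <= cx < len(image[0])
        if in_bounds and image[cy][cx] == old:
            image[cy][cx] = new_color  # recolor this pixel...
            # ...and spread to its four neighbors
            queue.extend([(cx + 1, cy), (cx - 1, cy),
                          (cx, cy + 1), (cx, cy - 1)])
    return image

# Two yellow (1) eyes on a white (0) face; recolor only the left eye green (2).
face = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
paint_bucket(face, 1, 0, 2)
print(face[0][1], face[0][3])  # → 2 1 (left eye green, right eye untouched)
```

The point isn't the algorithm, which is ancient; it's that an edit like this composes with everything else in a graphics editor, while "make the eyes green" sent to a generator can silently rewrite the whole face.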
And look, maybe my daydreams are all just complaints about minor inconveniences, or about a skill issue on my part. But aren't minor inconveniences exactly the kind of thing we want AI to automate away? Maybe a more professional artist would still think my proposals are stealing in more subtle ways, or at least removing opportunities for the artist to express their style or creativity. I don't know. All I can say is, it would be nice to move towards a world where we can have this kind of argument because it's relevant to systems that actually exist.
The best reason to ignore everything I said, though, is that I live in a weird bubble. For whatever reason, I see 10× more posts denouncing any usage of AI in art or writing than actual AI-generated art or writing. I primarily learn about new AI features from infographics explaining how to disable them. Even at work, where I get access to frontier LLMs and tooling as they're built, I'm naturally excited to look for new ways to use them, but I spend just as much time if not more worrying about their shortcomings and failure modes. LLMs help me with some tasks[5] and not with others, so I would encourage others to give them a spin to get a sense of which of their tasks are which. If they find any where LLMs are helpful, that's great; if not, it's nothing to write home about 🤷
Maybe the world doesnāt need more of this kind of advice? I remember reading My AI Skeptic Friends Are All Nuts and nodded along as it addressed a bunch of plausible common objections. Then I read Contra Ptacek's Terrible Article On AI, which shares anecdotes of executives who canāt fund projects that donāt (pretend to) have AI in them, and a hospital whose fruitless AI initiatives displaced mundane data improvements that might have saved lives. Elsewhere, I was forwarded a list of anecdotes about artists losing work. Every month I read a new account of some poor sysadminās struggle against LLM scrapers who ignore robots.txt.
These things suck! I'm sure I would have written a very different post if I had to deal with any of them. Maybe I should have anyway. I just reject that how much society should use AI is a one-dimensional issue where my only options for what to argue for are "more" and "less".
If Sora had made me a juggling video where the balls went in the right direction, or if whatever image generator du jour made me a good image, would I post it? Probably not. For one, I'd still have lingering concerns that the generation unwittingly ripped off a training data example, though I could imagine some back-and-forths with the AI that would convince me that's not true of the final image. More importantly, though, I would only want to post something I thought I had contributed to enough to make it mine. Otherwise it'd just be, as eevee put it, capital-W Whatever. If anybody can generate something equally good with a few sentences, what reason could I possibly have to think that they'd rather look at something I generated?
For now, then, I'll be learning to draw and animate the way I would have before generative AI arrived on the scene. It's not very good, but it's mine.

[1] I work for Anthropic. Opinions in this post are, of course, my own. Though I want to mention that our LLM Claude doesn't output images or videos.
[2] Somepalli et al. 2022; Carlini et al. 2023. The latter paper approaches regurgitation as a problem from a privacy perspective, and I would recommend reading the lead author's blog post for caveats about its implications for copyright and creative works.
Though thereās a lot of active research in this area (far more than I thought when I started wanting to write all this) ā for example, the modelās āattention patternsā might contain anomalies that can be used to detect and mitigate memorization/regurgitation (Ren et al. 2024; Chen et al. 2024)? But we're well short of a deep understanding.↩
[4] Some examples not involving AI: was the Obama "Hope" poster a fair adaptation of the source photo, or theft? What about Whaam! vis-à-vis the comic panels it was based on? Reasonable people will differ.
[5] For the reader who still believes that LLMs are never useful for coding, I might suggest Simon Willison's explanation and Nicholas Carlini's list of examples, which I think are both reasonably hype-free.
And yes, I know about the METR study that found that AI assistance slowed developers down when they worked on their own codebases. It could be that a lot of developers are actually using LLMs in ways that are detrimental to themselves. But I also think the study is entirely compatible with my view, because working on a codebase you already know well is one of the worst use cases for LLM assistance. See also METR's note on scientific communication.
(Writing in late 2025, I was starting to think this footnote might not be necessary, but then, while looking into some completely unrelated things, I found that somebody I recognized had written a blog post less than two months old calling LLMs useless for programming, so I guess it's staying.)