Have I Been Trained: The Messy Reality of AI Data and How to Get Your Art Out

You’ve seen the images. Swirling, cosmic cats. Cyberpunk cities that look suspiciously like a concept artist's portfolio from 2018. If you’ve spent any time online lately, you know that generative AI is basically a giant sponge. It soaked up the internet. But it didn't just take the Wikipedia entries and public domain books. It took your photos, your sketches, and your professional portfolio.

The site Have I Been Trained changed the conversation.

Launched by the group Spawning, it was the first real "wait a second" moment for the creative community. Before it showed up, we all just sort of assumed our data was being sucked into a black hole. Now, there’s a search bar. You type in a name or drop an image, and suddenly, you’re looking at the LAION-5B dataset. This is the massive collection of five billion image-text pairs that built Stable Diffusion.

Seeing your own face or your private artwork show up in a dataset used to train a billion-dollar AI model is... well, it's a lot. Honestly, it’s a bit of a gut punch.

Why Spawning and Have I Been Trained Actually Exist

Mat Dryhurst and Holly Herndon aren't just tech enthusiasts; they are artists who realized the "Wild West" era of AI training was inherently exploitative. They built Spawning to give creators a seat at the table. Basically, the platform acts as a bridge. It indexes these massive datasets and lets you search them.

It’s not perfect. It can't magically delete your data from a model that has already been trained.

Think of an AI model like a baked cake. You can't really pull the eggs out once it’s out of the oven. But you can tell the baker not to use your eggs in the next batch. That is the core value proposition here. By using Have I Been Trained, you can flag your work for "opt-out."

This signals to companies like Stability AI that you do not consent to your work being used in future versions of their software. It’s a consent layer that didn’t exist three years ago.

The LAION-5B Problem

What are you actually searching when you use the tool? Mostly LAION-5B. This dataset was scraped together by LAION, a German non-profit. They claimed it was for research. But then, it became the foundation for commercial tools.

The dataset contains everything. Medical records that were accidentally public. Personal family photos. Watermarked stock photography from Getty Images. It’s a mess.

When you search Have I Been Trained, you aren't searching the AI model itself. You are searching the index of what was used to feed it. If you find your work, you can create an account and claim it. It's a bit of a process, but it’s currently the only standardized way to say "no" at scale.

Does Opting Out Even Work?

This is where things get sticky. If you opt out on the site today, your image doesn't vanish from the internet. It doesn't even disappear from Stable Diffusion 1.5.

It works for the future.

Stability AI, the creators of Stable Diffusion, agreed to honor these opt-out requests for SDXL and subsequent models. This was a massive win for Spawning. It proved that a grassroots tool could actually force a multi-billion dollar tech company to change its data ingestion pipeline.

However, there’s a catch. Not every AI company cares. Midjourney hasn't signed on. OpenAI is largely a "black box" regarding what exactly went into DALL-E 3. So, while Have I Been Trained is the gold standard for transparency, it only covers the parts of the AI world that choose to be transparent.

That’s the limitation. You’re fighting a fragmented war.

The Psychology of Seeing Your Work in the Machine

I’ve talked to illustrators who found their entire life’s work in the dataset. It feels like a violation. You spend twenty years developing a style, and then you see it distilled into a mathematical weight.

Some people argue that "it’s just like a human looking at art."

Kinda. But not really. A human doesn't ingest five billion images in a weekend. A human doesn't mimic a style convincingly at the push of a button for $20 a month. Using Have I Been Trained is often the first step in the grieving process for the "old" internet. The one where your portfolio was a gallery, not a buffet for a scraper.

Beyond the Search Bar: What You Should Do Now

If you are a photographer, illustrator, or even just someone who posts a lot of selfies, you need a strategy. Relying on a single website isn't enough. The AI landscape moves too fast for one tool to be a silver bullet.

First, go to the site. Search your name. Search your handle. If you find stuff, hit the opt-out button. It takes five minutes.

But then, look into "poisoning" and "cloaking" tools.

Researchers at the University of Chicago developed Glaze and Nightshade. These are different from Have I Been Trained. While Spawning handles the legal and ethical opt-out, Glaze and Nightshade handle the technical defense.

Glaze makes subtle changes to your art—invisible to humans—that confuse an AI's perception of style. Nightshade goes further; it actually "corrupts" the training data. If an AI scrapes a "Nightshaded" image of a dog, it might start thinking a dog is a toaster.

It’s digital self-defense.
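To make the cloaking idea concrete, here's a toy sketch in Python. This is not the actual Glaze or Nightshade algorithm (those compute their perturbations adversarially against real models); it only illustrates the core constraint they work under: every pixel change is kept so small that a human can't see it. The `cloak` function and its `epsilon` bound are illustrative inventions.

```python
import numpy as np

def cloak(image: np.ndarray, epsilon: int = 2) -> np.ndarray:
    """Toy illustration of a cloaking tool's constraint.

    Real tools like Glaze compute an adversarial perturbation;
    here we just add bounded noise to show that the per-pixel
    change stays within +/- epsilon and is invisible to a viewer.
    """
    rng = np.random.default_rng(0)
    perturbation = rng.integers(-epsilon, epsilon + 1, size=image.shape)
    # Clip so values stay valid 8-bit pixels after perturbation.
    cloaked = np.clip(image.astype(int) + perturbation, 0, 255)
    return cloaked.astype(np.uint8)

# A flat gray "artwork" stands in for a real upload.
original = np.full((64, 64, 3), 128, dtype=np.uint8)
protected = cloak(original)

# Maximum per-pixel change never exceeds epsilon.
print(np.abs(protected.astype(int) - original.astype(int)).max())
```

The point of the bound: to your eye, `protected` and `original` are the same picture, but a model training on the perturbed pixels learns something subtly different from your actual style.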

Actionable Steps for Creators

  1. Register with Spawning: Create a profile on Have I Been Trained so you can bulk-claim images. This is much faster than doing it one by one.
  2. Update your robots.txt: If you host your own website, block known AI crawlers in your robots.txt and consider adding "noai" meta tags to your pages. Major scrapers like OpenAI's GPTBot now (mostly) respect these signals.
  3. Use Glaze for New Uploads: Before posting to ArtStation or Instagram, run your high-value pieces through Glaze. It adds a protective layer that makes the data less "tasty" for models.
  4. Audit Your Socials: If you have an old Flickr or a public Pinterest board, check them. These are high-priority targets for scrapers because they are easy to crawl.
  5. Watch the Copyright Office: The US Copyright Office is constantly taking comments on AI. Stay informed. The tech is ahead of the law, but the law is finally starting to lace up its shoes.
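For step 2, a minimal robots.txt might look like the sketch below. The user-agent tokens shown are ones the respective companies document publicly (OpenAI's GPTBot, Common Crawl's CCBot, Google's Google-Extended); note that compliance is voluntary on the crawler's side, so this is a polite fence, not a wall.

```txt
# robots.txt — block documented AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Place the file at the root of your domain (e.g., yoursite.com/robots.txt) so crawlers find it before fetching anything else.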

The reality is that Have I Been Trained is a lighthouse. It doesn't stop the tide, but it helps you see where the rocks are. We are moving toward an era of "informed consent" in data. It’s a slow, annoying, and often frustrating transition. But by using these tools, you're at least making it harder for your work to be taken without a fight.

Check your data. Claim your work. It’s your digital footprint, after all. You should be the one deciding where it leads.