The era of the AI-generated internet is already here

This isn’t a conspiracy theory or future prophecy. A web dominated by AI-generated content is already taking shape, and it doesn’t look good.

Ever since ChatGPT hit the market, AI-generated content has been steadily seeping into the web. Artificial intelligence has been around for decades, but the consumer-facing ChatGPT has pushed AI into the mainstream, making advanced AI models more accessible than ever and creating demand that businesses are eager to capitalize on.

As a result, companies and users alike are leveraging generative AI to crank out high volumes of content. While the initial concern is the abundance of content containing inaccuracies, gibberish, and misinformation, the long-term effect is complete degradation of web content into useless garbage.

Garbage in, garbage out

If you’re thinking that the web already contains a bunch of useless garbage, that’s true, but this is different. “There’s a lot of garbage out there… but it has an insane amount of variety and diversity,” said Nader Henein, a VP analyst for management consulting firm Gartner. As LLMs feed off each other’s content, the quality gets worse and more vague, like a photocopy of a photocopy of an image.

Think about it this way: the first version of ChatGPT was the last model to be trained on entirely human-generated content. Every model since then has training data that includes AI-generated content, which is difficult to verify or even track. That data becomes unreliable or, to put it bluntly, garbage. When this happens, “we lose quality and precision of the content, and we lose diversity,” said Henein, who researches data protection and artificial intelligence. “Everything starts looking like the same thing.”

“Incestuous learning” is what Henein calls it. “LLMs are just one big family, they’re just consuming each other’s content and cross pollinating, and with every generation you have… increasingly more garbage to the point where the garbage overtakes the good content and things start to deteriorate from there.”

As more AI-generated content is pushed out to the web, and that content is generated by LLMs trained on AI-generated content, we’re looking at a future web that is entirely homogenous and totally unreliable. Also, just really boring.
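
The photocopy-of-a-photocopy effect can be illustrated with a toy simulation. This is only a rough sketch under loud assumptions, not how any real LLM is trained: each "model" here simply resamples from the previous generation's output, and the function name, vocabulary size, and corpus size are all made up for illustration. Because a resampled corpus can never contain an item the previous one lacked, the number of distinct items can only stay flat or shrink from one generation to the next.

```python
import random


def simulate_generations(vocabulary_size=1000, corpus_size=1000,
                         generations=10, seed=0):
    """Toy model of "incestuous learning" (hypothetical, for illustration only).

    Returns the number of distinct items in the corpus at each generation,
    starting with the original "human" corpus at index 0.
    """
    rng = random.Random(seed)

    # Generation 0: a "human-written" corpus drawn from the full vocabulary.
    corpus = [rng.randrange(vocabulary_size) for _ in range(corpus_size)]
    diversity = [len(set(corpus))]

    for _ in range(generations):
        # Assumption: the next "model" can only reproduce what it saw,
        # so its output is a resample (with replacement) of the last corpus.
        corpus = rng.choices(corpus, k=corpus_size)
        diversity.append(len(set(corpus)))

    return diversity


if __name__ == "__main__":
    for generation, distinct in enumerate(simulate_generations()):
        print(f"generation {generation}: {distinct} distinct items")
```

Running it prints one line per generation, with the count of distinct items falling as each generation copies the last. Real training pipelines mix in fresh human data and filter their inputs, so treat this as an intuition pump rather than a prediction.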

Model collapse, web collapse

Most people already sense something is off.

https://twitter.com/itsandrewgao/status/1689634145717379074?s=46&t=CvVXD3mx2FQQ1P-fkwstYA

Even Google searches now sometimes surface AI-generated likenesses of celebrities instead of things like press photos or movie stills. When you Google Israel Kamakawiwo’ole, the deceased musician known for his ukulele cover of “Somewhere Over the Rainbow,” the top result is an AI-generated prediction of how Kamakawiwo’ole would have looked if he were alive today.

[Image: Google Image searches of Keira Knightley result in warped renderings uploaded by users on OpenArt, Playground AI, and Dopamine Girl alongside real photos of the actress. Caption: “Keira doesn’t deserve this.” Credit: Mashable]

That’s not to mention the recent pornographic deepfakes of Taylor Swift, an Instagram ad using Tom Hanks’s likeness to sell a dental plan, a photo editing app using Scarlett Johansson’s face and voice without her consent, and that fire song by Drake and The Weeknd that was actually an unauthorized audio deepfake that sounded exactly like them.

If our search engine results already can’t be trusted, and the models are almost certainly feasting on this junk, we have stepped over the threshold into the web’s AI garbage era. For the moment, the web as we once knew it is still somewhat recognizable, but the warnings are no longer abstract.

The web isn’t completely doomed

Assuming products like ChatGPT don’t pull off a Hail Mary and start reliably generating vibrant, exciting content that humans actually find pleasurable or useful to consume, what happens next?

Expect communities and organizations to fight back by protecting their content from the AI models trying to hoover it up. The open, ad-supported, search-based web might be going away, but the web will evolve. Expect more reputable media sites to put their content behind paywalls, and expect more trusted information to come from subscriber newsletters.

Expect to see more copyright and licensing battles, like The New York Times’ lawsuit against Microsoft and OpenAI. Expect to see more tools like Nightshade, which makes invisible changes to copyrighted images in an attempt to corrupt models trained on them. Expect the development of sophisticated new watermarking and verification tools that prevent AI-scraping.

On the flipside, you can also expect other news publications like Associated Press — and possibly CNN, Fox, and Time — to embrace generative AI and work out licensing agreements with companies like OpenAI.

As tools like ChatGPT and Google’s SGE become substitutes for traditional search, expect revenue models built on SEO to change.

The silver lining of model collapse, however, is the loss of demand. The proliferation of generative AI is currently dictated by hype, and if models trained on low-quality content are no longer useful, the demand dries up. What (hopefully) remains are us feeble-minded humans with the unquenchable urge to rant, overshare, inform, and otherwise express ourselves online.
