
Perplexity defensive over ignoring robots.txt and stealing data
In 2024, Perplexity was discovered to be actively bypassing website blocks to scrape content, and a new report shows that it has continued to do so with increasing sophistication while the company defends the practice.
Perplexity's logo surrounded by lights and flowers. Image source: Perplexity
Apple received some significant blowback when it was discovered that Applebot had been crawling the web for years to get data to train Apple Intelligence. Websites immediately blocked that bot, along with others, which sparked some interesting discoveries about how AI companies are operating.
A year on, at least one company, Perplexity, is still doing everything in its power to ignore robots.txt and scrape webpages anyway. According to a report from Cloudflare, Perplexity is using several techniques to undermine the trust expected on the web and access data to train its large language models.