Posted by Slashdot (syn_slashdot)
@ 2025-11-15 21:22:00

While Meta Crawls the Web for AI Training Data, Bruce Ediger Pranks Them with Endless Bad Data
From the personal blog of interface expert Bruce Ediger:

Early in March 2025, I noticed that a web crawler with a user agent string of meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler) was hitting my blog's machine at an unreasonable rate. I followed the URL and discovered this is what Meta uses to gather premium, human-generated content to train its LLMs. I found the rate of requests to be annoying.

I already have a PHP program that creates the illusion of an infinite website. I decided to answer any HTTP request that had "meta-externalagent" in its user agent string with the contents of a bork.php generated file... This worked brilliantly. Meta ramped up to requesting 270,000 URLs on May 30 and 31, 2025...

After about 3 months, I got scared that Meta's insatiable consumption of Super Great Pages about condiments, underwear and circa 2010 C-List celebs would start costing me money. So I switched to giving "meta-externalagent" a 404 status code. I decided to see how long it would take one of the highest valued companies in the world to decide to go away. The answer is 5 months.
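The two phases described above (serving generated junk to one user agent, then answering it with 404) can be sketched as a small WSGI app. This is a hypothetical illustration, not Ediger's bork.php: the word list, the `babble_page` and `make_app` names, the `mode` switch and the link count are all assumptions made for the example. The only detail taken from the post is the substring match on "meta-externalagent". Seeding the generator from the request path keeps each URL stable while every page links to more never-before-seen URLs, which is one way to create the "illusion of an infinite website".

```python
import hashlib
import random

# From the post: requests whose User-Agent contains this substring are singled out.
BLOCKED_AGENT = "meta-externalagent"

# Hypothetical word list standing in for a real nonsense generator such as bork.php.
WORDS = ["condiment", "underwear", "celebrity", "ketchup",
         "mustard", "socks", "gossip", "relish"]


def babble_page(path: str, n_links: int = 5) -> str:
    """Build a deterministic nonsense page whose links lead to more nonsense pages."""
    # Seed from the path so the same URL always yields the same page.
    seed = int.from_bytes(hashlib.sha256(path.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    text = " ".join(rng.choice(WORDS) for _ in range(40))
    # Each link points at another generated URL, so a crawler never runs out of pages.
    links = "".join(
        f'<a href="/{rng.getrandbits(32):08x}.html">{rng.choice(WORDS)}</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>\n{links}</body></html>"


def make_app(mode: str = "babble"):
    """Return a WSGI app: mode 'babble' feeds the crawler junk, 'gone' answers 404."""
    def app(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        path = environ.get("PATH_INFO", "/")
        if BLOCKED_AGENT in agent:
            if mode == "gone":
                start_response("404 Not Found", [("Content-Type", "text/plain")])
                return [b"Not Found\n"]
            body = babble_page(path).encode("utf-8")
            start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
            return [body]
        # Ordinary visitors get the normal site (a placeholder in this sketch).
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Regular blog content\n"]
    return app
```

For a local trial, the standard library's `wsgiref.simple_server.make_server("", 8000, make_app("babble")).serve_forever()` would serve it; switching to `make_app("gone")` mirrors the later 404 phase. A production site would more likely do the User-Agent match in the web server's configuration than in application code.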
