Войти в систему

Home
    - Создать дневник
    - Написать в дневник
       - Подробный режим

LJ.Rossia.org
    - Новости сайта
    - Общие настройки
    - Sitemap
    - Оплата
    - ljr-fif

Редактировать...
    - Настройки
    - Список друзей
    - Дневник
    - Картинки
    - Пароль
    - Вид дневника

Сообщества

Настроить S2

Помощь
    - Забыли пароль?
    - FAQ
    - Тех. поддержка



Пишет Slashdot ([info]syn_slashdot)
@ 2024-04-19 17:22:00


Previous Entry  Add to memories!  Tell a Friend!  Next Entry
Meta's Not Telling Where It Got Its AI Training Data
An anonymous reader shares a report: Today Meta unleashed its ChatGPT competitor, Meta AI, across its apps and as a standalone. The company boasts that it is running on its latest, greatest AI model, Llama 3, which was trained on "data of the highest quality"! A dataset seven times larger than Llama2! And includes 4 times more code! What is that training data? There the company is less loquacious. Meta said the 15 trillion tokens on which its trained came from "publicly available sources." Which sources? Meta told The Verge that it didn't include Meta user data, but didn't give much more in the way of specifics. It did mention that it includes AI-generated data, or synthetic data: "we used Llama 2 to generate the training data for the text-quality classifiers that are powering Llama 3." There are plenty of known issues with synthetic or AI-created data, foremost of which is that it can exacerbate existing issues with AI, because it's liable to spit out a more concentrated version of any garbage it is ingesting.

Read more of this story at Slashdot.



(Читать комментарии) (Добавить комментарий)