A new bill wants to reveal what’s really inside AI training data

A new bill wants to reveal what’s really inside AI training data

Illustration: Cath Virginia / The Verge | Photos: Getty Images

A new bill would compel tech companies to disclose any copyrighted materials that are used to train their AI models.

The Generative AI Copyright Disclosure bill from Rep. Adam Schiff (D-CA) would require anyone making a training dataset for AI to submit reports on its contents to the Copyrights Register. The reports should include a detailed summary of the copyrighted material in the dataset and the URL for the dataset if it’s publicly available. This requirement will be extended to any changes made to the dataset.

Companies must submit a report “not later than 30 days” before the AI model that used the training dataset is released to the public. The bill will not be retroactive to existing AI platforms unless changes are made to their…

Continue reading…

Leave a Reply

Your email address will not be published. Required fields are marked *