Understanding California’s AB-2013: Generative AI and Data Transparency
In a major legislative move, California enacted Assembly Bill 2013 (AB-2013), signed into law on September 28, 2024. The bill targets transparency in the development and use of generative artificial intelligence (AI) systems by requiring developers to disclose detailed information about the datasets used to train their AI models. The law takes effect on January 1, 2026, and applies to generative AI systems, and substantial modifications to such systems, released on or after January 1, 2022.
Key Provisions of AB-2013
AB-2013 mandates that any developer of generative AI systems—whether an individual, partnership, corporation, or government agency—must post public documentation regarding the training data for these systems. Below are the core components of the bill:
Transparency of Dataset Sources: Developers must provide a high-level summary of the datasets used in training, including details such as the source or owner of the datasets (Section 3111(a)(1)). This ensures that users and stakeholders understand the origins of the data that underpin the AI system.
Purpose of the Data: The law requires developers to explain how the data used furthers the AI system's intended purpose (Section 3111(a)(2)). This provides clarity on the role of specific datasets in enhancing the AI's functionality.
Data Characteristics: Developers must describe the types of data points used, including the characteristics of labeled and unlabeled data. This is particularly important for understanding the nature of the training data (Section 3111(a)(4)).
Intellectual Property Considerations: The bill mandates that developers disclose whether the datasets contain data protected by copyright, trademark, or patent law or if the datasets are in the public domain (Section 3111(a)(5)). This protects intellectual property rights and provides transparency about the legal status of the training data.
Personal and Aggregate Data: Developers must clarify whether the datasets include personal information or aggregate consumer information, as defined under the California Consumer Privacy Act (CCPA) (Section 3111(a)(7), (8)). This measure aims to safeguard privacy and ensure compliance with existing data protection laws.
Modifications to Data: If the datasets were cleaned, processed, or modified, developers must disclose the nature of these changes and their intended purpose in relation to the AI system (Section 3111(a)(9)). This is crucial for transparency in how data is prepared for training AI models.
Synthetic Data Use: The bill also covers the use of synthetic data, requiring developers to disclose if such data was used to train the system and explain its functional purpose (Section 3111(a)(12)).
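To make the provisions above concrete, here is a minimal sketch of what a machine-readable version of the required documentation might look like. This is purely illustrative: the statute requires only that the information be posted as documentation, and all field names, the `TrainingDataDisclosure` class, and the example values below are assumptions, not anything mandated by AB-2013.

```python
# Hypothetical sketch of AB-2013 disclosure fields as a Python dataclass.
# Field names and structure are illustrative assumptions, not statutory.
from dataclasses import dataclass, asdict


@dataclass
class TrainingDataDisclosure:
    dataset_sources: list[str]              # Sec. 3111(a)(1): sources or owners of datasets
    purpose: str                            # Sec. 3111(a)(2): how the data furthers the intended purpose
    data_types: str                         # Sec. 3111(a)(4): labeled/unlabeled data characteristics
    ip_status: str                          # Sec. 3111(a)(5): copyright/trademark/patent or public domain
    contains_personal_info: bool            # Sec. 3111(a)(7): personal information present?
    contains_aggregate_consumer_info: bool  # Sec. 3111(a)(8): aggregate consumer information present?
    cleaning_or_processing: str             # Sec. 3111(a)(9): modifications made and their purpose
    synthetic_data_use: str                 # Sec. 3111(a)(12): whether and why synthetic data was used


# Example values are invented for illustration only.
example = TrainingDataDisclosure(
    dataset_sources=["Example Web Corpus (hypothetical)"],
    purpose="General-purpose text generation",
    data_types="Unlabeled natural-language text",
    ip_status="Mixed: public-domain and copyrighted works",
    contains_personal_info=False,
    contains_aggregate_consumer_info=False,
    cleaning_or_processing="Deduplication and removal of malformed records",
    synthetic_data_use="None",
)

# asdict() converts the disclosure to a plain dict, e.g. for publishing as JSON.
print(asdict(example))
```

A structured record like this could be serialized to JSON and posted alongside the public documentation the bill requires, though the statute itself prescribes no particular format.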
Applicability and Exemptions
The bill applies to generative AI systems made publicly available to Californians, including systems substantially modified on or after January 1, 2022. However, there are certain exemptions:
AI systems whose sole purpose is to help ensure security and integrity are exempt (Section 3111(b)(1)).
Systems developed for national security, military, or defense purposes, as well as those used in aircraft operations, are also exempt (Section 3111(b)(2), (3)).
Conclusion
California’s AB-2013 marks a significant move toward greater accountability in AI development, particularly for generative AI. By enforcing transparency around training datasets, the law aims to protect intellectual property and personal data while promoting responsible AI use. At the same time, the bill introduces challenges for developers, including increased compliance costs, potential exposure of sensitive dataset details, and possible chilling effects on innovation.
Overall, AB-2013 sets a new standard for AI regulation, but developers will need to carefully consider how to balance transparency with maintaining their competitive edge and complying with complex privacy regulations.