EU Releases Template for AI Training Data Transparency: What It Means for Providers of General-Purpose AI Models
On 24 July 2025, the European Commission’s AI Office released the long-awaited template for the public summary of training content used in general-purpose AI (GPAI) models, fulfilling a key requirement under Article 53(1)(d) of the EU Artificial Intelligence Act (AI Act). This move marks a significant step toward greater transparency in AI development and sets out clear expectations for providers operating within the EU market.
A new era of AI transparency
The AI Act, which entered into force on 1 August 2024, imposes harmonised rules on AI systems, with specific obligations for GPAI providers. From 2 August 2025, all providers of GPAI models (whether proprietary or open-source) must publish a “sufficiently detailed” summary of the data used to train their models, using the official template provided by the AI Office.
This requirement aims to increase transparency, particularly around the use of copyright-protected content, and to empower stakeholders such as rightsholders, researchers, and consumers to better understand how AI systems are built and what data they rely on. As covered in our previous article, this may have implications for the United Kingdom and may lead to litigation between copyright owners and AI developers.
Summary of the template and required information
The newly released template is structured into three main sections, each designed to capture different aspects of the training data:
- general information:
  - identification of the provider and model
  - modalities used (e.g., text, image, audio, video)
  - estimated size of training data per modality
  - linguistic and demographic characteristics of the data
- list of data sources:
  - disclosure of publicly available datasets, including large datasets and their sources
  - information on commercially licensed datasets and private datasets obtained from third parties
  - details on data scraped from online sources, including crawler behaviour, types of websites scraped, and a summary of the most relevant domain names
  - use of user data collected through provider services and interactions
  - inclusion of synthetic data generated by other AI models
  - any other sources, such as offline or self-digitised content
- relevant data processing aspects:
  - measures taken to respect text and data mining (TDM) opt-outs
  - steps to remove illegal content, such as child abuse material or unlicensed copyrighted works
  - optional disclosures on additional data processing practices
Implications for AI providers
The release of the template has significant implications for GPAI providers:
- compliance burden: Providers must now prepare summaries that cover all stages of model training, from pre-training to fine-tuning. This includes gathering and categorising data across multiple modalities and sources
- legal risk: Non-compliance can result in fines of up to €15 million or 3% of global annual turnover, whichever is higher, enforceable by the AI Office from 2 August 2026. Providers must also ensure lawful data collection under other EU laws, such as the Copyright Directive and GDPR
- operational adjustments: Providers may need to implement new data governance frameworks, update documentation, and establish mechanisms for responding to rightsholder inquiries about specific domain usage
- market transparency: The summaries will be publicly available on provider websites and distribution platforms, potentially influencing user trust, competitive positioning, and downstream integration decisions
Mixed reactions from stakeholders
The release of the template has sparked a range of reactions across the AI and creative industries.
Burak Özgen, deputy general manager of GESAC (the authors and composers' lobby), told euractiv.com that the template “falls short of safeguarding the creative sector” and risks favouring large tech companies over European creators. He criticised the lack of granular detail, arguing that the summary format does little to help authors enforce their rights.
Other commentators have echoed this concern, noting that while the template is a step forward in transparency, it does not resolve the deeper issue of licensing reform. Creators remain frustrated by the opacity surrounding whether their works were used in training and how they can opt out or seek compensation.
On the other hand, the Commission has defended the template as a balanced approach, designed to be simple, uniform, and effective while respecting trade secrets. Executive Vice-President Henna Virkkunen stated that the template is key to building trustworthy and transparent AI, and supports the broader goals of the AI Act.
Looking ahead
With the AI Act’s GPAI rules applying from 2 August 2025, and enforcement beginning on 2 August 2026, providers have a limited window to prepare. For models placed on the market before 2 August 2025, the deadline to publish summaries is 2 August 2027.
The Commission has indicated that it may review and update the template based on practical experience and technological developments. In the meantime, providers, rightsholders, and civil society will continue to navigate the evolving landscape of AI governance, transparency, and copyright enforcement.
Disclaimer
This information is for general information purposes only and does not constitute legal advice. It is recommended that specific professional advice is sought before acting on any of the information given. Please contact us for specific advice on your circumstances. © Shoosmiths LLP 2025.