Nonprofit Creative Commonswhich spearheaded the licensing movement that allows creators to share their works while retaining copyright, is now preparing for the AI era. On Wednesday, the organization announced the launch of a new project, CC signals, which will allow dataset holders to detail how their content can or cannot be reused by machines, as in the case of training AI models.
The idea is meant to create a balance between the open nature of the internet and the demand for ever more data to fuel AI.
As Creative Commons explains in a blog postthe continued data extraction underway could erode openness on the internet and could see entities walling off their sites or guarding them with paywalls, instead of sharing access to their data.
The CC signals project, on the other hand, aims to provide a legal and technical solution that would provide a framework for dataset sharing meant to be used between those who control the data and those who use it to train AI.
Demand is increasing for such a tool, as companies grapple with changing their policies and terms of service to either limit AI training on their data or explain to what extent they’ll use users’ data for purposes related to AI.
For instance, X initially made a change that allowed third parties to train their models on its public data, then later reversed that. Reddit is using its robots.txt file, which is meant to tell automated web crawlers whether they can access its site, to restrict bots from scraping its data for training AI. Cloudflare is looking toward a solution that would charge AI bots for scraping, as well as tools for confusing them. And open source developers have also built tools to slow down and waste the resources of AI crawlers that didn’t respect their “no crawl” directives.
The CC signals project instead proposes a different solution: a set of tools that offers a range of legal enforceability, but all of which have an ethical weight to them, similar to the CC licenses that today cover billions of openly licensed creative works online.
“CC signals are designed to sustain the commons in the age of AI,” said Anna Tumadóttir, Creative Commons CEO, in an announcement. “Just as the CC licenses helped build the open web, we believe CC signals will help shape an open AI ecosystem grounded in reciprocity.”
The project is only now beginning to take shape. Early designs have been published on the CC website and GitHub page. The organization is actively seeking public feedback ahead of its plans for an alpha launch (early test) in November 2025. It will also host a series of town halls for feedback and questions.