AI copyright and licensing basics: what creators and publishers should know
All you need to know about the issue and how to avoid common pitfalls
Generative artificial intelligence (AI) is already reshaping how text, images, audio and video are made and distributed. In UK copyright terms, some things are clear, including that most training and output questions turn on existing law, contracts and evidence. Much remains uncertain, including how courts will treat large-scale training on web content, and what counts as a “substantial” taking in machine-made outputs. For publishers and creators, the immediate task is risk reduction: permissions, record-keeping, and clear licensing choices.
What is settled, and what is still uncertain
The settled part is mundane but important. The UK’s copyright framework still applies. Copyright subsists in qualifying original works, and infringement still turns on acts such as copying a substantial part without permission, unless an exception applies. Contracts and licences still set the rules where parties have agreed them. Evidence still matters. If you cannot show what you used, what rights you had, and what your systems did, you are exposed.
The uncertain part sits in the gaps between old doctrines and new industrial practices. Training a model can involve copying at scale, sometimes transiently, sometimes persistently, sometimes with works stored, transformed and re-used. Rights holders argue that this is unauthorised copying and that commercial AI training should be licensed. AI developers argue that training is transformative, that any copies are technical steps, and that existing exceptions may apply in some circumstances. UK policy has also moved around this question in recent years, with repeated attempts to find a workable settlement between creators and AI developers.
Outputs add a second layer of uncertainty. The UK has a specific regime for computer-generated works (section 9(3) of the Copyright, Designs and Patents Act 1988, which deems the author to be the person who made the arrangements necessary for the work's creation), but it was drafted for older forms of computation. Whether modern generative systems fit cleanly within that concept, and what level of human creative control is required for a user to claim authorship, remains contested. Even where an output is not itself copyright-protected, it may still infringe if it reproduces a substantial part of someone else's work.
This makes “AI copyright” less like a single legal question and more like a chain of practical questions. What content did you ingest. On what terms. Can you prove it. What do your contracts say. What did the model output. How close is it to any protected work. What steps did you take to prevent copying, and to respect permissions.
Training data, at a high level
Training data controversies are often described as a fight over scraping. That can miss the point. Scraping is a method. The underlying issue is the reproduction of protected works and databases, and whether the acts involved in training are licensed, exempt, or infringing.
For UK-based publishers, two legal levers come up repeatedly in discussions.
First, copyright and database rights. A publisher may hold copyright in articles and images, and database rights in curated collections. Large-scale copying for training can engage both. Even when content is publicly accessible, it is not necessarily free to copy for any purpose.
Second, text and data mining (TDM) exceptions. UK law contains an exception (section 29A of the CDPA) that permits copying for text and data mining for non-commercial research, subject to conditions such as lawful access. It is not a general permission for commercial training, and while contract terms cannot exclude the research exception itself, contracts can still restrict uses that fall outside it. For most businesses building or deploying generative AI, assuming that an exception provides a blanket shield is risky.
The practical consequence is that training risk is often addressed commercially rather than doctrinally. Developers seek licences, use curated datasets, exclude certain sources, and build indemnities into enterprise contracts. Rights holders seek opt-outs, licensing schemes, collective deals, or technological measures that make unauthorised ingestion harder.
Output ownership, and why it is messy
Creators often ask a direct question: who owns the output.
In UK practice, there are three distinct questions hiding inside that one.
Is the output protected by copyright at all. Copyright requires originality. If an output is generated with minimal human creative input, it may not qualify as an author’s work in the usual sense, although the UK’s computer-generated works provisions can complicate this analysis. The direction of travel is not clear-cut, and businesses should treat “we own it” claims with caution unless they can evidence meaningful human authorship.
If it is protected, who is the author. For human-created works, authorship usually sits with the person who created the work, subject to employment rules and contracts. For AI-assisted work, the question becomes whether the human made the creative choices that determine the final form. That is fact specific, and it varies by medium. Editing a paragraph is different from generating a full illustration and making trivial changes.
Even if you “own” the output, does it infringe. Ownership and infringement are separate. A publisher can hold copyright in a new work and still infringe if it reproduces a substantial part of an existing work. This is why similarity checking, provenance, and audit logs matter.
Licensing options creators and publishers are actually using
Because the law is uncertain at the margins, licensing has become the practical battleground.
For publishers and stock libraries, common approaches include:
- Direct licensing deals with AI developers, often specifying permitted uses, attribution or labelling expectations, security requirements, and compensation models.
- Platform-specific terms that govern whether content can be used for model training, and on what basis.
- Technical restrictions and metadata signals intended to discourage ingestion, although their enforceability and effectiveness vary (a robots.txt sketch follows this list).
- Collective or consortium approaches, where rights holders negotiate together or pursue common standards.
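Where metadata signals are part of the approach, robots.txt is the most common starting point. Below is a minimal sketch, in Python for consistency with the other examples in this piece. The user-agent tokens shown (GPTBot, CCBot, Google-Extended) are published by their operators but change over time, so verify them against current documentation before relying on this, and remember that robots.txt is an advisory signal, not an enforcement mechanism.

```python
# Minimal sketch: generate a robots.txt asking known AI crawlers not to
# ingest your site. Verify the user-agent tokens before deployment; they
# are maintained by the crawler operators and do change.

AI_CRAWLERS = [
    "GPTBot",           # OpenAI's web crawler
    "CCBot",            # Common Crawl
    "Google-Extended",  # Google's AI-training control token
]

def build_robots_txt(crawlers: list[str]) -> str:
    """Return robots.txt rules disallowing the listed crawlers site-wide."""
    blocks = [f"User-agent: {ua}\nDisallow: /" for ua in crawlers]
    # Leave ordinary search indexing untouched by default.
    blocks.append("User-agent: *\nAllow: /")
    return "\n\n".join(blocks) + "\n"

if __name__ == "__main__":
    print(build_robots_txt(AI_CRAWLERS))
```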
For small creators, licensing is often less formal, but the same principles apply. Decide what you are willing to license, on what terms, and what you are not willing to license. Put it in writing. Keep records. Avoid vague permissions that can be stretched later.
On the buyer side, businesses deploying generative AI commonly look for:
- Clear statements about training sources and exclusions.
- Warranties and indemnities, with realistic caps and carve-outs.
- Controls that reduce output similarity risk, such as filters and dataset curation.
- Commitments around data handling, retention, and customer prompts.
Licensing is not a panacea. It can reduce exposure, but it does not remove the need for internal governance. A licence may cover training, but not outputs. It may cover one model, not downstream fine-tuning. It may exclude high-risk categories. It may impose audit requirements that your organisation is not ready to meet.
Practical steps for risk reduction
This is not legal advice. It is operational hygiene, and it is usually where disputes are won or lost.
1) Map your content and rights position
Publishers should know what they own, what they license in, and what they cannot sub-license. Many organisations carry legacy photo agreements, syndicated columns, agency copy, and contributor terms that were never drafted with AI in mind. That is now a problem.
2) Separate “AI for internal use” from “AI for publication”
Internal drafting assistance carries different risks from published output. Treat publication as a higher bar, with stronger review, provenance checks and documented approvals.
3) Put permissions in writing
If you commission freelancers, photographers or illustrators, update contracts. If you buy syndicated content, check whether you have any rights to use it in AI workflows, including summarisation, translation, reformatting, and training.
4) Create a permissions register
A simple spreadsheet can work if it is maintained. Record the work, the rightsholder, the licence terms, the permitted AI uses, the expiry, and any restrictions on training or derivative use. Link to the underlying agreement.
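For teams that want more structure than a spreadsheet, the register can live as code-managed data. A minimal sketch follows; the field names and the example record are illustrative, not a standard schema.

```python
# Minimal sketch of a permissions register as structured records rather
# than free-form notes. Field names are illustrative only.
import csv
from dataclasses import dataclass, asdict, fields

@dataclass
class PermissionRecord:
    work: str               # title or identifier of the work
    rightsholder: str       # who owns or controls the rights
    licence_terms: str      # short label for the terms
    permitted_ai_uses: str  # e.g. "summarisation; translation", or "none"
    training_allowed: bool  # may the work be used to train models?
    expiry: str             # ISO date, or "perpetual"
    agreement_url: str      # link to the underlying signed agreement

def save_register(records: list[PermissionRecord], path: str) -> None:
    """Write the register to CSV so it stays diffable and auditable."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=[x.name for x in fields(PermissionRecord)]
        )
        writer.writeheader()
        writer.writerows(asdict(r) for r in records)

save_register(
    [PermissionRecord(
        work="Autumn photo essay",
        rightsholder="J. Example (freelance)",  # hypothetical contributor
        licence_terms="Single-use editorial, no sublicensing",
        permitted_ai_uses="none",
        training_allowed=False,
        expiry="2026-12-31",
        agreement_url="https://example.com/agreements/123",
    )],
    "permissions_register.csv",
)
```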
5) Use model and tool controls
Enterprise tools often allow you to disable training on your prompts, restrict retention, and set access controls. Turn those settings on deliberately, and document what you chose and why.
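One way to make those choices durable is to record them as data that lives in version control. The sketch below assumes a hypothetical vendor and hypothetical setting names; map them to whatever your provider's admin console actually exposes.

```python
# Minimal sketch: record the AI-tool settings you chose, and why, as a
# versionable artefact. Vendor and setting names here are hypothetical.
import json
from datetime import date

tool_settings = {
    "tool": "ExampleVendor Enterprise",      # hypothetical vendor
    "reviewed_on": date.today().isoformat(),
    "settings": {
        "train_on_prompts": False,  # opted out of provider training
        "retention_days": 30,       # shortest retention the plan allows
        "sso_required": True,       # no consumer accounts
    },
    "rationale": "Editorial prompts may contain unpublished or embargoed "
                 "material; retention kept short pending legal review.",
    "approved_by": "Head of Editorial Operations",
}

with open("ai_tool_settings.json", "w", encoding="utf-8") as f:
    json.dump(tool_settings, f, indent=2)
```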
6) Treat prompts and uploads as potentially disclosive
If staff paste unpublished copy, embargoed financials, source material, or personal data into a third-party tool, you may have a confidentiality and privacy problem alongside copyright.
7) Build an output review workflow
For text, look for distinctive phrasing, unusual metaphors, and repeated structures that can signal close copying. For images, watch for recognisable characters, logos, and stock-photo artefacts. For audio, be cautious with voice likeness.
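For text, a crude automated screen can help direct human attention. The sketch below flags drafts whose word n-grams overlap heavily with a reference text you are entitled to compare against; the threshold is arbitrary and the result is a signal for review, never a verdict on infringement.

```python
# Minimal sketch of a pre-publication similarity screen using word
# n-gram overlap. A high score means "a human should look", nothing more.
def ngrams(text: str, n: int = 5) -> set:
    """Return the set of word n-grams in a text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(draft: str, reference: str, n: int = 5) -> float:
    """Fraction of the draft's n-grams that also appear in the reference."""
    d, r = ngrams(draft, n), ngrams(reference, n)
    return len(d & r) / len(d) if d else 0.0

reference = "the quick brown fox jumps over the lazy dog near the river bank"
draft = "our fox jumps over the lazy dog near the river every morning"
score = overlap_score(draft, reference)
if score > 0.3:  # threshold is arbitrary; tune it on your own corpus
    print(f"Review needed: {score:.0%} of draft n-grams match a known work")
```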
8) Keep audit trails
If you publish AI-assisted work, keep records of prompts, system settings, model versions, retrieved sources, and the human edits made. If challenged, evidence is your friend.
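A minimal sketch of such a log, one JSON line per generation event, might look like the following; the field names and model identifier are illustrative.

```python
# Minimal sketch of an append-only audit log for AI-assisted publication.
# Field names are illustrative, not a standard.
import hashlib
import json
from datetime import datetime, timezone

def log_generation(path: str, prompt: str, model: str, settings: dict,
                   sources: list[str], editor: str) -> None:
    """Append one generation event as a JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Hash rather than store the prompt if it may contain sensitive copy.
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "model_version": model,
        "settings": settings,
        "retrieved_sources": sources,
        "human_editor": editor,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_generation(
    "audit_log.jsonl",
    prompt="Summarise our licensed market report for the newsletter",
    model="example-model-2025-01",  # hypothetical model identifier
    settings={"temperature": 0.2, "training_opt_out": True},
    sources=["licensed-report-4411"],
    editor="j.smith",
)
```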
Do and do not guidance
Do
- Do assume UK copyright law still applies, and that “publicly available” does not mean “free to use for training”.
- Do update contributor and commissioning contracts to address AI use explicitly.
- Do maintain a permissions register and an audit trail for published work.
- Do use reputable tools with clear data handling terms, and configure privacy and retention settings consciously.
- Do treat retrieval-augmented generation (RAG) systems as safer for factual work than free-form generation, because they can be limited to your licensed materials, but still test for leakage and misattribution (see the grounding-check sketch after the do-not list).
- Do create an escalation route for staff when an output looks too close to a known work.
Do not
- Do not assume that an AI tool’s terms give you the rights you need for publication, especially for commercial re-use and sublicensing.
- Do not publish high-stakes or high-profile content generated with minimal human editorial control.
- Do not feed third-party copyrighted material into tools unless you have permission or a clear, documented lawful basis.
- Do not treat similarity checks as perfect. Use them as signals, not as absolution.
- Do not rely on vague statements like “the model is trained on the internet” as a compliance story.
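On the RAG point above, one concrete test is a grounding check: every sentence of the output should be traceable to at least one retrieved, licensed passage. The sketch below uses deliberately crude word overlap and an arbitrary threshold; ungrounded sentences are escalated for human review rather than published.

```python
# Minimal sketch of a grounding check for a RAG-style workflow. The overlap
# measure is deliberately crude; real systems need something stronger.
def word_set(text: str) -> set:
    return set(text.lower().split())

def is_grounded(sentence: str, passages: list[str],
                threshold: float = 0.5) -> bool:
    """True if enough of the sentence's words appear in some passage."""
    s = word_set(sentence)
    if not s:
        return True
    return any(len(s & word_set(p)) / len(s) >= threshold for p in passages)

retrieved = ["The report says UK digital ad spend grew 11% in 2024."]
output = [
    "UK digital ad spend grew 11% in 2024, according to the report.",
    "Analysts expect growth to double next year.",  # not in any source
]
for sentence in output:
    if not is_grounded(sentence, retrieved):
        print(f"Escalate, no supporting source: {sentence!r}")
```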
Record-keeping and permissions, the unglamorous centre of the story
Most copyright disputes are less about philosophy and more about paperwork. If you are challenged, the questions will be practical.
What material did you use. Who owned it. What rights did you have. What did the contract say. What did staff upload. What tool processed it. Did the provider store it. Did it train on it. What settings did you choose. What steps did you take to prevent close copying. Who signed off publication.
A robust record-keeping approach usually includes:
- A single source of truth for contributor agreements and licences.
- A content provenance tag in the editorial system, marking whether a piece used AI assistance and what sources it relied on (a minimal example follows this list).
- A prompt and output log for higher-risk categories, stored securely with access controls.
- A retention policy that balances accountability with privacy and security.
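The provenance tag mentioned above can be as simple as a structured record attached to each piece. A minimal sketch, with illustrative field names:

```python
# Minimal sketch of a content provenance tag stored alongside each article
# record in the editorial system. Field names are illustrative, not a standard.
provenance_tag = {
    "article_id": "2025-03-feature-0042",       # hypothetical identifier
    "ai_assistance": "drafting",                # e.g. none | research | drafting
    "tools_used": ["ExampleVendor Enterprise"], # hypothetical tool name
    "sources_relied_on": ["licensed-report-4411", "staff interview notes"],
    "similarity_check_passed": True,
    "approved_by": "section editor",
}
print(provenance_tag)
```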
This is also where permissions culture matters. Publishers that treat licensing as a back-office chore are exposed. Publishers that treat it as product infrastructure tend to be better placed to negotiate with AI developers and to defend their own practices.
How businesses can reduce risk without stopping innovation
For publishers, the strategic choice is not “AI or no AI”. It is “controlled AI or uncontrolled AI”.
Controlled AI usually means:
- Clear rules on what staff can upload into third-party tools.
- Approved tools with enterprise terms, not consumer accounts.
- Model governance, including change control when providers update models.
- A licensing strategy, either to monetise your content for training, or to enforce exclusions.
- A public transparency posture that does not overclaim what you can guarantee.
For small creators, the most realistic approach is to focus on what you can control:
- Publish clear licensing terms for your work.
- Keep originals and drafts, which help show provenance.
- Document commissions and permissions.
- Be cautious about distributing raw source files widely.
- Decide whether you want your work used for training, and signal that preference consistently.
The direction of travel
The UK is still working through the tension between two economic stories. One story says generative AI needs broad access to data to thrive. The other says creators and publishers need enforceable rights and workable compensation to keep producing the material AI systems feed on. Courts, regulators, industry negotiations and government policy will shape where the balance lands.
In the meantime, creators and publishers should treat AI copyright and licensing as an operational risk, not a culture war. Licences, contracts, permissions registers and audit trails are not glamorous. They are the tools that let you use generative AI while retaining control over your work and your liabilities.