Confrontation and convergence: Copyright in the face of artificial intelligence

07/03/2024

5 points to remember

By contributing to the democratization of the creative process, AI is challenging the boundaries of copyright.

To protect copyright, legislators around the world are enacting new rules to govern artificial intelligence systems.

The ability of artificial intelligence systems to generate works on their own calls into question the very notion of authorship.

Because they are trained on data that is sometimes protected by copyright, language models blur the boundaries between the notions of « fair use » (a concept from the common-law tradition), « transformative work » and infringement of rights.

In response to emerging disputes, stakeholders are considering ethical, technological and economic solutions: protective measures, agreements and adaptations.

Introduction

Generative AI can assist creators, but it disrupts copyright rules and concepts, blurring the line between innovation and infringement. Adapting copyright law to AI requires legislative reform, along with ethical, economic and technological strategies to address the challenges and disputes now emerging.

What the laws say

An overview of the legislative contexts in the United States, Europe and China.

Protecting human creativity: the American approach

In the United States, copyright law rests on the Copyright Act of 1976, amended notably by the Digital Millennium Copyright Act of 1998, and on the Compendium of U.S. Copyright Office Practices. In March 2023, the Office published AI-specific guidance titled « Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence ». It stresses that copyright protection requires human authorship and proposes a case-by-case analysis to determine whether a work is the product of human authorship, with the computer merely assisting, or was generated by the machine. Following President Biden’s executive order on AI (October 2023), new copyright guidance is expected, particularly on infringement involving works used to train AI. Ongoing lawsuits against companies such as Google, Microsoft and OpenAI will also shape future AI-related copyright law.

The principle of transparency: a fundamental concept for European legislators

In the European Union (EU), member states have their own copyright laws, but European legislation rests on one major text: Directive (EU) 2019/790 on copyright in the Digital Single Market. Under the AI Act, as agreed in December 2023, providers of general-purpose AI models must comply with this directive. The directive authorizes, under certain conditions, the reproduction and extraction of works for text and data mining (« scraping »). However, rightsholders may opt their works out of such mining, except where it is carried out for research purposes. Providers that scrape or mine data must therefore respect any explicit reservation of rights.

The AI Act also requires providers to maintain technical documentation and to publish a summary of the data used for training. In addition, content generated (« output ») by large generative AI models must be clearly identified as AI-generated. Note that providers of open-source general-purpose AI (GPAI) models are not exempt from these copyright obligations, although SMEs and start-ups are subject to simplified obligations proportionate to their size.

Fair use of content: the Chinese point of view

The amended Copyright Law of the People’s Republic of China (2020) imposes stricter criteria for fair use and licensing. The Interim Measures on AI (2023) emphasize respect for intellectual property rights in AI development and protect data sources within the intellectual property framework. However, the text does not go into further detail.

AI-author, AI-plagiarizer: copyright issues and controversies in the age of AI

The popularity of AI-powered content generation has brought with it complex legal issues that remain unresolved. If these systems can generate content autonomously, how does copyright apply to their creations? Who owns them? And do leading generative AI tools such as ChatGPT, DALL-E and Midjourney have the right to use existing works as training data?

The emergence of the AI-author question

Treating AI as an « author » raises the question of how much human contribution is needed to claim copyright, and forces legislators to weigh the « quantity and quality of human intervention » in the creative process.

In 2023, the debate spilled over into civil society when a German photographer, Boris Eldagsen, won the Sony World Photography Awards in the « Creative » category. In fact, the image he submitted was not a photograph: it had been entirely generated by AI, and he had not taken it. Eldagsen’s intention was to take the competition’s organizers to task; in his view, AI-generated images should not compete in the same categories as human-made ones. The photographer declined the prize, but he made his point.

The judiciary, for its part, is beginning to take a stand. The UK Supreme Court upheld the refusal of a patent application on the grounds that an inventor must be a natural person, not a machine. Similarly, the U.S. Copyright Office refused to register a work of art created with the help of artificial intelligence, even though it had won a prize in an art competition.

It appears, therefore, that a work created by AI cannot be protected by copyright.

Using existing works to train AI algorithms

To generate content, AI systems « feed » on vast amounts of data drawn from the Internet. Chatbots powered by crawlers that index articles normally reserved for subscribers, « plagiarism » of newspaper editorial content, models trained on copyrighted literary works… AI providers are being sued over the methods used to train their language models. Claims for copyright infringement are multiplying, and two recent cases bring this issue into focus.

In early January 2024, Nicholas Basbanes and Nicholas Ngagoyeanes filed a copyright infringement suit against Microsoft Corp. and OpenAI. They accuse both companies of copying their work without compensation or authorization to « build a massive commercial enterprise ». The two journalists argue that the resulting LLMs are not sufficiently transformative, and are seeking damages for copyright infringement, for the lost opportunity to license their work, and for the destruction of its market caused by the defendants. They are also seeking a permanent injunction to prevent similar harm from recurring.

At the end of December 2023, the New York Times sued Microsoft and OpenAI for copyright infringement. The NYT claimed that using its articles to train generative AI models also amounted to unfair competition and trademark dilution. The newspaper disputed the defendants’ argument that this use is « fair use » for transformative purposes, arguing instead that it amounts to plagiarism of its content, undermines its quality and reputation, and should be compensated. At this stage of the proceedings, negotiations between the parties have been unsuccessful.

These high-profile cases rest on common issues. They show that the boundaries of copyright are becoming increasingly blurred and its protection harder to enforce. In particular, AI is blurring the notions of « fair use » and « transformative work ».

In both cases, the plaintiffs refuse to regard LLMs as transformative works. These models are partly built from protected works used without authorization, and they can reproduce entire passages, which in the plaintiffs’ view prevents them from being considered transformative. AI also complicates the notion of « fair use » (the lawful use of a copyrighted work without the rightsholder’s permission, under certain conditions). The profits generated by AI systems suggest that training models on protected works does not pursue a research goal of the kind that falls within the scope of « fair use ».

Answering discontent with best practices and agreements: intermediate solutions

Consent, remuneration, oversight of AI use: within the entertainment and media industry, many voices are calling for a balance between the development of this technology and the protection of creators.

Ethics at the service of reasoned and reasonable use of AI systems

To resolve or mitigate potential conflicts, proposals are emerging, notably on the question of attributing the authorship of a work to an artificial intelligence. In France, the self-publishing house Librinova has chosen to adopt the « Création humaine » (Human Creation) label, which certifies that a written, audiovisual or musical work was indeed created by a human being.

Initiatives are also underway within the media. At the Council of Europe, the Steering Committee on Media and Information Society (CDMSI) has adopted guidelines on the responsible use of artificial intelligence (AI) systems in journalism. While acknowledging the benefits of AI systems, these guidelines reiterate the fundamental principles for implementing them within a media organization, including appropriate risk assessment and human oversight. The document also addresses external suppliers, calling on them to take responsibility for the AI solutions they offer.

Technological countermeasures against copyright infringement

To guard against fraudulent use of their content, authors can affix a mark, such as a watermark, that identifies an image as authentic. Besides enabling content to be traced, a watermark also acts as a deterrent. In the United States, the Digital Millennium Copyright Act (DMCA) protects this kind of identifying information by prohibiting its removal. It is up to authors to establish that such a mark was present on their works. In the same spirit, a tool called Nightshade alters images in ways imperceptible to the human eye; machine-learning models, however, do register these alterations, which leads them to categorize visuals incorrectly.

In copyright infringement cases, the key issue is the data used to train models. Developers are encouraged to cite the works they use and to track the metadata and tags of their source data. Authors, for their part, can counter infringement by opting out or by blocking data collection by crawlers (indexing robots). CNN, Reuters, the New York Times, France Médias Monde, Radio France and many other media organizations around the world have taken this approach.

In this way, these media organizations hope to encourage an agreement similar to those reached with platforms such as Google and Facebook on remuneration for neighboring rights. Some platforms, by contrast, are regulating themselves. Stable Diffusion, for example, agreed to exclude from its training data the works of Greg Rutkowski, a digital artist whose style appeared in a very large number of prompts (over 400,000 requests). The catch: given the artist’s popularity, a developer then created a new model, trained on his works and thus able to imitate them, against his will…
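The crawler-blocking opt-out described in this section is commonly expressed in a site's robots.txt file. The Python sketch below uses the standard library's urllib.robotparser to show how such rules are interpreted by a compliant crawler. The rules are a hypothetical example, not any publisher's actual file: GPTBot (OpenAI) and CCBot (Common Crawl) are real crawler user-agent names, but the choice of which crawlers to block is an assumption for illustration. Note that robots.txt is a voluntary convention: it signals a reservation of rights but does not technically prevent collection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site that opts out of AI-training crawlers.
# "GPTBot" (OpenAI) and "CCBot" (Common Crawl) are real crawler user-agent names;
# the rules themselves are an illustrative assumption, not any publisher's file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow this crawler to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent in ("GPTBot", "CCBot", "Googlebot"):
        print(agent, may_crawl(rules, agent, "https://news.example.com/article"))
```

Run as a script, it should report that GPTBot and CCBot are refused while other crawlers, such as Googlebot, remain allowed. Parsing from a string avoids a network request; against a live site, `set_url()` followed by `read()` would load the real file instead.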

Partnerships and compensation: economic solutions for upholding copyright

To reconcile respect for copyright with AI development, major players in the sector are seeking agreements, an approach supported by artists, especially in the United States. The Authors Guild, for instance, is campaigning for collective licenses to address these concerns.

Artist compensation programs are gradually being set up. For example, Adobe compensates contributors whose content was used to train Firefly, its generative AI model trained on images from Adobe Stock. Similarly, Shutterstock has set up a contributor fund and signed a six-year agreement with OpenAI for the use of its images.

In the same spirit, the German press group Axel Springer has signed an agreement with OpenAI under which, for a fee, OpenAI may use articles published on the group’s news sites to power ChatGPT. The agreement takes effect in the first quarter of 2024, and Axel Springer sees it as a new model for financing, generating traffic and monetizing its content.

Conclusion

Artificial intelligence systems have reshaped how copyright is perceived. From now on, creators will have to keep watch over their content, as generative AI forces them to take an active role in protecting their rights. This context also calls for collective, global thinking, which puts legislators to the test.

By 2030, « the global market for AI in media and entertainment is expected to reach $99.48 billion », a measure of the scale of the challenge ahead.
