Confrontation and convergence: Copyright in the face of artificial intelligence

5 points to remember

  • By contributing to the democratization of the creative process, AI is challenging the boundaries of copyright.
  • To protect copyright, legislators around the world are enacting new rules to govern artificial intelligence systems.
  • The ability of artificial intelligence systems to generate works calls into question the very notion of authorship.
  • Because language models are trained on data that is sometimes protected by copyright, they blur the boundaries between the notions of “fair use” (a concept from Anglo-American law), “transformative work” and infringement of rights.
  • In response to emerging disputes, protective measures, agreements and adaptations along ethical, technological and economic lines are being considered.

Introduction

Generative AI systems can assist creators, but they also upset the rules and concepts of copyright, blurring the boundary between innovation and infringement. Adapting copyright law to the age of artificial intelligence requires not only legislative reforms, but also ethical, economic and technological strategies to meet emerging challenges and disputes.

1.     The voice of texts

An overview of the legislative contexts in the United States, Europe and China.

Protecting human creativity: the American approach

In the United States, copyright law rests on the Copyright Act of 1976, as amended notably by the Digital Millennium Copyright Act of 1998, and is applied in practice through the Compendium of the U.S. Copyright Office. In March 2023, the Office published AI-specific guidelines (Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence), stressing that human authorship is required to claim copyright protection and providing for a case-by-case analysis to determine whether a work is human-made or computer-assisted. Following President Biden’s executive order on AI, new copyright guidance is expected, particularly on the infringement of works used in AI training. Ongoing lawsuits involving companies such as Google, Microsoft and OpenAI will also influence future AI-related copyright legislation.

The principle of transparency: a fundamental concept for European legislators

In the European Union (EU), member states have their own copyright laws, but European legislation rests on one major directive (Directive 2019/790). The AI Act, in the version agreed in December 2023, specifies that providers of general-purpose AI (GPAI) models must comply with this directive. The directive authorizes, under certain conditions, the reproduction and extraction of works for text and data mining (“scraping”). However, rightsholders may opt out and have their works excluded, except where the mining is carried out for research purposes. When scraping or mining data, providers are therefore obliged to respect explicit reservations of rights. In addition, the AI Act requires providers to maintain technical documentation and to publish a summary of their training data. Content generated (“output”) by large generative AI models must also be clearly identified as AI-generated. Finally, open source GPAI providers will not be exempt from the copyright documentation requirements, although SMEs and start-ups are subject to simplified obligations proportionate to their size.

Fair use of content: the Chinese point of view

The amended Copyright Law of the People’s Republic of China (2020) imposes stricter criteria for fair use and licensing. The Interim Measures on AI (2023) emphasize respect for intellectual property rights in the development of AI, and protect data sources within the framework of intellectual property. However, the text does not go into further detail.

2.     AI-author, AI-plagiarizer: copyright issues and controversies in the age of AI

The popularity of AI-powered content generation is accompanied by the emergence of complex legal issues that remain unresolved. If these systems can generate content autonomously, how does copyright apply to their creations? Who owns them? And do the leading generative AI systems, such as ChatGPT, DALL-E and Midjourney, have the right to use existing works as training data?

The emergence of the AI-author question 

AI, as an “author”, raises the question of how much human contribution is necessary to claim copyright, and forces legislators to consider the “quantity and quality of human intervention” in the creative process.

In 2023, this debate reached the general public when a German photographer, Boris Eldagsen, won the Sony World Photography Awards in the “Creative” category. In fact, the image he submitted was not a photograph at all; it was entirely generated by AI, and he was not its creator. Eldagsen’s intention was to take the competition’s organizers to task. In his view, AI-generated visuals should not compete in the same categories as human-made ones. The photographer declined the prize but made his mark.

The judiciary, for its part, is beginning to take a stand. The UK Supreme Court upheld the refusal of a patent application that named an AI system as the inventor, on the grounds that the inventor must be a person, not a machine. Similarly, the U.S. Copyright Office rejected an application for copyright protection for a work of art created with the help of artificial intelligence, even though the work had won a prize in an art competition.

On this reasoning, a work created by AI alone would not be eligible for copyright protection.

Using existing works to train AI algorithms

To generate content, AI systems “feed” on a wealth of data drawn from Internet databases. Chatbots fed by crawlers that index newspaper articles normally reserved for subscribers, “plagiarism” of newspaper editorial content, models trained on literary works protected by copyright: AI providers are being sued over the methods used to train their language models. Claims for copyright infringement are multiplying. Two recent cases bring this point into focus.

In early January 2024, Nicholas Basbanes and Nicholas Ngagoyeanes filed a copyright infringement suit against Microsoft Corp. and OpenAI. They accuse both companies of copying their work without compensation or authorization to “build a massive commercial enterprise”. The two journalists claim that the resulting LLMs are not sufficiently transformative, and are therefore seeking damages for copyright infringement, loss of the opportunity to license their work, and the market destruction caused by the defendants. They are also seeking a permanent injunction to prevent similar harm from recurring.

At the end of December 2023, the New York Times (NYT) had already accused Microsoft and OpenAI of copyright infringement, unfair competition and trademark dilution over the use of its articles to train the defendants’ generative AI models. The newspaper disputes the defendants’ argument that this use constitutes “fair use” for transformative purposes, claiming instead that the practice amounts to uncompensated plagiarism of its content and harms its quality and reputation. At this stage of the proceedings, negotiation efforts have been unsuccessful.

These high-profile cases raise common issues. They show that the scope of copyright is becoming less clear and harder to enforce. In particular, AI is blurring the notions of “fair use” and “transformative work”.

In both cases, the plaintiffs refuse to consider LLMs as transformative works. These models are partly trained on protected works, without authorization, and are capable of reproducing entire passages, a characteristic that would prevent them from being considered transformative. AI also complicates the notion of “fair use” (the lawful use of a copyrighted work without the rightsholder’s permission). The profits generated by AI systems suggest that training models on protected works does not pursue a research goal of the kind that falls within the scope of “fair use”.

3.     Faced with discontent, best practices and agreements: intermediate solutions

Consent, remuneration, supervision of AI use: within the entertainment and media industry, many voices are calling for a balance between the development of this technology and the protection of creators.

Ethics at the service of reasoned and reasonable use of AI systems

To resolve or mitigate potential conflicts, proposals are emerging, notably on the issue of attributing authorship of a work to an artificial intelligence. In France, Librinova, a self-publishing house, has chosen to adopt the “Création humaine” label. This label certifies that a written, audiovisual or musical work has indeed been created by a human being.

Within the media, initiatives are also underway. At the Council of Europe, the Comité directeur sur les médias et la société de l’information (CDMSI) has adopted guidelines on the responsible use of artificial intelligence (AI) systems in journalism. While acknowledging the benefits of these systems, the guidelines reiterate the fundamental principles governing their implementation within a media organization, including putting in place proper risk assessment and human supervision controls. The document also addresses external providers, requiring them to take responsibility for the AI solutions they offer.

Technological countermeasures against copyright infringement

To guard against fraudulent use of their content, authors can affix a label, in the form of a watermark, that identifies an image as genuine. In addition to enabling content to be traced, this watermark also acts as a deterrent. In the United States, this practice is protected by the Digital Millennium Copyright Act (DMCA), which prohibits anyone from removing the watermark. It is up to authors to attest to the presence of this label on their works. In the same spirit, a tool called Nightshade modifies images in a way that is imperceptible to the viewer. Learning models, on the other hand, do perceive the modification, which causes them to categorize the visuals incorrectly.

When it comes to copyright infringement, one of the cruxes of the problem lies in the data used to train the models. To demonstrate the absence of malicious intent, developers are therefore encouraged to cite the works used and to keep track of the metadata and tags of the source data. On the authors’ side, opt-out, that is, blocking data collection by indexing robots, is a practice used to counter copyright infringement. CNN, Reuters, The New York Times, France Médias Monde, Radio France and many other media organizations around the world have opted for this approach. In this way, they hope to encourage the adoption of an agreement similar to the one reached with platforms such as Google or Facebook for the remuneration of neighboring rights. Some platforms are taking the opposite approach and self-regulating. Stable Diffusion, for example, has agreed to exclude from its training data the works of Greg Rutkowski, a creator of digital works whose style appeared in a very large number of prompts (over 400,000 requests). The only downside is that, given the growing popularity of the artist’s works, a developer created a new model, trained on those works and thus able to imitate them, against the artist’s will.
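The opt-out described above is commonly expressed in a site's robots.txt file. As a minimal sketch, the example below uses Python's standard urllib.robotparser module to show how a well-behaved crawler would read such a file. The publisher domain (news.example.com), the robots.txt contents, the SearchBot name and the may_fetch helper are all hypothetical and introduced only for illustration; GPTBot and CCBot are used here as examples of crawler user agents associated with AI training data.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher: it blocks two
# AI-related crawlers and leaves the site open to every other robot.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical article URL on the imaginary publisher's site.
ARTICLE_URL = "https://news.example.com/articles/2024/story.html"


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "SearchBot"):
        verdict = "allowed" if may_fetch(ROBOTS_TXT, agent, ARTICLE_URL) else "blocked"
        print(f"{agent}: {verdict}")
```

Note that robots.txt only states the publisher's wishes: the file has an effect only if the crawler chooses to consult and honor it, which is one reason publishers also seek contractual agreements.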

Partnerships and compensation: economic solutions to copyright maintenance

To strike a balance between respect for copyright and the development of AI systems, the major players in the sector are moving towards agreements. This principle is supported by artists, particularly in the USA, where the Authors Guild is campaigning for the creation of collective licenses.

Artists’ compensation programs are gradually being set up. For example, Adobe compensates contributors whose content was used in the training of Firefly, its generative AI model trained on images from Adobe Stock. Similarly, Shutterstock has set up a contribution fund and signed a six-year contract with OpenAI for the use of its images. 

In the same spirit, the German press group Axel Springer has signed an agreement with OpenAI. For a fee, the company is authorized to use articles published on the group’s news sites to power ChatGPT. The agreement takes effect in the first quarter of 2024, and Axel Springer sees it as a new model for financing, generating traffic and monetizing its content.

Conclusion

Artificial intelligence systems have reshaped the perception of copyright. From now on, creators will have to watch over their content, as the emergence of generative AI forces them to take an active role in protecting their rights. This context also calls for collective, global thinking, a point that puts legislators to the test. 

By 2030, “the global market for AI in media and entertainment is expected to reach $99.48 billion”, a measure of the scale of the challenge ahead.