Data Poisoning: Artists and Creators Fight Back Against Big AI


Narayana Pappu


Introduction

In recent years, large-scale artificial intelligence (AI) models have revolutionised the creative industries, bringing powerful new capabilities to fields ranging from graphic design to music production.

However, this rapid advancement has raised significant concerns among artists and content creators about the unauthorised use of their works. As AI systems become more prevalent, the line between innovation and infringement becomes increasingly blurred.

Besides filing lawsuits to assert ownership of their work, some artists have adopted an emerging tactic to regain control over their copyrighted material: data poisoning. The practice has sparked a complex ethical debate, pitting the need for innovation against the rights and concerns of individual creators.

This article explores the concept of data poisoning, the methods artists use to execute it, the reasons behind their actions and the broader implications for the AI community and copyright law.

What is Data Poisoning?

Data poisoning is an adversarial attack technique aimed at undermining the performance of AI models by deliberately introducing corrupt or misleading data into their training datasets.

This method targets the foundation of how AI systems learn, which makes it distinct from tactics that exploit AI vulnerabilities after training. Traditional data poisoning can severely compromise an AI’s learning accuracy: subtle alterations in training images, for example, can mislead a model and drastically reduce its ability to generate accurate outputs or recognise styles.

Artists and other creatives have begun to use this technique to protest against AI models’ consumption of their works of art. By contaminating digital versions of their works, these artists aim to degrade the performance of AI models, particularly in how these models interact with the artist’s original works. The goal is to directly impact the AI’s operational efficacy, asserting control over how an artist's content is used without their consent.

Unlike direct hacking attacks or software bugs, data poisoning is usually more insidious and can be harder to detect because it involves the manipulation of the input data itself. It blurs the lines between legitimate data and interference, creating a unique challenge for AI developers who must discern and rectify these alterations to maintain system integrity.

When artists poison their own work, however, the intent is not covert sabotage. It is a direct protest against technology companies’ assumption that they can use someone’s work without credit, compensation or consent.

How Artists Poison the Dataset

Currently, the available tools are aimed at artists, photographers and designers, that is, people who create images. These tools subtly alter existing artwork by changing its pixel values (changes that machines pick up but the naked eye does not) to disrupt a model's ability to process the image and make it difficult for the model to learn an artist's authentic style. Some of the ways artists can achieve this are:

Manipulation

Alterations to original artwork, some subtle and some deliberately visible, can significantly interfere with an AI model's ability to process it correctly. This can involve:

Colour Changes: Inverting colours, introducing jarring palette shifts, or applying filters.
Watermarks: Adding intrusive text or visual marks, often including copyright statements or warnings about using the image in AI training.
Minor Distortions: Applying effects like warping, blurring, noise, or introducing artefacts. Tools like Nightshade, developed by the University of Chicago, help automate these subtle but disruptive changes.
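
The "Minor Distortions" idea can be illustrated with a minimal sketch. It assumes an image is simply a flat list of 0-255 pixel values and uses seeded random noise as a stand-in. The names `perturb`, `max_abs_diff` and the `epsilon` budget are hypothetical, and tools such as Nightshade design their perturbations carefully rather than drawing them at random.

```python
import random

def perturb(pixels, epsilon, seed=0):
    """Return a copy of `pixels` with each value shifted by at most `epsilon`."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    out = []
    for p in pixels:
        delta = rng.randint(-epsilon, epsilon)
        # Clamp to the valid 0-255 range; clamping never enlarges the change.
        out.append(min(255, max(0, p + delta)))
    return out

def max_abs_diff(a, b):
    """Largest per-pixel change between two images of equal length."""
    return max(abs(x - y) for x, y in zip(a, b))

original = [120, 121, 119, 200, 0, 255, 64, 63]
cloaked = perturb(original, epsilon=3)

# The change per pixel stays within the budget, so it is hard to see.
print(max_abs_diff(original, cloaked) <= 3)
```

The point of the budget is that every pixel stays within a few intensity levels of the original, which is why a viewer would not notice the change while a model trained on the pixels still receives altered input.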

Mislabelling

An article from Scientific American explains that Nightshade also functions as a “...cloaking tool [that] turns potential training images into ‘poison’ that teaches AI to incorrectly associate fundamental ideas and images.”

Text-to-image systems are essentially giant maps that semantically associate groups of words with images, so mislabelling can disrupt the AI's understanding and categorisation of visual art:

Incorrect Descriptions: Attaching inaccurate or nonsensical text descriptions to images (e.g., labelling a portrait as "landscape").
Contradictory Metadata: Intentionally including conflicting information in embedded image metadata.
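
To make the "giant map" idea concrete, here is a toy sketch of how mislabelled pairs can drag a learned concept away from its true meaning. It assumes each image is reduced to a two-number feature vector and that the "model" simply averages the features seen with each caption. The prototypes `PORTRAIT` and `LANDSCAPE` and the helpers `learn_concepts` and `resembles` are hypothetical, and real text-to-image systems are far more complex.

```python
from collections import defaultdict

PORTRAIT = (1.0, 0.0)   # hypothetical "portrait-like" image features
LANDSCAPE = (0.0, 1.0)  # hypothetical "landscape-like" image features

def learn_concepts(pairs):
    """Map each caption to the average feature vector of its images."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for caption, (x, y) in pairs:
        sums[caption][0] += x
        sums[caption][1] += y
        counts[caption] += 1
    return {c: (s[0] / counts[c], s[1] / counts[c]) for c, s in sums.items()}

def resembles(features):
    """Say which prototype a feature vector is closer to."""
    def dist2(proto):
        return (features[0] - proto[0]) ** 2 + (features[1] - proto[1]) ** 2
    return "portrait" if dist2(PORTRAIT) < dist2(LANDSCAPE) else "landscape"

clean = [("portrait", PORTRAIT)] * 4 + [("landscape", LANDSCAPE)] * 4
# Poison: portrait-like images deliberately captioned "landscape".
poison = [("landscape", PORTRAIT)] * 6

clean_map = learn_concepts(clean)
poisoned_map = learn_concepts(clean + poison)

print(resembles(clean_map["landscape"]))     # landscape
print(resembles(poisoned_map["landscape"]))  # portrait
```

In this toy setting, six mislabelled portraits outweigh four genuine landscapes, so the learned meaning of "landscape" ends up closer to a portrait.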

Specialised Techniques

Artists and researchers are developing advanced data poisoning tactics as awareness of AI vulnerabilities grows. These include:

Attacking Model Architectures: Techniques like Nightshade exploit specific weaknesses in image-generation models, particularly text-to-image systems. Nightshade introduces carefully designed noise into the data, disrupting the relationship between the text prompt and the resulting image output.
Targeting Data Preprocessing: Understanding how image data is prepared before being fed to the AI model allows artists to craft their poisoned data accordingly.
Style Masking Tools: Tools like Glaze help artists camouflage their artistic style, preventing AI models from learning and replicating their unique aesthetic signature.

Why Artists Are Poisoning Datasets

Artists engage in data poisoning primarily to address legal, financial and ethical concerns arising from their works being scraped from the internet and incorporated into AI training datasets.

Copyright Infringement Concerns

A significant driver for data poisoning is the widespread practice of AI developers scraping vast amounts of content, including copyrighted material, from various online platforms. This material is often used to train algorithms that power commercial AI systems, which generate revenue for tech companies. Artists argue that their work is being commercialised without permission, violating copyright laws designed to protect creators’ intellectual property. Several ongoing class action lawsuits focus on this issue.

Loss of Control and Income

Artists fear that the replication capabilities of AI could lead to a saturation of the market with derivative works. If AI systems can easily mimic an artist’s style, the unique value of their creations could diminish, leading to decreased demand and lower prices for the original works. This impacts their ability to generate an income and also reduces opportunities for future commissions and collaborations, as AI-generated artworks could be seen as cheaper alternatives. In an article on Creativebloq.com, artist Kelly McKernan said, "I saw my income drop significantly in the last year. I can't say for certain how much of that is due to AI, but I feel like at least some of it is."

Philosophical Opposition

Beyond legal and financial issues, many artists hold deep-seated ethical objections to the use of their work by AI systems. They argue that art is an expression of human experience and should not be reduced to data used to train algorithms. This philosophical stance is rooted in a belief in the sanctity of human creativity, a point echoed in an article written by Steve Dennis, who says, “Mostly they [artists] want to safeguard what they see as important aspects of human creativity, and be reassured that these will not be undermined or made redundant by potential AI technological storms.”

The Ethics of Web Scraping for Training Data

Web scraping for AI training is entangled in complex ethical dilemmas. These concerns range from the legality and fairness of data use to the broader societal implications of how AI models that use this data are deployed. Here’s an expanded look at the major ethical challenges:

Fair Use Debate

The debate around fair use involves determining whether the transformation of data by AI constitutes a legitimate, legal use of copyrighted materials. While some argue that the generation of new, derivative works may fall under fair use, this position is controversial, especially when the data is used to generate profit without compensating the original artists. From a business perspective, this practice can expose companies to legal challenges and reputational damage if perceived as exploiting creative works without fair compensation.

Data Ownership and Consent

Who owns data once it appears online is a pivotal question in the digital age. While the law may grant copyright to original creators, the pervasive nature of the internet complicates these rights. Using someone's creative output without explicit consent for AI training can lead to disputes over intellectual property rights. Businesses must navigate these waters carefully to avoid legal pitfalls and build systems that respect creator rights, potentially through more transparent data usage policies or consent mechanisms.

The Bias Issue

Training AI models on data scraped from the web can inadvertently encode existing biases into these systems. This creates ethical, financial and legal risks for businesses as biased algorithms can lead to discriminatory outcomes and violate anti-discrimination laws. Biased AI systems can damage a company’s reputation and erode public trust in its products. Businesses must implement rigorous data curation and testing processes to identify and mitigate these biases.


The Effectiveness and Limitations of Data Poisoning

Data poisoning is a complex challenge for AI systems, with varying degrees of impact depending on the tactics used and the model targeted. Here’s an in-depth analysis of its effectiveness and limitations:

Success is Not Guaranteed

Data poisoning's success in degrading AI performance is not a certainty. The effectiveness depends on several factors, including the scale of the poisoned data introduced, the sophistication of the AI model and the model's ability to detect and mitigate such disruptions. Businesses need to invest in robust AI systems that can identify and rectify corrupted inputs to maintain operational integrity.
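
The dependence on scale and on the model's defences can be sketched with a deliberately simple stand-in. It assumes the "model" learns a single number from 100 samples, that clean samples are 0.0 and poisoned ones are 10.0, and it compares a naive learner (the mean) with a more robust one (the median). The function `learned_value` and all the numbers are hypothetical and only illustrate the trend.

```python
import statistics

def learned_value(n_poison, total=100, clean=0.0, poison=10.0):
    """Return (mean, median) learned from a mix of clean and poisoned samples."""
    data = [clean] * (total - n_poison) + [poison] * n_poison
    return statistics.mean(data), statistics.median(data)

results = {n: learned_value(n) for n in (0, 10, 40, 60)}
for n, (mean, median) in results.items():
    print(f"{n:>3} poisoned of 100: mean={mean:.1f} median={median:.1f}")
```

In this sketch the naive learner drifts in proportion to the share of poisoned samples, while the robust learner is unaffected until poisoned samples form the majority. That is one way to see why a handful of poisoned images may have little effect on a well-defended system.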

Collective Action

Data poisoning often requires a coordinated effort involving multiple artists or data providers to significantly impact an AI model. This collective action can be challenging to organise and sustain over time. Businesses should be aware of the potential for such movements to form, especially in industries where copyright infringement concerns are prevalent.

Unintended Consequences

While data poisoning is intended to protect artists’ rights and challenge the ethical practices of AI training, it can have unintended consequences. For example, poisoning could inadvertently damage systems that rely on accurate data to provide essential services, potentially leading to broader disruptions beyond the intended targets. Businesses need to prepare for these risks by implementing more secure data handling and verification processes.

Looking Ahead – Alternatives and Regulation

As the debate around AI data usage continues to evolve, there are promising alternatives and regulatory measures that could provide clearer guidance and more ethical practices in AI training. Here’s a detailed exploration of future directions:

Opt-in Data Systems

One promising alternative is the development of opt-in data systems where artists can voluntarily contribute their work to AI training datasets under clear, fair terms. These platforms could provide compensation and control over how the data is used, ensuring that creators are part of the AI development process more transparently and respectfully. For businesses, adopting these practices could mitigate legal risks and foster a more collaborative relationship with content creators. A related step in this direction is the “Do Not Train” registry, which works on an opt-out rather than an opt-in basis; Stability AI was one of the first companies to sign up for it.
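
As a rough illustration of how a consent-aware pipeline might select training data, the sketch below keeps only works whose creators have opted in and whose URLs are not on an exclusion list. The `Work` record, the `training_set` function, the example URLs and the exclusion set are all hypothetical; they do not describe any real registry's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Work:
    url: str
    creator: str
    opted_in: bool  # hypothetical consent flag recorded at upload time

def training_set(works, do_not_train):
    """Keep works whose creator opted in and whose URL is not excluded."""
    return [w for w in works if w.opted_in and w.url not in do_not_train]

works = [
    Work("https://example.com/a.png", "alice", opted_in=True),
    Work("https://example.com/b.png", "bob", opted_in=True),
    Work("https://example.com/c.png", "carol", opted_in=False),
]
# Hypothetical exclusion list, e.g. URLs a creator has asked to withdraw.
do_not_train = {"https://example.com/b.png"}

selected = training_set(works, do_not_train)
print([w.url for w in selected])  # ['https://example.com/a.png']
```

Checking both conditions means a work is used only when consent was given and has not since been withdrawn.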

Legislative Solutions

The need for updated copyright laws specific to AI is becoming increasingly evident as the technology advances. Legislative solutions could include provisions that specifically address the nuances of AI data scraping and usage, providing clearer rules for what constitutes fair use in the context of AI and ensuring adequate compensation for creators. Such laws would protect artists and give businesses a more defined operational framework, reducing uncertainty and promoting ethical practices.

The Evolving Conversation

The ongoing dialogue among artists, AI developers, policymakers and legal experts is necessary to address the complex issues surrounding AI governance and data ethics. Engaging in this dialogue can help businesses stay ahead of regulatory changes and align their practices with societal values and expectations.

The use of AI in creative domains is a significant ethical debate centred around the balance between technological advancement and the protection of artistic rights. Data poisoning is one tactic artists can use to assert control over their work, challenging the practices of AI developers and highlighting the need for greater respect and compensation for creative contributions.

These issues underscore the need for better AI governance, regulatory frameworks and more ethical data practices in AI development. Solutions such as opt-in data systems and legislative reforms must be considered to ensure that the advancement of AI technologies does not come at the expense of creators' rights.

Businesses and policymakers must work together to establish practices that support innovation while respecting and upholding the rights of artists and content creators.

