
Spending just $60 can poison 0.01% of a dataset and significantly degrade AI model performance

Editor: Yuan Mingyi

Web-scale datasets are vulnerable to low-cost poisoning attacks: compromising only a small fraction of the samples is enough to poison the entire model.

The datasets used to train deep learning models have grown from thousands of curated examples to web-scale collections with billions of samples automatically crawled from the internet. At this scale, it is infeasible to manually verify the quality of each example. So far, this trade-off between quantity and quality has been acceptable, partly because modern neural networks are highly resilient to large amounts of label noise, and partly because training on noisy data can even improve model utility on out-of-distribution data.

While large deep learning models tolerate a fair amount of random noise, even a very small amount of adversarial noise in the training set (i.e., a poisoning attack) is sufficient to introduce targeted errors into model behavior. Previous research has suggested that poisoning modern deep learning models is feasible when datasets are not manually curated. However, despite the potential threat, no real attack poisoning a web-scale dataset appears to have taken place so far. Part of the reason may be that previous research has overlooked the question of how adversaries can ensure that their corrupted data actually ends up in a web-scale dataset.

In this paper, researchers from Google, ETH Zurich, and other institutions introduce two new data poisoning attacks:

Split-view data poisoning: The first attack targets current large datasets (e.g., LAION-400M) and exploits the fact that the data the dataset curator sees at collection time can differ, significantly and arbitrarily, from the data end users see at training time.

Frontrunning data poisoning: The second attack targets popular datasets built from periodic snapshots of crowd-sourced content, such as Wikipedia. Even if content moderators detect and revert the malicious edits after the fact, the attacker's content persists in the snapshot on which deep learning models are trained.


Address: https://arxiv.org/pdf/2302.10149.pdf

The study explored the feasibility of both attacks on 10 popular datasets. The results show that these attacks are feasible even for low-resource attackers: for just $60, they can poison 0.01% of the LAION-400M or COYO-700M datasets.

To counter these attacks, the paper describes two defenses:

Integrity verification: prevents split-view poisoning by distributing cryptographic hashes of all indexed content (sketched below);

Timing-based defense: prevents frontrunning poisoning by randomizing the order and timing in which web-scale datasets snapshot their data.

The paper also discusses the limitations of these defenses and possible future solutions.
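As a concrete illustration of the first defense, the sketch below checks downloaded content against a SHA-256 digest stored alongside each URL in the dataset index. This is a minimal sketch, not the paper's implementation; the entry fields (`url`, `sha256`) and the helper name are hypothetical.

```python
import hashlib
import urllib.request

def verify_and_download(entry: dict, timeout: int = 10) -> bytes | None:
    """Download one index entry and accept it only if its SHA-256 digest
    matches the hash recorded when the dataset was collected.

    `entry` is assumed to look like {"url": ..., "sha256": ...}; real
    web-scale indexes would need these hashes added by the maintainer.
    """
    with urllib.request.urlopen(entry["url"], timeout=timeout) as resp:
        data = resp.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != entry["sha256"]:
        # Content at the URL changed after indexing (e.g. an expired,
        # re-registered domain now serving poisoned data): discard it.
        return None
    return data
```

A maintainer would compute and publish these digests once at collection time; any split-view substitution then fails the check, at the cost of also rejecting benign content updates.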

Two means of attack

Split-view poisoning

The first poisoning method exploits the fact that while the index of URLs published by a dataset maintainer is fixed, the content hosted at those URLs can change.

The study observes that domain names routinely expire, and once they do, anyone can buy them; expired domains are common in large datasets. By purchasing such a domain, an attacker can serve poisoned data to anyone who downloads from it in the future.

The study also notes that attackers already purchase expired domain names in order to inherit the residual trust associated with them.

The study shows that split-view poisoning is effective in practice because the indexes of most web-scale datasets remain unchanged long after they are first published, even after a large portion of the underlying data has gone stale. Crucially, very few (and no modern) datasets include any form of cryptographic integrity check on the downloaded content.
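As a rough illustration (not from the paper), a maintainer or auditor could estimate this exposure by grouping an index's URLs by host and flagging hosts that no longer resolve, a crude proxy for domains that may have lapsed and become purchasable. A real audit would check registration status and group by registrable domain rather than raw host name.

```python
import socket
from collections import Counter
from urllib.parse import urlparse

def audit_index(urls: list[str]) -> Counter:
    """Count how many index entries point at hosts that no longer resolve.

    A failed DNS lookup is only a rough signal that a domain may have
    lapsed; a proper audit would consult WHOIS/registration data.
    """
    per_host = Counter(urlparse(u).hostname for u in urls)
    dead = Counter()
    for host, n in per_host.items():
        if host is None:
            continue
        try:
            socket.gethostbyname(host)
        except socket.gaierror:
            dead[host] = n  # n index entries depend on this unresolvable host
    return dead

# Example: sum(audit_index(index_urls).values()) gives the number of samples
# an adversary could control by registering those hosts.
```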

Frontrunning data poisoning

The second attack extends split-view poisoning to web resources over which the attacker has no persistent control of the dataset's source. Instead, the attacker can only modify the content for a short window, possibly just minutes, before the malicious edits are detected and reverted.

Frontrunning attacks rely on the fact that, in some cases, an adversary can accurately predict when a web resource will be accessed for inclusion in a dataset snapshot. The attacker can therefore poison the content just before the snapshot is collected, front-running the moderators who will later revert the malicious edit. In the case of Wikipedia, an attacker can predict the snapshot time of any article down to the minute.
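The timing idea can be illustrated with a toy sketch under strong simplifying assumptions that are not from the paper: that a snapshot job starts at a known time and serializes articles in a known order at a roughly constant rate. The function name and example rate are hypothetical.

```python
from datetime import datetime, timedelta

def predict_snapshot_time(dump_start: datetime,
                          article_position: int,
                          articles_per_minute: float) -> datetime:
    """Toy estimate of when a given article will be serialized into a dump.

    Assumes (hypothetically) that articles are processed in a fixed order
    at a constant rate; an attacker would edit shortly before this time so
    the page is captured before moderators revert the change.
    """
    minutes = article_position / articles_per_minute
    return dump_start + timedelta(minutes=minutes)

# e.g. an article that is 1,200,000th in dump order, at ~3,000 articles per
# minute, is snapshotted about 400 minutes after the dump begins.
predicted = predict_snapshot_time(datetime(2023, 2, 1, 0, 0), 1_200_000, 3_000.0)
```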

Attack results

The rightmost column of Table 1 summarizes the findings. Even the oldest and least frequently accessed datasets see at least three downloads per month; over the six months of tracking, more than 800 downloads could have been compromised by the attacks described here. Unsurprisingly, newer datasets receive more requests than older ones. Different datasets therefore offer attackers different trade-offs: newer datasets have a smaller percentage of purchasable images, but an attack on them reaches more vulnerable clients.


Measuring the cost of an attack. The most immediate question is whether this attack is practical; its main constraint is the monetary cost of purchasing domains, measured here using prices reported by Google Domains in August 2022. Figure 1 shows the proportion of images in each dataset that an attacker can control as a function of budget. The study found that at least 0.01% of the data in every dataset can be controlled for under $60 per year.
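The kind of calculation behind Figure 1 can be approximated with a simple greedy budget allocation over candidate domains; the field names and example numbers below are illustrative assumptions, not the paper's data.

```python
def poisonable_fraction(domains: list[dict], budget: float, dataset_size: int) -> float:
    """Greedy estimate of the fraction of a dataset an attacker could control.

    Each entry in `domains` is assumed to look like
    {"name": ..., "price_per_year": ..., "image_count": ...}.
    """
    # Buy the most cost-effective domains first (images per dollar).
    ranked = sorted(domains,
                    key=lambda d: d["image_count"] / d["price_per_year"],
                    reverse=True)
    spent, controlled = 0.0, 0
    for d in ranked:
        if spent + d["price_per_year"] > budget:
            continue
        spent += d["price_per_year"]
        controlled += d["image_count"]
    return controlled / dataset_size

# e.g. poisonable_fraction(expired_domains, budget=60.0, dataset_size=400_000_000)
```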


By monitoring requests to the domains they purchased, the researchers plotted when each URL was requested, color-coded by source IP, and could directly identify dozens of Conceptual 12M users. See Figure 2 for details.

By a conservative analysis, 6.5% of Wikipedia documents could currently be poisoned in the absence of additional defenses.
