
Demystifying the real "AI migrant workers": mechanically "doing tasks" to survive, for hourly wages as low as $1

Author: Finance

1. Even the most powerful AI is backed by people: it takes large amounts of human-labeled data to train it. Data annotation is a job people want to automate, and often assume has been automated, but it still requires human hands.

2. Annotators call the job "doing tasks," and often have no idea what they are actually working on.

3. They earn meager incomes, as low as $1 to $3 per hour.

4. One labeler complained: "If I'm making someone a billionaire and I only make a few dollars a week, I'm really wasting my life."

Behind the AI mania lies an army of migrant workers

The rise of artificial intelligence (AI) has not only displaced a number of jobs but also created new industries, and a vast army of AI "migrant workers" who make a living "doing tasks" is quietly emerging.

Joe, 30, graduated a few months ago from a university in Kenya's capital, Nairobi, and landed a job as a data labeler. The work is tedious: he spends entire days processing the raw information used to train AI. AI learns by finding patterns in massive amounts of data, but that data must first be classified and labeled by humans. Humans are the vast workforce behind these machines.

In Joe's case, the job was tagging video for self-driving cars, frame by frame and from every possible camera angle, marking every car, pedestrian, cyclist, and anything else a driver needs to pay attention to. It is difficult, repetitive work. A clip only a few seconds long could take 8 hours to annotate, for which Joe earned about $10 (roughly 72 yuan).

In 2019, however, an opportunity came Joe's way. A new company desperately needed labelers and set up a bootcamp to train them. Joe became the head of the camp, earning four times what he had made as a labeler.

Every two weeks, 50 new recruits lined up to enter an office building in Nairobi to begin their apprenticeships, giving the impression that the demand for labelers was limitless. They were asked to categorize the clothing seen in mirror selfies, judge what room a robot vacuum cleaner was in from its point of view, and draw boxes around motorcycles in lidar scans.

More than half of Joe's students usually dropped out before the end of bootcamp. "Some people don't know how to stay in one place for long," he explained tactfully. "It's just boring," he admitted.

"Do the task"

But in a place where jobs are scarce, it is a way to survive. In the end, Joe turned out hundreds of graduates. After the camp, the students went home to work alone in their bedrooms and kitchens, forbidden to tell anyone what they were doing. In practice, confidentiality was hardly a problem, because they barely knew what they were doing themselves.

Labeling obstacles for self-driving cars was easy for the new students to grasp, but sorting through garbled snippets of dialogue, with no way of telling whether the speaker was a robot or a human, was not. They uploaded photos of themselves: first staring blankly at the camera, then grinning, then wearing a motorcycle helmet. Each project was a small piece of some larger program, so it was hard to say exactly what AI they were training. Nor could they glean clues from the project names: "Crab Descendants," "Whale Segment," "Woodland Gyroscope," and "Pillbox Sausage" were cryptic code names that gave nothing away.

So who exactly were they working for? Most knew it only as Remotasks, a website offering work to fluent English speakers. Like most annotators, Joe didn't know that Remotasks is an outsourcing operation owned by Scale AI, an American AI annotation company. Scale AI is a multibillion-dollar Silicon Valley data vendor whose customers include OpenAI and the U.S. military. Yet neither Remotasks' nor Scale AI's website mentions the relationship.

Much of the attention around large language models like ChatGPT has focused on the jobs AI will automate away. But behind even the most powerful AI are people: it takes vast amounts of human labor to label the data that trains it and to disambiguate the data when it gets "confused." Only companies that can afford this data can compete, and once they have it, they guard it closely. As a result, with few exceptions, little is known about the information that shapes AI systems' behavior, and even less about the people doing the shaping.

For Joe's students, it was a "very abnormal" job: no schedule, no colleagues, no idea what they were working on, and no idea who they were working for. In fact, they rarely called this labor "work" at all, only "tasks." They were taskers.

Anthropologist David Graeber coined a term for meaningless work: "bullshit jobs," jobs that could be automated but, for reasons of bureaucracy, status, or inertia, are not. AI labeling is the opposite: work that people want to automate, and often assume has been automated, yet which still requires human stand-ins. These jobs do have a purpose; the workers just often have no idea what it is.

The big business of labeling

The current AI boom rests on monotonous, repetitive labor of unprecedented scale.

In 2007, Fei-Fei Li, an AI researcher who was then a professor at Princeton University, suspected that the key to better image-recognition neural networks was training on more data: millions of labeled images rather than tens of thousands. The problem was that labeling that many photos would have taken her team of undergraduates decades and cost millions of dollars.

At the time, though, Amazon already ran a crowdsourcing platform, Mechanical Turk, where people around the world completed small tasks for small sums. Li recruited thousands of annotators through Mechanical Turk and created the labeled dataset ImageNet, which enabled a breakthrough in machine learning, reinvigorated the field, and ushered in a decade of progress.

Today, labeling remains a fundamental component of AI development. But engineers often treat annotation as a fleeting, inconvenient prerequisite to the more glamorous work of building large models. Collect as much labeled data as you can afford to train the model, and if it works, in theory at least, you won't need labelers anymore. In practice, labeling is never really done. Researchers describe machine learning systems as "brittle": they fail easily when they encounter something not well represented in their training data. These failures are known as "edge cases," and they can have serious consequences.

In 2018, for example, a self-driving test car operated by ride-hailing giant Uber struck and killed a woman. The car's software was programmed to avoid cyclists and pedestrians, but it didn't know what to make of a person walking a bicycle across the street. As more AI systems are sent out into the world to offer legal advice and medical help, they will encounter more edge cases, and more humans will be needed to handle them. This has spawned a global industry of people like Joe, who use their uniquely human faculties to help the machines.

Labeling is big business. Scale AI was founded in 2016 by then-19-year-old Alexandr Wang; by 2021 it was valued at $7.3 billion, landing Wang on Forbes' list of the youngest self-made billionaires. Since then, though, the value of his shares has fallen on the secondary market.

"The labeling business has a complete supply chain," said Sonam Jindal, head of programs and research at the nonprofit Partnership on AI. "All the excitement revolves around building AI, and once we've built it, annotation is no longer needed, so why think about it? But annotation is the infrastructure of AI. Human intelligence is the foundation of AI, and we need to treat these as real jobs in the AI economy, jobs that are going to be around for a while."

Well-known AI companies such as OpenAI, Google, and Microsoft all have their own data vendors. Some are private outsourcing firms with call-center-like offices, such as CloudFactory in Kenya and Nepal, where Joe labeled for $1.20 an hour before switching to Remotasks. There are also "crowdsourcing" sites like Mechanical Turk and Clickworker, where anyone can sign up to complete tasks. In the middle sit services like Remotasks: anyone can sign up, but everyone must pass qualification exams and training courses and submit to performance monitoring.

How do you get the work?

Earlier this year, a reporter registered on Remotasks, the website of Scale AI's outsourcing operation. The process was simple: after entering computer specs, internet speed, and some basic contact information, the reporter arrived at the "training center." To receive paid assignments, one must first complete a relevant, unpaid introductory course.

The training center listed a series of courses with puzzling names, such as "Glue Swimsuit" and "Poster Hawaii." The reporter clicked into one called "GFD Modular," which involved labeling clothing in social media photos.

The course's instructions were strange: largely the same directive repeated over and over, emphasized in colored text and capital letters, in the style of a collaged bomb threat.

"DO label items that are real and can be worn by humans, or are intended to be worn by real people," the instructions demanded.

"All the items below should be labeled because they are real and can be worn by real people in real life," the instructions repeated. The examples included an AJ-brand ad, a man wearing a Star Wars Kylo Ren helmet, and a mannequin in a dress, each overlaid with a lime-green box and a caption explaining once more: "Label real things that real people can wear."

For items that should not be labeled, the instructions were just as emphatic: "The items below should NOT be labeled, because humans could not possibly wear them in real life!"

Feeling confident in his powers of discernment, the reporter began the test. First came a photo of a magazine showing a woman in a dress. Was the clothing in the photo real? The reporter thought not, since no one could wear the clothing in a photo. Wrong: in the AI's view, a photo of real clothing is real clothing. Next came a photo of a woman in a dimly lit bedroom, standing in front of a full-length mirror. The shirt and shorts she wore were real. What about their reflection? Also real: the reflection of real clothing is real clothing too.

After an embarrassing amount of trial and error, the reporter finally got to work, only to discover, to his horror, that the instructions he had struggled to follow had been updated and clarified many times and had grown into a 43-page manual: don't label open suitcases full of clothes; label shoes but not flippers; label leggings but not tights; don't label towels even if someone is wearing one; label costumes but not armor.

Income is meagre

Most of the work on Remotasks is paid by the piece, with a task earning anywhere from a few cents to a few dollars. Because a task can take seconds or hours, pay is hard to predict. Labelers say that when Remotasks first arrived in Kenya, the pay was relatively good, averaging around $5 to $10 an hour depending on the task. But over time, it fell.

Scale AI spokesperson Anna Franko said the company's economists analyze the specifics of each project, the skills required, the regional cost of living, and other factors "to ensure fair and competitive compensation." Former Scale AI employees said labeler pay is set through a mechanism resembling dynamic pricing, adjusting to the number of available labelers and the urgency of the data need.

According to interviews and job postings, Remotasks annotators in the United States typically earn $10 to $25 an hour, with some experts in specialized annotation fields paid more. By the beginning of this year, Kenyan labelers' pay had dropped to $1 to $3 an hour.

And that is when there is money to be made at all. The most common complaint annotators have about Remotasks is its instability. The work is steady enough that someone can do it full-time for long stretches, yet too unpredictable to rely on. Labelers spend hours reading instructions and completing unpaid training, only to finish a few dozen tasks before the project ends. Then there may be nothing new for days; then, suddenly, a completely different task appears, one that could last anywhere from a few hours to a few weeks. Any task might be their last, and they never know when the next one will come.

Engineers and data vendors say this on-again, off-again rhythm follows the pace of AI development. Training a large model requires a burst of labeling, followed by more for iterative updates. Engineers want it all done as fast as possible to hit target release dates: perhaps thousands of annotators for a few months, then a few hundred, then a dozen specialists of a particular kind, then thousands again. "The question is, who bears the cost of these fluctuations in demand?" said Jindal of the Partnership on AI. "Because right now, it's the labelers."

Huddle for warmth

To succeed, labelers work together. Victor started working for Remotasks while attending university in Nairobi. When the reporter told him about his struggles with the traffic-guidance task, Victor said everyone knew to stay away from it: too tricky, badly paid, not worth doing.

Like many annotators, Victor uses unofficial WhatsApp group chats to spread the word when a good task comes in. When he figures out a new one, he improvises, starting a call on Google Meet, the video-conferencing service, to show others how to complete it. Anyone can join, work through it together, and share tips. "It's a culture of helping each other that we've built, because we know one person can't master all the skills," he said.

Because work can appear and vanish without warning, labelers must stay vigilant. Victor found that projects tended to pop up in the middle of the night, so he got into the habit of waking every three hours or so to check his task queue. When a task appeared, he would stay awake working for as long as he could. Once, he stayed up 36 hours straight labeling elbows, knees, and heads in photos of crowds, with no idea why. Another time, he stayed up so long that his mother asked him what was wrong with his eyes. He looked in the mirror and realized they were swollen.

Labelers usually know only vaguely that they are training AI for companies somewhere else, but sometimes the veil of anonymity slips: an instruction mentions a brand, or a chatbot reveals too much. "I read some of the material and found out through a Google search that I was working for a 25-year-old billionaire," said one worker, who was labeling the emotions of people calling in to order Domino's pizza.

"If I'm making someone a billionaire and I only make a few dollars a week, I'm really wasting my life," he said bitterly.

No one will remember us

Victor describes himself as an AI "fanatic." He started labeling because he wanted to help bring about a fully automated, post-work future. But earlier this year, someone in his WhatsApp group posted a Time magazine story about labelers who trained ChatGPT to recognize toxic content and were paid less than $2 an hour.

"It's outrageous that these companies are so lucrative but pay labelers so little," Victor said. When told about Remotasks' relationship with Scale AI, he realized that the instructions for one of his tasks were nearly identical to those used by OpenAI, which meant he may well have been training ChatGPT too, for about $3 an hour.

"I remember someone posting that we would be remembered in the future," he said, "and someone else replied that we're treated worse than foot soldiers and will be remembered for nothing. I remember that exchange vividly. No one will recognize the work we did or the effort we put in."

Phoenix Tech's "AI Outpost" will continue to follow this story.

This article is from Phoenix Technology.