
Are no-code AI development platforms really all they're cracked up to be? AI researchers warn the trained models can be biased


Reporting by XinZhiyuan

Editor: LRS

【New Zhiyuan Introduction】With a no-code AI development platform, you can train an AI model without writing a single line of code! But AI researchers at Queen Mary University of London warn that such platforms may produce biased models, and users may be completely unaware of it. The developers strongly disagree: if the user's data is biased, that's not our fault!

With the continuous development of artificial intelligence technology, AI has also begun to set off a technological revolution within major companies.

A report in The AI Journal notes that executives at major companies generally believe AI can make business processes more efficient and help create new business models and products. At PwC, for example, 86 percent of senior decision-makers believe AI is already a mainstream technology within their company.

Companies need data-science experts to analyze their data, but AI-driven technology trends are moving so fast that the talent market cannot keep up.


And so the no-code AI development platform was born.

As the name suggests, no-code AI development means you can build AI applications without writing code. Such tools abstract away the complex modules needed to build a sound AI system and present them visually, so that people who are not data-science experts can also develop machine-learning models tailored to different business needs.

In fact, the no-code trend is not limited to AI; ordinary application development is heading the same way. Gartner, the well-known IT consultancy, predicts that by 2024, 65% of AI application development will use no-code or low-code methods.

But abstracting away the data-science work carries real risk: non-experts do not understand the model's underlying logic, so what the model can and cannot do, and what flaws it has, are easily overlooked during no-code development.

No-code AI development platforms

Well-known no-code AI development platforms include DataRobot, Google AutoML, Lobe (acquired by Microsoft in 2018), and Amazon SageMaker. They offer different kinds of models to end customers, but they share one thing: a drag-and-drop dashboard that lets users upload or import data to train or fine-tune models, with automatic data classification and normalization. Based on the data and the prediction target the customer supplies, the platform can also automatically search for the model that best fits the task.
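The automatic model search these platforms run behind the dashboard can be sketched in a few lines: fit several candidate models and keep whichever scores best on held-out data. This is an illustrative toy, not any platform's real API; the function names and candidates are invented.

```python
# Toy sketch of the model-search loop behind a no-code dashboard:
# fit each candidate on a training split, keep the one with the
# lowest validation error. Names here are illustrative only.

def train_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m                      # baseline: always predict the mean

def train_linear(xs, ys):
    # one-variable least squares: y = a*x + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs) or 1e-9
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    b = my - a * mx
    return lambda x: a * x + b

def auto_select(xs, ys, candidates):
    """Fit each candidate on 80% of the data, return the (name, model)
    pair with the lowest mean-squared error on the remaining 20%."""
    cut = int(len(xs) * 0.8)
    tr_x, va_x = xs[:cut], xs[cut:]
    tr_y, va_y = ys[:cut], ys[cut:]
    def mse(model):
        return sum((model(x) - y) ** 2 for x, y in zip(va_x, va_y)) / len(va_x)
    fitted = [(name, fn(tr_x, tr_y)) for name, fn in candidates]
    return min(fitted, key=lambda nm: mse(nm[1]))

xs = list(range(20))
ys = [2 * x + 1 for x in xs]                # clean linear trend
name, model = auto_select(xs, ys, [("mean", train_mean), ("linear", train_linear)])
print(name)  # → linear
```

Real platforms search over far richer candidate families and tune hyperparameters too, but the principle is the same: the selection criterion is purely statistical, which is exactly why a biased dataset yields a confidently biased winner.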


Tasks such as preprocessing data and selecting models have traditionally required a data scientist's guidance; abstracting these steps away and searching automatically for good solutions greatly reduces the need for professional data scientists.

Using a no-code AI platform, a user can upload a spreadsheet of data to the interface, pick options from a menu, and start building a model.

Based on the uploaded data, the platform trains different models to find patterns in text, audio, images, and video. A common task inside a company is analyzing sales records and marketing data to predict future sales.

At first glance, no-code development tools seem wonderfully convenient! Compared with programmers going bald in front of a screen, no-code development has clear advantages in accessibility, usability, speed, cost, and scalability. Everything you need has already been packaged up by data experts; all that's left is to train the model and make predictions for your specific scenario.

But there is no free lunch in this world, and every shortcut comes with a hidden price tag.

Mike Cook, an AI researcher at Queen Mary University of London, points out that while most platforms quietly imply that customers are responsible for their models' prediction errors, that is hardly what the tools advertise!

Their marketing keeps stressing how smart the tools are, which can lead people to stop paying attention to the importance of debugging and auditing their models.


"These AI tools all have one thing in common: like everything tied to the AI craze, they look and sound serious, official, and secure. So if they tell you that using this new model will improve your prediction accuracy by 20%, then unless they explain it, you probably won't ask why the performance improved. It's not that you're more likely to create a biased model; it's that you may not realize the model has these problems, which can matter a great deal in real applications."

This phenomenon is known as automation bias: people tend to over-trust the predictions of automated decision-making systems.

A 2018 Microsoft Research study suggests that even making a machine-learning model's internals highly transparent to users can backfire: users' faith in the model grows rather than their scrutiny.


In 2020, the University of Michigan and Microsoft Research published another paper showing that even experts tend to over-trust visual summaries of models, even when those visualizations are mathematically meaningless.


It makes sense if you think about it: who doesn't like looking at the figures in a paper? Draw a beautiful diagram, and the credibility of your results skyrockets.

Bias is particularly serious in computer vision. CV models readily absorb bias from their training images: even a change of background can affect accuracy, and so can photos taken with different cameras. If the dataset's classes are imbalanced, the effect can be worse still.
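Class imbalance is also why headline accuracy numbers can hide a badly biased model. A minimal sketch with invented numbers: a classifier that never predicts the rare class still looks 95% accurate.

```python
# Why class imbalance makes "accuracy" misleading: with 95% negatives,
# a model that never predicts the positive class scores 95% accuracy
# while missing every single positive. Toy numbers, not from the article.

labels = [1] * 5 + [0] * 95          # 5 positives, 95 negatives
preds  = [0] * 100                   # model always says "negative"

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
recall   = sum(p == y == 1 for p, y in zip(preds, labels)) / labels.count(1)

print(accuracy)  # → 0.95
print(recall)    # → 0.0 (every positive is missed)
```

A no-code dashboard that surfaces only the first number gives its user no hint that the second is zero.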

Natural-language models have also been shown to be biased. If the training corpus is drawn from Reddit posts, the resulting model carries more bias around race, ethnicity, religion, and gender; for example, Black people may be more strongly associated with negative sentiment.
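One crude way to surface such associations before training is to count how often each group term co-occurs with negative words in the corpus. The tiny corpus, the negative-word list, and the neutral placeholder terms `group_a`/`group_b` below are all invented for illustration; real audits use far larger lexicons and statistical tests.

```python
# Crude corpus audit: per group term, count negative words appearing
# in the same sentence. Everything here is an invented toy example.

NEGATIVE = {"bad", "angry", "dangerous"}

def cooccurrence_bias(sentences, terms):
    """Return, for each term, the number of negative words that appear
    in the same sentence as that term."""
    scores = {t: 0 for t in terms}
    for s in sentences:
        words = set(s.lower().split())
        neg_hits = len(words & NEGATIVE)
        for t in terms:
            if t in words:
                scores[t] += neg_hits
    return scores

corpus = [
    "group_a is bad and dangerous",
    "group_a seemed angry today",
    "group_b had a nice day",
]
result = cooccurrence_bias(corpus, ["group_a", "group_b"])
print(result)  # → {'group_a': 3, 'group_b': 0}
```

A skew like this in the raw text is exactly what a model trained on the corpus will absorb and reproduce.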

What do developers say?

While the models' bias is real, vendors see it very differently from researchers.

Jonathon Reilly, co-founder of the no-code AI platform Akkio, says anyone who builds a model should understand that the better the quality of the input data, the better the model's predictions.

While he acknowledges that no-code AI development platforms have a responsibility to give users an idea of how models make decisions, he believes that the problem of data bias is the user's responsibility.


"The best way to eliminate bias in model predictions is to eliminate bias in the input data, so the model doesn't learn unwanted patterns from it. The best candidates for eliminating data bias are usually subject-matter experts, not data scientists."
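One common concrete step toward "eliminating bias in the input data" is rebalancing the training set so each group is equally represented, for instance by oversampling the under-represented group. This is only a sketch of that one technique; deciding which imbalances actually matter is the subject-matter expert's job Reilly describes.

```python
import random

# Rebalance a training set by oversampling under-represented groups
# until every group matches the largest one. Illustrative sketch only;
# the rows and the "group" key are invented.

def rebalance(rows, group_key):
    groups = {}
    for r in rows:
        groups.setdefault(r[group_key], []).append(r)
    target = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(g)
        out.extend(random.choices(g, k=target - len(g)))  # oversample shortfall
    return out

rows = [{"group": "A", "y": 1}] * 90 + [{"group": "B", "y": 0}] * 10
balanced = rebalance(rows, "group")
counts = {g: sum(r["group"] == g for r in balanced) for g in ("A", "B")}
print(counts)  # → {'A': 90, 'B': 90}
```

Oversampling duplicates existing rows rather than inventing new ones, so it fixes representation but cannot add information the minority group's data never contained.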

Bill Kish, founder of Cogniac, a no-code computer-vision startup, agrees that bias is a dataset problem rather than a tool problem. Bias often reflects existing flaws in human perception; platforms can mitigate it, but they have no obligation to completely eliminate bias from the data.

"Bias in computer-vision systems stems from inherent bias in the human-curated ground-truth data. Cogniac can act as a system of record for managing visual data assets, helping experts find bias problems in the data and address them through interactive workflows."

Given that the data used to train a model is usually supplied by the users themselves, Bill argues, pinning the blame entirely on the platform developer is something the developers will never accept.

There's also the fact that model bias doesn't come only from the training dataset.

A 2019 MIT Technology Review article pointed out that AI models adopted by some companies can be unfair and discriminatory in certain applications (such as credit scoring), and that this bias can be introduced at the data-preparation or model-selection stage, affecting the accuracy of predictions.


Developers of no-code platforms are already working to address model bias.

DataRobot has a "Humility" setting that alerts users when model performance looks too good to be true, so they can check the data and take corrective action in time, such as specifying a valid prediction range and truncating predictions that fall outside it.
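The range-and-truncate guardrail described above can be sketched in a few lines: clamp any prediction outside a user-declared plausible range and flag it for review. This is an illustrative stand-in, not DataRobot's actual implementation.

```python
# Humility-style guardrail sketch: clamp out-of-range predictions and
# flag them for human review. The range and model are invented.

def guarded_predict(model, x, lo, hi):
    """Return (prediction, flagged). Predictions outside [lo, hi] are
    clamped to the nearest bound and flagged for review."""
    raw = model(x)
    if lo <= raw <= hi:
        return raw, False                 # within the trusted range
    clamped = min(max(raw, lo), hi)
    return clamped, True                  # flagged: needs human review

model = lambda x: 10 * x                  # stand-in for a trained model
in_range = guarded_predict(model, 3, 0, 100)
flagged  = guarded_predict(model, 50, 0, 100)
print(in_range)  # → (30, False)
print(flagged)   # → (100, True)
```

The value of the flag is less the clamping itself than the audit trail: every flagged row is a prompt for exactly the kind of scrutiny automation bias suppresses.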

But for now, these de-biasing tools and techniques remain very limited, and if you don't understand where bias can arise and why, the likelihood of problems in your model only increases.

Reilly believes developers should raise user awareness and improve model transparency and accessibility, while promoting a clear regulatory framework. Businesses using AI models should be able to easily show how a model makes decisions, with supporting evidence from the development platform, giving users confidence in the ethical and legal standing of their trained models.

How good a model needs to be in the real world depends on the specific task, so there is no need to chase accuracy blindly and introduce data bias along the way.

Resources:

https://shimo.im/docs/XKgDJGwVx6dHgvvQ
