
Are no-code AI development platforms really all they're cracked up to be? AI researchers warn the trained models can be biased


Reporting by XinZhiyuan

Editor: LRS

【New Zhiyuan Introduction】With a no-code AI development platform, you can train an AI model without writing a single line of code! But AI researchers at Queen Mary University of London warn that such platforms may produce biased models, and users may be completely unaware of it. The developers strongly disagree: if the user's data is biased, that's not our fault!

With the continuous development of artificial intelligence technology, AI has also begun to set off a technological revolution within major companies.

A report in The AI Journal notes that executives at major companies generally believe AI can make business processes more efficient and help create new business models and products. At PwC, for example, 86 percent of senior decision-makers believe AI is already a mainstream technology within their company.

Companies need data-science experts to analyze their data, but AI-driven technology trends are moving so fast that the talent market cannot keep up.


And so the no-code AI development platform was born.

As the name suggests, no-code AI development means you can build AI applications without writing code. Such tools abstract away the complex modules needed to build a sound AI system and present them visually, so that people who are not data-science experts can also develop machine-learning models tailored to different business needs.

In fact, the no-code trend is not limited to AI; ordinary application development is heading the same way. Gartner, the well-known IT consultancy, predicts that by 2024, 65% of AI application development will use no-code or low-code methods.

But abstracting away the data-science work carries real risk: non-experts do not understand the model's underlying logic, so what the model can and cannot do, and what flaws it has, are easily overlooked during no-code development.

No-code AI development platforms

Well-known no-code AI development platforms include DataRobot, Google AutoML, Lobe (acquired by Microsoft in 2018), and Amazon SageMaker. They offer different kinds of models to end customers, but they share one thing: a drag-and-drop dashboard that lets users upload or import data to train or fine-tune models, with automatic data classification and normalization. Based on the data and the prediction target the customer supplies, the platform can also automatically search for the model that best fits the task.
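The automatic model search these platforms run behind the dashboard can be sketched in a few lines: fit several candidate models and keep whichever scores best on held-out data. This is an illustrative toy, not any platform's real API; the function names and candidates are invented.

```python
# Toy sketch of the model-search loop behind a no-code dashboard:
# fit each candidate on a training split, keep the one with the
# lowest validation error. Names here are illustrative only.

def train_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m                      # baseline: always predict the mean

def train_linear(xs, ys):
    # one-variable least squares: y = a*x + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs) or 1e-9
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    b = my - a * mx
    return lambda x: a * x + b

def auto_select(xs, ys, candidates):
    """Fit each candidate on 80% of the data, return the (name, model)
    pair with the lowest mean-squared error on the remaining 20%."""
    cut = int(len(xs) * 0.8)
    tr_x, va_x = xs[:cut], xs[cut:]
    tr_y, va_y = ys[:cut], ys[cut:]
    def mse(model):
        return sum((model(x) - y) ** 2 for x, y in zip(va_x, va_y)) / len(va_x)
    fitted = [(name, fn(tr_x, tr_y)) for name, fn in candidates]
    return min(fitted, key=lambda nm: mse(nm[1]))

xs = list(range(20))
ys = [2 * x + 1 for x in xs]                # clean linear trend
name, model = auto_select(xs, ys, [("mean", train_mean), ("linear", train_linear)])
print(name)  # → linear
```

Real platforms search over far richer candidate families and tune hyperparameters too, but the principle is the same: the selection criterion is purely statistical, which is exactly why a biased dataset yields a confidently biased winner.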


Tasks such as preprocessing data and selecting models have traditionally required a data scientist's guidance; abstracting these steps away and searching automatically for good solutions greatly reduces the need for professional data scientists.

Using a no-code AI platform, a user can upload a spreadsheet of data to the interface, pick options from a menu, and start building a model.

Based on the uploaded data, the platform trains different models to find patterns in text, audio, images, and video. A common task inside a company is analyzing sales records and marketing data to predict future sales.

At first glance, no-code development tools seem wonderfully convenient! Compared with programmers going bald in front of a screen, no-code development has clear advantages in accessibility, usability, speed, cost, and scalability. Everything you need has already been packaged up by data experts; all that's left is to train the model and make predictions for your specific scenario.

But there is no free lunch in this world, and every shortcut comes with a hidden price tag.

Mike Cook, an AI researcher at Queen Mary University of London, points out that while most platforms quietly imply that customers are responsible for their models' prediction errors, that is hardly what the tools advertise!

Their marketing keeps stressing how smart the tools are, which can lead people to stop paying attention to the importance of debugging and auditing their models.


"These AI tools all have one thing in common: like everything tied to the AI craze, they look and sound serious, official, and secure. So if they tell you that using this new model will improve your prediction accuracy by 20%, then unless they explain it, you probably won't ask why the performance improved. It's not that you're more likely to create a biased model; it's that you may not realize the model has these problems, which can matter a great deal in real applications."

This phenomenon is known as automation bias: people tend to over-trust the predictions of automated decision-making systems.

A 2018 Microsoft Research study suggests that even making a machine-learning model's internals highly transparent to users can backfire: users' faith in the model grows rather than their scrutiny.


In 2020, the University of Michigan and Microsoft Research published another paper showing that even experts tend to over-trust visual summaries of models, even when those visualizations are mathematically meaningless.


It makes sense if you think about it: who doesn't like looking at the figures in a paper? Draw a beautiful diagram, and the credibility of your results skyrockets.

Bias is particularly serious in computer vision. CV models readily absorb bias from their training images: even a change of background can affect accuracy, and so can photos taken with different cameras. If the dataset's classes are imbalanced, the effect can be worse still.
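Class imbalance is also why headline accuracy numbers can hide a badly biased model. A minimal sketch with invented numbers: a classifier that never predicts the rare class still looks 95% accurate.

```python
# Why class imbalance makes "accuracy" misleading: with 95% negatives,
# a model that never predicts the positive class scores 95% accuracy
# while missing every single positive. Toy numbers, not from the article.

labels = [1] * 5 + [0] * 95          # 5 positives, 95 negatives
preds  = [0] * 100                   # model always says "negative"

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
recall   = sum(p == y == 1 for p, y in zip(preds, labels)) / labels.count(1)

print(accuracy)  # → 0.95
print(recall)    # → 0.0 (every positive is missed)
```

A no-code dashboard that surfaces only the first number gives its user no hint that the second is zero.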

Natural-language models have also been shown to be biased. If the training corpus is drawn from Reddit posts, the resulting model carries more bias around race, ethnicity, religion, and gender; for example, Black people may be more strongly associated with negative sentiment.
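One crude way to surface such associations before training is to count how often each group term co-occurs with negative words in the corpus. The tiny corpus, the negative-word list, and the neutral placeholder terms `group_a`/`group_b` below are all invented for illustration; real audits use far larger lexicons and statistical tests.

```python
# Crude corpus audit: per group term, count negative words appearing
# in the same sentence. Everything here is an invented toy example.

NEGATIVE = {"bad", "angry", "dangerous"}

def cooccurrence_bias(sentences, terms):
    """Return, for each term, the number of negative words that appear
    in the same sentence as that term."""
    scores = {t: 0 for t in terms}
    for s in sentences:
        words = set(s.lower().split())
        neg_hits = len(words & NEGATIVE)
        for t in terms:
            if t in words:
                scores[t] += neg_hits
    return scores

corpus = [
    "group_a is bad and dangerous",
    "group_a seemed angry today",
    "group_b had a nice day",
]
result = cooccurrence_bias(corpus, ["group_a", "group_b"])
print(result)  # → {'group_a': 3, 'group_b': 0}
```

A skew like this in the raw text is exactly what a model trained on the corpus will absorb and reproduce.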

What do developers say?

While the models' bias is real, vendors see it very differently from researchers.

Jonathon Reilly, co-founder of the no-code AI platform Akkio, says anyone who builds a model should understand that the better the quality of the input data, the better the model's predictions.

While he acknowledges that no-code AI development platforms have a responsibility to give users an idea of how models make decisions, he believes that the problem of data bias is the user's responsibility.


"The best way to eliminate bias in model predictions is to eliminate bias in the input data, so the model doesn't learn unwanted patterns from it. The best candidates for eliminating data bias are usually subject-matter experts, not data scientists."
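One common concrete step toward "eliminating bias in the input data" is rebalancing the training set so each group is equally represented, for instance by oversampling the under-represented group. This is only a sketch of that one technique; deciding which imbalances actually matter is the subject-matter expert's job Reilly describes.

```python
import random

# Rebalance a training set by oversampling under-represented groups
# until every group matches the largest one. Illustrative sketch only;
# the rows and the "group" key are invented.

def rebalance(rows, group_key):
    groups = {}
    for r in rows:
        groups.setdefault(r[group_key], []).append(r)
    target = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(g)
        out.extend(random.choices(g, k=target - len(g)))  # oversample shortfall
    return out

rows = [{"group": "A", "y": 1}] * 90 + [{"group": "B", "y": 0}] * 10
balanced = rebalance(rows, "group")
counts = {g: sum(r["group"] == g for r in balanced) for g in ("A", "B")}
print(counts)  # → {'A': 90, 'B': 90}
```

Oversampling duplicates existing rows rather than inventing new ones, so it fixes representation but cannot add information the minority group's data never contained.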

Bill Kish, founder of Cogniac, a no-code computer-vision startup, agrees that bias is a dataset problem rather than a tool problem. Bias often reflects existing flaws in human perception; platforms can mitigate it, but they have no obligation to completely eliminate bias from the data.

"Bias in computer-vision systems stems from inherent bias in the human-curated ground-truth data. Cogniac can act as a system of record for managing visual data assets, helping experts find bias problems in the data and address them through interactive workflows."

Given that the data used to train a model is usually supplied by the users themselves, Bill argues, pinning the blame entirely on the platform developer is something the developers will never accept.

There's also the fact that model bias doesn't come only from the training dataset.

A 2019 MIT Technology Review article pointed out that AI models adopted by some companies can be unfair and discriminatory in certain applications (such as credit scoring), and that this bias can be introduced at the data-preparation or model-selection stage, affecting the accuracy of predictions.


Developers of no-code platforms are already working to address model bias.

DataRobot has a "Humility" setting that alerts users when model performance looks too good to be true, so they can check the data and take corrective action in time, such as specifying a valid prediction range and truncating predictions that fall outside it.
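The range-and-truncate guardrail described above can be sketched in a few lines: clamp any prediction outside a user-declared plausible range and flag it for review. This is an illustrative stand-in, not DataRobot's actual implementation.

```python
# Humility-style guardrail sketch: clamp out-of-range predictions and
# flag them for human review. The range and model are invented.

def guarded_predict(model, x, lo, hi):
    """Return (prediction, flagged). Predictions outside [lo, hi] are
    clamped to the nearest bound and flagged for review."""
    raw = model(x)
    if lo <= raw <= hi:
        return raw, False                 # within the trusted range
    clamped = min(max(raw, lo), hi)
    return clamped, True                  # flagged: needs human review

model = lambda x: 10 * x                  # stand-in for a trained model
in_range = guarded_predict(model, 3, 0, 100)
flagged  = guarded_predict(model, 50, 0, 100)
print(in_range)  # → (30, False)
print(flagged)   # → (100, True)
```

The value of the flag is less the clamping itself than the audit trail: every flagged row is a prompt for exactly the kind of scrutiny automation bias suppresses.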

But for now, these de-biasing tools and techniques remain very limited, and if you don't understand where bias can arise and why, the likelihood of problems in your model only increases.

Reilly believes developers should raise user awareness and improve model transparency and accessibility, while promoting a clear regulatory framework. Businesses using AI models should be able to easily show how a model makes decisions, with supporting evidence from the development platform, giving users confidence in the ethical and legal standing of their trained models.

How good a model needs to be in the real world depends on the specific task, so there is no need to chase accuracy blindly and introduce data bias along the way.

Resources:

https://shimo.im/docs/XKgDJGwVx6dHgvvQ
