
A decade of AI review

Author: AI China

The past decade has been an exciting and eventful time for the field of artificial intelligence (AI). What began as modest exploration of the potential of deep learning has turned into an explosive proliferation of a field that now spans everything from recommender systems in e-commerce to object detection for self-driving cars and generative models that can create everything from photorealistic images to coherent text.

In this article, we'll take a stroll down memory lane and revisit some of the key breakthroughs that got us to where we are today. Whether you're an experienced AI practitioner or simply curious about the latest developments in the field, this article will give you a comprehensive overview of the remarkable advances that have made AI a household name.

2013: AlexNet and variational autoencoders

2013 is widely regarded as the year deep learning came of age, ushered in by major advances in computer vision. According to a recent interview with Geoffrey Hinton, by 2013 "almost all computer vision research had switched to neural networks." This boom was largely driven by a rather surprising breakthrough in image recognition the year before.

In 2012, AlexNet, a deep convolutional neural network (CNN), demonstrated the potential of deep learning for image recognition with record-breaking performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). It achieved a top-5 error rate of 15.3%, 10.9 percentage points lower than that of its closest competitor.


The technical improvements behind this success were instrumental in shaping the future trajectory of AI and dramatically changed the way people thought about deep learning.

First, the authors used a deep CNN consisting of five convolutional layers and three fully connected linear layers, an architectural design that many considered impractical at the time. In addition, because of the large number of parameters produced by the network's depth, training was carried out in parallel on two graphics processing units (GPUs), demonstrating that training on large datasets could be significantly accelerated. Training time was further reduced by replacing traditional activation functions such as sigmoid and tanh with the more efficient rectified linear unit (ReLU).
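To make that architecture concrete, here is a condensed PyTorch sketch in the spirit of AlexNet (assuming the torch library is available): five convolutional layers feeding three fully connected layers, with ReLU activations throughout. Channel sizes are simplified, and the original dual-GPU split, dropout, and local response normalization are omitted.

```python
import torch
import torch.nn as nn

# Illustrative AlexNet-style network: 5 conv layers + 3 fully connected layers, ReLU throughout.
class AlexNetSketch(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)                      # (batch, 256, 6, 6) for 224x224 inputs
        return self.classifier(torch.flatten(x, 1))

logits = AlexNetSketch()(torch.randn(1, 3, 224, 224))  # toy forward pass
```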


Together, these advances made AlexNet's success possible, marking a turning point in the history of artificial intelligence and triggering a surge of interest in deep learning in both academia and the tech industry. As a result, 2013 is considered by many to be the inflection point at which deep learning really began to take off.

Also arriving in 2013, albeit somewhat drowned out by the noise around AlexNet, were variational autoencoders (VAEs), generative models that can learn to represent and generate data such as images and sounds. They work by learning a compressed representation of the input data in a low-dimensional space known as the latent space. This allows them to generate new data by sampling from the learned latent space. VAEs later opened up new avenues for generative modeling and data generation, with applications in art, design, and games.
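As a rough illustration of the idea, the sketch below is a hypothetical, minimal PyTorch VAE, not the original authors' code: the encoder maps an input to the mean and log-variance of a latent distribution, a sample is drawn via the reparameterization trick, and the decoder reconstructs the input from that sample. The dimensions are arbitrary (e.g. flattened 28x28 images).

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.decoder(z), mu, logvar

# Generating new data is just sampling the latent space and decoding it:
vae = TinyVAE()
samples = vae.decoder(torch.randn(4, 16))
```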

2014: Generative adversarial networks

The following year, in June 2014, the field of deep learning witnessed another major advance with the introduction of generative adversarial networks (GANs) by Ian Goodfellow and colleagues.

A GAN is a model that generates new data samples resembling its training set. Essentially, two networks are trained simultaneously: (1) a generator network produces fake, or synthetic, samples, and (2) a discriminator network evaluates their authenticity. Training takes place in a game-like setting in which the generator tries to create samples that fool the discriminator, while the discriminator tries to correctly call out the fakes.
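The adversarial game described above can be sketched in a few lines of PyTorch. Everything here, the tiny fully connected networks, the dimensions, and the random stand-in for a real batch, is an illustrative placeholder rather than the architecture from the original paper.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch):
    batch = real_batch.size(0)
    fake = G(torch.randn(batch, latent_dim))

    # Discriminator step: label real samples 1, generated samples 0.
    d_loss = (bce(D(real_batch), torch.ones(batch, 1))
              + bce(D(fake.detach()), torch.zeros(batch, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the (freshly scored) fakes look real to D.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad()
    g_loss.backward()          # D's gradients accumulate here but are cleared next step
    opt_g.step()
    return d_loss.item(), g_loss.item()

train_step(torch.randn(32, data_dim))  # stand-in for a batch of real data
```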

At the time, GANs represented a powerful and novel data generation tool for generating not only images and videos, but also music and art. They also contributed to advances in unsupervised learning by demonstrating the possibility of generating high-quality data samples without relying on explicit labels, an area largely considered underdeveloped and challenging.

2015: ResNets and NLP breakthrough

In 2015, the field of artificial intelligence made considerable progress in computer vision and natural language processing (NLP).

Kaiming He and colleagues published a paper titled "Deep Residual Learning for Image Recognition," in which they introduced the concept of residual neural networks, or ResNets: architectures that make it easier for information to flow through the network by adding shortcuts. Unlike regular neural networks, where each layer takes only the output of the previous layer as input, ResNets add residual connections that skip one or more layers and connect directly to deeper layers in the network.

As a result, ResNets were able to mitigate the vanishing gradient problem, which allowed the training of far deeper neural networks than was thought possible at the time. This, in turn, led to significant improvements in image classification and object recognition tasks.
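A minimal residual block, sketched below in PyTorch with illustrative layer choices (no downsampling path), shows the core idea: the block's input is added back onto the output of its convolutions through a shortcut, giving gradients a direct path through the network.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                              # the shortcut ("skip") path
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)          # residual addition

y = ResidualBlock(64)(torch.randn(1, 64, 32, 32))  # shape is preserved: (1, 64, 32, 32)
```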

Around the same time, researchers made considerable progress in the development of recurrent neural networks (RNNs) and long short-term memory (LSTM) models. Although these models had existed since the 1990s, they only started to make a real splash around 2015, mainly due to the following factors: (1) larger and more diverse datasets available for training, (2) improvements in computing power and hardware that made it possible to train deeper and more complex models, and (3) modifications made along the way, such as more sophisticated gating mechanisms.

As a result, these architectures enabled language models to better understand the context and meaning of text, leading to dramatic improvements in tasks such as language translation, text generation, and sentiment analysis. The success of RNNs and LSTMs at the time paved the way for the large language models (LLMs) we see today.
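For readers who want to see what such a model looks like in code, here is a hypothetical, minimal LSTM language model in PyTorch; the vocabulary size, dimensions, and toy batch are made-up values purely for illustration.

```python
import torch
import torch.nn as nn

class TinyLSTMLM(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)            # (batch, seq_len, embed_dim)
        out, _ = self.lstm(x)                # gated recurrence over the sequence
        return self.head(out)                # next-token logits at every position

model = TinyLSTMLM()
tokens = torch.randint(0, 10_000, (2, 16))   # fake batch: 2 sequences of 16 token ids
logits = model(tokens)
print(logits.shape)                          # torch.Size([2, 16, 10000])
```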

2016: AlphaGo

After Garry Kasparov was defeated by IBM's Deep Blue in 1997, another human-machine battle sent shockwaves through the gaming world in 2016: Google's AlphaGo defeated Go world champion Lee Sedol.


Lee Sedol's defeat marked another important milestone in the trajectory of artificial intelligence: it showed that machines could outperform even the most skilled human players in a game once considered too complex for computers to handle. AlphaGo combined deep reinforcement learning and Monte Carlo tree search, analyzing millions of positions from previous games to evaluate the best possible move, a strategy that far outperformed human decision-making in this setting.
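As a flavor of the search side of that combination, the sketch below shows the UCT selection rule at the heart of a generic Monte Carlo tree search. It is a simplified, hypothetical illustration: AlphaGo's actual system additionally guides the search with learned policy and value networks and many engineering refinements.

```python
import math

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}        # move -> child Node
        self.visits = 0
        self.value_sum = 0.0      # accumulated simulation results for this node

    def uct_score(self, c=1.4):
        # Unvisited moves are explored first; otherwise trade off the average
        # simulation value (exploitation) against a visit-count bonus (exploration).
        if self.visits == 0:
            return float("inf")
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def select_child(node):
    """Descend one level by picking the child with the highest UCT score."""
    return max(node.children.values(), key=lambda child: child.uct_score())

# Toy usage: an unvisited move gets priority over a partially explored one.
root = Node()
root.visits = 10
for move in ("A", "B"):
    root.children[move] = Node(parent=root)
root.children["A"].visits, root.children["A"].value_sum = 4, 3.0
best = select_child(root)     # selects child "B", which is still unvisited
```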

2017: Transformer architecture and language models

Arguably 2017 was the most critical year, laying the groundwork for the breakthroughs we are witnessing today in generative AI.

In June 2017, Vaswani and colleagues published the foundational paper "Attention Is All You Need," which introduced the transformer architecture, built around the concept of self-attention for processing sequential input data. This allows long-range dependencies to be handled far more efficiently, something that had previously been a challenge for traditional RNN architectures.


A transformer consists of two basic components: an encoder and a decoder. The encoder is responsible for encoding the input data, which might be, for example, a sequence of words. It takes the input sequence and applies multiple layers of self-attention and feedforward neural networks to capture the relationships and features within the sentence and learn meaningful representations.

Essentially, self-attention allows the model to understand the relationships between the different words in a sentence. Unlike traditional models that process words in a fixed order, transformers examine all the words at once, assigning each word an attention score based on its relevance to the other words in the sentence.

The decoder, on the other hand, takes the encoded representation from the encoder and generates an output sequence. In tasks such as machine translation or text generation, the decoder produces the translated sequence based on the input received from the encoder. Like the encoder, the decoder consists of multiple layers of self-attention and feedforward neural networks. However, it includes an additional attention mechanism that allows it to focus on the encoder's output, so that the decoder can consider relevant information from the input sequence when generating the output.
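The self-attention computation at the core of both components can be written compactly. The sketch below is a single-head, illustrative version (not the full multi-head machinery from the paper) using made-up dimensions: every position attends to every other position, and the attention scores weight how much each word's representation contributes to the others.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    q, k, v = x @ w_q, x @ w_k, x @ w_v                     # project into queries/keys/values
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5    # pairwise relevance, scaled
    weights = F.softmax(scores, dim=-1)                     # attention scores per word pair
    return weights @ v                                      # weighted mix of value vectors

seq_len, d_model = 5, 16
x = torch.randn(seq_len, d_model)                           # one toy "sentence" of 5 tokens
w_q = torch.randn(d_model, d_model)
w_k = torch.randn(d_model, d_model)
w_v = torch.randn(d_model, d_model)
out = self_attention(x, w_q, w_k, w_v)                      # (5, 16) contextualized tokens
```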

Since then, the transformer architecture has become a key component of LLM development and has brought significant improvements across NLP, in areas such as machine translation, language modeling, and question answering.

2018: GPT-1, BERT and graph neural networks

Not long after Vaswani et al. published their foundational paper, OpenAI introduced the Generative Pre-trained Transformer, or GPT-1, in June 2018, which uses the transformer architecture to effectively capture long-range dependencies in text. GPT-1 was one of the first models to demonstrate the effectiveness of unsupervised pre-training followed by fine-tuning on specific NLP tasks.

Google also took advantage of the still fairly novel transformer architecture and, in late 2018, released and open-sourced its own pre-training method, called Bidirectional Encoder Representations from Transformers, or BERT. Unlike previous models that processed text in a unidirectional fashion, including GPT-1, BERT considers the context of each word in both directions. To illustrate this, the authors provide a very intuitive example:

In the sentence "I accessed the bank account," a unidirectional contextual model would represent "bank" based on "I accessed the" but not "account." However, BERT represents "bank" using both its previous and next context, "I accessed the ... account," starting from the very bottom of a deep neural network, making it deeply bidirectional.

The concept of bidirectionality proved so powerful that BERT outperformed state-of-the-art NLP systems on a variety of benchmark tasks.
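Bidirectional masked-word prediction is easy to try out today. The snippet below is a hypothetical usage example that assumes the Hugging Face transformers library is installed (model weights download on first use); it asks bert-base-uncased to fill in a masked word using context on both sides of the gap.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT uses context on BOTH sides of the [MASK] token to rank candidate words.
for prediction in fill_mask("I accessed the [MASK] account to check my balance."):
    print(prediction["token_str"], round(prediction["score"], 3))
```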

In addition to GPT-1 and BERT, graph neural networks, or GNNs, also created some buzz that year. They belong to a category of neural networks specifically designed to work with graph-structured data. GNNs use message-passing algorithms to propagate information across the nodes and edges of a graph, enabling the network to learn the structure and relationships in the data in a much more intuitive way.

This allows deeper insights to be extracted from data and expands the range of problems that deep learning can be applied to. With GNNs, significant advances have been made in areas such as social network analysis, recommendation systems, and drug discovery.
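To make message passing concrete, here is a deliberately tiny, hypothetical sketch of one propagation step on a four-node graph: each node averages its neighbors' features and combines them with its own. Real GNN libraries add learnable layers, edge features, and more sophisticated aggregation schemes.

```python
import torch

adjacency = torch.tensor([[0., 1., 1., 0.],
                          [1., 0., 0., 1.],
                          [1., 0., 0., 1.],
                          [0., 1., 1., 0.]])
features = torch.randn(4, 8)                    # one 8-dim feature vector per node
w_self, w_neigh = torch.randn(8, 8), torch.randn(8, 8)

def message_passing_step(adj, h):
    neigh_mean = (adj @ h) / adj.sum(dim=1, keepdim=True)   # average of neighbor messages
    return torch.relu(h @ w_self + neigh_mean @ w_neigh)    # update each node's state

features = message_passing_step(adjacency, features)         # still (4, 8), now context-aware
```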

2019: GPT-2 and improved generative models

2019 marked several notable advances in generative models, most notably the introduction of GPT-2. The model achieved state-of-the-art performance on many NLP tasks and was capable of generating highly realistic text, an ability that, in hindsight, genuinely unsettled many of its contemporaries.

Other improvements in the field included DeepMind's BigGAN, which produced high-quality images virtually indistinguishable from real ones, and NVIDIA's StyleGAN, which gave users much finer control over the appearance of generated images.

Overall, these advances in what is now known as generative AI pushed the boundaries of the field even further, and...

2020: GPT-3 and Self-supervised Learning

... soon after, another model was born, one that would become a household name even outside the technical community: GPT-3. The model represented a major leap in the scale and capabilities of LLMs. To put things in perspective, GPT-1 had a modest 117 million parameters. That number rose to 1.5 billion for GPT-2 and to 175 billion for GPT-3.

The large parameter space enables GPT-3 to produce very coherent text across a variety of prompts and tasks. It also performs impressively in various NLP tasks such as text completion, question answering, and even creative writing.

In addition, GPT-3 once again highlighted the potential of self-supervised learning, which allows models to be trained on large amounts of unlabeled data. The benefit is that such models acquire a broad understanding of language without extensive task-specific training, which makes building them far more economical.

2021: AlphaFold 2, DALL·E, and GitHub Copilot

From protein folding to image generation and automated coding assistance, 2021 was an eventful year thanks to AlphaFold 2, DALL·E, and GitHub Copilot.

AlphaFold 2 was hailed as the long-awaited solution to the decades-old protein folding problem. Researchers at DeepMind extended the transformer architecture to create Evoformer blocks, architectures that leverage evolutionary information about related sequences, and built a model that can predict a protein's 3D structure from its 1D amino acid sequence. This breakthrough has enormous potential to revolutionize areas such as drug discovery, bioengineering, and our understanding of biological systems.

OpenAI also made the news again that year with the release of DALL·E. Essentially, the model combines the concepts of GPT-style language models and image generation to create high-quality images from text descriptions.

To illustrate how powerful the model is, consider the image below, generated from the prompt "oil painting of a future world with flying cars."

[Image: DALL·E output for the prompt "oil painting of a future world with flying cars"]

Finally, GitHub released what would later become every developer's best friend: Copilot. It was built in partnership with OpenAI, which provided the underlying language model, Codex, trained on a large corpus of publicly available code to learn to understand and generate code in a variety of programming languages. Developers can use Copilot simply by writing a code comment describing the problem they are trying to solve, and the model suggests code to implement the solution. Other features include the ability to describe input code in natural language and to translate code between programming languages.

2022: ChatGPT and Stable Diffusion

The rapid development of artificial intelligence over the past decade culminated in a breakthrough release: OpenAI's ChatGPT, a chatbot released to the public in November 2022. The tool represents a state-of-the-art achievement in NLP, capable of generating coherent and contextually relevant responses to a wide variety of queries and prompts. It can also hold conversations, provide explanations, offer creative suggestions, assist with problem solving, write and explain code, and even simulate different personalities or writing styles.


The simple and intuitive interface through which people can interact with the bot also spurred a dramatic rise in adoption. Previously, it was mainly the tech community that played around with the latest AI-based inventions. Today, however, AI tools permeate almost every professional field, from software engineers to writers, musicians, and advertisers. Many companies also use the model to automate services such as customer support, language translation, or answering frequently asked questions. Indeed, the wave of automation it has set off has rekindled concerns and sparked discussion about which jobs may be at risk of being automated away.

Although ChatGPT dominated most of the limelight in 2022, significant strides were also made in image generation. Stable Diffusion, a latent text-to-image diffusion model capable of generating photorealistic images from text descriptions, was released by Stability AI.

Stable Diffusion is an extension of the traditional diffusion model, which works by iteratively adding noise to an image and then learning to reverse the process to recover the data. Stable Diffusion speeds this up by operating not directly on the input images but on a lower-dimensional representation of them, the latent space. In addition, the diffusion process is conditioned by feeding the network a transformer-embedded text prompt from the user, allowing the prompt to guide the image generation process at each iteration.
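The forward, noise-adding half of this process has a simple closed form, sketched below with an illustrative linear noise schedule. In a latent diffusion model such as Stable Diffusion, the tensor being noised would be a compressed latent rather than raw pixels, and a separate network is trained to predict and remove the noise.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # per-step noise amounts (illustrative)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    return x_t, noise                               # a denoising model learns to predict `noise`

latent = torch.randn(1, 4, 64, 64)                  # stand-in for an image latent
noisy_latent, target_noise = add_noise(latent, t=500)
```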

Overall, the release of ChatGPT and Stable Diffusion in 2022 highlighted the potential of multimodal, generative AI and triggered a huge boost in further development and investment in the space.

2023: LLMs and chatbots

There is no doubt that this year has become the year of LLMs and chatbots. More and more models are being developed and released at a rapid pace.


For example, on February 24, Meta AI released LLaMA, an LLM that outperforms GPT-3 on most benchmarks despite having a much smaller number of parameters. Less than a month later, on March 14, OpenAI released GPT-4, a larger, more capable, multimodal successor to GPT-3. Although the exact number of parameters in GPT-4 has not been disclosed, it is speculated to be in the trillions.

On March 13, researchers at Stanford University released Alpaca, a lightweight language model fine-tuned from LLaMA on instruction-following demonstrations. A few days later, on March 21, Google launched its ChatGPT competitor, Bard. Google also released its latest LLM, PaLM 2, on May 10. Given the relentless pace of development in the field, it is likely that yet another model will have emerged by the time you read this.

We're also seeing more and more companies integrating these models into their products. For example, Duolingo announced its GPT-4-powered Duolingo Max, a new subscription tier designed to offer tailored language courses for everyone. Slack also launched an AI assistant called Slack GPT, which can do things like draft responses or summarize threads. In addition, Shopify has introduced a ChatGPT-powered assistant in the company's Shop app that helps customers identify the desired product using various prompts.

Interestingly, AI chatbots are now even being considered as an alternative to human therapists. For example, the US chatbot app Replika offers users a "caring AI companion, always here to listen and talk, always on your side." Its founder, Eugenia Kuyda, says the app has a wide variety of customers, from autistic children who use it as a way to "warm up before human interaction" to lonely adults who simply need a friend.

Before we wrap up, I want to highlight one culmination of the last decade of AI development: people are actually using Bing! Earlier this year, Microsoft unveiled its GPT-4-powered "copilot for the web," customized for search, which, for the first time in, well, forever(?), has made Bing a serious contender to Google's long-standing dominance of the search business.

Review and outlook

Looking back at the evolution of AI over the past decade, it is clear that we have been witnessing a shift with a profound impact on the way we work, do business, and interact. Most of the recent major advances in generative models, especially LLMs, have followed the general belief that "bigger is better" with respect to a model's parameter space. This is especially evident in the GPT family, which started at 117 million parameters (GPT-1) and, after growing by roughly an order of magnitude with each successive model, ended up at GPT-4 with potentially trillions of parameters.

However, according to a recent interview, OpenAI CEO Sam Altman believes that we have reached the end of the "bigger is better" era. Looking ahead, he still believes that parameter counts will be on the rise, but the main focus of future model improvements will be to improve the capabilities, practicality, and safety of models.

The latter is particularly important. Given that these powerful AI tools are now in the hands of the public and no longer confined to the controlled environment of research labs, it is more important than ever that we proceed with caution and ensure that these tools are safe and serve the best interests of humanity. Hopefully, we will see as much development and investment in AI safety as we are seeing in other areas of the field.

Ten Years of AI in Review

Original link: https://www.kdnuggets.com/2023/06/ten-years-ai-review.html

Written by Thomas A Dorfer

Compiled by: LCR
