
Neural Architecture Optimization (NAO): A new neural architecture search (NAS) algorithm

Author: Lei Feng Network

This article is a technical blog post compiled by AI Workshop. Original title:

Discovering the best neural architectures in the continuous space | Microsoft Research

Author | Fei Tian

Translation | Sun Zhihao 2

Proofreading | Sauce with parsley  Editing | Pineapple sister

Original link:

https://www.microsoft.com/en-us/research/blog/discovering-the-best-neural-architectures-in-the-continuous-space/


If you're a deep learning practitioner, you may find yourself running into the same key question again and again: which neural network architecture should I choose for my current task? The decision depends on many factors and on the answers to many other questions. What should I choose for this layer: a convolution, a depthwise separable convolution, or max pooling? What kernel size should the convolutional layer use, 3×3 or 1×1? Which node should serve as the input to a given recurrent neural network (RNN) node? These decisions are critical to the success of the architecture. If you're an expert in both neural network modeling and the specific task at hand, you might find the answers easily. But what if your experience in either area is limited?

In that case, you might try neural architecture search (NAS), an automated process in which another machine learning algorithm guides the creation of better architectures based on previously observed architectures and how they perform. Thanks to NAS, we can find the best neural network architectures for widely used public datasets, such as ImageNet, without any human intervention.

However, existing methods for automatically designing neural network architectures, typically based on reinforcement learning or evolutionary algorithms, require searching an exponentially large discrete space. My colleagues and I in the Machine Learning Group at Microsoft Research Asia devised a simpler, more efficient approach based on optimization in a continuous space. With our new method, called Neural Architecture Optimization (NAO), we use a gradient-based approach to optimize in a more compact space. This work was presented at this year's Conference on Neural Information Processing Systems (NeurIPS).

<h2>The key components of NAO</h2>

NAO's gradient-based optimization in continuous space relies on the following three components, sketched in code after the list:

An encoder that maps a discrete neural network architecture to a vector of continuous values, also known as an embedding

A performance estimation function that takes the vector as input and produces a numeric value representing the quality of the architecture (e.g., its accuracy)

A decoder that maps the continuous vector back to a discrete network architecture
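
Below is a minimal sketch of these three components, not the authors' released code. It assumes an architecture is serialized as a sequence of operation tokens drawn from a small hypothetical vocabulary; the LSTM encoder, MLP predictor, and linear decoder are illustrative simplifications of the models described in the paper.

```python
# Minimal sketch of NAO's three components (illustrative stand-ins, not the paper's exact models).
import torch
import torch.nn as nn

VOCAB_SIZE = 12   # hypothetical number of candidate operation tokens
EMBED_DIM = 32    # size of the continuous embedding
SEQ_LEN = 20      # hypothetical length of a serialized architecture

class Encoder(nn.Module):
    """Maps a discrete architecture (token sequence) to a continuous embedding."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.LSTM(EMBED_DIM, EMBED_DIM, batch_first=True)

    def forward(self, tokens):                 # tokens: (batch, SEQ_LEN) int64
        _, (h, _) = self.rnn(self.embed(tokens))
        return h[-1]                           # (batch, EMBED_DIM) continuous vector

class Predictor(nn.Module):
    """Estimates performance (e.g., accuracy) from the continuous embedding."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(EMBED_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, e):
        return self.mlp(e).squeeze(-1)         # (batch,) predicted score

class Decoder(nn.Module):
    """Maps a continuous embedding back to token choices for a discrete architecture."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(EMBED_DIM, SEQ_LEN * VOCAB_SIZE)

    def forward(self, e):
        logits = self.proj(e).view(-1, SEQ_LEN, VOCAB_SIZE)
        return logits.argmax(-1)               # (batch, SEQ_LEN) recovered tokens

enc, f, dec = Encoder(), Predictor(), Decoder()
x = torch.randint(0, VOCAB_SIZE, (1, SEQ_LEN))  # one serialized architecture
ex = enc(x)                                     # continuous embedding of x
print(f(ex), dec(ex).shape)
```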

These three components are trained jointly. Once training is complete, we start from an architecture x, use the encoder E to map x to a vector representation ex, and then move ex to a new embedding ex' (the green line in Figure 1) along the gradient direction given by the performance estimation function f. Since this is gradient ascent, as long as the step size is small enough we can guarantee that f(ex') ≥ f(ex). Finally, we use the decoder D to transform ex' into a discrete architecture x'. In this way we obtain a probably better architecture x', and by repeatedly updating the architecture like this we arrive at a final architecture that should perform best.
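
The following is a self-contained sketch of just the gradient-ascent step in the embedding space, under simplifying assumptions: the trained performance estimator f is stood in for by a tiny untrained MLP, and decoding back to a discrete architecture is only indicated in a comment.

```python
# Gradient ascent on the architecture embedding (illustrative stand-in for the trained f).
import torch
import torch.nn as nn

EMBED_DIM = 32
f = nn.Sequential(nn.Linear(EMBED_DIM, 64), nn.ReLU(), nn.Linear(64, 1))  # stand-in predictor

ex = torch.randn(1, EMBED_DIM)   # embedding of the current architecture x
eta = 0.01                       # step size; small enough that f(ex') >= f(ex)

for _ in range(10):              # a few ascent steps toward higher predicted performance
    ex = ex.detach().requires_grad_(True)
    score = f(ex).sum()
    score.backward()
    ex = ex + eta * ex.grad      # move ex along the gradient of f

ex_prime = ex.detach()
# A trained decoder D would then map ex_prime back to the (hopefully better) architecture x'.
```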


Figure 1: The NAO process

<h2>Achieving good results with limited resources</h2>

We ran experiments to verify NAO's effectiveness at automatically discovering the best neural architectures. Table 1 (below) shows how the convolutional neural network (CNN) architectures generated by different NAS algorithms perform on the CIFAR-10 image classification dataset. As the table shows, the network found by NAO achieves the lowest error rate. Furthermore, by combining NAO with a weight-sharing mechanism (called NAO-WS), we obtain a significant increase in search speed. Weight sharing reduces the computational cost of architecture search by letting multiple network structures share the same set of parameters, as the sketch below illustrates. In our experiments, using a single graphics processing unit (GPU), we obtained a CNN architecture in 7 hours that reaches an error rate of 3.53 percent. With weight sharing, we don't have to train each candidate network from scratch.
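
Here is a hedged sketch of the general weight-sharing idea: candidate architectures are sub-paths of one shared model whose operation weights are created once and reused, so each sampled architecture can be evaluated without training a separate network from scratch. This illustrates the mechanism in general, not the exact NAO-WS implementation.

```python
# Weight sharing: different sampled architectures reuse the same persistent operation weights.
import torch
import torch.nn as nn

class SharedLayer(nn.Module):
    """One searchable layer: every candidate op keeps its own persistent (shared) weights."""
    def __init__(self, channels=16):
        super().__init__()
        self.ops = nn.ModuleDict({
            "conv3x3": nn.Conv2d(channels, channels, 3, padding=1),
            "conv1x1": nn.Conv2d(channels, channels, 1),
            "max_pool": nn.MaxPool2d(3, stride=1, padding=1),
        })

    def forward(self, x, op_name):
        return self.ops[op_name](x)  # only the chosen op runs; its weights persist across samples

layer = SharedLayer()
x = torch.randn(2, 16, 32, 32)

# Two different sampled architectures evaluated with the very same stored weights:
out_a = layer(x, "conv3x3")
out_b = layer(x, "max_pool")
```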

Table 2 (below) summarizes the results on the PTB language modeling task. Lower perplexity indicates better performance. Again, the RNN architecture found by NAO achieves strong results with limited computational resources.
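
For readers unfamiliar with the metric, perplexity is the exponential of the average per-token cross-entropy (negative log-likelihood), so lower values mean the model assigns higher probability to the held-out text. A minimal illustration with made-up values:

```python
# Perplexity = exp(average per-token negative log-likelihood); dummy data for illustration.
import torch
import torch.nn.functional as F

logits = torch.randn(100, 10000)            # 100 tokens, 10k-word vocabulary (dummy values)
targets = torch.randint(0, 10000, (100,))   # dummy target word indices
nll = F.cross_entropy(logits, targets)      # average negative log-likelihood per token
perplexity = torch.exp(nll)
print(perplexity.item())
```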

By optimizing in a continuous space, NAO obtains better results than existing NAS methods that search directly in the discrete architecture space. As for future applications, we plan to use NAO to search for architectures for other important AI tasks, such as neural machine translation. Equally important, simpler and more efficient automatic neural architecture design can make machine learning techniques accessible to people at every level of expertise.


Table 1: CIFAR-10 classification results


Table 2: PTB language model results

