
Where have researchers gone in solving the traveling salesman problem with deep learning?

Excerpted from chaitjo's blog

Written by Chaitanya K. Joshi, Rishabh Anand

Compiled by the Machine Heart editorial department

Recently, the development of neural network-driven solvers for combinatorial optimization problems such as the traveling salesman problem has attracted great interest from the academic community. This blog post describes a neural combinatorial optimization pipeline that unifies several recently proposed model architectures and learning paradigms into a single framework. Through this pipeline, the authors analyze the latest advances in deep learning for routing problems and provide new directions to inspire future research to create real value.


Context for combinatorial optimization problems

Combinatorial optimization is a practical field at the intersection of mathematics and computer science that aims to solve constrained optimization problems which are NP-hard. The challenge of NP-hard problems is that exhaustively searching for their solutions is beyond the limits of modern computers, so it is impossible to solve large-scale NP-hard problems optimally.

Why should we care about this? Because robust and reliable approximation algorithms for popular problems have great practical value and are the backbone of modern industry. For example, the Traveling Salesman Problem (TSP) is the most popular combinatorial optimization problem (COP), appearing in applications ranging from logistics and scheduling to genomics and systems biology.

The traveling salesman problem is so famous, or rather so hard to crack, that it even has a dedicated xkcd comic!


TSP and routing problems

TSP is also a classic example of a routing problem: a class of COPs that require traversing a sequence of nodes (such as cities) or edges (such as roads between cities) in a specific order, while satisfying a set of constraints or optimizing a set of variables. TSP requires traversing a set of edges in an order that visits all nodes exactly once. From an algorithmic point of view, the best "tour" for our salesperson is the sequence of selected edges forming a Hamiltonian cycle of minimum total distance or time, as illustrated in Figure 1.

Figure 1: The TSP asks the following question: Given a list of cities and the distance between each pair of cities, what is the shortest route for a salesperson to visit each city and return to the departure city? (Source: MathGifs)
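To make the objective concrete, here is a minimal brute-force sketch in Python (our own illustration, not from the original post): it enumerates all permutations to find the optimal tour, which is exactly the exhaustive search that becomes hopeless beyond a handful of cities.

```python
import itertools
import math

def tour_length(cities, tour):
    """Total Euclidean length of a closed tour (returns to the start)."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def brute_force_tsp(cities):
    """Optimal tour by exhaustive search: O(n!) and hopeless beyond ~10
    cities, which is exactly why approximate solvers matter."""
    n = len(cities)
    best = min(
        ((0,) + p for p in itertools.permutations(range(1, n))),
        key=lambda t: tour_length(cities, t),
    )
    return best, tour_length(cities, best)

cities = [(0, 0), (0, 2), (3, 1), (2, 3), (1, 1)]
tour, length = brute_force_tsp(cities)
print(tour, round(length, 3))
```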

In real-world and practical scenarios, routing problems, or vehicle routing problems (VRPs), can involve challenging constraints beyond the vanilla TSP. For example, the TSP with Time Windows (TSPTW) adds a time window constraint to each node in the TSP graph, meaning that certain nodes can only be visited within fixed time intervals. Another variant, the Capacitated Vehicle Routing Problem (CVRP), aims to find the best routes for a fleet of vehicles (i.e., multiple salespeople) visiting a set of customers (i.e., cities), where each vehicle has a maximum carrying capacity.


Figure 2: TSP and the associated family of vehicle routing problems. VRPs differ from TSP in their constraints, and the figure presents those constraints that are relatively well studied. Real-world VRP-like problems may have far more complex and non-standard constraints! (Source: adapted from Benslimane and Benadada, 2014)
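As a concrete illustration of what the CVRP constraints demand, here is a minimal feasibility check for a set of vehicle routes (a hypothetical helper of our own, not from any solver):

```python
def cvrp_routes_feasible(routes, demands, capacity):
    """Check that every vehicle route respects the carrying capacity and
    that each customer (node index > 0, with 0 as the depot) is visited
    exactly once."""
    visited = [c for route in routes for c in route]
    each_once = sorted(visited) == list(range(1, len(demands)))
    within_capacity = all(
        sum(demands[c] for c in route) <= capacity for route in routes
    )
    return each_once and within_capacity

# demands[0] belongs to the depot; two vehicles, each with capacity 10.
demands = [0, 4, 3, 5, 6]
print(cvrp_routes_feasible([[1, 4], [2, 3]], demands, capacity=10))  # True
print(cvrp_routes_feasible([[1, 2], [3, 4]], demands, capacity=10))  # False (5+6 > 10)
```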

Solving routing problems with deep learning

Developing reliable algorithms and solvers for routing problems requires substantial expert intuition and years of trial and error. For example, Concorde, the state-of-the-art TSP solver, took more than 50 years of research on linear programming, cutting-plane algorithms, and branch-and-bound to develop; here is an inspiring video about its history (https://www.youtube.com/watch?v=q8nQTNvCrjE). Concorde can find optimal solutions for instances with up to tens of thousands of nodes, but with extremely long execution times. As readers can imagine, designing algorithms for complex VRPs is even more challenging and time-consuming, especially under real-world constraints such as mixed capacities or time windows.

As a result, the machine learning community began to focus on the following question:

Can we use deep learning to automate, or even augment, the expert intuition needed to solve COPs?

For a more in-depth motivation, see this fascinating survey by Mila: https://arxiv.org/abs/1811.06128

Neural combinatorial optimization

If a COP is a nail, then neural combinatorial optimization is a hammer that tries to drive it with deep learning. Once trained, a neural network can learn directly from problem instances themselves to produce approximate solutions to the COP. This line of research began with Google Brain's pioneering papers on Seq2Seq Pointer Networks and Neural Combinatorial Optimization with Reinforcement Learning. Today, graph neural networks are typically the core architectural choice for deep learning-driven solvers, because these problems come with an associated graph structure.

Neural combinatorial optimization aims to improve on traditional COP solvers in the following ways:

No handcrafted heuristics. Instead of application experts manually designing heuristics and rules, neural networks learn them by imitating an optimal solver or via reinforcement learning (an example is shown in the next section).

Fast inference on GPUs. For large-scale problems, traditional solvers typically have long execution times; for example, Concorde took 7.5 months to solve the largest TSP instance with 109,399 nodes. In contrast, once a neural network has been trained to approximately solve a COP, it has significantly favorable time complexity at inference and can be parallelized on GPUs. This makes such models ideal for real-time decision-making problems, especially routing problems.

Tackling novel and under-studied COPs. Neural combinatorial optimization can significantly speed up the development of solvers for new or under-studied COPs with esoteric constraints. Such problems often arise in scientific discovery or computer architecture, and an exciting success story is Google's chip design system that will power the next generation of TPUs. You read that right: the next TPU chips for running neural networks were designed by a neural network!

The neural combinatorial optimization pipeline

Using TSP as a running example, we now present a general neural combinatorial optimization pipeline that can be used to characterize several modern deep learning-driven approaches to routing problems.

State-of-the-art TSP methods take the raw coordinates of the cities as input and use GNNs or Transformers combined with classical graph search algorithms to constructively build approximate solutions. Architectures can be roughly divided into: (1) autoregressive models, which build the solution step by step; and (2) non-autoregressive models, which produce the solution in one shot. Models can be trained to imitate an optimal solver via supervised learning, or to minimize the length of TSP tours via reinforcement learning.


Figure 3: The neural combinatorial optimization pipeline (Source: Joshi et al., 2021).

The five-stage pipeline proposed by Joshi et al. in 2021 integrates prominent model architectures and learning paradigms into a unified framework. This pipeline allows us to dissect and analyze the latest developments in deep learning for routing problems, and to point out new directions for future research.

Step 1: Define the problem via a graph


Figure 4: Problem definition: TSP is defined over a fully-connected graph of cities/nodes, which can be sparsified further.

TSP is defined over a fully-connected graph in which nodes correspond to cities and edges represent roads between them. The graph can be sparsified by heuristics such as k-nearest neighbors. This enables models to scale to large instances where pairwise computation over all nodes is intractable [Khalil et al., 2017], or to learn faster by reducing the search space [Joshi et al., 2019].
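As an illustration, here is a minimal sketch of k-nearest-neighbor sparsification over raw city coordinates (the choice of k and the use of directed edges are our own simplifying assumptions):

```python
import numpy as np

def knn_sparse_graph(coords, k=3):
    """Sparsify a fully-connected TSP graph by keeping, for every city,
    only the edges to its k nearest neighbors (a common heuristic; the
    exact k is a modeling choice)."""
    n = len(coords)
    # Pairwise Euclidean distance matrix: O(n^2), fine for moderate n.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # no self-loops
    neighbors = np.argsort(dists, axis=1)[:, :k]
    return {(i, int(j)) for i in range(n) for j in neighbors[i]}

coords = np.random.rand(10, 2)                 # 10 random cities in the unit square
print(len(knn_sparse_graph(coords, k=3)))      # at most 10 * 3 directed edges
```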

Step 2: Obtain latent embeddings of graph nodes and edges


Figure 5: Graph embedding: Embeddings for each graph node are obtained using a graph neural network encoder, which builds local structural features by recursively aggregating features from each node's neighbors.

A GNN or Transformer encoder takes each node and edge of the TSP graph (or either of the two) as input and computes latent representations, or embeddings. At each layer, nodes gather features from their neighbors to represent the local graph structure via recursive message passing. Stacking L layers allows the network to build node representations from the L-hop neighborhood of each node.
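A deliberately minimal PyTorch sketch of one such message-passing layer is shown below; it uses plain mean aggregation and is not any specific published encoder:

```python
import torch
import torch.nn as nn

class SimpleMPNNLayer(nn.Module):
    """One round of neighborhood aggregation: each node's embedding is
    updated from its own features plus the mean of its neighbors'."""
    def __init__(self, dim):
        super().__init__()
        self.self_lin = nn.Linear(dim, dim)
        self.neigh_lin = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h: (n, dim) node embeddings; adj: (n, n) 0/1 adjacency matrix.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh_mean = adj @ h / deg               # mean over neighbors
        return torch.relu(self.self_lin(h) + self.neigh_lin(neigh_mean))

# Stacking L layers lets each node "see" its L-hop neighborhood.
h = torch.randn(5, 16)
adj = (torch.rand(5, 5) < 0.5).float()
layer = SimpleMPNNLayer(16)
print(layer(layer(h, adj), adj).shape)           # torch.Size([5, 16])
```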

Anisotropic and attention-based GNNs such as Transformers [Deudon et al., 2018, Kool et al., 2019] and Gated Graph ConvNets [Joshi et al., 2019] have become the default choice for encoding routing problems. Attention during neighborhood aggregation is critical because it allows each node to weigh its neighbors by their relative importance for solving the task at hand.

Importantly, the Transformer encoder can be seen as an attentional GNN on a fully-connected graph, i.e., a Graph Attention Network (GAT). See this blog post for an intuitive explanation.

Steps 3 and 4: Convert embeddings into a discrete solution


Figure 6: Decoding and search: Each node or edge is assigned a probability of belonging to the solution set (here, an MLP makes a prediction per edge to obtain a "heat map" of edge probabilities), which is then converted into discrete decisions by classical graph search techniques such as greedy search or beam search.

Once the nodes and edges of the graph have been encoded into latent representations, we must decode them into a discrete TSP solution. This is done in a two-step process: first, probabilities are assigned to each node or edge for belonging to the solution set, either independently of one another (i.e., non-autoregressive decoding) or conditionally through graph traversal (i.e., autoregressive decoding). Next, the predicted probabilities are converted into discrete decisions by classical graph search techniques such as greedy search or beam search guided by the probability predictions (we will say more about graph search later, when discussing recent trends and future directions).
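For illustration, here is a minimal sketch of greedy (non-autoregressive) decoding from a predicted edge heat map; the random heat map below stands in for actual model predictions:

```python
import numpy as np

def greedy_decode(heatmap, start=0):
    """Turn an (n, n) edge-probability 'heat map' into a valid tour by
    repeatedly moving to the most probable not-yet-visited city. A minimal
    stand-in for the greedy/beam search used in this step."""
    n = heatmap.shape[0]
    tour, visited = [start], {start}
    while len(tour) < n:
        probs = heatmap[tour[-1]].copy()
        probs[list(visited)] = -np.inf       # mask already-visited cities
        nxt = int(np.argmax(probs))
        tour.append(nxt)
        visited.add(nxt)
    return tour                              # implicitly closes back to `start`

heatmap = np.random.rand(6, 6)               # stand-in for model predictions
print(greedy_decode(heatmap))
```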

The choice of decoder involves a trade-off between data efficiency and inference efficiency: autoregressive decoders [Kool et al., 2019] cast TSP as a Seq2Seq, or language translation, task from an unordered set of city nodes to an ordered tour sequence. They explicitly model the sequential inductive bias of routing problems by selecting one node at a time. Non-autoregressive decoders [Joshi et al., 2019], on the other hand, treat TSP as the task of producing a heat map of edge probabilities. The NAR approach is significantly faster and better suited to real-time inference because it makes predictions in one shot rather than step by step. However, it ignores the sequential nature of TSP and may be less efficient to train than AR decoding [Joshi et al., 2021].
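A minimal sketch of the autoregressive alternative follows; `logits_fn` is a hypothetical stand-in for the decoder network, which in practice conditions on the partial tour and the encoder embeddings:

```python
import torch

def autoregressive_decode(logits_fn, n, start=0):
    """Sketch of AR decoding: pick one city at a time, conditioning on the
    partial tour. `logits_fn(partial_tour)` returns per-city scores of
    shape (n,)."""
    tour = [start]
    mask = torch.zeros(n, dtype=torch.bool)
    mask[start] = True
    for _ in range(n - 1):
        scores = logits_fn(tour).masked_fill(mask, float("-inf"))
        nxt = int(torch.softmax(scores, dim=0).argmax())  # greedy; could also sample
        tour.append(nxt)
        mask[nxt] = True
    return tour

# Toy stand-in decoder: random scores that ignore the partial tour.
print(autoregressive_decode(lambda tour: torch.randn(8), n=8))
```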

Step 5: Model training

Finally, the entire encoder-decoder model is trained end to end, just like deep learning models for computer vision or natural language processing. In the simplest case, the model can be trained to produce near-optimal solutions by imitating an optimal solver (i.e., via supervised learning). For TSP, the Concorde solver is used to generate a labeled training dataset of optimal tours for millions of random instances. Models with AR decoders are trained with teacher forcing to output the optimal sequence of tour nodes [Vinyals et al., 2015], while models with NAR decoders are trained to identify the edges traversed during the tour versus the untraversed edges [Joshi et al., 2019].
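For the NAR case, the supervised objective can be sketched as a binary cross-entropy between the predicted heat map and the 0/1 adjacency matrix of the solver's tour (a simplified illustration of our own, assuming an (n, n) logits tensor):

```python
import torch
import torch.nn.functional as F

def nar_supervised_loss(edge_logits, opt_tour):
    """Supervised loss for a NAR decoder: binary cross-entropy between the
    predicted edge heat map and the 0/1 adjacency of the solver's optimal
    tour."""
    n = edge_logits.shape[0]
    target = torch.zeros(n, n)
    for a, b in zip(opt_tour, opt_tour[1:] + opt_tour[:1]):
        target[a, b] = target[b, a] = 1.0      # undirected tour edges
    return F.binary_cross_entropy_with_logits(edge_logits, target)

edge_logits = torch.randn(5, 5, requires_grad=True)   # stand-in model output
loss = nar_supervised_loss(edge_logits, [0, 2, 4, 1, 3])  # e.g. a solver's tour
loss.backward()
print(float(loss))
```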

However, creating labeled datasets for supervised learning is an expensive and time-consuming process. In particular, for large problem instances, optimality guarantees from the solver may no longer hold, and inexact solutions can make supervised training ill-posed. From both a practical and a theoretical point of view, this is far from ideal [Yehuda et al., 2020].

Reinforcement learning is often an elegant alternative for under-studied problems where ground-truth solutions are unavailable. Since routing problems generally require sequential decision-making to minimize a problem-specific cost function (such as the tour length for TSP), they can be elegantly cast in the RL framework, which trains an agent to maximize a reward function (the negative of the cost function). Models with AR decoders can be trained by standard policy gradient algorithms [Kool et al., 2019] or Q-learning [Khalil et al., 2017].
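Here is a minimal sketch of a REINFORCE-style policy gradient objective with a baseline (a simplification; real implementations use rollout or learned baselines):

```python
import torch

def reinforce_loss(log_probs, tour_lengths, baseline):
    """REINFORCE with a baseline: the reward is the negative tour length,
    so minimizing this loss pushes the policy toward shorter tours.
    `log_probs` are the summed log-probabilities of each sampled tour."""
    advantage = tour_lengths - baseline        # shorter tour => negative advantage
    return (advantage.detach() * log_probs).mean()

# Toy batch: 4 sampled tours with their log-probs and lengths.
log_probs = torch.randn(4, requires_grad=True)
tour_lengths = torch.tensor([3.2, 2.9, 3.5, 3.0])
loss = reinforce_loss(log_probs, tour_lengths, baseline=tour_lengths.mean())
loss.backward()                                # gradients flow into the policy
```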

A brief summary of results at each stage

We can characterize prominent results in deep learning for TSP through the five-stage pipeline. Recall that the stages are: (1) problem definition, (2) graph embedding, (3) decoding, (4) solution search, and (5) policy learning. The table below begins with the pointer networks paper by Oriol Vinyals and collaborators, with red highlighting each paper's major innovations and contributions.


Recent developments and avenues for future work

With the unified five-stage pipeline in place, let us highlight some of the latest developments and trends in deep learning for routing problems. We will also offer some future research directions, focusing on improving generalization to large-scale and real-world instances.

Leveraging equivariance and symmetry

One of the most influential early works, the autoregressive Attention Model [Kool et al., 2019], treats TSP as a language translation problem solvable with Seq2Seq, constructing TSP tours as permutations of cities. A direct drawback of this formulation is that it does not account for the underlying symmetries of routing problems.


Figure 7: In general, a TSP has a unique optimal solution (left). However, under the autoregressive formulation, there are multiple optimal permutations (right) when the solution is represented as a sequence of nodes. (Source: Kwon et al., 2020)

POMO: Policy Optimization with Multiple Optima [Kwon et al., 2020] proposes exploiting invariance to the starting city in constructive autoregressive formulations. They train the same Attention Model as before, but with a new reinforcement learning algorithm (Step 5 of the pipeline) that exploits the existence of multiple optimal tour permutations.
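The core idea can be sketched as follows (a simplification of POMO's shared baseline of our own, assuming we record the tour length of one rollout per starting city):

```python
import torch

def pomo_advantages(tour_lengths):
    """POMO-style shared baseline (a sketch): roll out the same policy from
    every possible starting city, and use the mean length across those
    rollouts as the baseline for each instance."""
    # tour_lengths: (batch, n_starts) lengths of rollouts per instance.
    baseline = tour_lengths.mean(dim=1, keepdim=True)
    return tour_lengths - baseline             # plug into the REINFORCE loss

lengths = torch.tensor([[3.1, 2.8, 3.0], [4.2, 4.0, 4.5]])
print(pomo_advantages(lengths))
```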

Figure 8: The optimal TSP solution remains unchanged under rotations, reflections, and translations (the Euclidean symmetry group) of the city coordinates. Incorporating these symmetries into the model may be a principled approach to tackling large-scale TSP.

Similarly, Ouyang et al., 2021 upgrade the Attention Model to account for invariance to rotations, reflections, and translations (i.e., the Euclidean symmetry group) of the input city coordinates. They propose an autoregressive approach that ensures invariance by performing data augmentation during the problem definition stage (Step 1) and using relative coordinates during graph encoding (Step 2). Their zero-shot generalization experiments from random instances to the real-world TSPLib dataset show that their model works well.
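A sketch of such Euclidean data augmentation (our own minimal version, not the authors' exact scheme):

```python
import numpy as np

def random_euclidean_augment(coords, rng=None):
    """Data augmentation sketch: apply a random rotation, an optional
    reflection, and a random translation to the city coordinates. The
    optimal tour is unchanged, so the label can be reused for free."""
    rng = rng or np.random.default_rng()
    theta = rng.uniform(0, 2 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    if rng.random() < 0.5:
        rot[:, 0] *= -1                        # reflection (det = -1)
    return coords @ rot.T + rng.uniform(-1, 1, size=2)

coords = np.random.rand(10, 2)
print(random_euclidean_augment(coords).shape)  # (10, 2)
```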

Future work may follow the Geometric Deep Learning (GDL) blueprint for architecture design. GDL tells us to explicitly identify the symmetries and inductive biases in the data or problem and to bake them in. Because routing problems are embedded in Euclidean coordinates and routes are cyclical, incorporating these constraints directly into the model architecture or learning paradigm may be a principled way to improve generalization to instances larger than those seen during training.

Improved graph search algorithms

Another influential research direction is the one-shot non-autoregressive Graph ConvNet approach [Joshi et al., 2019]. Several recent papers propose keeping the same gated GCN encoder (Step 2) while replacing the beam search component (Step 4) with more powerful and flexible graph search algorithms, such as dynamic programming [Kool et al., 2021] or Monte Carlo Tree Search (MCTS) [Fu et al., 2020].

Figure 9: Gated GCN encoders [Joshi et al., 2019] can be used to produce edge prediction "heat maps" (transparent red) for TSP, CVRP, and TSPTW. These are further processed by DP or MCTS to output routes (solid colors). The GCN essentially reduces the solution search space for sophisticated search algorithms, which can be intractable when searching over all possible routes. (Source: Kool et al., 2021)

The GCN+MCTS framework of Fu et al. takes a very interesting approach: it efficiently trains models on small TSP instances and successfully transfers the learned policy to much larger graphs in a zero-shot fashion (similar to the GCN + beam search approach originally explored by Joshi et al.). They update the problem definition (Step 1) to ensure the GCN encoder's predictions still generalize as TSP size grows from small to large: larger problem instances are represented as many smaller subgraphs of the same size as the graphs the GCN was trained on, and the GCN's edge predictions are then merged before running MCTS.

Figure 10: The GCN+MCTS framework [Fu et al., 2020] represents a large TSP as a set of smaller subgraphs of the same size as the graphs the GCN was trained on. The GCN-predicted edge heat maps for the subgraphs are combined to obtain a heat map of the original graph. This divide-and-conquer approach ensures that the GCN's embeddings and predictions generalize well from smaller to larger instances. (Source: Fu et al., 2020)
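The merging step can be sketched as follows; `predict_heatmap` is a hypothetical stand-in for the trained GCN, and the contiguous index windows are a simplification of the paper's subgraph sampling:

```python
import numpy as np

def merged_heatmap(coords, predict_heatmap, sub_size):
    """Divide-and-conquer sketch: predict edge heat maps on overlapping
    subgraphs of `sub_size` cities and average them into a full heat map."""
    n = len(coords)
    heat = np.zeros((n, n))
    counts = np.zeros((n, n)) + 1e-9            # avoid division by zero
    for start in range(0, n, sub_size // 2):    # overlapping windows
        idx = np.arange(start, min(start + sub_size, n))
        sub = predict_heatmap(coords[idx])
        heat[np.ix_(idx, idx)] += sub
        counts[np.ix_(idx, idx)] += 1
    return heat / counts

coords = np.random.rand(20, 2)
fake_gcn = lambda sub: np.random.rand(len(sub), len(sub))  # stand-in model
print(merged_heatmap(coords, fake_gcn, sub_size=8).shape)  # (20, 20)
```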

This divide-and-conquer strategy was first proposed by Nowak et al. in 2018 to ensure that GNN embeddings and predictions generalize well from small TSP instances to larger ones (up to 10,000 nodes). Similarly, combining GNNs, divide-and-conquer, and search strategies has been used to tackle large-scale CVRPs with up to 3,000 nodes [Li et al., 2021].

Overall, this line of work suggests that tighter coupling between the neural and symbolic/search components of models is critical for out-of-distribution generalization [Lamb et al., 2020]. It is also worth noting, however, that designing highly customized and parallelized implementations of graph search on GPUs can be a challenge for every new problem.

Learning to improve suboptimal solutions

More recently, starting with the works of Chen et al., 2019 and Wu et al., 2021, a number of papers have explored alternatives to constructive AR and NAR decoding, including learning to iteratively improve (suboptimal) solutions or learning to perform local search. Other notable papers include Cappart et al., 2021, da Costa et al., 2020, Ma et al., 2021, Xin et al., 2021, and Hudson et al., 2021.

Figure 11: Architectures that learn to improve suboptimal TSP solutions by guiding decisions within local search algorithms. (a) The original Transformer encoder-decoder architecture [Wu et al., 2021], which uses sinusoidal positional encodings to represent the current suboptimal tour; (b) Ma et al., 2021 further upgrade it for symmetry: a dual-aspect Transformer encoder-decoder with learnable positional encodings capable of capturing the cyclic nature of TSP tours; (c) a visualization of sinusoidal versus cyclic positional encodings.

Across all of this work, since deep learning is used to guide decisions within classical local search algorithms, which are designed to work regardless of problem size, these approaches implicitly achieve better zero-shot generalization to larger problem instances than constructive approaches. In practice, this is a highly desirable property, since training on very large or real-world TSP instances can be tricky.
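For concreteness, here is a minimal 2-opt local search sketch; the `proposals` argument is where a learned model could inject its guidance by ranking which moves to try first (our own illustration, not any paper's exact algorithm):

```python
def two_opt(tour, dist, proposals):
    """Local-search sketch: apply 2-opt moves (reverse a tour segment)
    whenever they shorten the tour. A learned model can play the role of
    `proposals` by ranking which (i, j) pairs, with i < j, to try first."""
    n = len(tour)
    for i, j in proposals:                      # e.g. ordered by a neural policy
        a, b = tour[i], tour[(i + 1) % n]
        c, d = tour[j], tour[(j + 1) % n]
        # Swap edges (a,b) and (c,d) for (a,c) and (b,d) if it is shorter.
        if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d]:
            tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
    return tour

# Toy usage: 4 cities on a unit square, starting with one crossing edge.
r2 = 2 ** 0.5
dist = [[0, 1, r2, 1], [1, 0, 1, r2], [r2, 1, 0, 1], [1, r2, 1, 0]]
print(two_opt([0, 2, 1, 3], dist, proposals=[(0, 2), (1, 3)]))  # [0, 1, 2, 3]
```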

Notably, NeuroLKH [Xin et al., 2021] uses edge probability heat maps produced by GNNs to improve the classic Lin-Kernighan-Helsgaun algorithm, demonstrating strong zero-shot generalization to TSPs with 5,000 nodes and across instances of the TSPLib dataset.

One limitation of this line of work is that the local search algorithm must be manually designed in advance, which may be unavailable for novel or under-studied problems. On the other hand, constructive approaches are arguably easier to adapt to new problems, since constraints can be enforced during decoding and search.

Learning paradigms that promote generalization

Future work could look at novel learning paradigms (Step 5) that explicitly focus on generalization beyond supervised and reinforcement learning. For example, Hottung et al., 2020 explore autoencoder objectives to learn a continuous space of routing problem solutions, and Geisler et al., 2021 train neural solvers to be robust to adversarial perturbations.

Currently, most papers propose training models efficiently on small random TSPs and then transferring the learned policy to larger graphs and real-world instances in a zero-shot fashion. The logical next step is to fine-tune the model on a small number of specific problem instances. Hottung et al., 2021 take a first step in this direction, proposing to fine-tune a subset of model parameters for each specific problem instance via active search. In future work, it may be interesting to frame fine-tuning as a meta-learning problem, where the goal is to train model parameters for rapid adaptation to new data distributions and problems.

Another interesting direction is to tackle under-studied routing problems with challenging constraints by pre-training on popular routing problems such as TSP and CVRP, then fine-tuning on the specific problem. Analogous to language modeling as a pre-training objective in natural language processing, the goal of routing pre-training would be to learn latent representations that are generally useful and transfer well to new routing problems.

Improved evaluation protocols

Beyond algorithmic innovations, the community has repeatedly called for more realistic evaluation protocols that can drive progress on real-world routing problems and adoption by industry [Francois et al., 2019, Yehuda et al., 2020]. Recently, Accorsi et al., 2021 provided an authoritative set of guidelines on experimental design and comparison to classical Operations Research (OR) techniques. They hope that fair and rigorous comparisons on standardized benchmarks will be the first step toward integrating deep learning techniques into industrial routing solvers.

Overall, it is encouraging that recent papers do not just show marginal performance improvements on trivially small random TSP instances, but also evaluate on real-world benchmark datasets such as TSPLib and CVRPLib. These collections of routing problems contain graphs from cities and road networks around the world, along with their exact solutions, and have become standard testbeds for new solvers in the OR community.

At the same time, we must not "overfit" to the same few TSPLib or CVRPLib instances used in other papers. Thus, better synthetic datasets are closely tied to progress on fair benchmarking; for example, Queiroga et al., 2021 (https://openreview.net/forum?id=yHiMXKN6nTl) recently proposed a new library of 10,000 synthetic CVRP test instances.

Figure 12: Regular competitions on newly curated real-world datasets, such as the ML4CO competition at NeurIPS 2021 and the AI4TSP competition at IJCAI 2021, are an effective means of tracking progress at the intersection of deep learning and routing problems. (Source: ML4CO website)

We highly recommend the valuable panel discussions and presentations from ML4CO at NeurIPS 2021, available on YouTube.

Summary

This blog post has described a neural combinatorial optimization pipeline that unifies recent deep learning papers into a single framework. Through the lens of this framework, we analyzed and dissected recent research advances and outlined directions for future research.

As a final point, the deeper motivation for neural combinatorial optimization may not be to outperform classical methods on well-studied routing problems. Neural networks can instead serve as a general-purpose tool for tackling previously unexplored NP-hard problems, especially those for which designing heuristics is non-trivial. We are excited by recent applications of neural combinatorial optimization to computer chip design, optimizing communication networks, and genome reconstruction, and look forward to more valuable applications in the future!
