The IR “Battle” in Deep Learning

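Readers familiar with compilers will recognize the image above: it is the logo of the famous LLVM project. Google's TensorFlow XLA (Accelerated Linear Algebra) uses LLVM IR (Intermediate Representation). Its “competitor”, the recently released TVM/NNVM, describes itself as a “Tensor IR Stack for Deep Learning Systems”. What is an IR, and why does it matter? Let's take a look.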

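Last week we saw this headline: “Facebook and Microsoft introduce new open ecosystem for interchangeable AI frameworks”. The project it announced, ONNX (Open Neural Network Exchange), makes the framework battle even livelier. Put simply, ONNX is also meant to solve the problem of interoperability among today's many frameworks. Interestingly, though, this “open” system looks more like Microsoft and Facebook joining forces against Google. TensorFlow already holds a considerable lead in adoption, and the other frameworks certainly do not want to see TensorFlow dominate, since the framework is the “entry point” to deep learning. PyTorch has had good momentum recently, and having Caffe2, PyTorch, and Cognitive Toolkit “unite” in this way looks like a reasonable choice.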

“An Intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive for further processing, such as optimization and translation. A "good" IR must be accurate – capable of representing the source code without loss of information – and independent of any particular source or target language. An IR may take one of several forms: an in-memory data structure, or a special tuple- or stack-based code readable by the program. In the latter case it is also called an intermediate language.” - Wikipedia

Let's start with a practical problem that deep learning faces today.

The figure above comes from an article introducing NNVM [1]. Describing NNVM's goals, the article says:

“This is a new interesting era of deep learning, with emergence trend of new system, hardware and computational model. The usecase for deep learning is more heterogeneous, and we need tailored learning system for our cars, mobiles and cloud services. The future of deep learning system is going to be more heterogeneous, and we will find emergence need of different front-ends, backends and optimization techniques. Instead of building a monolithic solution to solve all these problems, how about adopt unix philosophy, build effective modules for learning system, and assemble them together to build minimum and effective systems?”

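Put simply: deep learning now has many different front ends (frameworks) and many different back ends (hardware). Can we find a bridge that makes the optimization and mapping between them more effective?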

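This problem is not new. Years ago, as application scenarios and requirements diversified, a large number of programming languages and processor architectures appeared, and the software industry ran into a similar problem.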

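In other words, this replays the situation in which LLVM emerged: a bridge was needed between a large number of programming languages and a growing number of hardware architectures. With LLVM, different front ends and back ends share a single LLVM IR, so supporting a new programming language or a new device platform only requires writing the corresponding front end or back end. LLVM IR also makes it quick to build a new programming language. For example, LLVM's creator Chris Lattner later joined Apple and went on to create the Swift language, whose compiler can be seen as an LLVM front end.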
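The bridging argument can be made concrete with a toy sketch. The stack-based IR, the two front ends, and the two back ends below are all invented for illustration and have nothing to do with real LLVM IR or its APIs. The point is only that with a shared IR, two front ends plus two back ends give four working pipelines, and adding a new front end needs no changes to any back end.

```python
# Toy illustration only: a made-up stack-based IR, NOT LLVM IR.
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, ...]  # ("push", "2") or ("add",) or ("mul",)


def frontend_infix(src: str) -> List[Instr]:
    """Hypothetical front end for infix source such as '2 + 3'."""
    lhs, op, rhs = src.split()
    return [("push", lhs), ("push", rhs), ({"+": "add", "*": "mul"}[op],)]


def frontend_prefix(src: str) -> List[Instr]:
    """Hypothetical front end for Lisp-like source such as '(add 2 3)'."""
    op, lhs, rhs = src.strip("()").split()
    return [("push", lhs), ("push", rhs), (op,)]


def backend_interpreter(ir: List[Instr]) -> int:
    """Back end 1: evaluate the IR directly on a value stack."""
    stack: List[int] = []
    for ins in ir:
        if ins[0] == "push":
            stack.append(int(ins[1]))
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if ins[0] == "add" else a * b)
    return stack.pop()


def backend_python_source(ir: List[Instr]) -> str:
    """Back end 2: emit a Python expression string from the same IR."""
    stack: List[str] = []
    for ins in ir:
        if ins[0] == "push":
            stack.append(ins[1])
        else:
            b, a = stack.pop(), stack.pop()
            symbol = "+" if ins[0] == "add" else "*"
            stack.append(f"({a} {symbol} {b})")
    return stack.pop()


FRONTENDS: Dict[str, Callable[[str], List[Instr]]] = {
    "infix": frontend_infix,
    "prefix": frontend_prefix,
}
BACKENDS: Dict[str, Callable[[List[Instr]], object]] = {
    "interp": backend_interpreter,
    "pysrc": backend_python_source,
}


def compile_with(front: str, back: str, src: str) -> object:
    """Any front end can be paired with any back end through the shared IR."""
    return BACKENDS[back](FRONTENDS[front](src))
```

Because both front ends lower to the same instruction list, `compile_with("infix", "interp", "2 + 3")` and `compile_with("prefix", "interp", "(add 2 3)")` go through an identical back-end path.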

This shows that LLVM's unified IR is one of the keys to its success, and it demonstrates how important a good IR is.

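Of course, an IR is essentially an intermediate representation and is only one part of a complete compilation toolchain. TVM and XLA, which we discuss below, are both optimization and compilation tools built around a specific IR.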

In another article, Tianqi Chen wrote: “... deep learning needs a similar project. Borrowing the ideas of LLVM, we named it NNVM.” (October 2016)

On August 17, Tianqi Chen's team released TVM: An End to End IR Stack for Deploying the Deep Learning Workloads to Hardwares [2]. Its architecture is shown in the figure below:

We adopt a common philosophy from the compiler community and provide two intermediate representation layers to efficiently lower high-level deep learning algorithms down to a multitude of hardware back-ends.

As we can see, they added a new IR stack, TVM, on top of the earlier NNVM, in an attempt to close the gap shown in the figure below: “A lot of powerful optimizations can be supported by the graph optimization framework. ...However we find that the computational graph based IR alone is not enough to solve the challenge of supporting different hardware backends.” The graph-based IR here refers to NNVM.
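To give a feel for what “two levels” can mean, here is a minimal sketch under heavy assumptions. It is neither NNVM's graph IR nor TVM's tensor IR: the node format, the operator names (`scale2`, `add1`, `relu`), and both passes are hypothetical. The graph level fuses a linear chain of elementwise operators (assuming each intermediate is consumed exactly once), and the lower level turns the fused node into an explicit loop nest, where a hardware-specific choice such as tiling can be made.

```python
# Toy sketch only: not NNVM's graph IR and not TVM's tensor IR.
from typing import List, Tuple

# (output name, elementwise ops applied in order, input name)
Node = Tuple[str, Tuple[str, ...], str]

# Hypothetical operator set: scalar expression template per op.
SCALAR_EXPR = {
    "scale2": "({x} * 2.0)",
    "add1": "({x} + 1.0)",
    "relu": "max({x}, 0.0)",
}


def fuse_elementwise(graph: List[Node]) -> List[Node]:
    """Graph-level pass: merge a node into its producer when it directly
    consumes the producer's output. Assumes a linear chain in which each
    intermediate result is used exactly once."""
    fused: List[Node] = []
    for out, ops, inp in graph:
        if fused and fused[-1][0] == inp:
            _, prev_ops, prev_inp = fused.pop()
            fused.append((out, prev_ops + ops, prev_inp))
        else:
            fused.append((out, ops, inp))
    return fused


def lower_to_loops(node: Node, n: int, tile: int = 0) -> str:
    """Loop-level lowering: emit an explicit loop nest as Python source.
    `tile` stands in for a back-end-specific schedule decision."""
    out, ops, inp = node
    expr = f"{inp}[i]"
    for op in ops:
        expr = SCALAR_EXPR[op].format(x=expr)
    body = f"{out}[i] = {expr}"
    if tile:
        return (
            f"for io in range(0, {n}, {tile}):\n"
            f"    for i in range(io, min(io + {tile}, {n})):\n"
            f"        {body}"
        )
    return f"for i in range({n}):\n    {body}"


def run(graph: List[Node], x: List[float], tile: int = 0) -> List[float]:
    """Fuse, lower, and execute the generated loops on plain Python lists."""
    n = len(x)
    env = {"x": list(x)}
    for node in fuse_elementwise(graph):
        env[node[0]] = [0.0] * n
        exec(lower_to_loops(node, n, tile), {}, env)
    return env[graph[-1][0]]


EXAMPLE_GRAPH: List[Node] = [
    ("t1", ("scale2",), "x"),
    ("t2", ("add1",), "t1"),
    ("y", ("relu",), "t2"),
]
```

In this sketch the fusion decision needs only graph structure, while the tiled and untiled loop nests compute the same values with different loop shapes. That separation is roughly the kind of gap the quoted passage describes, although the real systems are far more elaborate.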

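In the LLVM world there is a single unified IR. So why is a graph-based IR not enough for deep learning? In a later Zhihu article [3], Tianqi Chen mentioned the Zhihu discussion from last October titled “How would you evaluate Tianqi Chen's modular deep learning system NNVM?” [4]. The answer by Wang Jianfei (王健飛) in that discussion appears to have been one of the inspirations for TVM.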

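In the same article, Tianqi Chen also wrote: “TVM differs from existing solutions. Taking XLA as an example, TVM takes a more aggressive technical route than the current XLA, and TVM can be used to make the functionality XLA needs easier to implement.”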

Since TVM's author has named its rival, let's take a look at Google's XLA.

XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra that optimizes TensorFlow computations. The results are improvements in speed, memory usage, and portability on server and mobile platforms. Initially, most users will not see large benefits from XLA, but are welcome to experiment by using XLA via just-in-time (JIT) compilation or ahead-of-time (AOT) compilation. Developers targeting new hardware accelerators are especially encouraged to try out XLA.

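The left half of the figure below comes from a talk at the 2017 EuroLLVM Developers' Meeting [6]. It lays out XLA's goals fairly clearly; its basic functions are likewise optimization and code generation.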

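XLA's architecture is shown in the right half of the figure. It too has a two-level optimization structure [5], and it uses LLVM for “low-level IR, optimization, and code-generation”. Because it uses LLVM IR, it can support different back ends fairly easily. The figure below shows an example using the GPU back end.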

For back ends that are not yet directly supported, XLA describes development approaches for three scenarios:

Existing CPU architecture not yet officially supported by XLA, with or without an existing LLVM backend.

Non-CPU-like hardware with an existing LLVM backend.

Non-CPU-like hardware without an existing LLVM backend.

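Overall, XLA and TVM try to solve similar problems. But XLA targets only Google's TensorFlow, whereas TVM/NNVM, although it comes from the MXNet camp, aims to be an open, common interface.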

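A quick news aside: Chris Lattner recently joined Google Brain. It is not yet known whether his main work will be on XLA, but the prospect of him working with Jeff Dean is certainly formidable.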

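Since I have not used either tool myself, I cannot offer a more precise evaluation or comparison. Readers interested in the details should read the references closely and try the tools for themselves.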

Similar ideas include Intel's NGraph (see the figure below), HP's Cognitive Computing Toolkit (CCT), and IBM's SystemML.

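At the just-concluded Hot Chips conference, Microsoft announced Project Brainwave, an FPGA-based AI acceleration platform for the cloud. Its toolchain looks like this. Do you see two levels of IR again?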

Finally, I recently came across another interesting effort: the Khronos Neural Network Exchange Format (NNEF), which attempts to define a standard data exchange format. “The NNEF standard encapsulates neural network structure, data formats, commonly used operations (such as convolution, pooling, normalization, etc.) and formal network semantics.”

T.S.：

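As deep learning is applied more and more widely, people care more and more about how efficiently DNN training and inference run on different hardware architectures. Drawing on the experience of traditional compiler design, XLA and TVM/NNVM have both made a good start. The “IR” competition will be an important part of the framework battle to come.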

References:

Originally published: 2017-09-13

Author: Tang Shan (唐杉)

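This article comes from Xinzhiyuan (新智元), a partner of the Yunqi Community (雲栖社群). For related news, follow the “AI_era” WeChat official account.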
