
Multi-Process Service (MPS) on NVIDIA CUDA GPUs

Related posts:

TensorFlow 1.x: how to call the same session from multiple C++ threads

TensorFlow 1.x: how to call the same session from multiple Python threads

Official documentation:

MULTI-PROCESS SERVICE


=============================================

During CUDA computation, an NVIDIA GPU can only run work from one context at any given moment. By default, a context is the set of GPU-side resources and runtime state allocated for a single CPU process that makes CUDA calls.

In other words, by default only one CPU process's CUDA calls can execute on a GPU at any instant: kernels from different processes are time-sliced onto the GPU one after another rather than running concurrently, even though each individual kernel runs in parallel across the GPU's cores.

However, if the GPU supports Hyper-Q, enabling the MPS service lets multiple CPU processes share a single context on the GPU, so their CUDA calls can execute concurrently: at a given moment the GPU can be running work issued by more than one CPU process.

Important notes:

1. MPS cannot be enabled for one specific GPU; starting the service enables MPS on every NVIDIA CUDA GPU in the machine.

2. Starting the MPS service requires sudo privileges, and the shutdown command often fails to take effect, in which case the daemon has to be killed manually with sudo kill <pid>.

3. MPS is exclusive to a single user (on a multi-GPU machine, enabling MPS means CUDA on all the GPUs is monopolized by one user). Once a user's nvidia-cuda-mps-server process is running on a GPU, only that user's CUDA programs can run there, while other users' processes block and cannot execute. Another user's nvidia-cuda-mps-server process, and with it that user's CUDA processes, can start only after all of the first user's CUDA tasks have finished and their nvidia-cuda-mps-server process has exited. Note that "finished" here means the process's CUDA work has actually completed, not that the time-sharing scheduler has merely switched it out.

These characteristics show that MPS only fits the case where a single user has a GPU to themselves and runs several CUDA processes on it; MPS is essentially a single-user, exclusive-GPU service. That is why it is rarely used in real production environments, although for an individual CUDA user it is still a very good option.

Multi-core GPUs and multi-core CPUs differ fundamentally in how they operate: a multi-core CPU can run multiple processes at the same moment, whereas a GPU by default runs work from only one process at a time. MPS was designed to raise GPU utilization: with it enabled, CUDA calls from several processes can execute on one GPU at the same moment (the processes may overlap for only part of their lifetimes), but those processes must all belong to the same user, and another user may use the GPU's CUDA only once no CUDA programs of the first user remain on it.

For a multi-user Linux CUDA system MPS is therefore impractical, but on a single-user Linux system it can greatly improve the efficiency of running multiple processes on a single GPU.

Starting the MPS service:

sudo nvidia-cuda-mps-control -d

Note that on a multi-GPU machine this command enables MPS for every GPU; MPS cannot be pointed at a specific GPU.
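For reference, a start sequence with the pipe and log directories set explicitly might look like the sketch below. The two environment variables come from the man page quoted later in this post, and the paths shown are simply the documented defaults written out; sudo -E is one way to keep the exported variables visible to the daemon.

# Optional: where the daemon and clients exchange pipes/sockets, and
# where logs go. These are the documented defaults, made explicit.
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/var/log/nvidia-mps

# -d runs the control daemon in the background; use -f instead to keep
# it in the foreground with debug output.
sudo -E nvidia-cuda-mps-control -d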

Checking the MPS service:

ps -ef | grep mps

(screenshot: ps output listing the nvidia-cuda-mps-control daemon)

Stopping the MPS service:

echo quit | sudo nvidia-cuda-mps-control

(Per the man page's SYNOPSIS, quit is a command read from stdin by the front-end UI, not a command-line argument, hence the pipe.) Note that this command does not forcibly stop the service; if ps shows the MPS processes still running, finish the job with sudo kill <pid>.
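Putting the two steps together, a stop sequence might look like the following sketch. The quit -t form is taken from the man page below; the manual kill is only the fallback for when quit hangs.

# Ask the control daemon to shut down, forcing MPS servers to exit
# if they are still running after 10 seconds:
echo "quit -t 10" | sudo nvidia-cuda-mps-control

# Fallback: if ps still shows the MPS processes, kill them by PID.
ps -ef | grep mps
sudo kill <pid>   # <pid>: the surviving nvidia-cuda-mps-control / -server PID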

The MPS man page:

nvidia-cuda-mps-control(1)                                              NVIDIA                                             nvidia-cuda-mps-control(1)

NAME
       nvidia-cuda-mps-control - NVIDIA CUDA Multi Process Service management program

SYNOPSIS
       nvidia-cuda-mps-control [-d | -f]

DESCRIPTION
       MPS is a runtime service designed to let multiple MPI processes using CUDA to run concurrently in a way that's transparent to the MPI program.
       A CUDA program runs in MPS mode if the MPS control daemon is running on the system.

       When CUDA is first initialized in a program, the CUDA driver attempts to connect to the MPS control daemon. If the connection  attempt  fails,
       the  program  continues  to  run as it normally would without MPS. If however, the connection attempt to the control daemon succeeds, the CUDA
       driver then requests the daemon to start an MPS server on its behalf. If there's an MPS server already running, and the user id of that server
       process  matches  that  of  the requesting client process, the control daemon simply notifies the client process of it, which then proceeds to
       connect to the server. If there's no MPS server already running on the system, the control daemon launches an MPS server with the same user id
       (UID) as that of the requesting client process. If there's an MPS server already running, but with a different user id than that of the client
       process, the control daemon requests the existing server to shutdown as soon as all its clients are done. Once the existing server has  termi‐
       nated, the control daemon launches a new server with the user id same as that of the queued client process.

       The MPS server creates the shared GPU context, and manages its clients.  An MPS server can support a finite amount of CUDA contexts determined
       by the hardware architecture it is running on. For compute capability SM 3.5 through SM 6.0 the limit is 16 clients per GPU at a time. Compute
       capability SM 7.0 has a limit of 48. MPS is transparent to CUDA programs, with all the complexity of communication between the client process,
       the server and the control daemon hidden within the driver binaries.

       Currently, CUDA MPS is available on 64-bit Linux only, requires a device that supports Unified Virtual Address (UVA) and has compute  capabil‐
       ity  SM  3.5  or  higher.   Applications requiring pre-CUDA 4.0 APIs are not supported under CUDA MPS. Certain capabilities are only available
       starting with compute capability SM 7.0.

OPTIONS
   -d
       Start the MPS control daemon in background mode, assuming the user has enough privilege (e.g. root). Parent process exits when control  daemon
       started listening for client connections.

   -f
       Start  the  MPS control daemon in foreground mode, assuming the user has enough privilege (e.g. root). The debug messages are sent to standard
       output.

   -h, --help
       Print a help message.

   <no arguments>
       Start the front-end management user interface to the MPS control daemon, which needs to be started first. The front-end UI keeps reading  com‐
       mands  from  stdin until EOF.  Commands are separated by the newline character. If an invalid command is issued and rejected, an error message
       will be printed to stdout. The exit status of the front-end UI is zero if communication with the daemon is successful. A non-zero value is re‐
       turned  if the daemon is not found or connection to the daemon is broken unexpectedly. See the "quit" command below for more information about
       the exit status.

       Commands supported by the MPS control daemon:

       get_server_list
              Print out a list of PIDs of all MPS servers.

       start_server -uid UID
              Start a new MPS server for the specified user (UID).

       shutdown_server PID [-f]
              Shutdown the MPS server with given PID. The MPS server will not accept any new client connections and it exits when all current clients
              disconnect.  -f  is  forced  immediate  shutdown.  If a client launches a faulty kernel that runs forever, a forced shutdown of the MPS
              server may be required, since the MPS server creates and issues GPU work on behalf of its clients.

       get_client_list PID
              Print out a list of PIDs of all clients connected to the MPS server with given PID.

       quit [-t TIMEOUT]
              Shutdown the MPS control daemon process and all MPS servers. The MPS control daemon stops accepting new clients while waiting for  cur‐
              rent MPS servers and MPS clients to finish. If TIMEOUT is specified (in seconds), the daemon will force MPS servers to shutdown if they
              are still running after TIMEOUT seconds.

              This command is synchronous. The front-end UI waits for the daemon to shutdown, then returns the daemon's exit status. The exit  status
              is zero iff all MPS servers have exited gracefully.

       Commands available to Volta MPS control daemon:

       get_device_client_list PID
              List the devices and PIDs of client applications that enumerated this device. It optionally takes the server instance PID.

       set_default_active_thread_percentage percentage
              Set  the default active thread percentage for MPS servers. If there is already a server spawned, this command will only affect the next
              server. The set value is lost if a quit command is executed. The default is 100.

       get_default_active_thread_percentage
              Query the current default available thread percentage.

       set_active_thread_percentage PID percentage
              Set the active thread percentage for the MPS server instance of the given PID. All clients created with that server afterwards will ob‐
              serve the new limit. Existing clients are not affected.

       get_active_thread_percentage PID
              Query the current available thread percentage of the MPS server instance of the given PID.

ENVIRONMENT
       CUDA_MPS_PIPE_DIRECTORY
              Specify  the  directory that contains the named pipes and UNIX domain sockets used for communication among the MPS control, MPS server,
              and MPS clients. The value of this environment variable should be consistent in the MPS control daemon and all  MPS  client  processes.
              Default directory is /tmp/nvidia-mps

       CUDA_MPS_LOG_DIRECTORY
              Specify  the  directory  that  contains  the  MPS log files. This variable is used by the MPS control daemon only. Default directory is
              /var/log/nvidia-mps

FILES
       Log files created by the MPS control daemon in the specified directory

       control.log
              Record startup and shutdown of MPS control daemon, user commands issued with their results, and status of MPS servers.

       server.log
              Record startup and shutdown of MPS servers, and status of MPS clients.

nvidia-cuda-mps-control                                               2013-02-26                                           nvidia-cuda-mps-control(1)      


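To illustrate the management commands listed above: running nvidia-cuda-mps-control with no arguments starts the front-end UI, which reads commands from stdin, so single commands can simply be piped in. The sketch below assumes the daemon is already running; the thread-percentage command additionally assumes a Volta-or-newer GPU, and 12345 is a placeholder for a real server PID taken from get_server_list.

# List the PIDs of all running MPS servers:
echo get_server_list | sudo nvidia-cuda-mps-control

# Cap servers spawned from now on at 50% of the GPU's threads (Volta MPS):
echo "set_default_active_thread_percentage 50" | sudo nvidia-cuda-mps-control

# List the clients connected to one server (12345 is a placeholder PID):
echo "get_client_list 12345" | sudo nvidia-cuda-mps-control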

=============================================

Here is a TensorFlow 1.x test program:

import tensorflow as tf
import time

def build():
    # A small 4-layer fully connected network fed by random input;
    # every evaluation of y produces one [n, 10] batch on the GPU.
    n = 8
    with tf.device("/gpu:1"):
        x = tf.random_normal([n, 10])
        x1 = tf.layers.dense(x, 10, activation=tf.nn.elu, name="fc1")
        x2 = tf.layers.dense(x1, 10, activation=tf.nn.elu, name="fc2")
        x3 = tf.layers.dense(x2, 10, activation=tf.nn.elu, name="fc3")
        y = tf.layers.dense(x3, 10, activation=tf.nn.elu, name="fc4")

    # A queue runner keeps evaluating y in a background thread and
    # pushes the results into a FIFO queue.
    queue = tf.FIFOQueue(10000, y.dtype, y.shape, shared_name='buffer')
    enqueue_ops = []
    for _ in range(1):
        enqueue_ops.append(queue.enqueue(y))
    tf.train.add_queue_runner(tf.train.QueueRunner(queue, enqueue_ops))

    return queue

if __name__ == '__main__':
    queue = build()
    dequeued = queue.dequeue_many(4)

    # Cap each process at 20% of GPU memory so several copies of this
    # program can share one card; soft placement falls back to another
    # device if /gpu:1 does not exist.
    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.per_process_gpu_memory_fraction = 0.2
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        tf.train.start_queue_runners()

        # Time 100000 dequeue_many(4) calls.
        a_time = time.time()
        print(a_time)
        for _ in range(100000):
            sess.run(dequeued)
        b_time = time.time()
        print(b_time)
        print(b_time - a_time)

        # Keep the process alive so GPU usage can be inspected.
        time.sleep(11111)


Run on its own on an RTX 2070 SUPER, this program takes about 37 seconds.

Running two instances of the same code at the same time in the same environment:

(screenshots: timing output of the two concurrent runs)

As the timings show, running two identical tasks on one GPU at the same time is far slower than running just one: each takes roughly twice the single-task time.
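The two-process measurement above can be reproduced with a small launcher along these lines; mps_test.py is a hypothetical name for the TensorFlow script shown earlier.

# Start two copies of the benchmark at the same time and wait for both;
# run this once without MPS and once after starting the MPS daemon.
python mps_test.py &
python mps_test.py &
wait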

With MPS enabled on the GPU, the times become:

(screenshots: timing output of the two concurrent runs with MPS enabled)

So enabling MPS on a GPU noticeably improves the throughput of multi-process programs. Note that MPS brings no speedup when a user runs only a single process on the GPU, and keep in mind that once MPS is on, the GPU is user-exclusive: as long as any CUDA process of one user is running on an MPS-enabled GPU, other users' CUDA calls are blocked and cannot even start.
