CNCF: Study Notes on Container Security Isolation Technologies

gVisor

Sentry

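  • Provides every system operation to the layer above (the containerized application), but does not pass the calls through to the underlying host
    • Exposes only a restricted API
    • Provides security through isolation
  • File system operations are sent to the Gofer
  • The Sentry process is started in a restricted seccomp container without access
    to file system resources.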
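
The division of labour described above can be sketched as a toy dispatcher. This is an illustrative Python model, not gVisor code: the class names `ToySentry` and `ToyGofer`, the two call names it handles, and the in-memory file table are all assumptions made for the example.

```python
class ToyGofer:
    """Hypothetical stand-in for the Gofer: the only component that touches files."""

    def __init__(self, files):
        self._files = dict(files)  # path -> contents, in place of a real file system

    def read_file(self, path):
        return self._files[path]


class ToySentry:
    """Hypothetical stand-in for the Sentry: handles application calls itself."""

    # Only a restricted set of calls is implemented; anything else is rejected.
    SUPPORTED = {"getpid", "open_read"}

    def __init__(self, gofer):
        self._gofer = gofer
        self._virtual_pid = 1  # virtualized state, not the host PID

    def syscall(self, name, *args):
        if name not in self.SUPPORTED:
            raise NotImplementedError(f"{name} is not offered to the application")
        if name == "getpid":
            return self._virtual_pid  # answered entirely inside the Sentry
        # The only remaining supported call is file access, which goes to the Gofer.
        return self._gofer.read_file(*args)


sentry = ToySentry(ToyGofer({"/etc/hostname": "sandbox\n"}))
print(sentry.syscall("getpid"))
print(sentry.syscall("open_read", "/etc/hostname"), end="")
```

The point of the sketch is that the application never reaches the host directly: every call is either answered from virtualized state or handed to a separate file-serving component.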

KVM (experimental): a customized KVM

The KVM platform allows the Sentry to act as both guest OS and VMM, switching
back and forth between the two worlds seamlessly.

Gofer

  • Communicates with the Sentry over the 9P protocol. 9P works both as a distributed file system and as a
    network-transparent, language-agnostic "API".
  • Adds an extra layer of protection around file system access
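
To make the "language-agnostic API" point concrete, here is a minimal sketch of how a 9P2000 version-negotiation request (`Tversion`) is framed on the wire. The layout (little-endian `size[4] type[1] tag[2] msize[4] version[s]`) follows the 9P2000 specification; the helper names and the `msize` value of 8192 are example choices, and nothing here is taken from gVisor's own implementation or dialect.

```python
import struct

TVERSION = 100   # 9P2000 message type for version negotiation
NOTAG = 0xFFFF   # tag value reserved for Tversion


def encode_string(text):
    """9P strings are a 2-byte little-endian length followed by UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack("<H", len(data)) + data


def encode_tversion(msize, version):
    """Build size[4] type[1] tag[2] msize[4] version[s]; size counts itself."""
    body = struct.pack("<BHI", TVERSION, NOTAG, msize) + encode_string(version)
    return struct.pack("<I", 4 + len(body)) + body


message = encode_tversion(8192, "9P2000")
print(len(message), message.hex())
```

Because every message is just a length-prefixed byte string like this one, any language that can pack integers can act as a 9P client or server.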

sandbox

  • Communicate with a Gofer process via a connected socket. The sandbox may
    receive new file descriptors from the Gofer process, corresponding to opened
    files. These files can then be read from and written to by the sandbox.
  • Make a minimal set of host system calls. The calls do not include the
    creation of new sockets (unless host networking mode is enabled) or opening
    files. The calls include duplication and closing of file descriptors,
    synchronization, timers and signal management.
  • Read and write packets to a virtual ethernet device. This is not required if
    host networking is enabled (or networking is disabled).
  • gVisor sandbox will only be able to manipulate virtualized system resources
    (e.g. the system time, kernel settings or filesystem attributes) and not
    underlying host system resources.
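
The first point, receiving already-opened file descriptors over a connected socket, can be demonstrated with ordinary Unix descriptor passing. The sketch below assumes Python 3.9+ on a Unix host (for `socket.send_fds` and `socket.recv_fds`); the socket pair, the temporary file, and the `b"opened"` message are stand-ins chosen for the example and are not gVisor's actual protocol.

```python
import os
import socket
import tempfile

# A connected socket pair stands in for the Gofer <-> sandbox channel.
gofer_end, sandbox_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# "Gofer" side: open a file on behalf of the sandbox and donate the descriptor.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("hello from the host file\n")
    path = tmp.name
fd = os.open(path, os.O_RDONLY)
socket.send_fds(gofer_end, [b"opened"], [fd])
os.close(fd)  # the receiver gets its own reference to the open file

# "Sandbox" side: never calls open(); it only receives a descriptor and reads it.
msg, fds, _flags, _addr = socket.recv_fds(sandbox_end, 1024, 1)
with os.fdopen(fds[0], "r") as received:
    content = received.read()
print(msg, content, end="")

gofer_end.close()
sandbox_end.close()
os.unlink(path)
```

This is why the sandbox can be denied the ability to open files at all: it only needs to read and write descriptors that another process chose to hand over.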

Principles: Defense-in-Depth

  • No system call is passed through directly to the host
  • Only common, universal functionality is implemented.
  • The host surface exposed to the Sentry is minimized
    1. The Sentry is not permitted to open new files, create new sockets or
      do many other interesting things on the host.
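
A minimized host surface amounts to an allowlist check on whatever the Sentry itself asks the host to do. The following is a conceptual sketch only: the names in `ALLOWED_HOST_CALLS` are illustrative picks for the categories mentioned in these notes (descriptor duplication and closing, synchronization, timers, signals), not the actual filter gVisor installs.

```python
# Hypothetical allowlist; the real filter is defined by gVisor, not by these notes.
ALLOWED_HOST_CALLS = frozenset({"dup", "close", "futex", "nanosleep", "rt_sigaction"})


def check_host_call(name):
    """Return True if the Sentry may issue this host call, else raise."""
    if name not in ALLOWED_HOST_CALLS:
        raise PermissionError(f"host call {name!r} is outside the allowed surface")
    return True


for call in ("close", "open", "socket"):
    try:
        check_host_call(call)
        print(call, "allowed")
    except PermissionError as err:
        print(call, "denied:", err)
```

Even if an attacker compromises the Sentry, a filter of this shape means the compromised process still cannot open host files or create host sockets, which is the second layer in the defense-in-depth argument.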

Engineering-level restrictions

  1. Unsafe code is carefully controlled
  2. No CGo is allowed. The Sentry must be a pure Go binary
  3. External imports are not generally allowed within the core packages. Only
    limited external imports are used within the setup code.

The design is somewhat similar to the mechanism used by User-Mode Linux (UML).

Performance

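  • System calls cost more than under plain runc, because each one is intercepted and handled by the Sentry (written in Go)
  • Network and file I/O performance is also somewhat lower, again because of the intercept-and-forward path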
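
One rough way to observe the system call overhead is to time a cheap call in a tight loop and run the same script under both runtimes (runc and gVisor's runsc). This is a minimal sketch, not a rigorous benchmark: the choice of `os.getppid()` and the iteration count are assumptions, and the figure includes Python interpreter overhead, so only the relative difference between runtimes is meaningful.

```python
import os
import time


def time_syscalls(iterations=100_000):
    """Return the average seconds per os.getppid() call."""
    start = time.perf_counter()
    for _ in range(iterations):
        os.getppid()  # a cheap call that still enters the kernel (or the Sentry)
    return (time.perf_counter() - start) / iterations


per_call = time_syscalls()
print(f"{per_call * 1e9:.0f} ns per getppid()")
```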

gVisor supports only x86_64 and requires Linux 4.14.77 or later.
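
The requirement stated above can be checked on a host before installing anything. The sketch below is a hypothetical helper, not part of gVisor: the function names are invented, and the threshold is simply the 4.14.77 figure quoted in these notes.

```python
import platform
import re

MIN_KERNEL = (4, 14, 77)  # the minimum version quoted in these notes


def parse_kernel_release(release):
    """Extract (major, minor, patch) from strings such as '5.4.0-42-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def meets_requirement(machine, release):
    """True when the architecture is x86_64 and the kernel is new enough."""
    return machine == "x86_64" and parse_kernel_release(release) >= MIN_KERNEL


if platform.system() == "Linux":
    print(meets_requirement(platform.machine(), platform.release()))
```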

Kata Containers

kata-runtime creates a QEMU/KVM virtual machine for each container or pod.

kata-agent

  • gRPC server in the guest using a VIRTIO serial or VSOCK interface which QEMU
    exposes as a socket file on the host

  • A kata-agent sandbox is a container sandbox defined by a set of namespaces
  • With Docker, kata-runtime creates a single container per pod.

Kata Containers proxy (kata-proxy)

Proxies I/O operations.

Guest assets

  • Guest kernel
  • Guest image

runtime

  • kata-agent communicates with the other Kata components over gRPC.
  • kata-runtime can run several containers per VM to support container engines
    that require multiple containers running inside a pod.
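
The relationship between pods, VMs, and containers can be modelled in a few lines. This is a toy data model for illustration: `ToyRuntime` and `ToyVM` are hypothetical names, and no real VM is started or gRPC call made.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVM:
    """Hypothetical record of one guest VM and the containers its agent manages."""
    pod_id: str
    containers: list = field(default_factory=list)


class ToyRuntime:
    """Hypothetical stand-in for kata-runtime: one VM per pod, many containers per VM."""

    def __init__(self):
        self._vms = {}

    def create_container(self, pod_id, container_id):
        # The first container of a pod creates the VM; later ones reuse it.
        if pod_id not in self._vms:
            self._vms[pod_id] = ToyVM(pod_id)
        vm = self._vms[pod_id]
        vm.containers.append(container_id)
        return vm

    def vm_count(self):
        return len(self._vms)


runtime = ToyRuntime()
runtime.create_container("pod-a", "app")
runtime.create_container("pod-a", "sidecar")
runtime.create_container("pod-b", "db")
print(runtime.vm_count())
```

Three containers across two pods end up in two VMs: the isolation boundary is the pod, not the individual container.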

Introduction to virtio-fs

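  • A solution for sharing a file system among guests
  • virtio-fs mmaps files into the QEMU process address space and lets different guests access that memory through DAX
  • Both DAX data access and metadata access go through shared memory, which avoids unnecessary communication between the VM and the hypervisor (as long as the metadata has not changed)
    1. Kata Containers utilizes the Linux kernel DAX (Direct Access filesystem)
      feature to efficiently map some host-side files into the guest VM space.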
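
The idea behind the shared mapping can be illustrated with an ordinary `mmap` on a Unix host. This is only an analogy under stated assumptions: two mappings inside one Python process stand in for two guests' DAX windows, and a temporary file stands in for the host-side file; no hypervisor or virtio-fs code is involved.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# One host-side file backs both mappings.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, PAGE)

# Two independent shared mappings play the role of two guests' DAX windows.
guest_a = mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED)
guest_b = mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED)

guest_a[0:5] = b"hello"          # a plain memory store, no write() call
seen_by_b = bytes(guest_b[0:5])  # a plain memory load, no read() call
print(seen_by_b)

guest_a.close()
guest_b.close()
os.close(fd)
os.unlink(path)
```

Once the mappings exist, data moves with plain loads and stores, which is the property that lets virtio-fs with DAX avoid a round trip to the hypervisor on each access.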

Supported virtual machine monitors (VMMs) and hypervisors

  • QEMU (since 1.0): upstream QEMU, with support for hotplug and filesystem sharing
  • NEMU (since 1.4): deprecated, removed as of the 1.10 release. Slimmed-down fork of QEMU,
    with experimental support for virtio-fs
  • Firecracker (since 1.5): upstream Firecracker, rust-VMM based, no VFIO, no FS
    sharing, no memory/CPU hotplug
  • QEMU-virtio-fs (since 1.7): upstream QEMU with support for virtio-fs. Will be removed
    once virtio-fs lands in upstream QEMU
  • Cloud Hypervisor (since 1.10): rust-VMM based, includes VFIO and FS sharing through
    virtio-fs, no hotplug
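
When choosing among these options, the feature notes can be encoded as a small lookup table. The sketch below transcribes only what the list above states (using `None` where the notes are silent) and leaves out the deprecated NEMU and the transitional QEMU-virtio-fs entries; the `candidates` helper and the flag names are hypothetical.

```python
# Feature flags transcribed from the list above; None means the notes do not say.
HYPERVISORS = {
    "QEMU": {"fs_sharing": True, "hotplug": True, "vfio": None},
    "Firecracker": {"fs_sharing": False, "hotplug": False, "vfio": False},
    "Cloud Hypervisor": {"fs_sharing": True, "hotplug": False, "vfio": True},
}


def candidates(**required):
    """Return hypervisors whose recorded features match every requirement."""
    return sorted(
        name
        for name, features in HYPERVISORS.items()
        if all(features.get(key) == value for key, value in required.items())
    )


print(candidates(fs_sharing=True))
print(candidates(fs_sharing=True, vfio=True))
```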

Virtual machine monitors (VMMs)

  • QEMU
  • Hyper-V
  • https://github.com/kata-containers/documentation/blob/master/design/virtualization.md

History

  • Launched in December 2017
  • shim-v2 has been supported since release 1.5

Floating Topic