LinkedIn remote development cloud architecture building path

Written by | Shivani Pai Kasturi, Swati Gambhir

Translated by | Sambodhi

Planned by | Xin Xiaoliang

Imagine developing on your laptop while harnessing the computing power of the cloud! At LinkedIn, we've reduced the initial setup and build time for most of our products from 10-30 minutes to 10 seconds, and we've brought a new remote development experience to our users. In this article, we'll look at how we got here.

As part of LinkedIn's Developer Productivity and Happiness team, we often hear from developers about how slow development cycles and environment issues hurt their productivity. LinkedIn has a vast ecosystem of technologies, including Java, Python, C/C++, Go, JavaScript, iOS, and Android, to meet different needs. A large ecosystem has its advantages, but the setup for each technology differs, and new developers often spend a lot of time setting up their development environment.

During the COVID-19 pandemic, we were all working remotely on laptops with limited CPU power, memory, and disk capacity, which made things even more challenging. Laptops have far fewer CPU cores and much less memory and disk capacity than desktop or server-class machines, and they are subject to thermal throttling. In addition, other software running in the background and unreliable networks can further degrade performance, resulting in slow builds. Given the scale of builds that LinkedIn's continuous integration (CI) pipeline handles every day, CI build failures and inconsistencies between local and CI builds are also a significant problem for support engineers.

LinkedIn's remote development initiative is designed to address these issues. Its goal is to give every developer a remotely accessible, reliable, consistent, predictable, fast-building, and easy-to-set-up development environment that meets their project's needs regardless of their local device and network connectivity. We call this environment RDev (Remote Development environment): a container set up for a specific product that contains all the tools and packages needed for development. RDev instances are created on powerful hardware in our private cloud, so network-bound operations such as cloning repositories and downloading dependencies run with very low latency (as shown in Figure 1).

Figure 1: Time, in seconds, to download a single dependency, measured over multiple iterations.

We integrated RDev with developers' favorite IDEs and leveraged their remote SSH capabilities to provide a seamless experience that feels like developing locally. Figure 2 below shows the average build time for a large LinkedIn Play application. Build times are clearly much shorter in RDev.

Figure 2: The average time it takes to build our application for different operating systems/kernels.

In this blog post, we'll cover how we used our existing infrastructure and product lifecycle to build this container-based remote build and development environment. We'll also share some details, including how we used RDev to reduce initial setup time and to achieve consistency across the development and CI lifecycles.

1

Anticipating developers' needs with pre-built RDevs

We maintain a pool of pre-built RDev environments, sized according to previous RDev usage patterns, that can be assigned to developers on demand. Pre-building an RDev includes starting the container, checking out the product, setting up the environment, building the product, and getting the application running, so that developers can start working right away without having to bring up their application first. This can save developers a lot of time, as shown in Figure 3 below.

Figure 3: Local clone and build time of an application compared to a pre-built RDev.

The build process varies by product type. Some products have a continuous build process that watches the file system through inotify and keeps the build running (for example, a JavaScript product built with Ember). Even for a regular product, the build returns an exit code, and its output needs to be recorded. Both cases can be handled by running the build in a tmux session that developers can attach to once they have been allocated an RDev.
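As a rough illustration of the tmux approach, the sketch below assembles a command that runs a build in a detached tmux session, records its output and exit code, and keeps the session alive for a later attach. The session name, file paths, and helper name are assumptions made for this example; the article does not describe the actual commands LinkedIn uses.

```python
import shlex


def tmux_build_command(build_cmd, session="rdev-build",
                       log_path="/tmp/build.log",
                       status_path="/tmp/build.exit"):
    """Return an argv list that runs build_cmd inside a detached tmux session.

    The session name and file paths are illustrative assumptions.
    """
    inner = (
        # Run the build and send stdout and stderr to the log file.
        build_cmd + " > " + shlex.quote(log_path) + " 2>&1; "
        # Record the build's exit code so other tooling can check it.
        + "echo $? > " + shlex.quote(status_path) + "; "
        # Keep a shell open so the session survives for a later attach.
        + 'exec "${SHELL:-/bin/sh}"'
    )
    # `tmux new-session -d -s NAME CMD` starts a detached, named session.
    return ["tmux", "new-session", "-d", "-s", session, inner]


if __name__ == "__main__":
    # "./gradlew build" is only a placeholder build command.
    print(tmux_build_command("./gradlew build"))
```

With a session created this way, a developer could later run `tmux attach -t rdev-build` to see the build's terminal.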

2

Extend the benefits of RDev to continuous integration pipelines

Developing (in RDev) and building and deploying (in CI) in the same container brings the added benefits of consistency and reproducibility.

To reap these benefits, we updated the build step in the CI pipeline so that it delegates existing CI tasks to run inside a container. This CI container is created from images built and maintained by LinkedIn's image infrastructure (explained below), and the same images can be used for remote development or for CI build workflows. This approach works much like the "runs-on" and "container" directives in GitHub Actions.

3

How does it work?

Let's take a look at how we used a series of clever tricks to reduce build time by two orders of magnitude.

Figure 4 shows the main components of the remote development ecosystem.

Figure 4: Remote development architecture

4

Base image infrastructure

The base image infrastructure integrates container image builds with our CI pipeline and helps developers easily create custom images and publish them to LinkedIn's internal container image registry. We provide a set of template images for certain technologies, such as Python, Java, and JavaScript, which developers can use directly or extend.

For each CI build of an image product, a dependency graph is created that contains all the RPM information and parent base image information for that image. This dependency graph powers an image dependency update service that keeps all RDev images up to date. The service picks up newly available changes to internal RPMs and uses them to rebuild images: any image that contains those RPMs, and any image that depends on it, is updated automatically. These images are used both in RDev configuration and in CI to create development and CI build containers, which keeps the development and build environments consistent.
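To make the update propagation concrete, here is a minimal sketch of how such a dependency graph could be walked to find the images affected by an RPM change. The data layout, image names, and function name are hypothetical; the article does not describe the service's actual data model.

```python
from collections import defaultdict, deque


def images_to_rebuild(image_deps, changed_rpms):
    """Return images affected by changed RPMs, parents before children.

    image_deps maps an image name to {"rpms": set of RPM names,
    "parent": parent image name or None}. This layout is an assumption.
    """
    # Invert the parent links so we can walk from a base image to its children.
    children = defaultdict(list)
    for image, info in image_deps.items():
        if info["parent"] is not None:
            children[info["parent"]].append(image)

    # Start from every image that directly contains a changed RPM.
    queue = deque(sorted(
        image for image, info in image_deps.items()
        if info["rpms"] & changed_rpms
    ))
    seen = set(queue)
    affected = []

    # Breadth-first walk: an image is always listed before images built on it.
    while queue:
        image = queue.popleft()
        affected.append(image)
        for child in sorted(children[image]):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return affected


if __name__ == "__main__":
    deps = {
        "base": {"rpms": {"glibc", "openssl"}, "parent": None},
        "java": {"rpms": {"jdk"}, "parent": "base"},
        "python": {"rpms": {"python3"}, "parent": "base"},
        "play-app": {"rpms": set(), "parent": "java"},
    }
    print(images_to_rebuild(deps, {"jdk"}))
    print(images_to_rebuild(deps, {"openssl"}))
```

In this toy graph, a change to an RPM in the base image reaches every image built on top of it, while a change to a Java-only RPM touches only the Java image and its descendants.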

5

RDev configuration

We follow VS Code's dev container configuration format. Basic container settings, such as the image name, environment variables, and the ports to forward from within the container, are described declaratively in the .devcontainer/devcontainer.json file in the root directory of the product repository.
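As an illustration, the sketch below parses a small devcontainer.json document and pulls out the three kinds of settings mentioned above. The property names (`image`, `containerEnv`, `forwardPorts`) come from the public dev container format; the values, registry host, and helper name are invented for this example and are not LinkedIn's actual configuration.

```python
import json

# Hypothetical configuration; all values are placeholders.
EXAMPLE_DEVCONTAINER = """
{
  "name": "example-product",
  "image": "registry.example.com/rdev/java-base:1.0",
  "containerEnv": {"JAVA_HOME": "/opt/java"},
  "forwardPorts": [8080, 9000]
}
"""


def summarize_devcontainer(text):
    """Extract the image, environment variables, and forwarded ports.

    Note: real devcontainer.json files may contain comments (JSONC),
    which the standard json module does not accept.
    """
    config = json.loads(text)
    return {
        "image": config["image"],
        "env": config.get("containerEnv", {}),
        "ports": config.get("forwardPorts", []),
    }


if __name__ == "__main__":
    print(summarize_devcontainer(EXAMPLE_DEVCONTAINER))
```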

6

RDev CLI

The RDev CLI is a Python CLI distributed to all developers' machines. It provides the commands needed to create, connect to (via the CLI or an IDE), and manage these remote development environments.

7

RDev server

The RDev server is a Rest.li Python service that acts as a proxy between the CLI and the Kubernetes Operator. It is responsible for forwarding requests to the Kubernetes Operator, querying their results, and interacting with our database, which stores developer preferences and metadata such as dotfiles.

8

RDev Operator

We've augmented the Kubernetes API by leveraging the Kubernetes Operator pattern and defining LinkedIn-specific Custom Resource Definitions (CRDs).

We defined two CRDs: Rdev and RdevPool. The Rdev CRD represents a single-instance stateful application whose specification has enough information to recreate itself from scratch. The RdevPool CRD wraps the CloneSet CRD to maintain pools of pre-built RDevs. The RDev Operator uses the Operator SDK/Kubebuilder framework and acts as the controller for these CRDs, reconciling their current state with the desired state.

Figure 5: Pod architecture

As shown in Figure 5, each RDev is associated with a Service, which is required to expose ports outside the Kubernetes cluster. NodePorts are used to expose the Service.

A Persistent Volume Claim (PVC) is needed to bind a Persistent Volume (PV) that stores non-volatile data; in this case, that data is the RDev's home directory. This is critical when the Pod described below has to be moved to another node or is accidentally deleted.

Each RDev is backed by a Kubernetes Pod consisting of three immutable containers: rdev-init-workspace, rdev-sshd, and rdev-sidecar. It also has two main volume mounts, Home and Rdev info, plus other necessary volumes related to certificates and security.

Containers:

rdev-init-workspace: This is an init container for preparing the developer's workspace and preferences.

rdev-sshd: The container that provides login services to RDev. This container is created from the image specified in the product's devcontainer.json file, contains all the tools needed for development in the container, and runs sshd.

rdev-sidecar: The container responsible for checking for and installing dotfiles. It also runs the startup probe (described below), which determines whether the RDev Pod is fully built and ready to be assigned to a developer.

Volume mounts:

Home volume: As the name suggests, the Home volume is the developer's main volume. It is where the product is checked out, the developer's dotfiles are installed, environment variables are set, and the developer's user profile is configured.

Rdev info volume: Contains host and port details, populated from the Pod's labels and annotations using the Kubernetes Downward API.

As mentioned earlier, RdevPool is a CloneSet that maintains a pool of RDevs based on the configured number of replicas. Once an RDev Pod is created, the PostStart container hook triggers the build command in the rdev-sshd container. The startup probe running in the rdev-sidecar container then polls continuously to confirm that the build has completed successfully. It determines whether the product has been built either by looking for a file that records the build output or by using curl to fetch a URL provided in the configuration file. Once the startup probe succeeds, the RDev Pod is marked as Ready and can be assigned to a developer.
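The readiness check described above can be sketched as a small function that mirrors the two strategies: look for a build output file, or fetch a URL. This is only an illustration of the logic, written with the Python standard library instead of curl; the function name, parameters, and timeout are assumptions, and the real probe is presumably configured through Kubernetes.

```python
import os
import urllib.error
import urllib.request


def build_is_ready(marker_file=None, health_url=None, timeout=2.0):
    """Return True once the build appears to have completed.

    marker_file: path of a file that the build writes when it finishes.
    health_url:  URL that the running application is expected to serve.
    Both parameters are hypothetical stand-ins for the probe's settings.
    """
    if marker_file is not None:
        # Strategy 1: the build has finished if its output file exists.
        return os.path.exists(marker_file)
    if health_url is not None:
        # Strategy 2: the application is up if the URL answers successfully.
        try:
            with urllib.request.urlopen(health_url, timeout=timeout) as resp:
                return 200 <= resp.status < 400
        except (urllib.error.URLError, OSError):
            return False
    # Nothing to check against, so the Pod is never reported as ready.
    return False
```

A caller would invoke such a check repeatedly until it returns True, which corresponds to the probe succeeding and the Pod being marked as Ready.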

When a developer requests an RDev, the RDev controller looks for a fully built, unallocated Pod, takes ownership of it, and removes it from the RdevPool. The RdevPool controller then notices that one of its Pods is missing and creates a new Pod to maintain the number of replicas given in the RdevPool specification.
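The claim-and-replenish behavior can be modeled with a small in-memory simulation. This is a toy model under stated assumptions, not the operator's implementation: the class, method names, and Pod naming scheme are invented, and real Pods would be managed through the Kubernetes API.

```python
class RdevPool:
    """Toy model of a pool that keeps a fixed number of pre-built Pods."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.pods = []       # Pods currently owned by the pool
        self._counter = 0    # used only to generate unique Pod names
        self.reconcile()

    def reconcile(self):
        # Create Pods until the pool again holds the desired replica count.
        while len(self.pods) < self.replicas:
            self._counter += 1
            self.pods.append(
                {"name": f"rdev-{self._counter}", "ready": False, "owner": "pool"}
            )

    def mark_ready(self, name):
        # Stands in for the startup probe succeeding on a Pod.
        for pod in self.pods:
            if pod["name"] == name:
                pod["ready"] = True

    def claim(self, developer):
        # Hand the first fully built Pod to the developer, then refill the pool.
        for pod in self.pods:
            if pod["ready"]:
                self.pods.remove(pod)
                pod["owner"] = developer
                self.reconcile()
                return pod
        return None  # no pre-built Pod is available yet


if __name__ == "__main__":
    pool = RdevPool(replicas=2)
    pool.mark_ready("rdev-1")
    print(pool.claim("alice"))
    print([pod["name"] for pod in pool.pods])
```

The key design point is that the pool never assigns a Pod itself; it only reacts to the count dropping below the desired number, which is what lets a separate controller take ownership of a ready Pod.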

9

Looking to the future

As remote work has become part of our daily lives, we believe that remote development is a key enabler of a best-in-class development experience for LinkedIn developers, wherever they are.

We are excited about what we plan to support next in remote development, such as:

Reproducing failed CI builds and simplifying the debugging experience by providing the appropriate RDev for each failed execution.

Improving the review experience by associating an RDev with every GitHub pull request, which gives reviewers an intuitive understanding of the changes.

About the Author:

Shivani Pai Kasturi is a senior software engineer at LinkedIn.

Swati Gambhir is a lead software engineer at LinkedIn.

https://engineering.linkedin.com/blog/2021/building-in-the-cloud-with-remote-development
