
Google's four-year relocation feat: billions of lines of code from Perforce to the new system

Editor's note: Around 2013, Google's source control system served more than 25,000 developers a day, all from a single server tucked away under a staircase.

The story begins: a single Perforce server

In 2011, Dan Bloch, then the technical lead of Google's Perforce administration team, published a paper titled "Still All On One Server: Perforce at Scale." The paper describes how Google's source control system, then serving "more than 12,000 users a day," still relied on a single Perforce server located under the stairs of Building 43 on the main campus.

By 2011, that single server had been running continuously for eleven years. It had witnessed Google in its infancy and scaled to support the publicly traded company Google had become. Around that time, one lucky Google engineer had just submitted the repository's 20-millionth changelist. The server was still running stably, executing "11 million to 12 million commands" per day.

In the paper, Bloch describes some of the measures that had successfully kept the server scaling. The reality, however, was less rosy.

By then Google was no longer a fledgling startup, and it had backed the server with industry-leading hardware. Even so, the machine was under constant strain. During peak hours, when CPU usage maxed out, the system occasionally suffered TCP connection drops. As a precaution, Google kept a hot standby server ready at all times, while a team of eight administrators stood by around the clock to take emergency measures, keeping the source control server running smoothly and preventing any incident that could affect Google's day-to-day operations.

For years, Google had known internally that this was a serious risk, and engineers were looking for alternatives. But at Google's scale ("the busiest single Perforce server on the planet, and one of the largest repositories of any source control system"), there was no obvious alternative.

Back in 2005, Linus Torvalds had created Git for a similar reason: he could not find a tool that could effectively cope with the sheer scale of the Linux kernel repository.

In the years following the 2011 paper, Google publicly disclosed the staggering scale of change in its single repository. In 2014, the repository saw about 15 million lines of code changed per week, across roughly 250,000 files.

To get a sense of that number, imagine rewriting the entire 2014 Linux kernel from scratch every week, nine years after the kernel itself had forced Linus to confront the same scaling problem head-on.


The path less traveled: committing to a single repository and betting on Piper

Google's engineers had been considering an alternative to this single server since 2008.

They considered splitting up the single repository for a while, but ultimately rejected the idea. In hindsight this was a significant, industry-shaping decision, as it set the pattern for managing code complexity at scale for the decade to come.

In the years that followed, Google invented and led the development of many large-scale monorepo tools, and significantly shaped the popularity of monorepo culture. (See, for example, the 2016 paper "Why Google Stores Billions of Lines of Code in a Single Repository.")

However, the decision to stay on a single repository was not an obvious choice at the time. Until that moment, Google's monorepo architecture had evolved more or less naturally; this was the first time the organization was forced to commit to it explicitly. The decision stood in stark contrast to the prevailing view in the Git community that teams should have "more and smaller repositories," a view driven in part by the high cost of cloning a large monorepo (a problem Google later solved with tools such as SourceFS and Clients in the Cloud (CitC)). Had Google followed the norm instead of explicitly affirming its commitment to a single repository, the landscape could look very different today.

Google also briefly considered migrating from Perforce to SVN, in the belief that SVN might scale to its needs. But when engineers found no clear migration path, the idea was dropped.

Ultimately, as Linus did in 2005, the only way forward seemed to be to create something completely new.

The new system, known as Piper (a name derived from some engineers' fondness for airplanes, and a recursive acronym: "Piper is Piper expanded recursively"), is still in use today.


The migration: a four-year journey to "move mountains"

Once the engineers had settled on a rough shape for the replacement (Piper would be distributed and built "on top of standard Google infrastructure, initially Bigtable"), the next task was to actually build and deploy it. Once Piper was running, they would also need to switch all traffic to the new system and migrate Google's entire monorepo.

This effort took more than four years.

At first glance that is a staggering time span, but over the course of eleven years Perforce had become deeply embedded in Google's software ecosystem, touching nearly every aspect of engineering.

At the start of the migration, some 300 tools already depended on Perforce's API. More strikingly, production dependencies on Perforce kept surfacing. Ideally, a version control system is strictly internal-facing, and even a crash should not affect live traffic; in practice, the Piper team kept discovering that this was not the case. Throughout, the engineers performing the migration had to take great care not to disrupt the Google end-user experience.

To complicate matters further, in 2010 Oracle had filed a lawsuit against Google over Android's use of the Java APIs, a case not finally resolved by the Supreme Court until 2021.

Against that backdrop, Google's engineers were deeply concerned about how to transition smoothly away from the Perforce API without replicating its interface outright. To address this challenge, they turned to a well-established strategy: clean-room design.

Technical writers first produced a careful specification; a separate team of engineers who had never seen the original API then built the new system from that specification alone. This kept the migration compatible where it needed to be while avoiding the legal and technical risks of copying the interface directly.

Over time, the tone of the project changed. At first, Piper was a cool new idea that engineers were excited about, a possible answer to Perforce's problems. As time passed, the work became increasingly urgent.

Google's development did not pause during the migration; over the years the load on Perforce kept climbing, and with it the risk. Meanwhile, new systems kept being built on the Perforce API, including what are now pivotal tools such as Blaze (Google's build system, later open sourced as Bazel) and TAP (Google's internal test platform).

What made the decision bold and unusual was not just the years of planning and engineering investment it required, but the all-or-nothing nature of the outcome: either it would succeed and all the hard work would pay off at once, or it would fail completely and the effort would be wasted. There was no middle ground.

Given the importance of maintaining a single authoritative copy of the source code, Google understood that any improvement would be worthless unless Piper completely replaced the existing system. Piper's fate was binary: either it would take over fully and become the new hub of source control, or it would fall away entirely, forcing Google back onto a Perforce server in worse shape than ever.

That is why, at the critical moment of implementing the Paxos algorithm, the Piper team seconded experts from the Google Spanner team, even though Spanner itself had not yet fully mastered its use of Paxos. The move illustrates Google's internal culture of resource sharing and collaboration: top talent could be mobilized across departments at critical moments to keep the Piper project on track.

As the migration progressed, a key milestone arrived: commits were synced to both Perforce and Piper, keeping the two systems consistent during the transition. To verify the new system's robustness, the Piper team also ran small-scale deployments in specific regions, with encouraging results.
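The dual-write arrangement described above can be sketched as follows. This is a minimal illustration only, not Google's actual implementation; the class and function names are hypothetical, and a real system would also have to handle ordering, retries, and partial failures.

```python
class Store:
    """Stand-in for a source-control backend (hypothetical)."""

    def __init__(self):
        self.files = {}      # path -> contents
        self.revision = 0    # monotonically increasing changelist number

    def commit(self, changes):
        """Apply a changelist and return the new revision number."""
        self.files.update(changes)
        self.revision += 1
        return self.revision


def dual_write(primary, mirror, changes):
    """Commit to the primary system, replay the commit on the mirror,
    then check that the two systems stayed consistent."""
    rev = primary.commit(changes)
    mirror.commit(changes)
    consistent = (primary.files == mirror.files
                  and primary.revision == mirror.revision)
    return rev, consistent


perforce, piper = Store(), Store()
rev, ok = dual_write(perforce, piper, {"//depot/main.cc": "int main() {}"})
print(rev, ok)  # 1 True
```

The consistency check is the point of the pattern: by comparing the two systems after every mirrored commit, divergence is caught immediately rather than discovered at cutover time.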

The real challenge, however, was winning over Google's 25,000 engineers and guiding them into the Piper era. This was as much a human effort as a technical one. The Piper team shouldered the work of persuasion, visiting senior engineers in person, patiently answering doubts one by one, and earning the understanding and support of each colleague to clear the last obstacle for the project.

On an otherwise ordinary Saturday, the Piper core team, which had grown to ten people, gathered in a campus conference room. Jeff Dean, now Alphabet's chief scientist, was there as well, both to see the cutover firsthand and to boost the team's morale.

The team had prepared thoroughly: countless rehearsals, detailed scripts, multiple layers of monitoring, and careful emergency runbooks. Even so, the ten engineers felt the weight resting on their shoulders; a single mistake could have unpredictable consequences for Google's operations.

When the migration order was given, the source control system went read-only within minutes and everything seemed to freeze, clearing the way for the final data migration. In the conference room, everyone waited with bated breath, the air thick with anticipation and apprehension, witnessing a historic turning point together.

And then... the migration completed without a hitch.

No data was lost, and Google's production systems were unaffected.

And just like that, the all-or-nothing gamble that had lasted for years paid off.
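The cutover described above (freeze writes, migrate the frozen data, switch traffic) can be sketched as a simple state machine. All names here are hypothetical; this illustrates only the sequencing, not Google's actual tooling.

```python
class Cutover:
    """Minimal model of a freeze-migrate-switch cutover (hypothetical)."""

    # Each action is legal from exactly one state; anything else is an
    # error, modeling the all-or-nothing nature of the switch.
    TRANSITIONS = {
        ("serving_legacy", "freeze"): "read_only",   # block new writes
        ("read_only", "migrate"): "migrated",        # copy the frozen data
        ("migrated", "switch"): "serving_new",       # point traffic at Piper
    }

    def __init__(self):
        self.state = "serving_legacy"

    def apply(self, action):
        key = (self.state, action)
        if key not in self.TRANSITIONS:
            raise RuntimeError(
                f"illegal action {action!r} in state {self.state!r}")
        self.state = self.TRANSITIONS[key]
        return self.state


c = Cutover()
for step in ("freeze", "migrate", "switch"):
    c.apply(step)
print(c.state)  # serving_new
```

Making every out-of-order action an error is deliberate: during a read-only freeze there is no safe way to accept a stray write, so the model refuses it outright rather than trying to reconcile it later.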


The dust settles: Google's golden age

Switching to Piper immediately reduced Google's operational risk by removing the dependence on a single overloaded Perforce server. Over time, the migration also unlocked a new generation of systems built on the traffic scale the source control service could now support.

Since the migration in 2012, the number of automated submissions has exploded.

Today, people tend to picture Google's internal tooling as the product of a resource-rich giant, carefully polished with ample time and money.

The 2012 move from Perforce to Piper reveals a different picture: eight years after its IPO, Google was still in a golden age of entrepreneurial passion and bold innovation, unafraid of hard problems at scale. The Piper migration was far more than a technical feat; it reflected the core of Google's engineering culture: innovation, collaboration, and excellence.

Reference Links:

https://www.reddit.com/r/programming/comments/1dsf4z3/around_2013_googles_source_control_system_was/

https://graphite.dev/blog/google-perforce-to-piper-migration