
How to use Apache Kafka to handle large applications with 100 million users

Author: Not bald programmer

In the world of big data and high-traffic applications, dealing with a large number of users at the same time is a huge challenge. Many of the world's most popular applications, serving more than 100 million users, rely on robust, scalable architectures to manage the flood of data and requests. A key player in these architectures is Apache Kafka, a distributed event streaming platform known for its high throughput, reliability, and scalability. In this article, we'll explore how large applications can use Apache Kafka to handle 100 million users, focusing on the architecture and features that make this possible.

Apache Kafka is one of the most popular platforms for managing large amounts of data and can serve applications with millions of users. It is used by many companies, including LinkedIn, Uber, and Netflix. For example:

LinkedIn uses Kafka for message exchange, activity tracking, and log metrics processing, processing 7 trillion messages per day across more than 100 Kafka clusters.

Uber uses Kafka to exchange data between users and drivers.

Netflix uses Kafka to track the activity of more than 230 million subscribers, including viewing history and movie likes and dislikes.

PhonePe uses Apache Kafka to manage approximately 100 billion events per day. Kafka is used to build real-time streaming applications and data pipelines that process data and move it between systems. Kafka also operates as a distributed system, can scale to handle many applications, and can store data as needed.

Other companies that use Kafka include Shopify, Spotify, Udemy, LaunchDarkly, Slack, Robinhood, and CRED.

Kafka is also used for messaging, metrics collection and monitoring, logging, event sourcing, commit logs, and real-time analytics.

Kafka works like a pub-sub message queue, allowing users to publish and subscribe to message streams. It also handles stream processing, dynamically computing derived datasets and streams, rather than just delivering batches of messages.

Kafka is also one of the most popular foundations for building event-driven architectures (EDAs) that deliver reliable, real-time services such as banking. An EDA lets data flow asynchronously between loosely coupled event producers and event consumers.

To size capacity for Kafka Streams, monitor the operating system's performance counters to determine whether CPU or network is becoming the bottleneck. If the CPU is the bottleneck, you can use more of it by adding more threads to the application. If the network is the bottleneck, you can add another machine and run a clone of the Kafka Streams application there; Kafka Streams automatically balances the load across all tasks running on all machines.
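
For the CPU-bound case, the change is a single Kafka Streams setting. Below is a minimal sketch (the application id, topic names, and broker address are placeholder assumptions) that raises num.stream.threads so one instance runs more stream tasks in parallel:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class ScaledStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "user-activity-app");   // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");     // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // If the CPU is the bottleneck, give this instance more stream threads.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("user-activity").to("user-activity-copy"); // trivial pass-through topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

For the network-bound case, you would instead start a second copy of this same application on another machine with the same application.id; Kafka Streams then rebalances the stream tasks across both instances.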

Here are some examples of real-world use cases for Kafka:

Social Media Analytics

  • A social media platform uses Kafka Streams to process tweets in real time, run sentiment analysis, and power dashboards showing user sentiment, trends, and engagement metrics.

Manufacturing

  • Kafka is used to monitor production lines, equipment status, and inventory levels in real time. This helps optimize production efficiency, reduce downtime, and improve supply chain management.

Website Activity Tracking

  • When you visit a website and perform actions such as logging in, searching, or clicking on a product, Kafka captures these events. It then routes each message to a specific topic based on the type of event, as in the sketch below.
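
To make that flow concrete, here is a minimal producer sketch (the broker address, topic names, and event fields are illustrative assumptions, not part of any particular site's design). Each action is routed to a topic named after its event type, and records are keyed by user id so one user's events stay ordered within a partition:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ActivityTracker {
    private final KafkaProducer<String, String> producer;

    public ActivityTracker(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    /** Route each event to a topic named after its type, e.g. "page-views" or "searches". */
    public void track(String eventType, String userId, String payloadJson) {
        // Keying by userId keeps one user's events ordered within a single partition.
        producer.send(new ProducerRecord<>(eventType, userId, payloadJson));
    }

    public static void main(String[] args) {
        ActivityTracker tracker = new ActivityTracker("broker1:9092"); // placeholder broker
        tracker.track("page-views", "user-42", "{\"page\":\"/product/123\"}");
        tracker.track("searches", "user-42", "{\"query\":\"running shoes\"}");
        tracker.producer.close(); // flush buffered records and release resources
    }
}
```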

The role of Apache Kafka in high-volume applications

Apache Kafka is designed to handle real-time data streams at scale. Its primary role in high-volume applications is as the backbone of data stream processing, enabling efficient processing of millions of messages per second. Kafka's architecture allows it to seamlessly distribute data across multiple servers (called brokers), ensuring high availability and fault tolerance.

Key features of Apache Kafka:

  1. High throughput: Kafka can process millions of messages per second and support high-volume data streams without significant performance degradation.
  2. Scalability: Kafka clusters can be scaled with no downtime to accommodate more producers, consumers, and data as the user base grows (see the topic-creation sketch after this list).
  3. Durability and reliability: Kafka ensures that data is not lost by storing messages on disk and replicating them across the cluster for fault tolerance.
  4. Low latency: Kafka is optimized for low-latency messaging, which is critical for real-time applications and services.
  5. Data flow decoupling: Producers and consumers are decoupled, allowing processing applications to scale and evolve independently.
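
The scalability and durability points above are largely decided when a topic is created: the partition count spreads load across brokers and consumer instances, and a replication factor greater than one keeps copies of every partition on several brokers. A minimal sketch using the Kafka AdminClient (the topic name, partition count, and broker address are assumptions):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            // 48 partitions spread the write load across brokers and consumer instances;
            // replication factor 3 keeps every partition on three brokers for fault tolerance.
            NewTopic topic = new NewTopic("user-activity", 48, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```

The partition count can be raised later as traffic grows, but records already written are not redistributed, so it pays to start with some headroom.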

An overview of the architecture that handles 100 million users

To handle 100 million users, an application must have a well-thought-out architecture that leverages Kafka's features. Here's a simplified overview of such an architecture:

1. Data Ingestion Layer:

The data ingestion layer is where data enters the system from various sources. This can include user interactions, application logs, system metrics, and more. Kafka's role at this layer is to efficiently collect and buffer this incoming data, ensuring that it is ready for processing by downstream systems.
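
At this layer, ingest throughput mostly comes from producer-side batching and compression. A hedged configuration sketch follows; the values are illustrative starting points rather than tuned recommendations, and the broker address and topic name are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IngestionProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");    // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");              // wait for all in-sync replicas
        props.put(ProducerConfig.LINGER_MS_CONFIG, 20);            // give batches a little time to fill
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);    // larger batches, fewer requests
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");  // shrink network and disk usage

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("ingest-events", "user-42", "{\"action\":\"login\"}"));
        }
    }
}
```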

2. Stream Processing:

Once the data is in Kafka, it can be processed as a stream. This involves real-time analysis and transformation of data streams for purposes such as real-time analytics, monitoring, and triggering automated actions. Kafka Streams, a client library for building applications and microservices whose input and output data are stored in Kafka topics, is typically used at this layer.
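
As a sketch of what such a stream-processing job looks like (the topic names and the per-user counting logic are assumptions chosen for illustration), the application below reads raw activity events from one topic, counts them per user, and writes the running counts to another topic:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class ActivityCounts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "activity-counts");   // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Input records are keyed by user id; count how many events each user has produced.
        KStream<String, String> events = builder.stream("user-activity");
        events.groupByKey()
              .count()
              .toStream()
              .to("activity-counts-by-user", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```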

3. Data Integration:

Kafka also facilitates data integration and serves as a central hub for the flow of data between different systems. This is critical in a microservices architecture, where different application components may need to communicate and share data effectively.

4. Scalable Storage:

Kafka provides persistent storage for data streams, allowing applications to retain and reprocess large amounts of data as needed. This is especially important for applications that need to retain historical data for analysis or regulatory compliance.
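
Retention is controlled per topic through configuration. The sketch below (the topic name, retention period, and broker address are assumptions) uses the AdminClient to keep 30 days of history on disk so downstream jobs can replay it when needed:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionConfig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "user-activity");
            // Keep 30 days of history on disk so downstream jobs can replay it when needed.
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", String.valueOf(30L * 24 * 60 * 60 * 1000)),
                    AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> updates =
                    Map.of(topic, Collections.singleton(setRetention));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}
```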

5. Event-Driven Architecture:

Kafka enables event-driven architecture, where producers publish events (messages) without needing to know the details of the consumer. This decoupling allows for greater scalability, flexibility, and elasticity of the system.
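
On the consuming side, the decoupling shows in how little a new service needs to know: just the topic name and its own consumer group id. A minimal sketch (all names are placeholders) of a downstream service subscribing to the activity topic without the producer ever being aware of it:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RecommendationService {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // placeholder broker
        // A new downstream service is just a new group id on the same topic;
        // the producer never learns that this consumer exists.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "recommendation-service");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("user-activity"));
            while (true) { // sketch: runs until the process is stopped
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("user=%s event=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```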

How many messages per second can Apache Kafka process at Honeycomb?

At Honeycomb, the ingestion pipeline easily surpasses a million messages per second.

Liz Fong-Jones (Principal Developer Advocate at Honeycomb) has explained how Honeycomb manages its Kafka-based telemetry ingestion pipeline and scales its Kafka cluster.

So what is Honeycomb? Honeycomb is an observability platform that helps you visualize, analyze, and improve the quality and performance of your cloud applications. Their data volume grew 10-fold during the pandemic, while the total cost of ownership increased by only 20%.

To help understand the benchmark, let me give you a quick recap of what Kafka is and how it works. Kafka is a distributed messaging system originally built at LinkedIn; it is now part of the Apache Software Foundation and is used by many companies.

The general setup is very simple. The producer sends the records to the cluster, which keeps them and distributes them to consumers:

[Figure: producers send records to the Kafka cluster, which stores them and serves them to consumers]

The key abstraction in Kafka is the topic. Producers publish their records to a topic, and consumers subscribe to one or more topics. A Kafka topic is just a sharded write-ahead log. The producer appends records to these logs, and the consumer subscribes to the changes. Each record is a key/value pair. The key is used to assign the record to a log partition (unless the publisher specifies the partition directly).
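
That key-to-partition rule is visible directly in the producer API: one ProducerRecord constructor takes only a key and lets the partitioner hash it to a partition, while another pins the record to an explicit partition. A small sketch (topic, key, and values are placeholders):

```java
import org.apache.kafka.clients.producer.ProducerRecord;

public class RecordExamples {
    public static void main(String[] args) {
        // The key "user-42" is hashed at send time to choose the partition for this record.
        ProducerRecord<String, String> keyed =
                new ProducerRecord<>("page-views", "user-42", "{\"page\":\"/home\"}");

        // Here the publisher pins the record to partition 3 directly, bypassing the key-based choice.
        ProducerRecord<String, String> pinned =
                new ProducerRecord<>("page-views", 3, "user-42", "{\"page\":\"/home\"}");

        System.out.println(keyed.partition());  // null: the partitioner decides at send time
        System.out.println(pinned.partition()); // 3
    }
}
```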

Here is a simple example of a single producer and a single consumer reading and writing from a topic with two partitions.

[Figure: a producer appending records to two partition logs while a consumer reads from them]

This figure shows a producer process appending records to the logs for the two partitions, and a consumer reading from the same logs. Each record in a log has an associated entry number, which we call an offset. The consumer uses this offset to describe its position in each log.

These partitions are distributed across a group of machines, allowing a single topic to store more data than any single machine can handle.

Note that, unlike most messaging systems, the logs are always persistent. Messages are written to the file system as soon as they are received. Messages are not deleted after they are read, but are retained according to some configurable service-level agreement, such as a few days or a week. This allows for use in situations where the data consumer may need to reload the data.

It also makes it possible to support space-efficient publish-subscribe, because there is a single shared log no matter how many consumers there are; in traditional messaging systems, there is usually a queue per consumer, so adding a consumer doubles the size of your data. This makes Kafka well suited for uses outside of traditional messaging, such as acting as a conduit for offline data systems like Hadoop. These offline systems may only load data at the interval of a periodic ETL cycle, or they may be shut down for a few hours for maintenance, during which Kafka can buffer even terabytes of unconsumed data if needed.

Kafka also replicates its logs across multiple servers for fault tolerance. An important architectural aspect of our replication implementation compared to other messaging systems is that replication is not an add-on that requires complex configuration and is only used in very specific cases. Instead, replication is assumed to be the default: we treat unreplicated data as a special case with a replication factor of exactly one.

When a producer publishes a message, it receives an acknowledgment that contains the record's offset. The first record published to a partition is given an offset of 0, the second record 1, and so on in an increasing sequence. The consumer reads data from a specified offset and saves its position in the log by periodically committing the offset, so that if the consumer instance crashes, another instance can recover from the saved position.
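
A hedged sketch of that commit cycle (the topic, group id, and broker address are placeholders): auto-commit is disabled, and the consumer commits its position only after it has finished processing a batch, so a replacement instance resumes from the last committed offset after a crash.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CommittingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "billing-service");         // placeholder group
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // commit only after processing
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("payments"));
            while (true) { // sketch: runs until the process is stopped
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : batch) {
                    process(record); // application logic
                }
                consumer.commitSync(); // save our position; a restarted instance resumes from here
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }
}
```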

Test setup

For these tests, I have six machines, each with the following specifications:

  • Intel Xeon 2.5 GHz processor, six cores
  • Six 7200 RPM SATA hard drives
  • 32GB RAM
  • 1Gb Ethernet

The Kafka cluster is set up on three of these machines, with the six drives mounted directly and no RAID (JBOD style). The remaining three machines are used for ZooKeeper and for generating load.

A cluster of three machines isn't huge, but since we'll only test to a replication factor of three, that's what we need. Obviously, we can always add more partitions and spread the data across more machines to scale our cluster horizontally.

These machines aren't actually LinkedIn's usual Kafka hardware. Our production Kafka machines are better tuned for running Kafka, but they don't fit the "off-the-shelf" spirit of this test. Instead, I borrowed these from our Hadoop cluster, which is probably the cheapest of all our persistent systems. Hadoop's usage patterns are very similar to Kafka's, so this makes sense.

Okay, without further ado, here are the results!

Challenges and considerations

While Kafka provides the tools and capabilities needed to handle 100 million users, there are several challenges and considerations that must be addressed:

  • Data partitioning and sharding: Proper data partitioning is critical for balancing load and achieving high throughput in a Kafka cluster.
  • Monitoring and management: A Kafka cluster of this size requires sophisticated monitoring and management to ensure optimal performance and to resolve any issues that arise quickly.
  • Security: Handling the sensitive data of millions of users requires strong security measures, including encryption, access controls, and auditing (a client-side configuration sketch follows this list).
  • Compliance and data governance: Large-scale systems often have to comply with a variety of regulatory requirements, which calls for careful data governance practices.
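
For the security point above, much of the work on the client side is configuration: encrypting the connection and authenticating to the brokers. A hedged sketch of client properties (the SASL mechanism, truststore path, and credentials are placeholders; the right choices depend on how the cluster is secured):

```java
import java.util.Properties;

public class SecureClientConfig {
    /** Builds client properties for a SASL_SSL-secured cluster (all values are placeholders). */
    public static Properties secureProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("security.protocol", "SASL_SSL");      // TLS encryption plus SASL authentication
        props.put("sasl.mechanism", "SCRAM-SHA-512");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"app-user\" password=\"app-secret\";");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");
        return props;
    }
}
```

These properties can then be merged into the producer, consumer, or admin client configuration shown in the earlier sketches.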

Conclusion

Apache Kafka's architecture and capabilities make it an excellent choice for applications that need to handle 100 million or more users. By effectively managing high-volume data streams, ensuring reliability and scalability, and supporting a decoupled, event-driven architecture, Kafka enables applications to scale to meet the needs of a large user base. However, leveraging Kafka at this scale also requires careful planning, monitoring, and management to address the associated challenges and ensure the resiliency and performance of the system.
