{"id":5155,"date":"2018-06-08T21:06:44","date_gmt":"2018-06-09T04:06:44","guid":{"rendered":"http:\/\/softwareengineeringdaily.com\/?p=5155"},"modified":"2018-06-12T09:58:23","modified_gmt":"2018-06-12T16:58:23","slug":"meet-apache-kafka","status":"publish","type":"post","link":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/","title":{"rendered":"Meet Apache Kafka"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Kafka has become a central tool for data at many large organizations.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">At data-intensive companies like Fiverr and Netflix, Kafka is used simultaneously as:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">a database<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">a queue for ordered processing<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">a tool for sharing data between different teams<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">the backup tool that can fix any other system if a catastrophic failure occurs<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">a platform to decouple different teams that are dependent on each other<\/span><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><b>MEET APACHE KAFKA: WHAT IT IS, ORIGINS<\/b><\/p>\n<p><i><span style=\"font-weight: 400;\">Kafka: a streaming platform &#8212; a central hub for real-time streams of data.<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">Apache Kafka is an open-source distributed streaming platform. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">It is a publish-subscribe messaging system rethought as a distributed commit log: producers append messages to topics, and consumers subscribe to those topics and read the messages. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Kafka serves as the central repository for data streams in a distributed system. 
<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Kafka was originally developed at LinkedIn, and the creators of the project eventually left LinkedIn and started Confluent, a company that builds enterprise products around Kafka.<\/span><\/p>\n<p><a href=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Kafka.png\"><img fetchpriority=\"high\" decoding=\"async\" data-attachment-id=\"5156\" data-permalink=\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/kafka-2\/\" data-orig-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Kafka.png?fit=580%2C804&amp;ssl=1\" data-orig-size=\"580,804\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Kafka\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Kafka.png?fit=216%2C300&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Kafka.png?fit=580%2C804&amp;ssl=1\" class=\"alignleft wp-image-5156\" src=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Kafka.png?resize=300%2C416\" alt=\"\" width=\"300\" height=\"416\" srcset=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Kafka.png?resize=216%2C300&amp;ssl=1 216w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Kafka.png?w=580&amp;ssl=1 580w\" sizes=\"(max-width: 300px) 100vw, 300px\" data-recalc-dims=\"1\" 
\/><\/a><\/p>\n<h2><span style=\"font-weight: 400;\">\u201cSystems are giving up correctness for latency, and I\u2019m arguing that stream processing systems have to be designed to allow the user to pick the tradeoffs that the application needs.\u201d <\/span><\/h2>\n<h2><span style=\"font-weight: 400;\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0&#8211; Neha Narkhede<\/span><\/h2>\n<p><b>EVENT SOURCING<\/b><\/p>\n<p><span style=\"font-weight: 400;\"><a href=\"https:\/\/softwareengineeringdaily.com\/2016\/10\/14\/kafka-event-sourcing-with-neha-narkhede\/\">Event sourcing<\/a> is an architectural pattern that allows changes to our application model to be represented as events. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Each event is published to an event queue, and is pulled off of the queue by each of the various services that need to consume that event. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Event sourcing and the related architectural pattern CQRS allow for a flow of information through an application that is easy to reason about, and has several other desirable properties.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Kafka can be used for event sourcing. Related software patterns are improving the architectures of companies like Netflix, Uber, eBay, and Yelp.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><b>KAFKA ARCHITECTURE<\/b><\/p>\n<p><span style=\"font-weight: 400;\">From Wikipedia: <\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">The major terms of Kafka&#8217;s architecture are topics, records, and brokers. Topics consist of stream of records holding different information. 
On the other hand, Brokers are responsible for replicating the messages. <\/span><\/i><\/p>\n<p><strong>There are four major APIs in Kafka:<\/strong><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><b>Producer API<\/b><span style=\"font-weight: 400;\"> &#8211; Permits applications to publish streams of records.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Consumer API<\/b><span style=\"font-weight: 400;\"> &#8211; Permits applications to subscribe to topics and process the streams of records.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Streams API<\/b><span style=\"font-weight: 400;\"> \u2013 Lets an application consume input streams from one or more topics, transform them, and produce output streams to one or more topics.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Connector API<\/b><span style=\"font-weight: 400;\"> \u2013 Runs reusable producers and consumers that link topics to existing applications or data systems.<\/span><\/li>\n<\/ul>\n<p><a href=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaCluster.png\"><img decoding=\"async\" data-attachment-id=\"5157\" data-permalink=\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/kafkacluster\/\" data-orig-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaCluster.png?fit=1282%2C882&amp;ssl=1\" data-orig-size=\"1282,882\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"KafkaCluster\" data-image-description=\"\" data-image-caption=\"\" 
data-medium-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaCluster.png?fit=300%2C206&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaCluster.png?fit=1024%2C704&amp;ssl=1\" class=\"alignnone wp-image-5157\" src=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaCluster.png?resize=500%2C344\" alt=\"\" width=\"500\" height=\"344\" srcset=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaCluster.png?resize=300%2C206&amp;ssl=1 300w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaCluster.png?resize=768%2C528&amp;ssl=1 768w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaCluster.png?resize=1024%2C704&amp;ssl=1 1024w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaCluster.png?w=1282&amp;ssl=1 1282w\" sizes=\"(max-width: 500px) 100vw, 500px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p><b>WHEN TO USE KAFKA<\/b><\/p>\n<p><span style=\"font-weight: 400;\">You would <a href=\"https:\/\/softwareengineeringdaily.com\/2016\/03\/08\/apache-kafkas-uses-and-target-market\/\">use Kafka<\/a> in scenarios of asynchronous communication and processing pipelines, predominantly in distributed systems, <a href=\"https:\/\/softwareengineeringdaily.com\/2017\/07\/10\/kafka-in-the-cloud-with-neha-narkhede\/\">cloud,<\/a> and big data, including the following cases:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">messaging via publish-subscribe or queue paradigms<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">persistent buffer at the system entry point that can absorb traffic spikes, allowing inner systems to react and process at their own optimal pace<\/span><\/li>\n<li style=\"font-weight: 400;\"><span 
style=\"font-weight: 400;\">stream processing and soft real-time analytics, paired with a stream processor like Apache Flink, Spark Streaming, or Storm and with a database like HBase<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">log transport &amp; aggregation: a way to consolidate event streams on their way to backup and analytics<\/span><\/li>\n<\/ul>\n<p><a href=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaWorkflow.png\"><img decoding=\"async\" data-attachment-id=\"5158\" data-permalink=\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/kafkaworkflow\/\" data-orig-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaWorkflow.png?fit=1166%2C616&amp;ssl=1\" data-orig-size=\"1166,616\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"KafkaWorkflow\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaWorkflow.png?fit=300%2C158&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaWorkflow.png?fit=1024%2C541&amp;ssl=1\" class=\"alignnone wp-image-5158\" src=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaWorkflow.png?resize=700%2C370\" alt=\"\" width=\"700\" height=\"370\" srcset=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaWorkflow.png?resize=300%2C158&amp;ssl=1 300w, 
https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaWorkflow.png?resize=768%2C406&amp;ssl=1 768w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaWorkflow.png?resize=1024%2C541&amp;ssl=1 1024w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaWorkflow.png?w=1166&amp;ssl=1 1166w\" sizes=\"(max-width: 700px) 100vw, 700px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">The beauty of this is that just sending your logs or event streams to Kafka gives you all those advantages at once: event consolidation, backup, on-the-fly analytics, absorption of traffic spikes, processing of the same events by multiple systems, and re-processing at any time while the events live in Kafka (retention is configurable, for example a few weeks). This is why some call Kafka the \u201cnew enterprise bus\u201d for data-intensive companies.<\/span><\/p>\n<p><strong>Kafka offers quite a few guarantees, including:<\/strong><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">at-least-once, at-most-once, or exactly-once delivery (configurable)<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">total ordering of messages within a partition (shard):<\/span>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Messages sent by a producer to a particular topic partition will be appended in the order they are sent. 
That is, if a message M1 is sent by the same producer as a message M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">A consumer instance sees messages in the order they are stored in the log.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any messages committed to the log.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">By setting a shard policy per topic (partition count + partition function), admins trade off between load balancing and stateful processing<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><b>DETAILED USE CASES FROM THE KAFKA DOCS<\/b><\/p>\n<p><b>Messaging<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Kafka works well as a replacement for a more traditional message broker. Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc). In comparison to most messaging systems Kafka has better throughput, built-in partitioning, replication, and fault-tolerance which makes it a good solution for large scale message processing applications.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In our experience messaging uses are often comparatively low-throughput, but may require low end-to-end latency and often depend on the strong durability guarantees Kafka provides.<\/span><\/p>\n<p><b>Website Activity Tracking<\/b><\/p>\n<p><span style=\"font-weight: 400;\">The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type. 
These feeds are available for subscription for a range of use cases including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Activity tracking is often very high volume as many activity messages are generated for each user page view.<\/span><\/p>\n<p><b>Metrics<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Kafka is often used for operation monitoring data pipelines. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.<\/span><\/p>\n<p><b>Log Aggregation<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption. In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, and much lower end-to-end latency.<\/span><\/p>\n<p><b>Stream Processing<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Many users end up doing stage-wise processing of data where data is consumed from topics of raw data and then aggregated, enriched, or otherwise transformed into new Kafka topics for further consumption. For example a processing flow for article recommendation might crawl article content from RSS feeds and publish it to an \u201carticles\u201d topic; further processing might help normalize or deduplicate this content to a topic of cleaned article content; a final stage might attempt to match this content to users. 
This creates a graph of real-time data flow out of the individual topics. The Storm framework is one popular way of implementing some of these transformations. More recently, Apache Flink has offered a more efficient stream processing solution with exactly-once semantics. Spark Streaming is also suitable in many scenarios, but it does not have truly streaming semantics.<\/span><\/p>\n<p><b>KAFKA STREAMS<\/b><\/p>\n<p><a href=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaStreams.jpg\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"5186\" data-permalink=\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/kafkastreams-2\/\" data-orig-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaStreams.jpg?fit=1280%2C720&amp;ssl=1\" data-orig-size=\"1280,720\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"KafkaStreams\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaStreams.jpg?fit=300%2C169&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaStreams.jpg?fit=1024%2C576&amp;ssl=1\" class=\"alignnone wp-image-5186\" src=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaStreams.jpg?resize=500%2C281\" alt=\"\" width=\"500\" height=\"281\" srcset=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaStreams.jpg?resize=300%2C169&amp;ssl=1 300w, 
https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaStreams.jpg?resize=768%2C432&amp;ssl=1 768w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaStreams.jpg?resize=1024%2C576&amp;ssl=1 1024w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaStreams.jpg?resize=269%2C151&amp;ssl=1 269w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/KafkaStreams.jpg?w=1280&amp;ssl=1 1280w\" sizes=\"(max-width: 500px) 100vw, 500px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\"><a href=\"https:\/\/softwareengineeringdaily.com\/2015\/12\/18\/demystifying-stream-processing-with-neha-narkhede\/\">Kafka Streams<\/a> is a Java stream-processing library for building streaming applications that transform input Kafka topics into output Kafka topics. In a time when there are numerous streaming frameworks already out there, why do we need yet another? <a href=\"https:\/\/softwareengineeringdaily.com\/2016\/10\/07\/kafka-streams-with-jay-kreps\/\">To quote guest Jay Kreps,<\/a> \u201cthe gap we see Kafka Streams filling is less the analytics-focused domain these frameworks focus on and more building core applications and microservices that process data streams.\u201d<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Jay is the CEO of Confluent, a company that is building Kafka technology, and he is one of the original authors of Kafka. Kafka has evolved into the message broker of choice for many data engineering stacks. <\/span><\/p>\n<p><b>MANAGED SOLUTIONS<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Kafka deployments can be complex to manage. Kafka is very popular, but it is not easy to deploy and operate. That is why Confluent has built a Kafka-as-a-service product, so that managing Kafka is not the job of an on-call DevOps engineer. 
There are many complexities to building this system.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Apache Kafka has become the most popular open-source solution for persistent replicated messaging in the Hadoop ecosystem. But some software engineers who are working with big data don\u2019t want to deal with the configuration and setup of Kafka. One way to sidestep this problem is to go with a <a href=\"https:\/\/softwareengineeringdaily.com\/2016\/10\/25\/managed-kafka-with-tom-crayford\/\">managed solution<\/a>, like <a href=\"https:\/\/softwareengineeringdaily.com\/2016\/04\/25\/azure-event-hubs-dan-rosanova\/\">Microsoft Azure Event Hubs.<\/a> Or recently, Heroku developed the Heroku Kafka product, which is another managed version of Apache Kafka.<\/span><\/p>\n<p><a href=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Microsoft_azure_event_hubs_architecture.png\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"5188\" data-permalink=\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/microsoft_azure_event_hubs_architecture\/\" data-orig-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Microsoft_azure_event_hubs_architecture.png?fit=589%2C235&amp;ssl=1\" data-orig-size=\"589,235\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Microsoft_azure_event_hubs_architecture\" data-image-description=\"\" data-image-caption=\"\" 
data-medium-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Microsoft_azure_event_hubs_architecture.png?fit=300%2C120&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Microsoft_azure_event_hubs_architecture.png?fit=589%2C235&amp;ssl=1\" class=\"alignnone wp-image-5188\" src=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Microsoft_azure_event_hubs_architecture.png?resize=499%2C199\" alt=\"\" width=\"499\" height=\"199\" srcset=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Microsoft_azure_event_hubs_architecture.png?resize=300%2C120&amp;ssl=1 300w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/Microsoft_azure_event_hubs_architecture.png?w=589&amp;ssl=1 589w\" sizes=\"(max-width: 499px) 100vw, 499px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p><b>FUTURE OF KAFKA<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Monitoring Kafka performance at scale is more important than ever, and there are several open source and paid services that do this. Stay tuned to follow what happens next with Kafka.\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Kafka has become a central tool for data at many large organizations. 
At data-intensive companies like Fiverr and Netflix, Kafka is used simultaneously as: a database a queue for ordered processing a tool for sharing data between different teams the backup tool that can fix any other system if a catastrophic failure occurs a platform<\/p>\n","protected":false},"author":10,"featured_media":5159,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"[Article] Meet Apache Kafka #Data #Streaming","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[1363,83],"tags":[1272,547,1643,562,47,2171],"class_list":["post-5155","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-all-episodes","category-articles","tag-apache-kafka","tag-confluent","tag-fiverr","tag-neha-narkhede","tag-netflix","tag-streaming-data"],"jetpack_publicize_connections":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.8 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Meet Apache Kafka - Software Engineering Daily<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" 
\/>\n<meta property=\"og:title\" content=\"Meet Apache Kafka - Software Engineering Daily\" \/>\n<meta property=\"og:description\" content=\"Kafka has become a central tool for data at many large organizations. At data-intensive companies like Fiverr and Netflix, Kafka is used simultaneously as: a database a queue for ordered processing a tool for sharing data between different teams the backup tool that can fix any other system if a catastrophic failure occurs a platform\" \/>\n<meta property=\"og:url\" content=\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/\" \/>\n<meta property=\"og:site_name\" content=\"Software Engineering Daily\" \/>\n<meta property=\"article:published_time\" content=\"2018-06-09T04:06:44+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-06-12T16:58:23+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/apache-kafka.png?fit=1200%2C1200&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"1200\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Erika Hokanson\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@erikawh0\" \/>\n<meta name=\"twitter:site\" content=\"@software_daily\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Erika Hokanson\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/\"},\"author\":{\"name\":\"Erika Hokanson\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/f2978d8d575ab5101209db18a924d6f1\"},\"headline\":\"Meet Apache Kafka\",\"datePublished\":\"2018-06-09T04:06:44+00:00\",\"dateModified\":\"2018-06-12T16:58:23+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/\"},\"wordCount\":1413,\"publisher\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/apache-kafka.png?fit=1200%2C1200&ssl=1\",\"keywords\":[\"Apache Kafka\",\"Confluent\",\"Fiverr\",\"Neha Narkhede\",\"Netflix\",\"streaming data\"],\"articleSection\":[\"All Content\",\"Exclusive Articles\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/\",\"url\":\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/\",\"name\":\"Meet Apache Kafka - Software Engineering 
Daily\",\"isPartOf\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/apache-kafka.png?fit=1200%2C1200&ssl=1\",\"datePublished\":\"2018-06-09T04:06:44+00:00\",\"dateModified\":\"2018-06-12T16:58:23+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#primaryimage\",\"url\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/apache-kafka.png?fit=1200%2C1200&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/apache-kafka.png?fit=1200%2C1200&ssl=1\",\"width\":1200,\"height\":1200},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/softwareengineeringdaily.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Meet Apache Kafka\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#website\",\"url\":\"https:\/\/softwareengineeringdaily.com\/\",\"name\":\"Software Engineering 
Daily\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/softwareengineeringdaily.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\",\"name\":\"Software Engineering Daily\",\"url\":\"https:\/\/softwareengineeringdaily.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2024\/01\/cropped-sed_website_banner.png?fit=549%2C169&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2024\/01\/cropped-sed_website_banner.png?fit=549%2C169&ssl=1\",\"width\":549,\"height\":169,\"caption\":\"Software Engineering Daily\"},\"image\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/software_daily\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/f2978d8d575ab5101209db18a924d6f1\",\"name\":\"Erika Hokanson\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/51d5aed3ae76c47e424cdd3e8f76fe84?s=96&d=retro&r=pg\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/51d5aed3ae76c47e424cdd3e8f76fe84?s=96&d=retro&r=pg\",\"caption\":\"Erika Hokanson\"},\"sameAs\":[\"https:\/\/erikawho.com\",\"https:\/\/x.com\/erikawh0\"],\"url\":\"https:\/\/softwareengineeringdaily.com\/author\/erikahokanson\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Meet Apache Kafka - Software Engineering Daily","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/","og_locale":"en_US","og_type":"article","og_title":"Meet Apache Kafka - Software Engineering Daily","og_description":"Kafka has become a central tool for data at many large organizations. At data-intensive companies like Fiverr and Netflix, Kafka is used simultaneously as: a database a queue for ordered processing a tool for sharing data between different teams the backup tool that can fix any other system if a catastrophic failure occurs a platform","og_url":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/","og_site_name":"Software Engineering Daily","article_published_time":"2018-06-09T04:06:44+00:00","article_modified_time":"2018-06-12T16:58:23+00:00","og_image":[{"width":1200,"height":1200,"url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/apache-kafka.png?fit=1200%2C1200&ssl=1","type":"image\/png"}],"author":"Erika Hokanson","twitter_card":"summary_large_image","twitter_creator":"@erikawh0","twitter_site":"@software_daily","twitter_misc":{"Written by":"Erika Hokanson","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#article","isPartOf":{"@id":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/"},"author":{"name":"Erika Hokanson","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/f2978d8d575ab5101209db18a924d6f1"},"headline":"Meet Apache Kafka","datePublished":"2018-06-09T04:06:44+00:00","dateModified":"2018-06-12T16:58:23+00:00","mainEntityOfPage":{"@id":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/"},"wordCount":1413,"publisher":{"@id":"https:\/\/softwareengineeringdaily.com\/#organization"},"image":{"@id":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/apache-kafka.png?fit=1200%2C1200&ssl=1","keywords":["Apache Kafka","Confluent","Fiverr","Neha Narkhede","Netflix","streaming data"],"articleSection":["All Content","Exclusive Articles"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/","url":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/","name":"Meet Apache Kafka - Software Engineering 
Daily","isPartOf":{"@id":"https:\/\/softwareengineeringdaily.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#primaryimage"},"image":{"@id":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/apache-kafka.png?fit=1200%2C1200&ssl=1","datePublished":"2018-06-09T04:06:44+00:00","dateModified":"2018-06-12T16:58:23+00:00","breadcrumb":{"@id":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#primaryimage","url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/apache-kafka.png?fit=1200%2C1200&ssl=1","contentUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/apache-kafka.png?fit=1200%2C1200&ssl=1","width":1200,"height":1200},{"@type":"BreadcrumbList","@id":"https:\/\/softwareengineeringdaily.com\/2018\/06\/08\/meet-apache-kafka\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/softwareengineeringdaily.com\/"},{"@type":"ListItem","position":2,"name":"Meet Apache Kafka"}]},{"@type":"WebSite","@id":"https:\/\/softwareengineeringdaily.com\/#website","url":"https:\/\/softwareengineeringdaily.com\/","name":"Software Engineering Daily","description":"","publisher":{"@id":"https:\/\/softwareengineeringdaily.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/softwareengineeringdaily.com\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/softwareengineeringdaily.com\/#organization","name":"Software Engineering Daily","url":"https:\/\/softwareengineeringdaily.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2024\/01\/cropped-sed_website_banner.png?fit=549%2C169&ssl=1","contentUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2024\/01\/cropped-sed_website_banner.png?fit=549%2C169&ssl=1","width":549,"height":169,"caption":"Software Engineering Daily"},"image":{"@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/software_daily"]},{"@type":"Person","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/f2978d8d575ab5101209db18a924d6f1","name":"Erika Hokanson","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/51d5aed3ae76c47e424cdd3e8f76fe84?s=96&d=retro&r=pg","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/51d5aed3ae76c47e424cdd3e8f76fe84?s=96&d=retro&r=pg","caption":"Erika 
Hokanson"},"sameAs":["https:\/\/erikawho.com","https:\/\/x.com\/erikawh0"],"url":"https:\/\/softwareengineeringdaily.com\/author\/erikahokanson\/"}]}},"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/06\/apache-kafka.png?fit=1200%2C1200&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p7GuoD-1l9","_links":{"self":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts\/5155"}],"collection":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/comments?post=5155"}],"version-history":[{"count":0,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts\/5155\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/media\/5159"}],"wp:attachment":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/media?parent=5155"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/categories?post=5155"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/tags?post=5155"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}