Why was Apache Kafka created?


The story behind how LinkedIn created Apache Kafka

LinkedIn used site activity data (e.g. someone liked this, someone posted this) for many things: tracking fraud and abuse, matching jobs to users, training ML models, powering basic features of the website (e.g. who viewed your profile, the newsfeed), and warehouse ingestion for offline analysis and reporting. For warehouse ingestion, the system would write these events to aggregate files, copy them to ETL servers, parse and transform the XML, and finally load the result into the warehouse infrastructure, which consisted of a relational Oracle database and Hadoop clusters.

What surprised me most about this paper is that, 13+ years ago, it described the very problem that LinkedIn literally invented Kafka and auxiliary systems to solve, and that its authors then took the time to write out the universal schema solution really well.
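To make the batch ingestion path concrete (aggregate files, copy to ETL servers, parse and transform the XML, load into the warehouse), here is a minimal Python sketch of those four hops. It is an illustration under stated assumptions, not LinkedIn's actual code: the event fields (`member_id`, `action`, `item_id`), the XML layout, and all function names are hypothetical; a local temporary directory stands in for the ETL servers; and an in-memory SQLite database stands in for the Oracle/Hadoop warehouse.

```python
import shutil
import sqlite3
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical activity events; field names are illustrative assumptions.
EVENTS = [
    {"member_id": "42", "action": "like", "item_id": "post-1"},
    {"member_id": "7", "action": "post", "item_id": "post-2"},
    {"member_id": "42", "action": "view_profile", "item_id": "member-7"},
]


def write_aggregate_file(events, directory: Path) -> Path:
    """Step 1: batch events into one aggregate XML file on the producing host."""
    root = ET.Element("events")
    for event in events:
        ET.SubElement(root, "event", attrib=event)
    path = directory / "activity-batch.xml"
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    return path


def copy_to_etl_server(path: Path, etl_dir: Path) -> Path:
    """Step 2: ship the file to the ETL host (a local directory stands in here)."""
    return Path(shutil.copy(path, etl_dir))


def parse_and_transform(path: Path):
    """Step 3: parse the XML and transform each event into a warehouse row."""
    rows = []
    for elem in ET.parse(path).getroot().iter("event"):
        rows.append((int(elem.get("member_id")), elem.get("action"), elem.get("item_id")))
    return rows


def load_into_warehouse(rows, conn: sqlite3.Connection) -> int:
    """Step 4: bulk-load rows (SQLite is a stand-in for the Oracle/Hadoop warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS activity (member_id INTEGER, action TEXT, item_id TEXT)"
    )
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def run_batch_pipeline(events, conn: sqlite3.Connection) -> int:
    """Run all four batch hops in order and return the number of rows loaded."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as etl:
        aggregate = write_aggregate_file(events, Path(src))
        copied = copy_to_etl_server(aggregate, Path(etl))
        rows = parse_and_transform(copied)
        return load_into_warehouse(rows, conn)


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    loaded = run_batch_pipeline(EVENTS, warehouse)
    print(loaded, "rows loaded")
    print(warehouse.execute(
        "SELECT action, COUNT(*) FROM activity GROUP BY action ORDER BY action"
    ).fetchall())
```

Every event passes through several batch hand-offs before it becomes queryable in the warehouse, which gives a sense of the plumbing behind the problem the article says Kafka was invented to solve.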
