202302211109 - Apache Kafka Series - Learn Apache Kafka for Beginners V3
Introduction
Apache Kafka is used to decouple direct integrations between many different systems, lowering the total complexity of the architecture and allowing it to scale more effectively.
It is a distributed, resilient, fault-tolerant streaming platform with strong horizontal scaling and high performance.
- Messaging system
- Activity tracking
- Metrics aggregation
- Application log gathering
- Stream processing
- Decoupling of system dependencies
- Big data integration
- Microservices pub/sub
Kafka is used only as a transportation mechanism for data; the actual processing happens in the applications that produce and consume it.
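The decoupling idea above can be sketched with a toy in-memory stand-in for a broker (illustrative only, not the Kafka client API): producers append to a topic log, and any number of consumers read it independently at their own offsets, so neither side knows about the other.

```python
from collections import defaultdict

class MiniBroker:
    """Toy in-memory stand-in for a Kafka broker. Producers append to a
    topic's ordered log; consumers read from it at their own offsets."""

    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> ordered log of messages

    def produce(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic, offset=0):
        # Each consumer tracks its own offset, so consumers are decoupled
        # from producers and from each other.
        return self.topics[topic][offset:]

broker = MiniBroker()
broker.produce("orders", {"id": 1, "amount": 9.99})
broker.produce("orders", {"id": 2, "amount": 4.50})

# Two independent consumers read the same log; the producer knows nothing
# about either of them.
billing = broker.consume("orders")
analytics = broker.consume("orders", offset=1)
```

Adding a third consumer later requires no change to the producer, which is the core of the decoupling argument.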
Introduction to Apache Kafka
Kafka Configuration and Setup
Data Processing with Kafka Streams
Kafka Connect
Flashcards
To import data from external databases, I should use::: Kafka Connect Source. A Kafka Connect Source imports data from external systems into Kafka; a Kafka Connect Sink exports data from Kafka to external systems.
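As a concrete reference for the source/sink distinction, here is a hedged sketch of a JDBC source connector configuration, expressed as the Python dict you would POST to Kafka Connect's REST API. The connector name, connection URL, and column name are illustrative assumptions, not values from the course.

```python
# Hypothetical JDBC source connector config (all names/URLs illustrative).
# The connector class is Confluent's JDBC source; "mode": "incrementing"
# makes Connect poll only rows whose id column has grown since last poll.
jdbc_source_config = {
    "name": "inventory-source",  # assumed connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db.example.com:5432/inventory",
        "mode": "incrementing",
        "incrementing.column.name": "id",  # assumed incrementing column
        "topic.prefix": "db-",   # tables land in topics like "db-<table>"
        "tasks.max": "2",
    },
}
```

You would typically register this with `POST http://<connect-host>:8083/connectors`; a sink connector looks the same except it uses a sink connector class and names the topics to export.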
You want to sink data from a Kafka topic to S3 using Kafka Connect. There are 10 brokers in the cluster, the topic has 2 partitions with replication factor of 3. How many tasks will you configure for the S3 connector?:: 2. A sink task is a consumer, and you cannot have more active consumers than partitions, so 2; the broker count and replication factor are irrelevant.
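The rule behind this flashcard can be written as a one-line helper (my own sketch, not a Connect API): the useful sink task count is the configured maximum capped by the partition count, since each task is a consumer in the connector's consumer group.

```python
def effective_sink_tasks(tasks_max: int, partitions: int) -> int:
    """Sink tasks are consumers in one consumer group, so any tasks
    beyond the partition count would sit idle."""
    return min(tasks_max, partitions)

# Flashcard scenario: even a generous tasks.max only yields 2 working
# tasks for a 2-partition topic.
assert effective_sink_tasks(10, 2) == 2
```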
You are using JDBC source connector to copy data from 3 tables to three Kafka topics. There is one connector created with max.tasks equal to 2 deployed on a cluster of 3 workers. How many tasks are launched?:: 2. The connector could create up to one task per table (3), but max.tasks caps it at 2; the worker count does not matter.
You are using JDBC source connector to copy data from a table to Kafka topic. There is one connector created with max.tasks equal to 2 deployed on a cluster of 3 workers. How many tasks are launched?:: 1. The JDBC source connector creates at most one task per table, so with a single table only 1 task is launched despite max.tasks being 2.
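The two JDBC flashcards follow the same shape as the sink case, with tables in place of partitions. A small sketch of that rule (my own helper, not part of the connector):

```python
def jdbc_source_tasks(tasks_max: int, tables: int) -> int:
    """The JDBC source connector creates at most one task per table,
    further capped by the connector's tasks.max setting."""
    return min(tasks_max, tables)

# 3 tables but max.tasks=2 -> 2 tasks; 1 table -> 1 task, regardless of
# how many Connect workers are in the cluster.
assert jdbc_source_tasks(2, 3) == 2
assert jdbc_source_tasks(2, 1) == 1
```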
Performance Tuning and Optimization
- Producer Performance Tuning
- Consumer Performance Tuning
- Broker Performance Tuning
- Resource Allocation
- Load Balancing
Fault Tolerance and High Availability
- Failover Mechanisms
- Data Replication Strategies
- Handling Node Failures
Data Management and Retention
- Data Retention Policies
- Log Compaction
- Topic Cleanup Policies
Integrations and Ecosystem
- Kafka Connectors
- Integration with Big Data Tools (Hadoop, Spark, etc.)
- Integration with Cloud Services (AWS MSK, Azure Event Hubs for Kafka, Google Cloud Pub/Sub)
Use Cases and Patterns
- Common Use Cases (Event Sourcing, Messaging, Log Aggregation, Stream Processing)
- Design Patterns for Kafka
Kafka Internals
- Kafka Log Structure
- Internal Data Structures
- Kafka Controller