202302211109 - Apache Kafka Series - Learn Apache Kafka for Beginners V3
Introduction
Apache Kafka is used to decouple direct integrations between many different systems, lowering the total complexity of the architecture and allowing it to scale more effectively.
It is a distributed, resilient, fault-tolerant streaming platform with strong horizontal scaling and high performance.
- Messaging system
- Activity tracking
- Metrics aggregation
- Application log gathering
- Stream processing
- Decoupling of system dependencies
- Big data integration
- Microservices pub/sub
Kafka is used only as a transportation mechanism for data; the actual processing happens in the applications that produce and consume it.
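The decoupling idea above can be sketched with a toy in-memory stand-in for a broker (illustrative only, not the Kafka client API): producers append to a topic log, and any number of consumers read it independently at their own offsets, so neither side knows about the other.

```python
from collections import defaultdict

class MiniBroker:
    """Toy in-memory stand-in for a Kafka broker. Producers append to a
    topic's ordered log; consumers read from it at their own offsets."""

    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> ordered log of messages

    def produce(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic, offset=0):
        # Each consumer tracks its own offset, so consumers are decoupled
        # from producers and from each other.
        return self.topics[topic][offset:]

broker = MiniBroker()
broker.produce("orders", {"id": 1, "amount": 9.99})
broker.produce("orders", {"id": 2, "amount": 4.50})

# Two independent consumers read the same log; the producer knows nothing
# about either of them.
billing = broker.consume("orders")
analytics = broker.consume("orders", offset=1)
```

Adding a third consumer later requires no change to the producer, which is the core of the decoupling argument.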
Introduction to Apache Kafka
Kafka Configuration and Setup
Data Processing with Kafka Streams
Kafka Connect
Flashcards
To import data from external databases, I should use::: Kafka Connect Source. A Kafka Connect Source imports data from external systems into Kafka; a Kafka Connect Sink exports data from Kafka to external systems.
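As a concrete reference for the source/sink distinction, here is a hedged sketch of a JDBC source connector configuration, expressed as the Python dict you would POST to Kafka Connect's REST API. The connector name, connection URL, and column name are illustrative assumptions, not values from the course.

```python
# Hypothetical JDBC source connector config (all names/URLs illustrative).
# The connector class is Confluent's JDBC source; "mode": "incrementing"
# makes Connect poll only rows whose id column has grown since last poll.
jdbc_source_config = {
    "name": "inventory-source",  # assumed connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db.example.com:5432/inventory",
        "mode": "incrementing",
        "incrementing.column.name": "id",  # assumed incrementing column
        "topic.prefix": "db-",   # tables land in topics like "db-<table>"
        "tasks.max": "2",
    },
}
```

You would typically register this with `POST http://<connect-host>:8083/connectors`; a sink connector looks the same except it uses a sink connector class and names the topics to export.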
You want to sink data from a Kafka topic to S3 using Kafka Connect. There are 10 brokers in the cluster, the topic has 2 partitions with replication factor of 3. How many tasks will you configure for the S3 connector?:: 2. A sink task is a consumer, and you cannot have more active consumers than partitions, so 2; the broker count and replication factor are irrelevant.
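The rule behind this flashcard can be written as a one-line helper (my own sketch, not a Connect API): the useful sink task count is the configured maximum capped by the partition count, since each task is a consumer in the connector's consumer group.

```python
def effective_sink_tasks(tasks_max: int, partitions: int) -> int:
    """Sink tasks are consumers in one consumer group, so any tasks
    beyond the partition count would sit idle."""
    return min(tasks_max, partitions)

# Flashcard scenario: even a generous tasks.max only yields 2 working
# tasks for a 2-partition topic.
assert effective_sink_tasks(10, 2) == 2
```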
You are using JDBC source connector to copy data from 3 tables to three Kafka topics. There is one connector created with max.tasks equal to 2 deployed on a cluster of 3 workers. How many tasks are launched?:: 2. The connector could create up to one task per table (3), but max.tasks caps it at 2; the worker count does not matter.
You are using JDBC source connector to copy data from a table to Kafka topic. There is one connector created with max.tasks equal to 2 deployed on a cluster of 3 workers. How many tasks are launched?:: 1. The JDBC source connector creates at most one task per table, so with a single table only 1 task is launched despite max.tasks being 2.
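The two JDBC flashcards follow the same shape as the sink case, with tables in place of partitions. A small sketch of that rule (my own helper, not part of the connector):

```python
def jdbc_source_tasks(tasks_max: int, tables: int) -> int:
    """The JDBC source connector creates at most one task per table,
    further capped by the connector's tasks.max setting."""
    return min(tasks_max, tables)

# 3 tables but max.tasks=2 -> 2 tasks; 1 table -> 1 task, regardless of
# how many Connect workers are in the cluster.
assert jdbc_source_tasks(2, 3) == 2
assert jdbc_source_tasks(2, 1) == 1
```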
Performance Tuning and Optimization
- Producer Performance Tuning
- Consumer Performance Tuning
- Broker Performance Tuning
- Resource Allocation
- Load Balancing
Fault Tolerance and High Availability
- Failover Mechanisms
- Data Replication Strategies
- Handling Node Failures
Data Management and Retention
- Data Retention Policies
- Log Compaction
- Topic Cleanup Policies
Integrations and Ecosystem
- Kafka Connectors
- Integration with Big Data Tools (Hadoop, Spark, etc.)
- Integration with Cloud Services (AWS MSK, Azure Event Hubs for Kafka, Google Cloud Pub/Sub)
Use Cases and Patterns
- Common Use Cases (Event Sourcing, Messaging, Log Aggregation, Stream Processing)
- Design Patterns for Kafka
Kafka Internals
- Kafka Log Structure
- Internal Data Structures
- Kafka Controller