Producer
Producers write data to Topics. Producers know which Partition to write to and which Broker has that Partition.
In case of Broker failures, Producers will automatically recover.
Kafka scales well because Topics are split across multiple Partitions.
Each Topic will receive messages from one or more Producers.
Producers can choose to send a Message Key with the Message.
linger.ms forces the producer to wait before sending messages, increasing the chance of creating batches.
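A minimal sketch of how this looks on the Java client; the broker address and the 20 ms value are illustrative:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Wait up to 20 ms for more records to accumulate before sending a batch.
props.put("linger.ms", "20");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
```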
Flashcards
Producers do this::: write data to Topics
Producers know::: which Partitions to write to and which Broker has that partition
Topics scale by::: being split across multiple Partitions
Producers will automatically recover::: In case of Broker failures
A Topic receives messages from::: one or more Producers
Producers can choose to send::: a Message Key with the Message
A Kafka producer application wants to send log messages to a topic without including any key. Which properties are mandatory in the producer configuration? (select three)
- bootstrap.servers
- partition
- value
- key.serializer
- key
- value.serializer
? - bootstrap.servers
- key.serializer
- value.serializer
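A minimal sketch of a producer carrying only the three mandatory settings and sending a keyless record; the broker address and topic name are placeholders:

```java
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");  // mandatory
props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");   // mandatory even when no key is sent
props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");   // mandatory

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
// No key supplied: the client spreads records across partitions on its own.
producer.send(new ProducerRecord<>("logs", "application started"));
producer.close();
```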
What setting increases the chance of batching for a Kafka Producer?:: linger.ms
To enhance compression, I can increase the chances of batching by using:: linger.ms, which forces the producer to wait before sending messages, increasing the chance of creating batches that can be heavily compressed.
A customer has many consumer applications that process messages from a Kafka topic. Each consumer application can only process 50 MB/s. Your customer wants to achieve a target throughput of 1 GB/s. What is the minimum number of partitions you would suggest to the customer for that particular topic?:: 20. Each consumer can process only 50 MB/s, so we need at least 20 consumers, each consuming one partition, so that 20 * 50 MB/s = 1000 MB/s meets the target.
Your producer is producing at a very high rate and the batches are completely full each time. How can you improve the producer throughput? (select two)
- Decrease linger.ms
- Increase batch.size
- Disable compression
- Decrease batch.size
- Enable compression
- Increase linger.ms
? - Increase batch.size
- Enable compression
batch.size controls how many bytes of data to collect before sending messages to the Kafka broker. Set this as high as possible without exceeding available memory. Enabling compression can also help make batches more compact and increase producer throughput. linger.ms will have no effect because the batches are already full.
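A sketch of the two settings that still help once batches are full; the values shown are illustrative:

```java
props.put("batch.size", "65536");         // 64 KB per-partition batches (default is 16 KB)
props.put("compression.type", "snappy");  // compress whole batches on the producer side
```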
What is the risk of increasing max.in.flight.requests.per.connection while also enabling retries in a producer?:: Message order not preserved
You are sending messages with keys to a topic. To increase throughput, you decide to increase the number of partitions of the topic. Select all that apply.
- New records with the same key will get written to the partition where old records with that key were written
- All the existing records will get rebalanced among the partitions to balance load
- Old records will stay in their partitions
- New records may get written to a different partition
? - Old records will stay in their partitions
- New records may get written to a different partition
Increasing the number of partitions causes new message keys to be hashed differently, breaking the guarantee "same key goes to the same partition". Kafka logs are immutable, and previous messages are not re-shuffled.
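The default partitioner for keyed records hashes the key bytes with murmur2 and takes the result modulo the partition count, so growing the topic changes the mapping. A sketch of the idea using the client's own hashing utilities; the key and partition counts are placeholders:

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

byte[] key = "customer-42".getBytes(StandardCharsets.UTF_8);
// Same key, different partition counts: the target partition usually changes.
int with8Partitions  = Utils.toPositive(Utils.murmur2(key)) % 8;
int with12Partitions = Utils.toPositive(Utils.murmur2(key)) % 12;
```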
A Kafka topic has a replication factor of 3 and a min.insync.replicas setting of 2. How many brokers can go down before a producer with acks=all can't produce?:: 1. acks=all and min.insync.replicas=2 mean we must have at least 2 brokers up for the partition to be available.
To prevent network-induced duplicates when producing to Kafka, I should use:: enable.idempotence=true. Producer idempotence helps prevent network-introduced duplicates.
The rule "same key goes to the same partition" is true unless...:: the number of partitions changes. Increasing the number of partitions causes new message keys to be hashed differently, breaking the guarantee "same key goes to the same partition". Kafka logs are immutable, and previous messages are not re-shuffled.
If I supply the setting compression.type=snappy to my producer, what will happen? (select two)
- The Consumers have to compress the data
- The Producers have to compress the data
- The Kafka brokers have to compress the data
- The Consumers have to de-compress the data
- The Kafka brokers have to de-compress the data
? - The Producers have to compress the data
- The Consumers have to de-compress the data
Kafka transfers data with zero copy and no transformation. Any transformation (including compression) is the responsibility of clients.
You want to send a message of size 3 MB to a topic with the default message size configuration. How does KafkaProducer handle large messages?:: A RecordTooLargeException will be thrown; KafkaProducer will not retry and returns the exception immediately.
To produce data to a topic, a producer must provide the Kafka client with...:: any broker from the cluster and the topic name. All brokers can respond to a Metadata request, so a client can connect to any broker in the cluster and then figure out on its own which brokers to send data to.
What happens if you write producer.send(record).get() in your producer?:: Throughput will be decreased. Using Future.get() to wait for a reply from Kafka will limit throughput.
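Both patterns sketched side by side; the topic and record contents are placeholders:

```java
// Synchronous: blocks on every record, one round-trip at a time.
producer.send(new ProducerRecord<>("orders", "key1", "value1")).get();

// Asynchronous: returns immediately; the callback reports success or failure later.
producer.send(new ProducerRecord<>("orders", "key1", "value1"),
        (metadata, exception) -> {
            if (exception != null) {
                exception.printStackTrace();  // handle the send failure
            }
        });
```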
You are receiving orders from different customers in an "orders" topic with multiple partitions. Each message has the customer name as the key. There is a special customer named ABC that generates a lot of orders, and you would like to reserve a partition exclusively for ABC. The rest of the messages should be distributed among the other partitions. How can this be achieved?:: Create a custom partitioner. A custom partitioner allows you to easily customise how the partition number gets computed from a source message.
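A sketch of such a partitioner: partition 0 is reserved for ABC and every other key is hashed across the remaining partitions. The class name and partition layout are illustrative choices:

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class AbcPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if ("ABC".equals(key)) {
            return 0;  // partition 0 reserved exclusively for customer ABC
        }
        // All other keys are hashed into partitions 1..numPartitions-1.
        return 1 + Utils.toPositive(Utils.murmur2(keyBytes)) % (numPartitions - 1);
    }

    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public void close() {}
}
```

The producer is pointed at it with props.put("partitioner.class", AbcPartitioner.class.getName()).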
A Kafka topic has a replication factor of 3 and a min.insync.replicas setting of 2. How many brokers can go down before a producer with acks=1 can't produce?:: 2. min.insync.replicas does not impact producers when acks=1 (only when acks=all), so only the leader needs to be up.
A topic has three replicas and you set min.insync.replicas to 2. If two out of three replicas are not available, what happens when a produce request with acks=all is sent to the broker?:: NotEnoughReplicasException will be returned. With this configuration, the partition becomes read-only when only a single in-sync replica remains, and the produce request receives a NotEnoughReplicasException.
client.id:: Logical identifier for the client and the application it is used in.
acks:: Controls how many partition replicas must receive the record before the producer can consider the write successful
max.block.ms:: This parameter controls how long the producer may block when calling send() and when explicitly requesting metadata via partitionsFor(). A TimeoutException is thrown when reached
delivery.timeout.ms:: This configuration limits the amount of time spent from the point a record is ready for sending until either the broker responds or the client gives up, including time spent on retries.
request.timeout.ms:: This parameter controls how long the producer will wait for a reply from the server when sending data. Note that this is the time spent waiting on each producer request before giving up; it does not include retries. If this timeout is reached without a reply, the producer will either retry or complete the callback with a TimeoutException.
retries and retry.backoff.ms:: When the producer receives an error message from the server, the error could be transient. In this case, the value of the retries parameter will control how many times the producer will retry sending the message before giving up and notifying the client of an issue. By default, the producer will wait 100ms between retries, but you can control this using the retry.backoff.ms parameter
linger.ms:: Controls the amount of time to wait for additional messages before sending the current batch.
buffer.memory:: This config sets the amount of memory the producer will use to buffer messages waiting to be sent to brokers
compression.type:: By default, messages are sent uncompressed. This parameter can be set to snappy, gzip, lz4, or zstd.
batch.size:: When multiple records are sent to the same partition, the producer will batch them together. This parameter controls the amount of memory in bytes that will be used for each batch.
max.in.flight.requests.per.connection:: This controls how many message batches the producer will send to the server without receiving responses. Setting the retries parameter to nonzero and max.in.flight.requests.per.connection to more than 1 means that it is possible that the broker will fail on the first batch but complete the second, committing the batches out of order.
max.request.size:: This setting controls the size of a produce request sent by the producer. It caps both the size of the largest message that can be sent and the number of messages that the producer can send in one request. The default is 1 MB.
receive.buffer.bytes and send.buffer.bytes:: These are the sizes of the TCP send and receive buffers used by the sockets when writing and reading data. If these are set to -1 the OS defaults will be used.
enable.idempotence:: When enabled, the producer will attach a sequence number to each record it sends. If the broker receives records with the same sequence number, it will reject the second copy and the producer will receive the harmless DuplicateSequenceException. Enabling this requires max.in.flight.requests.per.connection to be 5 or less, retries to be greater than 0, and acks=all. If incompatible values are set, a ConfigException will be thrown.
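A sketch of a configuration compatible with idempotence, using the constraints named above; the retries value is one common choice:

```java
props.put("enable.idempotence", "true");
props.put("acks", "all");                                  // required
props.put("retries", Integer.toString(Integer.MAX_VALUE)); // must be greater than 0
props.put("max.in.flight.requests.per.connection", "5");   // must be 5 or less
```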