Lesson

  1. Kafka Connect is a framework for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define connectors that move large data collections into and out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency. An export job can deliver data from Kafka topics into secondary storage and query systems, or into batch systems for offline analysis (a minimal connector configuration is sketched after this list).
  2. Kafka replication means keeping multiple copies of the data spread across multiple servers/brokers. This maintains high availability: if one broker goes down and cannot serve requests, another broker holding a copy can take over. Writing the same data to more than one broker also helps prevent data loss. The replication factor is a topic setting specified at topic creation time, and the unit of replication is the topic partition (see the topic-creation sketch after this list).
  3. In Apache Kafka, producers and consumers exchange messages in the form of key-value pairs. Data must be serialized into bytes before it is sent to Kafka and deserialized back into its original format when messages are consumed. The producer API already provides serializers such as IntegerSerializer and StringSerializer (with matching deserializers for the consumer side), but Kafka also allows custom serializers and deserializers for both keys and values. The serializer is used by the message producer, while the deserializer is used by the message consumer; the pair is referred to in short form as a Kafka SerDe (a custom SerDe is sketched after this list).
  4. Applications that read data from Kafka topics are known as consumers. An application integrates a Kafka client library to read from Apache Kafka, and excellent client libraries exist for almost all popular programming languages, including Python, Java, Go, and others (a minimal consumer loop is shown after this list).
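
For point 1, here is a minimal sketch of how a Kafka Connect source connector can be defined, using the FileStreamSource connector that ships with the Kafka distribution and running Connect in standalone mode. The connector name, file path, and topic name are illustrative, and depending on your Kafka version the file connector plugin may need to be added to plugin.path.

```properties
# connect-file-source.properties (illustrative names and paths)
name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
# File to tail and the topic its lines are written into
file=/var/log/app.log
topic=app-log-lines
```

A standalone worker can then be started with bin/connect-standalone.sh config/connect-standalone.properties connect-file-source.properties, after which each new line appended to the file shows up as a record in the topic.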
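For point 2, the sketch below shows the replication factor being set at topic creation time with the Java AdminClient. The broker address, topic name, and partition count are illustrative values.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Illustrative broker address
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, replication factor 3: each partition is copied to 3 brokers
            NewTopic topic = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```

The equivalent command-line form is kafka-topics.sh --create --topic orders --partitions 6 --replication-factor 3 --bootstrap-server localhost:9092.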
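For point 3, here is a minimal sketch of a custom serializer/deserializer pair built on the Serializer and Deserializer interfaces from the Kafka clients library (newer client versions provide default configure() and close() methods, so only the conversion methods are overridden). The Order class and the toy CSV-over-UTF-8 encoding are assumptions purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;

// Illustrative value type
class Order {
    final String id;
    final int quantity;
    Order(String id, int quantity) { this.id = id; this.quantity = quantity; }
}

// Used by the producer: turns an Order into bytes before it is sent to Kafka
class OrderSerializer implements Serializer<Order> {
    @Override
    public byte[] serialize(String topic, Order order) {
        if (order == null) return null;
        String csv = order.id + "," + order.quantity;   // toy CSV encoding
        return csv.getBytes(StandardCharsets.UTF_8);
    }
}

// Used by the consumer: turns the bytes back into an Order
class OrderDeserializer implements Deserializer<Order> {
    @Override
    public Order deserialize(String topic, byte[] data) {
        if (data == null) return null;
        String[] parts = new String(data, StandardCharsets.UTF_8).split(",");
        return new Order(parts[0], Integer.parseInt(parts[1]));
    }
}
```

The classes are wired in through the producer properties key.serializer / value.serializer and the consumer properties key.deserializer / value.deserializer.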
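For point 4, a minimal consumer loop using the Java client library is sketched below; the broker address, consumer group id, and topic name are illustrative.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "lesson-consumer-group");   // illustrative
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("app-log-lines")); // illustrative topic
            while (true) {
                // Poll the brokers for new records and process each key-value pair
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```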