
KAAS Design

The document outlines requirements for a fully managed Kafka service (KaaS). It would allow clients to publish and consume data from Kafka without managing the complex infrastructure. The service would support multiple tenants, automatic scaling, encryption of data and communication, and high availability even if some clients misbehave or fail. It separates control and data planes, with the control plane managing provisioning and the data plane handling message publishing and subscriptions.

Uploaded by

hareendrareddy

Problem statement:

As an application developer, I want Confluent to provide me with a fully managed Kafka environment
(KaaS) so that I can point my existing Kafka producer and consumer clients at KaaS and forget about
the complexities of operating a stateful system.

Functional requirements:
1. The system has to support multiple tenants.
2. Clients can register on Confluent with initial information such as topic names, data retention
policy and replication policy.
3. Clients can tune the initial configuration provided at registration.
4. Clients can create a topic and publish data into it.
5. Clients can subscribe to a topic and read data from it.
6. Clients can provision more capacity for reads and writes.
7. Clients can view the important metrics via dashboards.
8. Clients are metered and billed depending on usage.

Non-functional requirements:

1. Overall onboarding for a client (producer or consumer) has to take less than 5 minutes.
2. Clusters need to scale up or down as traffic increases or decreases.
3. Data has to be encrypted both in transit and at rest.
4. Misbehaving clients and failures affecting a few clients should not affect other clients.
5. The system has to be horizontally scalable.

Assumptions:

High level architecture:

Components

Confluent front end:
The entry point for the application is a set of front-end systems acting as a proxy fleet. The fleet is
responsible for routing requests to the appropriate service, authenticating users, rate limiting,
request validation and request deduplication.
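
The rate limiting done by the front end could look roughly like the sketch below. It is only an illustration: the design does not name an algorithm, so the token bucket, the `TokenBucket` and `FrontEndLimiter` class names, and the `rate` and `capacity` parameters are all assumptions. Each `userKey` gets its own bucket so that one noisy client cannot use up another client's quota.

```python
import time


class TokenBucket:
    """Token bucket: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Consume one token if available; return True when the request may pass."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class FrontEndLimiter:
    """Hypothetical front-end helper: one independent bucket per user key."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, user_key, now=None):
        bucket = self.buckets.get(user_key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[user_key] = bucket
        return bucket.allow(now)
```

The optional `now` argument exists only so that the behaviour can be exercised with a fixed clock; a real proxy would rely on the monotonic clock and would need shared state across the fleet, which this sketch does not attempt.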

Control plane:
The system handles roughly two types of requests: the regular requests to publish and subscribe, and
the occasional requests to change meta information about the system, such as registering topics or
provisioning more hardware. The latter type is handled by the components in the control plane and
the former by the components in the data plane (addressed in the next section). Since the two types
have different availability and scalability requirements, a clean separation is maintained.
Background activities such as resource cleanup and automation are also orchestrated by the control
plane. The necessary infrastructure can be expressed in YAML files so that resources are easy to
reproduce.

Data plane:
The data plane hosts the components that are on the primary paths of publishing and subscribing to
messages: data storage, data ingestion, key management and data serving.

Detailed design

Key decisions
Separation of control activities from the core activities
Control activities such as provisioning more hardware, registering new topics, registering new
subscriptions, deleting topics and cleanup activities that address fault tolerance are separated
from the core publish and subscribe path because they have different scalability and availability
requirements. Background activities such as cluster cleanup, machine health monitoring, security
patching and disaster recovery are orchestrated by the control plane's infra management service.

Security and compliance
All communication channels between clients and KaaS are secured with TLS. Data stored on the file
system is encrypted with keys obtained from the key management service. Keys are rotated
periodically.

Multi-tenant clients
Depending on their preference, clients can either host their data in a separate VPC or share
infrastructure with other tenants. When infrastructure is shared, additional reliability is achieved
with shuffle sharding. Errors and failures are contained to a part of a client's data or to a
specific client.
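
A minimal sketch of how shuffle sharding might assign tenants to brokers is shown below. The function name `shuffle_shard`, the broker naming and the `shard_size` parameter are illustrative assumptions, not part of the design. Each tenant is mapped to a small, deterministic, pseudo-random subset of the shared brokers, so two tenants rarely share exactly the same set.

```python
import hashlib
import random


def shuffle_shard(tenant_id, brokers, shard_size):
    """Pick a deterministic pseudo-random subset of brokers for one tenant."""
    if shard_size > len(brokers):
        raise ValueError("shard_size cannot exceed the number of brokers")
    # Seed from a stable hash so the same tenant always maps to the same shard.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    # Sort the input first so the result does not depend on caller ordering.
    return sorted(rng.sample(sorted(brokers), shard_size))


brokers = ["broker-%d" % i for i in range(8)]
shard_a = shuffle_shard("tenant-a", brokers, 2)
shard_b = shuffle_shard("tenant-b", brokers, 2)
```

With 8 brokers and a shard size of 2 there are 28 possible shards, so a tenant that overloads both of its brokers fully overlaps only with tenants that were assigned the same pair. Other tenants lose at most one of their two brokers.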

Redundancy
Redundancy is achieved by provisioning additional backup machines. Data is replicated to all
replica machines, and by default replication can be enabled across up to 3 availability zones. When
there are multiple consumers and read throughput increases, additional read replicas can be
provisioned.

APIs
Control plane APIs:
1. createTopic
a. Request – userKey, topicName
b. Response – HTTP response code
2. updateConfiguration
a. Request – userKey, config to change (throughput, data retention) and new value
b. Response – HTTP response code
3. deleteTopic
a. Request – userKey, topicName
b. Response – HTTP response code
Data plane APIs:
1. publishData
a. Request – userKey, topicName, payload
b. Response – HTTP response code
2. batchPublish
a. Request – userKey, topicName, batch payload
b. Response – HTTP response code
3. readData
a. Request – userKey, topicName, offset
b. Response – HTTP response code
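
The API list above can be sketched as a small in-memory stand-in, mainly to make the request and response shapes concrete. Everything beyond the operation and parameter names is an assumption: the class name `InMemoryKaaS`, the specific status codes (201, 200, 400, 404, 409), and the choice to return the records alongside the status code from `read_data`, since the list above only mentions a response code.

```python
class InMemoryKaaS:
    """Toy stand-in for the control and data plane APIs; not a real service."""

    ALLOWED_CONFIG = ("throughput", "data_retention")

    def __init__(self):
        self.topics = {}  # (user_key, topic_name) -> list of payloads, index = offset
        self.config = {}  # user_key -> {config name: value}

    # Control plane
    def create_topic(self, user_key, topic_name):
        key = (user_key, topic_name)
        if key in self.topics:
            return 409  # assumed: conflict when the topic already exists
        self.topics[key] = []
        return 201

    def update_configuration(self, user_key, name, value):
        if name not in self.ALLOWED_CONFIG:
            return 400
        self.config.setdefault(user_key, {})[name] = value
        return 200

    def delete_topic(self, user_key, topic_name):
        removed = self.topics.pop((user_key, topic_name), None)
        return 200 if removed is not None else 404

    # Data plane
    def publish_data(self, user_key, topic_name, payload):
        return self.batch_publish(user_key, topic_name, [payload])

    def batch_publish(self, user_key, topic_name, payloads):
        log = self.topics.get((user_key, topic_name))
        if log is None:
            return 404
        log.extend(payloads)
        return 200

    def read_data(self, user_key, topic_name, offset):
        log = self.topics.get((user_key, topic_name))
        if log is None:
            return 404, []
        if offset < 0:
            return 400, []
        return 200, log[offset:]
```

Keying topics by `(userKey, topicName)` keeps tenants from seeing each other's topics even when they pick the same name, which matches the multi-tenant requirement.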


Data model

ClientInfo Table
1. Client Id
2. Topic name
3. Data retention time
4. Other meta information

Subscriptions Table
1. Subscription Id
2. Topic name
3. Consumer Client Id

ConsumerInfo Table
1. Subscription Id
2. Offset

Data (embedded database on broker machines)
1. Offset
2. Payload
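
The tables above could be expressed as plain record types, as in the sketch below. Field types, the `meta` dictionary, the unit of the retention time and the helper `next_records` are assumptions made for illustration. In particular, the sketch treats the ConsumerInfo offset as the next offset the subscription should read and assumes the log is sorted by offset.

```python
from dataclasses import dataclass, field


@dataclass
class ClientInfo:
    client_id: str
    topic_name: str
    data_retention_seconds: int  # assumed unit; the table only says "time"
    meta: dict = field(default_factory=dict)


@dataclass
class Subscription:
    subscription_id: str
    topic_name: str
    consumer_client_id: str


@dataclass
class ConsumerInfo:
    subscription_id: str
    offset: int = 0  # assumed meaning: next offset to read


@dataclass
class Record:
    offset: int
    payload: bytes


def next_records(consumer, log, limit):
    """Return up to `limit` records at or after the consumer's offset and advance it."""
    batch = [r for r in log if r.offset >= consumer.offset][:limit]
    if batch:
        consumer.offset = batch[-1].offset + 1
    return batch
```

Keeping the offset per subscription rather than per topic lets several consumers read the same topic at their own pace.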
