
Solution for Clickstream Migration

Artifacts

PRD link –
ARD –
Figma –
Teams/POC involved – iOS, DP Team, Android, Analytics
GTM Plan – Mid September release with XP-based rollout
Gantt –
QA/E2E cases –

Sr. No | Reviewer | Status | Date | Comments
-------|----------|--------|------|---------
1 | Balvinder Gambhir | Approved | |
2 | Sambuddha Dhar | Approved | |
3 | Arun Sharma | Not started | |
4 | Farhan Rasheed | Under review | |

Problem Statement:

Currently, there is a significant loss of impression and click analytics events, resulting in
an average of 15-20% of generated events on the app not reaching the server. In rare
instances, the event loss has reached as high as 70-80% in the past, causing difficulties
for the analytics team in their analysis and preventing the rollout of certain features.

The proposal is to migrate the event flow from the GTM SDK to the Swiggylytics SDK, which can handle batches of 100, 500, 1000, or 2000 events at once. Swiggylytics has no limit on hits during launch and no quota that gets exhausted past a certain threshold.

Further details - GTM to Swiggylytics migration of clickstream events

Figure 1: Current flow of events (Swiggylytics)


Currently, Android uses the real-time queue for ad events and the batched queue for attribution events, while iOS passes both ad events and attribution events via the real-time queue.

Existing system:

The current system of sending events is divided into two paths.

FirebaseGTM

All the events inside the app flow via FirebaseGTM. This flow has a major drawback: a rate limit of 60 hits, after which the quota recharges at a rate of 1 event every 2 seconds. This often leads to event loss, and it requires its own specific handling to manually send events via a GET API. Sending data via FirebaseGTM also contributes extra cost, since the data has to be transformed using a Rill Job. Another major disadvantage is its event size limit: when the event size is large, for example due to a large context or object value, or because we batch multiple events internally into an array, the request is very likely to fail since FirebaseGTM sends data via a GET API through query params. Refer to this doc for more on this.

Swiggylytics

Swiggylytics is our own in-house SDK to manage the event flow. With the current implementation, we only use it to send certain high-priority events such as ads events and cart/menu attribution events. The major advantage of this SDK is its handling of the event lifecycle until the event is finally dispatched successfully. We can use it to send events in batches, thereby reducing the number of API hits significantly. Another major advantage is that we can run the Rill Job transformation logic on the client side itself: mapping abbreviations to column names (such as ov =====> object_value), calculating the GeoHash, and filling in static values. This lets us skip the Rill Job layer on the server end and leads to major cost savings.

Proposed Approaches:

Using the existing Swiggylytics, we refactor the code to meet the requirement.

Approach 1:

Figure 2: Approach 1 Proposed flow of events


This approach uses the existing table and simply adds on top of it. The changes
required for this are below:

Creating a new object with field names similar to the clickstream table.

Updating the existing event trigger function and adding XP based parameters to
process the config fallback case.

Config bypass mechanism to process all events. Currently, only events specified in the
config file are sent via Swiggylytics and all other events are rejected in ‘Event Validator’.
The logic for the same would be updated to support all events, while prioritizing config
events.

Addition of extra columns in local storage to classify the event priority as config/non-config. Since we intend to prioritize config events, we need to classify them accordingly in local storage.

Failure handling in a serial queue to make sure all events get an equal number of retries.

Dispatching events based on their priority.

Updating the existing DB queries to add support for priority configurations.

Changing the existing queue data structure to a PriorityQueue to support config/non-config events separately (see the sketch after this list).

Creating transformers to transform the keys to column names as well as to calculate the GeoHash.
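A minimal sketch of the priority-queue idea referenced in the list above, assuming hypothetical QueuedEvent and EventPriority types; the real queue would live inside the existing Swiggylytics event pipeline.

import java.util.concurrent.PriorityBlockingQueue

// Hypothetical event priority: config events should be dispatched before non-config ones.
enum class EventPriority { CONFIG, NON_CONFIG }

data class QueuedEvent(
    val name: String,
    val priority: EventPriority,
    val enqueuedAtMillis: Long = System.currentTimeMillis()
)

// Orders config events first; within the same priority, older events go out first.
private val eventComparator = compareBy<QueuedEvent>({ it.priority.ordinal }, { it.enqueuedAtMillis })

// Thread-safe priority queue shared by the single Swiggylytics dispatch pipeline.
val eventQueue = PriorityBlockingQueue(64, eventComparator)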

Pros:

Faster execution of the project.

Since there will be one single queue, it will be easier to maintain for processes like
batching, purging, etc.

Cons:

The table needs to be maintained properly for config/non-config events as they both
serve different priorities.

Future debugging will be more tedious.


Retry operations/ reading from the table should be done carefully.

Approach 2 (Selected):

Figure 3: Approach 2 - Proposed flow of events (Swiggylytics)

This approach creates a new Database. The changes required for this are below:

Creating a new database which interacts with the table of similar structure and uses the
same DAO.

Updating the existing event trigger function and adding XP based parameters to
process the config fallback case.

Config bypass mechanism to process all events.

A new EventStorage to interact with the NonConfig DB.

Creating separate queues for config/non-config events.

Handling of the beacon service (timer-based event dispatch).

Creating transformers to transform the keys to column names as well as to calculate the GeoHash.

Pros:

Segregation of concern - the code will remain clean and well segregated, making it easier to maintain in the longer run.

In case we were to add a different priority set of events in future, we can build on top of
this architecture.

Ease of mitigating issues - In case of any issues with the new implementation, it will only
impact the new events moved to Swiggylytics (non-config). Config events will flow as is.

Keeping separate queues would add separation of concern where the older flow won’t
be hampered and the new process would run independently.

Cons:

Having two queues introduces more complexity, and each would require its own purging, batching operations, etc.

Android side Solution:

Storage:

We'll be creating a separate database for the non-config events, namely "swiggylytics-database" (SwiggylyticsDatabase). Current events are stored under "analytics-database" (AppDatabase). We would use the existing table "event" as well as the same DAO to interact with the database. This would allow us to reuse most of the components while maintaining a separate database. The core functionality of interacting with the database remains the same even for the non-config events; the real difference is in the method of interaction, which currently exists under the interface IEventStorage. We would need to implement that interface in another class which will hold the handling for the interaction with the non-config database. To accommodate these needs, we would also add another column, is_non_config, to the table. This would affect the existing table's schema, but it wouldn't require any change on the consumption end, as the column will only be used while interacting with non-config events via corresponding changes in the DAO.
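A minimal sketch of what the separate Room database and the extra is_non_config column could look like, assuming hypothetical entity fields and DAO query names; the actual SwiggylyticsDatabase would reuse the existing EventTable entity and DAO.

import androidx.room.ColumnInfo
import androidx.room.Dao
import androidx.room.Database
import androidx.room.Entity
import androidx.room.PrimaryKey
import androidx.room.Query
import androidx.room.RoomDatabase

// Reused "event" table, extended with the new classification column.
@Entity(tableName = "event")
data class EventTable(
    @PrimaryKey(autoGenerate = true) val id: Long = 0,
    @ColumnInfo(name = "payload") val payload: String,
    @ColumnInfo(name = "is_in_memory") val isInMemory: Boolean = false,
    // New column used only by the non-config flow.
    @ColumnInfo(name = "is_non_config") val isNonConfig: Boolean = false
)

@Dao
interface EventDao {
    // Counterpart of getLimitRealTime(limit) for non-config events.
    @Query("SELECT * FROM event WHERE is_non_config = 1 AND is_in_memory = 0 LIMIT :limit")
    fun getLimitNonConfig(limit: Int): List<EventTable>
}

// Separate database holding only non-config events; the config flow keeps using AppDatabase.
@Database(entities = [EventTable::class], version = 1)
abstract class SwiggylyticsDatabase : RoomDatabase() {
    abstract fun eventDao(): EventDao
}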

Event Flow till separation:

The flow of events in the starting phase will remain the same. Currently, we send the event via SwiggylyticsEventHandler, which further sends the event via SwiggyLyticsEventHandler, a custom implementation of HandlerThread. From here on, the data is processed on a background thread. SwiggyLyticsEventHandler further sends the data to EventManager. Here we have the handling for RealTime and Batched events, with any other event type being logged as unsupported. At this point we add the event to a NonConfigEventsQueue. We also update the eventType enum with a NonConfig value, which will help us further down the line when we retrieve data from the DB and lets us manage the lifecycle more cleanly.
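A minimal sketch of how EventManager could route the new event type, assuming a hypothetical EventType enum, Event model and queue field; the real class already handles the RealTime and Batched branches.

import java.util.concurrent.LinkedBlockingDeque

// Hypothetical event type enum extended with the new NonConfig value.
enum class EventType { REAL_TIME, BATCHED, NON_CONFIG }

// Hypothetical event model used only for this sketch.
data class Event(val name: String, val type: EventType)

class EventManager {

    // Queue dedicated to non-config events, matching getNonConfigEventsQueue() on IEventManager.
    private val nonConfigEventsQueue = LinkedBlockingDeque<Event>()

    fun onEvent(event: Event) {
        when (event.type) {
            EventType.REAL_TIME -> handleRealTime(event)             // existing flow
            EventType.BATCHED -> handleBatched(event)                // existing flow
            EventType.NON_CONFIG -> nonConfigEventsQueue.add(event)  // new flow
        }
    }

    private fun handleRealTime(event: Event) { /* existing handling */ }
    private fun handleBatched(event: Event) { /* existing handling */ }
}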
Interface Updates:

IDispatcher:

○ Observable<Event> subscribeNonConfigSendingFailed();
○ void dispatchNonConfigBatch(Batch batch);
○ Batch getDispatchedPendingNonConfigBatchList();
○ Observable<Batch> subscribeNoPendingNonConfigInDispatch();
● IEventManager:

○ Observable<Batch> subscribeNonConfigBatchAvailable();
○ LinkedBlockingDeque<Event> getNonConfigEventsQueue();
○ int getNonConfigEventsCountInQueue();
● IEventStorage:

○ void removeNonConfigOrphans();
○ Observable<List<EventTable>> getNonConfigEvents();
○ Observable<List<EventTable>> getLimitNonConfigEvents(int limit);

The above-mentioned functions are additions to the already existing interfaces. The idea is to reuse most of the existing codebase and update it wherever required. These functions are essentially counterparts of the already existing RealTime and NonRealTime functions, with very similar implementations; the only differences are in how the events are manipulated, which queues we're dealing with, which DB we interact with, and which states the updates are made to. For more context, please look at the example below.

In the function below, we get all the real-time events from AppDatabase for the given limit and also update the database by setting the is_in_memory property to true for the extracted events.

public Observable<List<EventTable>> getLimitRealTimeEvents(int limit) {
    return Observable.fromCallable(() -> {
        List<EventTable> eventTableList = new ArrayList<>();
        eventTableList.addAll(appDatabase.eventDao().getLimitRealTime(limit));
        updateEventsInMemory(eventTableList, true);
        return eventTableList;
    }).subscribeOn(Schedulers.io());
}

A similar implementation for getLimitNonConfigEvents(int limit) will live in NonConfigEventStorage, which deals with the NonConfigDatabase.

public Observable<List<EventTable>> getLimitNonConfigEvents(int limit) {
    return Observable.fromCallable(() -> {
        List<EventTable> eventTableList = new ArrayList<>();
        eventTableList.addAll(nonConfigDatabase.eventDao().getLimitNonConfig(limit));
        updateEventsInMemory(eventTableList, true);
        return eventTableList;
    }).subscribeOn(Schedulers.io());
}
As can clearly be seen, the implementation remains very similar, with the only change being the point of interaction. All the other functions follow the same pattern: the core logic is retained and only the point of interaction/scope is updated.

Beacon Service:

At all times, two timers run in the application, for realTime and nonRealTime events, with their debounce values configured via the config.json file. For our use case, we are using the same config values as RealTime, so to reduce complexity we will be using the realTimeTimer to trigger the batch. Unlike the usual implementation where we trigger the batch when we reach the batch limit, we use a limit of 1 when events are triggered from the timer, i.e. as long as we have even a single event in the queue and the timer has reached its trigger point, we will send that solo event.
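A minimal sketch of the timer-triggered dispatch described above, assuming hypothetical debounce, queue and dispatch parameters; the real beacon service reuses the existing realTimeTimer and batch machinery.

import java.util.Timer
import kotlin.concurrent.fixedRateTimer

class BeaconService(
    private val realTimeDebounceMillis: Long,           // value read from config.json
    private val nonConfigQueue: ArrayDeque<String>,      // hypothetical stand-in for the event queue
    private val dispatchBatch: (List<String>) -> Unit    // hands the batch to the dispatcher
) {
    private var timer: Timer? = null

    fun start() {
        // Reuse the realTime debounce value: on every tick, flush whatever is in the queue,
        // even if it is a single event (batch limit of 1 for timer-triggered dispatch).
        timer = fixedRateTimer(
            name = "beacon-non-config",
            period = realTimeDebounceMillis
        ) {
            if (nonConfigQueue.isNotEmpty()) {
                val batch = generateSequence { nonConfigQueue.removeFirstOrNull() }.toList()
                dispatchBatch(batch)
            }
        }
    }

    fun stop() {
        timer?.cancel()
    }
}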
Event Dispatch:
Rill Job:

A job runs on the server end which consumes the events we send, transforms them into a DB-readable format, inserts some values, and pushes the data further. As an example, look at the event request below from FirebaseGTM.

https://100.64.1.11/client/metric/event/gtm?e=impression&sn=HUD-related_
screen&on=launch-api-response&ov=%7Bstatus_code%3D0%2C-has_track
able_orders%3Dfalse%2C-has_feedback%3Dtrue%7D&op=9999&ui=608348
20&us=8nj14621-0368-497a-935d-eb16c84304b5&ud=d7fc6f8611e28784&p=
an&cx=-&av=1999&sqn=6563&sc=direct&rf=-&lt=30.7370204&lg=76.639873
6&ts=1691553316775&itd=true&exp=-&gtmcb=1539541086

The data is sent in abbreviated form: event_name is denoted as e, screen_name as sn, user_id as ui, etc. The Rill job has transformation logic that converts this data into the appropriate format.

func createGTMEvent(o RequestParams) interface{} {
    return gtmEvent{
        Version:         "1",
        Component:       "GTM",
        AppVersionCode:  o.getParamM([]string{"av", "app-version-code"}, ""),
        UserID:          o.getParamM([]string{"ui", "user-id"}, ""),
        Sid:             o.getParamM([]string{"us", "user-sid"}, ""),
        Tid:             o.getParamM([]string{"user-tid"}, ""),
        Context:         o.getParamM([]string{"cx", "context"}, ""),
        Referral:        o.getParamM([]string{"rf", "referral"}, ""),
        Latitude:        o.getParamM([]string{"lt", "latitude"}, ""),
        Longitude:       o.getParamM([]string{"lg", "longitude"}, ""),
        DeviceID:        o.getParamM([]string{"ud", "user-deviceID"}, ""),
        Platform:        o.getParamM([]string{"p", "platform"}, ""),
        IPAddress:       o.getHeaderM([]string{"HTTP_X_FORWARDED_FOR", "Remote_Addr"}, ""),
        UserAgent:       o.getHeaderM([]string{"User-Agent"}, ""),
        EventName:       o.getParamM([]string{"e", "event"}, ""),
        ClientTimeStamp: o.getParamM([]string{"ts", "timestamp"}, "0"),
        ObjectName:      o.getParamM([]string{"on", "object-name"}, ""),
        ObjectValue:     o.getParamM([]string{"ov", "object-value"}, ""),
        ObjectPosition:  o.getParamM([]string{"op", "object-position"}, "9999"),
        ScreenName:      o.getParamM([]string{"sn", "screen-name"}, ""),
        Source:          o.getParamM([]string{"sc", "source"}, ""),
        SequenceNumber:  o.getParamM([]string{"sqn", "sequence-number"}, "-9999"),
        ExtraParams:     o.getParamM([]string{"exp"}, ""),
        ServerTimeStamp: o.getParamM([]string{}, time.Now().Format(serverTimestampFormat)),
        SystemTime:      nanosToMillis(time.Now().UnixNano()), // converting nanos to millis
    }
}

As can be seen in the above function, it picks up the data by its abbreviation, transforms it into its own struct, and further transforms it into the column-name format, as seen below.

SELECT
  CastAnythingToString(header.uuid) AS uuid,
  SidDecoder(event.sid, header.`timestamp`) AS time_stamp,
  header.eventId AS event_id,
  CastAnythingToString(event.version) AS version,
  CastAnythingToString(event.component) AS component,
  CastAnythingToString(event.appVersionCode) AS app_version_code,
  CastAnythingToString(event.appVersionCode) AS appVersionCode,
  CastAnythingToString(event.userId) AS user_id,
  CastAnythingToString(event.userId) AS userId,
  CastAnythingToString(event.sid) AS sid,
  CastAnythingToString(event.tid) AS tid,
  CastAnythingToString(event.context) AS context,
  CastAnythingToString(event.referral) AS referral,
  CastAnythingToString(event.deviceId) AS device_id,
  CastAnythingToString(event.deviceId) AS deviceId,
  CastAnythingToString(event.platform) AS platform,
  CastAnythingToString(event.ipAddress) AS ip_address,
  CastAnythingToString(event.ipAddress) AS ipAddress,
  CastAnythingToString(event.userAgent) AS user_agent,
  CastAnythingToString(event.userAgent) AS userAgent,
  case
    when event.eventName = 'swiggy_screen_view' then 'screen-view'
    when event.eventName = 'device_details' then 'device-details'
    else CastAnythingToString(event.eventName)
  end AS eventName,
  CastAnythingToString(event.eventName) AS event_name,
  ClientTSConversion(event.clientTimeStamp) AS client_timestamp,
  CastAnythingToString(event.clientTimeStamp) AS clientTimeStamp,
  CastAnythingToString(event.sequenceNumber) AS sequenceNumber,
  CastAnythingToString(event.objectName) AS object_name,
  CastAnythingToString(event.objectName) AS objectName,
  CastAnythingToString(event.screeName) AS screen_name,
  CastAnythingToString(event.screeName) AS screeName,
  CastAnythingToString(event.objectValue) AS object_value,
  CastAnythingToString(event.objectValue) AS objectValue,
  ConvertToLong(event.objectPosition, Cast(9999 AS bigint)) AS object_position,
  CastAnythingToString(event.objectPosition) AS objectPosition,
  CastAnythingToString(header.schemaVersion) AS schema_version,
  CastAnythingToString(header.schemaVersion) AS schemaVersion,
  ConvertToLong(event.sequenceNumber, Cast(-9999 AS bigint)) AS sequence_number,
  event.systemTime AS systemTime,
  CastAnythingToString(event.serverTimeStamp) AS server_timestamp,
  CastAnythingToString(event.serverTimeStamp) AS serverTimeStamp,
  CastAnythingToString(event.source) AS source,
  CastAnythingToDouble(event.latitude) AS latitude,
  CastAnythingToDouble(event.longitude) AS longitude,
  GetGeoHash(
    CastAnythingToDouble(event.latitude),
    CastAnythingToDouble(event.longitude)
  ) AS geo_hash,
  header._server_time_stamp AS _server_time_stamp,
  event.extraParams as extra_params
from
  GTMEvent
Looking at it carefully, it can be seen that certain columns such as geo_hash, component, and version are either statically populated or computed at the Rill Job end. Since our upgrade removes the usage of the Rill Job, we would be required to compute these values on our end. This is where our transformer comes in.

Transformer:

The final step of sending events via the new pipeline is the transformation job, where we transform the abbreviated keys to the original column names and populate the header data to make sure the event gets written to the new table only. This is done as the final step because nothing depends on it earlier; the only point where it is required is when we send the event via the API.
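A minimal sketch of such a client-side transformer, assuming a hypothetical encodeGeoHash helper and only a subset of the key mapping; the real transformer would cover every abbreviation and all the static values the Rill Job used to fill in (component, version, geo_hash).

// Subset of the abbreviation -> column-name mapping handled by the Rill Job today.
private val keyMap = mapOf(
    "e" to "event_name",
    "sn" to "screen_name",
    "on" to "object_name",
    "ov" to "object_value",
    "op" to "object_position",
    "ui" to "user_id",
    "lt" to "latitude",
    "lg" to "longitude",
    "ts" to "client_timestamp"
)

// encodeGeoHash() is a hypothetical helper; any standard geohash implementation would do.
fun transform(raw: Map<String, String>, encodeGeoHash: (Double, Double) -> String): Map<String, Any> {
    val transformed = mutableMapOf<String, Any>()

    // Rename abbreviated keys to their column names; keep unknown keys as-is.
    raw.forEach { (key, value) -> transformed[keyMap[key] ?: key] = value }

    // Static values previously inserted by the Rill Job.
    transformed["component"] = "Swiggylytics"
    transformed["version"] = "1"

    // GeoHash previously computed on the server end.
    val lat = raw["lt"]?.toDoubleOrNull()
    val lng = raw["lg"]?.toDoubleOrNull()
    if (lat != null && lng != null) {
        transformed["geo_hash"] = encodeGeoHash(lat, lng)
    }
    return transformed
}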
Testing:

Client side testing:

The major concern with this change is to make sure that data loss is kept to a minimum and that the system survives any level of stress, where we push events in bulk and make sure none of them are lost. For this we have a temporary class which will be initialized as a singleton and will fire sets of events every 100 ms, against which we test scenarios like network connectivity loss, app kill, app background/foreground, etc.

import javax.inject.Inject
import javax.inject.Singleton
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.delay
import kotlinx.coroutines.withContext

@Singleton
class SwiggylyticsMigrationTest @Inject constructor(
    private val swiggyEventHandler: ISwiggyEventHandler,
) {

    // Fires three concurrent streams of click events, one event every 100 ms per stream,
    // to stress test queueing, batching and dispatch under rapid fire.
    suspend fun stressTestEvents() {
        withContext(Dispatchers.IO) {
            async {
                for (a in 0..50) {
                    swiggyEventHandler.handleOnClickEvent(
                        swiggyEventHandler.getGtmEventData(
                            "screenName $a", "objectName $a", "objectValue $a", "position $a"
                        )
                    )
                    delay(100)
                }
            }
            async {
                for (a in 51..100) {
                    swiggyEventHandler.handleOnClickEvent(
                        swiggyEventHandler.getGtmEventData(
                            "screenName $a", "objectName $a", "objectValue $a", "position $a"
                        )
                    )
                    delay(100)
                }
            }
            async {
                for (a in 101..200) {
                    swiggyEventHandler.handleOnClickEvent(
                        swiggyEventHandler.getGtmEventData(
                            "screenName $a", "objectName $a", "objectValue $a", "position $a"
                        )
                    )
                    delay(100)
                }
            }
        }
    }
}

This will allow us to stress test events that are fired rapidly, even concurrently, over a very short period of time. We will be able to validate multiple batches being created and sent successfully, and to debug failure cases as well.

Analytics Testing:

This further needs to be validated at the analytics end, where we will send events via both the FirebaseGTM pipeline and the Swiggylytics pipeline. A view will be created joining the two tables dp_clickstream and dp_clickstream_v2, which will help us merge and combine the data for analysis. There will be, broadly, two rounds of validation:

Pre-release analysis:

We will trigger events to both tables, and the analytics POC will analyze the data in both to look at the volume as well as the uniqueness of events, where we expect an equal (if not higher) number of events via the new pipeline. We will also add debug-level logs to look for broken ends and data loss.

Post-release analysis:

Once we release the app, we will run another set of similar exercises on real production volumes of data, and depending on the results we will scale up the XP accordingly.

Rollout Strategy:

The target release for this is mid September, with dev testing starting as early as the 21st of August, where we start sending data and observing the data flow. The feature will be rolled out via XP and will gradually be scaled up based on the results observed.

XP Link:

https://xp.swiggy.in/experiment/1315/instance/3058

Snowflake query:

select count(*) from STREAMS.PUBLIC.DP_CLICKSTREAM_COMBINED where component = 'Swiggylytics';

References:

Existing clickstream flow :

Proposed flow

Proposal doc

Pipeline Matching
DP Table creation Ticket

Benchmarking and Testing:

Geohash testing Sheet

Size bump computation

Metric calculation sheet

Bulk sending event comparison sheet

HAR Comparison Sheet

Macrobenchmark Sheet

Low End Device Metric Sheet

Mid End Device Metric Sheet

Clickstream migration GTM

Artifacts

Clickstream migration HLD

XP Doc

Solution doc

Dp clickstream flow optimisation doc from DP side

GTM steps

Both Consumer Android app and Consumer iOS app side changes will be shipped with the mid-September release. Changes include (status: Completed):

Send GTM (Google Tag Manager) driven events via the Swiggylytics SDK as well, along with an XP.

For the test variant, events will be sent via both the GTM and Swiggylytics SDKs.

For the control variant, events will be sent via the GTM SDK only.

Till the XP is up, all the Swiggylytics events will be sent to the STREAMS.PUBLIC.DP_CLICKSTREAM_V2 table and view, and all GTM events will be sent to STREAMS.PUBLIC.DP_CLICKSTREAM (as is happening currently, without any change here).
This change is needed so that the Analytics team's 100+ clickstream-dependent scheduled jobs are not impacted and they don't need to add a where clause based on SDK (GTM vs Swiggylytics) in their queries.

XP will be rolled out gradually at 1% -> 5% -> 10% -> 20% and so on, and the analytics team will keep an eye on the data. In Progress [currently running at 5%]

During this XP period, the Analytics team needs to perform an analysis of the event loss percentage, i.e. events received by the Swiggylytics SDK should be higher than the ones received via the GTM SDK by X% (to be determined). In Progress
A first cut of the data and analysis has been shared by Shivangi Sharma, which says:

We filter for those object_names where the Swiggylytics events count was non-zero, i.e. events were flowing through the Swiggylytics pipeline.

For Android, Swiggylytics events increased by ~47% relative to GTM events.

For iOS, Swiggylytics events increased by ~3% relative to GTM events.

Once this analysis is completed, the results are as good as expected, and the XP is rolled out to 100%, the apps will disable the flag to send events via the GTM SDK and enable the flag to always send events via the Swiggylytics SDK for 100% of the events. There will be a hold-up period of X days where both the GTM and Swiggylytics SDKs will be sending 100% of the events from the apps. After X days we can deprecate the GTM SDK flows.
Currently the Portal and Instamart web teams also send events to the dp_clickstream table, and their migration is a blocker to completely deprecating the legacy pipeline of RILL job conversion. Blocked [It has been prioritized by the Portal web team, at least in their OND tech goals]

Currently in the dp_clickstream table, serverTimeStamp gets populated in a human-readable format like 2023-09-14 01:14:27.000, because this conversion is handled by the RILL job during ingestion of events. In the migrated flow, which is a generic flow, serverTimeStamp is populated in epoch format only, like 1694634265113, and is converted by the DP team at query time to a human-readable format in dp_clickstream_v2. Since this run-time conversion of the timestamp is costly when querying bulk data, the DP team will remove the run-time conversion from their end during the merge of dp_clickstream and dp_clickstream_v2 into a combined dp_clickstream view, and for this we need consensus from the entire Analytics team.

Not Started [AI on Padma Sruthi Godavarthi]


After this migration of sending events via the messageSet API is completed across platforms and the XP is rolled out to 100% of the population, there will still be some older appVersions sending events via the older GTM pipeline. The Apps team will first try enabling a soft nudge update crouton for those users and then eventually force-update users, based on the daily number of sessions and OPDs. Even after this exercise, some users will not update their app. For that set of users, we need a final consensus and approval from the entire Analytics team and Product team, after which we can deprecate ingestion of events via the old RILL-job-driven legacy pipeline, meaning no events will be ingested from those older appVersions.

Blocked [AI on Padma Sruthi Godavarthi Raj Gohil, PM leader]

One alternative here is to have a proxy layer at DP's end to intercept, transform and send those events via the Swiggylytics migration pipeline; this way we would not lose any of the data. [AI on DP team : Anshuman Singh Deepak Jindal]

Post the scale-up decision and >90-95% app adoption, there has to be an overlap period where both sets of data co-exist (in their respective tables STREAMS.PUBLIC.DP_CLICKSTREAM_V2 and STREAMS.PUBLIC.DP_CLICKSTREAM), so as to provide time for alignment with Business teams on baseline shifts in metrics.

Post 100% roll out of the changes, Anshuman Singh will combine the tables STREAMS.PUBLIC.DP_CLICKSTREAM_V2 and STREAMS.PUBLIC.DP_CLICKSTREAM into STREAMS.PUBLIC.DP_CLICKSTREAM only, so that the BAU flow keeps working as is. Not Started [AI : Anshuman Singh]

Post 100% XP roll out plan


In Progress

○ Before 100% XP roll out, CFD is attached below


Today’s flow :
■ Analytics team and their respective pipeline is consuming events from
dp_clickstream table and view only. No change here.
■ dp_clickstream_v2 table and view is created to perform the pre-post
analysis for the increase in the number of events only. No analytics team
is consuming it for now.
■ Portal and IM generated events will keep getting ingested into the
dp_clickstream table.

○ After 100% XP roll out, CFD is attached below


● Flow will be :

○ Let's assume the date is 11th Nov 2023 when the XP is 100% rolled out; post that, there will be a hold-off period of X (7) days where both the GTM and the Swiggylytics SDK events will be sent 100% from both platforms.
○ Within this hold-off period the DP team will make sufficient changes to combine both dp_clickstream and dp_clickstream_v2 views into a single dp_clickstream view. The combined view should contain the data from dp_clickstream and dp_clickstream_v2 from 9th Nov 2023, so that we get 2 days of overlapping data just in case, to not miss any data during this transition period.
○ Once this hold-off period is over, the Apps team will disable the feature flag from the clickstream migration app version onwards (Android: 1173, iOS: 4.8.5(6)), so that from that app version onwards the event flow from the GTM SDK will be stopped.
○ From this date onwards, clickstream events generated via Swiggylytics SDK only
will be sent.
○ Portal and IM generated events will keep getting ingested into the dp_clickstream
table.
○ So, from here onwards the combined table will contain events generated from
Apps Swiggylytics SDK and portal, IM team’s generated events as well.
○ After this migration is completed, there will still be users on app versions older than the clickstream migration who will not update their apps even after enabling soft and hard update nudges; for those (less than ~5%) users, events will keep getting generated via the GTM SDK and the combined table will contain those events as well.

PS :
○ Post this migration itself the legacy DP pipeline cost will be significantly reduced.
○ S3 storing raw data will be deprecated post this migration as well.
○ 3-6 months post the 100% roll out, the number of users on older app versions should reduce significantly, and after that we all, including the Analytics team, can take a call to completely deprecate the RILL pipeline. However, if the absolute number of users on older app versions remains significant, then we'll have to take a call on routing the RILL job ingestion traffic to awz_s3_SwiggyticEvent instead of awz_s3_GTMEvent.

Final clickstream migration GTM Results

Clickstream Migration- Ashu

Android

iOS
AIs for increasing iOS events post Clickstream migration

○ Creating separate background sync tasks for non-synced events.


○ Increasing TTL time for discarding non synced events exceeding queue limit
which is currently 24hrs.
● Open queries :

○ If the DP team doesn't send the response back, will it hamper any app flow or not? This matters because a cost of $6k/month is involved here. AI on Raj Gohil. In Progress

Objective:

Since we have started modularizing our code base, this is another contribution towards that umbrella project. Here we will be creating a framework responsible for driving analytics-related logic. This framework will also push events to the respective destination, either via GTM or the SwiggyLytics API.

Why we should be using KMM?

As the idea is to create a single-module source of code that drives the entire business logic of the system, there are multiple tech stacks available in the market that support cross-platform development, e.g. React Native, Flutter & KMM. Out of these three, the idea of going with KMM was based on the following reasons:
○ Kotlin has higher performance as it's a compiled language.
○ The learning curve is lower for iOS developers as the language paradigm matches Swift.
○ Injection of a KMM module into existing projects is much easier than with other stacks.
○ KMM has the advantage of a fast development cycle over RN, which uses runtime JavaScript.
● Current state:

The flow of analytics events is currently driven in a somewhat scattered way. Two provisions are maintained to push an event to the clickstream pipeline:

GTM queue for non-priority events.

Swiggylytics for priority events.

The GTM queue is Google Tag Manager, which acts as a mediator that transmits an event to the BE data dump. The abstractions of the whole GTM framework reside on the application side.

Swiggylytics is meant for sending high-priority events to the backend. This submodule was developed in order to tackle the losses (~20%) & latency in the GTM queue.
Shortcomings:

With the current implementation we have faced issues as follows:

○ Parity maintenance issues on metadata that led to a prod anomaly.
○ Investing dev effort in maintaining the same logic at 2 places for the individual platforms.
○ Parity on data validation logic.
● Module segregation:

Swiggy-App env:

From the application side there will be only a single way of communicating with the Analytics framework. The app will declare the type of event that needs to be generated and synced with the BE. From that point it is the responsibility of the Analytics framework to acknowledge it and proceed.

The event data structure can be as follows:

○ Screen name.
○ Object value (bannerId, widgetId, etc)
○ Extra-params (requestId, etc)
○ Context values which are dynamic wrt the type of event.
○ Current sids & other global-level values.
● Analytics module env:

On receiving an event, the Analytics module will communicate with the KMM module to get the respective metadata for the event. The responsibilities of the module are as follows:

○ Listen for incoming events from the Swiggy App environment.
○ On receiving an event, communicate with the KMM module for the metadata.
○ On getting the metadata, segregate the event between GTM & Swiggylytics.
○ Hold instances of GTM & Swiggylytics and send the events via them.
● KMM module env:

The KMM framework will hold the core business logic. This module will be responsible for crafting metadata in the required format. The crafted metadata will be formed using the values passed from the app side in the first place. The app will only pass values that are dynamic in nature, like id, screenName, etc.

Approach 1:

As we are segregating major sub-modules that can work independently of the application side, we can move the whole analytics code into a separate framework. This can help us in the following ways:

○ Independent development & code merging.
○ Better build time.
○ A single set of business-logic code, using KMM, for cross-platform use.

In order to build this, we will create high-level segregated environments that will each work as per their responsibility.
Pros:

○ The app should not need to bother about event validation, as it will be a fire & forget mechanism for the app. The rest will be handled by the individual modules.
○ Better segregation with responsibility sharing.
○ Can create logging & alerting systems for individual modules.
○ Easy migration in case we move everything to KMM.

Cons:

○ Need to create a layer that will be responsible for communicating with the KMM module, which can increase dev effort to some extent.
● Approach 2:

In this approach we create the same module layers (KMM module, Analytics module & App), but here the communication is achieved via the application, where the app acts as a mediator between the two modules.

Pros:

○ Can use the existing pipeline for syncing events with Swiggylytics & GTM.
○ Comparatively less time to complete the development.

Cons:

○ The app becomes a dependency, which can restrict scaling this to other, larger scopes.
○ We will need to import dependencies of both the KMM & Analytics modules across the app for a single sub-system.
○ A future migration to move everything to KMM will be effort-heavy as there is no direct communication between the KMM & Analytics modules.
● Communication contract:
Since there will be multiple static & dynamic data points involved in the process, we need to define some well-defined classifications & provisions for passing dynamic data from the app to the KMM module.

Static data: Data that will not change at runtime and can be kept as a static declaration in the KMM module. Ex: impression-brand-carousel-item-ad, click-collection-restaurant-item, etc.

Dynamic data: Data that can change at runtime and will be fetched from the BE. Ex: sid, bannerId, requestId, etc.

Communication channel: We can expose dedicated functions & class instances inside the KMM module for a particular component, which can be a UI element, general events (app-launch), etc. These functions will be parameterized with dynamic data and will always return ready-to-sync metadata.

Metadata:

This will be the end-result data that will be consumed by GTM & Swiggylytics.

data class KlyticsEventOutputData(
    val syncType: EventSyncType,
    val gtmMap: Syncable?,
    val swiggyLytics: SwiggylyticsOutputData?,
    val eventName: String
)

data class SwiggylyticsOutputData(
    val headerData: Syncable,
    val bodyData: Syncable
)

Ex:

From App-side to the Analytics framework:

fun sendRestaurantImpressionEvent(restId: String, exp: ExpObj, ad: AdObj) {}

From KMM to the Analytics framework:

fun getRestaurantImpressionEvent(restId: String, exp: ExpObj, ad: AdObj): KlyticsEventOutputData {}

Some basic unified object examples are as follows:

Ad-Object:

data class AdDataObject(var adTrackingId: String, var promoted: Boolean)

Exp-Object:

data class ExtraParamObject(val requestId: String, val otherData: Map<String, Any>)

<Keep on adding other required objects>

Pros:

○ Only the dynamic part will be injected into the KMM module.
○ Segregation at the component level.
○ Can write dedicated UTs in the KMM module for the end-result metadata.

Cons:

○ We need to create mapper functions in the Analytics framework until we move the data sync mechanism into KMM.

LLD:

The system will be designed in such a way that atomicity at each abstraction is maintained to the highest possible level. Single responsibility will be kept in mind in order to handle an event end to end.

Base folder structure:


Constants: This will contain all the constant files within the scope of the whole project module.

Event Objects: This will contain folder sections for each type of event. Each nested folder will contain its respective Interface, Manager & Constant files that are responsible for handling the functionality of that event.

Extensions: This will contain all the extensions used by the project module.

Interface: This will hold high-level common interface files. These interfaces will be used across the project and can be implemented at all horizontal levels of the module.

Model: This will contain all the data models used for common purposes across the project module. Two dedicated folders will be maintained, input & output, which will hold the models used for data injection (input) and data outsourcing (output) for the project module.

Singleton: This will hold singleton classes that will be used across the project.

Utils: This will hold supporting files for driving ad hoc functionality for the project module.

CommonTest: This will contain the entire unit test suite for the project.

Design pattern:

In order to resolve data flow in the module we need a simple but effective base design. Hence we will be using the widely known Repository pattern to provide the core data flow mechanism in the module.

Dependency injection:

Dependency injection will play a vital role in passing necessary data to the individual constructors. For this we will be using the following types:

Constructor injection:

In this case the data will be injected directly into the classes by passing arguments in the constructor itself.

class BottomBarEventManager(
    private val dependency1: Dependency1,
    private val dependency2: Dependency2
) {
    // Class implementation
}

// Given some classes
class ModuleRepository()

// Inject via constructor
class BottomBarEventManager(val repository: ModuleRepository)

// Declare it in a Koin module
val myModule = module {
    singleOf(::ModuleRepository)
    factoryOf(::BottomBarEventManager)
}

ModuleRepository:

This abstract class will act as a repository data provider for the other event managers. The responsibility of the class is to hold mutable & immutable application & config data in a thread-safe environment and to provide the required injection for other classes that need that data.
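A minimal sketch of what such a thread-safe repository could look like, assuming hypothetical SessionData and ConfigData types; the real ModuleRepository would expose whatever app & config data the event managers need.

import java.util.concurrent.atomic.AtomicReference

// Hypothetical holders for mutable app data and immutable config data.
data class SessionData(val sid: String, val userId: String)
data class ConfigData(val platform: String, val appVersionCode: String)

abstract class ModuleRepository(
    // Immutable config data fixed at construction time.
    val configData: ConfigData
) {
    // Mutable app data kept behind an AtomicReference so reads/writes are thread-safe.
    private val sessionData = AtomicReference(SessionData(sid = "", userId = ""))

    // Called via a dedicated channel (see IDynamicAppData) whenever session values change.
    fun updateSession(data: SessionData) = sessionData.set(data)

    // Event managers read the latest snapshot when building metadata.
    fun currentSession(): SessionData = sessionData.get()
}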

ThemeDropEventManager & BottomBarEventmanager:

These classes are the actual event manager classes that abstract the creation, validation & outsourcing of a particular event. They will be injected with other critical app data in order to prepare a final event, and will also implement interfaces for driving the event creation & validation process. For every dedicated event type a new manager class will be created.

AnalyticsSyncOutputData:
This data class will be the final output object, ready to be synced with the BE. This class will hold 2 data properties, i.e. GTMData & SwiggylyticsData. Both of them will be constructed via dedicated functions in the manager and synced with the BE via dedicated channels.

Interfaces:

Interfaces will be defined at various levels in order to create an outer blueprint of an implementation. These interfaces will contain mandatory fields & functions that an implementor must implement in order to keep development integrity across all the classes. Interfaces can also hold optional functionality that can be implemented based on the requirements.

IGTMEvent: This will provide a supporting interface for GTM event data, with a property holding GTMData & a function to prepare it.

ISwiggylyticsEvent: This will provide a supporting interface for Swiggylytics event data, with a property holding SwiggyLyticsData & a function to prepare it.

IEventValidation: This will provide support for validating an event. We will include more functions if needed.

IDynamicAppData: This will provide an interface for updating mutable app properties via dedicated channels. This interface will mostly be implemented by dedicated central classes in the repository where the data will be stored.
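A minimal sketch of how these interfaces could be declared in the KMM module, assuming hypothetical GTMData and SwiggyLyticsData payload types; the real contracts may carry more members.

// Hypothetical payload types produced for each pipeline.
class GTMData(val params: Map<String, String>)
class SwiggyLyticsData(val header: Map<String, String>, val body: Map<String, String>)

interface IGTMEvent {
    val gtmData: GTMData?
    fun prepareGTMData(): GTMData
}

interface ISwiggylyticsEvent {
    val swiggyLyticsData: SwiggyLyticsData?
    fun prepareSwiggylyticsData(): SwiggyLyticsData
}

interface IEventValidation {
    // Returns true when all mandatory fields are present and well formed.
    fun isValid(): Boolean
}

interface IDynamicAppData {
    // Dedicated channel for pushing updated mutable app properties into the repository.
    fun updateDynamicData(key: String, value: Any)
}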

Future state:

As KMM evolves, we will evaluate the tech stack from all perspectives:

○ Handling network calls using Ktor.
○ Logging & alerting systems.
○ Handling coroutines gracefully.
○ Migration policy needed for a production-ready system.

This will help us in the following aspects:

○ Mitigating the major risk of failures in complex executions caused by using a beta version, which might impact critical data & revenue.
○ Migration of major complex pieces may be avoided, both for third-party & in-house code.
○ We may be able to adopt the stable system directly without having to deal with older KMM quirks.

Cross platform dev guidelines:

○ Analytics & app environment development will be carried out on an individual basis by creating respective Jiras.
○ For KMM module development, a single Jira will be created that holds all the specified changes; any dev can pick it up.
○ PRs will be created separately for the KMM module.
○ A PR needs at least one review from a dev on each platform team, along with the KMM module owner & feature module owner.
● Roll out strategy:

○ Will keep dedicated kill switch with respective platform.


○ Will keep dedicated kill switch for Ad and non-Ad event flow via KMM.
○ Will roll out using XP with 1% userbase and monitor for any event drop and
gradually increase adoption.

Sign-offs team involved:

○ Apps team Balvinder Gambhir Mitansh Mehrotra Sambuddha Dhar Priyam Dutta
Nihar Ranjan Chadhei Agam Mahajan
○ QA team Suresh Thangavelu Vijay S
○ Analytics team Shreyas M Kumar Keshav
○ DP team <poc to be added>
○ Ads team <poc to be added>
● Appendix:

App build size:

Build size has been calculated by creating an archived build and validating the framework executable & combined app package size.

Note: The size has been evaluated with only the basic foundation files & code. This may increase with future development.

iOS:

Without koin dependencies:

iPA size: 117.9 MB

Swiggy.app package size: 224.2 MB

Framework executable size: 3.6 MB

With koin dependencies:

iPA size: 118.1 MB

Swiggy.app package size: 224.6 MB

Framework executable size: 4 MB


Final with code merged with 4.8.0:

Universal: 130 MB

Install size: 236 MB

Android: These tests were done using a sample app with R8 enabled.

Without koin and klytics dependencies: 1 MB

With koin and klytics dependencies: 1.1MB

GTM to Swiggylytics migration of clickstream events
Problem statement

Currently, there is a significant loss of impression and click analytics events, resulting in
an average of 15-20% of generated events on the app not reaching the server. In rare
instances, the event loss has reached as high as 70-80% in the past, causing difficulties
for the analytics team in their analysis and preventing the rollout of certain features.

Root cause

At first, during the initial eye test, it seemed that the app was successfully transmitting
all the events, and the data loss was occurring on the backend. However, upon
conducting a thorough investigation into the matter, we discovered that the Firebase
SDK(GTM - Google Tag Manager), which we utilize to queue and send the events, was
discarding some of the events from its queue. This was primarily due to the high
volume of events being sent within a short timeframe.

Functioning of GTM SDK and limitation

At the time of initialisation of the SDK, the app gets 60 available hits that are
replenished at a rate of 1 hit every 2 seconds for events.

This means that initially the app only gets to send 60 events and after exhausting the
available hits, the app can send only 1 event every 2 seconds.

GTM Quotas

Currently what happens

So if a user navigates within the app as follows: Home -> Food -> Search -> Menu

Then the app triggers around 190 events in under 60 seconds.

Home ~ 30 events, Food ~ 30 events, Search ~ 120 events, Menu ~10 events

Since the app does not have 190 hits, SDK does not allow the app to send 190 events
but instead sends around 80-90 events and drops the rest.

This is the major cause of the data loss of analytics events.
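A rough back-of-the-envelope check of those numbers, as a small sketch (the exact counts depend on how the hits are spent over the 60 seconds):

// GTM quota model: 60 initial hits, refilled at 1 hit every 2 seconds.
fun maxEventsSendable(initialHits: Int = 60, windowSeconds: Int = 60, refillEverySeconds: Int = 2): Int =
    initialHits + windowSeconds / refillEverySeconds

fun main() {
    val generated = 30 + 30 + 120 + 10   // Home + Food + Search + Menu
    val sendable = maxEventsSendable()   // 60 + 30 = 90
    println("Generated: $generated, sendable: ~$sendable, dropped: ~${generated - sendable}")
    // Generated: 190, sendable: ~90, dropped: ~100 — consistent with the ~80-90 sent figure above.
}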


Proposed fix

○ Migrate the medium of events flowing from GTM SDK to Swiggylytics SDK which
can handle batches of 100, 500, 1000, 2000 events at once as well. There is no
limitation of hits during launch and quota getting exhausted post a certain
threshold in Swiggylytics.
● Testimony from DE app

○ We have a testimony of Swiggylytics migration from GTM by the DE app team.


○ GTM vs Swiggylytics data comparison
○ Post migration they started receiving 50M more events and a 7% increase in daily events.
● Testimony from Cx App team for Menu and Cart attribution

Harsh Kataria to add the pre post analysis for the same.

A similar analysis was done for the Menu and Cart attribution migration from GTM to Swiggylytics, and there too we saw a 7-10% increase in daily events.

Efforts from Apps team

Efforts similar to those for the Menu and Cart attribution events migration will be incurred for the dp_clickstream migration too.

~3-4 weeks per Cx app platform.

Prerequisites

DP Clickstream flow details

The DP team needs to migrate the conventional and complex dp_clickstream schema
to the schema registry.

Need efforts from the DP team side and alignment too.


Currently the schema registry doesn't even show the schema of dp_clickstream.

Getting a column addition via conventional dp_clickstream flow itself took a lot of time
during the Widget Ranking project.

DP Clickstream Flow Optimisation


● Owned by Anshuman Singh

Last updated: Aug 02, 2023


○ Problem Statement
○ Current Data Flow
○ Issues with current flow
○ Proposed Data Flow
○ Optimisation/Saving
● Problem Statement

Currently the client app team sends around ~2 Billion events per day for the clickstream data via
the legacy event ingestion data flow and we then transform this data 1:1. (So we have 2 kinds of
data :- Raw & transformed).

In this flow we are ingesting the clickstream data into Kafka twice (raw and transformed), storing that data twice in S3 (raw & transformed), and running an additional layer of transformation for 1:1 events.

The proposal is to directly send the transformed data from the client end which can save a large
amount of cost and can make the pipeline generic.

Current Data Flow


[Diagram: current data flow]

○ Data comes from GTM to event collector via GET endpoint(/client/metric/event/gtm).


Topic :- GTMEvent ( http://dp-p-confluent_kafdrop.swiggyops.de/topic/GTMEvent)
S3 path :- s3://data-platform-json/json_logs/daily/GTMEvent/
Schema Registry :-
https://dp-event-onboard.swiggyops.de/schemaStore/eventschema/?q=GTMEvent
○ Then the Rill job does some transformation over this data.
Input topic:- GTMEvent
Output topic :- awz_s3_GTMEvent (
http://dp-p-confluent_kafdrop.swiggyops.de/topic/awz_s3_GTMEvent)
Rill job :- GTMEvent Transformation
○ Then data is persisted at s3 via spark job.
cdc_Kafka_DP_awz_s3_GTMEvent
Hive table details :-

View :- default.dp_clickstream
Delta table :- streams_delta.dp_clickstream
Orc table :- streams_orc.dp_clickstream
Above view points to delta & orc table.
○ Then data is synced to snowflake via snowpipe
Snowflake table :- streams.public.dp_clickstream

More details regarding custom changes being done at event-collector can be found here :- DP
Clickstream Flow Details | Understanding flow

Issues with current flow

○ We are ingesting the clickstream data into 2 Kafka topics :- GTMEvent & awz_s3_GTMEvent. Both of these topics are among the highest-throughput streams in the DP Confluent Kafka cluster.
○ Both topics combined comprise around ~20-22% of the throughput of the whole DP Confluent Kafka cluster.
[Diagram: topic-wise throughput distribution]
○ We are storing the clickstream data at 2 places in S3.
○ The raw data is stored here :- s3://data-platform-json/json_logs/daily/GTMEvent/
○ This is json data
○ The daily incoming data is ~500-600GB
○ This is not being used
○ The transformed data is stored here :-
s3://data-platform-delta/delta_logs/awz_s3_GTMEvent/
○ This is delta data
○ The daily incoming data is ~200-300GB
○ We are transforming the data at the Rill end (Transformation Logic)
○ We are populating the time_stamp field using a custom UDF :- "SID Decoder"
○ Within this UDF, a Base 36 decoder is used to decode hours from the first 3 characters of the SID. This number of hours is then added to the Swiggy Epoch to finally get the client timestamp at which the event was generated (a hedged sketch follows this list).
UDFs Repository: https://github.com/swiggy-private/dp-ingesttransformations
○ Rill job :- GTMEvent Transformation
○ We are reading the data from kafka and persisting it to S3 twice.
○ Spark job for persisting raw json data :- cdc_Kafka_DP_GTMEvent
○ Spark job for persisting delta data :- cdc_Kafka_DP_awz_s3_GTMEvent
○ Currently this data comes via the legacy flow :-
○ It has custom code and it doesn't support schema-evolution.
○ The events come via a GET endpoint and consist of 1 message per event, which impacts batching at Kafka, means more requests at the ALB, etc.
○ Whereas the generic pipeline supports POST endpoints where more messages can be sent in 1 event, and there would be support for "schema-evolution".
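A hedged sketch of the SID Decoder logic described above, assuming the Swiggy Epoch value is supplied by the caller; the actual UDF lives in the dp-ingesttransformations repository.

// Decodes the client timestamp from a SID: the first 3 characters encode, in base 36,
// the number of hours elapsed since the Swiggy Epoch (value assumed, passed in by the caller).
fun decodeClientTimestampMillis(sid: String, swiggyEpochMillis: Long): Long {
    val hoursSinceEpoch = sid.take(3).toLong(radix = 36)
    return swiggyEpochMillis + hoursSinceEpoch * 60L * 60L * 1000L
}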
Proposed Data Flow

[Diagram: proposed data flow]

Optimisation/Saving

○ If we start sending the transformed data directly from the client via the current generic pipeline, then we can save approximately $5K-7K monthly.
○ Below are the approximate cost savings :-

Component | Expected savings | Reason
----------|------------------|-------
Kafka | ~2.5K monthly | Given that we are currently spending a total of ~25K for the Kafka cluster, removing one topic can save around 10%.
Spark consumer jobs | ~1.5K monthly | Currently the total cost for all stream Spark jobs would be ~15K (without discount). Removing one consumer could save up to ~1.5K (taking 10% of the overall cost).
S3 cost | ~1K monthly | Including storage, S3 operations, etc. Given that we can save ~600GB per day of data ingestion.
Rill Transformation | ~1.5K monthly | This can be removed.
Event Ingestion | ~500-1K monthly | If the data can be sent via POST in batches then it will decrease the cost at the ALB and event collector end as well.

○ The pipeline would be more streamlined, generic, will have support for schema-evolution.

From <https://swiggy.atlassian.net/wiki/spaces/DP/pages/3862069249/DP+Clickstream+Flow+Optimisation>

Klytics SDK Goals and Objectives:


1. Introduction & Objectives

1.1 Overview

○ Klytics Library: A cross-platform KMM library designed for Android and iOS.
● 1.2 Objectives

○ Eliminating Parity Issues: Addressing the challenge of multiple parity issues between platforms by centralizing payload generation.
○ Reducing Development Effort: Streamlining the analytics event
implementation process to reduce development effort across platforms.
● 2. How did we solve the problem?

2.1 Problem Statement

○ Previous Approach: Platforms handled payload generation for analytics events independently, leading to disparity issues.
○ High Development Effort: Each platform required individual
implementation efforts for analytics events.
○ Limited Scalability Across Apps: The existing approach was not easily scalable for integration into different apps, as each app has its own implementation for generating events.
● 2.2 Solution [Ref]

○ Klytics Payload Generation: The Klytics library now handles payload generation centrally.
○ Platform Integration: Platforms send standardized payloads to the
analytics table.
○ Parity Elimination: Standardized payloads prevent parity issues between
platforms.
○ Reduced Development Effort: Only one developer is needed for adding new events; devs on the other platform need only a minor event consumption effort, therefore eventually reducing overall effort.
● 3. Anecdotes

○ Our QAs have found multiple disparities in events when testing the events via the Ard Automator tool. For example:
1. https://swiggy.slack.com/archives/GBZMDADPZ/p1706709761046829
2. https://swiggy.slack.com/archives/GBZMDADPZ/p1690894571135269
○ In these disparities, the most common issue is a difference in key; for example, Android sends "Favourite" while iOS sends "favourite". These issues would never arise if there were a common SDK for generating the payload of these events.
● 4. GTM

In the first milestone, we have migrated all the events of the accounts page to
this SDK.

We are currently running an XP and monitoring whether there is any data loss.

If all goes well, we will scale the XP to 100%.

XP link - https://xp.swiggy.in/experiment/1467/instance/3701

5. Suggestion and SOP to be followed

After the ARD is provided by the Analyst and the walkthrough is done, all devs must add their ARD events via this SDK only going forward, once we move to 100% adoption of the Klytics SDK.

6. Future Plans

○ Migrate all the existing analytics events to the Klytics SDK


○ Migrate attribution events to the Klytics SDK

ARD Automator:

ARD Automation

SOP - ARD Automation - SOP

Session deck - ARD Automation

Feature Requests and WIP - Trello Board

Slack Channel for Bug reports, feedback, announcements and releases : #android-toolkit

Release Notification - ARD Automation

POC : Anik Raj C

ARD automation walkthrough (2023-03-28 14:09 GMT+5:30)

Problem

Our current Analytics workflow has been plagued by manual and fragmented processes,
resulting in numerous challenges, such as:

Lack of a Single Source of Truth: ARD specifications scattered across various documents
make it challenging to maintain a clear understanding of expected data.

Absence of Change History: Ad hoc changes to ARD go undocumented, leading to inconsistencies among platforms. The divergence in event data between platforms creates substantial workloads for the analytics team and accumulates technical debt.

Manual Verification and Regression Testing: The need for manual verification and
regression testing consumes valuable time and can overlook crucial checks due to the volume
of events.

Limited Coverage: Due to the sheer number of events flowing through, it wasn't possible to cover all the analytics events in regression testing.

Solution

We are proposing an improved and automated workflow to resolve all these issues once and for all. The following workflow gives us a simple yet effective way to write contracts between teams and to verify them in an automated way.
Key Features

A Single Source of Truth: We leverage Git to establish a central repository for contracts. This
repository will serve as the authoritative source for ARD specifications, ensuring clarity and
consistency.

Easy contract Maintenance: We ensure creation and modification of contracts are easy with
tooling with GUI forms, bulk imports, and local testing workflows.

Complex Payload Validators: The payload validators allow for matching complex structures, enabling us to cover almost all types of events.

Automated Contract Verification: Our workflow includes tooling and processes that automate
the verification of ARD contracts, reducing manual effort and increasing accuracy. This will
significantly enhance our efficiency and reliability in verification and regression testing. Thanks
to the automated process, we can now efficiently handle a significantly larger number of events
than we could before.

Powered by Kotlin Multiplatform: Written entirely in Kotlin, backed by multiplatform support, the tooling can be deployed on any platform. Currently we have it on all desktop platforms, but we plan to expand to CI and Mobile to improve testing automation soon.
A source of truth

We already have the best tool for this, Git. So we made a repository to hold consumer app
contracts

https://github.com/swiggy-private/consumer-app-contracts

Incremental improvements on the tool so far based on the usage feedback

Since the initial M1 release, the tool has undergone significant modifications to address various
requirements put forth by different teams. Some of the major changes are

Support for complex validators - Originally limited to exact value matches, the tool now
accommodates complex validators. This includes pattern-based regex and nested JSON
validators, to cover almost any payload.

Transition from GTM to Generic Payload Verification - The initial focus on GTM events for
the Consumer App has been broadened to cover generic event verification, allowing for
validation of any event from any team.

Migration from YAML based contract to fully automated JSON contracts - The contracts
underwent a transition to a fully automated workflow utilizing JSON instead of the original,
handwritten YAML one. This shift was prompted by the need to mitigate human errors,
especially with the introduction of intricate, nested validators. A user-friendly GUI facilitates ease
of contract creation and editing.

Local Contract Testing - Capability to load and test local contract files before uploading them
to the server.

Bulk Event Import - To improve the execution speed of creating contracts, we introduced bulk
event imports. This fast tracked the adoption significantly.

Folder Structure Implementation - Given the tool's widespread adoption across various
teams, the incorporation of a simple folder structure was deemed essential. This structure aids
in organizing contracts based on teams or workflows, thereby streamlining the adoption
process.

Request data exporter - Provision for exporting parsed properties to CSV to aid in debugging
and analyzing instances where requests do not match. This feature facilitates seamless sharing
and swift identification of discrepancies.

Replay of Requests - We had originally only supported live events coming from an emulator or
real device, but some teams preferred to use a HAR file (dump of all requests) to verify instead.
We quickly added this in to enable replaying of the requests from the dump, enabling more
teams and improving the speed of execution.

These modifications collectively represent a significant evolution of the tool, enhancing its
versatility, usability, and efficiency in meeting diverse requirements and facilitating seamless
integration across teams and workflows.

Adoption

Team | Status
-----|-------
Consumer Tech | Onboarded with 220+ events created (50% of total)
Dineout | Onboarded with 26 events created (17% of total)
Minis | Onboarded with 14 events created
DE | Onboarding
IM | Onboarding
Vendor | Onboarding
Insanely Good | Onboarding

Validators

String validator - Case sensitive string comparison

Regex Validator - Matches if regex matches

Exhaustive List Validator - Matches if the value is in the list

JSON Validator - Can have complex object and array objects with each node being any other
validator
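A minimal sketch of how such validators could be modeled, assuming hypothetical class names; the actual toolkit's validator types may differ.

// Hypothetical validator hierarchy mirroring the validator types listed above.
sealed interface Validator {
    fun matches(value: Any?): Boolean
}

// Case-sensitive string comparison.
data class StringValidator(val expected: String) : Validator {
    override fun matches(value: Any?) = value == expected
}

// Matches if the regex matches the whole value.
data class RegexValidator(val pattern: Regex) : Validator {
    override fun matches(value: Any?) = value is String && pattern.matches(value)
}

// Matches if the value is one of the allowed values.
data class ExhaustiveListValidator(val allowed: List<String>) : Validator {
    override fun matches(value: Any?) = value is String && value in allowed
}

// Nested JSON object validation: each node can be any other validator.
data class JsonValidator(val fields: Map<String, Validator>) : Validator {
    override fun matches(value: Any?): Boolean {
        if (value !is Map<*, *>) return false
        val node = value.mapKeys { it.key.toString() }
        return fields.all { (key, validator) -> validator.matches(node[key]) }
    }
}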

See ARD Automation - SOP for detailed information

Onboarding - TLDR

Get a copy of the latest Desktop Android Toolkit from #android-toolkit

Add rewrite rules to Charles - link

Generate some events on App and tool will receive the events

See ARD Automation - SOP for detailed information

Workflow

ARD creation

Ard will be shared by Analytics POC

Dev POC to raise contract PR


Analytics and Dev POC will finalize the ARD as before

Dev POC will raise a PR to create a contract - Example Contract

PR needs to be approved by both IOS and Android POC. Analytics POC can verify but due to
access limitations, might not be able to review PRs. Analytics POC can verify the contract on
the toolkit tool after the PR is merged in.

Once PR is merged in, the contracts are published automatically.

Verification / QA regression

Install the Android Toolkit tool available on Mac and Windows

Goto the ARD section

Select a contract to verify

Setup charles to rewrite analytics requests to the tool - See below

Now run the workflow and verify coverage of both platforms

You can also see all the requests flowing in to understand extra or unwanted events

Demo

https://drive.google.com/file/d/1T0wjZnZxg2mG1r-IAp-w_b6_zOtYGaHH/view?usp=sharing

Sample PR

https://github.com/swiggy-private/consumer-app-contracts/pull/22

Issues where ARD automation was useful

https://swiggy.slack.com/archives/GBZMDADPZ/p1712153844159739
