Data as a Service Powered by Data Mesh Architecture
Whitepaper
Over the past decade, the role of data has undoubtedly changed within businesses globally.
Organizations such as Facebook, Alphabet (the parent company of Google and its subsidiaries),
LinkedIn, and Twitter began harnessing the power of data by extracting insights, finding correlations,
and applying artificial intelligence with continuous learning. Together with technology enablers
and hyper-scalers such as Amazon, Google, and Microsoft, these companies were able to
store data on an unprecedented scale and extract tremendous value from it. The evolution of
these businesses has proved that data has become the primary asset in the digital economy.
Data as an Asset
Data is, without a doubt, the most vital part of a digital economy, where entire business
foundations rest on data platforms with cutting-edge, innovative solutions. As Amazon CTO
Werner Vogels put it, “data are (sic) at the core of value creation, whereas physical assets are losing
their significance in business models.” Data-centric business models are the key to unlocking potential
across business verticals, and every industry depends on insights from data for key decisions.
For a business to achieve an enterprise-wide data-driven culture, it is crucial to get buy-in from
C-suite executives and entry-level employees alike. Awareness must be created within the business
culture, and the leadership team should clarify the reasoning behind utilizing data.
De-Coupled Pipeline
Traditional data processing consists of ingestion, cleansing, aggregation, storage, and serving.
To meet the need for new data sources, architects scale the system by breaking the process
into smaller deployable components that interact with one another to achieve a functional
objective. The motivation behind decomposing a system into its architectural components is to
create independent teams, each able to build and operate its own component. In a typical monolithic
architecture that caters to one primary data domain, adding a sub-type can mean reusing
the majority of components and adding a few new elements to meet the objective; this coupling
can reduce the velocity and scale at which the platform responds to new consumers.
In addition to timed events, source data domains should also provide easily consumable historical
snapshots of the relevant datasets, aggregated over a time interval that closely reflects the interval of
change for their domain. Datasets should be indexed on different dimensions and organized
for consumption with very low latency and high throughput, so that consumers can stitch them
together with many other data sources to join, filter, and aggregate for intelligent analysis.
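As an illustration, a consumer-side mashup of two such domain snapshots might look like the following sketch. The dataset paths, column names, and Parquet format are assumptions for this example (reading s3:// paths with pandas also assumes the s3fs package is installed).

```python
# A minimal sketch of a consumer stitching two domain snapshots together,
# assuming hypothetical Parquet snapshots published by each source domain.
import pandas as pd

# Each domain publishes an immutable, time-partitioned snapshot (assumed paths).
loans = pd.read_parquet("s3://loan-domain/snapshots/2021-06/loans.parquet")
cards = pd.read_parquet("s3://card-domain/snapshots/2021-06/usage.parquet")

# Join on a shared key, filter, and aggregate for downstream analysis.
joined = loans.merge(cards, on="customer_id", how="inner")
active = joined[joined["card_status"] == "ACTIVE"]
summary = active.groupby("risk_band")["loan_amount"].agg(["count", "mean"])
print(summary)
```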
The Data Mesh platform is a distributed data architecture under centralized governance
and interoperability standards, empowered by a common, orchestrated, self-serve data
infrastructure. It is a far cry from a landscape of fragmented silos of inaccessible data.
The needs are real, and the tools are ready. It is up to engineers and leaders to realize that the
existing paradigm of big data and the one true big data platform or data lake will only repeat past
failures, this time using new cloud-based tools.
In the above architecture, operational data and external data become immutable points of
data access; where transformation is needed, data is pulled, massaged, and stored as a data mart.
Whether internal or external, every data point becomes a source for the next higher level of abstraction.
The vital thing to note here is the variety of data domains that serve diverse users with data pulled from
various sources. The data lake that stores monolithic data takes second preference to an ecosystem of
well-cataloged data, orchestrated for access through a discovery process, which makes it simpler to
manage and scale for the future. Use cases for the data domains are identified by a cross-functional
team of analysts, and the data is logically separated and owned by many teams. Every data source
then becomes a node and is distributed when the need arises.
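A minimal sketch of how a domain team might describe and register such a node is shown below; the descriptor fields and catalog structure are illustrative assumptions, not a prescribed schema.

```python
# A minimal sketch of a data-product descriptor a domain team might register
# in a central catalog; the field names here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    domain: str            # owning domain, e.g. "loans"
    name: str              # product name, e.g. "loan-applications"
    owner_team: str        # accountable cross-functional team
    location: str          # addressable storage location
    schema_version: str    # versioned contract for consumers
    tags: list = field(default_factory=list)  # dimensions for discovery

catalog = {}  # central catalog: product path -> descriptor

def register(product: DataProduct) -> None:
    """Add a domain's data product as a discoverable node in the mesh."""
    catalog[f"{product.domain}/{product.name}"] = product

register(DataProduct("loans", "loan-applications", "lending-analytics",
                     "s3://loan-domain/snapshots/", "v1",
                     tags=["pii", "daily-snapshot"]))
```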
[Figure: Data Mesh personas and data domains — architects, data governance, business and legal teams, cross-functional teams, domain owners, analysts, and SMEs interacting with data domains over HTTPS.]
The mesh and cataloging, together with access control and governance, make the difference here,
moving away from fragmented silos of inaccessible data.
Data Mesh claims that for big data to fuel innovation, its ownership must be federated among
domain owners, accountable for providing their data as products. Decentralization and
interoperability among data from various domains focusing on an end-user experience are key
to data democratization. The Data Mesh idea is born out of modern distributed architecture:
considering domains as the first-class data repositories, integrating data across many platforms,
and providing self-serve data.
A Data Mesh holds an advantage over a traditional data warehouse because a central ETL pipeline gives
teams less control over increasing volumes of data. With diverse data use cases, it also becomes
difficult to cope with data transformations, tilting the balance toward relying on providers that specialize
in a specific data set, pulling data on demand, and performing a mashup before putting it to use.
Technology Enablers
Hyper-scalers like AWS, Azure, and Google Cloud provide cloud infrastructure with storage and
virtually infinite compute on demand, on a pay-as-you-use basis, which can substantially reduce operating
costs and provide the ability to scale operations, adjusting domains and data as needed.
Security is paramount in the context of strict regulations for financial institutions, insurers,
and other organizations that maintain sensitive data such as PII, PHI, and PCI data, so safeguarding
it is essential. AWS offers KMS and CloudHSM to encrypt data, and role-based access to sensitive
data can be enforced using IAM policies.
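As a sketch of the point above, the snippet below writes a sensitive object to S3 encrypted with a KMS key via boto3; the bucket name, object key, and key alias are hypothetical.

```python
# A minimal sketch of server-side encryption with a KMS key when writing
# sensitive records to S3; bucket, key alias, and object key are assumptions.
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="daas-sensitive-data",          # hypothetical bucket
    Key="pii/customer-records.parquet",    # hypothetical object key
    Body=b"...sensitive payload...",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/daas-pii-key",      # hypothetical KMS key alias
)
# Access is then restricted by IAM policies granting kms:Decrypt and
# s3:GetObject only to the roles that legitimately need the sensitive data.
```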
For a business to achieve an enterprise-wide data-driven culture, data technology awareness must
be carefully fostered within the business culture to truly democratize data science.
Amazon S3 and Amazon S3 Glacier provide the data foundation, offering an efficient and cost-
effective way to store data. Storage scales with need, and archived data can be retrieved from
Glacier with an SLA of a few hours.
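For example, archiving could be automated with an S3 lifecycle rule such as the following boto3 sketch; the bucket, prefix, and 90-day transition age are assumptions.

```python
# A minimal sketch of a lifecycle rule that archives aging snapshots to
# S3 Glacier; bucket name, prefix, and transition age are assumptions.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="daas-domain-snapshots",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-snapshots",
            "Filter": {"Prefix": "snapshots/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }]
    },
)
```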
On-demand compute from Amazon EMR provides virtually infinite computing for data-intensive
operations, based on parallel processing using Spark.
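A minimal PySpark sketch of the kind of parallel aggregation such a cluster might run is shown below; the dataset paths and column names are assumptions.

```python
# A minimal sketch of a Spark job that might run on an EMR cluster,
# aggregating a hypothetical transactions dataset in parallel.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daas-transaction-rollup").getOrCreate()

txns = spark.read.parquet("s3://transaction-domain/raw/")  # assumed path
daily = (
    txns.groupBy("customer_id", F.to_date("txn_ts").alias("txn_date"))
        .agg(F.sum("amount").alias("daily_spend"),
             F.count("*").alias("txn_count"))
)
daily.write.mode("overwrite").parquet("s3://transaction-domain/rollups/daily/")
```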
AWS Glue handles ETL operations involving cleansing, formatting, mapping, and storing data in the
target system. AWS Glue DataBrew provides a valuable tool for data science users to manage data,
and AWS Glue Elastic Views can power, with high performance, the analytical insights an enterprise
using Data as a Service (DaaS) requires.
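As an illustration of the Glue ETL flow (cleanse, map, store), the following job sketch uses the awsglue library; the catalog database, table, field mappings, and target path are assumptions.

```python
# A minimal sketch of an AWS Glue job doing cleanse/map/store; the catalog
# database, table, and target path are illustrative assumptions.
from awsglue.transforms import ApplyMapping
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read a source table registered in the Glue Data Catalog (assumed names).
src = glue_context.create_dynamic_frame.from_catalog(
    database="loan_domain", table_name="raw_applications")

# Map raw fields onto the cleansed target schema.
mapped = ApplyMapping.apply(
    frame=src,
    mappings=[("app_id", "string", "application_id", "string"),
              ("amt", "double", "loan_amount", "double")],
)

# Store the result in the serving location as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://loan-domain/cleansed/"},
    format="parquet",
)
```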
Amazon Kinesis can ingest live streams at high velocity and perform analysis in real time.
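A minimal producer-side sketch using boto3 is shown below; the stream name and record shape are assumptions.

```python
# A minimal sketch of publishing events to a Kinesis stream for real-time
# analysis; the stream name and record shape are assumptions.
import json
import boto3

kinesis = boto3.client("kinesis")
event = {"customer_id": "C-1042", "amount": 129.50, "channel": "net-banking"}
kinesis.put_record(
    StreamName="daas-transaction-stream",   # hypothetical stream
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["customer_id"],      # keeps a customer's events ordered
)
```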
Amazon SageMaker is the tool for AI/ML and a key offering for any AWS-powered DaaS platform, as
it is AWS's one-stop solution for all things AI/ML.
Data enrichment via third-party data pulls using APIs expands the scope for adding unlimited data
domains, feeding the join, filter, and aggregate processes.
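The sketch below illustrates one such enrichment pull and mashup; the provider endpoint, token, response shape, and field names are entirely hypothetical.

```python
# A minimal sketch of enriching internal records with a third-party pull;
# the endpoint, token, and field names are illustrative assumptions.
import pandas as pd
import requests

internal = pd.DataFrame({"customer_id": ["C-1042"], "loan_amount": [25000]})

# Pull external attributes for a customer from a hypothetical provider.
resp = requests.get(
    "https://api.example-bureau.com/v1/scores",   # hypothetical endpoint
    params={"customer_id": "C-1042"},
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
    timeout=10,
)
external = pd.DataFrame(resp.json()["records"])   # assumed response shape

# Join the mashup before serving it downstream.
enriched = internal.merge(external, on="customer_id", how="left")
```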
API gateways enable scalable and secure data exchange, exposing data through endpoints powered
by Lambda functions that perform the data processing.
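A minimal sketch of such a Lambda handler is shown below; the query parameter, lookup helper, and response shape are illustrative assumptions.

```python
# A minimal sketch of a Lambda handler that API Gateway could invoke to
# serve mesh data; the lookup helper and response shape are assumptions.
import json

def lookup_enriched_record(customer_id: str) -> dict:
    # Hypothetical stand-in for a query against the serving store.
    return {"customer_id": customer_id, "risk_score": 0.12}

def lambda_handler(event, context):
    """Return value-added data for the requested customer."""
    params = event.get("queryStringParameters") or {}
    customer_id = params.get("customer_id")
    if not customer_id:
        return {"statusCode": 400,
                "body": json.dumps({"error": "customer_id required"})}
    record = lookup_enriched_record(customer_id)
    return {"statusCode": 200, "body": json.dumps(record)}
```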
The below sections explain the steps for a bot to provide recommendations upon receipt of a
loan application.
[Figure: Data Mesh implementation for loan processing — transaction sources (on-prem database, cloud database, loan application, credit card usage, ATM card theft, unauthorized net banking) feed ML analysis and data algorithms that produce insights and inferences (customer segmentation, customer profiling, individual risk score), served to data consumers over an API.]
The above diagram shows an implementation of a Data Mesh architecture providing a service with
data coming from different domains. After the mashup, the value-added data is served over an API. The
following sections detail the various pieces of the Data Mesh specific to this use case.
Web Crawling: Web crawls are made through social media data to gather a device footprint over
time for the applicant, using the device ID.
Data Analysis: Analytics is added to traditional methods to enhance fraud detection capabilities.
Using statistics and ML techniques, data is analyzed and anomalies are detected. By analyzing the
categories of sites, the duration of visits, and the related links visited, the interest expressed by the
data subject across a gamut of metrics is obtained using a proprietary algorithm.
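As an illustration of the statistical/ML analysis described above (the proprietary algorithm itself is not public), the sketch below flags anomalous applicants with scikit-learn's IsolationForest over synthetic browsing metrics.

```python
# A minimal sketch of anomaly detection over browsing-metric features;
# the feature choices and synthetic data are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Rows: per-applicant metrics such as session duration and pages visited.
features = rng.normal(loc=[300, 12], scale=[60, 3], size=(500, 2))
features[:5] *= 5  # plant a few extreme outliers for the demo

model = IsolationForest(contamination=0.01, random_state=42).fit(features)
flags = model.predict(features)  # -1 marks anomalous applicants
print(f"{(flags == -1).sum()} applicants flagged for review")
```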
Inference: Signs of randomization and other anomalies on devices are identified, and deviations from
the virtual user’s behavior are scrutinized to remove false positives. Using an integrated case-management
system leveraging social media, a trained engineer can derive relevant findings for analysis from either
structured or unstructured data.
Capture Data from Similar Applications: Data mining can be used to classify, cluster, and segment
data, and to automatically find associations and rules that signify interesting patterns related to fraud.
New business rules and pattern recognition can be set up to identify fraudulent behavior. This valuable
data helps organizations detect fraud more efficiently than those that rely on traditional methods.
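A minimal sketch of the clustering step described above, using scikit-learn's KMeans over synthetic application features, is shown below; the features and data are assumptions.

```python
# A minimal sketch of clustering application records to surface segments
# that can be profiled against known fraud outcomes; data is synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
apps = rng.random((1000, 3))  # e.g. amount, velocity, device-risk features

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(apps)
labels = kmeans.labels_

# Inspect each segment's size; segments with unusual fraud rates can then
# drive new business rules.
for cluster in range(5):
    print(cluster, int((labels == cluster).sum()))
```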
User Profiling: A user’s behavior, network location, and similar signals allow effective assessment of
applicants who are hard to evaluate through conventional data sources, enhancing the resolution of the
decision-making process, identifying low-risk groups for comprehensive product offers, and increasing
approval rates in general. With the help of data mining tools, customer behavior can be analyzed
to derive patterns from extensive customer records, which can then be used to predict future customer
behavior for fraud detection.
Decision Making: Data-driven decision making (DDDM) is the most critical element of success in an
organization. After the data has been transformed into information, customers are profiled into
different buckets according to their risk score, allowing the underwriter or the relevant responsible
department to make an informed decision.
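As a sketch of the bucketing step, the snippet below bins customers into risk bands with pandas; the thresholds and band labels are illustrative assumptions.

```python
# A minimal sketch of bucketing profiled customers by risk score so an
# underwriter can act on each band; thresholds are illustrative assumptions.
import pandas as pd

scores = pd.DataFrame({
    "customer_id": ["C-1", "C-2", "C-3", "C-4"],
    "risk_score": [0.05, 0.35, 0.62, 0.91],
})
scores["risk_band"] = pd.cut(
    scores["risk_score"],
    bins=[0.0, 0.3, 0.7, 1.0],
    labels=["low", "medium", "high"],
)
print(scores)
```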
References:
Marr, Bernard; 2017; What Is Data Democratization? A Super Simple Explanation And The Key Pros And Cons; Available from: www.forbes.com/sites/bernardmarr/2017/07/24/what-is-data-democratization-a-super-simple-explanation-and-the-key-pros-and-cons/?sh=402c68fc6013 (Accessed 20th June 2021)
Saha, Debanjan; 2020; How the World Became Data-Driven, and What's Next; Available from: www.forbes.com/sites/googlecloud/2020/05/20/how-the-world-became-data-driven-and-whats-next/?sh=45f487be57fc (Accessed 19th June 2021)
Dehghani, Zhamak; 2020; Data Mesh Principles and Logical Architecture; Available from: www.martinfowler.com/articles/data-mesh-principles.html (Accessed 20th June 2021)
LTI (NSE: LTI) is a global technology consulting and digital solutions Company helping more than 435 clients succeed in a converging
world. With operations in 31 countries, we go the extra mile for our clients and accelerate their digital transformation with LTI’s Mosaic
platform enabling their mobile, social, analytics, IoT and cloud journeys. Founded in 1997 as a subsidiary of Larsen & Toubro Limited, our
unique heritage gives us unrivalled real-world expertise to solve the most complex challenges of enterprises across all industries. Each
day, our team of more than 36,000 LTItes enable our clients to improve the effectiveness of their business and technology operations
and deliver value to their customers, employees and shareholders. Find more at http://www.Lntinfotech.com or follow us at @LTI_Global