Dedicated Semantic Layer Architecture For Effective Data Analytics and Visualization: A Case Study
“Fig. 1, BI Tools as Semantic Layer” [6]

The solution to the problem stated above is a dedicated semantic layer that is not confined within the boundaries of multiple BI tools but is also reachable by a variety of technical stakeholders according to their needs and use cases. In later years, there has been increased focus on the “Dedicated Semantic Layer” model. In 2018, Jinja templates and dbt introduced a transformation layer into the semantic layer. In 2019, Looker and LookML were branded as the first real semantic layer. By 2022, there were more additions such as MetriQL, Minerva and dbt.

II. DEDICATED SEMANTIC LAYER (DSL)
DSL is implemented as a dedicated layer between the data sources and all consumers, including BI tools. Irrespective of the BI tool users choose, DSL allows them to work with the same standard semantics and underlying data layer, ensuring that data insights and reports are consistent across all integrations. With clear advantages over its predecessor approaches, DSL has found a critical position within the modern data stack.

A. Advantages
Allows business metrics and KPIs to be shared with disparate platforms, including BI and data science tools.
Abstracted from the data sources, it acts as a data virtualization layer for business users, allowing them to access data from multiple sources.
Complex business logic and calculations can be defined once and reused many times by various sub-departments in the organization.
Decoupling the data lake from the business metrics allows for scalability.
Helps track the lineage of business metric columns, since they are integrated with the existing data platform.
Simplifies governance and compliance processes in the dedicated approach as compared to federated silos.
Eases onboarding of new integrations on existing KPIs or metrics.
Can act as an application layer through data APIs.
Enhances query performance and cost efficiency by eliminating redundant processes.

B. Challenges
Besides its many apparent advantages, DSL also brings some challenges. Introducing a new layer adds to the complexity of the existing data platform. Another downside is that if newer technologies or solutions are leveraged to build this layer, it requires operations support, which increases the cost of maintenance. Opening access to the layer through APIs can also introduce performance issues as the data grows.
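To make the "define once, reuse many times" idea concrete, the following is a minimal sketch of a metric registry that compiles a metric request into SQL. The `Metric` class, the `METRICS` registry, the `compile_query` helper, and the `sales` table with its columns are all hypothetical names introduced for illustration; they are not taken from any of the tools named in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A business metric defined once in the semantic layer (hypothetical model)."""
    name: str
    expression: str  # SQL aggregate expression over the source table
    table: str       # physical table the metric is computed from


# Hypothetical registry; metric names, table and columns are illustrative only.
METRICS = {
    "net_revenue": Metric("net_revenue", "SUM(amount - discount)", "sales"),
    "order_count": Metric("order_count", "COUNT(DISTINCT order_id)", "sales"),
}


def compile_query(metric_name: str, dimensions: list[str]) -> str:
    """Turn a metric request into SQL so every consumer shares the same logic."""
    metric = METRICS[metric_name]
    select_cols = dimensions + [f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {metric.table}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql


# A BI dashboard and a data science notebook request the same metric with
# different dimensions, but both rely on the single shared definition.
bi_sql = compile_query("net_revenue", ["region"])
ds_sql = compile_query("net_revenue", ["region", "product_category"])
print(bi_sql)
print(ds_sql)
```

Because the aggregation expression lives only in the registry, a change to how net revenue is calculated would propagate to every consumer without touching the individual BI tools.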
Nevertheless, in-memory semantic layers require significant memory, which can be expensive. The available RAM limits the amount of data that can be stored in memory. Furthermore, horizontal scaling of in-memory semantic layers might be more challenging compared to other types.

Monitoring and alerting techniques involve setting up thresholds for each metric and alerting if they are exceeded. ETL monitoring tools, DB monitoring tools, log analytics, data quality tools, etc. can be leveraged for this purpose. Proactive monitoring, regular review and automated alerts are some of the best practices that help the semantic layer work seamlessly and provide continuous support without interruptions. Following this approach will ensure that the production support process is streamlined.

E. Data Archival and Retention
This involves systematic storage and cleanup of data based on usage and data retention policies. Data lifecycle management is critical to ensure that data is stored only for as long as it is required. Data that is ready for archival can first be moved into less frequently accessed storage for some period, after which it is permanently deleted. Data can hence be categorized as frequently accessed, less frequently accessed, or unused. Unused data qualifies for purging, which is handled by this component. The necessary retention period for each data type should be determined based on its business value.

Data archival strategies can be carefully planned based on the storage requirements. This unit also needs a data restoration mechanism to cater to restoration needs as they arise. There should be a built-in audit and compliance validation mechanism to ensure that data access and compliance remain intact.

F. API Access
API access allows DSL to serve as an application layer by letting disparate systems or applications connect to this layer. The API enables users to query metrics and dimensions using the JDBC protocol, while also providing standard metadata functionality. Popular API approaches are REST and GraphQL.

The API layer allows seamless integration with machine learning frameworks. Machine learning models can access data from the semantic layer for training and prediction. Developers can build custom applications that leverage the data and insights provided by the semantic layer. Data can be integrated with other systems, such as ERP, CRM, and marketing automation tools.

G. Integration with Enterprise Data Catalog and Governance
DSL can be integrated with the data catalog and governance for enhanced data security and compliance. Users can interact with the data catalog to read all the semantic layer tables by leveraging existing compute engines in the platform. The semantic layer can directly access metadata from the data catalog and create metadata in the catalog for users of DSL.

This is a powerful combination, as the semantic layer will be secured and enriched with all relevant features through this integration. The data catalog serves as a central repository for metadata, including data definitions, relationships, and usage guidelines such as storage and tagging guidelines. By integrating with the catalog, DSL automatically inherits all these principles without reinventing the wheel. Another advantage of this approach is that both DSL and the data catalog can enforce consistent data definitions and usage standards.
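The access-based categorization described under Data Archival and Retention (frequently accessed, less frequently accessed, unused) can be sketched as follows. This is only an illustration under stated assumptions: the 90-day and 365-day thresholds, the `Tier` enum, and the `classify` and `next_action` helpers are hypothetical and would in practice be derived from the organization's retention policy.

```python
from datetime import date, timedelta
from enum import Enum


class Tier(Enum):
    FREQUENT = "frequently accessed"
    INFREQUENT = "less frequently accessed"
    UNUSED = "unused"


# Assumed thresholds; real values would come from the data retention policy.
INFREQUENT_AFTER = timedelta(days=90)
UNUSED_AFTER = timedelta(days=365)


def classify(last_accessed: date, today: date) -> Tier:
    """Categorize a dataset by how long it has gone without being accessed."""
    idle = today - last_accessed
    if idle >= UNUSED_AFTER:
        return Tier.UNUSED
    if idle >= INFREQUENT_AFTER:
        return Tier.INFREQUENT
    return Tier.FREQUENT


def next_action(tier: Tier) -> str:
    """Map a tier to the lifecycle step described in the text."""
    return {
        Tier.FREQUENT: "keep",       # stays in primary storage
        Tier.INFREQUENT: "archive",  # move to less frequently accessed storage
        Tier.UNUSED: "purge",        # qualifies for permanent deletion
    }[tier]


today = date(2024, 1, 1)
for last in (date(2023, 12, 1), date(2023, 6, 1), date(2022, 1, 1)):
    tier = classify(last, today)
    print(last, tier.value, next_action(tier))
```

A production component would additionally record each archive or purge action for the audit and restoration mechanisms mentioned above; that part is omitted here.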
A. Open-Source Tools
Open-source tools are the most cost-efficient option and are a good fit for DSL when they integrate with the existing data platform. For many tools, there are active communities providing support and documentation to help troubleshoot issues.
B. Software as a Service
With dedicated support, SaaS products help reduce the operation and maintenance overhead.
The use cases of DSL are similar to those of BI and reporting, except that introducing a dedicated layer ensures that a data set can be reused by multiple business units.

Sample use case #1: A retail company wants to provide its executives and business analysts with a unified view of sales performance across regions, product categories, and customer segments.

DSL can be used in this case to abstract the underlying sales data from multiple databases and systems. It can be used to define the standardized dimensions and hierarchies, allowing both executives and BI users to explore the same data set through two different views, such as a portal application and a BI tool.

In addition, some of this data, when integrated with ERP data, can feed KPI metrics that help supply chain managers forecast demand and prepare for it. This shows that the same layer can be reused across multiple business domains and areas without creating data silos or processing similar datasets multiple times, both of which introduce data quality issues.

Sample use case #2: At Uber, business metrics are crucial for understanding performance, evaluating new products, and making informed decisions. These metrics are used for various purposes, ranging from troubleshooting specific issues (like a fare problem) to powering complex machine learning models that optimize pricing strategies on a global scale.

Uber realized that standardizing metrics was essential when democratizing access to data and insights. Without standardization, multiple versions of the same metric could

“Fig. 4, uMetric at Uber” [9]

VI. AI AND FUTURE OF SEMANTIC LAYER
Leading organizations incorporate data science and enterprise AI into everyday decision-making in the form of augmented analytics. A semantic layer can be helpful in successfully implementing augmented analytics.

An AI-ready universal semantic layer can also power customer-facing applications that enable organizations to make the most of their data and their customer interactions. This can be achieved by integrating DSL with customer-facing applications via APIs, in contrast to BI tools, which have been the primary use case so far.

A universal semantic layer that is AI-ready is needed to connect and work with diverse data platforms, protocols, and consumption tools. This decouples the data from consumption, thereby enabling the democratization of data analytics and AI in the enterprise [7].

Semantic layers can be used to publish AI/ML-generated insights to business users through the same analytics tools they use to analyze historical data. Thus, analytical tools may converge in the future, with DSL maintained as their meaningful, business-ready metadata.

Semantic layers will expand into more industries, and they offer potential for all industries that rely on data, including logistics, energy and agriculture. Their implementation could bring significant benefits by extracting insights from different data sources [8].
VII. SUMMARY
REFERENCES