Volume 9, Issue 10, October – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24OCT1676

Dedicated Semantic Layer Architecture for Effective Data Analytics and Visualization: A Case Study
Ramla Suhra
Department of Digital Data Solutions
H-E-B
Texas, USA

Abstract:- Organizations increasingly rely on fast-growing data for critical business decisions. To leverage the full potential of this data, petabytes are being ingested into central data lakes, mostly powered by the cloud. Organizations also realize that it is not enough to just collect huge amounts of data. To derive value from this data, it must be cleansed, interconnected, and translated from its complex technicalities into easily interpretable and more familiar business terminology. Building a semantic view of the data enriched with business metrics enables users to query, analyze, and visualize information as quickly as the business demands. While the semantic layer is perceived as a cornerstone of modern data architecture, there are different perspectives on where and how it should be implemented. Additionally, understanding the evolution of the semantic layer over the years can help in choosing the right architecture when building one in an organization. With the advent of Artificial Intelligence (AI), it is imperative to discuss the impact of AI on this topic. This research delves into the evolution, need, and significance of the semantic layer, exploring its architecture and benefits. It also analyzes the challenges faced during semantic layer adoption and the outlook.

Keywords:- Data Analytics; Artificial Intelligence; Data Architecture; Semantic Layer.

I. INTRODUCTION

A semantic layer is a layer of abstraction that separates the physical representation of data from what business users see. It provides a logical view of the data that lets end users access it using common business terminology that is easier for them to understand. By providing a more business-friendly representation of data, it acts as a bridge between the raw data and the business users.

A. Evolution of Approaches
The concept of the semantic layer is not entirely new: it dates back to the early 1990s, when it was patented by Business Objects and subsequently challenged by MicroStrategy in 2003 [1][2]. While the semantic layer’s origins lie in the days of OLAP [3], the concept is even more relevant in the modern data stack.

One of the initial approaches to building a business-friendly representation was the use of Data Cubes [4]. Data was pre-aggregated and stored in a multi-dimensional array structure for efficient storage and retrieval. This approach was particularly well-suited to online analytical processing (OLAP) operations, as it enabled rapid slicing and dicing of data along various dimensions. However, new reports needed new cubes, which made the approach less flexible and time-consuming to build.

Increasing data complexity and the demand for more flexibility led naturally to a newer approach: semantic layers within Business Intelligence (BI) tools (“Fig. 1”). With wider adoption of business intelligence techniques, different departments started using different BI tools; a survey by 360Suite found that 67% of respondents say they can take advantage of multiple Business Intelligence solutions within their organization [5]. This led to data silos, resulting in inconsistent business logic, diverse metrics, and varying interpretations of the same data within the organization. The analytics team could have a fully defined model locked inside its preferred BI tool, which would make it inaccessible or, rather, inoperable for any other team or tool sitting outside this ecosystem.

IJISRT24OCT1676 www.ijisrt.com 2575



“Fig. 1, BI Tools as Semantic Layer” [6]

The solution to the problem stated above is a dedicated layer for semantics that is not confined within the boundaries of individual BI tools and is reachable by a variety of technical stakeholders according to their needs and use cases. In recent years, there has been a much increased focus on the “Dedicated Semantic Layer” model. In 2018, Jinja templates and dbt introduced a transformation layer into the semantic layer. In 2019, Looker and LookML were branded as the first real semantic layer. By 2022, there were further additions such as MetriQL, Minerva, and dbt.

II. DEDICATED SEMANTIC LAYER (DSL)

DSL is implemented as a dedicated layer between the data sources and all consumers, including BI tools. Irrespective of the BI tool users choose, DSL allows them to work with the standard semantics and underlying data layer, ensuring that data insights and reports are consistent across all integrations. With clear advantages over its predecessor approaches, DSL has found a critical position within the modern data stack.

“Fig. 2, Semantic Layer” [6]

A. Advantages
• Allows business metrics and KPIs to be shared with disparate platforms, including BI and data science tools.
• Abstracted from the data sources, it acts as data virtualization for business users, allowing them to access data from multiple sources.
• Complex business logic and calculations can be defined once and reused many times by various sub-departments in the organization.
• Decoupling the data lake from the business metrics allows for scalability.
• Helps track the lineage of business metric columns, since they are integrated with the existing data platform.
• Simplifies governance and compliance processes in the dedicated approach as compared to federated silos.
• Easy onboarding of new integrations on existing KPIs or metrics.
• Can act as an application layer through data APIs.
• Enhanced query performance and cost efficiency due to the elimination of redundant processes.

B. Challenges
Besides its many apparent advantages, DSL also brings some challenges. Introducing a new layer adds to the complexity of the existing data platform. Another downside is that if newer technologies or solutions are used to build this layer, they require operations support, which increases the cost of maintenance. Opening access to the layer through APIs can also introduce performance issues as the data grows.
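The "defined once and reused many times" advantage listed in this section can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the registry `METRICS`, the metric names, and the two consumer functions are hypothetical stand-ins for a BI tool and a data science pipeline, not part of any specific product.

```python
# Single shared definition of each metric (metric names are hypothetical).
METRICS = {
    "revenue": lambda orders: sum(o["amount"] for o in orders),
    "average_order_value": lambda orders: (
        sum(o["amount"] for o in orders) / len(orders) if orders else 0.0
    ),
}


def compute_metric(name, orders):
    """Look up the shared definition so every consumer applies the same logic."""
    return METRICS[name](orders)


def dashboard_tile(orders):
    """Stand-in for a BI tool rendering a KPI tile."""
    return f"AOV: {compute_metric('average_order_value', orders):.2f}"


def feature_vector(orders):
    """Stand-in for a data science pipeline building model features."""
    return [
        compute_metric("revenue", orders),
        compute_metric("average_order_value", orders),
    ]


orders = [{"amount": 20.0}, {"amount": 30.0}, {"amount": 40.0}]
tile = dashboard_tile(orders)
features = feature_vector(orders)
```

Because both consumers call `compute_metric`, a change to the definition of average order value is made in one place and reaches the dashboard and the model features together.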


III. ARCHITECTURE

The proposed architecture for DSL comprises various components, as described below.

A. Data Modeling
Data modeling is the creation of business-oriented, logical data models that are directly mapped to the physical data structures in the data warehouse. Data is de-normalized and standardized with definitions of the hierarchical dimensions used in business analysis, for example an organization’s definition of fiscal year to quarter to month.

This unit is critical because it takes care of the definitions of the key performance indicator (KPI) metrics used in analytics. Even though every quantifiable data point captured from the business is a metric, not all metrics are equal in value from a decision-making perspective.

In addition to metrics creation and management, this layer can also provide a metrics serving feature.

B. Data Transformation
Data transformation based on the models is the integral unit of the DSL and helps with the curation of business-ready data. The semantic layer must be able to orchestrate transformations in the data platform. These data transformation workflows need to be orchestrated using the standard orchestration engines used within the organization. The semantic layer dynamically translates incoming queries from consumers (which use the logical constructs of the metrics layer) into platform-specific SQL (rewritten to reflect the logical-to-physical mapping defined in the semantic model). Managing performance and cost requires the capability to materialize certain views into physical tables. The output of this layer must be stored in low-latency storage for quicker access.

C. Data Storage
While DSL is more about the semantics, the underlying storage is also critical when designing the serving layer for DSL. Data storage can leverage existing storage components within the data platform or add an additional layer of structured, low-latency storage.

• In-Memory Semantic Layers
In-memory semantic layers load data into the server’s memory, allowing faster retrieval and analysis. They are helpful for organizations that require real-time or near-real-time analysis of data.

In-memory semantic layers leverage high-speed memory for fast data access, enabling quick loading and real-time analysis. This makes them ideal for time-critical applications. As data is stored in memory, there is no need to retrieve it from disk, resulting in low latency.

Nevertheless, in-memory semantic layers require significant memory, which can be expensive. The available RAM limits the amount of data that can be stored in memory. Furthermore, horizontal scaling of in-memory semantic layers might be more challenging compared to other types.

• Relational Database Semantic Layers
Relational database semantic layers store data in a relational database, such as SQL Server, PostgreSQL, or Oracle. These are ideal for organizations with substantial data volumes and demanding data management requirements. Relational databases offer robust data management features, including data integrity and security. They can scale horizontally by adding more servers to the cluster. Being a well-established technology, relational databases integrate seamlessly with other systems.

However, relational databases can be slower than in-memory semantic layers due to disk-based data retrieval. They can be complex to set up and manage, and maintaining them, especially with high availability and redundancy, can be costly.

• Graph Database Semantic Layers
Graph database semantic layers use graph databases such as Neo4j or AWS Neptune for data storage. They are suitable for organizations handling complex and interconnected data, which graph databases are flexible enough to represent. Graph databases can outperform relational databases for specific queries, and they can scale horizontally by adding more servers to the cluster.

However, graph databases might not offer the same capabilities as relational databases. They can be complex to set up and manage, and maintaining them, especially with high availability and redundancy, can be expensive.

• Cloud Storage
Distributed storage can be cheaper and easy to integrate with the enterprise governance and catalog. It needs to be integrated with low-latency caching or view-like abstractions to improve query performance. Setting up the infrastructure will be easier, as it is already set up in the platform.

Inbuilt governance capabilities make it more reliable. It can scale horizontally as needed and is cost-efficient. Adding a consumption layer on top of this storage enables faster reads based on access patterns.

D. Monitoring and Alerting
All the orchestration and transformation workloads need to be monitored for reliability. Job failures, resource utilization, and latency are some key areas to include. Data quality monitoring is also very important and needs to be part of the monitoring component. Data accuracy, completeness, and validity can be monitored and, in case of any anomalies, reported to the stakeholders. Performance monitoring is another aspect that this component needs to cover. Query performance, API response time, and error rates are some of the factors to be considered.

Monitoring and alerting techniques involve setting up thresholds for each metric and alerting if they are exceeded. ETL monitoring tools, DB monitoring tools, log analytics, data quality tools, etc. can be leveraged for this purpose. Proactive monitoring, regular review, and automated alerts are


some of the best practices that help the semantic layer work seamlessly and provide continuous support without interruptions. Following this approach will ensure that the production support process is streamlined.

E. Data Archival and Retention
This involves systematic storage and cleanup of data based on usage and data retention policies. Data lifecycle management is critical to ensure that data is stored only for as long as it is required. When data is ready for archival, it can first be moved into less frequently accessed storage for some period, after which it is permanently deleted. Data can hence be categorized as frequently accessed, less frequently accessed, or unused. Unused data qualifies for purging, which is handled by this component. The necessary retention period for different data types should be determined based on their business value.

Data archival strategies can be carefully planned based on the storage requirements. This unit also needs a data restoration mechanism to cater to restoration needs as they arise. There should be an inbuilt audit and compliance validation mechanism to ensure that data access and compliance remain intact.

F. API Access
API access allows DSL to serve as an application layer by letting disparate systems or applications connect to this layer. The API enables users to query metrics and dimensions using the JDBC protocol, while also providing standard metadata functionality. Popular API approaches are REST and GraphQL.

The API layer allows seamless integration with machine learning frameworks. Machine learning models can access data from the semantic layer for training and prediction. Developers can build custom applications that leverage the data and insights provided by the semantic layer. Data can be integrated with other systems, such as ERP, CRM, and marketing automation tools.

G. Integration with Enterprise Data Catalog and Governance
DSL can be integrated with the data catalog and governance for enhanced data security and compliance. Users can interact with the data catalog to read all the semantic layer tables by leveraging existing compute engines in the platform. The semantic layer can directly access metadata from the data catalog and create metadata in the catalog for users of DSL.

This is a powerful combination, as the integration makes the semantic layer fully secure and enriched with all relevant features. The data catalog serves as a central repository for metadata, including data definitions, relationships, and usage guidelines such as storage and tagging guidelines. By integrating with the catalog, DSL automatically inherits all these principles without needing to reinvent the wheel. Another advantage of this approach is that both DSL and the data catalog can enforce consistent data definitions and usage standards.
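The threshold-based alerting described under Monitoring and Alerting can be sketched in a few lines of Python. This is a minimal illustration rather than a recommendation of any monitoring tool; the `Threshold` class, the metric names, and the limits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float
    # "above": alert when value > limit; "below": alert when value < limit.
    direction: str = "above"


def evaluate(observations, thresholds):
    """Return one alert message per breached or unreported metric."""
    alerts = []
    for t in thresholds:
        value = observations.get(t.metric)
        if value is None:
            # A metric that stopped reporting is itself worth an alert.
            alerts.append(f"{t.metric}: no data reported")
            continue
        breached = value > t.limit if t.direction == "above" else value < t.limit
        if breached:
            alerts.append(f"{t.metric}={value} breached {t.direction} limit {t.limit}")
    return alerts


# Hypothetical thresholds covering performance, reliability, and data quality.
thresholds = [
    Threshold("api_p95_latency_ms", 500.0),
    Threshold("job_failure_count", 0.0),
    Threshold("row_completeness_pct", 99.0, direction="below"),
]
observations = {
    "api_p95_latency_ms": 620.0,
    "job_failure_count": 0.0,
    "row_completeness_pct": 97.5,
}
alerts = evaluate(observations, thresholds)
```

With the sample observations, the latency and completeness checks raise alerts while the job failure count does not. In practice the returned messages would be routed to whatever notification channel the organization already uses.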

“Fig. 3, Proposed Architecture for DSL”
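The query translation described under Data Transformation, from logical metric and dimension names to platform-specific SQL, can be sketched as follows. This is a simplified illustration under stated assumptions: the `SEMANTIC_MODEL` dictionary, the physical table and column names, and the `compile_query` function are hypothetical, and a real semantic layer would also handle joins, filters, and SQL dialect differences.

```python
# Hypothetical semantic model: logical names mapped to physical SQL.
SEMANTIC_MODEL = {
    "table": "warehouse.fact_sales",
    "dimensions": {
        "region": "store_region_cd",
        "fiscal_quarter": "fisc_qtr",
    },
    "metrics": {
        "total_sales": "SUM(net_sales_amt)",
        "order_count": "COUNT(DISTINCT order_id)",
    },
}


def compile_query(metrics, dimensions, model=SEMANTIC_MODEL):
    """Rewrite a logical request into SQL against the physical table."""
    dim_map, metric_map = model["dimensions"], model["metrics"]
    unknown = [n for n in dimensions if n not in dim_map]
    unknown += [n for n in metrics if n not in metric_map]
    if unknown:
        # Only names defined in the model are allowed into the SQL text.
        raise KeyError(f"not defined in the semantic model: {unknown}")
    select_items = [f"{dim_map[d]} AS {d}" for d in dimensions]
    select_items += [f"{metric_map[m]} AS {m}" for m in metrics]
    sql = "SELECT " + ", ".join(select_items) + " FROM " + model["table"]
    if dimensions:
        sql += " GROUP BY " + ", ".join(dim_map[d] for d in dimensions)
    return sql


sql = compile_query(["total_sales"], ["region", "fiscal_quarter"])
```

A consumer asks only for `total_sales` by `region` and `fiscal_quarter`; the physical column names and the aggregation expression stay inside the model, so they can change without touching any report.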


IV. IMPLEMENTATION APPROACHES

The architecture outlines the components and the details of each of them. This section discusses the implementation approaches that can be followed for building DSL.

A. Open-Source Tools
Open-source tools are the most cost-efficient option and, when integrated with the existing data platform, are a good fit for DSL. For many tools, there are active communities providing support and documentation to help troubleshoot issues.

B. Software as a Service
With dedicated support, SaaS products help reduce the operation and maintenance overhead.

C. Build Your Own
With the advancement of AI and ML technologies, major data platform products are also trying to package all the required features, including semantic layer capabilities, within them. Building the layer on such a platform can therefore be easier and more cost-efficient than buying a new product.

V. USE CASES OF DSL

The use cases of DSL are like the use cases of BI and reporting, except that the introduction of a dedicated layer ensures that the data set can be reused by multiple business units.

• Sample use case #1: A retail company wants to provide its executives and business analysts with a unified view of sales performance across regions, product categories, and customer segments.

DSL can be used in this case to abstract the underlying sales data from multiple databases and systems. It can be used to define standardized dimensions and hierarchies, allowing both executives and BI users to explore the same data set through two different views, such as a portal application and a BI tool.

In addition, some of this data, when integrated with ERP data, can feed KPI metrics that help supply chain managers forecast demand and prepare for it. This shows that the same layer can be reused across multiple business domains and areas without creating data silos or processing similar datasets multiple times, both of which would introduce data quality issues.

• Sample use case #2: At Uber, business metrics are crucial for understanding performance, evaluating new products, and making informed decisions. These metrics are used for various purposes, ranging from troubleshooting specific issues (like a fare problem) to powering complex machine learning models that optimize pricing strategies on a global scale.

Uber realized that standardizing metrics was essential when democratizing access to data and insights. Without standardization, multiple versions of the same metric could exist, leading to inaccurate or misleading information and ultimately poor decision-making.

To address this problem, Uber built a solution called uMetric, which is a unified data semantic layer [9].

“Fig. 4, uMetric at Uber” [9]

VI. AI AND FUTURE OF SEMANTIC LAYER

Leading organizations incorporate data science and enterprise AI into everyday decision-making in the form of augmented analytics. A semantic layer can be helpful in successfully implementing augmented analytics.

An AI-ready universal semantic layer can also power customer-facing applications that enable organizations to make the most of their data and their customer interactions. This can be achieved by integrating DSL with customer-facing applications via APIs, in contrast to BI tools, which have been the primary use case so far.

A universal semantic layer that is AI-ready is needed to connect and work with diverse data platforms, protocols, and consumption tools. This decouples the data from consumption, thereby enabling the democratization of data analytics and AI in the enterprise [7].

Semantic layers can be used to publish AI/ML-generated insights to business users through the same analytics tools they use to analyze historical data. Thus, in the future, analytical tools may converge while DSL remains their source of meaningful, business-ready metadata.

Semantic layers will expand into more industries, and they offer potential for all industries that rely on data, including logistics, energy, and agriculture. Their implementation could bring significant benefits by extracting insights from different data sources [8].


VII. SUMMARY

By bridging the gap between technical data and business understanding, semantic layers empower business stakeholders to derive quality KPIs and metrics. In summary, DSL is a vital component of modern data architecture. By providing a simplified, business-focused view of data, it empowers organizations to extract actionable insights and make informed decisions, driving innovation and growth.

REFERENCES

[1]. J.-M. Cambot, B. Liautaud, and S. F. Sa, “US5555403A - Relational database access system using semantically dynamic objects - Google Patents.” https://patents.google.com/patent/US5555403
[2]. “Microstrategy, Inc. v. Business Objects, S.A., 661 F. Supp. 2d 548 | Casetext Search + Citator.” https://casetext.com/case/microstrategy-inc-v-business-objects-3
[3]. Wikipedia contributors, “Online analytical processing,” Wikipedia, Oct. 08, 2024. https://en.wikipedia.org/wiki/Online_analytical_processing
[4]. “What is a data cube?,” 365 Data Science, May 02, 2023. https://365datascience.com/trending/data-cube/
[5]. Emilien, “Business Intelligence Trends 2020,” Wiiisdom | Analytics Governance Solutions, Feb. 09, 2023. https://wiiisdom.com/ebook/business-intelligence-trends-2020/
[6]. A. Kumar, “The semantic layer movement: the rise & current state,” Modern Data 101, May 02, 2024. https://moderndata101.substack.com/p/the-semantic-movement-the-story-of
[7]. A. Keydunov, “Real-time AI experiences can’t advance without a universal semantic layer,” RTInsights, Mar. 03, 2024. https://www.rtinsights.com/real-time-ai-experiences-cant-advance-without-a-universal-semantic-layer/
[8]. A. Schwanke, “Semantic layer — one layer to serve them all - Axel Schwanke - medium,” Medium, Aug. 29, 2024. [Online]. Available: https://medium.com/@axel.schwanke/semantic-layer-one-layer-to-serve-them-all-d0ef7eff1ffa
[9]. “The Journey towards metric Standardization | Uber Blog,” Uber Blog, Jan. 12, 2021. https://www.uber.com/blog/umetric/
