
Cloud-Based Data Warehouse Model Creation, Loading, and Performance Evaluation: A Comparative Analysis

This research paper aims to investigate the process of creating and loading a data warehouse model on cloud platforms and evaluate its performance. With the increasing adoption of cloud computing, organizations are leveraging cloud platforms to store and process large volumes of data for analytics purposes.

Volume 8, Issue 7, July – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Cloud-based Data Warehouse Model Creation, Loading, and Performance Evaluation: A Comparative Analysis
Pooja D. Kavishwar¹ (Research Scholar)
Department of Computer Science, Shivaji Science College, Congress Nagar, Nagpur, Maharashtra, India

Dr. S. R. Pande² (HOD)
Department of Computer Science, Shivaji Science College, Congress Nagar, Nagpur, Maharashtra, India

Abstract:- This research paper investigates the process of creating and loading a data warehouse model on cloud platforms and evaluates its performance. With the increasing adoption of cloud computing, organizations are leveraging cloud platforms to store and process large volumes of data for analytics purposes. By examining the data warehouse model creation, loading procedures, and performance metrics on different cloud platforms, this study aims to provide insights into the strengths and weaknesses of various platforms in supporting efficient and scalable data warehousing solutions.

Keywords:- Data Warehouse Models; Conceptual Data Models; Physical Data Models; Analytics Query; Query Response Time.

I. INTRODUCTION

Data warehousing is the process of collecting, organizing, and storing large volumes of structured and/or unstructured data from various sources to support decision-making and business intelligence activities.

Cloud-based data warehousing refers to the practice of hosting and managing data warehouses on cloud computing platforms. Cloud platforms offer features such as scalability and the flexibility to handle varying data volumes. Cloud-based data warehousing reduces infrastructure costs, as organizations pay for resources on a usage basis.[1] Cloud providers employ robust security measures to protect data, including encryption, access controls, and data backups. Cloud platforms also offer integration capabilities with various data sources and analytics tools, allowing seamless data ingestion and integration.[2]

Data warehouse models serve as the foundation for decision support and business intelligence activities. Accurate and well-designed data warehouse models, combined with efficient loading processes and optimal performance, enable organizations to extract meaningful insights and make data-driven decisions more effectively. Evaluating performance on cloud platforms ensures that decision-makers can access timely and accurate information for analysis.[3]

II. METHODOLOGY

A. Selection of Cloud Platforms for Evaluation

• Google BigQuery:
BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse service from Google Cloud Platform (GCP). BigQuery stores data in Colossus, Google's distributed file system and the successor to the Google File System (GFS), and queries it with the Dremel-based BigQuery query engine. BigQuery is a good choice for businesses that need to analyze large amounts of data quickly and easily.[4]

• Snowflake:
Snowflake is a cloud-based data warehouse that is designed to be highly scalable, secure, and easy to use. Snowflake uses an architecture that separates storage from compute, which allows it to scale horizontally and provide high performance. Snowflake is a good choice for businesses that need to analyze large amounts of data quickly and easily.[5]

B. Data Warehouse Model Creation Process
The first step in data warehouse modeling is to understand the business for which the data warehouse is to be created and used.[6] The business and its requirements play a vital role in data modeling.[7] This paper describes the process of building analytical systems for data-driven decision-making and performance analysis that support a range of business queries. The business domain selected for the experimental work is the retail domain. The paper focuses on analytics such as checking the status of product orders, local supplier revenue generation, predicting revenue generation, and checking shipping volume. To address these requirements, a schema for the warehouse model is designed.

The unstructured data considered for the study covered the following aspects:

• Sales Data:
Product details, details of the highest-purchasing consumers, revenue and profit margins per product, and sales performance over time.

• Consumer Segmentation Details:
Consumer demographics, purchasing behavior, and preferences.

IJISRT23JUL414 www.ijisrt.com 783


• Inventory Data:
Total available products, their quality information, supplier details, demographics data, etc.

• Supplier Data:
Details of the vendor, including delivery times, product quality, and pricing.

• Order Data:
Consumer purchase details, details of the products sold, and details of the supplier that sold those products.

• Objectives of Designing a Schema of a Data Warehouse Model
Based on the above unstructured data, the following analyses can be made. These analysis criteria are the objectives of designing a schema of a data warehouse model.

• Analysis to identify target markets, develop personalized marketing strategies, and improve consumer retention.
• Assessing supplier performance based on criteria such as delivery times, product quality, and pricing. This analysis can help in vendor selection, negotiation, and maintaining healthy supplier relationships.
• Order fulfillment analysis to improve consumer satisfaction by reducing delivery times and ensuring timely order processing.
• Financial analysis by examining revenue, expenses, and profitability across different dimensions such as products, regions, or time periods. This analysis can provide insights into cost structures, profitability drivers, and financial trends.

• Based on these objectives, the conceptual design can be drawn.

• Conceptual Model of the Data Warehouse Model

• Consumer: This entity represents information about the consumers, including their unique identifier, name, address, contact details, and demographic information.
• Supplier: This entity captures details about the suppliers, such as their identifier, name, address, and contact information.
• Product: This entity represents the products offered by the business. It includes attributes like a unique identifier, product name, description, price, and other relevant product details.
• Order: This entity represents consumer orders. It includes attributes such as an order identifier, order date, consumer identifier, and additional information related to the order.
• OrderItem: This entity captures individual line items within an order. It includes attributes such as the order identifier, product identifier, quantity, price, and other details specific to each line item.
• Region: This entity represents regions of the globe. Its only attributes are the key and the name of the region.
• Nation: This entity represents the nations in the respective regions. Its attributes are the key and the name of the nation, together with the region key.

The relationships between these entities can be represented as follows:

• Consumers can place multiple orders, so there is a one-to-many relationship between the Consumer and Order entities.
• Orders can contain multiple line items, so there is a one-to-many relationship between the Order and OrderItem entities.
• Products can be associated with multiple line items, so there is a one-to-many relationship between the Product and OrderItem entities.
• Suppliers can provide multiple products, so there is a one-to-many relationship between the Supplier and Product entities.
• Each Region entry has many Nation entries.

• Logical Data Model
The logical data model for the snowflake schema further refines the conceptual data model by specifying the attributes, data types, and relationships between entities. Here is a representation of the logical data model for the snowflake schema:

• Entities:

• Consumer {ConsumerID (Primary Key), Name, Address, City, State, Zip, Phone, …}
• Supplier {SupplierID (Primary Key), Name, Address, City, State, Zip, Phone, …}
• Product {ProductID (Primary Key), Name, Description, Price, …}
• ProdSupp {ProductID (Foreign Key), SupplierID (Foreign Key), AvailQty, SupplyCost}
• Order {OrderID (Primary Key), ConsumerID (Foreign Key), OrderDate, …}
• OrderItem {OrderID (Foreign Key), ProductID (Foreign Key), Quantity, Price}
• Region {RegionKey (Primary Key), Name}
• Nation {NationKey (Primary Key), RegionKey (Foreign Key), Name}

• Relationships:

• One Consumer can place multiple Orders (One-to-Many)
• Many Suppliers can supply multiple Products (Many-to-Many)
• One Product can be included in multiple OrderItems (One-to-Many)
• One Order can have multiple OrderItems (One-to-Many)

• Physical Model
A physical model for the snowflake schema refers to the way data is stored and organized on disk in a database system to optimize query performance. It involves several components, including tables, indexes, and partitions.[8] The model consists of a set of tables that represent different entities in a decision support system, such as consumers, orders, order items, suppliers, and more. These tables store the actual data that is used to answer the analytics queries. Each table has a defined schema with columns that correspond to specific attributes or characteristics of the entity it represents. Indexes are data structures that help speed up query processing by allowing faster access to specific data based on selected columns.[9] Keys are the indexes used in this model. Partitioning improves query performance by reducing the amount of data that needs to be scanned for a particular query. The Orders table is partitioned into two tables, Orders and OrderItem, where OrderItem stores the individual order details. The physical model involves designing and configuring these components in a way that optimizes query execution and resource utilization.[10]
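
As a rough illustration of how the logical entities and the physical components described above (tables, keys, and an index) fit together, the sketch below creates them in an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a local stand-in here, not one of the evaluated platforms, and this is not the DDL used in the study. The column types, the table name Orders (ORDER is a reserved word in SQL), the composite key on ProdSupp, and the index name idx_orders_consumer are assumptions; attributes elided with "…" in the logical model are left out.

```python
import sqlite3

# Assumed column types; attributes elided with "…" in the logical model are omitted.
SCHEMA = """
CREATE TABLE Region   (RegionKey INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Nation   (NationKey INTEGER PRIMARY KEY,
                       RegionKey INTEGER REFERENCES Region(RegionKey),
                       Name TEXT);
CREATE TABLE Consumer (ConsumerID INTEGER PRIMARY KEY, Name TEXT, Address TEXT,
                       City TEXT, State TEXT, Zip TEXT, Phone TEXT);
CREATE TABLE Supplier (SupplierID INTEGER PRIMARY KEY, Name TEXT, Address TEXT,
                       City TEXT, State TEXT, Zip TEXT, Phone TEXT);
CREATE TABLE Product  (ProductID INTEGER PRIMARY KEY, Name TEXT,
                       Description TEXT, Price REAL);
-- Junction table for the many-to-many Supplier/Product relationship.
-- The composite primary key is an assumption, not stated in the paper.
CREATE TABLE ProdSupp (ProductID INTEGER REFERENCES Product(ProductID),
                       SupplierID INTEGER REFERENCES Supplier(SupplierID),
                       AvailQty INTEGER, SupplyCost REAL,
                       PRIMARY KEY (ProductID, SupplierID));
CREATE TABLE Orders   (OrderID INTEGER PRIMARY KEY,
                       ConsumerID INTEGER REFERENCES Consumer(ConsumerID),
                       OrderDate TEXT);
CREATE TABLE OrderItem(OrderID INTEGER REFERENCES Orders(OrderID),
                       ProductID INTEGER REFERENCES Product(ProductID),
                       Quantity INTEGER, Price REAL);
-- Hypothetical index on the foreign key used to look up a consumer's orders.
CREATE INDEX idx_orders_consumer ON Orders(ConsumerID);
"""


def build_warehouse() -> sqlite3.Connection:
    """Create the sketch schema in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


conn = build_warehouse()
conn.execute("INSERT INTO Consumer (ConsumerID, Name) VALUES (1, 'Example Consumer')")
conn.executemany(
    "INSERT INTO Orders (OrderID, ConsumerID, OrderDate) VALUES (?, ?, ?)",
    [(10, 1, "2023-01-05"), (11, 1, "2023-02-17")],
)

# One Consumer, many Orders: count the orders placed by each consumer.
orders_per_consumer = conn.execute(
    "SELECT ConsumerID, COUNT(*) FROM Orders GROUP BY ConsumerID"
).fetchall()
print(orders_per_consumer)
```

With foreign keys enabled, SQLite rejects an order that refers to a consumer that does not exist, so the one-to-many Consumer-to-Order relationship is enforced at the storage level rather than only documented in the model.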

Fig 1 Physical Model ER Diagram

C. Loading Data onto the Snowflake and GCP Cloud Platforms

Fig 2 Load Time of Different Tables on the Two Platforms

D. Analytics Queries Fired on the Snowflake and GCP Cloud Platforms

• Order Priority Checking Query
The Order Priority Checking Query counts the number of orders placed in a given quarter of a given year in which at least one Order Item was received by the consumer later than its committed date. The query lists the count of such orders for each order priority, sorted in ascending priority order.

Tables used: Orders, Order Item

• Local Supplier Volume Query
The Local Supplier Volume Query lists, for each Nation in a Region, the revenue volume that resulted from Order Item transactions in which the Consumer ordering parts and the Supplier filling them were both within that nation. Revenue volume for all qualifying Order Items in a particular nation is defined as sum(l_extendedprice * (1 - l_discount)).

Tables used: Consumer, Orders, Order Item, Supplier, Nation, Region

• Forecasting Revenue Change Query
The Forecasting Revenue Change Query considers all the Order Items shipped in a given year with discounts between DISCOUNT-0.01 and DISCOUNT+0.01. The potential revenue increase is equal to the sum of [l_extendedprice * l_discount] for all Order Items with discounts and quantities in the qualifying range.

Tables used: Order Item

• Volume Shipping Query
The Volume Shipping Query finds, for two given nations, the gross discounted revenues derived from Order Items. The query lists the supplier nation, the consumer nation, the year, and the revenue from shipments that took place in that year.

Tables used: Order Item, Supplier, Orders, Consumer, Nation
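
To make two of the query descriptions above more concrete, the following sketch runs simplified versions of the Order Priority Checking Query and the Forecasting Revenue Change Query against a tiny in-memory SQLite database, using Python's standard sqlite3 module. It is an illustration under stated assumptions, not the SQL executed in the study: only l_extendedprice and l_discount are named in the text, so the table layout, the other column names, the sample rows, the parameter values, and the quantity threshold (max_quantity) are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY,
                     o_orderdate TEXT, o_orderpriority TEXT);
CREATE TABLE order_item (l_orderkey INTEGER, l_extendedprice REAL,
                         l_discount REAL, l_quantity INTEGER,
                         l_shipdate TEXT, l_commitdate TEXT, l_receiptdate TEXT);
""")

# Made-up sample rows, chosen only to exercise the filters below.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "2023-01-10", "1-URGENT"),
    (2, "2023-02-03", "2-HIGH"),
    (3, "2023-05-20", "1-URGENT"),  # placed outside the first quarter
])
conn.executemany("INSERT INTO order_item VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 1000.0, 0.06, 10, "2023-01-15", "2023-01-20", "2023-01-25"),  # received late
    (1,  500.0, 0.02,  5, "2023-01-16", "2023-01-30", "2023-01-22"),  # on time
    (2,  200.0, 0.06, 30, "2023-02-05", "2023-02-10", "2023-02-08"),  # on time
    (3,  300.0, 0.06,  8, "2023-05-22", "2023-05-25", "2023-05-30"),  # received late
])


def order_priority_counts(conn, quarter_start, quarter_end):
    """Count orders placed in [quarter_start, quarter_end) that have at least
    one item received after its committed date, grouped by order priority."""
    return conn.execute(
        """
        SELECT o_orderpriority, COUNT(*)
        FROM orders
        WHERE o_orderdate >= ? AND o_orderdate < ?
          AND EXISTS (SELECT 1 FROM order_item
                      WHERE l_orderkey = o_orderkey
                        AND l_receiptdate > l_commitdate)
        GROUP BY o_orderpriority
        ORDER BY o_orderpriority
        """,
        (quarter_start, quarter_end),
    ).fetchall()


def forecast_revenue_change(conn, year, discount, max_quantity):
    """Sum l_extendedprice * l_discount over items shipped in `year` whose
    discount lies in [discount - 0.01, discount + 0.01] and whose quantity
    is below max_quantity (the quantity bound is an assumed parameter)."""
    row = conn.execute(
        """
        SELECT SUM(l_extendedprice * l_discount)
        FROM order_item
        WHERE l_shipdate >= ? AND l_shipdate < ?
          AND l_discount BETWEEN ? AND ?
          AND l_quantity < ?
        """,
        (f"{year}-01-01", f"{year + 1}-01-01",
         discount - 0.01, discount + 0.01, max_quantity),
    ).fetchone()
    return row[0] or 0.0  # SUM returns NULL when no row qualifies


print(order_priority_counts(conn, "2023-01-01", "2023-04-01"))
print(forecast_revenue_change(conn, 2023, 0.06, 24))
```

Dates are stored as ISO 8601 strings so that plain string comparison orders them correctly. Discounts are stored as floating-point values, so rows that sit exactly on the DISCOUNT-0.01 or DISCOUNT+0.01 boundary may fall on either side of the range; a fixed-point or decimal column type would avoid that.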

III. RESULTS

• Presentation of Findings Related to Data Warehouse Analytics Query Response Time on Different Cloud Platforms

Fig 3 Data Warehouse Response Time for the Analytics Queries

IV. CONCLUSION

• Comparison of the results across different cloud platforms
The load time for each table varies depending on the size and complexity of the table. For example, the fact_OrderItem table is the largest table in the dataset, and it has the longest load time.

The data shows that GCP BigQuery is significantly faster than Snowflake SnowSQL for all tables. This is likely due to the fact that the GCP BigQuery loads run entirely on a cloud-based platform, while the Snowflake loads go through SnowSQL, a command-line client that is self-hosted. Cloud-based platforms are typically faster than self-hosted platforms because they have access to more resources and can scale more easily.

REFERENCES

[1] Niu, Y., Ying, L., Yang, J., Bao, M., & Sivaparthipan, C. B. (2021). Organizational business intelligence and decision making using big data analytics. Information Processing & Management, 58(6), 102725.
[2] Díaz, M., Martín, C., & Rubio, B. (2016). State-of-the-art, challenges, and open issues in the integration of Internet of things and cloud computing. Journal of Network and Computer Applications, 67, 99-117.

[3] Oketunji, T., & Omodara, O. (2011). Design of Data Warehouse and Business Intelligence System: A case study of Retail Industry.
[4] Bisong, E. (2019). Google BigQuery. Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners, 485-517.
[5] Dageville, B., Cruanes, T., Zukowski, M., Antonov, V., Avanes, A., Bock, J., ... & Unterbrunner, P. (2016, June). The snowflake elastic data warehouse. In Proceedings of the 2016 International Conference on Management of Data (pp. 215-226).
[6] Manjunath, T. N., Hegadi, R. S., & Ravikumar, G. K. (2010). Analysis of data quality aspects in datawarehouse systems. International Journal of Computer Science and Information Technologies, 2(1), 477-485.
[7] Simsion, G., & Witt, G. (2004). Data modeling essentials. Elsevier.
[8] Lightstone, S. S., Teorey, T. J., & Nadeau, T. (2010). Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more. Morgan Kaufmann.
[9] Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26(1), 65-74.
[10] Polyvyanyy, A., Ouyang, C., Barros, A., & van der Aalst, W. M. (2017). Process querying: Enabling business intelligence through query-based process analytics. Decision Support Systems, 100, 41-56.
