Data Warehousing Summary SET A
In the digital landscape, where users expect easy access and businesses
compete fiercely for attention, performance is essential. It is no longer just a
technical aspect; it has turned into a strategic imperative that directly impacts
user satisfaction, operational costs, scalability, and ultimately a company's
edge over its competitors. PERFORMANCE TUNING, the meticulous
process of refining systems and applications to achieve optimal efficiency
and responsiveness, has become an essential component in achieving
success in this dynamic environment. At its core, performance tuning
consists of identifying and resolving bottlenecks within a system or
application. Resolving them optimizes workflows, reduces delays, and
ensures that systems can handle growing workloads without a drop in
performance. The advantages are numerous and directly affect a company's
financial performance.
ILLUSTRATION 1.1
ILLUSTRATION 1.2
ILLUSTRATION 1.5
Examples:
7. Overloading Query Interfaces - Attackers flood the data
warehouse with excessive queries to exhaust resources.
8. API Abuse - Malicious bots repeatedly hit warehouse APIs,
causing slowdowns or crashes.
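The two examples above both come down to one client sending far more requests than a normal user would. A common way to spot and limit this is a sliding-window rate check per client. The sketch below is only an illustration: the class name QueryRateMonitor, the limit of 3 queries, and the 1-second window are assumptions, not something taken from a specific warehouse product.

```python
from collections import defaultdict, deque


class QueryRateMonitor:
    """Hypothetical per-client sliding-window limiter for warehouse queries."""

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        # client_id -> timestamps of recently accepted queries
        self._recent = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        """Return True if the query is accepted, False if the client is over the limit."""
        window = self._recent[client_id]
        # Forget queries that are older than the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_queries:
            return False  # looks like flooding: reject or throttle
        window.append(now)
        return True


# Assumed limit: at most 3 queries per client per second.
monitor = QueryRateMonitor(max_queries=3, window_seconds=1.0)
burst = [monitor.allow("bot", t) for t in (0.0, 0.1, 0.2, 0.3)]
print(burst)  # the fourth query in the burst is rejected
```

Timestamps are passed in explicitly so the behavior is easy to test; a real deployment would read them from a clock and would usually enforce the limit at the API gateway or query interface.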
ILLUSTRATION 2.1
ILLUSTRATION 2.2
This flow diagram illustrates the best practices for ensuring data
warehousing security in a clear, step-by-step format. It begins with Access
Control and Authentication, emphasizing the importance of restricting data
access to authorized users only. Next, it highlights Data Encryption, which
secures sensitive information both at rest and in transit. The flow then moves
to Monitoring and Auditing, showing the need for continuous tracking of
activities to detect and respond to potential threats. Finally, it concludes with
Data Masking and Tokenization, which help protect sensitive data by
rendering it unreadable or substituting it with non-sensitive equivalents.
Together, these steps form a comprehensive and effective security strategy
for data warehouses.
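The difference between masking and tokenization in the last step can be shown with a small sketch. Masking makes part of a value unreadable, while tokenization substitutes the whole value with a non-sensitive token that only a secured vault can map back. The function mask_card_number and the class TokenVault are hypothetical names used for illustration; a real vault would be an encrypted, access-controlled store rather than an in-memory dictionary.

```python
import secrets


def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Masking: hide all but the last `visible` digits."""
    digits = card_number.replace(" ", "").replace("-", "")
    if len(digits) <= visible:
        return "*" * len(digits)
    return "*" * (len(digits) - visible) + digits[-visible:]


class TokenVault:
    """Tokenization: in-memory stand-in for a secured token vault (assumption)."""

    def __init__(self) -> None:
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Return a random token for the value, reusing the token if already issued."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; only authorized services should call this."""
        return self._token_to_value[token]


print(mask_card_number("4111 1111 1111 1111"))  # ************1111

vault = TokenVault()
token = vault.tokenize("4111 1111 1111 1111")
print(token != "4111 1111 1111 1111")  # the warehouse stores the token, not the card number
```

Masked values cannot be reversed, which suits reports and test data. Tokens can be reversed through the vault, which suits cases where an authorized process still needs the original value.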
ILLUSTRATION 3.1
The Core Characteristics of Big Data:
ILLUSTRATION 3.2
Big data is typically characterized by the 3 Vs, which help define the
magnitude of the data challenges and opportunities:
1. Volume: This refers to the sheer scale of data being produced. In the
digital age, data is being generated in unprecedented amounts, from
sources such as online transactions, social media, sensors in devices
(IoT), and more. For example, companies like Google or Facebook
manage data at the scale of petabytes (1 petabyte = 1 million gigabytes)
or even exabytes, which would overwhelm traditional data storage
systems.
2. Velocity: The speed at which data is being generated and needs to be
processed is another crucial element. With the advent of real-time data
streams from devices, transactions, and sensors, businesses need to
analyze and act on data almost instantaneously to gain actionable
insights. For instance, financial markets rely on high-frequency trading
systems that process millions of data points in real time.
ILLUSTRATION 3.3
ILLUSTRATION 3.4
When harnessed properly, big data can lead to several major benefits:
1. Improved Decision-Making: By analyzing large volumes of data,
organizations can identify patterns, trends, and correlations that lead to
more informed and data-driven decisions. For example, retail companies
can forecast demand with greater accuracy, and manufacturers can
predict maintenance needs to reduce downtime.
ILLUSTRATION 3.5
While big data offers tremendous opportunities, it also comes with several
challenges:
Big data is reshaping the business world by providing deeper insights, driving
innovation, and improving operational efficiency. To successfully leverage
big data, organizations need a robust strategy that includes data integration,
management, and analysis, along with the necessary tools and talent. While
there are significant challenges to overcome—such as data quality, security,
and infrastructure—those who effectively manage big data will be better
positioned to make informed decisions, foster growth, and remain
competitive in an increasingly data-driven world.
LESSON 4: DATA WAREHOUSING RETOOLING
Data warehousing is a critical aspect of business intelligence, allowing
companies to store and analyze large amounts of data from various sources
to make informed decisions. As technology evolves, companies are
increasingly looking to retool their data warehousing systems to keep pace
with the demands of the modern world.
Why Retool?
Scalability and Performance: Traditional data warehouses may struggle
to handle the increasing volume and velocity of data generated by modern
businesses. Retooling can address this by adopting cloud-based solutions
or optimizing existing infrastructure for better scalability and performance.
Benefits of Retooling:
▪ Improved Data Insights: Modern data warehouses can deliver richer
and more actionable insights, enabling businesses to make better
decisions.
Examples of Retooling:
❖ Retail Companies: Retailers are using cloud-based data warehouses
to analyze customer behavior, optimize pricing strategies, and
personalize marketing campaigns.
ILLUSTRATION 4.1
ILLUSTRATION 4.2
3. Data Migration and Integration:
Retooling involves migrating data from legacy systems to a new platform.
This requires careful planning and execution to ensure data integrity and
seamless integration with other systems.
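One simple way to check data integrity after a migration is to compare row counts and a content fingerprint between the legacy table and the migrated table. The sketch below assumes both tables can be read as lists of row tuples; the function names table_fingerprint and verify_migration are hypothetical, and hashing the text form of each row is only one possible approach.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, digest) for row tuples, independent of row order."""
    row_hashes = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()
        for row in rows
    )
    combined = hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()
    return len(row_hashes), combined


def verify_migration(source_rows, target_rows):
    """Compare the legacy table with the migrated table."""
    src_count, src_digest = table_fingerprint(source_rows)
    tgt_count, tgt_digest = table_fingerprint(target_rows)
    return {
        "row_count_match": src_count == tgt_count,
        "content_match": src_digest == tgt_digest,
    }


legacy = [(1, "Ana", 120.5), (2, "Ben", 80.0)]
migrated = [(2, "Ben", 80.0), (1, "Ana", 120.5)]  # same rows, different order
print(verify_migration(legacy, migrated))
```

Because the fingerprint is built from the text form of each value, a type change during migration (for example 80.0 becoming 80) is reported as a mismatch. That is often what is wanted, but values may need to be normalized first if the new platform uses different data types on purpose.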
ILLUSTRATION 4.3
ILLUSTRATION 4.4
ILLUSTRATION 4.5
LESSON 5: DATA WAREHOUSING TOOLS
7. Microsoft Azure also offers data warehousing capabilities. If you have data
stored in Azure Blob storage or in a data lake, you can introduce analytical
capabilities using Azure Synapse, or with Azure HDInsight. If you want to
move data from the source to the data warehouse, you can do so using
Azure Data Factory or Oozie on Azure HDInsight.
Warehouse Storage
Software products are also needed to store warehouse data and their
accompanying metadata. Relational database management systems are
well-suited to large and growing warehouses.
ILLUSTRATION 6.1
ILLUSTRATION 6.2
ILLUSTRATION 6.3
3. SCALABILITY - As data grows exponentially, traditional data
warehouses can struggle to maintain performance or handle larger
datasets efficiently.
SOLUTION: Set clear data retention policies, archive old data into
lower-cost storage, and use tiered storage solutions to balance
cost and accessibility.
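A retention policy with tiered storage can be expressed as a rule that maps the age of a record to a storage tier. The sketch below is a minimal illustration; the thresholds (90 days, 1 year, 7 years) and the tier names are assumptions and would come from the organization's own policy.

```python
from datetime import date

# Assumed policy thresholds, in days (illustrative only).
HOT_DAYS = 90              # recent data stays on fast storage
WARM_DAYS = 365            # older data moves to cheaper storage
RETENTION_DAYS = 7 * 365   # data past retention is eligible for purging


def storage_tier(record_date: date, today: date) -> str:
    """Return the storage tier for a record based on its age."""
    age_days = (today - record_date).days
    if age_days > RETENTION_DAYS:
        return "purge"
    if age_days > WARM_DAYS:
        return "archive"
    if age_days > HOT_DAYS:
        return "warm"
    return "hot"


today = date(2025, 1, 1)
for d in (date(2024, 12, 1), date(2024, 6, 1), date(2022, 1, 1), date(2015, 1, 1)):
    print(d, storage_tier(d, today))
```

A scheduled job could apply this rule to each partition of a fact table and move or delete the partition accordingly, so that only recent, frequently queried data stays on the most expensive storage.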