Chapter 7

HM411: Data Analytics in Hotel Industry

DATA EXPLORATION PROCESS


Data exploration is the first step in the journey of extracting insights from raw
datasets. It serves as the compass that guides data scientists through the vast sea of
information. It involves getting to know the data intimately, understanding its structure,
and uncovering valuable nuggets that lie hidden beneath the surface.

Data exploration - is the initial step in data analysis where you dive into a dataset to
get a feel for what it contains. It's like detective work for your data, where you uncover
its characteristics, patterns, and potential problems.

Why is it Important?
Data exploration plays a crucial role in data analysis because it helps you
uncover hidden gems within your data. Through this initial investigation, you can start to
identify:
• Patterns and Trends: Are there recurring themes or relationships between different
data points?
• Anomalies: Are there any data points that fall outside the expected range,
potentially indicating errors or outliers?

How Data Exploration Works (The Process)


1. Data Collection - This phase emphasizes recognizing data formats, structures, and
interrelationships. Comprehensive data profiling is conducted to grasp fundamental
statistics, distributions, and ranges of the acquired data.

2. Data Cleaning - This step involves employing methodologies like standardizing
data formats, identifying outliers, and imputing missing values. Data organization and
transformation further streamline data for analysis and interpretation.

3. Exploratory Data Analysis (EDA) - This phase involves the application of various
statistical tools such as box plots, scatter plots, histograms, and distribution plots.
Additionally, correlation matrices and descriptive statistics are utilized to uncover links,
patterns, and trends within the data.

4. Feature Engineering - This phase focuses on enhancing prediction models by
introducing or modifying features. Techniques like data normalization, scaling,
encoding, and creating new variables are applied. This step ensures that the features
given to a model are relevant to the problem being studied.

5. Model Building and Validation - During this stage, preliminary models are
developed to test hypotheses or predictions. Regression, classification, or clustering
techniques are employed based on the problem at hand. Cross-validation methods are
used to assess model performance and generalization.
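
The profiling, cleaning, and normalization steps above can be sketched in a few lines of
Python. This is only a minimal illustration using the standard library: the nightly room
rates are hypothetical values invented for the example, and median imputation, the
1.5 x IQR outlier rule, and min-max scaling are assumed choices, not methods prescribed
by this handout.

```python
from statistics import mean, median, quantiles

# Hypothetical nightly room rates (USD); None marks a missing value.
rates = [120.0, 135.0, None, 128.0, 950.0, 110.0, 142.0, None, 125.0]

# Step 1 (profiling): basic counts and ranges of the observed values.
observed = [r for r in rates if r is not None]
profile = {
    "rows": len(rates),
    "missing": len(rates) - len(observed),
    "min": min(observed),
    "max": max(observed),
    "mean": round(mean(observed), 2),
}

# Step 2 (cleaning): impute missing values with the median of the observed values.
fill = median(observed)
cleaned = [fill if r is None else r for r in rates]

# Step 2 (cleaning): flag outliers with the 1.5 * IQR rule (an assumed convention).
q1, _, q3 = quantiles(cleaned, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [r for r in cleaned if r < low or r > high]

# Step 4 (feature engineering): min-max scale the remaining rates to the range [0, 1].
kept = [r for r in cleaned if low <= r <= high]
smallest, largest = min(kept), max(kept)
scaled = [(r - smallest) / (largest - smallest) for r in kept]

print(profile)
print("outliers:", outliers)
print("scaled:", [round(s, 2) for s in scaled])
```

In practice a flagged outlier should be checked before it is dropped: a very high rate
could be a data-entry error or a genuine premium booking.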

Data visualization - is the graphical representation of information and data. By using
visual elements like charts, graphs, and maps, data visualization tools provide an
accessible way to see and understand trends, outliers, and patterns in data. Additionally,
it provides an excellent way for employees or business owners to present data to
non-technical audiences without confusion.

DATA SUMMARIZATION, VISUALIZATION AND NORMALIZATION


Data summarization is the science and art of conveying information more
effectively and efficiently. Data summarization is typically numerical, visual or a
combination of the two. It is a key skill in data analysis - we use it to provide insights
both to others and to ourselves.
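
As a small illustration of numerical summarization, the sketch below condenses a list of
individual bookings into one summary row per room type. The bookings, room types, and
rates are hypothetical values made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical bookings: (room type, nightly rate in USD).
bookings = [
    ("Standard", 110.0), ("Deluxe", 180.0), ("Standard", 120.0),
    ("Suite", 320.0), ("Deluxe", 200.0), ("Standard", 130.0),
]

# Group the rates by room type.
by_type = defaultdict(list)
for room_type, rate in bookings:
    by_type[room_type].append(rate)

# Numerical summary: one row per room type instead of one row per booking.
summary = {
    room_type: {"bookings": len(type_rates), "mean_rate": mean(type_rates)}
    for room_type, type_rates in by_type.items()
}

for room_type, stats in sorted(summary.items()):
    print(f"{room_type:<10}{stats['bookings']:>3}{stats['mean_rate']:>10.2f}")
```

Six rows of raw data become three rows that are easier to read and compare, which is the
point of summarization.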

Building Blocks of Data Analysis

• Data storage - a vital component of real-time analytics architecture, data storage
should be able to handle large amounts of data quickly and scale.

• Data governance - a successful data analytics strategy is built on data
governance, which ensures that data is protected, managed, and used effectively.

• Data models - a foundational element of analytics and software development,
data models provide a standardized method for formatting and defining database
contents across systems.

• Reference data - data used to characterize or relate to other data, such as code
lists and authority tables, are fundamental building blocks of most information
systems.

• Statistics - a building block of data science, statistics helps explain data and is
essential for understanding and interpreting it.

• Data transformations - data transformation supports better data-driven
decisions by extracting data and converting its varied types and structures into a
consistent format that is ready for analysis.

• Business continuity - maintaining business continuity is a building block for a
successful data program. It's important to understand potential data gaps and how
to mitigate their risks.

• Visual analytics - visual analytics is a critical building block of healthcare
analytics, and refers to the science of analytical reasoning facilitated by interactive
visual interfaces.

• Investment and ROI - using AI to innovate and leverage data can significantly
impact the ROI of implementing a data strategy.

Graphs and Tables for Summarizing and Organizing Data


Tables and graphs are tools used to organize and summarize data, and to
communicate research findings. Tables are used to group data together, while graphs
are visual representations that help people make inferences about the data.

Types of graphs and tables that can be used to organize and summarize data:
1. Tables - can be used to organize complex data into a format that's easy to
understand. Tables are ideal for presenting numerical comparisons or categorical
information.

2. Bar graphs - can be used to compare the relative size of categories.

3. Pie charts - can be used to show how a whole is divided into parts, with each slice
representing one category's share of the total.

4. Stem-and-leaf plots - can be used to identify the center of data, or where most
data values are located.

5. Frequency distributions - can be used to show how often each value, or range of
values, occurs in the data.
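
Two of the summaries listed above, the frequency distribution and the stem-and-leaf plot,
can be built directly from raw values. The sketch below is a minimal text-only
illustration; the lengths of stay and guest ages are hypothetical values chosen for the
example.

```python
from collections import Counter, defaultdict

# Hypothetical lengths of stay (nights) for 12 bookings.
nights = [1, 2, 2, 3, 1, 2, 4, 2, 3, 1, 2, 7]

# Frequency distribution: how many bookings fall on each value.
freq = Counter(nights)
for value in sorted(freq):
    print(f"{value} night(s): {'#' * freq[value]} ({freq[value]})")

# Hypothetical guest ages for a stem-and-leaf plot (stem = tens digit, leaf = units digit).
ages = [23, 27, 31, 34, 35, 38, 42, 45, 45, 51, 58, 63]
stems = defaultdict(list)
for age in sorted(ages):
    stems[age // 10].append(age % 10)

for stem in sorted(stems):
    print(f"{stem} | {' '.join(str(leaf) for leaf in stems[stem])}")
```

The row of `#` marks works as a simple text bar graph, and the longest stem row shows
where most of the ages are located.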

Summarization and Visualization of Bivariate Relationships

Bivariate Analysis - is the study of the relationship between two variables, and it can
help researchers identify patterns or trends that may not be obvious when examining
each variable separately. The analysis can also suggest whether one variable may
influence the other, although an association alone does not prove that one variable
causes the other.

Bivariate relationships can be summarized and visualized in a variety of ways,
including:

• Scatter plots - a visual representation of the relationship between two variables,
where data points are plotted to show if there is a positive, negative, or no
relationship.

• Violin plots - a comparison of a qualitative variable with a quantitative variable,
where a kernel density estimate (KDE) is displayed for each category.

• Thematic maps - a visualization of variations in values of a variable across
geographical space, where the variable is encoded as colors, sizes, shapes, or
symbols.

• Local Bivariate Relationships tool - a tool in ArcGIS Pro that quantifies the
relationship between two variables on a map. It calculates an entropy statistic to
determine if the values of one variable are dependent on the other.
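
The direction a scatter plot shows (positive, negative, or none) can also be summarized
as a single number. The sketch below computes the Pearson correlation coefficient, which
is one common choice and is not named in this handout. The occupancy and rate figures are
hypothetical, and the 0.3 cut-off used to label the direction is an arbitrary
illustrative threshold.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (spread_x * spread_y)


# Hypothetical monthly figures: occupancy rate (%) and average daily rate (USD).
occupancy = [55, 60, 68, 72, 80, 85]
avg_rate = [98, 104, 110, 118, 127, 133]

r = pearson_r(occupancy, avg_rate)
# The 0.3 threshold is an illustrative assumption, not a standard rule.
direction = "positive" if r > 0.3 else "negative" if r < -0.3 else "weak or none"
print(f"r = {r:.3f} ({direction} relationship)")
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0
indicates little or no linear relationship. A strong correlation describes association
only and does not show that one variable causes the other.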
