Chapter 7
Data exploration - is the initial step in data analysis where you dive into a dataset to
get a feel for what it contains. It's like detective work for your data, where you uncover
its characteristics, patterns, and potential problems.
Why is it Important?
Data exploration plays a crucial role in data analysis because it helps you
uncover hidden gems within your data. Through this initial investigation, you can start to
identify:
Patterns and Trends: Are there recurring themes or relationships between different
data points?
Anomalies: Are there any data points that fall outside the expected range,
potentially indicating errors or outliers?
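One common way to flag data points that fall outside the expected range is the interquartile range (IQR) rule. The sketch below is only an illustration: the function name `iqr_outliers`, the multiplier of 1.5, and the nightly rates are assumptions, not part of the course material.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical nightly room rates; 950 is a deliberately planted anomaly.
nightly_rates = [120, 125, 130, 118, 122, 127, 950, 124, 121, 129]
print(iqr_outliers(nightly_rates))
```

A flagged value is only a candidate for review: it may be a data-entry error, or a genuine but unusual booking.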
3. Exploratory Data Analysis (EDA) - This phase involves the application of various
statistical tools such as box plots, scatter plots, histograms, and distribution plots.
Additionally, correlation matrices and descriptive statistics are utilized to uncover links,
patterns, and trends within the data.
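As a minimal sketch of the descriptive statistics and histogram counts mentioned above, the following uses only the Python standard library. The helper names `describe` and `histogram_counts`, the bin width, and the length-of-stay figures are hypothetical and chosen purely for illustration.

```python
import statistics
from collections import Counter

def describe(values):
    """Summarize a numeric column with common descriptive statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def histogram_counts(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical lengths of stay, in nights.
stays = [1, 2, 2, 3, 3, 3, 4, 5, 7, 10]
summary = describe(stays)
bins = histogram_counts(stays, bin_width=2)

# Print a rough text histogram, one row per non-empty bin.
for start, n in bins.items():
    print(f"{start:>2}-{start + 1:<2} {'#' * n}")
```

In this made-up sample the mean is higher than the median, which hints at a right-skewed distribution with a few long stays.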
5. Model Building and Validation - During this stage, preliminary models are
developed to test hypotheses or predictions. Regression, classification, or clustering
techniques are employed based on the problem at hand. Cross-validation methods are
used to assess model performance and generalization.
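Cross-validation can be sketched as follows: the data are split into k folds, a model is fitted on all folds but one, and the error is measured on the fold left out. This example assumes a simple one-variable linear regression as the preliminary model; the names `fit_line` and `k_fold_mse` and the sample figures are hypothetical, and the sketch assumes at least k observations.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def k_fold_mse(xs, ys, k=5):
    """Average mean squared error on the held-out fold, over k folds."""
    n = len(xs)
    pairs = list(zip(xs, ys))
    fold_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        train = [p for i, p in enumerate(pairs) if i not in test_idx]
        test = [p for i, p in enumerate(pairs) if i in test_idx]
        slope, intercept = fit_line(*zip(*train))
        errors = [(y - (slope * x + intercept)) ** 2 for x, y in test]
        fold_errors.append(statistics.mean(errors))
    return statistics.mean(fold_errors)

# Hypothetical data: advertising spend (x) against weekly bookings (y).
spend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
bookings = [12, 15, 15, 19, 22, 22, 26, 27, 31, 32]
print(round(k_fold_mse(spend, bookings, k=5), 3))
```

Because each error is computed on data the model did not see during fitting, the average gives a rough indication of how well the model generalizes.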
HM411: Data Analytics in Hotel Industry
Data visualization - is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible
way to see and understand trends, outliers, and patterns in data. Additionally, it
provides an excellent way for employees or business owners to present data to non-
technical audiences without confusion.
Reference data - data used to characterize or relate to other data, such as code
lists and authority tables. Reference data are fundamental building blocks of most
information systems.
Statistics - a building block of data science, statistics helps explain data and is
essential for understanding and interpreting it.
Investment and ROI - using AI to innovate and leverage data can significantly
impact the ROI of implementing a data strategy.
Types of graphs and tables that can be used to organize and summarize data:
1. Tables - can be used to organize complex data into a format that's easy to
understand. Tables are ideal for presenting numerical comparisons or categorical
information.
4. Stem-and-leaf plots - can be used to identify the center of data, or where most
data values are located.
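A stem-and-leaf plot can be built by splitting each value into a stem (the tens and above) and a leaf (the units digit). The sketch below assumes non-negative whole numbers; the function name `stem_and_leaf` and the guest ages are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (tens and above) and leaf (units digit)."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    lo, hi = min(rows), max(rows)
    # Include empty stems so that gaps in the data remain visible.
    return [(stem, rows.get(stem, [])) for stem in range(lo, hi + 1)]

# Hypothetical guest ages.
guest_ages = [23, 27, 31, 34, 35, 38, 41, 42, 42, 47, 65]
for stem, leaves in stem_and_leaf(guest_ages):
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

The rows with the most leaves show where most values are located; in this made-up sample, that is the 30s and 40s.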
Bivariate Analysis - is the study of the relationship between two variables, and it can
help researchers identify patterns or trends that may not be obvious when examining
each variable separately. The analysis can also indicate whether one variable may influence
the other, although an association alone does not establish causation.
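One simple measure of the relationship between two numeric variables is the Pearson correlation coefficient, which ranges from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship). The sketch below is an illustration only; the function name `pearson_r` and the occupancy and rate figures are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monthly occupancy (%) and average daily rate.
occupancy = [55, 60, 62, 70, 75, 80, 85, 90]
daily_rate = [98, 101, 100, 108, 112, 115, 121, 124]
r = pearson_r(occupancy, daily_rate)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 indicates a strong linear association, but it does not by itself show which variable, if either, drives the other.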
Local Bivariate Relationships tool - a tool in ArcGIS Pro that quantifies the
relationship between two variables on a map. It calculates an entropy statistic to
determine if the values of one variable are dependent on the other.