Data Analytics
1.1 Data Analytics and Data Science: Introduction, Characteristics, and Need
Introduction:
• Data Analytics refers to the process of examining datasets to draw conclusions about the
information they contain. It involves various techniques and tools to analyze raw data for
insights.
• Data Science is a broader field that uses scientific methods, processes, algorithms, and systems
to extract knowledge and insights from structured and unstructured data.
Characteristics:
• Data Analytics focuses on specific queries and provides direct insights to support decision-
making. It is often used in business intelligence.
• Data Science involves a more complex process that includes predictive modeling, machine
learning, and data engineering. It aims to discover new questions that data can answer.
Need:
• Businesses require Data Analytics to make informed decisions, improve operational efficiency,
and gain a competitive edge.
• Data Science is essential for discovering patterns and trends in large datasets, leading to new
innovations, product development, and strategic planning.
1.2 Types of Data
• Nominal: Categories without any inherent order (e.g., gender, color). It’s used for labeling variables without any quantitative value.
• Ordinal: Categories with a meaningful order but unequal or unknown spacing between them (e.g., satisfaction ratings: low, medium, high). Values can be ranked, but arithmetic on them isn’t meaningful.
• Interval: Numerical data with meaningful intervals between values but no true zero (e.g., temperature in Celsius). You can add and subtract values, but multiplying or dividing them isn’t meaningful.
• Ratio: Similar to interval data but with a true zero point (e.g., height, weight). It allows for the computation of ratios.
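The levels of measurement differ mainly in which operations are valid on the data. A small Python sketch (all values are made-up examples) illustrates this:

```python
from collections import Counter

# Nominal: labels only -- counting category frequencies is the main valid operation.
colors = ["red", "blue", "red", "green"]
color_counts = Counter(colors)           # red: 2, blue: 1, green: 1

# Ordinal: ordered labels -- comparisons are valid, arithmetic is not.
order = {"low": 0, "medium": 1, "high": 2}
assert order["high"] > order["low"]      # ranking is meaningful

# Interval: differences are meaningful, but there is no true zero.
temps_c = [20.0, 25.0, 30.0]
diff = temps_c[1] - temps_c[0]           # a 5-degree difference is valid
# 30 C is NOT "1.5x as hot" as 20 C: ratios are not meaningful here.

# Ratio: a true zero exists, so ratios make sense.
weights_kg = [50.0, 100.0]
ratio = weights_kg[1] / weights_kg[0]    # 2.0 -- "twice as heavy" is valid
print(color_counts, diff, ratio)
```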
1.3 Data Analytics Life Cycle
1. Discovery: Identify business problems and objectives. Understand the data requirements and
define the project’s scope.
2. Data Preparation: Clean and preprocess data, handle missing values, and transform data to be
ready for analysis.
3. Model Planning: Select algorithms and techniques to model the data. Prepare for the modeling
phase.
4. Model Building: Develop models based on the chosen methods. Fine-tune and validate models.
5. Communicate Results: Interpret the model outcomes and present findings to stakeholders using
visualizations and reports.
6. Operationalize: Deploy the model in a production environment. Ensure the model is accessible
and usable by end-users.
• Data Analytics Principles: Across all six phases, focus on accuracy, consistency, validity, and reliability to ensure the quality of insights.
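The preparation, modeling, and reporting phases above can be sketched with a toy example (illustrative only; the data and the simple least-squares fit are stand-ins for real tooling):

```python
from statistics import mean

# 2. Data Preparation: drop records with missing values.
raw = [(1, 2.1), (2, 3.9), (3, None), (4, 8.2), (5, 9.9)]
clean = [(x, y) for x, y in raw if y is not None]

# 3-4. Model Planning/Building: fit y = a*x + b by ordinary least squares.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
x_bar, y_bar = mean(xs), mean(ys)
a = sum((x - x_bar) * (y - y_bar) for x, y in clean) / sum((x - x_bar) ** 2 for x in xs)
b = y_bar - a * x_bar

# 5. Communicate Results: report the fitted slope and intercept.
print(f"slope={a:.2f}, intercept={b:.2f}")
```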
Applications:
• Healthcare: Predictive analytics for patient outcomes, improving clinical decision-making, and optimizing operations.
Data Acquisition
• Data Acquisition is crucial for gathering raw data necessary for analysis. Without accurate and relevant data, the analytics process is ineffective.
• Web Scraping is used to extract large amounts of data from websites for further analysis, which
is especially useful for research, competitive analysis, and price monitoring.
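The scraping idea can be sketched with only the standard library's HTML parser (real projects typically use libraries such as requests and BeautifulSoup, and must respect a site's robots.txt and terms of service; the HTML below is a made-up stand-in for a fetched page):

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Collects the text of every <span class="price"> element."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

# Stand-in for a downloaded page (a real scraper would fetch this over HTTP).
html = ('<ul><li><span class="price">$9.99</span></li>'
        '<li><span class="price">$14.50</span></li></ul>')

parser = PriceParser()
parser.feed(html)
print(parser.prices)  # ['$9.99', '$14.50']
```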
Sources:
• Primary Data Sources: Data collected first-hand for a specific purpose, such as surveys, experiments, or interviews. It is specific and current but more costly and time-consuming to obtain.
• Secondary Data Sources: Data collected from existing sources such as reports, books, or databases. It is cost-effective and easily accessible but may not be as specific.
Repositories: Data repositories include databases, data lakes, and cloud storage, where vast amounts of
data are stored and accessed for analytics.
Approaches: Approaches to data acquisition can range from automated scripts for web scraping to
manual collection methods like surveys and interviews.
Techniques:
• Web/Data Scraping: Automated extraction of data from websites using scripts, bots, or tools.
• Biometric Techniques: Collecting data from biological attributes like fingerprints or facial recognition.
• Sensing: Collecting data through sensors, like IoT devices in smart cities or wearables.
• Report Mining: Extracting data from reports or documents using text mining techniques.
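A minimal sketch of report mining: pulling numeric figures out of unstructured report text with regular expressions (the report text is invented; real systems use fuller text-mining pipelines):

```python
import re

# Toy report text standing in for a scanned or exported document.
report = (
    "Q1 revenue was $1,200,000 and Q2 revenue was $1,450,000. "
    "Headcount grew to 85 employees."
)

# Find dollar amounts like $1,200,000.
dollar_amounts = re.findall(r"\$[\d,]+", report)

# Convert to integers by stripping '$' and ','.
values = [int(m.replace("$", "").replace(",", "")) for m in dollar_amounts]
print(values)  # [1200000, 1450000]
```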
Data Transformation
• Data Transformation is crucial for converting raw data into a format that can be easily analyzed.
It involves cleaning, normalizing, and structuring data.
Impacts:
• Transformed data leads to more accurate analysis, better insights, and improved decision-
making. It also enhances the performance of data models.
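Two common normalization steps used during transformation can be sketched with the standard library (the values are made-up examples):

```python
from statistics import mean, pstdev

values = [10.0, 20.0, 30.0, 40.0]

# Min-max scaling maps the data onto [0, 1].
lo, hi = min(values), max(values)
minmax = [(v - lo) / (hi - lo) for v in values]   # [0.0, 0.333..., 0.666..., 1.0]

# Z-score standardization gives zero mean and unit (population) std dev.
mu, sigma = mean(values), pstdev(values)
zscores = [(v - mu) / sigma for v in values]
print(minmax, zscores)
```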
Techniques:
• Imputations: Replacing missing values with substitutes like the mean, median, mode, or using more sophisticated methods like regression or K-nearest neighbors (KNN).
• Restructuring Data: Adjusting the data's structure to align with the requirements of the
analytical process, such as aggregating, filtering, or splitting data sets.
• Feature Extraction: Reducing the dimensionality of the data by identifying and selecting the
most important features, using methods like Principal Component Analysis (PCA) or Linear
Discriminant Analysis (LDA).
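Mean and median imputation can be sketched in a few lines (the `impute` helper and the data are illustrative; KNN or regression imputation would need a dedicated library such as scikit-learn):

```python
from statistics import mean, median

def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

# The outlier (100.0) pulls the mean up; the median is more robust.
data = [2.0, None, 6.0, 10.0, None, 100.0]
print(impute(data, "mean"))    # [2.0, 29.5, 6.0, 10.0, 29.5, 100.0]
print(impute(data, "median"))  # [2.0, 8.0, 6.0, 10.0, 8.0, 100.0]
```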