1.data Analytics Overview and Variables Disruptive System
Data Analytics
Disruptive Technology Innovation
Mr. Arindam Ghosh
Overview of Data Analytics
Data analytics is the process of systematically examining raw data, often in large volumes, to
uncover patterns, correlations, and trends that inform decision-making. It combines tools,
techniques, and algorithms to transform raw data into actionable information. The primary goal
is to support data-driven decisions that improve efficiency, productivity, and innovation. Data
analytics can be applied across industries to optimize operations, improve customer experiences,
and predict future trends.
There are four primary types of data analytics, each serving a specific purpose in the analysis
process:
1. Descriptive Analytics
o Purpose: To understand and summarize past events or data.
o Function: It focuses on answering the question, "What happened?" by analyzing
historical data to identify trends, patterns, or performance metrics.
o Example: A sales report showing how much revenue was generated last quarter.
2. Diagnostic Analytics
o Purpose: To explain why something happened.
o Function: This type of analysis digs deeper into data to determine the cause of a
specific outcome. It answers the question, "Why did it happen?" by identifying
relationships and correlations within the data.
o Example: Analyzing why a specific product’s sales dropped by examining factors
like seasonality, competitor actions, or changes in customer preferences.
3. Predictive Analytics
o Purpose: To forecast future outcomes based on historical data.
o Function: It uses statistical models, algorithms, and machine learning techniques
to predict what is likely to happen in the future. Predictive analytics answers the
question, "What might happen?"
o Example: Predicting future demand for a product based on past sales trends and
current market conditions.
4. Prescriptive Analytics
o Purpose: To suggest the best course of action for achieving a desired outcome.
o Function: This type of analysis goes beyond predicting future events and
provides recommendations on how to take advantage of potential opportunities or
mitigate risks. It answers the question, "What should we do?"
o Example: Recommending optimal pricing strategies or inventory management
decisions to maximize profit or minimize costs.
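The four types above can be sketched on a toy dataset. This is a minimal illustration only: the revenue and advertising figures, the function names, and the demand curve used in the pricing rule are all hypothetical assumptions, and the naive forecast stands in for the statistical or machine learning models a real predictive analysis would use.

```python
# Hypothetical quarterly figures (illustrative only, not real data).
revenue = {"Q1": 120.0, "Q2": 135.0, "Q3": 110.0, "Q4": 150.0}
ad_spend = {"Q1": 10.0, "Q2": 12.0, "Q3": 8.0, "Q4": 14.0}


def descriptive(values):
    """What happened? Summarize past revenue."""
    data = list(values.values())
    return {"total": sum(data), "average": sum(data) / len(data)}


def diagnostic(xs, ys):
    """Why did it happen? Pearson correlation between two series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def predictive(history):
    """What might happen? Naive forecast: last value plus the average change."""
    changes = [later - earlier for earlier, later in zip(history, history[1:])]
    return history[-1] + sum(changes) / len(changes)


def prescriptive(candidate_prices, unit_cost=10.0):
    """What should we do? Pick the candidate price with the highest profit."""
    def profit(price):
        # Assumed linear demand curve, purely for illustration.
        demand = max(0.0, 100.0 - 2.0 * price)
        return (price - unit_cost) * demand
    return max(candidate_prices, key=profit)


print(descriptive(revenue))
print(diagnostic(list(ad_spend.values()), list(revenue.values())))
print(predictive(list(revenue.values())))
print(prescriptive([20.0, 30.0, 40.0]))
```

A strong correlation in the diagnostic step only suggests a relationship between advertising spend and revenue; it does not by itself establish the cause.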
Data analysis involves various tools and techniques that help transform raw data into meaningful
insights. These tools and techniques are selected based on the complexity of the data, the type of
analysis required, and the objectives of the analysis. Below is an overview of commonly used
tools and techniques.
1. Spreadsheet Tools
o Microsoft Excel: A widely used tool for basic data analysis, offering
functionalities like pivot tables, charts, and built-in statistical formulas.
o Google Sheets: Similar to Excel, with cloud-based collaboration features. It
allows basic data manipulation, formula application, and data visualization.
2. Programming Languages
o Python: A powerful, flexible programming language popular for data
manipulation and analysis. Libraries like Pandas, NumPy, and Matplotlib
simplify data handling, statistical analysis, and visualization.
o R: A language specifically designed for statistical analysis and visualization. It’s
widely used for complex data modeling, predictive analysis, and creating detailed
visual reports.
3. Business Intelligence (BI) Tools
o Tableau: A visualization tool that transforms data into interactive dashboards and
reports. It helps businesses make decisions by presenting insights visually.
o Power BI: A Microsoft tool for creating detailed dashboards and reports. It
integrates well with Excel and other Microsoft tools, making it easy to use for
business analysis.
o QlikView: A business intelligence and data visualization tool used for creating
interactive dashboards and data visualizations for data discovery.
4. Statistical Tools
o SPSS (Statistical Package for the Social Sciences): A tool commonly used for
statistical analysis in social sciences, market research, and healthcare. It provides
functionalities for descriptive statistics, regression, and hypothesis testing.
o SAS (Statistical Analysis System): A comprehensive tool for advanced analytics,
multivariate analysis, and data management, often used in large organizations for
data warehousing and predictive analytics.
5. Big Data Tools
o Apache Hadoop: A framework used to store and process large datasets in
distributed computing environments. It is useful for analyzing massive amounts of
structured and unstructured data.
o Apache Spark: A fast big data processing engine that can handle both batch and
real-time data processing. It’s used for machine learning, data streaming, and
graph analytics.
6. Machine Learning Tools
Alongside these tools, the following techniques are commonly used in data analysis:
1. Data Cleaning
o Purpose: To remove errors, inconsistencies, and missing values from the dataset,
ensuring the data is accurate and ready for analysis.
o Techniques: Handling missing data, removing duplicates, correcting outliers, and
transforming data into a consistent format.
2. Exploratory Data Analysis (EDA)
o Purpose: To summarize the main characteristics of the data using statistical
graphics and visualization methods, allowing the analyst to uncover patterns, spot
anomalies, and formulate hypotheses.
o Techniques: Descriptive statistics (mean, median, standard deviation),
visualizations (histograms, scatter plots), and correlation analysis.
3. Data Visualization
o Purpose: To visually represent data in charts, graphs, and dashboards to help
users understand patterns, relationships, and trends.
o Techniques: Line charts, bar charts, pie charts, scatter plots, heatmaps, and
dashboards for presenting data insights.
4. Statistical Analysis
o Purpose: To apply statistical methods to analyze data, identify relationships, and
test hypotheses.
o Techniques:
Descriptive statistics (mean, median, mode, standard deviation)
Inferential statistics (hypothesis testing, regression analysis, ANOVA)
Correlation and covariance analysis to understand relationships between
variables.
5. Regression Analysis
o Purpose: To determine the relationship between dependent and independent
variables, and to predict future values.
o Techniques:
Linear regression for continuous outcomes.
Logistic regression for binary outcomes (e.g., yes/no predictions).
6. Time Series Analysis
o Purpose: To analyze data points collected or recorded at specific time intervals,
helping predict future trends.
o Techniques:
ARIMA (AutoRegressive Integrated Moving Average) for forecasting.
Seasonal decomposition for identifying trends, seasonality, and residuals.
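Several of the techniques above can be sketched with Python's standard library alone. The records below are hypothetical, and the helper names (clean, describe, fit_line, moving_average) are illustrative assumptions rather than functions from any particular library. The moving average is a much simpler smoothing method shown in place of ARIMA or seasonal decomposition, which in practice would come from a dedicated statistics package.

```python
import statistics

# Hypothetical raw records: (advertising spend, units sold); None marks a missing value.
raw = [(1.0, 12.0), (2.0, 14.0), (2.0, 14.0), (3.0, None), (4.0, 18.0), (5.0, 20.0)]


def clean(records):
    """Data cleaning: drop rows with missing values and exact duplicates."""
    seen = set()
    cleaned = []
    for row in records:
        if None in row or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


def describe(values):
    """EDA: basic descriptive statistics for one numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def fit_line(xs, ys):
    """Linear regression: ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def moving_average(series, window):
    """Time series: simple moving average over a sliding window."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


rows = clean(raw)
xs = [x for x, _ in rows]
ys = [y for _, y in rows]
slope, intercept = fit_line(xs, ys)

print(rows)
print(describe(ys))
print(slope, intercept)
print(moving_average(ys, 2))
```

Cleaning runs first so that the missing value and the duplicate row do not distort the statistics or the fitted line.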
In data analysis, variables are the characteristics or properties of data that are measured,
observed, or recorded. Understanding the types of variables is crucial because different types of
variables require different methods of analysis. Variables can be broadly classified into two main
categories: Quantitative and Qualitative (or Categorical). Each of these has subtypes,
described below:
Quantitative variables represent measurable quantities and can be expressed as numbers. They
allow for mathematical operations such as addition, subtraction, and averaging.
Discrete Variables: These variables take specific, separate values. They are countable,
and the set of possible values is often finite.
o Example: Number of students in a class, number of cars in a parking lot, number
of goals scored in a game.
Continuous Variables: These variables can take an infinite number of values within a
given range. They are measurable and can include decimals or fractions.
o Example: Height of individuals, temperature, time taken to complete a task,
distance traveled.
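The difference between discrete and continuous variables affects how they are summarized. As a minimal sketch with hypothetical values and illustrative function names, discrete values can be tallied directly, while continuous values are usually grouped into intervals first:

```python
from collections import Counter

# Hypothetical sample data for illustration.
goals_per_game = [0, 2, 1, 2, 3, 1, 2]            # discrete: countable values
heights_cm = [158.2, 171.5, 165.0, 180.3, 169.9]  # continuous: any value in a range


def tally(values):
    """Count how often each distinct discrete value occurs."""
    return dict(Counter(values))


def bin_continuous(values, width):
    """Group continuous values into intervals of `width`, keyed by lower bound."""
    bins = Counter()
    for value in values:
        lower_bound = (value // width) * width
        bins[lower_bound] += 1
    return dict(sorted(bins.items()))


print(tally(goals_per_game))
print(bin_continuous(heights_cm, 10))
```

Tallying the heights directly would give one count per measurement, which is why the continuous column is binned before counting.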
Qualitative variables represent categories or characteristics that cannot be measured numerically
but can be classified or labeled. They describe qualities or attributes.
Nominal Variables: These are variables that categorize data without any inherent order
or ranking between the categories. The categories are mutually exclusive.
3. Binary Variables
Binary variables are a specific type of categorical variable that can take only two possible values,
typically representing the presence or absence of a particular attribute.
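Because many analysis methods expect numbers, categorical values are often converted to binary indicators before analysis. The following is a minimal sketch under that assumption; the function names and sample values are hypothetical.

```python
def to_binary(values, positive):
    """Encode a two-valued variable as 1 (attribute present) or 0 (absent)."""
    return [1 if value == positive else 0 for value in values]


def one_hot(values):
    """Encode a nominal variable as one binary indicator per category."""
    categories = sorted(set(values))
    return [
        {category: int(value == category) for category in categories}
        for value in values
    ]


# Hypothetical survey answers and product colours.
subscribed = ["yes", "no", "yes"]
colours = ["red", "blue", "red"]

print(to_binary(subscribed, positive="yes"))
print(one_hot(colours))
```

Each row of the one-hot result contains exactly one 1, which reflects the fact that nominal categories are mutually exclusive.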