Data Analytics
Descriptive Analytics:
• Descriptive analytics is a simple, surface-level type of analysis that looks at what has happened in the past.
• The two main techniques used in descriptive analytics are data aggregation and data mining: the data analyst first gathers the data and presents it in a summarized format (that’s the aggregation part) and then “mines” the data to discover patterns.
• The data is then presented in a way that can be easily understood by a wide audience (not just data experts).
• It’s important to note that descriptive analytics doesn’t try to explain the historical data or establish cause-and-effect relationships.
• At this stage, it’s simply a case of determining and describing the “what happened”.
Diagnostic Analytics:
• While descriptive analytics looks at the “what”, diagnostic analytics explores the “why”.
• When running diagnostic analytics, data analysts will first seek to identify anomalies within the data, that is, anything that cannot be explained by the data in front of them.
• For example: If the data shows that there was a sudden drop in sales for the month of March, the data analyst will need to investigate the cause.
• To do this, they will identify any additional data sources that might tell them more about why such anomalies arose.
• Finally, the data analyst will try to uncover causal relationships.
• For example, looking at any events that may correlate or correspond with the decrease in sales. At this stage, data analysts may use probability theory, regression analysis, filtering, and time-series data analytics.
Prescriptive Analytics:
• Building on predictive analytics, prescriptive analytics advises on the actions and decisions
that should be taken.
• Prescriptive analytics shows you how you can take advantage of the outcomes that have been
predicted.
• When conducting prescriptive analysis, data analysts will consider a range of possible scenarios
and assess the different actions the company might take.
• Prescriptive analytics is one of the more complex types of analysis, and may involve working
with algorithms, machine learning, and computational modeling procedures.
• However, the effective use of prescriptive analytics can have a huge impact on the company’s
decision-making process.
Typical Data Analyst Responsibilities:
• Manage the delivery of user satisfaction surveys and report on results using data visualization software
• Work with business line owners to develop requirements, define success metrics, manage and
execute analytical projects, and evaluate results
• Monitor practices, processes, and systems to identify opportunities for improvement
• Proactively communicate and collaborate with stakeholders, business units, technical teams and
support teams to define concepts and analyze needs and functional requirements
• Translate important questions into concrete analytical tasks
• Gather new data to answer client questions, collating and organizing data from multiple sources
• Apply analytical techniques and tools to extract and present new insights to clients using reports
and/or interactive dashboards
• Collaborate with data scientists and other team members to find the best product solutions
• Establish data processes, define data quality criteria, and implement data quality processes
• Take ownership of the codebase, including suggestions for improvements and refactoring
• Build data validation models and tools to ensure data being recorded is accurate
• Work as part of a team to evaluate and analyze key data that will be used to shape future business
strategies
The Data Analysis Process:
• The first step is to identify why you are conducting analysis and what question or challenge
you hope to solve.
• At this stage, you’ll take a clearly defined problem and come up with a relevant question or
hypothesis you can test. You’ll then need to identify what kinds of data you’ll need and where
it will come from. For example: A potential business problem might be that customers aren’t
subscribing to a paid membership after their free trial ends. Your research question could then be
“What strategies can we use to boost customer retention?”
• Data analysts will usually gather structured data from primary or internal sources, such as CRM
software or email marketing tools.
• They may also turn to secondary or external sources, such as open data sources.
• These include government portals, tools like Google Trends, and data published by major
organizations such as UNICEF and the World Health Organization.
• The original dataset may contain duplicates, anomalies, or missing data, all of which could distort how the data is interpreted, so these need to be removed.
• Data cleaning can be a time-consuming task, but it’s crucial for obtaining accurate results (a minimal pandas sketch appears after this list).
• How you analyze the data will depend on the question you’re asking and the kind of data you’re
working with, but some common techniques include regression analysis, cluster analysis, and
time-series analysis
• This final step in the process is where data is transformed into valuable business insights.
• Depending on the type of analysis conducted, you’ll present your findings in a way that others can understand, for example in the form of a chart or graph.
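As referenced above, here is a minimal sketch of basic data cleaning with pandas; the DataFrame, column names, and outlier threshold are hypothetical, and a real project would tailor these rules to its own dataset.

import pandas as pd

# Hypothetical customer data with a duplicate row, a missing value, and an outlier.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "monthly_spend": [120.0, 85.5, 85.5, None, 9999.0],
})

df = df.drop_duplicates()                 # remove duplicate rows
df = df.dropna(subset=["monthly_spend"])  # drop rows with missing values
df = df[df["monthly_spend"] < 1000]       # drop an implausibly large value
print(df)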
Data Analytics Techniques:
• Regression analysis:
o This method is used to estimate or “model” the relationship between a set of variables.
o Regression analysis is mainly used to make predictions.
• Factor analysis (Dimension Reduction):
o This technique helps data analysts to uncover the underlying variables that drive people’s behavior and the choices they make.
o Ultimately, it condenses the data in many variables into a few “super-variables”, making the data easier to work with.
o For example: If you have three different variables which represent customer satisfaction, you might use factor analysis to condense these variables into just one all-encompassing customer satisfaction score.
• Cohort analysis:
o A cohort is a group of users who have a certain characteristic in common within a specified
time period
o For example, all customers who purchased using a mobile device in March may be
considered as one distinct cohort.
o In cohort analysis, customer data is broken up into smaller groups or cohorts; so, instead
of treating all customer data the same, companies can see trends and patterns over time
that relate to particular cohorts.
o In recognizing these patterns, companies are then able to offer a more targeted service.
• Cluster analysis:
o This technique is all about identifying structures within a dataset.
o Cluster analysis essentially segments the data into groups that are internally homogeneous and externally heterogeneous.
o In other words, the objects in one cluster must be more similar to each other than they are to the objects in other clusters.
o Cluster analysis enables you to see how data is distributed across a dataset where there are no existing predefined classes or groupings.
o In marketing, for example, cluster analysis may be used to identify distinct target groups
within a larger customer base.
• Time-series analysis:
o Time-series data is a sequence of data points which measure the same variable at different
points in time.
o Time-series analysis, then, is the collection of data at specific intervals over a period of time in order to identify trends and cycles, enabling data analysts to make accurate forecasts for the future.
o If you wanted to predict the future demand for a particular product, you might use time-series analysis to see how the demand for this product typically looks at certain points in time.
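As a rough illustration of time-series analysis, the sketch below computes a 3-month moving average over a made-up monthly demand series with pandas; the dates and values are invented.

import pandas as pd

# Made-up monthly demand figures; a 3-month rolling mean highlights the trend.
demand = pd.Series(
    [120, 135, 150, 140, 160, 175, 170, 190],
    index=pd.date_range("2023-01-01", periods=8, freq="MS"),
)
trend = demand.rolling(window=3).mean()
print(trend)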
Data Analytics Tools:
• Microsoft Excel:
o It is a software program that enables you to organize, format, and calculate data using formulas within a spreadsheet system.
o Microsoft Excel may be used by data analysts to run basic queries and to create pivot tables, graphs, and charts.
o Excel also features a macro programming language called Visual Basic for Applications
(VBA).
• Tableau
o It is a popular business intelligence and data analytics software which is primarily used as
a tool for data visualization.
o Data analysts use Tableau to simplify raw data into visual dashboards, worksheets, maps, and charts.
o This helps to make the data accessible and easy to understand, allowing data analysts to effectively share their insights and recommendations.
• SAS (Statistical Analysis Software):
o It is a command-driven software package used for carrying out advanced statistical analysis and data visualization.
o SAS is one of the most widely used software packages in the industry.
• RapidMiner
o It is a software package used for data mining (uncovering patterns), text mining, predictive
analytics, and machine learning.
o Used by both data analysts and data scientists.
o RapidMiner comes with a wide range of features, including data modeling, validation, and automation.
• Power BI:
o It is a business analytics solution that lets you visualize your data and share insights across your organization.
o Similar to Tableau, Power BI is primarily used for data visualization.
o While Tableau is built for data analysts, Power BI is a more general business intelligence tool.
Qualitative Data:
• Qualitative data is data that can’t be measured or counted in the form of numbers. These types of data are sorted by category, not by number.
• These data consist of audio, images, symbols, or text.
• The gender of a person, i.e., male, female, or others, is qualitative data.
Qualitative data is further classified into two parts:
Nominal Data:
• Nominal Data is used to label variables without any order or quantitative value.
• The colour of hair can be considered nominal data, as one colour can’t be compared with another
colour.
Ordinal Data:
• Ordinal data have a natural ordering, where a number is present in some kind of order by its position on the scale.
• These data are used for observations like customer satisfaction, happiness, etc., but we can’t do any arithmetical tasks on them.
• Ordinal data is qualitative data for which the values have some kind of relative position.
• These kinds of data can be considered as “in-between” qualitative and quantitative data.
• Ordinal data only shows sequences and cannot be used for statistical analysis.
Quantitative Data:
• Quantitative data can be expressed in numerical values, which makes it countable and suitable for statistical data analysis.
• It answers questions like “how much,” “how many,” and “how often.”
• For example, the price of a phone, a computer’s RAM, and the height or weight of a person all fall under quantitative data.
• Quantitative data can be used for statistical manipulation.
Examples of Quantitative Data:
Continuous Data:
• Height of a person
• Speed of a vehicle
• “Time-taken” to finish the work
• Wi-Fi frequency
• Market share price
Difference between Discrete and Continuous Data
Discrete Data:
• Countable and finite; they are whole numbers or integers.
• Represented mainly by bar graphs.
• The values cannot be divided into smaller subdivisions.
• There are spaces between the values.
• Examples: total students in a class, number of days in a week, size of a shoe, etc.
Continuous Data:
• Measurable; they are in the form of fractions or decimals.
• Represented in the form of a histogram.
• The values can be divided into smaller subdivisions.
• They form a continuous sequence.
• Examples: temperature of a room, the weight of a person, length of an object, etc.
Machine Learning Models:
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• Regression
• Classification
• Clustering
• Association Rule
• Dimensionality Reduction
Supervised Learning Models:
• Supervised Learning is the simplest machine learning model to understand in which input data
is called training data and has a known label or result as an output.
• It works on the principle of input-output pairs.
• It requires creating a function that can be trained using a training data set and then applied to unknown data to make predictions.
• Supervised learning is task-based and tested on labeled data sets.
Regression
Linear Regression:
• Linear regression is the simplest machine learning model in which we try to predict one output
variable using one or more input variables.
• The representation of linear regression is a linear equation, which combines a set of input values (x) with the predicted output (y) for that set of input values.
• It is represented in the form of a line:
Y = bX + c
• The main aim of the linear regression model is to find the line that best fits the data points.
• Linear regression is extended to multiple linear regression (find a plane of best fit) and
polynomial regression (find the best fit curve).
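A minimal sketch of fitting Y = bX + c with scikit-learn; the toy data points are invented and roughly follow y = 2x + 1.

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data that roughly follows y = 2x + 1.
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression().fit(X, y)
print("slope (b):", model.coef_[0])
print("intercept (c):", model.intercept_)
print("prediction for x = 6:", model.predict([[6]])[0])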
Decision Tree
• Decision trees are the popular machine learning models that can be used for both regression and
classification problems.
• A decision tree uses a tree-like structure of decisions along with their possible consequences and
outcomes.
• In this, each internal node is used to represent a test on an attribute, and each branch is used to represent the outcome of the test.
• The more nodes a decision tree has, the more accurate the result will be.
• The advantage of decision trees is that they are intuitive and easy to implement, but they lack
accuracy.
• Decision trees are widely used in operations research, specifically in decision analysis,
strategic planning, and mainly in machine learning.
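A short illustrative decision tree classifier using scikit-learn's bundled iris dataset; the max_depth setting is an arbitrary choice to keep the tree small.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Fit a shallow tree; each internal node tests one attribute of the flowers.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.predict(X[:5]))  # predicted classes for the first five samples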
Random Forest
• Random Forest is an ensemble learning method which consists of a large number of decision trees.
• Each decision tree in a random forest predicts an outcome, and the prediction with the majority
of votes is considered as the outcome.
• A random forest model can be used for both regression and classification problems.
• For the classification task, the outcome of the random forest is taken from the majority of votes.
• Whereas in the regression task, the outcome is taken from the mean or average of the predictions
generated by each tree.
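A brief random forest sketch with scikit-learn; the wine dataset and the number of trees (n_estimators=100) are arbitrary illustrative choices.

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

# 100 decision trees each vote; the majority class becomes the prediction.
X, y = load_wine(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:5]))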
Classification:
Classification models are the second type of supervised learning technique; they are used to generate conclusions from observed values in categorical form. For example, a classification model can identify whether an email is spam or not, or whether a buyer will purchase a product or not.
Classification algorithms categorize the output into different groups or classes. In classification, a classifier model is designed that classifies the dataset into different categories, and each category is assigned a label.
• Binary classification: If the problem has only two possible classes, it is called a binary classification problem. For example: cat or dog, Yes or No.
• Multi-class classification: If the problem has more than two possible classes, it is a multi-class
classifier.
Some popular classification algorithms are as below:
a) Logistic Regression
Logistic Regression is used to solve classification problems in machine learning. It is similar to linear regression but is used to predict categorical variables. It can predict the output as Yes or No, 0 or 1, True or False, etc. However, rather than giving exact values, it provides probabilistic values between 0 and 1.
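A minimal sketch, assuming a made-up one-feature dataset (hours of product usage versus purchase), showing how logistic regression returns both a predicted class and a probability between 0 and 1.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature: hours of product usage; label: purchased (1) or not (0).
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.0]]))        # predicted class (0 or 1)
print(clf.predict_proba([[2.0]]))  # probabilities between 0 and 1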
b) Support Vector Machine (SVM)
Support Vector Machine, or SVM, is a popular machine learning algorithm which is widely used for classification and regression tasks. However, it is primarily used to solve classification problems.
The main aim of SVM is to find the best decision boundary in an N-dimensional space that can segregate the data points into classes; this best decision boundary is known as the hyperplane. SVM selects the extreme vectors that help define the hyperplane, and these vectors are known as support vectors.
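A small illustrative SVM classifier with a linear kernel in scikit-learn; the iris dataset and the kernel choice are arbitrary stand-ins.

from sklearn.datasets import load_iris
from sklearn.svm import SVC

# A linear-kernel SVM; the fitted hyperplane separates the classes and
# clf.support_vectors_ holds the support vectors it was built from.
X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(X[:5]))
print(len(clf.support_vectors_), "support vectors")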
c) Naïve Bayes
Naïve Bayes is another popular classification algorithm used in machine learning. It is called so because it is based on Bayes’ theorem and follows the naïve (independence) assumption between the features, which is given as:
P(A|B) = [P(B|A) × P(A)] / P(B)
Each naïve Bayes classifier assumes that the value of a specific feature is independent of any other feature. For example, if a fruit needs to be classified based on color, shape, and taste, then a yellow, oval, and sweet fruit will be recognized as a mango. Here, each feature is treated as independent of the other features.
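A minimal Gaussian naïve Bayes sketch with scikit-learn; the iris dataset is just a convenient stand-in for the fruit example above.

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Gaussian naive Bayes treats each feature as independent given the class.
X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)
print(nb.predict(X[:5]))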
Unsupervised Learning Models:
Unsupervised machine learning models implement the learning process in the opposite way to supervised learning, which means the model learns from an unlabeled training dataset. Based on the unlabeled dataset, the model predicts the output. Using unsupervised learning, the model learns hidden patterns from the dataset by itself, without any supervision.
Unsupervised learning models are mainly used to perform three tasks, which are as follows:
• Clustering
Clustering is an unsupervised learning technique that involves grouping the data points into different clusters based on similarities and differences. The objects with the most similarities remain in the same group, and they have few or no similarities with the objects in other groups.
Clustering algorithms are widely used in different tasks such as image segmentation, statistical data analysis, market segmentation, etc.
Some commonly used clustering algorithms are K-means clustering, hierarchical clustering, DBSCAN, etc.
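A short K-means sketch with scikit-learn on six made-up 2-D points; the points are chosen so that two clusters are obvious.

import numpy as np
from sklearn.cluster import KMeans

# Six made-up 2-D points forming two obvious groups; K-means finds them without labels.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # the two cluster centres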
Reinforcement Learning Models:
In reinforcement learning, the algorithm learns actions for a given set of states that lead to a goal state. It is a feedback-based learning model that takes feedback signals after each state or action by interacting with the environment. This feedback works as a reward (positive for each good action and negative for each bad action), and the agent’s goal is to maximize the positive rewards to improve its performance.
The behavior of the model in reinforcement learning is similar to human learning, as humans learn things by experience, using feedback from interacting with the environment.
Below are some popular algorithms that come under reinforcement learning:
Q-Learning:
It aims to learn the policy that can help the AI agent take the best action for maximizing the reward under a specific circumstance. It maintains a Q-value for each state-action pair that indicates the reward for following a given state path, and it tries to maximize the Q-value.
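A toy tabular Q-learning sketch; the three-state environment, rewards, and hyperparameters (alpha, gamma, epsilon) are all invented for illustration.

import numpy as np

# Invented 3-state chain: the agent starts in state 0 and state 2 is the goal.
n_states, n_actions = 3, 2          # actions: 0 = stay, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2

def step(state, action):
    """Hypothetical environment: reaching the goal state earns a reward of +1."""
    next_state = min(state + action, n_states - 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

rng = np.random.default_rng(0)
for _ in range(500):
    state = 0
    while state != n_states - 1:
        # epsilon-greedy choice between exploring and exploiting the current Q-values
        action = rng.integers(n_actions) if rng.random() < epsilon else int(Q[state].argmax())
        next_state, reward = step(state, action)
        # Q-update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q)  # learned Q-values per state-action pair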
Missing Imputations:
Imputation:
• Imputation is a technique used for replacing the missing data with some substitute value to retain
most of the data/information of the dataset.
We use imputation because missing data can cause the following issues:
1. Incompatible with most of the Python libraries used in Machine Learning: While using the libraries for ML (the most common is sklearn), they don’t have a provision to automatically handle missing data, which can lead to errors.
2. Distortion in the Dataset: A huge amount of missing data can cause distortions in the variable distribution, i.e., it can increase or decrease the value of a particular category in the dataset.
3. Affects the Final Model: The missing data can cause a bias in the dataset and can lead to a faulty analysis by the model.
Mean or Median Imputation:
Pros:
• Easy and fast.
• Works well with small numerical datasets.
Cons:
• Doesn’t factor the correlations between features. It only works on the column level.
• Will give poor results on encoded categorical features (do NOT use it on categorical features).
• Not very accurate.
• Doesn’t account for the uncertainty in the imputations.
Most Frequent (Mode) Imputation:
Cons:
• It also doesn’t factor the correlations between features.
• It can introduce bias in the data.
Zero or Constant Imputation:
As the name suggests, it replaces the missing values with either zero or any constant value you specify.
Pros:
• Easy to implement.
• We can use it in production.
• It retains the importance of “missing values” if it exists.
Cons:
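A minimal sketch of mean and zero/constant imputation using scikit-learn's SimpleImputer; the small matrix with missing values is made up.

import numpy as np
from sklearn.impute import SimpleImputer

# A small made-up matrix with missing values (np.nan).
X = np.array([[7.0, 2.0], [4.0, np.nan], [np.nan, 6.0]])

mean_imputer = SimpleImputer(strategy="mean")
print(mean_imputer.fit_transform(X))        # missing values replaced by column means

constant_imputer = SimpleImputer(strategy="constant", fill_value=0)
print(constant_imputer.fit_transform(X))    # zero/constant imputation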
Business Model:
• A Business Model can be defined as a representation of a business or solution that often includes a graphic component along with supporting text and relationships to other components.
• A Business Model is a structured model, just like a blueprint for the final product to be developed.
• It gives structure and dynamics for planning.
• It also provides the foundation for the final product.
• With the help of modelling techniques, we can create a complete description of existing and
proposed organizational structures, processes, and information used by the enterprise.
• Analyzing requirements is a part of the business modelling process, and it forms the core focus area.
• Functional Requirements are gathered during the “Current state”.
• These requirements are provided by the stakeholders regarding the business processes, data, and
business rules that describe the desired functionality which will be designed in the Future State.