Unitwise Imp Notes
10. What are the differences between data analytics & business analytics?
12. What are the different tools used for data analytics? Explain
Tools used in Descriptive Analytics
i. Statistical Summary: It provides statistical descriptions for a given business metric, e.g.
Mean, Median, Standard Deviation, Percentile, Interquartile range, etc.
ii. Z-Score: The Z-score tells us how far (in terms of standard deviations) a particular value
of x is from its mean.
iii. Coefficient of Variation: It is the ratio of the standard deviation to the mean.
iv. Interquartile Range: It is an important measure of the variation in the dataset: the
difference between the third quartile and the first quartile.
v. Dashboard: A tool used to track, organise, visualize and analyse data. Its overall purpose is
to make it easier for data analysts, decision makers and average users to understand their
data, gain deeper insights and make better data-driven decisions.
vi. Descriptive Statistics: Includes central tendency, variability, and frequency
distribution of the dataset. The frequency distribution records how often data occurs,
central tendency records the data's centre point of distribution, and variability of a data
set records its degree of dispersion
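The descriptive measures listed above can be computed with Python's standard
library. This is a minimal sketch: the sales figures are hypothetical values
invented for illustration, and the quartiles use the default method of
statistics.quantiles, which is only one of several conventions.

```python
import statistics

# Hypothetical daily sales figures, invented for illustration.
sales = [12, 15, 11, 19, 14, 13, 40]

mean = statistics.mean(sales)
median = statistics.median(sales)
stdev = statistics.stdev(sales)            # sample standard deviation

# Z-score: distance of each value from the mean, in standard deviations.
z_scores = [(x - mean) / stdev for x in sales]

# Coefficient of variation: standard deviation divided by the mean.
cv = stdev / mean

# Interquartile range: third quartile minus first quartile.
q1, q2, q3 = statistics.quantiles(sales, n=4)
iqr = q3 - q1

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f}")
print(f"cv={cv:.2f} iqr={iqr}")
print("largest z-score:", round(max(z_scores), 2))
```

A large z-score (here for the value 40) flags a value that lies far from the
rest of the data.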
Returning visitors: The number of visitors who return to your site after an initial visit
may indicate whether your web design and marketing efforts are effective
Customer lifetime value: This measures the value you'll gain from a repeat customer
and can help you decide whether you should prioritize new customer acquisition or
retention
Marketing Analytics
Marketing analytics helps business owners gain insight into their customers' preferences
and track the effectiveness of their marketing campaigns.
18. What is NLP?
Natural Language Processing (NLP) is a subfield of artificial intelligence that studies the
interaction between computers and human languages. The goals of NLP are to find new methods of
communication between humans and computers, and to understand human speech as it is
spoken.
UNIT 2- PROBABILITY & STATISTICAL METHODS
1. Define sample space
A sample space is a collection or a set of possible outcomes of a random
experiment. The sample space is represented using the symbol, “S”.
Ex: For rolling a die, we will get the sample space, S as {1, 2, 3, 4, 5, 6 }
2. Define probability
The probability of an event is a measure of how likely the event is to occur
when we run the experiment. Mathematically, probability is a function on
the collection of events that satisfies certain axioms.
3. What is an event? Explain the types
The collection of some outcomes of an experiment is called an event.
Probability is applied in many fields. It measures how likely a random
event is to occur.
The probability of an event E is defined as
P(E) = [Number of favorable outcomes of E]/[Total number of possible
outcomes].
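The formula above can be sketched in a few lines of Python. This assumes all
outcomes are equally likely (as for a fair die); the function name probability
is a hypothetical helper introduced only for this illustration.

```python
from fractions import Fraction

# Sample space for rolling one fair die.
S = {1, 2, 3, 4, 5, 6}

def probability(event, sample_space):
    """P(E) = favorable outcomes / total outcomes, for equally likely outcomes."""
    favorable = event & sample_space     # ignore anything outside the sample space
    return Fraction(len(favorable), len(sample_space))

even = {2, 4, 6}                         # event: the roll is even
print(probability(even, S))              # 1/2
```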
The different types of events in probability are:
1. Sure event
2. Impossible event
3. Independent event
4. Dependent event
5. Mutually exclusive event
6. Complementary event
7. Compound event
8. Exhaustive event
9. Simple event
Sure Event
An event that contains all outcomes of the experiment, i.e., the whole sample
space, is a sure event. Its probability is 1.
Example: Getting a number from 1 to 6 when rolling a die.
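Impossible Event
An event that contains no outcome of the sample space cannot occur. Its
probability is 0.
Example: Getting a 7 when rolling a single die.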
Independent Event
When the outcome of the first event does not influence the outcome of the second
event, those events are called independent events.
Example: The event of getting a tail after tossing a coin and the event of getting a
head when tossing another coin.
Dependent Event
When the outcome of the first event influences the outcome of the second event,
those events are called dependent events.
Example: If we draw two colored marbles from a bag and the first marble is not
replaced before we draw the second marble, then the outcome of the second draw
will depend on the outcome of the first draw.
Mutually Exclusive Event
These events cannot occur at the same time.
Example: The events of getting head and tail are mutually exclusive while tossing
a coin.
Complementary Event
For any event A, another event, A', consists of the remaining elements of the sample
space S. A' = S – A.
Example: Suppose the set of the first 10 natural numbers is the sample space, S = {1,
2, 3, 4, 5, 6, 7, 8, 9, 10}, and A is the event of choosing an even number less than
10. So, A = {2, 4, 6, 8} and A' = {1, 3, 5, 7, 9, 10}.
Compound Event
If an event has more than one sample point, it is termed a compound event.
Exhaustive Event
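A set of events is exhaustive when, together, the events cover the entire sample space.
Example: Suppose E1 is the event of getting an even number and E2 is the event
of getting an odd number when throwing a die. Together E1 and E2 cover every
outcome, so they are exhaustive events.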
Simple event
An event that has a single point of the sample space is known as a simple event in
probability.
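Several of the event types above can be checked with ordinary set operations.
The sketch below reuses the die example and assumes equally likely outcomes;
the helper p is hypothetical and exists only for this illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                   # sample space: one roll of a die

def p(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

E1 = {2, 4, 6}                           # even number
E2 = {1, 3, 5}                           # odd number

mutually_exclusive = E1.isdisjoint(E2)   # no outcome in common
exhaustive = (E1 | E2) == S              # together they cover S
complement_of_E1 = S - E1                # A' = S - A

print(mutually_exclusive, exhaustive)    # True True
print(complement_of_E1 == E2)            # True
print(p(S), p(set()))                    # 1 0  (sure event, impossible event)
```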
The normal distribution has two parameters: the mean and the
standard deviation.
In the standard normal distribution the mean is zero and the standard deviation is 1.
Properties of the Normal Distribution
First, its mean (average), median (midpoint), and mode (most frequent
observation) are all equal to one another. Moreover, these values all
represent the peak, or highest point, of the distribution. The distribution then
falls symmetrically around the mean, the width of which is defined by the
standard deviation.
The Formula for the Normal Distribution
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$
where:
x = value of the variable or data being examined and f(x) the probability
density function
μ = the mean
σ = the standard deviation
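As a rough illustration, the usual normal density
f(x) = 1 / (σ √(2π)) · exp(−(x − μ)² / (2σ²)) can be evaluated directly. The
function name normal_pdf and its default arguments are assumptions made for
this sketch.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# The peak is at the mean, and the curve is symmetric around it.
print(round(normal_pdf(0.0), 4))               # 0.3989
print(normal_pdf(-1.0) == normal_pdf(1.0))     # True
```

With the defaults (mu = 0, sigma = 1) this is the standard normal distribution.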
Applications
Marks scored on the test
Heights of different persons
Size of objects produced by the machine
Blood pressure and so on.
One-sample t-test statistic:
$$ t = \frac{m - \mu}{s / \sqrt{n}} $$
where,
t = t-statistic
m = mean of the group
µ = theoretical value or population mean
n = sample size
s = standard deviation of the group
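The symbols listed above fit the one-sample t-statistic
t = (m − µ) / (s / √n). Assuming that is the formula intended here, a minimal
sketch follows; the scores are hypothetical values invented for illustration.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """t = (m - mu) / (s / sqrt(n)) for a one-sample t-test."""
    n = len(sample)
    m = statistics.mean(sample)
    s = statistics.stdev(sample)         # sample standard deviation (n - 1)
    return (m - mu) / (s / math.sqrt(n))

# Hypothetical test scores; mu = 70 is the assumed theoretical mean.
scores = [72, 68, 75, 71, 69, 74, 70, 73]
t = one_sample_t(scores, mu=70)
print(f"t = {t:.3f} with {len(scores) - 1} degrees of freedom")
```

The resulting t value is then compared against the t-distribution with n − 1
degrees of freedom.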
where,
t = t-statistic
m = mean of the group
s = standard deviation of the group
n = group size or sample size
9. Explain ANOVA
ANOVA stands for Analysis of Variance. It is a statistical method used to
analyze the differences between the means of two or more groups or
treatments. ANOVA is also called the Fisher analysis of variance, and it is
an extension of the t-test.
The ANOVA test allows a comparison of more than two groups at the same
time to determine whether a relationship exists between them. The result of
the ANOVA formula, the F statistic (also called the F-ratio), allows for the
analysis of multiple groups of data to determine the variability between
samples and within samples.
Types of ANOVA
1. One-way ANOVA – has just one independent variable. For example,
differences in IQ can be assessed by Country, and Country can have 2, 20, or
more different categories to compare.
2. Two-way ANOVA – assesses two independent variables.
3. Multivariate ANOVA (MANOVA) – assesses more than one dependent variable
at the same time.
Formula of ANOVA
F= Mean sum of squares between the groups (MSB)/ Mean squares of
errors (MSE).
Therefore F = MSB/MSE
where,
Mean squares between groups, MSB = SSB / (k – 1)
Mean squares of errors, MSE = SSE / (N – k)
Degrees of freedom between groups, k – 1 = df1; degrees of freedom of errors,
N – k = df2. Here, N is the total number of observations across the k groups.
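$$ SSB = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2, \qquad SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 $$
Here $n_j$ is the size of group j, $\bar{x}_j$ is the mean of group j, and $\bar{x}$ is the overall mean.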
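The F = MSB/MSE calculation can be sketched as follows, taking SSB as the
size-weighted squared distance of each group mean from the overall mean and
SSE as the squared distance of each observation from its own group mean. The
function name and the three groups are hypothetical, invented for illustration.

```python
def one_way_anova_f(groups):
    """Return (F, df1, df2) for a one-way ANOVA over a list of groups."""
    k = len(groups)                                  # number of groups
    N = sum(len(g) for g in groups)                  # total observations
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]

    # SSB: variation of the group means around the overall mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSE: variation of observations around their own group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    msb = ssb / (k - 1)
    mse = sse / (N - k)
    return msb / mse, k - 1, N - k

# Hypothetical scores for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df1, df2 = one_way_anova_f(groups)
print(f"F = {f_stat:.2f}, df1 = {df1}, df2 = {df2}")
```

A large F means the variation between groups is large relative to the
variation within groups.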
Correlation types
The correlation is said to be simple when only two variables are studied. The
correlation is either multiple or partial when three or more variables are studied.
Positive Correlation: The value of one variable increases linearly with an increase in
another variable. Both variables move in the same direction, so the correlation
coefficient is positive (at most +1).
Negative Correlation: When the values of one variable decrease as the values of another
variable increase, the correlation coefficient is negative (at least -1).
Zero Correlation or No Correlation: There is one more situation, when there is no specific
relation between two variables. The correlation coefficient is then 0 or close to 0.
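The three cases above can be illustrated with the Pearson correlation
coefficient, one common measure of linear correlation (the notes do not name a
specific coefficient, so this choice is an assumption). The data are invented
for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))    # y rises with x: positive correlation
print(pearson_r(x, [10, 8, 6, 4, 2]))    # y falls as x rises: negative correlation
print(pearson_r(x, [3, 1, 4, 1, 3]))     # no linear relation: about zero
```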
The basic concept of linear regression is to find a line that best fits the data points.
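One common way to find such a line is ordinary least squares, which picks the
slope and intercept that minimise the squared vertical distances to the points.
The sketch below assumes that method; fit_line and the data points are
hypothetical, invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data points that lie roughly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```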
Logistic regression
iv. Decision-making
Data visualization can help users recognize new patterns and errors in the data.
Data visualization can be a faster and more effective communication tool than
reports and spreadsheets.
iv. Saves time
Data visualization tools can simplify the data analysis process and present results
attractively.
v. Improves insights
Improper visualization
The core of a lot of issues and disadvantages stems from this main one. If you're
not careful in how you build your visualizations, you may end up with
visualizations that don't properly convey your data. This can lead to confusion and
issues down the line if you use that improper visualization to do analysis and draw
conclusions.
Incorrect conclusions
As noted above, a risk of using data visualization is that your audience may
draw incorrect conclusions. And that's not just because of improper visualizations.
Sometimes a visual medium can confuse the viewer, so different
people in your audience may walk away with drastically different conclusions after
viewing the same visualization.
Inexact
features are:
i. data connectivity
ii. data transformation
iii. data modeling
iv. visualization
v. collaboration
vi. mobile access
vii. natural language queries
viii. AI-powered insights
ix. real-time data streaming
x. custom visualization
xi. integration with other Microsoft products
1) Tile: In Power BI, a tile is a snapshot of data that is pinned to a dashboard. Tiles
can be created from a variety of sources, including reports, dashboards, the Q&A
box, Excel, and SQL Server Reporting Services (SSRS) reports.
Visualizations are a part of any Power BI report, as they help users identify and
understand patterns in the data.
[Figure: Visualizations part in Power BI]
connected to, and used for reporting and visualizations. Datasets include data,
tables, relationships, calculations, and a connection to the data source. You can
load data through the Get data option -> Excel, Web, Text/CSV (comma-separated
values). Plenty of datasets are available for download on kaggle.com.
Disadvantages are
Power BI does not accept file sizes larger than 1 GB and doesn't mix
imported data with data accessed from real-time connections.
There are very few data sources that allow real-time connections to
Power BI reports and dashboards.
It only shares dashboards and reports with users logged in with the
same email address.
Dashboard doesn't accept or pass user, account, or other entity
parameters.
Survey definition
A survey is a research method that collects data from an entire
population or a very large sample in order to understand their opinions
on a particular topic.
A case study is a detailed study of a specific subject, such as a person,
group, or situation.