Dsa Report
* Identifying products that are selling well or poorly
* Developing marketing campaigns
* Forecasting future sales
(Now let us discuss the measures of central tendency that can be used to summarize and describe the main features of a dataset.)
(For example, if a dataset contains a few very high or low values, the mean may not be representative of the typical value in the dataset.)
The mean is often used in data analytics to compare different datasets.
(For example, a business might compare the mean sales of different products or the mean customer satisfaction scores of different regions.)
The mean can also be used to track trends over time.
A manufacturing company might use the mean to calculate the average production time for a product. This information could be used to identify areas for improvement and set production targets.
The mean is a powerful tool that can be used to gain insights from data and make informed decisions.
Median
The median is calculated by ordering the values in a dataset from smallest to largest and then selecting the middle value.
If there are two middle values, the median is the mean of those two values.
The median is a good measure of central tendency for datasets that are not normally distributed, or that contain outliers.
(For example, if a dataset contains a few very high or low values, the median will not be skewed by those values.)
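The effect of extreme values on the mean and the median can be checked with a few lines of Python. This is only an illustrative sketch: the sales figures are made up, and the standard `statistics` module stands in for Excel's AVERAGE and MEDIAN functions.

```python
from statistics import mean, median

# Hypothetical monthly sales figures; the last value is an unusually large order.
sales = [120, 135, 128, 140, 132, 950]
typical = sales[:-1]  # the same data without the extreme value

mean_without = mean(typical)
median_without = median(typical)
mean_with = mean(sales)
median_with = median(sales)  # six values, so this is the mean of the two middle values

print("mean without the extreme value:  ", mean_without)
print("median without the extreme value:", median_without)
print("mean with the extreme value:     ", mean_with)
print("median with the extreme value:   ", median_with)
```

With these numbers the single large order roughly doubles the mean, while the median barely changes.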
The median is often used in data analytics to compare different datasets.
The median can also be used to track trends over time.
Here are some examples of how the median is used in data analytics:
A real estate company might use the median to calculate the median home price in a neighborhood. This information could be used to help buyers and sellers determine a fair price for a home.
A government agency might use the median to calculate the median income for a particular region. This information could be used to develop economic policies and programs.
To calculate the median in Excel, you can use the MEDIAN function. The MEDIAN function takes a range of cells as input and returns the median of the values in those cells.
Mode
(In data analytics, the mode is the most frequent value in a dataset.)
It is a measure of central tendency, which is a statistic that provides information about the typical value in a dataset.
The mode is calculated by counting the number of times each value appears in a dataset and then selecting the value with the highest count.
If there are two values with the same highest count, the dataset is said to be bimodal.
If there are more than two values with the same highest count, the dataset is said to be multimodal.
(The mode is a good measure of central tendency for datasets that are not normally distributed, or that contain outliers. For example, if a dataset contains a few very high or low values, the mode will not be skewed by those values.)
The mode is often used in data analytics to identify
* the most popular product
* the most common customer behavior
* or the most frequent error message
(It can also be used to track trends over time. For example, a business might track the mode product category each quarter to see what types of products are becoming more or less popular.)
Here are some examples of how the mode is used in data analytics:
A manufacturing company might use the mode to identify the most common defect in a product. This information could be used to improve the manufacturing process and reduce the number of defective products.
A healthcare provider might use the mode to identify the most common diagnosis for a particular condition. This information could be used to develop treatment protocols and improve patient care.
The mode is a powerful tool that can be used to gain insights from data and make informed decisions.
It is important to note that the mode is not always the same as the mean or median.
(For example, if a dataset contains a few very high or low values, the mode may be different
from the mean and median. In some cases, it may be more appropriate to use the mode as a measure of central tendency, while in other cases, it may be more appropriate to use the mean or median. The best way to decide which measure of central tendency to use is to consider the specific dataset and the question that you are trying to answer.)
-To calculate the mode in Excel, you can use the MODE function (MODE.SNGL in current versions of Excel). The MODE function takes a range of cells as input and returns the most frequent value in those cells.
Standard Deviation
The standard deviation is a measure of how spread out the values in a dataset are.
It is calculated by taking the square root of the variance, which is the average of the squared deviations from the mean.
A high standard deviation means that the values in the dataset are spread out over a wide range, while a low standard deviation means that the values in the dataset are clustered close to the mean.
The standard deviation is often used in data analytics to:
Identify outliers
(Values that are more than two standard deviations away from the mean are often treated as potential outliers. Outliers can be caused by errors in data collection or entry, or they may represent real but unusual events.)
Compare datasets
(The standard deviation can be used to compare the variability of two or more datasets.)
Track trends over time
(The standard deviation can be used to track how the variability of a dataset changes over time.)
Here are some examples of how the standard deviation is used in data analytics:
A medical researcher might use the standard deviation to compare the effectiveness of two different treatments. A higher standard deviation in the treatment group indicates that the treatment is more variable in its effects.
A quality control engineer might use the standard deviation to monitor the quality of a manufacturing process. A high standard deviation indicates that the process is more variable and that there is a higher risk of producing defective products.
-It is important to note that the standard deviation is most informative for datasets that are roughly symmetric, such as normally distributed data. If a dataset is strongly skewed or contains outliers, other measures of variability, such as the interquartile range, may be more appropriate.
-To calculate the standard deviation in Excel, you can use the STDEV.S function. The STDEV.S function takes a range of cells as input and returns the sample standard deviation of the values in those cells.
Range
-In data analytics, the range is the difference between the largest and smallest values in a dataset. It is a simple but effective measure of variability, or how spread out the values in a dataset are.
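As a rough cross-check outside Excel, the standard deviation, the two-standard-deviation rule of thumb, and the range described above can be sketched in Python. The data values are hypothetical, and `stdev` from the standard `statistics` module divides by n - 1, so it corresponds to a sample standard deviation such as STDEV.S.

```python
from statistics import mean, stdev

# Hypothetical daily output counts from a production line; 90 is an unusual day.
values = [50, 52, 49, 51, 50, 48, 53, 51, 50, 90]

avg = mean(values)
sd = stdev(values)  # sample standard deviation (divides by n - 1)
value_range = max(values) - min(values)  # largest value minus smallest value

# Rule of thumb from the text: flag values more than two standard deviations from the mean.
flagged = [v for v in values if abs(v - avg) > 2 * sd]

print("mean:", avg)
print("standard deviation:", round(sd, 2))
print("range:", value_range)
print("flagged as possible outliers:", flagged)
```

Note that the unusual value also inflates the standard deviation and the range themselves, which is one reason a flagged value should be inspected rather than deleted automatically.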
-The range is a useful measure of variability for a variety of tasks in data analytics, such as:
Identifying outliers
(Values that are outside of the normal range may be outliers, which can be caused by errors or unusual events.)
Comparing datasets
(The range can be used to compare the variability of two or more datasets)
satisfaction scores for different products or regions.
(Overall, the range is a useful and versatile measure of variability in data analytics. It is simple to calculate and interpret, and it can be used for a variety of tasks. However, it is important to be aware of its limitations, such as its sensitivity to outliers and its lack of robustness to non-normality.)
Here is an example of how the range is used in data analytics:
A financial analyst might use the range to assess the risk of a particular investment. A stock with a high range of price movements is considered to be riskier than a stock with a low range of price movements.
To calculate the range in Excel, you can use the following formula: =MAX(range)-MIN(range) where range is the range of cells that you want to calculate the range for.
Variance
-The variance is calculated by taking the average of the squared deviations from the mean.
-To calculate the variance, you can use the following formula:
Variance = ∑(x - x̄)² / n
where:
x is each value in the dataset
x̄ is the mean of the dataset
n is the number of values in the dataset
-Variance is a powerful tool that can be used to gain insights from data and make informed decisions.
Here is an example of how variance is used in data analytics:
A financial analyst might use variance to calculate the risk of a particular investment. A high variance indicates that the investment is riskier, while a low variance indicates that the investment is less risky.
-Overall, variance is a useful and versatile measure of variability in data analytics.
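The variance formula given in this section divides by n. A minimal Python sketch, using made-up return figures, shows that calculation next to the sample version that divides by n - 1. The mapping to Excel is stated here as background rather than taken from the report: VAR.P divides by n, and VAR.S divides by n - 1.

```python
from statistics import mean, pvariance, variance

# Hypothetical yearly returns (in percent) for one investment.
returns = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

x_bar = mean(returns)
n = len(returns)
squared_deviations = sum((x - x_bar) ** 2 for x in returns)

population_variance = squared_deviations / n    # the formula as written in the text
sample_variance = squared_deviations / (n - 1)  # divides by n - 1 instead

print("population variance:", population_variance)
print("sample variance:", round(sample_variance, 4))

# The standard library gives the same two results.
print("pvariance:", pvariance(returns), "variance:", round(variance(returns), 4))
```

The two results differ noticeably for small datasets and converge as n grows.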
Variance is simple to calculate and interpret, and it can be used for a variety of tasks.
To calculate the variance in Excel, you can use the VAR.S function. The VAR.S function takes a range of cells as input and returns the sample variance of the values in those cells. VAR.S divides by n - 1; the VAR.P function divides by n, as in the formula above.
Kurtosis
Kurtosis in data analytics is a measure of how heavy the tails of a distribution are compared to a normal distribution.
It is calculated by taking the fourth central moment of a distribution divided by the fourth power of the standard deviation.
-A kurtosis above 3, the value for a normal distribution, means that the tails of the distribution are heavier than a normal distribution, while a kurtosis below 3 means that the tails of the distribution are lighter than a normal distribution.
-Kurtosis is a useful measure of the shape of a distribution, and it can be used to:
Identify outliers
(Values that are more than two standard deviations away from the mean are often treated as potential outliers. A high kurtosis signals that such extreme values are present, although it does not show which values they are.)
-Kurtosis is a powerful tool that can be used to gain insights from data and make informed decisions.
Here are some examples of how kurtosis is used in data analytics:
A medical researcher might use kurtosis to compare the effectiveness of two different treatments. A higher kurtosis in the treatment group indicates that extreme responses to the treatment are more common.
A quality control engineer might use kurtosis to monitor the quality of a manufacturing process. A high kurtosis indicates that the process produces extreme measurements more often and that there is a higher risk of producing defective products.
-Overall, kurtosis is a useful and versatile measure of the shape of a distribution in data analytics. It is simple to calculate and interpret, and it can be used for a variety of tasks.
It is important to note that kurtosis is not a perfect measure of the shape of a distribution. For example, a distribution can have a high kurtosis even if it is not symmetrical, because kurtosis says nothing about symmetry. However, kurtosis is still a valuable tool for data analysts who want to gain insights from data and make informed decisions.
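The definition of kurtosis given above (fourth central moment divided by the fourth power of the standard deviation) can be sketched directly in Python. The helper name `kurtosis` and both datasets are hypothetical. Excel's KURT function reports a sample-adjusted excess kurtosis, so it will not return the same numbers as this plain definition.

```python
from statistics import mean

def kurtosis(values):
    """Fourth central moment divided by the squared population variance (sd ** 4)."""
    n = len(values)
    x_bar = mean(values)
    m2 = sum((x - x_bar) ** 2 for x in values) / n  # population variance
    m4 = sum((x - x_bar) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

# Hypothetical data: evenly spread values versus mostly identical values with two extremes.
light_tails = [1, 2, 3, 4, 5, 6, 7, 8, 9]
heavy_tails = [5, 5, 5, 5, 5, 5, 5, 5, -20, 30]

print("light tails:", round(kurtosis(light_tails), 2))  # below 3
print("heavy tails:", round(kurtosis(heavy_tails), 2))  # above 3
```

Under this definition a normal distribution has a kurtosis of 3, so the first dataset has lighter tails and the second has heavier tails than a normal distribution.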
Skewness
- A normal distribution has a skewness of zero.
Skewness is a useful measure of the shape of a distribution, and it can be used to:
Identify outliers:
(Values that are more than two standard deviations away from the mean are often treated as potential outliers. Skewness can indicate that such values are concentrated on one side of the mean, although it does not identify individual outliers.)
Compare distributions
(Skewness can be used to compare the asymmetry of two or more distributions.)
Track trends over time
(Skewness can be used to track how the asymmetry of a distribution changes over time.)
Here are some examples of how skewness is used in data analytics:
A financial analyst might use skewness to assess the risk of a particular investment. A high skewness indicates that the investment is more likely to experience extreme returns in one direction: large gains for positive skewness, large losses for negative skewness. This information
- Note that skewness is not a perfect measure of the shape of a distribution.
(For example, a distribution can have zero skewness even if it is not symmetrical.)
Here is an example of a skewness graph:
(Show the graph from the file that Sir sent.)
(A distribution can have right (or positive), left (or negative), or zero skewness. A right-skewed distribution is longer on the right side of its peak, and a left-skewed distribution is longer on the left side of its peak.)
- Skewness can be a useful indicator of important trends and patterns in data. (If the skewness of income data increases over time, it may indicate that the gap between the rich and the poor is widening. Similarly, if the skewness of test score data increases over time, it may indicate that students are becoming more stratified in terms of their academic abilities.)
To calculate skewness in Excel, you can use the SKEW function. The SKEW function takes a range of cells as input and returns the skewness of the values in those cells.
(Insert the recorded video here.)
Advanced Data Analysis Tools
Excel has a number of advanced data analysis tools that can be used to perform complex statistical and engineering analyses.
Some of the most popular advanced data analysis tools in Excel include:
• PivotTables
(A powerful tool for summarizing and analyzing large amounts of data. They allow you to quickly and easily create tables and charts that show trends and patterns in your data.)
• Data Analysis ToolPak
(A collection of statistical and engineering functions that can be used to perform complex analyses. To use the Data Analysis ToolPak, you must first load it as an add-in.)
• Goal Seek
(Can be used to find the value of one input cell that will produce a desired result in another cell.)
• Solver
(Can be used to solve complex optimization problems. It can be used to find the best solution to a problem, given a set of constraints.)
Here are some examples of how advanced data analysis tools in Excel can be used:
• A financial analyst might use PivotTables to analyze stock market data and identify trends.
• A marketing analyst might use the Data Analysis ToolPak to perform a regression analysis to identify the relationship between advertising spending and sales.
• A sales manager might use Goal Seek to find the sales figure each salesperson needs in order to reach a revenue goal.
• A production engineer might use Solver to find the optimal production schedule that minimizes costs.
- Advanced data analysis tools can be a valuable asset for anyone who works with large amounts of data. By using these tools, you can gain insights from your data that would be difficult to see with the naked eye.
Excel functions for Data Analysis
(Excel has a wide range of functions that can be used for data analysis)
Some of the most commonly used functions include:
• SUM: Sums the values in a range of cells.
• AVERAGE: Calculates the average of the values in a range of cells.
• COUNT: Counts the number of cells in a range that contain numbers.
• MAX: Returns the largest value in a range of cells.
• MIN: Returns the smallest value in a range of cells.
• VLOOKUP: Looks up a value in a table and returns the corresponding value from another column in the table.
• IF: Returns one value if a condition is met and another value if the condition is not met.
• COUNTIFS: Counts the number of cells in a range that meet multiple criteria.
• SUMIFS: Sums the values in a range of cells that meet multiple criteria.
• XLOOKUP: A newer function that combines the functionality of VLOOKUP and
HLOOKUP (a function that looks up a value in a table and returns the corresponding value from another row in the table).
(Hajie, can you just make a video here showing these functions?)
1. State the hypotheses
(This involves stating the null hypothesis and the alternative hypothesis. The null hypothesis is typically a statement that there is no difference between two groups or that there is no change over time. The alternative hypothesis is the opposite of the null hypothesis.)
2. Collect data
(This involves collecting data from a sample of the population of interest. The data should be collected in a random and unbiased way.)
3. Analyze the data
(If the p-value is less than a certain significance level (typically 0.05), then the null hypothesis is rejected in favor of the alternative hypothesis.)
T-test examples in Excel
To perform a t-test in Excel, you can use the T.TEST function. The T.TEST function takes a number of arguments, including the following:
• Array1
(The first range of cells containing the data for researcher would conclude that one teaching
the first group) method is more effective than the other.
• Type