SUJAL SINGH 22215148 - Assignment 8

Q1

The given data has two columns A and B with 50 rows. Column A contains a list of numbers and Column B
contains another list of numbers.

One way to visually represent this data is through a line graph or a scatter plot, where each row in the data set
is represented by a point or a line segment. However, since the data does not have a clear time component, it
might not make sense to use a line graph in this case.

Another way to visually represent this data is through a histogram, which can show the distribution of the
numbers in each column. From the histogram of Column A, it can be observed that most of the numbers are
between 0 and 100, with a few numbers being much larger. From the histogram of Column B, it can be
observed that the numbers are more spread out and do not follow a clear pattern.
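A histogram is built by counting how many values fall into each equal-width bin. The sketch below does that with the standard library. The values are hypothetical, not the assignment's Column A, and it uses half-open bins [start, end); bin-edge conventions differ between tools, so Excel's counts at exact boundaries may not match.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count values per bin [k*w, (k+1)*w), including empty bins in between."""
    counts = Counter(int(v // bin_width) for v in values)
    first, last = min(counts), max(counts)
    # Counter returns 0 for bins that received no values
    return [(k * bin_width, (k + 1) * bin_width, counts[k]) for k in range(first, last + 1)]

# Hypothetical data: mostly small numbers with a few much larger ones
column_a = [5, 5, 12, 30, 45, 47, 60, 88, 95, 410, 760]
for start, end, n in histogram_counts(column_a, 100):
    print(f"{start}-{end}: {n}")
```

Most values land in the first bin while the large values sit far to the right, which is the kind of shape the Column A histogram is described as having.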

The summary statistics for this data set are as follows:

The mean of Column A is 85.8536, while the mean of Column B is 66.4702.
The median of Column A is 45, while the median of Column B is 57.335.
The mode of Column A is 5, while the mode of Column B is 65.
The standard deviation of Column A is 179.4561892, while the standard deviation of Column B is 78.14428961.
From these summary statistics, it can be observed that the mean of Column A is considerably larger than that
of Column B, even though the median of Column A (45) is smaller than the median of Column B (57.335). This
gap between the mean and the median suggests that Column A contains a few very large numbers that are
pulling the average up. The standard deviation for Column A is also much larger than that for Column B,
further indicating that the numbers in Column A are more spread out.
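To see why a few large values pull the mean above the median, the sketch below computes the same four statistics on a small hypothetical data set, with and without its two largest values. It uses the sample standard deviation (n - 1 denominator, as Excel's STDEV.S does); the assignment does not say whether STDEV.S or STDEV.P was used.

```python
import statistics

def summarize(values):
    """Mean, median, mode and sample standard deviation (n - 1 denominator)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),
    }

# Hypothetical data: mostly small numbers plus two very large ones
data = [5, 5, 20, 35, 45, 50, 60, 400, 800]
with_outliers = summarize(data)
without_outliers = summarize(data[:-2])   # drop the two largest values

print(with_outliers)
print(without_outliers)
```

Dropping the two largest values moves the median from 45 to 35 but the mean from about 157.8 to about 31.4, so the mean (and the standard deviation) react far more strongly to extreme values than the median does.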

Overall, this data set has a wide range of values and is more spread out in Column A than in Column B. The
presence of a few very large numbers in Column A is skewing the mean and standard deviation values.
Q2)

The given data represents the sales information of various items from different companies.
The table includes the team name, company name, rate per item, quantity sold, and the total
amount earned for each item sold. The table also includes a summary of total quantity sold,
total rate, and total earnings.

From the given data, we can observe that:

1) The company "Alpha" has sold the most items and earned the highest total amount, followed
by "ADC" and "LG".

2) The items "RAM" and "CD" have been sold most often, with total quantities sold of 45 and
50 respectively.

3) The highest rate per item, 550, is charged for "RAM" by the company "Samsung".

4) The most popular company for sales is "Alpha", with a total of 12 items sold.

5) The total earnings from all the items sold are 160450, and the total quantity sold is 870.
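The per-company totals above come from multiplying rate by quantity for each row and summing by company. A minimal sketch of that aggregation follows; the rows and the (team, company, item, rate, quantity) layout are hypothetical stand-ins, not the assignment's table.

```python
from collections import defaultdict

# Hypothetical rows: (team, company, item, rate, quantity)
sales = [
    ("T1", "Alpha", "CD", 20, 30),
    ("T2", "Samsung", "RAM", 550, 10),
    ("T3", "Alpha", "RAM", 500, 15),
    ("T4", "LG", "CD", 25, 20),
]

def totals_by_company(rows):
    """Sum quantity and amount (rate * quantity) for each company."""
    qty = defaultdict(int)
    amount = defaultdict(int)
    for _team, company, _item, rate, quantity in rows:
        qty[company] += quantity
        amount[company] += rate * quantity
    return dict(qty), dict(amount)

qty, amount = totals_by_company(sales)
top = max(amount, key=amount.get)   # company with the highest total amount
print(top, amount[top])
```

In Excel the same result would typically come from an amount column (rate times quantity) summarized with SUMIF or a pivot table.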

Overall, the given data can be used to analyze the sales performance of each item and
company and to make decisions for future sales strategies.
Q3

This appears to be a table with information about dropout ratios of students in schools in various
states and union territories of India over the course of three years (2012-13 to 2014-15). The data
is organized into rows, with each row representing a particular state/union territory in a particular
year. The columns contain the following information:

● State_UT: the name of the state or union territory
● Year: the year of the data
● Primary_Boys: the dropout ratio of boys in primary school
● Primary_Girls: the dropout ratio of girls in primary school
● Secondary_Boys: the dropout ratio of boys in secondary school
● Secondary_Girls: the dropout ratio of girls in secondary school

The last three rows are incomplete, and some of the cells in the table are empty. There is also
some statistical information presented at the bottom of the table, including the mean, maximum,
minimum, range, quartiles, interquartile range, standard deviation, and coefficient of variation.
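The summary rows listed above can be reproduced for any one column while skipping empty cells. The sketch below assumes empty cells are represented as None and uses hypothetical dropout ratios. statistics.quantiles with method="inclusive" interpolates the same way as Excel's QUARTILE.INC should, but check against the worksheet before relying on it.

```python
import statistics

def describe(column):
    """Summary statistics for one column, ignoring empty cells (None)."""
    values = [v for v in column if v is not None]
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample standard deviation
    return {
        "mean": mean,
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
        "q1": q1, "q2": q2, "q3": q3,
        "iqr": q3 - q1,                    # interquartile range
        "stdev": sd,
        "cv_percent": sd / mean * 100,     # coefficient of variation
    }

# Hypothetical Primary_Boys dropout ratios with one empty cell
primary_boys = [4.0, 6.0, None, 8.0, 10.0, 12.0]
print(describe(primary_boys))
```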
Q4)

CONCLUSION
In Q3, there are 3 groups with 0-10 subscribers, 11 groups with 10-20 subscribers, 14 groups with
20-30 subscribers, 14 groups with 30-40 subscribers, 8 groups with 40-50 subscribers, and 1 group
with 50 subscribers. In Q4, the data is not provided for each frequency range.

In addition to the subscriber and group data, there are also two other variables: CF and FREQ. No
explanation is provided in the data, but in a grouped frequency table these labels most likely stand
for cumulative frequency and frequency.

Lastly, there are two sets of results, P30 and P50 as well as D7 and D9, which appear to be the 30th
and 50th percentiles and the 7th and 9th deciles. For each one, there are values for CF, FREQ, HEIGHT,
and LC; LC is probably the lower class limit and HEIGHT the class width, but this cannot be confirmed
without additional context.

In conclusion, the data provided appears to be incomplete and lacks context to fully understand
the meaning and significance of the variables analyzed.
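If CF, FREQ, LC and HEIGHT are the usual grouped-data quantities (an assumption, since the sheet does not define them), each percentile or decile would follow the standard interpolation formula P = LC + ((k*N/100 - CF) / FREQ) * h, with k*N/10 in place of k*N/100 for deciles and h the class width. The sketch below implements that formula. Purely as an illustration it reuses the class counts listed above and ASSUMES the final class is 50-60; the worksheet's own P30, P50, D7 and D9 values are not shown, so these outputs cannot be checked against them.

```python
def grouped_quantile(classes, fraction):
    """Interpolated quantile for grouped data.

    classes: (lower_limit, width, frequency) tuples in ascending order.
    fraction: 0.30 for P30, 0.50 for P50, 0.70 for D7, 0.90 for D9.
    """
    n = sum(freq for _, _, freq in classes)
    target = fraction * n          # position of the quantile, e.g. 30N/100 for P30
    cf_before = 0                  # cumulative frequency of the classes below
    for lower, width, freq in classes:
        if freq > 0 and cf_before + freq >= target:
            return lower + (target - cf_before) / freq * width
        cf_before += freq
    raise ValueError("fraction must lie between 0 and 1")

# Class counts from the conclusion above; the last class is ASSUMED to be 50-60
classes = [(0, 10, 3), (10, 10, 11), (20, 10, 14), (30, 10, 14), (40, 10, 8), (50, 10, 1)]
for label, frac in [("P30", 0.30), ("P50", 0.50), ("D7", 0.70), ("D9", 0.90)]:
    print(label, round(grouped_quantile(classes, frac), 2))
```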
Q5)

The conclusion of this exercise is that you have successfully created a vector, matrix,
and data frame in Excel. You have learned how to copy and paste values to create a
vector and matrix, and how to combine them to create a data frame.

A vector is a one-dimensional array of values, while a matrix is a two-dimensional
array of values. A data frame is a tabular representation of data that combines
multiple vectors and/or matrices into a single table.
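The same three structures can be sketched outside Excel. Below, a vector is a plain list, a matrix is a list of rows, and a data frame is represented (as an assumption, to stay within the standard library) as a list of row dictionaries built from named, equal-length columns. The names and marks are hypothetical.

```python
# A vector: a one-dimensional list of values
names = ["Asha", "Ravi", "Meera"]

# A matrix: a two-dimensional list of rows (3 rows x 2 columns)
marks = [
    [70, 74],
    [80, 90],
    [88, 92],
]

def make_data_frame(columns):
    """Combine named, equal-length columns into a list of row dictionaries."""
    lengths = {len(col) for col in columns.values()}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same length")
    col_names = list(columns)
    return [dict(zip(col_names, row)) for row in zip(*columns.values())]

# The vector and the matrix's columns become named columns of one table
frame = make_data_frame({
    "name": names,
    "test1": [row[0] for row in marks],
    "test2": [row[1] for row in marks],
})
for row in frame:
    print(row)
```

Libraries such as pandas provide a full DataFrame type; the plain-Python version is only meant to show how a table combines vectors and matrix columns.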

By completing this exercise, you have gained practical experience in using Excel to
create and manipulate data structures commonly used in data analysis and statistics.
This knowledge can be applied to a wide range of real-world scenarios, including
scientific research, business analysis, and financial modeling.
Q6

CONCLUSION

The data lists different fruits in Column A and their corresponding quantities in
Column B.

One way to visually represent this data is through a bar graph, where each fruit is
represented by a bar with a height corresponding to its quantity. From the bar graph,
it can be observed that Honeydew has the highest quantity, followed by Lemon and
Kiwi. On the other hand, Cherry has the lowest quantity among all the fruits.

Another way to represent this data is through a pie chart, where the quantity of each
fruit is shown as a percentage of the total quantity. From the pie chart, it can be
observed that Honeydew accounts for the largest percentage of the total quantity,
followed by Lemon and Kiwi. Elderberry, Fig, and Cherry account for the smallest
percentages.
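A pie chart shows each fruit's quantity divided by the total, expressed as a percentage. The sketch below computes those shares and sorts them from largest to smallest; the quantities are hypothetical, chosen only to match the ordering described above.

```python
# Hypothetical fruit quantities, not the assignment's actual numbers
fruits = {"Honeydew": 35, "Lemon": 25, "Kiwi": 20, "Fig": 9, "Elderberry": 7, "Cherry": 4}

def percentage_shares(quantities):
    """Return (name, percent of total) pairs, largest share first."""
    total = sum(quantities.values())
    shares = {name: qty / total * 100 for name, qty in quantities.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)

for name, pct in percentage_shares(fruits):
    print(f"{name}: {pct:.1f}%")
```

Sorting by share gives the same ranking a bar graph shows, which is why the bar graph and pie chart lead to the same conclusions.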

A line graph can also be used to show the trend of the fruit quantities over time or any
other continuous variable. However, since the data provided does not have any time
component, a line graph is not applicable in this case.

Finally, a histogram or a scatter plot would be useful if the data were continuous numeric
values that needed to be binned or compared against another variable. However, since the
given data is categorical (fruit names, each with a single quantity), a histogram or scatter
plot is not applicable in this case.
Q7)

● Mean: The average reading score is around 70.64.
● Standard Error: The standard error of the mean is 1.54, which indicates the precision of
the sample mean.
● Median: The median reading score is 69.5, the middle of the ordered data; since the count
(98) is even, it is the average of the 49th and 50th values.
● Mode: The mode reading score is 80, which is the most frequent value in the dataset.
● Standard Deviation: The standard deviation of the dataset is 15.28, which measures
how spread out the scores are from the mean.
● Sample Variance: The sample variance is 233.56, which is the sum of the squared
differences from the mean divided by n - 1 (here 97); it is the square of the standard deviation.
● Kurtosis: Excel reports excess kurtosis, for which a normal distribution scores 0. The value of
0.38 is close to 0, so the distribution of scores is close to normal, with slightly heavier tails.
● Skewness: The skewness value of -0.44 indicates that the dataset is slightly skewed to
the left.
● Range: The range of scores is from 23 to 100, with a difference of 77 points between
the minimum and maximum.
● Minimum: The lowest reading score in the dataset is 23.
● Maximum: The highest reading score in the dataset is 100.
● Sum: The sum of all reading scores in the dataset is 6828.
● Count: The total number of reading scores in the dataset is 98.
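Every figure in this list can be recomputed from the raw scores. The sketch below mirrors the rows of Excel's Descriptive Statistics output, using the formulas documented for Excel's SKEW (adjusted Fisher-Pearson skewness) and KURT (excess kurtosis). The scores are hypothetical, and the mode handling is an assumption: Python's statistics.mode returns the first most common value, which may differ from Excel when several values tie.

```python
import math
import statistics

def descriptive_statistics(x):
    """Excel-style descriptive statistics for a sample (needs at least 4 values)."""
    n = len(x)
    mean = statistics.mean(x)
    s = statistics.stdev(x)                      # sample standard deviation (n - 1)
    z = [(v - mean) / s for v in x]              # standardized values
    # SKEW: n / ((n-1)(n-2)) * sum(z^3)
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    # KURT: excess kurtosis, about 0 for normally distributed data
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "standard_error": s / math.sqrt(n),      # standard deviation / sqrt(count)
        "median": statistics.median(x),
        "mode": statistics.mode(x),
        "standard_deviation": s,
        "sample_variance": statistics.variance(x),
        "kurtosis": kurt,
        "skewness": skew,
        "range": max(x) - min(x),
        "minimum": min(x),
        "maximum": max(x),
        "sum": sum(x),
        "count": n,
    }

# Hypothetical reading scores, not the assignment's 98 values
scores = [23, 55, 60, 70, 70, 80, 80, 80, 95, 100]
for key, value in descriptive_statistics(scores).items():
    print(key, round(value, 2))
```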

Q8)
The correlation coefficient of -0.644 suggests a moderate negative correlation between the
variables being analyzed. This means that as one variable increases, the other variable tends to
decrease.
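A coefficient such as -0.644 is typically Pearson's r, which Excel's CORREL returns; the assignment does not state which function was used, so that is an assumption here. The sketch below computes r from its definition, the covariance of x and y divided by the product of their spreads, on hypothetical data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data in which y tends to fall as x rises
x = [1, 2, 3, 4, 5]
y = [88, 85, 80, 82, 70]
print(round(pearson_r(x, y), 3))
```

A negative result means the two variables move in opposite directions; the closer it is to -1, the more tightly the points follow a downward-sloping line.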

However, without knowing which variables were analyzed, it is difficult to provide a more detailed
analysis. Additionally, correlation does not necessarily imply causation, so further analysis is
needed to determine the relationship between the variables and any potential causal factors.
