STD 10 Chap 4 Data Merging Notes

The document discusses data merging, which is the process of combining multiple data sets into a single data frame for analysis. It explains different types of data joins (one-to-one, one-to-many, many-to-many), the concepts of primary and foreign keys, and provides definitions and interpretations of Z-scores, percentiles, quartiles, and deciles. Additionally, it includes formulas and examples to illustrate these concepts.

Uploaded by

kushalthakut809
Copyright
© All Rights Reserved

Std 10

Chap 4: Data merging


Q1) What is data merging?

Ans: Data merging is the process of combining two or more data sets into a single data
frame. This process is necessary when we have raw data stored in multiple files or data
tables that we want to analyse all in one go.

Q2) Explain the three categories of data joins.

Ans: The following are the categories of data joins:

1. One to One Joins

2. One to Many Joins

3. Many to Many Joins

One To One Joins

A one to one join is one of the simplest join techniques. In this type of join, each
row in one table is linked to a single row in another table using a “key” column.

For example, in a company database, each employee has only one Employee ID, and each
Employee ID is assigned to only one employee.
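
The one to one join described above can be sketched in plain Python. This is only a minimal illustration: the table contents, the employee_id values and the function name one_to_one_join are made up for the example and are not part of the original notes.

```python
# Hypothetical example tables, each keyed by a unique employee ID.
employees = {
    101: {"name": "Asha"},
    102: {"name": "Ravi"},
}
salaries = {
    101: {"salary": 50000},
    102: {"salary": 62000},
}


def one_to_one_join(left, right):
    """Merge rows that share a key; each key matches at most one row per table."""
    merged = {}
    for key, left_row in left.items():
        if key in right:  # keep only keys present in both tables
            merged[key] = {**left_row, **right[key]}
    return merged


merged = one_to_one_join(employees, salaries)
print(merged[101])  # {'name': 'Asha', 'salary': 50000}
```

Because every Employee ID appears once in each table, the merged result has exactly one row per employee.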

One To Many Joins

In a one to many join, one record in a table can be related to one or many records in
another table.

For example, each student can borrow multiple books from the school library, while each
borrowed book is issued to only one student.
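
A one to many join for the library example might look like the sketch below. The student names, book titles and the helper one_to_many_join are assumptions made for illustration only.

```python
# Hypothetical tables: each student appears once, but may appear in many loans.
students = [
    {"student_id": 1, "name": "Meera"},
    {"student_id": 2, "name": "Kabir"},
]
loans = [
    {"student_id": 1, "book": "Algebra"},
    {"student_id": 1, "book": "Physics"},
    {"student_id": 2, "book": "History"},
]


def one_to_many_join(one_side, many_side, key):
    """Attach the matching 'one' row to every row on the 'many' side."""
    lookup = {row[key]: row for row in one_side}
    return [{**lookup[row[key]], **row} for row in many_side if row[key] in lookup]


joined = one_to_many_join(students, loans, "student_id")
for row in joined:
    print(row["name"], "borrowed", row["book"])
```

The student row is repeated once for every book that student has borrowed, which is the typical shape of a one to many result.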

Many To Many Joins

A many to many relationship is said to occur when multiple records in one table are related
to multiple records in another table. For example, a many to many relationship exists
between students and courses: a student can register for multiple courses, and a course can
have multiple students.

Q3) What are a primary key and a foreign key?

Ans: Primary keys serve as unique identifiers for each row in a database table. Foreign keys
link data in one table to the data in another table.

A foreign key column in a table points to a column with unique values in another table (often
the primary key column) to create a way of cross-referencing the two tables.
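
The idea of primary and foreign keys can be tried out with Python's built-in sqlite3 module. The sketch below uses a hypothetical schema (students and loans tables) that is not from the original notes. Note that SQLite only enforces foreign keys after they are switched on with a PRAGMA statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: student_id is the primary key of students,
# and loans.student_id is a foreign key that points back to it.
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE loans ("
    " loan_id INTEGER PRIMARY KEY,"
    " student_id INTEGER REFERENCES students(student_id),"
    " book TEXT)"
)

conn.execute("INSERT INTO students VALUES (1, 'Meera')")
conn.execute("INSERT INTO loans VALUES (10, 1, 'Algebra')")  # valid: student 1 exists

try:
    conn.execute("INSERT INTO loans VALUES (11, 99, 'Physics')")  # no student 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("Invalid reference rejected:", rejected)
```

The second loan is refused because its foreign key value 99 does not match any primary key in the students table.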

Q4) What is data merging?

Ans : In Data Science, data merging is the process of combining two or more data sets into a
single data frame. This process is necessary when we have raw data stored in multiple files
or data tables that we want to analyse all in one go.

Q5) What is a join table? In which data join category is a join table used, and why?

Ans : Every record in a join table contains match fields that hold the values of the
primary keys of the two tables that it joins. A join table is used to perform a join on tables
which have a many to many relationship. Since it is not easy to merge tables that have a many
to many relationship directly, a third table, i.e. the join table, is used to break the
relationship into two one to many relationships.
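
A join table for the students and courses example can be sketched as follows. The IDs, names and the enrollments list are hypothetical values chosen only to illustrate the idea.

```python
# Hypothetical tables, each with its own primary key.
students = {1: "Meera", 2: "Kabir"}
courses = {"C1": "Maths", "C2": "Science"}

# Join table: each record holds one student key and one course key.
enrollments = [(1, "C1"), (1, "C2"), (2, "C1")]

# Student -> enrollments and course -> enrollments are each one to many,
# so walking the join table resolves the many to many relationship.
merged = [
    {"student": students[s_id], "course": courses[c_id]}
    for s_id, c_id in enrollments
]

courses_of_meera = [row["course"] for row in merged if row["student"] == "Meera"]
students_in_maths = [row["student"] for row in merged if row["course"] == "Maths"]
print(courses_of_meera)   # ['Maths', 'Science']
print(students_in_maths)  # ['Meera', 'Kabir']
```

One student maps to several enrollment records, and one course maps to several enrollment records, so the single many to many relationship is handled as two one to many relationships.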

Q6) What is a Z-score?

Ans : A Z-score describes the position of a point in terms of its distance from the mean,
measured in standard deviation units. The z-score is positive if the value lies above the
mean and negative if the value lies below the mean.

Q7) Write down Z-score formula.

Ans : The mathematical formula for calculating the z-score is as follows:

Z = (x - μ) / σ

Where, x = raw score, μ = population mean, σ = population standard deviation
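
The formula above can be checked with a short Python sketch. The list of scores is a hypothetical data set (its mean is 5 and its population standard deviation is 2), and the function name z_score is chosen only for this example.

```python
from statistics import mean, pstdev


def z_score(x, data):
    """Return (x - mu) / sigma using the population mean and standard deviation."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation
    return (x - mu) / sigma


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores: mean 5, sigma 2
print(z_score(9, scores))  # 2.0
print(z_score(2, scores))  # -1.5
```

A score of 9 lies 2 standard deviations above the mean, and a score of 2 lies 1.5 standard deviations below it.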

Q 8) How to interpret the Z-score?

Ans : A positive z-score tells us that the raw score is higher than the mean. For
example, if the z-score is equal to +2, the score is 2 standard deviations above the mean. A
negative z-score tells us that the score is below the mean. For example, if a z-score is equal
to -3, the score is 3 standard deviations below the mean. If the z-score is equal to 0, the
score is equal to the mean.

Q9) What is meant by percentile? Explain it using example.

Ans : The percentile of a value can be defined as the percentage of the total ordered
observations that are at or below that value.

Consider the following data set: [10, 12, 15, 17, 13, 22, 16, 23, 20, 24]

To find the percentile for the element 22, we follow the steps below:

1. Sort the dataset in ascending order.

[10, 12, 13, 15, 16, 17, 20, 22, 23, 24]

2. The number of values at or below the element 22 is 8.

The total number of elements in the dataset is 10.

3. By the definition, (8 / 10) * 100 = 80 percent of the values are at or below the element 22.

4. Thus, the element 22 is at the 80th percentile.
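
The steps above can be sketched in Python using the same data set. The function name percentile_of is only an illustrative choice.

```python
def percentile_of(value, data):
    """Percentage of observations that are at or below `value`."""
    ordered = sorted(data)                                # step 1: sort ascending
    at_or_below = sum(1 for x in ordered if x <= value)   # step 2: count values <= value
    return 100 * at_or_below / len(ordered)               # step 3: convert to a percentage


data = [10, 12, 15, 17, 13, 22, 16, 23, 20, 24]
print(percentile_of(22, data))  # 80.0
```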

Q 10) What is meant by Quartiles? Explain IQR and its application.

Ans : The quartiles of a dataset partition the ordered data into four equal parts, with
one-fourth of the data values in each part. The total of 100% is divided into four equal parts
ending at 25%, 50%, 75% and 100%.

The interquartile range (IQR) measures the spread of the middle 50% of the values when they
are ordered from lowest to highest. It is calculated by subtracting the first quartile (Q1)
from the third quartile (Q3).

IQR = Q3 – Q1

An important application of quartiles is in the temperature ranges for the day given in a
weather report. In the presence of irregularities such as outliers, the IQR is more robust
than the full range and gives a better representation of the amount of spread in the data.
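
As a rough illustration, the quartiles and the IQR can be computed with Python's statistics module. The data set is reused from the percentile example. Textbooks use slightly different rules for locating quartiles, so the values below reflect the default method of statistics.quantiles and may differ a little from a hand calculation that uses another rule.

```python
from statistics import quantiles

data = [10, 12, 15, 17, 13, 22, 16, 23, 20, 24]

# quantiles(..., n=4) sorts the data and returns the three cut points Q1, Q2 and Q3.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1  # IQR = Q3 - Q1

print(q1, q2, q3)  # 12.75 16.5 22.25
print(iqr)         # 9.5
```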

Q 11) Explain Deciles.

Ans : Deciles divide the ordered data into ten equal parts, marked by the 10th, 20th, 30th,
40th, 50th, 60th, 70th, 80th, 90th and 100th percentiles. The higher the place in the decile
ranking, the higher the overall ranking.

The mathematical formula to calculate a decile is:

Di = i * (n + 1) / 10 th data

Where n is the number of data values in the population or sample, and i is the number of
the decile. For example:

1st Decile, D1 = 1 * (n + 1) / 10 th data

2nd Decile, D2 = 2 * (n + 1) / 10 th data, and so on.
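
The position formula can be sketched in Python as shown below. The notes only give the position i * (n + 1) / 10, so the way a fractional position is handled here (linear interpolation between the two neighbouring values) is an assumption, and the function name decile is hypothetical. The data set is reused from the percentile example.

```python
def decile(data, i):
    """Return the i-th decile using the position i * (n + 1) / 10."""
    ordered = sorted(data)
    n = len(ordered)
    position = i * (n + 1) / 10   # 1-based position in the ordered data
    lower = int(position)         # whole part of the position
    fraction = position - lower   # fractional part, used to interpolate
    if lower < 1:                 # position falls before the first value
        return ordered[0]
    if lower >= n:                # position falls at or after the last value
        return ordered[-1]
    # Assumption: interpolate linearly between the two neighbouring values.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


data = [10, 12, 15, 17, 13, 22, 16, 23, 20, 24]
print(round(decile(data, 1), 2))  # 10.2
print(round(decile(data, 5), 2))  # 16.5
```

For this data set n = 10, so D1 sits at position 1.1 (just past the first value, 10) and D5 sits at position 5.5 (halfway between 16 and 17).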
