STD 10 Chap 4 Data Merging Notes
Ans: Data merging is the process of combining two or more data sets into a single data
frame. This process is necessary when we have raw data stored in multiple files or data
tables that we want to analyse all in one go.
One to one join is probably one of the simplest join techniques. In this type of join, each
row in one table is linked to a single row in another table using a “key” column.
For example, in a company database, each employee has only one Employee ID, and each
Employee ID is assigned to only one employee.
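A one to one join can be sketched in plain Python as below. This is only an illustration: the table contents, the column names (emp_id, name, salary) and the helper one_to_one_join are made up for this example and are not from the chapter. In practice a data frame library would normally do this work.

```python
# Hypothetical tables: each Employee ID appears exactly once in each table.
employees = [
    {"emp_id": 101, "name": "Asha"},
    {"emp_id": 102, "name": "Ravi"},
    {"emp_id": 103, "name": "Meera"},
]
salaries = [
    {"emp_id": 102, "salary": 52000},
    {"emp_id": 101, "salary": 48000},
    {"emp_id": 103, "salary": 61000},
]


def one_to_one_join(left, right, key):
    """Merge rows that share the same key; reject repeated keys on either side."""
    right_by_key = {}
    for row in right:
        if row[key] in right_by_key:
            raise ValueError(f"key {row[key]!r} repeats in right table")
        right_by_key[row[key]] = row

    merged, seen = [], set()
    for row in left:
        if row[key] in seen:
            raise ValueError(f"key {row[key]!r} repeats in left table")
        seen.add(row[key])
        match = right_by_key.get(row[key])
        if match is not None:
            # Combine the columns of the two matching rows into one row.
            merged.append({**row, **match})
    return merged


merged = one_to_one_join(employees, salaries, "emp_id")
for row in merged:
    print(row)
```

The duplicate check is what makes this a one to one join: if a key repeats in either table, the relationship is no longer one to one.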
In a one to many join, one record in a table can be related to one or many records in
another table.
For example, each student can be issued multiple books by the school library.
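A minimal sketch of a one to many join, assuming made-up student and library records (the names student_id, book and one_to_many_join are hypothetical). Each row on the "many" side picks up the columns of its single matching row on the "one" side.

```python
# "One" side: each student_id appears once.
students = [
    {"student_id": 1, "name": "Asha"},
    {"student_id": 2, "name": "Ravi"},
]
# "Many" side: the same student_id can appear on several rows.
books_issued = [
    {"book": "Algebra", "student_id": 1},
    {"book": "Physics", "student_id": 1},
    {"book": "History", "student_id": 2},
]


def one_to_many_join(one_side, many_side, key):
    """Attach the matching 'one' row to every 'many' row that shares its key."""
    one_by_key = {row[key]: row for row in one_side}
    return [
        {**one_by_key[row[key]], **row}
        for row in many_side
        if row[key] in one_by_key
    ]


issued = one_to_many_join(students, books_issued, "student_id")
for row in issued:
    print(row["name"], "->", row["book"])
```

The result has one row per issued book, so a student with two books appears twice.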
Many To Many Joins
A many to many relationship is said to occur when multiple records in one table are related
to multiple records in another table. For example, a many to many relationship exists
between students and courses. A student can register for multiple courses, and a course can
have multiple students.
Ans: Primary keys serve as unique identifiers for each row in a database table. Foreign keys
link data in one table to the data in another table.
A foreign key column in a table points to a column with unique values in another table (often
the primary key column) to create a way of cross-referencing the two tables.
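The two ideas can be checked with a small sketch. The tables, column names (dept_id, teacher_id) and helper functions below are hypothetical and only illustrate the definitions: a primary key column has no repeated values, and every foreign key value should exist in the column it points to.

```python
# Hypothetical parent table: dept_id is the primary key.
departments = [
    {"dept_id": 10, "dept_name": "Science"},
    {"dept_id": 20, "dept_name": "Arts"},
]
# Hypothetical child table: dept_id here is a foreign key into departments.
teachers = [
    {"teacher_id": 1, "name": "Mr. Rao", "dept_id": 10},
    {"teacher_id": 2, "name": "Ms. Iyer", "dept_id": 20},
    {"teacher_id": 3, "name": "Mr. Das", "dept_id": 30},  # no such department
]


def is_primary_key(rows, column):
    """True if every value in the column is unique."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def unmatched_foreign_keys(child_rows, fk_column, parent_rows, pk_column):
    """Return child rows whose foreign key has no matching parent row."""
    parent_keys = {row[pk_column] for row in parent_rows}
    return [row for row in child_rows if row[fk_column] not in parent_keys]


print(is_primary_key(departments, "dept_id"))
print(unmatched_foreign_keys(teachers, "dept_id", departments, "dept_id"))
```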
Ans : In Data Science, data merging is the process of combining two or more data sets into a
single data frame. This process is necessary when we have raw data stored in multiple files
or data tables that we want to analyse all in one go.
Q5) What is a join table? In which data join category is a join table used, and why?
Ans : Every record in a join table contains match fields that hold the primary key values of
the two tables that it joins. A join table is used to join tables that have a many to many
relationship. Since tables with a many to many relationship are not easy to merge directly,
a third table, i.e. the join table, is used to break the relationship into two one to many
relationships.
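A minimal sketch of a join table, assuming made-up students, courses and enrolment records (all names here are hypothetical). Each row of the join table holds one student key and one course key, so the many to many relationship is walked as two one to many lookups.

```python
from collections import defaultdict

# Hypothetical tables keyed by their primary keys.
students = {1: "Asha", 2: "Ravi"}
courses = {"C1": "Maths", "C2": "Science"}

# Join table: each row pairs one student key with one course key.
enrolments = [(1, "C1"), (1, "C2"), (2, "C1")]

courses_by_student = defaultdict(list)
students_by_course = defaultdict(list)
for student_id, course_id in enrolments:
    # Student -> enrolments is one to many; course -> enrolments is one to many.
    courses_by_student[students[student_id]].append(courses[course_id])
    students_by_course[courses[course_id]].append(students[student_id])

print(dict(courses_by_student))
print(dict(students_by_course))
```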
Q6) What is Z-score?
Ans : A Z-score describes the position of a point in terms of its distance from the mean,
measured in standard deviation units. The z-score is positive if the value lies above the
mean and negative if it lies below the mean.
Ans : The mathematical formula for calculating the z-score is as follows:
Z = (x - μ) / σ
where x = raw score, μ = population mean, σ = population standard deviation
Ans : A positive z-score tells us that the raw score is higher than the mean. For
example, if the z-score is equal to +2, the score is 2 standard deviations above the mean. A
negative z-score tells us that the score is below the mean. For example, if a z-score is equal
to -3, the score is 3 standard deviations below the mean. If the z-score is equal to 0, the
score is equal to the mean.
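The z-score formula and its sign can be checked with a short sketch. The data set and the helper z_score below are made up for illustration; the population mean and population standard deviation come from Python's statistics module.

```python
import statistics


def z_score(x, population):
    """Distance of x from the population mean, in standard deviation units."""
    mu = statistics.fmean(population)      # population mean
    sigma = statistics.pstdev(population)  # population standard deviation
    if sigma == 0:
        raise ValueError("standard deviation is zero; z-score is undefined")
    return (x - mu) / sigma


# Hypothetical scores with mean 5 and population standard deviation 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(z_score(9, scores))  # above the mean: positive
print(z_score(2, scores))  # below the mean: negative
print(z_score(5, scores))  # equal to the mean: zero
```

With these scores, 9 is 2 standard deviations above the mean, 2 is 1.5 standard deviations below it, and 5 sits exactly on the mean.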
Ans : The percentile of a value can be defined as the percentage of the total ordered
observations that are at or below that value.
Consider the following data set: [10, 12, 15, 17, 13, 22, 16, 23, 20, 24]
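Here, to find the percentile for element 22, we follow the steps below:
1. Arrange the data in ascending order: [10, 12, 13, 15, 16, 17, 20, 22, 23, 24]
2. Count the values at or below 22: there are 8 such values out of 10.
3. By the definition, 8/10 = 80 percent of the values are at or below the element 22.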
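The same calculation can be sketched in Python. The helper percentile_rank is a hypothetical name and follows the "at or below" definition used in this answer; other textbooks use slightly different conventions.

```python
def percentile_rank(value, data):
    """Percentage of observations that are at or below the given value."""
    ordered = sorted(data)  # arrange the data in ascending order
    at_or_below = sum(1 for v in ordered if v <= value)
    return 100 * at_or_below / len(ordered)


data = [10, 12, 15, 17, 13, 22, 16, 23, 20, 24]
print(percentile_rank(22, data))  # 80.0
```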
Ans : The quartiles of a dataset partition the data into four equal parts, with one-fourth of
the data values in each part. The total of 100% is divided at the 25%, 50%, 75% and 100%
marks.
The interquartile range (IQR) can be defined as a measure of the middle 50% of the values
when ordered from lowest to highest. It can be calculated by subtracting the first quartile
(Q1) from the third quartile (Q3).
IQR = Q3 – Q1
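A minimal sketch of the IQR calculation using statistics.quantiles from the Python standard library. The data set is only an example, and the quartile values depend on the interpolation convention: this sketch uses the function's default "exclusive" method, so other methods may give slightly different Q1 and Q3.

```python
import statistics

data = [10, 12, 13, 15, 16, 17, 20, 22, 23, 24]

# n=4 returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("Q1 =", q1)
print("Q3 =", q3)
print("IQR =", iqr)
```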
weather report. In the presence of irregularities, IQR is more robust as well as a better
Ans : Deciles sort the data into ten equal parts: the 10th, 20th, 30th, 40th, 50th, 60th, 70th,
80th, 90th, 100th. The higher the place in the decile ranking, the higher is the overall
ranking.
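As a rough sketch, the nine cut points that divide ordered data into ten parts can be computed with statistics.quantiles. The data set is only an example. This assumes the function's default "exclusive" method, which places the ith cut point at position i * (n + 1) / 10 in the ordered data and interpolates between neighbouring values.

```python
import statistics

data = [10, 12, 13, 15, 16, 17, 20, 22, 23, 24]

# n=10 returns the nine cut points D1 ... D9.
deciles = statistics.quantiles(data, n=10)

for i, cut in enumerate(deciles, start=1):
    print(f"D{i} = {cut}")
```

The 5th decile is the same as the median of the data.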
Di is the ith decile and can be represented as:
1st Decile, D1 = 1 * (n + 1) / 10th data
2nd Decile,