Data Science Notes
When a data set is large, we can analyze a selected part of it instead of working with the
whole data set. A smaller set of data taken from a larger set in this way is known as a
subset.
Row-based subsetting:
Row-based subsetting, also known as filtering or selecting rows, is a technique used
to extract specific rows from a dataset based on certain criteria.
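As a minimal sketch of row-based subsetting in plain Python, the example below keeps only
the rows that meet one criterion. The records, the field names and the age threshold are
all made up for illustration; they are not part of these notes.

```python
# Hypothetical records: the names and values are invented for illustration.
rows = [
    {"name": "Asha", "age": 34, "city": "Pune"},
    {"name": "Ben", "age": 19, "city": "Leeds"},
    {"name": "Chen", "age": 42, "city": "Pune"},
]

# Keep only the rows that satisfy the criterion (age of 30 or more).
rows_30_plus = [row for row in rows if row["age"] >= 30]

print([row["name"] for row in rows_30_plus])  # ['Asha', 'Chen']
```

Every selected row keeps all of its columns; only the number of rows changes.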
Column-based subsetting:
When data is selected from specific columns of the dataset, the process is known as
column-based subsetting.
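A short sketch of column-based subsetting in plain Python follows. The records and the
column names are hypothetical, chosen only to show the idea of keeping some columns and
dropping the rest.

```python
# Hypothetical records: the names and values are invented for illustration.
rows = [
    {"name": "Asha", "age": 34, "city": "Pune"},
    {"name": "Ben", "age": 19, "city": "Leeds"},
    {"name": "Chen", "age": 42, "city": "Pune"},
]

# The columns we want to keep; every other column is dropped.
wanted_columns = ["name", "city"]

# Build a new record per row that contains only the wanted columns.
column_subset = [{col: row[col] for col in wanted_columns} for row in rows]

print(column_subset[0])  # {'name': 'Asha', 'city': 'Pune'}
```

Here the number of rows stays the same and only the set of columns shrinks.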
Data-based subsetting:
Data-based subsetting is a technique that extracts a smaller, representative portion
of a larger dataset.
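One common way to obtain a smaller portion of a dataset is simple random sampling. The
sketch below assumes that approach; the notes do not say which method is intended, and the
population, sample size and seed are hypothetical.

```python
import random

# Hypothetical "large" dataset: the integers 1 to 100.
population = list(range(1, 101))

# A seeded generator makes the sample reproducible between runs.
rng = random.Random(0)

# Draw 10 distinct items at random, without replacement.
sample = rng.sample(population, 10)

print(len(sample))  # 10
```

A random sample of this size is not guaranteed to be representative; larger samples or
stratified sampling are usually needed when the data contains distinct groups.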
Mean:
Mean is a measure of central tendency. In data science, the mean, also called the simple
average, is the average value of a data set: the sum of all values divided by the number
of values. It is the value around which the data is spread out, and it does not have to be
one of the values in the data set.
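As a small illustration, the mean can be computed in Python by dividing the sum of the
values by their count. The sample values below are used only for illustration.

```python
# Sample values used only for illustration.
values = [1, 2, 3, 5, 8]

# Mean = sum of all values divided by the number of values.
mean = sum(values) / len(values)

print(mean)  # 3.8
```

Note that 3.8 is not itself one of the values; the mean only describes their center.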
Example
Mean VS Median
Mean
1. Calculate the mean by adding up all the data values and dividing the sum by the number
of values.
2. Subtract the mean from every value.
3. Square each of the differences.
4. Find the average of the squared differences from step 3. This is the variance.
5. Lastly, find the square root of the variance. That is the standard deviation.
Example
Values: [1, 2, 3, 5, 8]
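The five steps above can be applied to these values with a short Python sketch. It divides
by the number of values in step 4, as the steps describe, which gives the population
variance; the variable names are chosen here for illustration.

```python
import math

values = [1, 2, 3, 5, 8]

# Step 1: mean = sum of the values divided by their count.
mean = sum(values) / len(values)

# Step 2: subtract the mean from every value.
differences = [v - mean for v in values]

# Step 3: square each difference.
squared = [d ** 2 for d in differences]

# Step 4: the average of the squared differences is the variance.
variance = sum(squared) / len(squared)

# Step 5: the square root of the variance is the standard deviation.
std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))  # 3.8 6.16 2.48
```

The standard library function statistics.pstdev computes the same population standard
deviation. statistics.stdev divides by one less than the number of values instead, which
is the usual choice when the data is a sample from a larger population.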