Data Science
Your Name
June 7, 2025
Contents
1 Statistics & Probability
1.1 Descriptive Statistics
1.1.1 Measures of Central Tendency
1.1.2 Measures of Dispersion
1.1.3 Example
1.2 Probability Distributions
1.2.1 Discrete Distributions
1.2.2 Continuous Distributions
1.3 Hypothesis Testing
1.3.1 Common Tests
1.4 Practice Questions
3 Data Wrangling
3.1 Handling Missing Data
3.1.1 Example in Pandas
3.2 Data Transformation
3.3 Feature Engineering
3.4 Practice Questions
4 Data Visualization
4.1 Types of Visualizations
4.2 Seaborn Examples
4.3 Best Practices
4.4 Practice Project
6 SQL for Data Science
6.1 Basic Queries
6.2 Joins
6.2.1 Example
6.3 Practice Questions
7 Model Evaluation
7.1 Classification Metrics
7.2 Confusion Matrix
7.3 ROC & AUC
7.4 Practice Project
8 End-to-End Projects
8.1 Project 1: Customer Segmentation
8.2 Project 2: House Price Prediction
8.3 Project 3: Sentiment Analysis
• Standard Deviation: $\sigma = \sqrt{\sigma^2}$
1.1.3 Example
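Given dataset: [12, 15, 18, 22, 25, 25, 30] (n = 7)
• Mean = (12+15+18+22+25+25+30)/7 = 147/7 = 21
• Median = 22 (the 4th of the 7 sorted values)
• Mode = 25 (the only value that occurs twice)
• Variance: the squared deviations from the mean are 81, 36, 9, 1, 16, 16, 81, summing to 240, so the population variance is 240/7 ≈ 34.29 (the sample variance, dividing by n − 1 = 6, is 40)
• Standard Deviation = √34.29 ≈ 5.86 (sample: √40 ≈ 6.32)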
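The worked example can be double-checked with Python's built-in statistics module. This is a minimal sketch; it reports both the population versions (divide by n) and the sample versions (divide by n − 1), since which one a text means by "variance" depends on its convention.

```python
import statistics

data = [12, 15, 18, 22, 25, 25, 30]

mean = statistics.mean(data)          # 147 / 7 = 21
median = statistics.median(data)      # middle of the sorted values = 22
mode = statistics.mode(data)          # most frequent value = 25

# Population statistics divide the sum of squared deviations (240) by n = 7
pop_var = statistics.pvariance(data)  # 240 / 7 ≈ 34.29
pop_std = statistics.pstdev(data)     # ≈ 5.86

# Sample statistics divide by n - 1 = 6
samp_var = statistics.variance(data)  # 240 / 6 = 40
samp_std = statistics.stdev(data)     # ≈ 6.32

print(f"mean={mean}, median={median}, mode={mode}")
print(f"population: variance={pop_var:.2f}, std={pop_std:.2f}")
print(f"sample:     variance={samp_var:.2f}, std={samp_std:.2f}")
```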
1.2 Probability Distributions
1.2.1 Discrete Distributions
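• Binomial: Fixed number of independent trials n, two outcomes per trial, constant success probability p; $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$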
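The binomial probability of exactly k successes in n independent trials, each succeeding with probability p, is C(n, k) · p^k · (1 − p)^(n − k). A direct Python sketch using only the standard library (the function name binomial_pmf is illustrative, not from any library):

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 3 heads in 5 fair coin flips: C(5, 3) / 2**5 = 10 / 32
print(binomial_pmf(3, 5, 0.5))  # 0.3125

# The probabilities over all possible k sum to 1
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
```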
2 Python for Data Science
2.1 Python Basics
```python
# Variables and data types
x = 5              # integer
y = 3.14           # float
name = "Alice"     # string
is_true = True     # boolean

# Lists
numbers = [1, 2, 3, 4, 5]
numbers.append(6)

# Loops
for i in range(5):
    print(i)
```
```python
import numpy as np

# Create array
arr = np.array([1, 2, 3, 4, 5])

# Array operations
mean = np.mean(arr)     # 3.0
std_dev = np.std(arr)   # population std (ddof=0); np.std(arr, ddof=1) gives the sample std

# Matrix operations
matrix = np.array([[1, 2], [3, 4]])
inverse = np.linalg.inv(matrix)  # raises numpy.linalg.LinAlgError for a singular matrix
```
2.2.2 Pandas
```python
import pandas as pd

# Create DataFrame
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)

# Data operations
mean_age = df['Age'].mean()     # 27.5
filtered = df[df['Age'] > 25]   # boolean mask keeps only rows with Age > 25 (Bob)
```
2.2.3 Matplotlib
2.3 Practice Project
Analyze the Titanic dataset:
• Load data using pandas
• Clean missing values
• Calculate survival rates by passenger class
• Visualize age distribution
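The four steps can be sketched with pandas. This is a minimal sketch under stated assumptions: a small made-up DataFrame stands in for the real file (which you would load with pd.read_csv and your own path), and the column names Survived, Pclass and Age are assumed to follow the commonly distributed Titanic CSV. Median imputation is one choice among the strategies in the Data Wrangling chapter.

```python
import numpy as np
import pandas as pd

# 1. Load data: made-up stand-in rows; with the real dataset use pd.read_csv(...)
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 0, 1, 1, 0],      # 1 = survived, 0 = did not
    "Pclass":   [1, 1, 2, 2, 3, 3, 1, 3],      # passenger class 1-3
    "Age":      [38, np.nan, 26, 35, np.nan, 4, 58, 20],
})

# 2. Clean missing values: fill missing ages with the median known age
df["Age"] = df["Age"].fillna(df["Age"].median())

# 3. Survival rate by class: the mean of a 0/1 column is the fraction of 1s
survival_by_class = df.groupby("Pclass")["Survived"].mean()
print(survival_by_class)

# 4. Age distribution: counts per age band (a text histogram);
#    df["Age"].plot.hist() draws the same data with matplotlib
age_bands = pd.cut(df["Age"], bins=[0, 18, 40, 80])
print(age_bands.value_counts().sort_index())
```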
3 Data Wrangling
3.1 Handling Missing Data
Strategies:
• Deletion (remove rows/columns)
• Imputation (mean, median, mode)
• Prediction (model-based imputation)
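The first two strategies map directly onto pandas methods. A minimal sketch on a hypothetical toy DataFrame (columns age, income and city are made up for illustration): dropna for deletion, fillna with the median, mean or mode for imputation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in every column
df = pd.DataFrame({
    "age": [25, np.nan, 35, 45, np.nan],
    "income": [40_000, 52_000, np.nan, 80_000, 61_000],
    "city": ["A", "B", None, "B", "B"],
})

# Deletion: drop every row that contains at least one missing value
# (df.dropna(axis=1) would drop columns instead; here every column has a gap)
rows_dropped = df.dropna()

# Imputation: median for one numeric column, mean for another, mode for the categorical one
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].median())
imputed["income"] = imputed["income"].fillna(imputed["income"].mean())
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])

print(rows_dropped)
print(imputed)
```

Median imputation is less sensitive to outliers than the mean. For the model-based strategy, scikit-learn's KNNImputer (in sklearn.impute) is one option that estimates each gap from similar rows.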
4 Data Visualization
4.1 Types of Visualizations
• Histograms: Distribution of data
• Scatter plots: Relationship between variables
• Bar charts: Compare categories
• Box plots: Show spread and outliers
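The four chart types can be drawn side by side with matplotlib. This is a sketch on randomly generated or made-up data; the output filename visualization_types.png is an arbitrary choice, and the Agg backend is selected only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove to show plots interactively
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=500)          # one numeric variable
x = rng.uniform(0, 10, size=100)
y = 2 * x + rng.normal(scale=2, size=100)                # two related variables
categories = ["A", "B", "C"]
counts = [23, 41, 17]                                    # made-up count per category
groups = [rng.normal(0, 1, 200), rng.normal(1, 2, 200)]  # two groups to compare

fig, axes = plt.subplots(2, 2, figsize=(9, 7))
axes[0, 0].hist(values, bins=20)
axes[0, 0].set_title("Histogram: distribution")
axes[0, 1].scatter(x, y, s=10)
axes[0, 1].set_title("Scatter: relationship")
axes[1, 0].bar(categories, counts)
axes[1, 0].set_title("Bar: category comparison")
axes[1, 1].boxplot(groups)
axes[1, 1].set_title("Box: spread and outliers")
fig.tight_layout()
fig.savefig("visualization_types.png")
```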
5.2 Unsupervised Learning
• Clustering: Group similar data points
– K-means
– Hierarchical
• Dimensionality Reduction:
– PCA
– t-SNE
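K-means and PCA are short enough to sketch in plain NumPy. This is a minimal illustration, not a library implementation: kmeans runs Lloyd's algorithm (assign each point to its nearest centroid, then move each centroid to the mean of its points) with a farthest-first initialization chosen here for determinism, and pca projects the centered data onto its top right-singular vectors. Hierarchical clustering and t-SNE are not shown. The two 3-D blobs are synthetic.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm for k-means clustering."""
    rng = np.random.default_rng(seed)
    # Farthest-first initialization: start from one random point, then keep
    # adding the point that is farthest from every centroid chosen so far
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist_to_nearest = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[dist_to_nearest.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def pca(X, n_components):
    """Project X onto its top principal components using an SVD of the centered data."""
    X_centered = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]  # principal directions, one per row
    explained_variance = S[:n_components] ** 2 / (len(X) - 1)
    return X_centered @ components.T, explained_variance


# Two well-separated synthetic 3-D blobs of 50 points each
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 3)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 3)),
])

labels, centroids = kmeans(X, k=2)
X_2d, explained = pca(X, n_components=2)
print(np.bincount(labels))  # points per cluster
print(X_2d.shape)           # (100, 2)
```

In practice, scikit-learn's KMeans and PCA classes cover the same ground with more options; the point of the sketch is the two alternating k-means steps and the SVD view of PCA.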
6.2 Joins
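• INNER JOIN: Only rows whose join key matches in both tables
• LEFT JOIN: All rows from the left table, plus matching right-table rows (NULL where there is no match)
• RIGHT JOIN: All rows from the right table, plus matching left-table rows (NULL where there is no match)
• FULL JOIN: All rows from both tables, matched where possible and filled with NULL otherwise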
6.2.1 Example
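The four join types can be tried out with Python's built-in sqlite3 module on two hypothetical tables, customers and orders (names and rows are made up). Native RIGHT and FULL OUTER JOIN only arrived in SQLite 3.39.0, so this sketch writes RIGHT JOIN as a LEFT JOIN with the tables swapped and FULL JOIN as a LEFT JOIN plus the unmatched right-hand rows, which works on older SQLite builds too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO orders VALUES (10, 1, 99.5), (11, 1, 20.0), (12, 2, 45.0), (13, 4, 15.0);
""")


def run(sql):
    return cur.execute(sql).fetchall()


# INNER JOIN: only customers with a matching order (Carol and order 13 drop out)
inner = run("""
SELECT c.name, o.order_id FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
ORDER BY o.order_id
""")  # [('Alice', 10), ('Alice', 11), ('Bob', 12)]

# LEFT JOIN: every customer; order columns are NULL (None) when unmatched
left = run("""
SELECT c.name, o.order_id FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
ORDER BY c.id, o.order_id
""")  # [('Alice', 10), ('Alice', 11), ('Bob', 12), ('Carol', None)]

# RIGHT JOIN: every order; written as a LEFT JOIN with the tables swapped
right = run("""
SELECT c.name, o.order_id FROM orders o
LEFT JOIN customers c ON c.id = o.customer_id
ORDER BY o.order_id
""")  # [('Alice', 10), ('Alice', 11), ('Bob', 12), (None, 13)]

# FULL JOIN: the LEFT JOIN rows plus orders that match no customer
full = run("""
SELECT c.name, o.order_id FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
UNION ALL
SELECT c.name, o.order_id FROM orders o
LEFT JOIN customers c ON c.id = o.customer_id
WHERE c.id IS NULL
""")
print(full)
```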
7 Model Evaluation
7.1 Classification Metrics
• Accuracy: $\frac{TP + TN}{TP + TN + FP + FN}$
• Precision: $\frac{TP}{TP + FP}$
• Recall/Sensitivity: $\frac{TP}{TP + FN}$
• F1 Score: $2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
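These metrics follow directly from the four confusion-matrix counts. A minimal plain-Python sketch for binary labels (1 = positive); the function name is illustrative, and returning 0.0 when a denominator is zero is a common convention assumed here rather than a universal rule.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 4 actual positives (3 caught, 1 missed) and 6 actual negatives (1 false alarm)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
# TP=3, TN=5, FP=1, FN=1 -> accuracy 0.8, precision 0.75, recall 0.75, F1 0.75
```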
8 End-to-End Projects
8.1 Project 1: Customer Segmentation
1. Load customer transaction data
5. Visualize segments
6. Recommend marketing strategies
Conclusion
This guide covers the essential data science skills from statistics to machine learning. Master these concepts
through practice and real-world projects to become proficient in data science.