Data Science with Python
Module 1: Introduction to Python
Sessions 1-2

Topic:
- Python Basics with Jupyter Notebook
- Download and Install Python: Anaconda
- Python Data types: Numbers, Strings
- Data Structures of Python: Lists, Tuples, Dictionaries, and Sets
- Operators in Python
- Libraries
- Self-Reading: Conditional Statements (if, elif, and else statements), Looping Statements (for loop and while loop), Functions

Functions:
- Python Data types: Numbers, Strings
  - Numbers: int(), float()
  - Strings: str(), len(), upper(), lower()
- Data Structures of Python: Lists, Tuples, Dictionaries, and Sets
  - Lists: append(), remove(), pop(), extend()
  - Tuples: tuple(), count(), index()
  - Dictionaries: dict(), keys(), values(), items()
  - Sets: set(), add(), remove(), union()
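
The built-in functions and methods listed for Sessions 1-2 can be tried in a single notebook cell. The sketch below is only an illustration; the variable names and sample values are assumptions, not part of the course material.

```python
# Numbers: int() and float() convert strings (or other numbers) to numeric types.
count = int("42")
price = float("3.5")

# Strings: str(), len(), upper(), lower()
label = str(count)            # number -> string
name = "Data Science"
name_length = len(name)       # number of characters, including the space
shout = name.upper()
quiet = name.lower()

# Lists are mutable sequences: append(), extend(), remove(), pop()
scores = [70, 85]
scores.append(90)             # add one item at the end
scores.extend([60, 75])       # add several items at the end
scores.remove(85)             # delete the first occurrence of a value
last = scores.pop()           # remove and return the last item

# Tuples are immutable sequences: tuple(), count(), index()
point = tuple([3, 4, 3])
threes = point.count(3)       # how many times 3 appears
first_four = point.index(4)   # position of the first 4

# Dictionaries map keys to values: dict(), keys(), values(), items()
student = dict(name="Asha", grade=9)   # hypothetical record
keys = list(student.keys())
values = list(student.values())
pairs = list(student.items())

# Sets hold unique items: set(), add(), remove(), union()
seen = set([1, 2, 2, 3])      # the duplicate 2 is dropped
seen.add(4)
seen.remove(1)
merged = seen.union({10, 11})
```

Note that remove() exists on both lists and sets, but a list removes only the first matching item while a set holds at most one copy of each item.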
Module 2: Data Preprocessing (Data Handling with Pandas & NumPy)
Sessions 3-4

Topic:
- NumPy Basics: Arrays and Vectorized Computation
- Pandas Basics
- Working with Series and Data Frame
- Data Loading Methods
- Obtaining Basic Information About Data Frame

Functions:
- Creating arrays: np.array(), np.zeros(), np.ones()
- Array operations: np.add(), np.multiply(), np.dot(), np.transpose()
- Statistical methods: np.mean(), np.std()
- Working with Series and Data Frame
  - Series: pandas.Series()
  - DataFrame: pandas.DataFrame()
- Data Loading Methods
  - Reading CSV files: pandas.read_csv()
  - Reading Excel files: pandas.read_excel()
- Obtaining Basic Information About Data Frame
  - DataFrame info: dataframe.info(), dataframe.describe()
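
The NumPy and pandas calls named for this part of Sessions 3-4 fit together roughly as follows. This is a minimal sketch: the arrays, the column names, and the in-memory CSV text are made-up stand-ins for a real file, and pandas.read_excel() is left out because it needs an actual Excel file.

```python
import io

import numpy as np
import pandas as pd

# Creating arrays
a = np.array([[1, 2], [3, 4]])
zeros = np.zeros((2, 2))
ones = np.ones((2, 2))

# Array operations (all vectorized, no explicit loops)
summed = np.add(a, ones)        # element-wise addition
scaled = np.multiply(a, 2)      # element-wise multiplication
product = np.dot(a, a)          # matrix product
flipped = np.transpose(a)       # rows become columns

# Statistical methods
mean_value = np.mean(a)
std_value = np.std(a)           # population standard deviation by default

# Series and DataFrame
prices = pd.Series([10.0, 12.5, 9.0], name="price")
frame = pd.DataFrame({"item": ["pen", "book", "cup"], "price": prices})

# Data loading: read_csv() accepts a path or any file-like object.
# The CSV text below is hypothetical sample data.
csv_text = "item,price\npen,10.0\nbook,12.5\ncup,9.0\n"
loaded = pd.read_csv(io.StringIO(csv_text))

# Basic information about the DataFrame
loaded.info()                   # prints column types and non-null counts
summary = loaded.describe()     # count, mean, std, min, quartiles, max
```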
Sessions 3-4 (continued)

Topic:
- Data Cleaning and Preparation: Advanced Pandas
- Handling Missing Data
- Encoding
- Scaling
- Outliers
- Split the data

Functions:
- Handling Missing Data
  - Dropping missing values: dataframe.dropna()
  - Filling missing values: dataframe.fillna()
- Encoding categorical variables: pandas.get_dummies()
- Scaling features: sklearn.preprocessing.StandardScaler()
- Detecting outliers: dataframe[dataframe['column'] > threshold]
- Splitting data: sklearn.model_selection.train_test_split()
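
One possible ordering of the cleaning steps listed above is sketched below. The DataFrame, its column names, and the outlier threshold of 90 are all hypothetical; the choice to fit the scaler on the training split only is a common practice assumed here, not something the syllabus specifies.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data with missing values and one extreme age
raw = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 29.0, 95.0],
    "city": ["Pune", "Delhi", "Pune", None, "Delhi", "Pune"],
    "bought": [0, 1, 0, 1, 0, 1],
})

# Handling missing data: either drop incomplete rows or fill the gaps
dropped = raw.dropna()
filled = raw.fillna({"age": raw["age"].mean(), "city": "Unknown"})

# Encoding: one indicator column per category (dtype=float keeps columns numeric)
encoded = pd.get_dummies(filled, columns=["city"], dtype=float)

# Detecting outliers with a boolean mask (threshold is an assumed cutoff)
threshold = 90
outliers = encoded[encoded["age"] > threshold]
cleaned = encoded[encoded["age"] <= threshold]

# Splitting data into features/target and train/test parts
X = cleaned.drop(columns=["bought"])
y = cleaned["bought"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Scaling: learn mean and standard deviation from the training part only,
# then apply the same transformation to the test part
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```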
Module 3: Exploratory Data Analysis
Sessions 5-6

Topic:
- Data Visualisation
- Matplotlib
- Seaborn
- Hands-on project: Analyze retail sales data.

Functions:
- Matplotlib: plt.plot(), plt.bar()
- Seaborn: sns.heatmap(), sns.scatterplot()
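
As a starting point for the retail sales project, the two Matplotlib calls listed above might be used as in the sketch below. The monthly figures are invented, the "Agg" backend is chosen only so the code runs without a display, and Seaborn is left out to keep the example to one plotting library.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt

# Hypothetical monthly retail sales
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 150, 160]

fig = plt.figure(figsize=(8, 3))

# Line chart: good for showing a trend over time
plt.subplot(1, 2, 1)
lines = plt.plot(months, sales, marker="o")
plt.title("Monthly sales (line)")
plt.ylabel("Sales")

# Bar chart: good for comparing categories side by side
plt.subplot(1, 2, 2)
bars = plt.bar(months, sales)
plt.title("Monthly sales (bar)")

plt.tight_layout()

# Save to an in-memory buffer; pass a file name instead to write a PNG to disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```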
Module 4: Statistical Analysis & Hypothesis Testing
Sessions 7-8

Topic:
- SciPy
- Descriptive Statistics
- Hypothesis Testing: t-tests, chi-square tests, ANOVA
- Case study: A/B testing for marketing campaigns
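
The syllabus does not name specific SciPy functions for these sessions. The sketch below assumes scipy.stats.ttest_ind, scipy.stats.chi2_contingency, and scipy.stats.f_oneway as one reasonable mapping of the three tests, applied to invented A/B campaign data.

```python
import numpy as np
from scipy import stats

# Hypothetical order values (per customer) under three campaign variants
group_a = np.array([12.1, 11.8, 12.6, 12.0, 11.5, 12.3])
group_b = np.array([12.9, 13.1, 12.7, 13.4, 12.8, 13.0])
group_c = np.array([12.4, 12.2, 12.9, 12.5, 12.1, 12.6])

# Descriptive statistics
mean_a = np.mean(group_a)
std_a = np.std(group_a, ddof=1)   # sample standard deviation

# t-test: do variants A and B differ in their mean order value?
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Chi-square test: is conversion independent of the variant?
# Rows are variants A and B; columns are converted / not converted.
table = np.array([[30, 170],
                  [50, 150]])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

# One-way ANOVA: do the three variants share the same mean?
f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)

print(f"t-test p-value:     {t_p:.4f}")
print(f"chi-square p-value: {chi_p:.4f}")
print(f"ANOVA p-value:      {f_p:.4f}")
```

A small p-value (commonly below 0.05) is usually read as evidence against the null hypothesis of no difference.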
Module 5: Predictive Analytics
Session 9

Topic:
- Predictive Modeling with Scikit-Learn
- Correlation
- Regression
- Model Evaluation
- Case study: Predict sales based on advertising spend.

Functions:
- Calculating correlation: dataframe.corr()
- Linear regression: sklearn.linear_model.LinearRegression()
- Evaluating models: sklearn.metrics.mean_squared_error(), sklearn.metrics.r2_score()
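
The Session 9 case study could be approached along the lines below. The advertising and sales numbers are invented, and the model is evaluated on the same rows it was trained on purely for brevity; a real analysis would hold out a test set.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical advertising spend and resulting sales
data = pd.DataFrame({
    "ad_spend": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "sales": [25.0, 44.0, 68.0, 85.0, 105.0, 128.0],
})

# Correlation: pairwise Pearson correlation between numeric columns
correlation = data.corr()

# Linear regression: features must be 2-D, hence the double brackets
X = data[["ad_spend"]]
y = data["sales"]
model = LinearRegression()
model.fit(X, y)
predictions = model.predict(X)

# Model evaluation
mse = mean_squared_error(y, predictions)   # average squared error
r2 = r2_score(y, predictions)              # share of variance explained

print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
print(f"MSE={mse:.2f}, R^2={r2:.3f}")
```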
Session 10

Topic:
- Classification model with Scikit-Learn
- Logistic regression
- Model Evaluation
- Case study: Predicting customer churn

Functions:
- Logistic regression: sklearn.linear_model.LogisticRegression()
- Classification report: sklearn.metrics.classification_report()
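
A minimal sketch of the churn case study follows. The customer table, its column names, and the split settings are assumptions made for illustration; train_test_split is borrowed from the earlier preprocessing sessions.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical customers: short tenure and high charges tend to churn
customers = pd.DataFrame({
    "tenure_months": [2, 4, 5, 3, 6, 8, 40, 52, 36, 60, 48, 30],
    "monthly_charges": [95, 90, 99, 85, 92, 88, 45, 40, 55, 35, 50, 60],
    "churned": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
})

X = customers[["tenure_months", "monthly_charges"]]
y = customers["churned"]

# stratify=y keeps the churn ratio similar in the train and test parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Logistic regression predicts the probability of the positive class
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Classification report: precision, recall, and F1-score per class
report = classification_report(y_test, predictions, zero_division=0)
print(report)
```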
Module 6: Clustering and Capstone Project
Session 11

Topic:
- Hierarchical Clustering
- Case study: Segmenting customers based on purchasing behavior
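
The syllabus does not say which library is used for hierarchical clustering. The sketch below assumes scikit-learn's AgglomerativeClustering with Ward linkage as one option, run on an invented table of customer purchasing behavior.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: purchases per year and annual spend
customers = pd.DataFrame({
    "purchases_per_year": [2, 3, 2, 20, 22, 19],
    "annual_spend": [150, 180, 160, 2100, 2500, 2300],
})

# Scale first so that annual_spend does not dominate the distance
scaled = StandardScaler().fit_transform(customers)

# Agglomerative (bottom-up) clustering: each customer starts as its own
# cluster, and the two closest clusters are merged until n_clusters remain
clusterer = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = clusterer.fit_predict(scaled)

customers["segment"] = labels
print(customers)
```

The segment numbers themselves are arbitrary; only the grouping of customers into the same or different segments carries meaning.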
Session 12

Topic:
- Machine Learning Capstone Project

Datasets:
- https://www.kaggle.com/datasets
- https://archive.ics.uci.edu/datasets
- https://github.com/awesomedata/awesome-public-datasets
Data Sources:
• https://www.kaggle.com/datasets
• https://archive.ics.uci.edu/datasets
• https://github.com/awesomedata/awesome-public-datasets
• https://data.fivethirtyeight.com/
• https://data.gov.in/
• https://data.world/
• https://data.worldbank.org/
• https://registry.opendata.aws/
• https://www.gapminder.org/data/
• https://cloud.google.com/bigquery/public-data/
Session 13
Topic: Introduction to Power BI
Objectives:
- Overview of Power BI: Features and interface.
- Connecting to data sources (Excel, CSV, web).

Session 14
Topic: Data Transformation with Power Query
Objectives:
- Cleaning and transforming data.
- Removing duplicates, filtering, splitting columns.
- Merging and appending datasets.

Session 15
Topic: Data Modeling and Relationships
Objectives:
- Understanding relationships in tables.
- Introduction to DAX for calculated columns and measures (e.g., SUM(), COUNT()).

Sessions 16-17
Topic: Data Visualization
Objectives:
- Types of visuals: Bar, line, pie charts, maps, and tables.
- Adding slicers and filters.
- Customizing visuals for interactivity.
Dataset: Sample Superstore Dataset

Session 18
Topic: Capstone Project
Objectives:
- Build an end-to-end dashboard: Connect, clean, model, and visualize data.
- Emphasize storytelling and actionable insights.
- Final Q&A and wrap-up.