Course Title: Data Science and Machine Learning using Python
Course Length: 4 Weeks
Target Audience: This course is ideal for aspiring and current data scientists, career switchers, and professionals looking to expand their skills.
Prerequisites (if any): Participants should have a basic working knowledge of computer programming and familiarity with handling data.
Language of Instruction: English and Urdu
Course Description
This course blends essential Python programming, hands-on data exploration, and practical machine learning concepts. Students will use libraries
such as Pandas, NumPy, Matplotlib, and scikit-learn to manipulate and understand data and to build predictive models, and will learn to use
generative AI support for code generation, troubleshooting, and understanding concepts.
In this course students will focus on:
Module 1: Python and Data Fundamentals
In this module, students will dive into the essentials of Python for data science. They will learn about variables, data types, how to control the flow of their
code with conditionals and loops, and how to build modular code using functions. Additionally, they'll be introduced to the cornerstone libraries of data
science – NumPy for numerical operations and Pandas for working with tabular data in DataFrames. Students will learn how to load datasets and
perform basic data cleaning and transformations.
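As an illustration of the Module 1 topics, the sketch below combines a function, a loop with a conditional, a NumPy array operation, and a basic Pandas load-and-clean step. The city names, temperature values, and the helper celsius_to_fahrenheit are made up for this example and are not course material; the CSV text is held in memory so the snippet runs without an external file.

```python
import io

import numpy as np
import pandas as pd


def celsius_to_fahrenheit(temp_c):
    """Convert Celsius to Fahrenheit (works on numbers, arrays, and Series)."""
    return temp_c * 9 / 5 + 32


# Control flow: a loop with a conditional labels each reading.
readings_c = [21.5, 35.0, 18.2]
labels = []
for temp in readings_c:
    if temp >= 30:
        labels.append("hot")
    else:
        labels.append("mild")

# NumPy applies the same arithmetic to every element of an array at once.
readings_f = celsius_to_fahrenheit(np.array(readings_c))

# Hypothetical CSV content with one duplicate row and one missing value.
csv_text = "city,temp_c\nLahore,21.5\nKarachi,35.0\nKarachi,35.0\nIslamabad,\n"
raw = pd.read_csv(io.StringIO(csv_text))

# Basic cleaning and transformation: drop duplicates, drop rows with a
# missing temperature, then add a derived column.
clean = (
    raw.drop_duplicates()
    .dropna(subset=["temp_c"])
    .assign(temp_f=lambda df: celsius_to_fahrenheit(df["temp_c"]))
    .reset_index(drop=True)
)

print(labels)
print(readings_f)
print(clean)
```

Dropping rows with missing values is only one possible choice; filling them in is another, and which is appropriate depends on the dataset.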
Module 2: Exploratory Data Analysis (EDA)
The power of data lies in understanding the story it tells. In this module, students will master the art of Exploratory Data Analysis (EDA). Students will
learn techniques for handling missing data and outliers, and how to convert data into appropriate formats. Students will calculate essential summary
statistics with NumPy and Pandas, uncovering measures like mean, median, and standard deviation. The focus then shifts to visualization. Students will
harness Matplotlib and Seaborn to create histograms, scatterplots, and boxplots, learning to interpret these to glean insights from their data. They will
solidify these skills with an EDA mini project, where they'll take a dataset from start to finish.
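The EDA steps described in this module might look roughly like the sketch below: fill a missing value, flag outliers, compute summary statistics, and draw the three plot types mentioned. The exam-score data, the median fill, the 1.5 × IQR outlier rule, and the output file name are assumptions chosen for illustration, and only Matplotlib is used for plotting here (Seaborn offers similar plots).

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical exam scores: one missing value and one obvious outlier.
scores = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "score": [52.0, 55.0, np.nan, 64.0, 68.0, 71.0, 75.0, 180.0],
})

# Handle missing data: fill with the column median (one option among several).
scores["score"] = scores["score"].fillna(scores["score"].median())

# Handle outliers: flag values outside 1.5 * IQR from the quartiles.
q1, q3 = scores["score"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (scores["score"] < q1 - 1.5 * iqr) | (scores["score"] > q3 + 1.5 * iqr)
trimmed = scores[~is_outlier]

# Summary statistics on the trimmed data.
summary = trimmed["score"].agg(["mean", "median", "std"])
print(summary)

# Histogram, scatterplot, and boxplot side by side.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(trimmed["score"].to_numpy(), bins=5)
axes[0].set_title("Histogram of score")
axes[1].scatter(trimmed["hours"].to_numpy(), trimmed["score"].to_numpy())
axes[1].set_title("Score vs. hours")
axes[2].boxplot(scores["score"].to_numpy())
axes[2].set_title("Boxplot (outlier included)")
fig.tight_layout()

# Save to the system temp directory (hypothetical file name).
plot_path = os.path.join(tempfile.gettempdir(), "eda_overview.png")
fig.savefig(plot_path)
plt.close(fig)
```

The boxplot deliberately uses the untrimmed column so that the flagged point remains visible; whether an outlier should be removed or kept is a judgment that depends on the data.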
Module 3: Introduction to Machine Learning
This module introduces the fundamentals of machine learning. Students will learn the differences between supervised and unsupervised learning,
and between classification and regression tasks, illustrated with real-world examples. Students will take a deep dive into linear
regression, learning how this model works, implementing it with scikit-learn, and interpreting the results. Next, they will explore decision trees: how they
are built, visualized, and understood. Finally, model selection concepts like train/test splits, overfitting, and cross-validation will be introduced.
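To make the Module 3 ideas concrete, here is a minimal sketch using scikit-learn: a train/test split, a linear regression, two decision trees of different depth to show overfitting, and 5-fold cross-validation. The synthetic data (y = 3x + 5 plus noise), the split ratio, and the tree depth are assumptions picked for the example, not values prescribed by the course.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumed for illustration): y = 3x + 5 plus noise.
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, size=200)

# Hold out 25% of the rows so the models are scored on data they did not see.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Linear regression: the fitted slope and intercept are directly interpretable.
linear = LinearRegression().fit(X_train, y_train)
print("slope:", linear.coef_[0], "intercept:", linear.intercept_)
print("linear R^2 (test):", linear.score(X_test, y_test))

# An unrestricted tree can memorise the training set (overfitting);
# limiting the depth trades training accuracy for a simpler model.
deep_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
shallow_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)
for name, tree in [("deep", deep_tree), ("shallow", shallow_tree)]:
    print(
        name,
        "tree R^2 train:", tree.score(X_train, y_train),
        "test:", tree.score(X_test, y_test),
    )

# Cross-validation: average the score over 5 different train/validation splits.
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5)
print("5-fold R^2 mean:", cv_scores.mean())
```

Comparing each tree's training score with its test score is a simple way to spot overfitting: a large gap suggests the model has fitted noise in the training data.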
Generative AI Support Throughout
Students will have support throughout the course via integrated generative AI assistance. They will get tailored code examples for common operations, help
with troubleshooting errors, and easy-to-understand explanations of complex concepts, making their learning journey smoother.
Course Learning Outcomes
By the end of this course, the students should be able to:
LO1: Python Fluency: Demonstrate proficiency in core Python concepts (variables, data types, control flow, functions) for data science tasks.
LO2: Data Handling Expertise: Utilize Pandas to effectively import, clean, transform, and manipulate datasets for analysis and modelling.
LO3: Exploratory Analysis Mastery: Employ NumPy, Matplotlib, and Seaborn to calculate summary statistics and create informative visualizations,
extracting meaningful insights from data.
LO4: Machine Learning Foundations – Supervised Learning: Understand the principles of supervised learning and build basic linear regression and
decision tree models. Evaluate model performance using appropriate metrics.
LO5: Machine Learning Foundations – Unsupervised Learning: Understand the principles of unsupervised learning and build a basic clustering
model. Evaluate model performance using appropriate metrics.
LO6: Process-Oriented Mindset: Apply a structured workflow to a data science project encompassing data cleaning, exploratory analysis, model
selection, and result interpretation.
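The clustering outcome in LO5 could be practised with something like the sketch below. The choice of k-means, the silhouette score as the evaluation metric, and the two synthetic groups of points are all assumptions made for this example; the course may use a different algorithm or metric.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two synthetic, well-separated groups of 2-D points (assumed data).
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
points = np.vstack([group_a, group_b])

# Unsupervised learning: k-means sees only the points, never group labels.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = model.fit_predict(points)

# The silhouette score ranges from -1 to 1; values near 1 indicate
# compact, well-separated clusters.
score = silhouette_score(points, cluster_ids)
print("silhouette score:", score)
```

Unlike supervised models, there is no held-out target to compare against, which is why an internal measure such as the silhouette score is used here.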
Assessments/Graded Components
During the course, students will be assessed based on various assignments along with a capstone project.
Module Name | Key Concepts/Topics Covered | Assessments
Python and Data Fundamentals | Introduction to Python, Data Types, Variables, Operators, NumPy; Control Flow, Functions | Short coding quiz; Mini data-cleaning exercise
Python and Data Fundamentals (Cont.) | Introduction to Pandas; Pandas data selection and transformation | Practice project: Data cleaning and manipulation with Pandas
Exploratory Data Analysis | Understanding statistical data analysis concepts; Data aggregation | Quiz on EDA concepts
Exploratory Data Analysis (Cont.) | Confidence interval and hypothesis testing; Visualization | EDA mini-project progress check-in
Intro to Machine Learning | Machine learning introduction; Understand Supervised learning vs Unsupervised learning Models and their usage | Quiz on ML fundamentals
Model Building in Python | Model development and usage in Python; Capstone Project | Capstone Project: Peer feedback and evaluation
Supplementary Material/Reading Material
Python:
Learn Python (https://www.learnpython.org/)
"Automate the Boring Stuff with Python" (https://automatetheboringstuff.com/)
Data Analysis and Visualization:
Kaggle Datasets (https://www.kaggle.com/datasets)
Python Data Science Handbook (https://jakevdp.github.io/PythonDataScienceHandbook/)
"Storytelling with Data" (https://www.storytellingwithdata.com/)
Flourish (https://flourish.studio/)