
Data Analysis using Python (1) NAVTTC

The document outlines a 2-month course titled 'Data Analysis using Python' under the Prime Minister Youth Skills Development Program in Pakistan, aimed at teaching students Python programming for data analysis. It covers essential skills such as data manipulation, cleaning, visualization, and exploratory data analysis, culminating in a capstone project. The course is designed for beginners to intermediate learners and includes hands-on lab sessions, with job opportunities in various sectors like IT, finance, and e-commerce.

Government of Pakistan

National Vocational and Technical Training Commission

Prime Minister Youth Skills Development Program

"Skills for All"

Course Contents / Lesson Plan


Course Title: Data Analysis using Python
Duration: 2 Months
Trainer Name

Dr. Fawad Salam Khan (Air University, Islamabad)


Author Name Muhammad Nasir Khan (DACUM Facilitator, Ex-DD VT, SS&C Wing,
NAVTTC, Islamabad)

Course Title Data Analysis using Python


Objectives and Expectations

Course Objectives

By the end of the "Data Analysis using Python" course, students will:

1. Understand Python Programming: Gain proficiency in the Python
programming language, focusing on data analysis tasks, including working
with essential libraries such as NumPy, Pandas, Matplotlib, and Seaborn.
2. Master Data Manipulation: Develop the skills to efficiently manipulate data
using Python, including importing, cleaning, preprocessing, and transforming
data to prepare it for analysis.
3. Implement Data Cleaning and Preprocessing: Learn various techniques for
data cleaning, including handling missing values, normalizing data, encoding
categorical variables, and performing feature engineering.
4. Create Effective Data Visualizations: Be able to create both basic and
advanced data visualizations using Matplotlib, Seaborn, and Plotly, making
complex data more accessible and understandable through visual
representation.
5. Conduct Exploratory Data Analysis (EDA): Perform comprehensive
exploratory data analysis to uncover patterns, correlations, and insights
within datasets, laying the groundwork for more advanced statistical analysis
or machine learning.
6. Develop Interactive Dashboards: Gain the ability to create interactive data
visualizations and dashboards using Plotly and Dash, enhancing the
presentation and exploration of data in a user-friendly manner.
7. Apply Knowledge to Real-World Problems: Synthesize all the skills
learned in the course to conduct a complete analysis on a real-world dataset,
from data cleaning to visualization and interpretation, culminating in a
capstone project.
8. Communicate Findings Effectively: Learn to present data analysis results
through well-structured reports, incorporating visualizations and statistical
summaries that effectively communicate key insights.

Course Expectations

Students enrolled in the course are expected to:

1. Engage Actively in Learning: Attend lectures, participate in discussions,
and engage with lab sessions to fully grasp the course content.
2. Complete Assignments Promptly: Submit weekly assignments on time,
ensuring that they demonstrate an understanding of the week's material and
practical application skills.
3. Work Independently and Collaboratively: While individual assignments
will assess personal understanding, students should also engage in peer
discussions and group activities when applicable, to enhance learning through
collaboration.
4. Practice Consistently: Regularly practice coding and data analysis outside of
class hours, utilizing provided datasets and recommended resources to
reinforce learning.
5. Ask Questions and Seek Help: Be proactive in seeking clarification on
challenging topics, either through discussion forums, during lab sessions, or
directly from the instructor.
6. Apply Critical Thinking: Approach data analysis tasks with a critical
mindset, questioning assumptions, considering alternative methods, and
validating results to ensure accuracy and reliability.
7. Adhere to Ethical Standards: Uphold ethical standards in data analysis,
including respecting data privacy, acknowledging sources, and presenting
analysis results honestly and transparently.
8. Complete the Capstone Project: Dedicate sufficient time and effort to the
capstone project, which is a significant portion of the final grade, ensuring it
reflects a comprehensive understanding of the course material.

Entry-level of trainees

Prerequisites:

● Basic Computer Literacy: Students should be comfortable using a
computer, navigating software, and managing files.
● Basic Understanding of Mathematics: A foundational knowledge of basic
mathematical concepts, such as algebra and statistics, is beneficial but not
mandatory.
● No Prior Programming Experience Required: This course is designed to
accommodate those who are new to programming and Python, though
individuals with some programming experience may find it easier to grasp
the initial concepts.

Target Audience:

● Individuals interested in learning data analysis, including students,
professionals, and enthusiasts from various fields.
● Beginners who want to start their journey in data science and Python
programming.
● Intermediate learners who want to enhance their data manipulation and
analysis skills using Python.

Learning Outcomes of the course

By the end of the "Data Analysis using Python" course, students will be able to:

1. Understand and Utilize Python for Data Analysis:


o Develop a strong foundation in Python programming, including the
use of key libraries such as NumPy, Pandas, Matplotlib, and Seaborn.
o Write Python scripts to perform various data analysis tasks.

2. Efficiently Manipulate and Process Data:


o Import, clean, preprocess, and transform datasets using Python.
o Handle missing data, perform data normalization and standardization,
and apply feature engineering techniques.

3. Create and Interpret Data Visualizations:


o Produce a variety of data visualizations (e.g., line plots, bar plots,
histograms, heatmaps) using Matplotlib and Seaborn.
o Use Plotly to create interactive visualizations and Dash to develop
data dashboards.

4. Perform Comprehensive Exploratory Data Analysis (EDA):


o Conduct exploratory data analysis to identify patterns, correlations,
and key insights within datasets.
o Apply statistical techniques such as descriptive statistics and
hypothesis testing during the EDA process.

5. Develop and Present Data-Driven Insights:


o Integrate data cleaning, preprocessing, visualization, and analysis
skills to analyze real-world datasets.
o Communicate data-driven insights through well-structured reports and
presentations that incorporate visualizations and statistical summaries.

6. Build Interactive Data Dashboards:


o Create and deploy interactive dashboards using Dash, enabling users
to explore data dynamically.

7. Apply Python Skills to Real-World Projects:


o Complete a capstone project that involves cleaning, analyzing, and
visualizing a real-world dataset, demonstrating the ability to apply
learned skills in a practical context.

8. Work Independently on Data Analysis Projects:


o Develop the confidence and competence to undertake independent
data analysis projects, from data acquisition to presentation of
findings.

Course Execution Plan

Course Duration: 8 Weeks (2 Months)

Course Level: Beginner to Intermediate

Total Hours: 40 Hours (5 Hours per Week)

Delivery Mode: Lectures, Hands-on Lab Sessions, and Assignments


Companies offering jobs in the respective trade

1. Software Houses and IT Companies

● NetSol Technologies: A leading IT company offering services in software
development, data analysis, and IT consulting. They frequently hire data
analysts, Python developers, and data scientists.
● Systems Limited: A well-known IT services company providing solutions in
data analytics, business intelligence, and software development.
● 10Pearls: A global technology company with a significant presence in
Pakistan, focusing on digital transformation, including data analytics and AI
solutions.
● Afiniti: A pioneer in AI and big data, offering opportunities in data analysis
and data science.

2. Telecommunication Companies

● Telenor Pakistan: A major telecom operator that hires data analysts and
business intelligence professionals to analyze customer data and improve
service delivery.
● Jazz (Mobilink): One of Pakistan's largest telecom companies, offering roles
in data analysis, customer insights, and data-driven decision-making.
● Zong: A leading telecom provider that uses data analytics to enhance
customer experiences and optimize operations.

3. Financial Services and Banks

● Habib Bank Limited (HBL): Pakistan's largest bank, often recruiting data
analysts and financial analysts to support their data-driven strategies.
● United Bank Limited (UBL): A major bank in Pakistan that leverages data
analytics for risk management, customer insights, and financial modeling.
● Meezan Bank: Pakistan's leading Islamic bank, offering opportunities in data
analysis, especially in the areas of financial performance and customer
behavior analysis.

4. E-Commerce and Retail

● Daraz.pk: The largest online marketplace in Pakistan, frequently hiring data
analysts and business intelligence professionals to enhance their e-commerce
platform.
● Foodpanda Pakistan: A prominent food delivery service that relies heavily
on data analytics to optimize operations, marketing strategies, and customer
experience.
● Careem: A ride-hailing service that uses data to improve its operations,
customer satisfaction, and service delivery.

5. FMCG Companies

● Unilever Pakistan: A global FMCG giant with a significant presence in
Pakistan, often hiring data analysts to support market research, sales
forecasting, and supply chain optimization.
● Nestlé Pakistan: A major player in the FMCG sector, offering roles in data
analysis for market research, product development, and operational
efficiency.

6. Consulting and Analytics Firms

● KPMG Taseer Hadi & Co.: A global professional services firm offering
audit, tax, and advisory services, including data analytics roles.
● PwC Pakistan: Part of the global PwC network, this firm offers
opportunities in data analytics, financial modeling, and business intelligence.
● Arbisoft: A technology consulting firm that offers data analytics services,
often recruiting data scientists and analysts.

7. Tech Startups

● Airlift Technologies: A tech startup focusing on logistics and transportation,
leveraging data analytics to optimize operations and customer experience.
● Bykea: A local ride-hailing and delivery startup that uses data analysis to
enhance service efficiency and customer satisfaction.
● Bazaar Technologies: A B2B e-commerce platform for small businesses in
Pakistan, relying on data analytics for market insights and operational
decision-making.

8. Healthcare and Pharmaceutical Companies

● Siemens Healthineers: A global healthcare company with a strong focus on
data-driven healthcare solutions, including opportunities in data analytics.
● GlaxoSmithKline (GSK) Pakistan: A leading pharmaceutical company that
uses data analysis for market research, product development, and supply
chain management.

Job Opportunities

1. Python Developer (with a focus on Data Analysis)

● Role: Develop and maintain Python scripts and applications that automate
data collection, processing, and analysis tasks.
● Skills Utilized: Python programming, data manipulation, automation scripts,
and integration of data analysis libraries.

2. Research Analyst

● Role: Perform data-driven research, analyze datasets to support academic or
industry research projects, and present findings.
● Skills Utilized: Data analysis, statistical testing, data visualization, and
reporting.

3. Financial Analyst

● Role: Analyze financial data, create financial models, and provide insights
into market trends, investment opportunities, and risk management.
● Skills Utilized: Python for financial data analysis, data cleaning, and
advanced data visualization.

4. Marketing Analyst

● Role: Analyze marketing data, including customer behavior, sales trends, and
campaign effectiveness, to improve marketing strategies.
● Skills Utilized: Data analysis, segmentation, trend analysis, and visualization
using Python.

5. Operations Analyst

● Role: Analyze operational data to improve efficiency, optimize processes,
and reduce costs within an organization.
● Skills Utilized: Data manipulation, performance metrics analysis, and
reporting using Python.

6. Data Visualization Specialist

● Role: Focus on creating effective and visually appealing data visualizations
and dashboards to present complex data insights in an understandable way.
● Skills Utilized: Matplotlib, Seaborn, Plotly, and Dash for creating
visualizations and dashboards.

7. Entry-Level Machine Learning Engineer

● Role: Work with data scientists to prepare data for machine learning models,
perform EDA, and assist in developing basic machine learning algorithms.
● Skills Utilized: Python programming, data preprocessing, and basic
understanding of machine learning workflows.

No of Students 25
Learning Place Classroom / Lab
Instructional Resources

1. "Python for Data Analysis" by Wes McKinney
o A comprehensive guide to using Python libraries like Pandas and
NumPy for data manipulation and analysis.
o Amazon

2. "Automate the Boring Stuff with Python" by Al Sweigart


o Excellent for beginners, this book covers practical Python
applications, including data manipulation tasks.
o Automate the Boring Stuff

3. "Practical Statistics for Data Scientists" by Peter Bruce and Andrew Bruce
o Focuses on essential statistical concepts using Python and R for data
analysis.
o O'Reilly

4. "Hands-On Data Analysis with Pandas" by Stefanie Molin


o Detailed examples and exercises using Pandas for data manipulation,
cleaning, and analysis.
o Packt

Tutorial Websites

1. Kaggle
o Offers Python tutorials, datasets for practice, and a community for
data science enthusiasts. Great for practical, hands-on learning.
o Kaggle

2. Real Python
o A comprehensive resource for learning Python, including tutorials on
data analysis, web scraping, and data visualization.
o Real Python

3. W3Schools Python Tutorial


o Beginner-friendly tutorials that cover Python programming basics and
data analysis topics.
o W3Schools

4. Towards Data Science


o A popular Medium publication with articles and tutorials on Python,
data analysis, machine learning, and more.
o Towards Data Science

5. GeeksforGeeks: Python Programming


o Provides a wide range of Python tutorials, from basic to advanced,
with a focus on data structures, algorithms, and data science.
o GeeksforGeeks

MODULES

Scheduled Weeks   Module Title   Learning Units   Home Assignment
Week 1  Introduce Python for Data Analysis

1.1 Introduce Python for Data Analysis

● Overview of Python programming language
● Setup Python environment (Anaconda, Jupyter Notebook)
● Introduction to Python libraries: NumPy, Pandas, Matplotlib, Seaborn
● Interpret Python syntax and operations (variables, data types, loops, functions)

1.2 Working with Data in Python

● Introduction to data structures: Lists, Tuples, Dictionaries, and Sets
● Introduction to NumPy: Arrays, array operations, and basic mathematical functions
● Introduction to Pandas: Series and DataFrames
● Importing and exporting data (CSV, Excel, JSON)

Lab Session:

● Setting up Jupyter Notebook for data analysis
● Basic data manipulation with NumPy and Pandas

Assignment 1:

● Basic data manipulation using Pandas: Creating, reading, and writing DataFrames
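
As a rough illustration of what the Week 1 assignment covers (creating, writing, and reading DataFrames), a minimal sketch follows. The column names, the values, and the file name students.csv are invented for the example and are not part of the course material.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; names and scores are made up for illustration.
df = pd.DataFrame(
    {
        "name": ["Ayesha", "Bilal", "Sana"],
        "score": [82, 75, 91],
    }
)

# Write the DataFrame to a CSV file in a temporary folder, then read it back.
csv_path = os.path.join(tempfile.mkdtemp(), "students.csv")
df.to_csv(csv_path, index=False)
loaded = pd.read_csv(csv_path)

# Basic manipulation: add a derived column and filter rows on it.
loaded["passed"] = loaded["score"] >= 80
top = loaded[loaded["passed"]]
print(top)
```

The same pattern should carry over to the other formats listed in unit 1.2 (Excel, JSON) through the corresponding Pandas reader and writer functions.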

Week 2  Working with Data in Python

3.1 Data Cleaning Techniques

● Handling missing data (drop, fillna, interpolation)
● Data normalization and standardization
● Detecting and treating outliers
● Data transformation (log, square root, etc.)

3.2 Data Preprocessing

● Dealing with categorical data: Encoding techniques (One-Hot, Label Encoding)
● Handling date and time data
● Data binning and discretization
● Feature selection and extraction

Lab Session:

● Practical data cleaning and preprocessing exercises using Pandas

Assignment 2:

● Create a Python script that reads a data file, processes the data, and outputs results
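
One possible shape for the Assignment 2 script (read a data file, process it, output results) is sketched below using only the Python standard library. The file name sales.csv, its columns, and the helper summarize_column are hypothetical. The assignment itself does not prescribe a dataset or an output format.

```python
import csv
import os
import statistics
import tempfile


def summarize_column(path, column):
    """Read a CSV file and return count, mean, and max of one numeric column."""
    values = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            cell = row[column].strip()
            if cell:  # skip empty cells instead of failing on them
                values.append(float(cell))
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


# Build a small hypothetical input file so the sketch is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "sales.csv")
with open(sample_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerows(
        [["region", "amount"], ["north", "120"], ["south", ""], ["east", "80"]]
    )

result = summarize_column(sample_path, "amount")
print(result)
```

Keeping the processing in a function separates the reading and cleaning logic from the output step, which makes it easier to swap in a real data file later.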

Week 3  Advanced Data Cleaning and Feature Engineering

3.1 Data Cleaning Techniques

● Handle missing data (drop, fillna, interpolation)
● Data normalization and standardization
● Detect and treat outliers
● Data transformation (log, square root, etc.)

3.2 Data Preprocessing

● Deal with categorical data: Encoding techniques (One-Hot, Label Encoding)
● Handling date and time data
● Data binning and discretization
● Feature selection and extraction

Lab Session:

● Practical data cleaning and preprocessing exercises using Pandas

Assignment 3:

● Cleaning and preprocessing a dataset (handling missing data, encoding categorical variables, normalizing features)
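
The three steps named in Assignment 3 (handling missing data, encoding categorical variables, normalizing features) might look like the following in Pandas. This is a sketch under assumptions: the data is invented, and median filling and min-max scaling are only one choice among the techniques listed in units 3.1 and 3.2.

```python
import pandas as pd

# Hypothetical raw data with a missing value in each column.
raw = pd.DataFrame(
    {
        "age": [25.0, None, 35.0, 45.0],
        "city": ["Lahore", "Karachi", None, "Lahore"],
    }
)

clean = raw.copy()
# Fill the missing number with the column median and the missing label
# with a placeholder category.
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["city"] = clean["city"].fillna("Unknown")

# Min-max normalization rescales age to the range [0, 1].
age_min, age_max = clean["age"].min(), clean["age"].max()
clean["age_scaled"] = (clean["age"] - age_min) / (age_max - age_min)

# One-hot encoding turns each city into its own indicator column.
encoded = pd.get_dummies(clean, columns=["city"])
print(encoded)
```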

Week 4  Advanced Data Cleaning and Feature Engineering

4.1 Advanced Feature Engineering

● Creating new features from existing data
● Feature scaling and polynomial features
● Interaction features

4.2 Handling Large Datasets

● Working with large datasets in Pandas
● Optimizing memory usage and performance
● Introduction to Dask for handling large-scale data

Lab Session:

● Feature engineering and handling large datasets

Assignment 4:

● Engineer new features for a dataset and analyze their impact on data analysis
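
To make the Week 4 topics concrete, the sketch below derives an interaction feature and a polynomial feature, then compares memory usage before and after switching to smaller data types. The order data and column names are assumptions made for illustration, and the actual savings depend on the dataset and the Pandas version.

```python
import pandas as pd

# Hypothetical order data; column names and values are invented.
orders = pd.DataFrame(
    {
        "price": [100.0, 250.0, 80.0, 120.0],
        "quantity": [2, 1, 5, 3],
        "category": ["books", "toys", "books", "toys"],
    }
)

# Interaction feature: combine two existing columns into a new one.
orders["total"] = orders["price"] * orders["quantity"]
# Polynomial feature: the square of an existing column.
orders["price_squared"] = orders["price"] ** 2

# Memory optimization: downcast integers and store repeated labels
# as a categorical type, then compare usage before and after.
before = orders.memory_usage(deep=True).sum()
orders["quantity"] = pd.to_numeric(orders["quantity"], downcast="integer")
orders["category"] = orders["category"].astype("category")
after = orders.memory_usage(deep=True).sum()
print(before, after)
```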

Week 5  Data Visualization

5.1 Introduction to Data Visualization

● Importance of data visualization in data analysis
● Overview of Matplotlib and Seaborn libraries

5.2 Basic Plotting with Matplotlib

● Create simple plots: Line plot, Bar plot, Histogram
● Customizing plots: Titles, labels, legends, and colors
● Subplots and figure layouts

5.3 Advanced Data Visualization with Seaborn

● Create advanced plots: Heatmaps, Box plots, Pair plots
● Visualizing distributions and correlations
● Plot aesthetics and customization

Lab Session:

● Hands-on practice with Matplotlib and Seaborn for data visualization

Assignment 5:

● Creating visualizations to analyze trends, distributions, and relationships in a given dataset

Week 6  Interactive Visualizations and Dashboarding

6.1 Interactive Visualizations

● Introduction to Plotly for interactive visualizations
● Create interactive plots: Scatter plots, Line charts, and more
● Customizing interactive visualizations

6.2 Dashboarding with Dash

● Introduction to Dash for creating dashboards
● Build a simple data dashboard
● Deploying dashboards for data analysis

Lab Session:

● Create interactive visualizations and dashboards

Assignment 6:

● Develop a simple dashboard to visualize a dataset interactively

Week 7  Exploratory Data Analysis (EDA)

7.1 Introduction to Exploratory Data Analysis (EDA)

● Understand the importance of EDA
● Steps in performing EDA
● Identifying patterns, correlations, and insights from data

7.2 EDA Techniques and Best Practices

● Descriptive statistics: Mean, median, mode, standard deviation
● Correlation analysis and covariance
● Hypothesis testing basics
● Identifying and interpreting trends

Lab Session:

● Performing a complete EDA process on a sample dataset

Assignment 7:

● Conduct an EDA on a provided dataset and report the findings
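
The descriptive statistics, correlation, and covariance listed in unit 7.2 can be computed directly in Pandas, as in the sketch below. The dataset and its column names are invented for illustration. Hypothesis testing is left out because it would typically need an additional statistics library.

```python
import pandas as pd

# Hypothetical dataset; values are invented for illustration.
data = pd.DataFrame(
    {
        "hours_studied": [1, 2, 3, 4, 5],
        "exam_score": [52, 58, 65, 70, 78],
    }
)

# Descriptive statistics: count, mean, standard deviation, quartiles.
summary = data.describe()

# Central tendency for a single column.
mean_score = data["exam_score"].mean()
median_score = data["exam_score"].median()

# Pearson correlation and covariance between the numeric columns.
correlation = data.corr()
covariance = data.cov()

print(summary)
print(correlation.loc["hours_studied", "exam_score"])
```

A correlation close to 1 here only reflects how the sample values were chosen. On a real dataset the same calls give a starting point for deciding which relationships deserve a closer look.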

Week 8  Capstone Project and Case Study

8.1 Case Study: Real-World Data Analysis

● Applying data cleaning, preprocessing, visualization, and EDA on a real-world dataset
● Report writing: Presenting findings and insights through visualizations and statistical summaries

8.2 Capstone Project Development

● Students work on a capstone project to apply all learned concepts
● Guidance on project structuring and report writing

Lab Session:

● Capstone project development with instructor guidance

Final Project Submission:

● Submission of the final project report including data cleaning, preprocessing, visualization, and EDA findings.

Practical Tasks:

Task 1 (Week 1): Basic data manipulation using Pandas: Creating, reading, and writing DataFrames

● The goal of this task is to practice basic data manipulation using the Pandas library in
Python. You will learn how to create, read, and write DataFrames, which are essential for
handling and analyzing structured data. This exercise will help you build foundational skills
in working with data in Python.

Task 2 (Week 2): Create a Python script that reads a data file, processes the data, and outputs results

● The task is to develop a Python script that reads a data file, processes the data through
various stages of cleaning and transformation, and outputs the results in a specified format.
This exercise is designed to enhance your skills in Python programming, data manipulation, and
exploratory data analysis.

Task 3 (Week 3): Practical data cleaning and preprocessing exercises using Pandas

● The aim of this task is to perform practical data cleaning and preprocessing exercises using
the Pandas library in Python. This exercise will help you develop essential skills in preparing
raw data for analysis, ensuring that the data is clean, consistent, and ready for further
exploration or modeling.

Task 4 (Week 4): Engineer new features for a dataset and analyze their impact on data analysis

● The goal of this task is to engineer new features for an existing dataset and analyze how
these new features impact the overall data analysis. Feature engineering is a critical step in
data preprocessing that can significantly enhance the predictive power of your models and the
insights derived from your data.

Task 5 (Week 5): Creating visualizations to analyze trends, distributions, and relationships in a given dataset

● The objective of this task is to create visualizations that effectively analyze and present
trends, distributions, and relationships within a given dataset. Visualizations are a powerful
tool for uncovering insights and communicating findings in an intuitive and impactful way.

Task 6 (Week 6): Develop a simple dashboard to visualize a dataset interactively

● Creating an interactive dashboard to visualize a dataset involves several steps, from data
preparation to designing the dashboard itself. In this task you will develop a simple
interactive dashboard using Python.

Task 7 (Week 7): Conduct an EDA on a provided dataset and report the findings

● The objective of this Exploratory Data Analysis (EDA) is to understand the underlying
patterns, distributions, and relationships within the provided dataset. EDA will help identify
any anomalies, trends, or insights that could inform subsequent data processing and
model-building phases.

Task 8 (Week 8): Submission of the final project report including data cleaning, preprocessing, visualization, and EDA findings

● The purpose of this task is to perform a comprehensive Exploratory Data Analysis (EDA) on the
provided dataset. This process aims to uncover underlying patterns, relationships, and
anomalies within the data, which will be crucial for informing subsequent stages of data
processing and model development.

Workplace/Institute Ethics Guide

Work ethic is a standard of conduct and values for job performance. The modern definition of what
constitutes good work ethics often varies, and different businesses have different expectations. Work
ethic is a belief that hard work and diligence have a moral benefit and an inherent ability, virtue, or
value to strengthen character and individual abilities. It is a set of values centered on the
importance of work and manifested by determination or desire to work hard.

The following ten work ethics are defined as essential for student success:
1. Attendance:
Be at work every day possible. Plan your absences and don’t abuse leave time. Be punctual
every day.
2. Character:
Honesty is the single most important factor having a direct bearing on the final success of
an individual, corporation, or product. Complete assigned tasks correctly and promptly.
Look to improve your skills.
3. Team Work:
The ability to get along with others, including those you don’t necessarily like. The ability to
carry your weight and help others who are struggling. Recognize when to speak up with an
idea and when to compromise by blending ideas together.
4. Appearance:
Dress for success and put your best foot forward. Maintain personal hygiene and good manners,
and remember that the first impression of who you are can last a lifetime.
5. Attitude:
Listen to suggestions, be positive, and accept responsibility. If you make a mistake, admit it.
Value workplace safety rules and precautions for personal and co-worker safety. Avoid
unnecessary risks. Be willing to learn new processes, systems, and procedures in light of
changing responsibilities.
6. Productivity:
Do the work correctly; quality and timeliness are prized. Get along with fellows; cooperation
is the key to productivity. Help out whenever asked, and do extra without being asked. Take
pride in your work and do things the best you know how. Eagerly focus energy on
accomplishing tasks, also referred to as demonstrating ownership.
7. Organizational Skills:
Make an effort to improve and learn ways to better yourself. Manage time well: utilize time and
resources to get the most out of both. Take an appropriate approach to social interactions
at work. Maintain focus on work responsibilities.
8. Communication:
Written communication: being able to correctly write reports and memos.
Verbal communication: being able to communicate one on one or to a group.
9. Cooperation:
Follow institute rules and regulations; learn and follow expectations. Get along with fellows;
cooperation is the key to productivity. Be able to welcome and adapt to changing work
situations and the application of new or different skills.
10. Respect:
Work hard, to the best of your ability. Carry out orders and do what’s asked the first time.
Show respect by accepting and acknowledging an individual’s talents and knowledge. Respect
diversity in the workplace, including showing due respect for different perspectives,
opinions, and suggestions.
