303- Data Analysis Using Python

The course 'Data Analysis using Python' (BCADS-303) aims to equip students with fundamental data science concepts, hands-on experience with Python libraries like NumPy and Pandas, and skills in data visualization and exploratory data analysis. It covers topics such as data cleaning, visualization techniques, exploratory data analysis, data sources in AI, and time series analysis. Upon completion, students will be proficient in using Python for data manipulation, visualization, and analysis of various data types.
Course Name: Data Analysis using Python

Course Code: BCADS-303


Course Credit: 4-0-1
Course Objectives:
● To introduce students to fundamental concepts of data science, including data
analysis, data cleaning, data visualization, exploratory data analysis, and data sources
in Artificial Intelligence (AI).
● To provide students with hands-on experience in using Python libraries NumPy and
Pandas for data manipulation, cleaning, and analysis.
● To introduce students to data visualization using Python libraries such as Matplotlib
and Pandas.
● To teach students the importance of data pre-processing and cleaning, and techniques
for handling missing data, detecting and filtering outliers, transforming data, and
string manipulation.
● To familiarize students with exploratory data analysis (EDA) techniques, including
defining descriptive statistics for numeric data, working with percentiles, counting for
categorical data, and creating contingency tables.

Course Content:
Theory
Unit 1: Data Cleaning and Pre-processing
Data Cleaning and Preparation: Handling Missing Data. Data Transformation: Removing
Duplicates, Transforming Data Using a Function or Mapping, Replacing Values, Detecting
and Filtering Outliers. String Manipulation: Vectorized String Functions in pandas.
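
As an illustration of the Unit 1 topics, the following is a minimal sketch rather than a prescribed solution. The toy rows, the column names, the use of -1 as a missing-value sentinel, the median fill, and the 1.5 * IQR outlier rule are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "name": [" Alice", "bob ", "bob ", "Carol", "dave"],
    "dept": ["hr", "it", "it", "IT", "ops"],
    "salary": [52000.0, -1.0, -1.0, 61000.0, 990000.0],
})

# Replacing values: treat the (assumed) sentinel -1 as missing.
df["salary"] = df["salary"].replace(-1.0, np.nan)

# String manipulation: vectorized .str methods operate on whole columns.
df["name"] = df["name"].str.strip().str.title()
df["dept"] = df["dept"].str.upper()

# Removing duplicates: the two "bob" rows are identical after cleaning.
df = df.drop_duplicates().reset_index(drop=True)

# Handling missing data: fill the missing salary with the column median.
df["salary"] = df["salary"].fillna(df["salary"].median())

# Transforming data using a mapping.
dept_names = {"HR": "Human Resources", "IT": "Technology", "OPS": "Operations"}
df["dept_name"] = df["dept"].map(dept_names)

# Detecting and filtering outliers with the 1.5 * IQR rule (one common choice).
q1, q3 = df["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = df[in_range]

print(cleaned)
```

Other choices, such as dropping rows with missing values or using z-scores to flag outliers, fit the same topics equally well.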

Unit 2: Data Visualization


Introduction to Data Visualization, static graphical techniques, multivariate graphical
techniques, customization.
Plotting with pandas: Line Plots, Bar Plots, Histograms and Density Plots, Scatter or Point
Plots.
Matplotlib: Creating static, animated, and interactive visualizations in Python.
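
The pandas plot types listed above can be sketched as follows. This is only an example under stated assumptions: the data is randomly generated, the column names are hypothetical, and the non-interactive "Agg" backend is selected so the script runs without a display. Density plots are left out because pandas relies on SciPy for them.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated data with hypothetical column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": np.arange(50),
    "y": rng.normal(size=50).cumsum(),
    "group": rng.choice(["a", "b", "c"], size=50),
})

fig, axes = plt.subplots(2, 2, figsize=(9, 6))

# Line plot of y against x.
df.plot(x="x", y="y", ax=axes[0, 0], title="Line plot")

# Bar plot of category counts.
df["group"].value_counts().sort_index().plot.bar(ax=axes[0, 1], title="Bar plot")

# Histogram of y with 10 bins.
df["y"].plot.hist(bins=10, ax=axes[1, 0], title="Histogram")

# Scatter plot of y against x.
df.plot.scatter(x="x", y="y", ax=axes[1, 1], title="Scatter plot")

fig.tight_layout()

# Render to an in-memory PNG; pass a file path instead to keep the figure.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

Customization (labels, colors, legends) is done on the returned Matplotlib axes, which is why each pandas call receives an `ax` argument here.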
Unit 3: Exploratory Data Analysis (EDA)
Exploratory Data Analysis (EDA), The EDA Approach, Defining Descriptive Statistics for
Numeric Data, Measuring central tendency, Measuring variance and range, Working with
percentiles, Defining measures of normality, Counting for Categorical Data, Understanding
frequencies, Creating contingency tables, Creating Applied Visualization for EDA,
Inspecting boxplots, Performing t‐tests after boxplots, Observing parallel coordinates,
Graphing distributions, Plotting scatterplots, Using covariance and correlation, Using
nonparametric correlation, Considering chi‐square for tables, Using the normal distribution,
Creating a Z‐score standardization, Transforming other notable distributions, Detecting
Outliers in Data, Clustering, Reducing dimensionality.
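
A few of the Unit 3 techniques can be sketched with pandas alone. The data below is invented and the column names are hypothetical. Spearman correlation is computed here as the Pearson correlation of ranks, which is its definition and avoids any extra dependency.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "score": [52, 61, 58, 70, 66, 49, 75, 63],
    "hours": [2, 4, 3, 6, 5, 1, 7, 4],
    "passed": ["no", "yes", "no", "yes", "yes", "no", "yes", "yes"],
    "section": ["A", "A", "B", "B", "A", "B", "A", "B"],
})

# Measuring central tendency.
mean = df["score"].mean()
median = df["score"].median()

# Working with percentiles (quartiles here).
percentiles = df["score"].quantile([0.25, 0.5, 0.75])

# Counting for categorical data: frequencies and a contingency table.
freq = df["passed"].value_counts()
table = pd.crosstab(df["section"], df["passed"])

# Z-score standardization (using the sample standard deviation).
z = (df["score"] - mean) / df["score"].std()

# Correlation: Pearson, and Spearman as Pearson on ranks (nonparametric).
pearson = df["score"].corr(df["hours"])
spearman = df["score"].rank().corr(df["hours"].rank())

print(table)
```

The contingency table produced by `pd.crosstab` is the natural input for a chi-square test, and the standardized column has mean 0 and sample standard deviation 1 by construction.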

Unit 4: Data Sources in Artificial Intelligence (AI)


Data Basics, Types of Data, Big Data: Volume, Variety, Velocity, Database and other Tools,
Data Process, How Much Data Do You Need for AI?
Primary data source, Secondary data source, Qualitative Data, Quantitative Data, Structured
Data, Unstructured Data, Semi-Structured Data, Historical and Real-Time Data, Internal
Data, External Data.

Unit 5: Time Series Analysis


Introduction to time series analysis: Definition and properties of time series data,
Applications of time series analysis, Types of time series data (e.g., stationary, non-
stationary, periodic). Time series data preprocessing: Dealing with missing values and
outliers, Detrending and deseasonalizing, Resampling and aggregation. Time series
visualization and exploratory analysis: Plotting time series data, Decomposing time series
into trend, seasonal, and residual components, Autocorrelation and partial autocorrelation
analysis, Time series forecasting techniques. Time series modelling: ARIMA models
(autoregressive integrated moving average), Seasonal ARIMA models, Exponential
smoothing models (e.g., Holt-Winters), Vector Autoregression (VAR) models, State space
models. Model selection and evaluation: Akaike Information Criterion (AIC) and Bayesian
Information Criterion (BIC), Cross-validation and holdout methods, Residual analysis and
model diagnostics. Introduction to the TensorFlow and PyTorch frameworks.
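
The preprocessing and exploratory steps of Unit 5 can be sketched with pandas alone. This is a simplified illustration, not a modelling recipe: the series is synthetic (linear trend plus a weekly cycle plus noise), the moving-average detrending is only one of several possible methods, and dedicated libraries would normally be used for full decomposition and ARIMA fitting.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: linear trend + weekly cycle + noise (made-up data).
rng = np.random.default_rng(42)
idx = pd.date_range("2023-01-01", periods=140, freq="D")
t = np.arange(len(idx))
values = 0.5 * t + 5 * np.sin(2 * np.pi * t / 7) + rng.normal(scale=0.5, size=len(idx))
series = pd.Series(values, index=idx)

# Dealing with missing values: blank out a few points, then interpolate in time.
series.iloc[[10, 11, 50]] = np.nan
series = series.interpolate(method="time")

# Resampling and aggregation: daily values to weekly means.
weekly = series.resample("W").mean()

# Detrending: subtract a centered 7-day moving average (an estimate of the trend).
trend = series.rolling(window=7, center=True).mean()
detrended = (series - trend).dropna()

# Autocorrelation of the detrended series at two lags.
acf_lag7 = detrended.autocorr(lag=7)  # one full weekly cycle
acf_lag3 = detrended.autocorr(lag=3)  # roughly half a cycle

print(f"lag 7: {acf_lag7:.2f}, lag 3: {acf_lag3:.2f}")
```

A strongly positive autocorrelation at lag 7 and a negative one near half that lag are what a weekly seasonal pattern would be expected to produce.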
Suggested Reading
Text Books:
1. Python for Probability, Statistics, and Machine Learning, Second Edition, José
Unpingco, Springer.
2. Business Analytics: The Fundamentals of Business Analytics, R. N. Prasad and
Seema Acharya, Wiley.
3. Data Wrangling with Python, Jacqueline Kazil, Katharine Jarmul, O'Reilly Media,
Inc.
4. Hands-On Data Visualization: Interactive Storytelling from Spreadsheets to Code,
Jack Dougherty, Ilya Ilyankou, O'Reilly Media, Inc.
5. Practical Time Series Analysis: Prediction with Statistics and Machine Learning,
Aileen Nielsen, O'Reilly Media, Inc.
6. Big Data: A Beginner's Introduction, Saswat Sarangi and Pankaj Sharma, Routledge
India.
Course Outcome:

After completion of this course, students will be able to:

CO    Statement                                                                 Bloom's Level
CO1   Use Python libraries NumPy and Pandas for data manipulation,              L3
      cleaning, and analysis.
CO2   Create data visualizations using Python libraries such as Matplotlib      L3
      and Pandas.
CO3   Illustrate proficiency in exploratory data analysis (EDA) techniques,     L4
      including descriptive statistics for numeric data, working with
      percentiles, counting for categorical data, and creating contingency
      tables.
CO4   Employ different data sources, including primary and secondary data       L3
      sources, qualitative and quantitative data, structured and unstructured
      data, historical and real-time data, and internal and external data.
CO5   Demonstrate skills to work with diverse data types, such as text, image,  L3
      video, and audio data.

Course Delivery Methods


CD1 Lecture-based learning: Traditional classroom lectures or online video lectures

CD2 Hands-on sessions

CD3 Workshop sessions

CD4 Online Coding Platform

CD5 Group Projects


Mapping of Course Outcomes onto Programme Outcomes & Programme
Specific Outcomes

Course Outcome  Bloom's Level  PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PSO1  PSO2  PSO3
CO1             L3             L    -    M    L    H    M    L    M    L     L     L
CO2             L3             H    -    H    L    H    M    M    M    H     H     H
CO3             L4             H    -    H    H    H    M    M    M    H     H     H
CO4             L3             H    -    H    H    H    M    M    M    H     H     H
CO5             L3             H    -    H    H    H    M    M    M    H     H     H

H: High, M: Moderate, L: Low, '-': No correlation

Mapping between CO and CD

CD    Course Delivery Method                          Course Outcomes
CD1   Lecture-based learning: Traditional classroom   CO1, CO2, CO3, CO4, CO5
      lectures or online video lectures
CD2   Hands-on sessions                               CO1, CO2, CO3, CO4, CO5
CD3   Workshop sessions                               CO1, CO2, CO3, CO4, CO5
CD4   Online Coding Platform                          CO2, CO5
CD5   Group Projects                                  CO5


List of Practicals
Experiments:
1. Implement a Python script to read data and apply the data cleaning and
pre-processing steps.
2. Perform the above practical in AWS.
a. Data File: https://archive.ics.uci.edu/ml/datasets/Adult (for reference only;
instructors may use any suitable dataset)
3. Implement a Python script for the above-mentioned methods of data representation or
visualization.
4. Implement a Python script to read data from the following file and draw line plots
and scatter plots for each column (Data File: https://archive.ics.uci.edu/ml/datasets/Adult).
5. Perform the above practical in GCP.
6. Measure the variance and range of the columns age, capital-gain, and capital-loss for the
above-mentioned dataset.
7. Perform the above practical in Azure.
8. Extract and process text, image, video, and audio data from multiple platforms.
9. Implement various time series models on the following dataset using TensorFlow or
PyTorch
(Dataset: https://archive.ics.uci.edu/ml/datasets/Chess+%28Domain+Theories%29)
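
As a starting point for the variance and range experiment, a small helper might look like the sketch below. The function name `variance_and_range` is hypothetical, and the rows are made up; they only reuse the column names mentioned in the experiment and are not taken from the Adult dataset.

```python
import pandas as pd


def variance_and_range(df, columns):
    """Return the sample variance and range (max - min) of each given column."""
    subset = df[columns]
    return pd.DataFrame({
        "variance": subset.var(),              # sample variance (ddof=1)
        "range": subset.max() - subset.min(),  # spread between extremes
    })


# Made-up rows; only the column names come from the experiment description.
toy = pd.DataFrame({
    "age": [25, 38, 52, 41],
    "capital-gain": [0, 0, 5000, 1000],
    "capital-loss": [0, 200, 0, 0],
})

summary = variance_and_range(toy, ["age", "capital-gain", "capital-loss"])
print(summary)
```

With the real dataset loaded into a DataFrame that has these column names, the same call would apply unchanged.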
