Foundation of Data Science Syllabus

Uploaded by

Deepika Kamboj
Copyright
© All Rights Reserved

Course Code Course name L T P C

Foundation of Data Science 3 0 1


Total Units to be Covered: 5
Total Contact Hours:
Prerequisite: Python Programming
Syllabus Version: 1.0

Course Objectives
1. To explore the different concepts of Statistics.
2. To acquire a basic understanding of machine learning (ML) models.
3. To comprehend software requirements for implementing statistical and ML models.

Course Outcomes
CO1. Understand the fundamentals of Data Science.
CO2. Acquire the concepts and tools of data integration and data processing.
CO3. Explore software for data integration and data preprocessing.
CO4. Apply statistical and ML methods for predictive modelling.
CO5. Develop skills for effective data visualization.

CO-PO Mapping

Rows: Course Outcomes. Columns: Program Outcomes.

Course Outcomes  PO1   PO2   PO3   PO4   PO5   PO6   PO7   PO8   PO9   PO10  PO11  PO12  PSO1  PSO2  PSO3
CO1              2     3     2     2     1     -     -     -     2     -     3     -     1     2     -
CO2              2     3     2     2     1     -     -     -     2     -     3     -     1     2     -
CO3              2     3     2     3     1     -     -     -     2     -     3     -     1     2     -
CO4              2     3     2     3     1     -     -     -     2     -     3     -     1     2     -
Average          2     3     2     2.5   1     -     -     -     2     -     3     -     1     2     -

1 – Weakly Mapped (Low)
2 – Moderately Mapped (Medium)
3 – Strongly Mapped (High)
"-" means there is no correlation


Syllabus

Unit I: Introduction to Data Science

7 Lecture Hours

Evolution of Data Science, Data Science Roles, Stages in a Data Science Project,
Applications of Data Science in various fields, Data Security Issues, Mathematical
Foundations for Data Science, Exploratory Data Analysis, Data Munging or Data
Wrangling, Theory of Causation, Difference between Business Intelligence (BI),
Data Analytics, and Data Science.

Unit II: Data Collection and Data Pre-Processing

7 Lecture Hours

Data Collection Strategies, Data Pre-Processing Overview, Data Cleaning, Data
Integration and Transformation, Data Reduction, Data Discretization, Binary
Encoding, One-Hot Encoding, Standardization, Normalization, Databases, SQL
Tables, Functions, Pandas, Data Types and Formats (Structured, Unstructured,
Semi-Structured), Data Collection Methods (APIs, Web Scraping, Databases).

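As a minimal illustration of three transformations listed in this unit (one-hot encoding, standardization, and normalization), the following Python sketch uses only the standard library. The sample values and the helper names `one_hot`, `standardize`, and `normalize` are hypothetical and not part of the syllabus. It is a sketch of the underlying arithmetic, not a prescribed implementation.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Map each category to a 0/1 indicator vector (columns in sorted order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def standardize(xs):
    """Z-score scaling: subtract the mean, divide by the population std. dev."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def normalize(xs):
    """Min-max scaling to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


# Hypothetical sample data, for illustration only.
colors = ["red", "green", "red", "blue"]
heights = [150.0, 160.0, 170.0, 180.0]

encoded = one_hot(colors)        # indicator columns ordered: blue, green, red
z_scores = standardize(heights)  # rescaled to mean 0 and unit variance
scaled = normalize(heights)      # smallest value maps to 0.0, largest to 1.0
```

Both scaling helpers assume the input is not constant; a column with a single repeated value would cause a division by zero and needs separate handling.
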
Unit III: Exploratory Data Analytics & Descriptive Statistics

11 Lecture Hours

Introduction to Exploratory Data Analytics and Descriptive Statistics (Mean,
Standard Deviation), Skewness and Kurtosis (Box Plots, Pivot Table, Heat Map,
Correlation Statistics), Basic Probability Concepts, Conditional Probability and
Bayes' Theorem, Probability Distributions (Binomial, Poisson, Normal),
Inferential Statistics (Sampling Methods, Central Limit Theorem, Confidence
Intervals), Hypothesis Testing (Null and Alternative Hypotheses, Type I and
Type II Errors, t-tests, Chi-Square Tests, ANOVA), Regression Analysis (Simple
Linear Regression, Multiple Linear Regression, Assumptions of Regression
Analysis, Model Evaluation Metrics (R², Adjusted R², RMSE)).

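To make the regression topics in this unit concrete, the sketch below fits a simple linear regression by ordinary least squares and computes two of the listed evaluation metrics, R² and RMSE. The data points and function names are hypothetical, chosen only for illustration. Only the Python standard library is used.

```python
from math import sqrt


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(ys, preds):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def rmse(ys, preds):
    """Root mean squared error of the predictions."""
    return sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = fit_simple_linear_regression(xs, ys)
preds = [intercept + slope * x for x in xs]
fit_r2 = r_squared(ys, preds)
fit_rmse = rmse(ys, preds)
```

These metrics are computed on the same data used for fitting, so they are in-sample measures; they say nothing yet about performance on unseen data.
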
Unit IV: Model Development (Classification & Clustering Methods)

13 Lecture Hours

Simple and Multiple Regression, Supervised vs. Unsupervised Learning, Key
Algorithms (Linear Regression, Decision Trees, K-Means), Classification
Algorithms (K-Nearest Neighbors, Support Vector Machines, etc.), Clustering
Techniques (K-Means, Hierarchical Clustering, DBSCAN, etc.), Dimensionality
Reduction (Principal Component Analysis), Anomaly Detection, Feature Selection
and Extraction, Handling Categorical and Numerical Data, Model Selection and
Hyperparameter Tuning, Model Evaluation (Confusion Matrix, ROC Curve, AUC,
Cross-Validation, Metrics), Model Evaluation using Visualization, Residual Plot,
Distribution Plot, Polynomial Regression and Pipelines, Measures for In-sample
Evaluation, Prediction and Decision Making.

Unit V: Big Data and Cloud Computing

7 Lecture Hours

Introduction to Big Data Technologies (Hadoop, Spark), Definition and
Characteristics of Big Data (Volume, Variety, Velocity, Veracity), Big Data vs.
Traditional Data, Overview of Big Data Technologies and Ecosystem, Big Data
Storage and Processing Frameworks, Distributed Systems and Parallel Computing,
Overview of Hadoop Ecosystem (HDFS, YARN, MapReduce), Introduction to Apache
Spark, Use Cases and Applications of Big Data, Data Storage and Management
(NoSQL), Relational vs. NoSQL Databases, Types of NoSQL Databases (Key-Value,
Document, Column-Family, Graph), CAP Theorem and BASE Properties, NoSQL Use
Cases and Advantages.

Cloud Platforms for Data Science (AWS, Google Cloud, Azure), Definition and History
of Cloud Computing, Benefits and Challenges of Cloud Computing, Key Concepts:
Scalability, Elasticity, Agility, Cloud Service Models (IaaS, PaaS, SaaS), Overview of
Amazon Web Services (AWS), Overview of Microsoft Azure, Overview of Google
Cloud Platform (GCP), Comparison of Cloud Providers

Total Lecture Hours: 45


Textbooks
1. Peter Bruce, Andrew Bruce, Peter Gedeck, "Practical Statistics for Data
Scientists: 50+ Essential Concepts Using R and Python", 2nd Edition, June 2020,
O'Reilly

2. Balamurugan Balusamy, Nandhini Abirami R et al., "Big Data: Concepts,
Technology, and Architecture", June 2021, Wiley

3. Derrick Rountree, Ileana Castrillo, "The Basics of Cloud Computing:
Understanding the Fundamentals of Cloud Computing in Theory and Practice",
November 2013, Syngress

Reference Books
1. Aurélien Géron, "Hands-On Machine Learning with Scikit-Learn, Keras, and
TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems", Third
Edition, 2022, O'Reilly

2. Funmi Obembe, Ofer Engel, "A Hands-on Introduction to Big Data Analytics",
February 2024, SAGE Publications Ltd.

Modes of Evaluation: Quiz / Assignment / Presentation / Extempore / Written
Examination

Examination Scheme

Components       IA    Mid Sem   End Sem   Total
Weightage (%)    50    20        30        100

Detailed Breakup of Internal Assessment

Internal Assessment Component   Weightage in Calculation of Internal Assessment (100 marks)
Quiz 1                          15%
Quiz 2                          15%
Class Test 1                    15%
Class Test 2                    15%
Assignment 1/Project            20%
Assignment 2/Project            20%
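
The weightages in the two tables above imply a two-step weighted sum: the Internal Assessment (IA) components combine into an IA score, which then combines with the mid-semester and end-semester scores. The sketch below works through that arithmetic. It assumes every component is scored as a percentage out of 100, which the syllabus does not state explicitly; the function names and the sample student scores are hypothetical.

```python
# Weightages (percent) taken from the tables above.
IA_WEIGHTS = {
    "Quiz 1": 15,
    "Quiz 2": 15,
    "Class Test 1": 15,
    "Class Test 2": 15,
    "Assignment 1/Project": 20,
    "Assignment 2/Project": 20,
}
SCHEME_WEIGHTS = {"IA": 50, "Mid Sem": 20, "End Sem": 30}


def weighted_score(scores, weights):
    """Weighted average of percentage scores; the weights must sum to 100."""
    assert sum(weights.values()) == 100
    return sum(scores[name] * w for name, w in weights.items()) / 100


def final_score(ia_scores, mid_sem, end_sem):
    """Combine the IA components first, then apply the examination scheme."""
    ia = weighted_score(ia_scores, IA_WEIGHTS)
    overall = {"IA": ia, "Mid Sem": mid_sem, "End Sem": end_sem}
    return weighted_score(overall, SCHEME_WEIGHTS)


# Hypothetical student: 80% on every IA component, 70% mid sem, 60% end sem.
ia_scores = {name: 80 for name in IA_WEIGHTS}
total = final_score(ia_scores, mid_sem=70, end_sem=60)
```

For this hypothetical student the IA score is 80, so the total is 0.50 × 80 + 0.20 × 70 + 0.30 × 60 = 72.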
