7-General Introduction About The Course-14-08-2023


MAT5010: Foundations of Data Science

Course Coordinator: Dr. Jisha Francis

Contents
1 Preface to the course: Foundations of Data Science : MAT5010
1.1 Contact of Course Coordinator
1.2 Course Details with L-T-P-J-C
1.3 Time Table
1.4 Classroom Details
1.5 Evaluation Criteria
1.6 Course Description
1.7 Course Syllabus
1.8 Text Book
1.9 FAQs

1 Preface to the course: Foundations of Data Science : MAT5010


1.1 Contact of Course Coordinator
Dr. Jisha Francis
PRP 215 E
Assistant Professor,
Department of Mathematics,
School of Advanced Sciences
Vellore Institute of Technology, Vellore.
Email : [email protected]

1.2 Course Details with L-T-P-J-C

Session : August - December 2023


Semester : I
Subject Code : MAT5010
Subject Name : Foundations of Data Science
Lectures (L) : 3 hours/week
Credits (C) : 3

1.3 Time Table

Day           Slot (Time)                            Activity
Monday        TB1 (11:00 - 11:50)                    Lecture
Tuesday       B1 (08:00 - 08:50)                     Lecture
Thursday      B1 (09:00 - 09:50)                     Lecture
Office Hours  PRP 215E (Through prior appointment)   Doubts clarification

1.4 Classroom Details

Mode : Physical (Regular)


Theory Class : SJT602

1.5 Evaluation Criteria

Final Assessment Test : 40%


Continuous Assessment Test 1 : 15%
Continuous Assessment Test 2 : 15%
Digital Assignment : 10%
Quiz 1 : 10%
Quiz 2 : 10%
Attendance : A minimum of 75% (including medical and duty leaves) is mandatory
to attend the CAT 1, CAT 2, and FAT exams.
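
As a rough illustration of how the weights above combine, the following Python sketch computes an overall percentage from component scores. It assumes each component is scored on a 0-100 scale and that the components are combined linearly; the handout does not state how marks are scaled, and the function name and example scores are hypothetical.

```python
# Weights copied from the Evaluation Criteria list (percent of the final grade).
WEIGHTS = {
    "FAT": 40,
    "CAT 1": 15,
    "CAT 2": 15,
    "Digital Assignment": 10,
    "Quiz 1": 10,
    "Quiz 2": 10,
}


def weighted_total(scores):
    """Combine component scores (assumed 0-100 each) into one percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "FAT": 70,
    "CAT 1": 80,
    "CAT 2": 60,
    "Digital Assignment": 90,
    "Quiz 1": 100,
    "Quiz 2": 50,
}
print(weighted_total(example_scores))  # 73.0
```

The weights sum to 100, so a student scoring full marks in every component would obtain 100 overall.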

1.6 Course Description


The Foundations of Data Science course provides students with a comprehensive introduction to the fundamental
concepts and techniques used in the field of data science. This course covers a wide range of topics, including
exploratory data analysis, statistical inference, regression models, linear algebra, data preprocessing and feature
selection, as well as basic machine learning algorithms. Through a combination of theoretical knowledge and
hands-on practical exercises, students will develop a solid understanding of the key principles that underpin
modern data science.


Module 1: Introduction to Big Data and Data Science


In this module, we will delve into the world of Big Data Analytics. We’ll explore the distinctions between Business
Intelligence and Big Data, study prominent Big Data frameworks, and gain insight into the current landscape of
analytics. Moreover, we will learn essential data visualization techniques and become familiar with visualization
software tools.

Module 2: Exploratory Data Analysis (EDA)


Module 2 focuses on Exploratory Data Analysis (EDA), a crucial step in understanding and interpreting data.
We will cover various statistical measures and introduce basic tools such as plots, graphs, and summary statistics
used in EDA. The Data Analytics Lifecycle, along with its Discovery phase, will provide insights into how data
analysis evolves from raw data to valuable insights.

Module 3: Basic Statistical Inference


In this module, we will discuss the process of Developing Initial Hypotheses and Identifying Potential Data
Sources. We’ll engage in an EDA case study to apply our knowledge, and then dive into testing hypotheses
involving means, proportions, and variances.
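
To give a flavour of the hypothesis tests mentioned above, here is a minimal Python sketch of a two-sided one-sample z-test for a proportion. It relies on the usual normal approximation, which is only reasonable for large samples; the function name and the numbers in the example are illustrative assumptions, not course material.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0 using the normal approximation."""
    if n <= 0 or not 0 <= successes <= n or not 0 < p0 < 1:
        raise ValueError("need n > 0, 0 <= successes <= n, and 0 < p0 < 1")
    p_hat = successes / n                     # sample proportion
    std_error = sqrt(p0 * (1 - p0) / n)       # standard error under H0
    z = (p_hat - p0) / std_error              # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, testing H0: p = 0.5.
z, p_value = one_sample_proportion_z_test(60, 100, 0.5)
print(round(z, 3), round(p_value, 4))
```

For this made-up sample the statistic is about 2 and the p-value about 0.046, so H0 would be rejected at the 5% level but not at the 1% level.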

Module 4: Regression Models


Module 4 introduces Regression Models, a key component of data analysis and prediction. We will start with
Simple Linear Regression and explore the least squares principle. The module will progress to Multiple Linear
Regression (MLR), Logistic Regression, and cover concepts like Multiple Correlation and Partial Correlation.
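
The least squares principle for Simple Linear Regression can be sketched in a few lines of Python. The example below fits y = intercept + slope * x by minimising the sum of squared residuals, using the standard closed-form estimates; the function name and the data are hypothetical and chosen only for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
print(fit_simple_linear_regression(xs, ys))  # (1.0, 2.0)
```

Multiple Linear Regression generalises the same idea to several predictors, where the estimates are usually obtained with matrix operations rather than these scalar formulas.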

Module 5: Matrices and Linear Algebra Operations


Matrix representation and operations form the core of Module 5. We’ll explore how matrices are used to represent
relationships between data points and perform linear algebraic operations on them. Additionally, we’ll dive into
Matrix Decomposition techniques, including Singular Value Decomposition (SVD) and Principal Component
Analysis (PCA).

Module 6: Data Preprocessing and Feature Selection


Module 6 addresses the critical phase of Data Preprocessing and Feature Selection. We’ll cover strategies for
Data Cleaning, Integration, Reduction, Transformation, and Discretization. Furthermore, we will explore Feature
Generation and Selection, including algorithms like Filters, Wrappers, Decision Trees, and Random Forests.

Module 7: Machine Learning Algorithms and Techniques


The final module covers a range of essential Machine Learning Algorithms and Techniques. We’ll explore
Classifiers such as Decision Trees, Naive Bayes, and k-Nearest Neighbors (k-NN). Clustering will be introduced
through the k-means algorithm, and we’ll delve into Support Vector Machines (SVM), Association Rule Mining,
and Ensemble Methods.

Course Outcomes
Upon successful completion of the Foundations of Data Science course, students will be able to:

1. Understand Big Data Concepts:


• Differentiate between Big Data Analytics and traditional Business Intelligence.
• Describe the characteristics of Big Data, including volume, velocity, variety, and veracity.
• Identify popular Big Data frameworks used in handling large and complex datasets.


2. Conduct Effective Exploratory Data Analysis (EDA):


• Apply statistical measures to summarize and understand datasets.
• Utilize basic tools such as plots, graphs, and summary statistics for EDA.
• Describe the various stages of the Data Analytics Lifecycle and the role of EDA.
3. Formulate and Test Hypotheses:
• Develop initial hypotheses based on domain knowledge and data insights.
• Identify potential data sources and assess their suitability for analysis.
• Perform hypothesis testing on means, proportions, and variances to make informed decisions.
4. Apply Regression Models:
• Implement Simple Linear Regression and understand the least squares principle.
• Utilize Multiple Linear Regression (MLR) to model relationships among multiple variables.
• Apply Logistic Regression for classification tasks and interpret model outputs.
• Understand concepts of multiple correlation and partial correlation in regression analysis.
5. Utilize Matrices and Linear Algebra:
• Represent relationships between data points using matrices.
• Perform linear algebraic operations on matrices for data manipulation.
• Apply Matrix Decomposition techniques, including Singular Value Decomposition (SVD) and Principal
Component Analysis (PCA), for dimensionality reduction and feature extraction.
6. Preprocess Data and Select Features:
• Implement data cleaning, integration, reduction, transformation, and discretization techniques.
• Generate new features and select relevant features to improve model performance.
• Utilize various feature selection algorithms such as Filters, Wrappers, Decision Trees, and Random
Forests.
7. Apply Basic Machine Learning Algorithms:
• Implement and interpret Decision Trees, Naive Bayes, and k-Nearest Neighbors (k-NN) classifiers.
• Employ the k-means algorithm for clustering tasks.
• Apply Support Vector Machines (SVM) for classification tasks.
• Understand Association Rule Mining and Ensemble Methods to enhance predictive accuracy.

1.7 Course Syllabus


• Module 1: Big Data and Data Science - Big Data Analytics, Business intelligence vs Big data, big data
frameworks, Current landscape of analytics, data visualisation techniques, visualisation software.
• Module 2: Exploratory Data Analysis (EDA), statistical measures, Basic tools (plots, graphs and summary
statistics) of EDA, Data Analytics Lifecycle, Discovery.
• Module 3: Developing Initial Hypotheses, Identifying Potential Data Sources, EDA case study, testing
hypotheses on means, proportions and variances.
• Module 4: Regression models: Simple linear regression, least squares principle, MLR, logistic regression,
Multiple correlation, Partial correlation.
• Module 5: Matrices to represent relations between data, Linear algebraic operations on matrices, Matrix
decomposition: Singular Value Decomposition (SVD) and Principal Component Analysis (PCA).
• Module 6: Data cleaning - Data integration - Data Reduction - Data Transformation and Data
Discretization, Feature Generation and Feature Selection, Feature Selection algorithms: Filters - Wrappers -
Decision Trees - Random Forests.
• Module 7: Classifiers - Decision tree - Naive Bayes - k-Nearest Neighbors (k-NN), k-means, SVM,
Association Rule mining, Ensemble methods.
• Module 8: Expert Lecture- Skillsets required for a Data Scientist.


1.8 Text Book


• Mining of Massive Datasets, v2.1, Jure Leskovec, Anand Rajaraman and Jeffrey Ullman, Cambridge
University Press (2019). (free online).
• Big Data Analytics, paperback 2nd ed., Seema Acharya and Subhashini Chellappan, Wiley (2019).

Reference Books
• Doing Data Science: Straight Talk From The Frontline, Cathy O’Neil and Rachel Schutt, O’Reilly (2014).
• Data Mining: Concepts and Techniques, Third Edition, Jiawei Han, Micheline Kamber and Jian Pei, ISBN
0123814790 (2011).
• Big Data and Business Analytics, Jay Liebowitz, CRC Press (2013).

1.9 FAQs
1. I find the Foundations of Data Science course difficult. How should I plan my studies?

(a) Review the Basic Concepts: Before starting this course, take some
time to review the basic concepts. Make sure you have a solid understanding of the fundamentals, as
they will form the basis for the advanced topics in Foundations of Data Science.
(b) Create a Study Schedule: Break down the topics in Foundations of Data Science into manageable
chunks and create a study schedule. Allocate sufficient time for each topic, and set aside regular study
sessions to stay on track. This is a 3-credit course with 3 hours of classroom sessions per week, so every
student is expected to do 6 hours (3 × 2 = 6 hours) of self-study per week for the course.
(c) Use Multiple Learning Resources: Explore various learning resources, such as textbooks, lecture
notes, video tutorials, and online courses. Different resources may provide different perspectives and
examples, making it easier to grasp complex concepts.
(d) Practice with Problems: Practice solving problems related to each topic in Foundations of Data
Science. Solving numerical and practical problems will help you gain confidence in applying the
concepts to real-world scenarios.
(e) Participate in Group Study: Join study groups or engage in discussions with classmates who are
also studying Foundations of Data Science. Collaboration can lead to better understanding and help
clarify doubts.
(f) Seek Help Early: If you encounter difficulties during your study, don’t hesitate to seek help from
the course faculty (Dr. Jisha Francis).
(g) Stay Consistent: Consistency is key to success. Dedicate regular study time, and avoid last-minute
cramming to allow for better comprehension and retention of the material.
(h) Stay Positive and Persistent: Believe in your abilities and keep pushing forward. Take care of
your physical and mental health, as it directly impacts your learning ability.
2. How will understanding this subject help me in my future studies or profession?
(a) Overall, an understanding of Foundations of Data Science will provide you with a strong foundation
in data analysis, modeling, and decision-making, which are highly sought-after skills in today’s data-
centric world. These skills are transferable across various industries and professions, including business,
engineering, research, healthcare, finance, and more. Whether you pursue further studies or enter the
workforce, the knowledge gained from this course will significantly contribute to your success and
effectiveness in your chosen career path.
