Foundations of Data Science II
Course Description:
In this course, students are introduced to the basics of R programming, covering topics such as R
environment setup, data types, variables, control structures, and working with R functions. They
will also learn data input/output, cleaning, handling missing data, and explore vectorized
operations and logical operations in R. Students will explore statistical analysis using R to derive
meaningful insights from data. Further, students are introduced to the fundamentals of Python
programming, learning about Python's history, setting up the development environment, data
types, control structures, and user-defined functions. They will explore Python libraries, file
handling, and working with external data formats. The course covers data analysis with pandas,
including working with DataFrames and Series. Students will create basic plots using Matplotlib
and explore advanced data visualization techniques. By the end of this course, students will be
equipped with the necessary programming skills and tools to conduct data analysis, manipulate
data effectively, and create insightful visualizations using both R and Python.
Knowledge Outcome:
At the end of the course, the student should be able to:
CO1: Understand the fundamental concepts of programming in R and Python, including
variables, data types, control structures, and functions.
CO2: Comprehend data wrangling and manipulation techniques in R and Python,
such as data cleaning, handling missing data, and working with data structures like
DataFrames and arrays.
CO3: Gain knowledge of data visualization principles and techniques using libraries such
as Matplotlib, ggplot2, and Plotly in R and Python.
CO4: Acquire knowledge of statistical analysis concepts and their application in data
science using R and Python.
Skill Outcome:
At the end of the course, the student should be able to:
CO5: Apply programming skills in R and Python to perform data analysis tasks,
including data import/export, data cleaning, and manipulation, making use of vectorized
operations and logical operations.
CO6: Utilize data visualization libraries to create a wide range of interactive and
insightful plots and charts for data exploration and presentation.
CO7: Demonstrate proficiency in conducting exploratory data analysis (EDA) to uncover
patterns, trends, and outliers in datasets using R and Python.
CO8: Develop the ability to interpret and derive meaningful insights from data through
statistical analysis techniques and effectively communicate the findings to stakeholders
using data visualization techniques.
Methodology:
1. 35 participative lectures to discuss the theoretical concepts.
2. Tutorial and hands-on sessions related to various tools used in data science.
3. 5-8 assignments.
4. Quizzes based on subject matter.
Grading:
Internal Assessment - 50%
1. Assignments 8%
2. Quizzes/Surprise Tests 7%
3. Attendance 5%
4. 1st Mid-term exam 15%
5. 2nd Mid-term exam 15%