Foundations of Data Science II

The course 'Foundations of Data Science II' introduces students to R and Python programming, focusing on data types, control structures, data manipulation, and visualization techniques. Students will learn to conduct statistical analysis and create insightful visualizations using libraries like ggplot2 and Matplotlib. By the end of the course, students will have developed essential programming skills for effective data analysis and communication of findings.

Uploaded by

vikrambasbata

Course Name : Foundations of Data Science II

Course Code : CSU1938


Course Instructor : Dr. Lokender Kumar
Contact Hours: 3+0+0 Credits: 3

Course Description:
In this course, students are introduced to the basics of R programming, covering topics such as R
environment setup, data types, variables, control structures, and working with R functions. They
will also learn data input/output, cleaning, handling missing data, and explore vectorized
operations and logical operations in R. Students will explore statistical analysis using R to derive
meaningful insights from data. Further, students are introduced to the fundamentals of Python
programming, learning about Python's history, setting up the development environment, data
types, control structures, and user-defined functions. They will explore Python libraries, file
handling, and working with external data formats. The course covers data analysis with pandas,
including working with DataFrames and Series. Students will create basic plots using Matplotlib
and explore advanced data visualization techniques. By the end of this course, students will be
equipped with the necessary programming skills and tools to conduct data analysis, manipulate
data effectively, and create insightful visualizations using both R and Python.

Detailed Course Contents:


Unit I: Basics of R programming
No of Lectures: 5; No of Tutorials:5; No of Practicals:0
Introduction to R and RStudio, Setting up the R Environment, R Data Types (numeric, character,
logical, etc.), Variables and Assignments, Basic Arithmetic Operations in R, Data Structures in R
(vectors, matrices, lists), Indexing and Sub-setting Data, Control Structures (if-else, for loop,
while loop), Built-in R Functions (mean, sum, max, min, etc.), Data Input and Output (read and
write data), Data Cleaning and Handling Missing Data, Vectorized Operations in R, Working
with Factors and Categorical Data, Logical Operations and Boolean Data, Introduction to R
Packages, Debugging Techniques in R, Handling Errors and Exceptions.

Unit II: Wrangling Data and Data visualization using R


No of Lectures: 5; No of Tutorials:5; No of Practicals:0
Introduction to Data Visualization in R, Data Import and Export in R (CSV, Excel, Text files,
databases), Base R Graphics (plot, barplot, histogram, boxplot, etc.), Data Visualization with
the ggplot2 package, Customizing Plots in ggplot2, Faceting and Multiple Plots, Advanced Data
Visualization (heatmap, scatterplot matrices, etc.), Interactive Data Visualization with Plotly and
ggplotly, Data Visualization Best Practices, Statistical Analysis using R.

Unit III: Introduction to Python
No of Lectures: 6; No of Tutorials:6; No of Practicals:0
Introduction to Python and its History, Installing Python and Setting up the Development
Environment, Python Interpreter, Python Variables and Data Types (integers, floats, strings,
booleans, etc.), Working with Python Operators, Python Data Structures (lists, tuples,
dictionaries, sets), Indexing and Slicing in Python Lists and Strings, Python Control Structures
(if-else, for loop, while loop), Python Functions and User-defined Functions, Built-in Python
Functions (print, len, range, etc.), Python Libraries and Packages, Python File Handling
(opening, reading, and writing files), Working with External Data Formats (CSV, JSON, Text
files), Handling Errors and Debugging Techniques in Python.
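
As an illustration of how several Unit III topics fit together (dictionaries, user-defined functions, reading CSV data, error handling, list slicing, and JSON output), here is a minimal sketch using only the Python standard library. The sample data, the function name `parse_scores`, and the column names are hypothetical and are not part of the syllabus.

```python
import csv
import io
import json

# Hypothetical sample data; in practice this would be read from a file on disk.
CSV_TEXT = "name,score\nAsha,82\nRavi,n/a\nMeera,91\n"


def parse_scores(csv_text):
    """Return a dict mapping name -> integer score, skipping unparseable rows."""
    scores = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        try:
            scores[row["name"]] = int(row["score"])
        except ValueError:
            # Non-numeric score: skip the row instead of crashing.
            continue
    return scores


scores = parse_scores(CSV_TEXT)

# Sort names by score (highest first) and keep the first two via slicing.
top_two = sorted(scores, key=scores.get, reverse=True)[:2]

# Serialize a small summary to a JSON string.
summary = json.dumps({"count": len(scores), "top": top_two})
print(summary)
```

Catching only `ValueError` keeps other, unexpected errors visible, which is usually preferable to a bare `except` while debugging.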

Unit IV: Fundamentals of Data Manipulation with Python


No of Lectures: 6; No of Tutorials:6; No of Practicals:0
Introduction to Jupyter Notebook and Google Colab, Introduction to the Numerical Python Library
(NumPy), NumPy Arrays and Matrices, Mathematical Operations, Introduction to Data Analysis
with pandas, Working with DataFrames and Series, Basic Plots (line plots, bar plots, scatter plots,
etc.) using Matplotlib, Advanced Data Visualization Techniques with Matplotlib, Introduction to
Statistical Analysis with Python.
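
To give a flavor of the Unit IV topics, the following minimal sketch combines a NumPy array, a vectorized operation, a pandas DataFrame, missing-value handling, and a grouped summary. It assumes NumPy and pandas are installed; the measurements, column names, and group labels are hypothetical and are not part of the syllabus.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements; np.nan marks a missing value.
heights_cm = np.array([150.0, 160.0, np.nan, 170.0])

# Vectorized arithmetic: the whole array is converted without an explicit loop.
heights_m = heights_cm / 100.0

df = pd.DataFrame({
    "student": ["A", "B", "C", "D"],
    "group": ["x", "y", "x", "y"],
    "height_m": heights_m,
})

# Series.mean() skips NaN by default, so this is the mean of observed values.
mean_height = df["height_m"].mean()

# Fill the missing height with that mean.
df["height_m"] = df["height_m"].fillna(mean_height)

# Group-wise summary: a Series indexed by group label.
group_means = df.groupby("group")["height_m"].mean()
print(group_means)
```

Mean imputation is used here only because it is simple; whether it is appropriate depends on the dataset and the analysis.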

Knowledge Outcome:
At the end of the course, the student should be able to:
CO1: Understand the fundamental concepts of programming in R and Python, including
variables, data types, control structures, and functions.
CO2: Comprehend the data wrangling and manipulation techniques in R and Python,
such as data cleaning, handling missing data, and working with data structures like
DataFrames and arrays.
CO3: Gain knowledge of data visualization principles and techniques using libraries such
as Matplotlib, ggplot2, and Plotly in R and Python.
CO4: Acquire knowledge of statistical analysis concepts and their application in data
science using R and Python.

Skill Outcome:
At the end of the course, the student should be able to:
CO5: Apply programming skills in R and Python to perform data analysis tasks,
including data import/export, data cleaning, and manipulation, making use of vectorized
operations and logical operations.
CO6: Utilize data visualization libraries to create a wide range of interactive and
insightful plots and charts for data exploration and presentation.
CO7: Demonstrate proficiency in conducting exploratory data analysis (EDA) to uncover
patterns, trends, and outliers in datasets using R and Python.
CO8: Develop the ability to interpret and derive meaningful insights from data through
statistical analysis techniques and effectively communicate the findings to stakeholders
using data visualization techniques.

Methodology:
1. 35 participative lectures to discuss the theoretical concepts.
2. Tutorial and hands-on sessions related to various tools used in data science.
3. 5-8 assignments.
4. Quizzes based on subject matter.

Grading:
Internal Assessment - 50%
1. Assignments 8%
2. Quizzes/Surprise Tests 7%
3. Attendance 5%
4. 1st Mid-term exam 15%
5. 2nd Mid-term exam 15%

End Term Exam - 50%

Required Books and Materials:


• Wickham, Hadley, and Garrett Grolemund. R for Data Science: Import, Tidy, Transform,
Visualize, and Model Data. O'Reilly Media, 2017.
• VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with
Data. O'Reilly Media, 2016.
• Chang, Winston. R Graphics Cookbook: Practical Recipes for
Visualizing Data. O'Reilly Media, 2013.
• McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and
IPython. O'Reilly Media, 2017.
