
CS3352 Foundations of Data Science

The document outlines the units of a course on foundations of data science. The five units cover topics like data science process, describing data with statistics and graphs, describing relationships with correlation and regression, using Python libraries like NumPy and Pandas for data wrangling, and data visualization with Matplotlib and Seaborn.

Uploaded by Thamarai Kannan

CS3352 FOUNDATIONS OF DATA SCIENCE

UNIT I INTRODUCTION
Data Science: Benefits and uses – Facets of data – Data Science Process: Overview – Defining
research goals – Retrieving data – Data preparation – Exploratory data analysis – Building the model –
Presenting findings and building applications – Data Mining – Data Warehousing – Basic statistical
descriptions of data.

UNIT II DESCRIBING DATA


Types of Data – Types of Variables – Describing Data with Tables and Graphs – Describing Data
with Averages – Describing Variability – Normal Distributions and Standard (z) Scores
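
As an illustration of the Unit II topics (averages, variability, and z scores), here is a minimal sketch using only Python's standard library. The list `scores` is hypothetical data, and the population standard deviation is an assumed choice; a sample standard deviation (`statistics.stdev`) would give slightly different z scores.

```python
from statistics import mean, median, pstdev

# Hypothetical data set, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(scores)        # average (mean)
mid = median(scores)     # average (median)
sigma = pstdev(scores)   # variability: population standard deviation (assumed choice)

# Standard (z) score: distance from the mean in standard-deviation units.
z_scores = [(value - mu) / sigma for value in scores]

print(f"mean={mu}, median={mid}, sd={sigma}")
print("z scores:", z_scores)
```

A z score of 0 marks a value equal to the mean; positive and negative z scores mark values above and below it.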

UNIT III DESCRIBING RELATIONSHIPS


Correlation – Scatter plots – Correlation coefficient for quantitative data – Computational formula for
correlation coefficient – Regression – Regression line – Least squares regression line – Standard error
of estimate – Interpretation of r² – Multiple regression equations – Regression towards the mean
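
The Unit III topics can be sketched in a few lines of plain Python. The paired values `x` and `y` are hypothetical, and the names `ss_x`, `ss_y`, and `sp_xy` are illustrative. The sketch assumes the sum-of-squares form of the computational formula, r = SP_xy / sqrt(SS_x · SS_y), and a standard error of estimate with n − 2 in the denominator; textbooks differ on the latter convention.

```python
from math import sqrt

# Hypothetical paired observations, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Sums of squares and sum of products (computational formulas).
ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
sp_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

# Pearson correlation coefficient.
r = sp_xy / sqrt(ss_x * ss_y)

# Least squares regression line: y' = intercept + slope * x.
slope = sp_xy / ss_x
intercept = sum(y) / n - slope * sum(x) / n

# r squared: proportion of variance in y predictable from x.
r_squared = r ** 2

# Standard error of estimate (assumes the n - 2 convention).
std_error = sqrt(ss_y * (1 - r_squared) / (n - 2))


def predict(x_new):
    """Predict y from x using the fitted least squares line."""
    return intercept + slope * x_new


print(f"r={r:.4f}, r^2={r_squared:.2f}, line: y' = {intercept:.2f} + {slope:.2f}x")
print(f"standard error of estimate={std_error:.4f}, predict(6)={predict(6):.2f}")
```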

UNIT IV PYTHON LIBRARIES FOR DATA WRANGLING


Basics of NumPy arrays – Aggregations – Computations on arrays – Comparisons, masks, Boolean
logic – Fancy indexing – Structured arrays – Data manipulation with Pandas – Data indexing and
selection – Operating on data – Missing data – Hierarchical indexing – Combining datasets –
Aggregation and grouping – Pivot tables
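
A few of the Unit IV topics (aggregations, Boolean masks, fancy indexing, missing data, grouping, and pivot tables) can be sketched as follows, assuming NumPy and Pandas are installed. The array `values` and the table `df`, including its column names, are hypothetical; filling missing values with zero is one assumed policy among several.

```python
import numpy as np
import pandas as pd

# Hypothetical array, invented for illustration only.
values = np.array([3, 8, 1, 9, 4])

total = values.sum()          # aggregation
mask = values > 3             # comparison producing a Boolean mask
large = values[mask]          # masking: keep elements where the mask is True
picked = values[[0, 2, 4]]    # fancy indexing with a list of positions

# Hypothetical table with one missing value.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [10.0, np.nan, 7.0, 5.0],
})

# Missing data: replace NaN with 0.0 (an assumed policy for this sketch).
df["sales"] = df["sales"].fillna(0.0)

# Aggregation and grouping: total sales per city.
by_city = df.groupby("city")["sales"].sum()

# Pivot table: cities as rows, years as columns.
table = df.pivot_table(values="sales", index="city", columns="year", aggfunc="sum")

print(total, large, picked)
print(by_city)
print(table)
```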

UNIT V DATA VISUALIZATION


Importing Matplotlib – Line plots – Scatter plots – Visualizing errors – Density and contour plots
– Histograms – Legends – Colors – Subplots – Text and annotation – Customization – Three-dimensional
plotting – Geographic data with Basemap – Visualization with Seaborn.
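
Several Unit V topics (importing Matplotlib, line plots, visualizing errors, legends, colors, subplots, and annotation) fit in one small figure. This is a sketch assuming Matplotlib is installed; the data in `x`, `y`, and `y_err` are hypothetical, and the non-interactive `Agg` backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumed so no display is required
import matplotlib.pyplot as plt

# Hypothetical measurements and error sizes, invented for illustration only.
x = [0, 1, 2, 3, 4]
y = [0.0, 1.1, 3.9, 9.2, 15.8]
y_err = [0.3, 0.3, 0.5, 0.6, 0.8]

# Subplots: one row, two panels.
fig, (ax_line, ax_err) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot with a color and a legend entry.
ax_line.plot(x, y, color="tab:blue", label="measured")
ax_line.set_title("Line plot")
ax_line.legend()

# Visualizing errors: points with vertical error bars, plus a text annotation.
ax_err.errorbar(x, y, yerr=y_err, fmt="o", color="tab:red", label="with error bars")
ax_err.annotate("largest value", xy=(4, 15.8), xytext=(1.5, 14))
ax_err.set_title("Visualizing errors")
ax_err.legend()

fig.tight_layout()
```

Calling `fig.savefig` with a file name of your choice writes the figure to disk; in an interactive session, `plt.show()` displays it instead.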

TEXTBOOKS:
1. David Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning
Publications, 2016. (Unit I)
2. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition, Wiley Publications, 2017.
(Units II and III)
3. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016. (Units IV and V)

REFERENCE:
1. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”, Green Tea
Press, 2014.
