Fundamentals of Data Science: Nehru Institute of Engineering and Technology


Course On

AD8302
Fundamentals of Data Science
Dr. N. K. Sakthivel, M.Tech., Ph.D.
Dean (Computing)
Nehru Institute of Engineering and Technology
Accredited by NAAC, Recognized by UGC with Section 2(f) and 12(B), NBA Accredited UG Courses : Aero, CSE and Mech
Coimbatore, Tamil Nadu, INDIA

09/17/2021 Dr. N. K. Sakthivel +91 9443302399 [email protected] www.youtube.com/techmakersindia 1


AD8302 Fundamentals of Data Science
OBJECTIVES:

• To Gain Knowledge of the Basic Concepts of Data Analysis
• To Acquire Skills in Data Preparation and Preprocessing Steps
• To Develop Mathematical Skills in Statistics
• To Learn the Tools and Packages in Python for Data Science
• To Gain an Understanding of Classification and Regression Models
• To Acquire Knowledge in Data Interpretation and Visualization Techniques



AD8302 Fundamentals of Data Science
UNIT I INTRODUCTION 9
Need for data science – benefits and uses – facets of data – data science process – setting the research
goal – retrieving data – cleansing, integrating, and transforming data – exploratory data analysis – build
the models – presenting and building applications
UNIT II DESCRIBING DATA Level I 9
Frequency distributions – Outliers – relative frequency distributions – cumulative frequency
distributions – frequency distributions for nominal data – interpreting distributions – graphs – averages
– mode – median – mean – averages for qualitative and ranked data – describing variability –
range – variance – standard deviation – degrees of freedom – interquartile range – variability for
qualitative and ranked data
UNIT III PYTHON FOR DATA HANDLING 9
Basics of Numpy arrays – aggregations – computations on arrays – comparisons, masks, boolean logic –
fancy indexing – structured arrays – Data manipulation with Pandas – data indexing and selection –
operating on data – missing data – hierarchical indexing – combining datasets – aggregation and
grouping – pivot tables



AD8302 Fundamentals of Data Science
UNIT IV DESCRIBING DATA Level II 9
Normal distributions – z scores – normal curve problems – finding proportions – finding scores – more
about z scores – correlation – scatter plots – correlation coefficient for quantitative data –
computational formula for correlation coefficient – regression – regression line – least squares
regression line – standard error of estimate – interpretation of r² – multiple regression equations –
regression toward the mean
UNIT V PYTHON FOR DATA VISUALIZATION 9
Visualization with matplotlib – line plots – scatter plots – visualizing errors – density and contour plots –
histograms, binnings, and density – three dimensional plotting – geographic data – data analysis using
statsmodels and seaborn – graph plotting using Plotly – interactive data visualization using Bokeh
TEXT BOOK:
1. David Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning Publications, 2016. (first two chapters
for Unit I)
2. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition, Wiley Publications, 2017. (Chapters 1–7 for Units II and III)
3. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016. (Parts of chapters 2–4 for Units IV and V)
Bob Hughes, Mike Cotterell, and Rajib Mall, “Software Project Management”, Fifth Edition, Tata McGraw Hill, New Delhi, 2012.



AD8302 Fundamentals of Data Science
OUTCOMES :
At the end of the course, the students will be able to:
• Apply Python PANDAS Library for Data Pre-processing and Cleansing (Apply)
• Determine the relationship between Data Dependencies using Statistics (Analysis)
• Analyze data using primary tools used for Data Science in Python (Analysis)
• Discover the useful information using Mathematical Statistical Skills (Apply)
• Apply the knowledge for Data Describing and Visualization using Python Data Visualization Libraries
(Apply)



AD8302 Fundamentals of Data Science
UNIT I INTRODUCTION 9
Need for data science – benefits and uses – facets of data – data science process – setting the research goal –
retrieving data – cleansing, integrating, and transforming data – exploratory data analysis – build the models –
presenting and building applications
Data Science
• Data Science is a multidisciplinary approach to extracting actionable insights and
understanding from large and ever-increasing volumes of data
• Data Science encompasses
• Preparing Data for Analysis and Processing
• Performing Advanced Data Analysis
• Presenting the Results to Reveal Patterns and Enable Us to Draw Conclusions



Big Data vs Data Science
Big Data
• Big Data is data that contains greater Variety, arriving in increasing Volumes and with
more Velocity. ...
• Volume - How much data is there?
• Variety - How diverse are different types of data?
• Velocity - At what speed is new data generated?
• Veracity - How accurate is the data?
• These four properties make big data different from the data found in traditional data
management tools
• That is, Big Data consists of larger, more complex data sets, especially from new data sources
• Thus, traditional data processing software cannot manage them
• Such data sets are difficult to process with traditional data management techniques such as an RDBMS



Benefits and uses of Data Science and Big Data
• Data Science and big data are used almost everywhere in both commercial and non-
commercial settings
• Commercial companies in almost every industry use data science and big data to gain
insights into their
• Customers, Processes, Staff, Competition, and Products
• Many companies use data science to offer customers a better user experience, as
well as to cross-sell, up-sell, and personalize their offerings
• A good example is Google AdSense, which collects data from internet users so that relevant
commercial messages can be matched to the person browsing the internet
• MaxPoint (http://maxpoint.com/us) is another example of real-time personalized
advertising



Data Science Processes
• Capture
• Discover
• Discover features of your data and quickly determine the value of your data set
• Prepare and Clean to Enrich the Data
• Cleansing… Deduplicating
• Aggregating… Integrating… Transforming (Reformatting) and Manipulating
• Analyze
• To Extract Insights… Understand Patterns… Predict Patterns
• Build
• Validate
• Visualize and Deploy
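
The discover, prepare, and analyze steps listed above can be sketched in Python with pandas, the library named in Unit III of the syllabus. This is only a minimal illustration under stated assumptions: the table, its column names (customer, city, amount), and all values are hypothetical and do not come from the course material.

```python
import pandas as pd

# Hypothetical captured data: one duplicated row, inconsistent
# city casing, and one missing amount.
raw = pd.DataFrame({
    "customer": ["A", "B", "B", "C", "D"],
    "city": ["Coimbatore", "chennai", "chennai", "Chennai", "Coimbatore"],
    "amount": [100.0, 250.0, 250.0, None, 50.0],
})

# Discover: look at the size of the data set and its missing values.
n_rows = len(raw)
n_missing = int(raw["amount"].isna().sum())

# Prepare: deduplicate, cleanse missing values, and transform
# (reformat) the city names to a consistent casing.
clean = (
    raw.drop_duplicates()
       .dropna(subset=["amount"])
       .assign(city=lambda df: df["city"].str.title())
)

# Analyze: aggregate the amount per city.
total_by_city = clean.groupby("city")["amount"].sum()
print(total_by_city)
```

Dropping rows with missing values is just one possible cleansing choice; filling them with a default or an estimate is an equally valid alternative, depending on the research goal.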





Facets of Data
• In data science and big data, there are many different types of data, and each of them
tends to require different tools and techniques.
• The main categories of data are
• Structured
• Unstructured
• Natural Language
• Machine-Generated
• Graph-Based
• Audio, Video, and Images
• Streaming
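
To illustrate why different categories of data call for different tools, the following sketch contrasts structured and unstructured data using only the Python standard library. The names, ages, and sentence are hypothetical examples, not data from the course.

```python
import csv
import io
import re

# Structured data: fixed fields, so a value is addressed directly
# by its column name.
structured_csv = "name,age\nAsha,21\nRavi,23\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
ages = [int(row["age"]) for row in rows]

# Unstructured data: free text, so the same facts have to be
# extracted with pattern matching.
unstructured = "Asha, who is 21, met Ravi (23) at the lab."
ages_from_text = [int(n) for n in re.findall(r"\d+", unstructured)]

print(ages, ages_from_text)
```

The regular expression works here only because the sentence contains no other numbers; real unstructured text usually needs far more careful extraction.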



Facets of Data
• Structured vs Unstructured



Facets of Data
• Natural Language
• Natural Language or ordinary language is any language that has evolved naturally in
humans through use and repetition without conscious planning or premeditation
• Natural languages can take different forms, such as speech or signing
• Natural Language Processing (NLP) helps machines to understand what people write or say
• Using techniques like audio to text conversion, it gives computers the power to
understand human speech
• It also allows us to implement voice control over different systems
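
As a rough illustration of a first step in processing natural language, the sketch below splits a sentence into word tokens and counts them. It uses only the Python standard library; the tokenize helper and the sample sentence are hypothetical, and real NLP systems use much richer techniques.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


sentence = "Data science turns data into insight."
tokens = tokenize(sentence)

# Count how often each token occurs.
counts = Counter(tokens)
print(counts.most_common(1))
```

Word counts like these are a common starting representation that later analysis steps can build on.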



Facets of Data
• Machine-Generated
• Graph-Based



Facets of Data
• Audio, Video, and Images
• Audio, image, and video are data types that pose specific challenges to a data
scientist
• Recognizing objects in pictures turns out to be challenging for computers
• Techniques now exist that let an algorithm learn how to play video games
• Such an algorithm takes the video screen as input and learns to interpret everything via a
complex process of deep learning
• Streaming
• While streaming data can take almost any of the previous forms, it has an extra property
• The data flows into the system when an event happens instead of being loaded and stored
in a batch
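
The difference between processing data as events arrive and processing a stored batch can be sketched with a Python generator. The sensor_events source and its values are hypothetical; in a real system the events would arrive over time from an external source.

```python
def running_mean(events):
    """Yield the mean of all values seen so far, one result per event."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count


def sensor_events():
    # Hypothetical event source that produces one value at a time.
    yield from [10.0, 20.0, 30.0]


# Streaming: a result is available after every event.
stream_means = list(running_mean(sensor_events()))

# Batch: all values are loaded and stored first, then processed once.
batch = [10.0, 20.0, 30.0]
batch_mean = sum(batch) / len(batch)

print(stream_means, batch_mean)
```

The streaming version keeps only a count and a running total, so it never needs the full data set in memory.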
