Chapter 14 DataScience
Chapter 14
DATA SCIENCE
What is Data Science?
• It is a concept that unifies statistics, data
analysis, machine learning and their related
methods in order to understand and analyse
actual phenomena with data.
• It employs techniques and theories drawn from
many fields, including Mathematics, Computer
Science and Information Science.
Data Scientist
• They are analytical experts who utilize their
skills in both technology and social science to
find trends and manage data.
When you collect data from a data source,
remember:
• Use data that is publicly available.
• A personal dataset should be used only with the
consent of its owner.
• One should never breach someone’s privacy to
collect data.
• Data should be collected only from reliable sources.
• Reliable sources of data ensure the authenticity of the
data, which helps in properly training the AI model.
• For data science, the data is collected in the
form of tables.
TYPES OF DATA
• CSV: Comma Separated Values
• Spreadsheet
• SQL: Structured Query Language
• https://www.afiniti.com/corporate/rock-paper-scissors
• Go to the link and play Rock-Paper-Scissors
against an AI model.
• The challenge is to win 20 games against the
AI before it wins 20 games against you.
• Then try to answer the questions given on pages
160-161.
Applications and Use of Data Science
DATA VISUALIZATION
• Erroneous Data:
– Incorrect values
– Invalid or null values
• Missing Data: empty cells
• Outliers: data that does not fall within the expected
range of an element is known as an outlier.
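The two checks above (missing data and outliers) can be sketched with Pandas. This is a minimal illustration, not a prescribed method: the table, the column names and the 0 to 100 valid range are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical table: one empty cell and one value outside the 0-100 range.
marks_table = pd.DataFrame({
    "student": ["A", "B", "C", "D", "E"],
    "marks": [72.0, 85.0, None, 64.0, 250.0],
})

# Missing data: count the empty cells in each column.
missing_per_column = marks_table.isna().sum()

# Outliers: rows whose marks fall outside the assumed valid range.
valid_low, valid_high = 0, 100
outliers = marks_table[
    (marks_table["marks"] < valid_low) | (marks_table["marks"] > valid_high)
]

print(missing_per_column)
print(outliers)
```

Here the empty cell is reported as missing rather than as an outlier, because comparisons with a missing value are false.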
PYTHON FOR DATA SCIENCE
• Data Visualization: interprets the data and
identifies patterns and trends in it.
• Three important libraries in Python:
NumPy
Pandas
Matplotlib
NumPy
• NumPy stands for Numerical Python.
• It was created in 2005 by Travis Oliphant.
• It provides high-performance multidimensional
array and matrix structures.
• It can be used to perform mathematical
operations on arrays.
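As a small sketch of mathematical operations on arrays, the example below adds two arrays element by element, scales one, and takes a mean. The values are made up for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

total = a + b        # element-wise addition
scaled = a * 2       # every element multiplied by 2
average = b.mean()   # arithmetic mean of the elements

print(total)    # [11 22 33 44]
print(scaled)   # [2 4 6 8]
print(average)  # 25.0
```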
NumPy contains:
• Powerful N-dimensional array objects
• Sophisticated functions
• Tools for integrating C/C++ and Fortran code
• Useful linear algebra, Fourier transform
and random number capabilities.
• Arrays: an array is homogeneous; it can hold
data of only one type.
NumPy Array
• A NumPy array is a grid of values, all of the
same type, indexed by a tuple of non-negative
integers.
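The description above can be checked directly. In this sketch (values chosen only for illustration) a 2 x 3 grid is indexed with a (row, column) tuple, and a mixed list is shown being converted to a single type.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.shape)   # (2, 3): 2 rows, 3 columns
print(grid[1, 2])   # 6: row 1, column 2 (counting from 0)

# A list mixing an integer and a float becomes one common type.
mixed = np.array([1, 2.5])
print(mixed.dtype)  # float64
```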
Pandas
• The name is derived from ‘Panel Data’.
• It provides functions to manipulate large amounts of
structured data.
• There are two data structures in Pandas:
– Series: handles and stores 1-dimensional data (1D)
– DataFrame: handles and stores 2-dimensional data
(2D). It contains the following components:
– Index
– Rows
– Columns
Series
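A minimal sketch of a Series, assuming a made-up set of names and marks: each value is stored against an index label and can be looked up by that label.

```python
import pandas as pd

# Hypothetical data: one-dimensional values with an index label for each.
marks = pd.Series([72, 85, 64], index=["Asha", "Ravi", "Meena"])

print(marks["Ravi"])  # 85
print(marks.max())    # 85
```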
DataFrame
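A minimal sketch of a DataFrame, again with made-up names and values: the table has an index, rows and columns, and a single cell is selected by its row label and column name.

```python
import pandas as pd

# Hypothetical two-dimensional table with a labelled index.
report = pd.DataFrame(
    {"marks": [72, 85, 64], "grade": ["B", "A", "C"]},
    index=["Asha", "Ravi", "Meena"],
)

print(report.shape)                 # (3, 2): 3 rows, 2 columns
print(list(report.columns))         # ['marks', 'grade']
print(report.loc["Ravi", "marks"])  # 85
```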
matplotlib
Basic nomenclature of a plot
• Figure Title
• Axes
• Axis
• Artist
• Labels
• Title
• Legends
• Xticks
• Yticks
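The parts named above can be seen together in one small plot. This is only a sketch: the data, the label text and the output file name `plot_parts.png` are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # draw to a file; no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()                 # Figure and one Axes
fig.suptitle("Figure title")             # title of the whole figure

ax.plot([1, 2, 3, 4], [10, 20, 15, 30], label="Sales")
ax.set_title("Axes title")               # title of this plot
ax.set_xlabel("Month")                   # label of the X axis
ax.set_ylabel("Units sold")              # label of the Y axis
ax.set_xticks([1, 2, 3, 4])              # xticks
ax.set_yticks([0, 10, 20, 30])           # yticks
ax.legend()                              # legend built from the label

fig.savefig("plot_parts.png")
```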
Installing and importing matplotlib
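A common way to do this, assuming the pip package manager is available, is to install the library once from the command line and then import its `pyplot` module under the conventional alias `plt`.

```python
# Install once from the command line (assumes pip is available):
#     pip install matplotlib

import matplotlib
import matplotlib.pyplot as plt  # plt is the conventional short name

# Printing the version is a quick check that the import worked.
print(matplotlib.__version__)
```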
Components of a Histogram plot
• Title: displays the heading of the histogram
• Color: sets the colour of the bars
• Axis: X-axis and Y-axis
• Data: can be given as an array
• Height and width of bars: these are determined
by the analysis. The width of each bar is
called the bin or interval.
• Border color: sets the border colour of
the bars
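The components listed above map onto arguments of Matplotlib's `hist` function. The sketch below uses a made-up list of ages, five bins, and a hypothetical output file name.

```python
import matplotlib
matplotlib.use("Agg")  # draw to a file; no window is needed
import matplotlib.pyplot as plt

# Hypothetical data, given as a list (an array also works).
ages = [22, 25, 27, 31, 33, 35, 36, 41, 45, 52, 58, 63]

fig, ax = plt.subplots()
counts, bin_edges, bars = ax.hist(
    ages,
    bins=5,             # number of bins (intervals)
    color="skyblue",    # colour of the bars
    edgecolor="black",  # border colour of the bars
)
ax.set_title("Age distribution")   # title
ax.set_xlabel("Age")               # X axis
ax.set_ylabel("Number of people")  # Y axis

fig.savefig("histogram.png")
```

The height of each bar is the number of values that fall inside that bin, so the heights add up to the number of data points.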
Basic Statistics with Python
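As one possible illustration, Python's built-in `statistics` module can compute common summary measures. The choice of measures and the list of marks are assumptions for this sketch.

```python
import statistics

marks = [45, 60, 60, 72, 88]  # hypothetical data

mean = statistics.mean(marks)      # average value
median = statistics.median(marks)  # middle value when sorted
mode = statistics.mode(marks)      # most frequent value
spread = statistics.pstdev(marks)  # population standard deviation

print(mean)    # 65
print(median)  # 60
print(mode)    # 60
print(spread)
```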
Regression and classification
• Regression: predicts continuous values,
e.g. salary, age, etc.
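A minimal sketch of regression, assuming a straight-line relationship and made-up salary figures: NumPy's `polyfit` fits the line, which is then used to predict a continuous value for a new input.

```python
import numpy as np

# Hypothetical data: salary (in thousands) grows steadily with experience.
years = np.array([1, 2, 3, 4, 5])
salary = np.array([30, 35, 40, 45, 50])

# Fit a straight line: salary = slope * years + intercept.
slope, intercept = np.polyfit(years, salary, deg=1)

# Predict the salary for 6 years of experience.
predicted = slope * 6 + intercept
print(round(predicted, 1))  # 55.0
```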