
Chapter 14 DataScience

Data science Chapter 14

Uploaded by

vibhay vibhay
Copyright
© © All Rights Reserved

Artificial Intelligence Class X

Chapter 14
DATA SCIENCE
What is Data Science?
• It is a concept that unifies statistics, data
analysis, machine learning and their related
methods in order to understand and analyse
actual phenomena with data.
• It employs techniques and theories drawn from
many fields within the context of
Mathematics, Computer Science and
Information Science.
Data Scientist
• They are analytical experts who utilize their
skills in both technology and social science to
find trends and manage data.
When you collect data from a data source, remember:
• Use data that is publicly available.
• A personal dataset should be used only with the
consent of the owner.
• One should never breach someone’s privacy to
collect data.
• Data should be collected only from reliable sources.
• Reliable sources of data ensure the authenticity of the
data, which helps in proper training of the AI model.
• For data science, the data is collected in the
form of tables.

TYPES OF DATA
• CSV: Comma Separated Values
• Spreadsheet
• SQL: Structured Query Language
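As a small illustration of tabular data stored as CSV, the sketch below reads a few comma separated rows with Python's standard `csv` module. The column names and values are invented for this example and do not come from the chapter.

```python
import csv
import io

# Hypothetical CSV text: the first line is the header row, each later line is one record.
csv_text = "name,age\nAsha,15\nRavi,16\n"

# DictReader turns every record into a dictionary keyed by the header names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    print(row["name"], row["age"])
```

Note that the `csv` module reads every value as text, so numbers such as the ages here would need converting with `int()` before doing arithmetic.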
• https://www.afiniti.com/corporate/rock-paper-scissors
• Go to the link and play the game Rock-Paper-Scissors
against an AI model.
• The challenge here is to win 20 games against the
AI before the AI wins 20 against you!
• Then try to answer the questions given on pages
160-161.
Applications and Use of Data Science
DATA VISUALIZATION
• Erroneous Data:
– Incorrect values
– Invalid or null values
• Missing Data: empty cells
• Outliers: data that does not fall within the
expected range of a certain element
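The data problems listed above can be spotted with a few lines of Pandas. This is only a sketch: the table, the column names and the assumed valid range of 0 to 100 are made up for illustration.

```python
import pandas as pd

# Hypothetical marks table: None is a missing cell and 950 is a typing mistake.
marks = pd.DataFrame({
    "student": ["A", "B", "C", "D", "E"],
    "score": [72, 85, None, 950, 64],
})

# Missing data: rows whose score cell is empty.
missing_rows = marks[marks["score"].isna()]

# Outliers / incorrect values: scores outside the assumed valid range 0-100.
out_of_range = marks[(marks["score"] < 0) | (marks["score"] > 100)]

print(missing_rows)
print(out_of_range)
```

Whether such rows are corrected, filled in or dropped depends on the dataset, so the sketch only finds them.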
PYTHON FOR DATA SCIENCE
• Data Visualization: interprets the data and
identifies patterns and trends in it.
• Three important libraries in Python:
NumPy
Pandas
Matplotlib
NumPy
• NumPy stands for Numerical Python.
• It was created in 2005 by Travis Oliphant.
• It provides a high-performance multidimensional
array and matrix structure.
• It can be used to perform mathematical
operations on arrays.
NumPy Contains...
• A powerful N-dimensional array object
• Sophisticated functions
• Tools for integrating C/C++ and Fortran code
• Useful linear algebra, Fourier transform and
random number capabilities
• Arrays: an array is homogeneous; it can hold
data of only one type.
NumPy Array
• A NumPy array is a grid of values, all of the
same type, indexed by a tuple of non-negative
integers.
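A minimal sketch of these ideas, using made-up values: a 2-D array is created, one element is read with a tuple of indices, and a mathematical operation is applied to the whole array at once.

```python
import numpy as np

# A 2-D grid of values; every element has the same type.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.shape)   # number of rows and columns
print(grid[1, 2])   # element at row index 1, column index 2

# Mathematical operations apply to every element of the array.
doubled = grid * 2
print(doubled)

# Mixing an integer with a decimal still gives one common type for the whole array.
mixed = np.array([1, 2.5])
print(mixed.dtype)
```

In the last array the integer 1 is stored as a floating-point number, which shows the "only one type" rule in action.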
Pandas
• The name is derived from ‘Panel Data’.
• It provides functions to manipulate large amounts of
structured data.
• There are two data structures in Pandas:
– Series: handles and stores 1-dimensional (1D) data
– DataFrame: handles and stores 2-dimensional (2D)
data. It has the following components:
– Index
– Rows
– Columns
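The two structures can be sketched as follows. The student names, ages and marks are invented for this example.

```python
import pandas as pd

# Series: 1-dimensional data with an index (hypothetical student ages).
ages = pd.Series([15, 16, 15], index=["Asha", "Ravi", "Meena"])
print(ages["Ravi"])

# DataFrame: 2-dimensional data with an index, rows and columns.
table = pd.DataFrame(
    {"age": [15, 16, 15], "marks": [88, 92, 79]},
    index=["Asha", "Ravi", "Meena"],
)

print(table.index.tolist())          # the index labels
print(table.columns.tolist())        # the column names
print(table.loc["Meena", "marks"])   # one cell, chosen by row label and column name
```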
Series
DataFrame
matplotlib
Basic nomenclature of a plot
• Figure Title

• Axes
• Axis
• Artist
• Labels
• Title
• Legends
• Xticks
• Yticks
Installing and importing matplotlib
Components of Histogram plot
• Title: displays the heading of the histogram
• Color: sets the colour of the bars
• Axis: X-axis and Y-axis
• Data: can be given as an array
• Height and width of bars: determined
based on the analysis. The width of a bar is
called the bin or interval.
• Border color: sets the border colour of
the bars
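The components above map onto a short matplotlib script. This is a sketch with invented marks data; it assumes matplotlib is already installed (commonly done with `pip install matplotlib`), and the choice of 5 bins, the colours and the labels are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Data: given as an array (hypothetical test marks).
data = np.array([35, 42, 47, 51, 55, 58, 61, 63, 66, 70, 72, 75, 81, 88, 94])

fig, ax = plt.subplots()

# bins sets how many intervals the range is split into,
# color is the bar colour and edgecolor is the border colour.
counts, bin_edges, bars = ax.hist(data, bins=5, color="skyblue", edgecolor="black")

ax.set_title("Distribution of marks")    # Title
ax.set_xlabel("Marks")                   # X-axis label
ax.set_ylabel("Number of students")      # Y-axis label

print(counts)      # height of each bar
print(bin_edges)   # boundaries of the bins
```

To see the chart in a window, remove the `matplotlib.use("Agg")` line and call `plt.show()` at the end.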
Basic Statistics with Python
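The slide does not list which measures it covers, so the sketch below assumes the common ones (mean, median, mode and standard deviation) and uses Python's built-in `statistics` module on made-up values.

```python
import statistics

# Hypothetical data values.
values = [4, 8, 6, 8, 10]

mean_value = statistics.mean(values)      # average of the values
median_value = statistics.median(values)  # middle value after sorting
mode_value = statistics.mode(values)      # most frequent value
spread = statistics.pstdev(values)        # population standard deviation

print(mean_value, median_value, mode_value, spread)
```

NumPy and Pandas offer similar functions for arrays and tables; the standard library is used here only to keep the example small.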
Regression and classification
• Regression: predicts continuous values,
e.g. salary, age, etc.

• Classification: predicts/classifies discrete
values such as male or female, true or false,
spam or not spam, etc.
K-Nearest Neighbour (KNN)
• K-Nearest Neighbour is one of the simplest Machine Learning
algorithms based on the Supervised Learning technique.
• It assumes similarity between the new case/data and the
available cases and puts the new case into the category that is
most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a
new data point based on similarity.
• This means that when new data appears, it can be easily
classified into a well-suited category using the K-NN algorithm.
• The K-NN algorithm can be used for Regression as well as for
Classification, but it is mostly used for Classification
problems.
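The idea can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation: the function name `knn_predict`, the sample points, the labels, the use of straight-line (Euclidean) distance and the choice of k = 3 are all assumptions made for this example.

```python
from collections import Counter

import numpy as np


def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by a majority vote among its k nearest stored points."""
    # Distance from the new point to every stored point (Euclidean distance).
    distances = np.linalg.norm(train_points - new_point, axis=1)
    # Positions of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Count the labels of those neighbours and return the most common one.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Stored (training) data: two hypothetical categories of 2-D points.
train_points = np.array([
    [1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # labelled "small"
    [6.0, 6.5], [7.0, 7.0], [6.5, 8.0],   # labelled "large"
])
train_labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(train_points, train_labels, np.array([2.0, 2.0])))
print(knn_predict(train_points, train_labels, np.array([6.5, 7.0])))
```

There is no separate training step: the algorithm only stores the data and does the comparison when a new point arrives, which matches the description above.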
