Data Science Class X Notes

Uploaded by dhruv.chougale31
Copyright © All Rights Reserved

DATA SCIENCE

Data science combines statistics, data analysis, machine learning and deep learning, drawing on
the principles of Mathematics, Statistics, Computer Science and Information Science. It helps us
understand and analyse real-world scenarios and make fruitful decisions. It can also discover
hidden patterns in raw data and use them to make predictions.

Why Data Science?


Earlier, data processing was quite easy because data was limited and structured. But as we
started using the Internet, the amount of unstructured data grew enormously. This increased
the demand for data science.

Applications of data science

Fraud and Risk Detection: Banking companies learned to divide and conquer data via customer
profiling, past expenditures and other essential variables to analyse the probabilities of risk and
default. Data science has also helped them promote their banking products based on customers'
purchasing power.

Genetics & Genomics: Data science applications also enable an advanced level of treatment
personalization through research in genetics and genomics. The goal is to understand the impact of
DNA on our health and find individual biological connections between genetics, diseases and drug
response.

Internet Search: Search engines such as Google, Yahoo, Bing, Ask and AOL use data science
algorithms to deliver the best results for a search query in a fraction of a second.

Targeted Advertising: Companies use data science to target advertisements at users based on their past behaviour.

Website Recommendations: Online shopping websites promote their products in accordance with the
user's interests and the relevance of information. Internet giants like Amazon, Twitter, Google Play,
Netflix, LinkedIn, IMDB and many more use this system to improve the user experience. The
recommendations are made based on a user's previous search results.

Airline Route Planning: Using data science, airline companies can:
• Predict flight delays
• Decide which class of airplanes to buy
• Decide whether to fly directly to the destination or take a halt in between (for example, a flight
from New Delhi to New York can take a direct route, or it can halt in another country on the way)
• Effectively drive customer loyalty programs

Revisiting AI Project Cycle

Problem Scoping:

Revision of 4W canvas and Problem Statement Template

Data Collection / Data Acquisition: Data acquisition is the process of collecting accurate
and reliable data to work with. Data can be in the form of text, video, images, audio and
so on, and it can be collected from various sources.
Different sources of data collection are:
Interview, Survey, Internet, Web Scraping, Observation, Camera, Sensor, Application
Programming Interface (API).

The major kinds of sources for data collection are:

1. Online
2. Offline

The following points should be remembered while accessing data from any data source:

1. Only data that is available for public use should be taken.
2. Personal datasets should only be used with the consent of the owner.
3. One should never breach someone's privacy to collect data.
4. Data should only be taken from reliable sources, as data collected from random
sources can be wrong or unusable.
5. Reliable sources of data ensure the authenticity of the data, which helps in the proper
training of the AI model.

Types of data
For data science models or projects, generally, data is collected in the form of tables in
different formats:

1. CSV (Comma-Separated Values): A common and simple file format that stores data in
tabular form, with the values in each row separated by commas. It can be opened in any
spreadsheet software (e.g. MS Excel), word-processing software (e.g. MS Word) and any
text editor (e.g. Notepad).
2. Spreadsheet: A spreadsheet contains rows and columns to represent data in tabular
form. Spreadsheets are mostly used to calculate, manipulate and analyse data and to
maintain data records. E.g.: MS Excel.
3. DBMS SQL: SQL stands for Structured Query Language. It is used to handle the data
stored in DBMS (Database Management System) software. It provides basic
commands to create, alter and delete data and to manage transactions for database
management.
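To see how a CSV file and an SQL database relate, here is a minimal sketch using only Python's standard library: the `csv` module reads comma-separated text, and `sqlite3` stands in for a DBMS. The student names, marks and the `students` table are hypothetical values chosen for illustration, and the CSV text is held in a string instead of a real file so the example runs on its own.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content: each line is a row, commas separate the values.
csv_text = """name,marks
Asha,82
Ravi,67
Meena,91
"""

# Parse the CSV text into a list of dictionaries (one per row).
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Load the same rows into an in-memory SQLite database (a small DBMS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students (name, marks) VALUES (?, ?)",
    [(r["name"], int(r["marks"])) for r in rows],
)
conn.commit()

# Use an SQL query to fetch students scoring above 70, highest first.
top = conn.execute(
    "SELECT name, marks FROM students WHERE marks > 70 ORDER BY marks DESC"
).fetchall()
print(top)  # [('Meena', 91), ('Asha', 82)]
conn.close()
```

With a real file, `open("marks.csv", newline="")` (a hypothetical filename) would replace `io.StringIO(csv_text)`.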

Some of the Python packages commonly used in data science are:

1. NumPy
2. Pandas
3. Matplotlib
Basic Statistics with Python

Basic statistical methods from mathematics are used to analyse and work with numeric
datasets. Statistical tools widely used in Python include:

Mean: The average of the numbers, i.e. their sum divided by how many there are. E.g.: the mean of 7, 13, 22 is (7 + 13 + 22)/3 = 42/3 = 14.
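The mean example can be reproduced in Python. This sketch uses the standard-library `statistics` module as an assumption; NumPy's `np.mean` would give the same result.

```python
import statistics

data = [7, 13, 22]

# Mean by hand: add all values, then divide by how many there are.
manual_mean = sum(data) / len(data)

# The same calculation with Python's built-in statistics module.
library_mean = statistics.mean(data)

print(manual_mean, library_mean)  # both values equal 14
```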

Median: The middle number in a set of data ordered from least to greatest.

Eg: to find the median of 7, 13, 22, arrange the numbers in increasing order, then take the middle number,
i.e. 13 is the median.
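A short Python sketch of the median, again assuming the standard-library `statistics` module (NumPy offers `np.median`). The second list is a hypothetical example of an even-sized dataset, where there is no single middle number and `statistics.median` averages the two middle values.

```python
import statistics

# Odd count: sort the values and take the middle one.
odd_data = [22, 7, 13]
ordered = sorted(odd_data)            # [7, 13, 22]
middle = ordered[len(ordered) // 2]   # 13

# statistics.median sorts internally, so the input order does not matter.
print(middle, statistics.median(odd_data))  # 13 13

# Even count: the median is the average of the two middle values.
even_data = [7, 13, 22, 30]
print(statistics.median(even_data))  # 17.5
```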

Mode: The number that occurs most frequently. Eg: in 7, 13, 22, 13, the mode is 13, as 13
appears more often than the other numbers (7 and 22).
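The mode can be found by counting how often each value occurs. This sketch uses `collections.Counter` and `statistics.mode` from the standard library; NumPy itself has no mode function, so the standard library is the simplest choice here.

```python
import statistics
from collections import Counter

data = [7, 13, 22, 13]

# Count how often each value occurs: 13 occurs twice, 7 and 22 once each.
counts = Counter(data)
most_common_value, frequency = counts.most_common(1)[0]

print(most_common_value, frequency)   # 13 2
print(statistics.mode(data))          # 13
```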

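Standard Deviation (square root of the variance): Measures the spread of the values around their
average value. Eg: for 7, 13, 22 the mean is 14, the squared differences from the mean are 49, 1 and 64, so the variance is (49 + 1 + 64)/3 = 38 and the standard deviation is √38 ≈ 6.164 (about 6).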

Variance: Average of the squared differences from the mean.
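The variance and standard deviation of 7, 13, 22 can be checked step by step in Python. This is a sketch using the standard library: `statistics.pvariance` and `statistics.pstdev` divide by the number of values (the "population" formula used in these notes), whereas `statistics.stdev` divides by n − 1 (the "sample" formula) and so gives a larger value.

```python
import math
import statistics

data = [7, 13, 22]
mean = sum(data) / len(data)                      # 14.0

# Variance: the average of the squared differences from the mean.
squared_diffs = [(x - mean) ** 2 for x in data]   # [49.0, 1.0, 64.0]
variance = sum(squared_diffs) / len(data)         # 114 / 3 = 38.0

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)                     # about 6.164

# The statistics module gives the same population values.
print(variance, statistics.pvariance(data))
print(round(std_dev, 3), round(statistics.pstdev(data), 3))  # 6.164 6.164

# Sample standard deviation divides by n - 1 instead: sqrt(114 / 2) = sqrt(57).
print(round(statistics.stdev(data), 2))  # 7.55
```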

Data Visualisation

Humans need visual aids to understand and comprehend information presented as numbers and
tabular data. Hence, data visualisation is used to interpret the collected data and identify patterns and
trends in it. The Matplotlib package helps in visualising the data and making sense of it using
various kinds of graphs. Some types of graphs that we can make with this package are listed below:

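1. Scatter Plot: Scatter plots are used to plot discontinuous data as individual points, each placed by its values on two axes, which helps show the relationship between two variables.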
2. Bar Chart: It is one of the most commonly used graphical methods. It uses bars to represent
quantities, for example at different periods of time.
3. Histogram: Histograms are an accurate representation of the distribution of continuous data. The
range of values is divided into intervals called bins, and the height of each bar shows how many values fall in that bin.
4. Pie Chart: It represents data as slices of a circle, each showing its percentage of the whole.
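The four graph types listed above can be drawn with Matplotlib. This is a minimal sketch, assuming Matplotlib is installed; all the data values (hours studied, marks, monthly sales, heights) are hypothetical and only for illustration. The `Agg` backend is selected so the figure is saved to a file (`charts.png`, an arbitrary name) without needing a display window.

```python
import matplotlib
matplotlib.use("Agg")  # draw to a file; no display window needed
import matplotlib.pyplot as plt

# Hypothetical sample data, for illustration only.
hours_studied = [1, 2, 3, 4, 5, 6]
marks = [35, 45, 50, 62, 70, 82]
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 90, 150, 110]
heights_cm = [150, 152, 155, 158, 160, 161, 163, 165, 168, 172]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Scatter plot: each (x, y) pair is drawn as a separate point.
axes[0, 0].scatter(hours_studied, marks)
axes[0, 0].set_title("Scatter plot")
axes[0, 0].set_xlabel("Hours studied")
axes[0, 0].set_ylabel("Marks")

# Bar chart: one bar per period; bar height shows the quantity.
axes[0, 1].bar(months, sales)
axes[0, 1].set_title("Bar chart")

# Histogram: continuous values grouped into 4 bins; bar height = count per bin.
axes[1, 0].hist(heights_cm, bins=4)
axes[1, 0].set_title("Histogram")

# Pie chart: each slice is a share of the total, labelled as a percentage.
axes[1, 1].pie(sales, labels=months, autopct="%1.1f%%")
axes[1, 1].set_title("Pie chart")

fig.tight_layout()
fig.savefig("charts.png")
```

In an interactive session, omitting the `matplotlib.use("Agg")` line and calling `plt.show()` would open the charts in a window instead.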
