Data Science Class X Notes
Data Science combines statistics, data analysis, machine learning and deep learning, and is
built on the principles of Mathematics, Statistics, Computer Science and Information Science.
It helps us understand and analyse real situations and take informed decisions. It can also
discover hidden patterns in raw data, which can then be used for predictions.
Fraud and Risk Detection: Banking companies learned to divide and conquer data through customer
profiling, past expenditures and other essential variables in order to analyse the probability of
risk and default. It also helped them promote their banking products based on each customer’s
purchasing power.
Genetics & Genomics: Data Science applications also enable an advanced level of treatment
personalization through research in genetics and genomics. The goal is to understand the impact of
DNA on our health and to find individual biological connections between genetics, diseases and drug
response.
Internet Search: All search engines, such as Google, Yahoo, Bing, Ask and AOL, use data science
algorithms to deliver the best results for a search query in a fraction of a second.
Website Recommendations: Online shopping sites promote their products according to the user’s
interests and the relevance of information. Internet giants like Amazon, Twitter, Google Play,
Netflix, LinkedIn, IMDB and many more use this system to improve the user experience. The
recommendations are based on the user’s previous search results.
Airline Route Planning: Using Data Science, airline companies can:
• Predict flight delays
• Decide which class of airplanes to buy
• Decide whether to fly directly to the destination or take a halt in between (for example, a
flight from New Delhi to New York can take a direct route, or it can halt in another country
on the way)
• Effectively drive customer loyalty programs
Problem scoping :
Data Collection / Data Acquisition: Data Acquisition is the process of collecting accurate
and reliable data to work with. Data can be in the form of text, video, images, audio and
so on, and it can be collected from various sources.
Different sources of data collection are:
Interviews, Surveys, the Internet, Web Scraping, Observation, Cameras, Sensors, Application
Programming Interfaces (APIs).
The following points should be remembered while accessing data from any data source:
1. Only data that is available for public use should be taken up.
2. Personal datasets should only be used with the consent of the owner.
3. One should never breach someone’s privacy to collect data.
4. Data should only be taken from reliable sources as the data collected from random
sources can be wrong or unusable.
5. Reliable sources of data ensure the authenticity of data which helps in the proper
training of the AI model.
Types of data
For data science models or projects, generally, data is collected in the form of tables in
different formats:
1. CSV (Comma Separated Values): A common and simple file format that stores tabular
data as plain text, with the values in each row separated by commas. It can be opened
with any spreadsheet software (MS Excel), word-processing software (MS Word) or
text editor (Notepad).
2. Spreadsheet: A spreadsheet contains rows and columns to represent data in tabular
form. Spreadsheets are mostly used to calculate, manipulate and analyse data and to
maintain data records. Eg: MS Excel.
3. SQL: It stands for Structured Query Language. It is used to handle the data
stored in DBMS (DataBase Management System) software. It provides basic
commands to create, alter and delete data and to manage transactions for database
management.
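As a rough illustration of the CSV and SQL formats described above, the sketch below reads a
few comma-separated records and then queries them with SQL. It uses only Python's built-in
csv and sqlite3 modules. The student names, marks and the table name "students" are made-up
sample values for this example, not data from these notes.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; in practice this text would come from a .csv file.
csv_text = "name,marks\nAsha,82\nRavi,74\nMeena,91\n"

# Read the comma-separated rows into a list of dictionaries, one per record.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Load the same records into an in-memory SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students (name, marks) VALUES (?, ?)",
    [(r["name"], int(r["marks"])) for r in rows],
)
conn.commit()

# Query the table with SQL: students scoring above 80, highest marks first.
top = conn.execute(
    "SELECT name, marks FROM students WHERE marks > 80 ORDER BY marks DESC"
).fetchall()
conn.close()

print(top)
```

Note that the csv module reads every value as text, so the marks are converted to integers
before they are stored in the table.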
Python packages commonly used for data science:
1. NumPy
2. Pandas
3. Matplotlib
Basic Statistics with Python
Basic statistical methods used in mathematics are used for analysing and working with numeric
datasets. Statistical tools widely used in Python are:
Mean : The average of the numbers. Eg: mean of 7,13,22 is (7+13+22)/3 = 42/3 = 14
Median : The middle number in a set of data which is ordered from least to greatest.
Eg: to find the median of 7,13,22, arrange the numbers in increasing order, then take the middle
number, i.e. 13 is the median.
Mode : The number that occurs most frequently. Eg: in 7,13,22,13 the mode is 13, as 13
appears more times than the other numbers 7 & 22.
Standard Deviation (square root of the variance) : Measures the spread of the values around their
average. Eg: for 7,13,22 the mean is 14, so the variance is
((7-14)² + (13-14)² + (22-14)²)/3 = 114/3 = 38, and the standard deviation is √38 ≈ 6.16
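The worked examples above can be checked with a short Python sketch. This one uses the
built-in statistics and math modules rather than NumPy, and it divides by the number of
values (the population variance), which is the convention that gives √38 for 7,13,22. The
variable names are chosen only for this illustration.

```python
import math
import statistics

data = [7, 13, 22]

mean = statistics.mean(data)              # (7 + 13 + 22) / 3
median = statistics.median(data)          # middle value of the sorted data
mode = statistics.mode([7, 13, 22, 13])   # most frequent value

# Variance: average of the squared differences from the mean
# (population form, dividing by the number of values).
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(mean, median, mode, variance, round(std_dev, 3))
```

The same standard deviation is returned by statistics.pstdev(data). The function
statistics.stdev(data) divides by one less than the number of values (the sample form), so
it gives a larger result for the same numbers.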
Data Visualisation
Humans need visual aids to understand and comprehend information passed as numbers and
tabular data. Hence, data visualisation is used to interpret the collected data and identify patterns
and trends in it. The Matplotlib package helps in visualising the data and making sense of it using
various kinds of graphs. Some types of graphs that we can make with this package are listed below: