Chapter-3 Data Sciences Study Materials Final-1

Chapter 3 discusses Data Science as a unifying concept that combines statistics, data analysis, and machine learning to analyze phenomena with data. It highlights various applications including fraud detection in finance, personalized medicine in genetics, internet search optimization, targeted advertising, product recommendations, and airline route planning. The chapter also emphasizes the importance of reliable data sources and outlines common data formats used in data science.

Uploaded by

halawa0071
Copyright
© All Rights Reserved
Chapter-3 Data Science

1. Data Science is a concept that unifies (combines) statistics (the collection
and analysis of numeric data), data analysis, machine learning and their related
methods in order to understand and analyse actual phenomena with data.
2. It employs techniques and theories drawn from many fields within the
context of Mathematics, Statistics, Computer Science and Information
Science.

Applications of Data Science


(a) Fraud and Risk Detection:
1. The earliest applications of data science were in Finance. Companies were
fed up with bad debts and losses every year. However, they had a lot of data
which used to be collected during the initial paperwork while sanctioning
loans. They decided to bring in data scientists to rescue them from
losses.
2. Over the years, banking companies learned to divide and conquer (solve) data
via customer profiling, past expenditures, and other essential variables to
analyse the probabilities of risk and default. Moreover, it also helped them to
push their banking products based on a customer’s purchasing power.
(b) Genetics & Genomics:
1. Data Science applications also enable an advanced level of treatment
personalization through research in genetics and genomics.
2. The goal is to understand the impact of DNA on our health and find
individual biological connections between genetics, diseases, and drug
response.
3. Data science techniques allow integration of different kinds of data with
genomic data in disease research, which provides a deeper understanding
of genetic issues in reactions to particular drugs and diseases.
4. As soon as we acquire reliable personal genome data, we will achieve a
deeper understanding of human DNA.
5. Advanced genetic risk prediction will be a major step towards more
individualised care.
(c) Internet Search:
1. Besides Google, there are many other search engines such as Yahoo, Bing,
Ask, AOL and so on.
2. All these search engines (including Google) make use of data science
algorithms to deliver the best result for our searched query in a fraction
of a second. Considering the fact that Google processes more than 20
petabytes of data every day, had there been no data science, Google
wouldn’t have been the ‘Google’ we know today.
(d) Targeted Advertising:
1. If you thought Search was the biggest of all data science applications,
that is not true.
2. Across the entire digital marketing spectrum, from the display banners on
various websites to the digital billboards (hoardings) at airports, almost
all ads are decided by using data science algorithms.
3. This is the reason why digital ads have been able to get a much higher CTR
(Click-Through Rate) than traditional advertisements. CTR is the number of
clicks that your ad receives divided by the number of times your ad is
shown: clicks ÷ impressions = CTR. Digital ads can be targeted based on a
user’s past behaviour.
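The CTR formula in item 3 (clicks ÷ impressions) can be sketched in a few lines of Python. This is only an illustration: the function name click_through_rate and the numbers used are assumptions made up for this example, not figures from the chapter.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return CTR as a fraction: clicks divided by impressions."""
    if impressions <= 0:
        # An ad that was never shown has no meaningful CTR.
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Hypothetical numbers: an ad shown 1000 times that received 50 clicks.
ctr = click_through_rate(clicks=50, impressions=1000)
print(f"CTR = {ctr:.1%}")  # prints "CTR = 5.0%"
```

CTR is usually quoted as a percentage, so the fraction 0.05 is printed here as 5.0%.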
(e) Website Recommendations:
1. Aren’t we all used to the suggestions about similar products on Amazon?
They not only help us find relevant products from the billions of products
available but also add a lot to the user experience.
2. A lot of companies use such a recommendation engine to promote their
products in accordance with the user’s interest and relevance of information.
3. Internet giants like Amazon, Twitter, Google Play, Netflix, LinkedIn,
IMDB (Internet Movie Database) and many more use this system to improve
the user experience. The recommendations are made based on previous
search results for a user.
(f) Airline Route Planning:
1. The Airline Industry across the world is known to bear heavy losses. Except
for a few airline service providers, companies are struggling to maintain
their occupancy ratio (how much of the available capacity is actually being
used compared to the total amount available) and operating profits.
2. With the steep rise in air-fuel prices and the need to offer heavy discounts
to customers, the situation has got worse.
3. It wasn’t long before airline companies started using Data Science to
identify the strategic areas of improvement. Now, using Data Science, the
airline companies can:
• Predict flight delays
• Decide which class of airplanes to buy
• Decide whether to land directly at the destination or take a halt in between
(for example, a flight can have a direct route from New Delhi to New York;
alternatively, it can choose to halt in any country)
• Effectively drive customer loyalty programs

Sources of Data
There exist various sources of data from which we can collect any type of data
required, and the data collection process can be categorised in two ways:
Offline and Online.
While accessing data from any of the data sources, the following points should
be kept in mind:
1. Only data which is available for public usage should be taken up.
2. Personal datasets should only be used with the consent of the owner.
3. One should never breach someone’s privacy to collect data.
4. Data should only be taken from reliable sources, as the data collected from
random sources can be wrong or unusable.
5. Reliable sources of data ensure the authenticity of data which helps in proper
training of the AI model.
Types of Data
For Data Science, usually the data is collected in the form of tables. These tabular
datasets can be stored in different formats. Some of the commonly used formats
are:
1. CSV: CSV stands for comma separated values. It is a simple file format used to
store tabular data. Each line of the file is a data record, and each record consists
of one or more fields which are separated by commas. Since the values in each
record are separated by commas, these files are known as CSV files.
2. Spreadsheet: A spreadsheet is a piece of paper or a computer program which
is used for accounting and recording data using rows and columns into which
information can be entered. Microsoft Excel is a program which helps in creating
spreadsheets.
3. SQL: SQL stands for Structured Query Language. It is a domain-specific
language used in programming and is designed for managing data held in different
kinds of DBMS (Database Management System). It is particularly useful in
handling structured data.
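The CSV and SQL formats described above can be tried out with Python's standard csv and sqlite3 modules. The following is a minimal sketch, not a recommended workflow: the sample records, the table name people and the helper names load_records and names_in_city are all hypothetical and chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content: the first line is the header, and each later
# line is one record whose fields are separated by commas.
CSV_TEXT = """name,age,city
Asha,29,Delhi
Ravi,35,Mumbai
Meera,41,Delhi
"""


def load_records(csv_text):
    """Parse CSV text into a list of dicts keyed by the header fields."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def names_in_city(records, city):
    """Store the records in an in-memory SQLite table and query it with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE people (name TEXT, age INTEGER, city TEXT)")
        conn.executemany(
            "INSERT INTO people (name, age, city) VALUES (:name, :age, :city)",
            records,
        )
        rows = conn.execute(
            "SELECT name FROM people WHERE city = ? ORDER BY name", (city,)
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


records = load_records(CSV_TEXT)
print(records[0])                       # first record as a dict
print(names_in_city(records, "Delhi"))  # names of people whose city is Delhi
```

Here SQLite stands in for a DBMS only because it ships with Python; the same SELECT statement would look much the same on other relational database systems.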
