FDS - Unit-I - Notes
Big Data refers to large, diversified sets of data originating from multiple channels: social media
platforms, websites, electronic check-ins, sensors, product purchases, call logs, and more. Big
Data has three defining characteristics: volume, velocity, and variety.
Big Data allows companies to improve their products and create tailored
marketing by gaining a 360-degree view of their customers’ behavior and
motivations.
It enables businesses or service providers to monitor fraudulent activities in real time by
identifying unusual patterns and behavior with the help of Predictive Analytics.
It drives supply chain efficiencies by collecting and analyzing data to determine if
products are reaching their destination in the desired conditions to attract
customers’ interest.
Predictive analysis allows businesses to scan and analyze social media feeds to
understand the sentiment among customers.
Companies that collect large amounts of data have a better chance to explore untapped
areas and to conduct deeper, richer analysis that benefits all stakeholders.
The faster and better a business understands its customers, the greater the benefits it
reaps. Big Data is used to train Machine Learning models to identify patterns and
make informed decisions with minimal or no human intervention.
As an example of Big Data tooling, the Hevo data pipeline platform advertises the following
features:
Completely Automated: The Hevo platform can be set up in just a few minutes
and requires minimal maintenance.
Transformations: Hevo provides preload transformations through Python code.
It also allows you to run transformation code for each event in the pipelines you
set up. You need to edit the properties of the event object received in the
transform method as a parameter to carry out the transformation. Hevo also
offers drag and drop transformations like Date and Control Functions, JSON, and
Event Manipulation to name a few. These can be configured and tested before
putting them to use.
Connectors: Hevo supports 100+ integrations to SaaS platforms, files,
databases, analytics, and BI tools. It supports various destinations including
Google BigQuery, Amazon Redshift, Snowflake Data Warehouses; Amazon S3
Data Lakes; and MySQL, MongoDB, TokuDB, DynamoDB, PostgreSQL
databases to name a few.
Real-Time Data Transfer: Hevo provides real-time data migration, so you always
have analysis-ready data.
100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure
ensures reliable data transfer with zero data loss.
Scalable Infrastructure: Hevo has in-built integrations for 100+ sources like
Google Analytics, that can help you scale your data infrastructure as required.
24/7 Live Support: The Hevo team is available round the clock to extend
exceptional support to you through chat, email, and support calls.
Schema Management: Hevo takes away the tedious task of schema
management and automatically detects the schema of incoming data, mapping it
to the destination schema.
Live Monitoring: Hevo allows you to monitor the data flow so you can check
where your data is at a particular point in time.
Data Science Introduction
Data Science is a combination of multiple disciplines that uses statistics, data analysis, and
machine learning to analyze data and to extract knowledge and insights from it.
In today’s world, a large amount of data is generated daily, and the main challenge is to deal
with this data and extract insights from it to help organizations and businesses. This is
where Data Science comes in: it combines data and finds patterns in it with the help of
skills from computer science, mathematics, statistics, information visualization, graphics,
and business.
Data Science is about finding patterns in data through analysis, and making future predictions.
Data Science is used in many industries in the world today, e.g. banking, consultancy,
healthcare, and manufacturing.
Data Science can be applied in nearly every part of a business where data is available. Examples
are:
Consumer goods
Stock markets
Industry
Politics
Logistic companies
E-commerce
How Does a Data Scientist Work?
A Data Scientist requires expertise in several backgrounds:
Machine Learning
Statistics
Programming (Python or R)
Mathematics
Databases
A Data Scientist must find patterns within the data. Before the patterns can be found, the data
must be organized in a standard format.
One purpose of Data Science is to structure data, making it interpretable and easy to work with.
Data can be categorized into two groups:
Structured data
Unstructured data
Unstructured Data
Unstructured data is not organized. We must organize the data for analysis purposes.
Structured Data
Structured data is organized and easier to work with.
Example of an array:
[80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
The following example shows how to create an array in Python:
Example
array = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
print(array)
It is common to work with very large data sets in Data Science.
In this tutorial we will try to make it as easy as possible to understand the concepts of Data
Science. We will therefore work with a small data set that is easy to interpret.
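To illustrate why structured data is easy to work with, the small numeric data set shown above can be summarized directly with plain Python (a minimal sketch; no libraries are assumed):

```python
# The structured data set from the example above
data = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]

# Because the data is structured (a uniform list of numbers),
# standard summary statistics can be computed directly.
count = len(data)
mean = sum(data) / count
smallest, largest = min(data), max(data)

print(f"count={count}, mean={mean}, min={smallest}, max={largest}")
# count=10, mean=102.5, min=80, max=125
```

The same operations on unstructured data (free text, images) would first require an organizing step, which is exactly the point made above.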
The following are some primary motives for the use of Data science technology:
1. It helps to convert large quantities of raw and unstructured data into
meaningful insights.
2. It assists in making predictions, for example from surveys, elections, etc.
3. It helps in automating transportation, such as developing self-driving cars,
which can be called the future of transportation.
4. Companies are shifting towards Data Science and adopting this technology.
Amazon, Netflix, and others, which handle huge volumes of data, use data
science algorithms to deliver a better customer experience.
The Lifecycle of Data Science
1. Business Understanding: The complete cycle revolves around the business goal:
what will you solve if you do not have a specific problem? It is extremely
important to understand the business objective clearly, because that will be
the ultimate aim of the analysis. Only with a good understanding can we set a
precise goal for the analysis that is in sync with the business objective. You
need to understand whether the customer wants to minimize losses, predict the
price of a commodity, etc.
2. Data Understanding: After business understanding, the next step is data
understanding. This involves collecting all the available data. Here you need
to work closely with the business team, as they know what data is present,
what data could be used for this business problem, and other relevant details.
This step includes describing the data, its structure, its relevance, and its
data types. Explore the data using graphical plots; basically, extract
whatever information you can about the data simply by exploring it.
3. Preparation of Data: Next comes the data preparation stage. This consists of
steps like selecting the relevant data, integrating the data by merging data
sets, cleaning it, treating missing values by either removing or imputing
them, removing inaccurate data, and checking for outliers using box plots and
handling them. Construct new data and derive new features from existing ones.
Format the data into the preferred structure and remove unwanted columns and
features. Data preparation is the most time-consuming, yet arguably the most
important, step in the complete life cycle. Your model will only be as
accurate as your data.
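The missing-value treatment described in the data preparation step can be sketched in plain Python. The column of ages below is made up for illustration; real projects would typically use a library such as pandas:

```python
import statistics

# Hypothetical column with missing values recorded as None
ages = [25, 30, None, 22, None, 28, 95]

# Impute missing values with the mean of the observed values
# (removal is the other common option mentioned above)
observed = [a for a in ages if a is not None]
mean_age = statistics.mean(observed)  # mean of 25, 30, 22, 28, 95 -> 40
cleaned = [a if a is not None else mean_age for a in ages]

print(cleaned)  # [25, 30, 40, 22, 40, 28, 95]
```

Note that the value 95 stands out from the rest; the outlier check with box plots mentioned above would flag such values for separate handling.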
4. Exploratory Data Analysis: This step involves getting some idea about the
solution and the factors affecting it before building the actual model. The
distribution of data within the different variables is explored graphically
using bar graphs, and relations between different features are captured
through graphical representations like scatter plots and heat maps. Many data
visualization techniques are used extensively to explore each feature
individually and in combination with other features.
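The relation that a scatter plot shows visually can also be quantified. As a sketch with made-up data, the Pearson correlation coefficient between two features can be computed by hand:

```python
import statistics

# Hypothetical paired observations of two features
x = [1, 2, 3, 4, 5]       # e.g. years of experience
y = [30, 35, 40, 45, 50]  # e.g. salary in thousands

# Pearson correlation: covariance divided by the product of
# the two standard deviations (computed from sums of squares)
mx, my = statistics.mean(x), statistics.mean(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
sx = sum((a - mx) ** 2 for a in x) ** 0.5
sy = sum((b - my) ** 2 for b in y) ** 0.5
r = cov / (sx * sy)

print(round(r, 3))  # perfectly linear data gives r = 1.0
```

A value of r near +1 or -1 indicates a strong linear relation; values near 0 indicate little linear relation, which is exactly what a scatter plot or heat map conveys graphically.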
5. Data Modeling: Data modeling is the heart of data analysis. A model takes the
prepared data as input and produces the desired output. This step involves
choosing the appropriate kind of model, depending on whether the problem is a
classification, regression, or clustering problem. After choosing the model
family, we need to carefully select and implement the algorithms within that
family. We need to tune the hyperparameters of each model to achieve the
desired performance. We also need to make sure there is the right balance
between performance and generalizability: we do not want the model to memorize
the data and then perform poorly on new data.
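As a minimal sketch of the modeling step, here is a regression model (simple linear regression via the closed-form least-squares solution) fitted to made-up data; real projects would use a library such as scikit-learn:

```python
# Hypothetical training data following the rule y = 2x + 1
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Closed-form slope and intercept for simple linear regression
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) \
        / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx

print(slope, intercept)        # 2.0 1.0
print(slope * 5 + intercept)   # prediction for x = 5 -> 11.0
```

The last line is the "model gives the preferred output" part: once fitted, the model predicts values for inputs it has not seen.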
6. Model Evaluation: Here the model is evaluated to check whether it is ready to
be deployed. The model is tested on unseen data and evaluated on a carefully
chosen set of evaluation metrics. We also need to make sure that the model
conforms to reality. If we do not achieve a satisfactory result in the
evaluation, we have to iterate over the entire modeling process until the
desired level of the metrics is achieved. Any data science solution, such as a
machine learning model, must, like a human, evolve: it must be able to improve
itself with new data and adapt to new evaluation metrics. We can build
multiple models for a given phenomenon, but many of them may be imperfect;
model evaluation helps us select and build the ideal model.
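The evaluation step can be sketched with one common metric, accuracy, computed on hypothetical held-out labels (real evaluations combine several metrics, as noted above):

```python
# Hypothetical true labels of unseen test data and the
# model's predictions for the same examples
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy: fraction of predictions that match the true labels
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

print(accuracy)  # 6 of 8 predictions match -> 0.75
```

If 0.75 falls short of the desired level of the metric, the modeling process is iterated, exactly as the step above describes.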
7. Model Deployment: After rigorous evaluation, the model is finally deployed in
the desired form and channel. This is the last step in the data science life
cycle. Each step described above must be worked through carefully: if any step
is performed improperly, it affects the next step, and the entire effort goes
to waste. For example, if data is not collected properly, you will lose
information and will not build an ideal model. If the data is not cleaned
properly, the model will not work. If the model is not evaluated properly, it
will fail in the real world. From business understanding to model deployment,
every step must be given appropriate attention, time, and effort.
What is Data?
Data is an extremely important factor when it comes to gaining insights about a specific
topic, study, research, or even people. This is why it is regarded as a vital component of
all the systems that make up our world today.
In fact, data offers a broad range of applications and uses in the modern age. So
whether or not you’re considering digital transformation, data collection is an aspect that
you should never brush off, especially if you want to get insights, make forecasts, and
manage your operations in a way that creates significant value.
However, many people are still confused when they encounter the idea of data
collection.
Let us understand:
While techniques and goals may vary per field, the general data collection methods
used in the process are essentially the same. In other words, there are specific
standards that need to be strictly followed and implemented to make sure that data is
collected accurately.
Not to mention, if the appropriate procedures are not given importance, a variety of
problems might arise and impact the study or research being conducted.
The most common risk is the inability to identify answers and draw correct conclusions
for the study, as well as failure to validate if the results are correct. These risks may also
result in questionable research, which can greatly affect your credibility.
So before you start collecting data, you have to rethink and review all of your research
goals. Start by creating a checklist of your objectives. Here are some important
questions to take into account:
Take note that bad data can never be useful. This is why you have to ensure that you
collect only high-quality data. But to help you gain more confidence when it comes to
collecting the data you need for your research, let’s go through each question presented
above.
Identifying exactly what you want to achieve in your research can significantly help you
collect the most relevant data you need. Besides, clear goals always provide clarity to
what you are trying to accomplish. With clear objectives, you can easily identify what
you need and determine what’s most useful to your research.
Data can be divided into two major categories: qualitative data and quantitative data.
Qualitative data is the classification given to a set of data that refers to immeasurable
attributes. Quantitative data, on the other hand, can be measured using numbers.
Based on the goal of your research, you can either collect qualitative data or
quantitative data; or a combination of both.
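The qualitative/quantitative distinction above can be sketched in code. The helper below is hypothetical (not from any library) and uses a simple rule: a column whose values are all numeric is treated as quantitative, otherwise as qualitative:

```python
# Hypothetical helper that labels a column of collected data as
# quantitative (numeric, measurable) or qualitative (categorical)
def classify(values):
    numeric = all(isinstance(v, (int, float)) for v in values)
    return "quantitative" if numeric else "qualitative"

print(classify([172.5, 160.0, 181.2]))   # heights -> quantitative
print(classify(["red", "blue", "red"]))  # colors  -> qualitative
```

Real data sets need more care (for example, numbers used as mere labels, like postal codes, are still qualitative), but the rule captures the basic distinction.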
There are specific types of data collection methods that can be used to acquire, store,
and process the data. If you’re not familiar with any of these methods, keep reading as
we will tackle each of them in the latter part of this article. But to give you a quick
overview, here are some of the most common data collection methods that you can
utilize:
Experiment
Survey
Observation
Ethnography
Secondary data collection
Archival research
Interview/focus group
Note: We will discuss these methods more in the Data Collection Methods + Examples
section of this article.
Regardless of the field, data collection offers heaps of benefits. To help you become
attuned to these advantages, we’ve listed some of the most notable ones below:
1. Collecting good data is extremely helpful when it comes to identifying and verifying
various problems, perceptions, theories, and other factors that can impact your
business.
2. It allows you to focus your time and attention on the most important aspects of your
business.
3. It helps you understand your customers better. Collecting data allows your company to
truly understand what your consumers expect from you, the unique products or services
they desire, and how they want to connect with your brand as a whole.
4. Collecting data allows you to study and analyze trends better.
5. Data collection enables you to make more effective decisions and come up with
solutions to common industry problems.
6. It allows you to resolve problems and improve your products or services based on data
collected.
7. Accurate data collection can help build trust, establish productive and professional
discussions, and win the support of important decision-makers and investors.
8. When engaging with key decision-makers, collecting, monitoring, and assessing data on
a regular basis may offer businesses reliable, relevant information.
9. Collecting relevant data can positively influence your marketing campaigns, which can
help you develop new strategies in the future.
10. Data collection enables you to satisfy customer expectations for personalized messages
and recommendations.
These are just a few of the many benefits of data collection in general. In fact, there are
still a lot of advantages when it comes to collecting consumer data that you can benefit
from.
Introduction – Importance of Data
“Data is the new oil.” Today, data is everywhere, in every field. Whether you are a data scientist,
marketer, businessman, data analyst, researcher, or in any other profession, you need to
work or experiment with raw or structured data. This data is so important to us that it becomes
important to handle and store it properly, without any error. While working on this data, it is
important to know the types of data to process them and get the right results. There are two
types of data: Qualitative and Quantitative data, which are further classified into:
Nominal data.
Ordinal data.
Discrete data.
Continuous data.
So there are 4 Types of Data: Nominal, Ordinal, Discrete, and Continuous.
Businesses now run on data, and most companies use data to gain insights, create and launch
campaigns, design strategies, launch products and services, or try out different things. According
to one report, at least 2.5 quintillion bytes of data are produced every day.
Types of Data
Qualitative or Categorical Data
Qualitative or Categorical Data is data that can’t be measured or counted in the form of numbers.
These types of data are sorted by category, not by number. That’s why it is also known as
Categorical Data. These data consist of audio, images, symbols, or text. The gender of a person,
i.e., male, female, or others, is qualitative data.
Qualitative data tells about the perception of people. This data helps market researchers
understand the customers’ tastes and then design their ideas and strategies accordingly.
Nominal Data
Nominal Data is used to label variables without any order or quantitative value. The color of hair
can be considered nominal data, as one color can’t be compared with another color.
The name “nominal” comes from the Latin name “nomen,” which means “name.” With the help
of nominal data, we can’t do any numerical tasks or can’t give any order to sort the data. These
data don’t have any meaningful order; their values are distributed into distinct categories.
Ordinal Data
Ordinal data is qualitative data whose values have some kind of relative position. These
kinds of data can be considered “in-between” qualitative and quantitative data. Ordinal data
only shows sequence and cannot be used for statistical analysis. Compared to nominal data,
ordinal data have a kind of order that nominal data lack.
Difference between Nominal and Ordinal Data
Nominal data can’t be quantified and has no intrinsic ordering; ordinal data gives some kind of
sequential order by position on a scale.
Nominal data is qualitative (categorical) data; ordinal data is said to be “in-between”
qualitative and quantitative data.
Nominal data cannot be used to compare items with one another; ordinal data can help to
compare one item with another by ranking or ordering.
Examples of nominal data: eye color, housing style, gender, hair color, religion, marital status,
ethnicity, etc. Examples of ordinal data: economic status, customer satisfaction, education
level, letter grades, etc.
Quantitative Data
Quantitative data can be expressed in numerical values, which makes it countable and suitable
for statistical data analysis. These kinds of data are also known as Numerical data. It answers
questions like “how much,” “how many,” and “how often.” For example, the price of a phone,
a computer’s RAM, and the height or weight of a person all fall under quantitative data.
Quantitative data can be used for statistical manipulation. These data can be represented on a
wide variety of graphs and charts, such as bar graphs, histograms, scatter plots, boxplots, pie
charts, line graphs, etc.
Examples of Quantitative Data:
Discrete Data
The term discrete means distinct or separate. Discrete data contain values that fall under
integers or whole numbers. The total number of students in a class is an example of discrete
data. These data can’t be broken into decimal or fractional values. Discrete data are countable
and have finite values; their subdivision is not possible. These data are represented mainly by
a bar graph, number line, or frequency table.
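The frequency-table representation mentioned above is easy to build for discrete data. The shoe sizes below are made up for illustration:

```python
from collections import Counter

# Discrete data: shoe sizes sold in a day (countable whole values)
sizes = [7, 8, 8, 9, 7, 10, 8, 9, 8]

# A frequency table maps each distinct value to its count
freq = Counter(sizes)

for size in sorted(freq):
    print(size, freq[size])
# 7 2
# 8 4
# 9 2
# 10 1
```

The same counts are what a bar graph of this data would display, one bar per distinct value.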
Continuous Data
The key difference between discrete and continuous data is that discrete data contain integer
or whole-number values, while continuous data store fractional numbers to record things such
as temperature, height, width, time, speed, etc.
Examples of continuous data:
Height of a person
Speed of a vehicle
“Time-taken” to finish the work
Wi-Fi frequency
Market share price
Difference between Discrete and Continuous Data
Discrete data are countable and finite (whole numbers or integers); continuous data are
measurable and come in the form of fractions or decimals.
Discrete data are represented mainly by bar graphs; continuous data are represented in the
form of a histogram.
Discrete values cannot be divided into smaller subdivisions; continuous values can be divided
into smaller subdivisions.
Discrete data have spaces between the values; continuous data are in the form of a continuous
sequence.
Examples of discrete data: total students in a class, number of days in a week, size of a shoe,
etc. Examples of continuous data: temperature of a room, the weight of a person, length of an
object, etc.
Conclusion
In this article, we have discussed the data types and their differences. Working on data is crucial
because we need to figure out what kind of data it is and how to use it to get valuable output out
of it. It is also important to know what kind of plot is suitable for which data category; it helps in
data analysis and visualization. Working with data requires good data science skills and a deep
understanding of different types of data and how to work with them.
Different types of data are used in research, analysis, statistical analysis, data visualization, and
data science. This data helps a company analyze its business, design its strategies, and help build
a successful data-driven decision-making process.