
Introduction To Data Science

Data Science is an interdisciplinary field focused on extracting insights from data using scientific methods and algorithms, while Machine Learning is a subset of AI that enables systems to learn from data autonomously. The Data Science process includes steps such as discovery, preparation, model planning, building, operationalizing, and communicating results, with various job roles like Data Scientist and Data Analyst. Key applications of Data Science include internet search, recommendation systems, and image recognition, while challenges include data variety and talent shortages.

Uploaded by

Vrushali Barhate
Copyright
© All Rights Reserved

Data Science Tutorial for Beginners: What is, Basics & Process


What is Data Science?
Data Science is the area of study that involves extracting insights from
vast amounts of data using scientific methods, algorithms, and processes.
It helps you discover hidden patterns in raw data. The term Data Science
emerged from the evolution of mathematical statistics, data analysis, and
big data.

Data Science is an interdisciplinary field that allows you to extract
knowledge from structured or unstructured data. It enables you to translate
a business problem into a research project and then translate the results
back into a practical solution.

In this Data Science Tutorial for Beginners, you will learn Data Science basics:

● What is Data Science?
● Why Data Science?
● Data Science Components
● Data Science Process
● Data Science Job Roles
● Tools for Data Science
● Difference Between Data Science and BI (Business Intelligence)
● Applications of Data Science
● Challenges of Data Science Technology

Why Data Science?


Here are significant advantages of using Data Science:

● Data is the oil of today's world. With the right tools, technologies,
and algorithms, we can convert data into a distinctive business
advantage.
● Data Science can help you detect fraud using advanced machine
learning algorithms.
● It helps you prevent significant monetary losses.
● It allows you to build intelligent capabilities into machines.
● You can perform sentiment analysis to gauge customer brand loyalty.
● It enables you to make better and faster decisions.
● It helps you recommend the right product to the right customer to
enhance your business.

[Figure: Evolution of Data Science]

Data Science Components


Statistics:

Statistics is the foundation of Data Science. It is the science of
collecting and analyzing large quantities of numerical data to obtain
useful insights.

Visualization:

Visualization techniques help you present huge amounts of data as
easy-to-understand, digestible visuals.

Machine Learning:

Machine Learning explores the building and study of algorithms that learn
to make predictions about unseen or future data.

Deep Learning:

Deep Learning is a newer area of machine learning in which multi-layer
neural networks learn the relevant features from the data themselves,
rather than relying on features designed by hand.

Data Science Process


The Data Science process consists of the following steps:

1. Discovery:

The discovery step involves acquiring data from all identified internal and
external sources that can help you answer the business question.

The data can be:

● Logs from web servers
● Data gathered from social media
● Census datasets
● Data streamed from online sources using APIs

2. Preparation:

Data can have many inconsistencies, such as missing values, blank columns,
and incorrect data formats, which need to be cleaned. You need to process,
explore, and condition the data before modeling. The cleaner your data, the
better your predictions.
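
As a minimal sketch of the preparation step, assume the raw data arrives as
a list of dictionaries. The field names (`age`, `city`, `notes`) and the
helpers `is_blank`, `to_number`, and `prepare` are hypothetical and chosen
only for illustration. The code drops a column that is blank in every
record, converts a text field to a number, and fills missing values with
the column mean. Mean imputation is just one common choice, not the only
option.

```python
from statistics import mean


def is_blank(value):
    """Treat None and whitespace-only text as missing."""
    return value is None or str(value).strip() == ""


def to_number(value):
    """Convert text such as ' 40 ' to a float; return None if not possible."""
    if is_blank(value):
        return None
    try:
        return float(str(value).strip())
    except ValueError:
        return None


def prepare(rows, numeric_field):
    """Drop all-blank columns, fix the number format, fill missing values."""
    columns = {key for row in rows for key in row}
    blank_columns = {c for c in columns
                     if all(is_blank(row.get(c)) for row in rows)}
    cleaned = [{k: v for k, v in row.items() if k not in blank_columns}
               for row in rows]
    for row in cleaned:
        row[numeric_field] = to_number(row.get(numeric_field))
    known = [row[numeric_field] for row in cleaned
             if row[numeric_field] is not None]
    fill = mean(known) if known else None  # assumption: mean imputation
    for row in cleaned:
        if row[numeric_field] is None:
            row[numeric_field] = fill
    return cleaned


# Made-up records for illustration only.
raw = [
    {"age": "34", "city": "Pune", "notes": ""},
    {"age": "", "city": "Mumbai", "notes": None},
    {"age": " 40 ", "city": "Delhi", "notes": " "},
]
clean = prepare(raw, "age")
```

Here the `notes` column is removed because it is blank everywhere, and the
missing `age` is replaced by the mean of the two known ages.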

3. Model Planning:

In this stage, you determine the methods and techniques to use to establish
the relationships between input variables. Model planning is performed
using statistical formulas and visualization tools. SQL Analysis Services,
R, and SAS/ACCESS are some of the tools used for this purpose.

4. Model Building:

In this step, the actual model building starts. The data scientist splits
the dataset into training and testing sets. Techniques such as association,
classification, and clustering are applied to the training set. Once
prepared, the model is evaluated against the testing set.
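
The train/test idea can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not a prescribed method: the data is
synthetic, the 75/25 split ratio is arbitrary, and a one-nearest-neighbour
rule stands in for whatever classification technique a real project would
use. The function names are hypothetical.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def predict_1nn(train, point):
    """Return the label of the training example closest to the point."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda rec: squared_distance(rec[0], point))
    return label


# Synthetic, made-up data: two well-separated groups of 2-D points.
rng = random.Random(42)
data = [((rng.gauss(0, 0.5), rng.gauss(0, 0.5)), "A") for _ in range(20)]
data += [((rng.gauss(5, 0.5), rng.gauss(5, 0.5)), "B") for _ in range(20)]

train, test = train_test_split(data)

# The model only ever sees the training part; the test part measures
# how well it generalizes to records it has not seen.
correct = sum(predict_1nn(train, features) == label for features, label in test)
accuracy = correct / len(test)
```

Keeping the test records out of training is the point of the split: accuracy
measured on data the model has already seen would be overly optimistic.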

5. Operationalize:

In this stage, you deliver the final baselined model with reports, code, and
technical documents. The model is deployed into a real-time production
environment after thorough testing.

6. Communicate Results:

In this stage, the key findings are communicated to all stakeholders. This
helps you decide whether the project is a success or a failure, based on
the results from the model.

Data Science Job Roles


The most prominent Data Science job titles are:

●​ Data Scientist
●​ Data Engineer
●​ Data Analyst
●​ Statistician
●​ Data Architect
●​ Data Admin
●​ Business Analyst
●​ Data/Analytics Manager

Let's look at what the main roles entail in detail:

Data Scientist:

Role:

A Data Scientist is a professional who manages enormous amounts of data
to come up with compelling business insights using various tools,
techniques, methodologies, and algorithms.

Languages:

R, SAS, Python, SQL, Hive, Matlab, Pig, Spark

Data Engineer:

Role:

A Data Engineer works with large amounts of data. He or she develops,
constructs, tests, and maintains architectures such as large-scale
processing systems and databases.

Languages:

SQL, Hive, R, SAS, Matlab, Python, Java, Ruby, C++, and Perl

Data Analyst:

Role:

A Data Analyst is responsible for mining vast amounts of data. He or she
looks for relationships, patterns, and trends in the data, and then delivers
compelling reports and visualizations that support the most viable business
decisions.

Languages:

R, Python, HTML, JS, C, C++, SQL

Statistician:

Role:

The Statistician collects, analyzes, and interprets qualitative and
quantitative data using statistical theories and methods.

Languages:

SQL, R, Matlab, Tableau, Python, Perl, Spark, and Hive

Data Administrator:

Role:

The Data Administrator ensures that the database is accessible to all
relevant users, that it is performing correctly, and that it is kept safe
from hacking.

Languages:

Ruby on Rails, SQL, Java, C#, and Python

Business Analyst:

Role:

The Business Analyst works to improve business processes and acts as an
intermediary between the business executive team and the IT department.

Languages:

SQL, Tableau, Power BI, and Python

Tools for Data Science


Category              Tools
Data Analysis         R, Spark, Python, and SAS
Data Warehousing      Hadoop, SQL, Hive
Data Visualization    R, Tableau, Raw
Machine Learning      Spark, Azure ML Studio, Mahout

Difference Between Data Science and BI (Business Intelligence)

Parameter      Business Intelligence                     Data Science
Perception     Looking backward                          Looking forward
Data sources   Structured data (mostly SQL, sometimes    Structured and unstructured data
               a data warehouse)                         (logs, SQL, NoSQL, or text)
Approach       Statistics and visualization              Statistics, machine learning, and graphs
Emphasis       Past and present                          Analysis and natural language processing
Tools          Pentaho, Microsoft BI, QlikView           R, TensorFlow

Applications of Data Science


The main applications of Data Science are described below.

Internet Search:

Google Search uses data science to return relevant results within a
fraction of a second.

Recommendation Systems:

Data science is used to build recommendation systems. For example,
"suggested friends" on Facebook and "suggested videos" on YouTube are
generated with the help of data science.

Image & Speech Recognition:

Speech recognition systems such as Siri, Google Assistant, and Alexa run on
data science techniques. Moreover, Facebook recognizes your friends when
you upload a photo with them, with the help of data science.

Gaming World:

EA Sports, Sony, and Nintendo use data science technology to enhance the
gaming experience. Games are now developed using machine learning
techniques, so a game can adapt itself as you move to higher levels.

Online Price Comparison:

PriceRunner, Junglee, and Shopzilla rely on data science. Here, data is
fetched from the relevant websites using APIs.

Challenges of Data Science Technology


● A high variety of information and data is required for accurate analysis
● An adequate data science talent pool is not available
● Management does not provide financial support for a data science team
● Data is unavailable or difficult to access
● Data science results are not used effectively by business decision-makers
● Explaining data science to others is difficult
● Privacy issues
● Lack of significant domain expertise
● A very small organization may not be able to support a data science team

Summary
● Data Science is the area of study that involves extracting insights from
vast amounts of data using scientific methods, algorithms, and
processes.
● Statistics, Visualization, Machine Learning, and Deep Learning are
important Data Science concepts.
● The Data Science process goes through Discovery, Data Preparation, Model
Planning, Model Building, Operationalize, and Communicate Results.
● Important Data Science job roles are: 1) Data Scientist 2) Data
Engineer 3) Data Analyst 4) Statistician 5) Data Architect 6) Data Admin
7) Business Analyst 8) Data/Analytics Manager.
● R, SQL, Python, and SAS are essential Data Science tools.
● Business Intelligence looks backward, while Data Science looks
forward.
● Important applications of Data Science are 1) Internet Search 2)
Recommendation Systems 3) Image & Speech Recognition 4) Gaming
World 5) Online Price Comparison.
● The high variety of information and data required is the biggest
challenge of Data Science technology.

Data Science vs Machine Learning: Must Know Differences!

In this tutorial on the differences between Data Science and Machine
Learning, let us first learn:

What is Data Science?


Data Science is the area of study that involves extracting insights from vast
amounts of data using scientific methods, algorithms, and processes. It
helps you discover hidden patterns in raw data.

Data Science is an interdisciplinary field that allows you to extract
knowledge from structured or unstructured data. It enables you to translate
a business problem into a research project and then translate the results
back into a practical solution. The term Data Science emerged from the
evolution of mathematical statistics, data analysis, and big data.

In this Data Science vs Machine Learning tutorial, you will learn:

● What is Data Science?
● What is Machine Learning?
●​ Roles and Responsibilities of a Data Scientist
●​ Role and Responsibilities of Machine Learning Engineers
●​ Difference Between Data Science and Machine Learning
●​ Challenges of Data Science Technology
●​ Challenges of Machine Learning
●​ Applications of Data Science
●​ Applications of Machine Learning
●​ Data Science or Machine Learning – Which is Better?

What is Machine Learning?


Machine Learning is a system that can learn from data through
self-improvement, without logic being explicitly coded by the programmer.
The breakthrough comes with the idea that a machine can learn on its own
from examples (i.e., data) to produce accurate results.

Machine learning combines data with statistical tools to predict an output.
This output is then used by businesses to derive actionable insights.
Machine learning is closely related to data mining and Bayesian predictive
modeling. The machine receives data as input and uses an algorithm to
formulate answers.
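
To make "learning from examples rather than hand-coded logic" concrete,
here is a minimal sketch that fits a straight line by ordinary least
squares. The choice of a linear model, the `fit_line` helper, and the
hours/scores numbers are assumptions made for illustration; the rule
relating the two columns is never written into the code, only recovered
from the data.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training examples: hours studied and the resulting test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)

# Use the learned parameters to predict the output for an unseen input.
predicted_score = slope * 6 + intercept
```

The input is the data, the algorithm is the least-squares fit, and the
"answer" is a prediction for an input that was not among the examples.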


Check the following key differences between Machine Learning and Data
Science.

KEY DIFFERENCE
● Data Science extracts insights from vast amounts of data using
scientific methods, algorithms, and processes. Machine Learning, on the
other hand, is a system that can learn from data through
self-improvement, without logic being explicitly coded by the
programmer.
● Data Science can work with manual methods, though they are not very
useful, while Machine Learning algorithms are hard to implement manually.
● Data Science is not a subset of Artificial Intelligence (AI), while
Machine Learning is a subset of AI.
● Data Science helps you create insights from data, dealing with all
real-world complexities, while Machine Learning helps you predict the
outcome for new data values.

Roles and Responsibilities of a Data Scientist


Here are the important skills required to become a Data Scientist:

● Knowledge of unstructured data management
● Hands-on experience in SQL database coding
● Ability to understand multiple analytical functions
● Data mining: processing, cleansing, and verifying the integrity
of data used for analysis
● Ability to obtain data and recognize its strengths
● Working with professional DevOps consultants to help customers
operationalize models

Role and Responsibilities of Machine Learning Engineers

Here are the important skills required to become a Machine Learning
Engineer:

● Knowledge of data evolution and statistical modeling
● Understanding and application of algorithms
●​ Natural language processing
●​ Data architecture design
●​ Text representation techniques
●​ In-depth knowledge of programming skills
●​ Knowledge of probability and statistics
●​ Design machine learning systems and knowledge of deep learning
technology
●​ Implement appropriate machine learning algorithms and tools

Difference Between Data Science and Machine Learning

Here are the major differences between Data Science and Machine Learning:

Data Science vs Machine Learning

Data Science: An interdisciplinary field that uses scientific methods,
algorithms, and systems to extract knowledge from structured and
unstructured data.
Machine Learning: The scientific study of algorithms and statistical
models that are used to perform a specific task.

Data Science: Helps you create insights from data, dealing with all
real-world complexities.
Machine Learning: Helps you predict the outcome for new data from
historical data with the help of mathematical models.

Data Science: Nearly all of the input data is generated in a
human-readable format, which is read or analyzed by humans.
Machine Learning: Input data is transformed specifically for the
algorithms used.

Data Science: Can work with manual methods as well, though they are not
very useful.
Machine Learning: Algorithms are hard to implement manually.

Data Science: Is a complete process.
Machine Learning: Is a single step in the entire data science process.

Data Science: Is not a subset of Artificial Intelligence (AI).
Machine Learning: Is a subset of Artificial Intelligence (AI).

Data Science: High RAM and SSD storage are used, which helps you overcome
I/O bottleneck problems.
Machine Learning: GPUs are used for intensive vector operations.

Challenges of Data Science Technology


Here are the important challenges of Data Science technology:

● A wide variety of information and data is needed for accurate analysis
● An adequate data science talent pool is not available
● Management does not provide financial support for a data science
team
● Data is unavailable or difficult to access
● Data science results are not used effectively by business decision-makers
● Explaining data science to others is difficult
● Privacy issues
● Lack of significant domain expertise
● A very small organization may not be able to support a data science team

Challenges of Machine Learning


Here are the primary challenges of machine learning:

● The main challenge is a lack of data or a lack of diversity in the
dataset.
● A machine cannot learn if no data is available. Besides, a dataset
that lacks diversity gives the machine a hard time.
● A machine needs heterogeneous data to learn meaningful insights.
● It is unlikely that an algorithm can extract information when there is
little or no variation.
● It is recommended to have at least 20 observations per group to help
the machine learn.
● Having too few observations may lead to poor evaluation and prediction.
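
The rule of thumb of at least 20 observations per group can be checked
before training. The sketch below is one possible way to do it; the
function name `underrepresented_groups` and the fraud/legitimate labels
are hypothetical, while the threshold of 20 comes from the list above.

```python
from collections import Counter


def underrepresented_groups(labels, minimum=20):
    """Return the groups that have fewer than `minimum` observations."""
    counts = Counter(labels)
    return {group: n for group, n in counts.items() if n < minimum}


# Made-up labels: a heavily imbalanced dataset.
labels = ["fraud"] * 5 + ["legitimate"] * 95
too_small = underrepresented_groups(labels)
```

In this example only the `fraud` group is flagged, which signals that more
fraud examples should be collected before the model is trained.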

Applications of Data Science


Here are the applications of Data Science:

Internet Search:

Google Search uses data science to return relevant results within a
fraction of a second.

Recommendation Systems:

Data science is used to build recommendation systems. For example,
"suggested friends" on Facebook and "suggested videos" on YouTube are
generated with the help of data science.

Image & Speech Recognition:

Speech recognition systems such as Siri, Google Assistant, and Alexa run on
data science techniques. Moreover, Facebook recognizes your friends when
you upload a photo with them.

Gaming World:

EA Sports, Sony, and Nintendo use data science technology to enhance the
gaming experience. Games are now developed using machine learning
techniques, so a game can adapt itself as you move to higher levels.

Online Price Comparison:

PriceRunner, Junglee, and Shopzilla rely on data science. Here, data is
fetched from the relevant websites using APIs.

Applications of Machine Learning


Here are the applications of Machine Learning:

Automation:

Machine learning can work autonomously in many fields with little or no
human intervention. For example, robots perform the essential process
steps in manufacturing plants.

Finance Industry:

Machine learning is growing in popularity in the finance industry. Banks
mainly use ML to find patterns in their data, but also to prevent fraud.

Government Organization:

Governments make use of ML to manage public safety and utilities. Take
the example of China, with its massive use of face recognition: the
government uses artificial intelligence to deter jaywalking.

Healthcare Industry:

Healthcare was one of the first industries to use machine learning, with
image detection.

Data Science or Machine Learning – Which is Better?

Machine learning is ideal for analyzing, understanding, and identifying
patterns in data. You can use it to train a machine to automate tasks that
would be exhausting or impossible for a human being. Moreover, machine
learning can make decisions with minimal human intervention.

On the other hand, data science can help you detect fraud using advanced
machine learning algorithms. It also helps you prevent significant
monetary losses and perform sentiment analysis to gauge customer brand
loyalty.

What is Data Analysis? Research | Types | Methods | Techniques

What is Data Analysis?

Data analysis is defined as the process of cleaning, transforming, and
modeling data to discover useful information for business decision-making.
The purpose of data analysis is to extract useful information from data and
to make decisions based on that analysis.

A simple example of data analysis: whenever we make a decision in our
day-to-day life, we think about what happened last time or what will
happen if we choose a particular option. This is nothing but analyzing our
past or future and making decisions based on it. For that, we gather
memories of our past or dreams of our future. When an analyst does the
same thing for business purposes, it is called Data Analysis.

In this tutorial, you will learn:

● Why Data Analysis?
● Data Analysis Tools
●​ Types of Data Analysis: Techniques and Methods
●​ Data Analysis Process

Why Data Analysis?


To grow your business, or even to grow in your life, sometimes all you need
to do is analysis!

If your business is not growing, you have to look back, acknowledge your
mistakes, and make a new plan that does not repeat them. And even if your
business is growing, you have to look forward to make it grow further. All
you need to do is analyze your business data and business processes.

Data Analysis Tools

Data analysis tools make it easier for users to process and manipulate data,
analyze the relationships and correlations between data sets, and identify
patterns and trends for interpretation.

Types of Data Analysis: Techniques and Methods


Several types of data analysis techniques exist, depending on the business
and the technology involved. However, the major data analysis methods are:

●​ Text Analysis
●​ Statistical Analysis
●​ Diagnostic Analysis
●​ Predictive Analysis
●​ Prescriptive Analysis

Text Analysis
Text Analysis is also referred to as Data Mining. It is a method of data
analysis that discovers patterns in large data sets using databases or data
mining tools. It is used to transform raw data into business information.
Business Intelligence tools available on the market are used to make
strategic business decisions. Overall, it offers a way to extract and
examine data, derive patterns, and finally interpret the data.

Statistical Analysis
Statistical Analysis shows "What happened?" by using past data in the form
of dashboards. It includes the collection, analysis, interpretation,
presentation, and modeling of data. It analyzes a complete set of data or a
sample of it. There are two categories of this type of analysis:
Descriptive Analysis and Inferential Analysis.

Descriptive Analysis

Descriptive Analysis analyzes complete data or a sample of summarized
numerical data. It shows the mean and deviation for continuous data, and
the percentage and frequency for categorical data.
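
A short sketch of these descriptive measures, using only the Python
standard library. The `ages` and `segments` values are made up for
illustration, and the sample standard deviation is assumed as the
"deviation" measure.

```python
from collections import Counter
from statistics import mean, stdev

# Continuous variable (made-up values).
ages = [23, 25, 31, 35, 36]
age_mean = mean(ages)
age_deviation = stdev(ages)  # sample standard deviation

# Categorical variable (made-up values).
segments = ["retail", "retail", "online", "retail", "online"]
frequency = Counter(segments)
percentage = {name: 100 * count / len(segments)
              for name, count in frequency.items()}
```

The continuous column is summarized by its mean and deviation, while the
categorical column is summarized by how often each category occurs and
what share of the records it represents.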

Inferential Analysis

Inferential Analysis analyzes a sample drawn from the complete data. In
this type of analysis, you can reach different conclusions from the same
data by selecting different samples.
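
The effect of sample selection can be demonstrated with a small simulation.
The population below is synthetic (normally distributed values with an
assumed mean of 100 and standard deviation of 15), and the sample size of
30 is an arbitrary choice for illustration.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the run is repeatable

# Synthetic "complete data": 10,000 made-up measurements.
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Two different random samples drawn from the same population.
sample_a = rng.sample(population, 30)
sample_b = rng.sample(population, 30)

mean_a = mean(sample_a)
mean_b = mean(sample_b)
population_mean = mean(population)
```

Both sample means are estimates of the same population mean, yet they
differ from each other, which is why conclusions drawn from a single
sample should be reported with some measure of uncertainty.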

Diagnostic Analysis
Diagnostic Analysis shows "Why did it happen?" by finding the cause from
the insights found in Statistical Analysis. This analysis is useful for
identifying behavior patterns in data. If a new problem arises in your
business process, you can look into this analysis to find similar patterns
for that problem. You may then be able to apply similar remedies to the new
problem.

Predictive Analysis
Predictive Analysis shows "What is likely to happen?" by using previous
data. The simplest example: if last year I bought two dresses from my
savings, and this year my salary doubles, then I can buy four dresses. Of
course, it is not as easy as this, because you have to think about other
circumstances: clothes prices may increase this year, or instead of
dresses you may want to buy a new bike, or you may need to buy a house!

So this analysis makes predictions about future outcomes based on current
or past data. A forecast is just an estimate. Its accuracy depends on how
much detailed information you have and how deeply you dig into it.
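
The dress example can be written out as a small calculation. The function
`naive_forecast` and its proportional formula are assumptions made for
illustration: purchases are taken to scale with income and inversely with
price, which is a deliberate simplification rather than a real forecasting
model.

```python
def naive_forecast(last_quantity, income_ratio, price_ratio=1.0):
    """Scale last year's purchases by the change in income and in price."""
    return last_quantity * income_ratio / price_ratio


# Salary doubles and prices stay the same: two dresses become four.
same_prices = naive_forecast(2, income_ratio=2.0)

# Salary doubles but prices rise by 25 percent (a made-up figure).
higher_prices = naive_forecast(2, income_ratio=2.0, price_ratio=1.25)
```

Adding just one extra circumstance, the price increase, already lowers the
forecast, which illustrates why the estimate is only as good as the
information that goes into it.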

Prescriptive Analysis
Prescriptive Analysis combines the insights from all the previous analyses
to determine which action to take for a current problem or decision. Most
data-driven companies use Prescriptive Analysis because predictive and
descriptive analysis alone are not enough to improve data performance.
Based on current situations and problems, they analyze the data and make
decisions.

Data Analysis Process


The Data Analysis Process consists of gathering information using a
suitable application or tool that allows you to explore the data and find
patterns in it. Based on that information, you can make decisions or draw
final conclusions.

Data Analysis consists of the following phases:

● Data Requirement Gathering
● Data Collection
●​ Data Cleaning
●​ Data Analysis
●​ Data Interpretation
●​ Data Visualization

Data Requirement Gathering


First of all, you have to think about why you want to do this data
analysis. You need to find out the purpose or aim of the analysis and
decide which type of data analysis you want to do. In this phase, you
decide what to analyze and how to measure it: you have to understand why
you are investigating and what measures you will use.

Data Collection
After requirement gathering, you will have a clear idea of what you have to
measure and what your findings should be. Now it is time to collect your
data based on the requirements. Once you collect your data, remember that
it must be processed or organized for analysis. As you collect data from
various sources, you must keep a log with the collection date and source
of the data.
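
One possible shape for such a collection log is sketched below. The
`CollectionLogEntry` class, the `record_collection` helper, the extra
`row_count` field, and the example sources and dates are all hypothetical;
the only requirement taken from the text is that each entry records the
source and the collection date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CollectionLogEntry:
    """One line of the collection log (hypothetical structure)."""
    source: str          # where the data came from
    collected_on: date   # when it was collected
    row_count: int       # how many records were received (extra, optional)


collection_log = []


def record_collection(source, row_count, collected_on=None):
    """Append an entry to the log, defaulting the date to today."""
    entry = CollectionLogEntry(source, collected_on or date.today(), row_count)
    collection_log.append(entry)
    return entry


# Example entries; the sources and dates are made up.
record_collection("web server logs", 1200, date(2024, 1, 15))
record_collection("census dataset", 50000, date(2024, 1, 16))
sources = [entry.source for entry in collection_log]
```

Keeping this record alongside the data makes it possible to trace any
later finding back to where and when the underlying records were obtained.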

Data Cleaning
The collected data may contain records that are not useful or are
irrelevant to the aim of your analysis, so it should be cleaned. It may
also contain duplicate records, white space, or errors. The data should be
cleaned and error-free. This phase must be done before analysis because
the better the cleaning, the closer your analysis output will be to your
expected outcome.
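
A minimal sketch of two of the cleaning tasks named above, removing stray
white space and dropping duplicate records. It assumes each record is a
tuple of text fields; the `clean_records` name and the sample rows are
hypothetical.

```python
def clean_records(records):
    """Normalize white space in every field and drop duplicate records."""
    seen = set()
    cleaned = []
    for record in records:
        # Trim the ends and collapse runs of spaces inside each field.
        normalized = tuple(" ".join(field.split()) for field in record)
        if normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned


# Made-up records: the first two differ only by a trailing space.
raw = [
    ("Asha ", "Pune"),
    ("Asha", "Pune"),
    ("  Ravi", "New   Delhi"),
]
result = clean_records(raw)
```

White space is normalized before the duplicate check on purpose, so that
records differing only in spacing are recognized as the same record.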

Data Analysis
Once the data is collected, cleaned, and processed, it is ready for Analysis.
As you manipulate data, you may find you have the exact information you
need, or you might need to collect more data. During this phase, you can
use data analysis tools and software which will help you to understand,
interpret, and derive conclusions based on the requirements.

Data Interpretation
After analyzing your data, it is finally time to interpret your results.
You can choose how to express or communicate your data analysis: simply in
words, or in a table or chart. Then use the results of your data analysis
process to decide your best course of action.

Data Visualization
Data visualization is very common in day-to-day life; it often appears in
the form of charts and graphs. In other words, data is shown graphically so
that it is easier for the human brain to understand and process. Data
visualization is often used to discover unknown facts and trends. By
observing relationships and comparing datasets, you can uncover meaningful
information.
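
Even without a plotting library, the idea can be sketched as a plain-text
bar chart. The `text_bar_chart` function and the quarterly sales figures
are made up for illustration; a real project would more likely use a
dedicated charting tool.

```python
def text_bar_chart(values, width=20):
    """Render a dict of label -> number as horizontal bars of '#' marks."""
    largest = max(values.values())
    lines = []
    for label, value in values.items():
        # Scale each bar relative to the largest value.
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<8} {bar} {value}")
    return "\n".join(lines)


# Made-up quarterly sales figures.
sales = {"Q1": 10, "Q2": 20, "Q3": 15}
chart = text_bar_chart(sales)
print(chart)
```

The relative bar lengths make it obvious at a glance which quarter was
strongest, something that takes longer to see in a column of numbers.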

Summary:

● Data analysis is the process of cleaning, transforming, and modeling
data to discover useful information for business decision-making
●​ Types of Data Analysis are Text, Statistical, Diagnostic, Predictive,
Prescriptive Analysis
●​ Data Analysis consists of Data Requirement Gathering, Data Collection,
Data Cleaning, Data Analysis, Data Interpretation, Data Visualization

Questions:

Q.1 What is Data Science? Discuss its components.

Q.2 Discuss the steps of the Data Science process.

Q.3 Which job roles exist in Data Science? Discuss them.

Q.4 Differentiate between Data Science and Business Intelligence.

Q.5 What are the challenges of Data Science?

Q.6 Discuss the applications of Data Science.

Q.7 What is Machine Learning? What is the key difference between Data Science and Machine
Learning?

Q.8 Differentiate between Data Science and Machine Learning.

Q.9 What is Data Analysis? Discuss the different types of Data Analysis.

Q.10 Discuss the different phases of the Data Analysis process.
