Introduction To Data Science
In this Data Science Tutorial for Beginners, you will learn Data Science basics:
● Data is the oil of today's world. With the right tools, technologies, and
algorithms, we can convert data into a distinctive business
advantage
● Data Science can help you to detect fraud using advanced machine
learning algorithms
● It helps you to prevent any significant monetary losses
● It allows you to build intelligent capabilities into machines
● You can perform sentiment analysis to gauge customer brand loyalty
● It enables you to make better and faster decisions
● It helps you to recommend the right product to the right customer to
enhance your business
Evolution of Data Sciences
Statistics:
Statistics is the most critical component of Data Science. It is the science
of collecting and analyzing numerical data in large quantities to get useful
insights.
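As a small illustration of the kind of summary that statistics provides, the sketch below uses Python's standard `statistics` module. The list `daily_orders` is hypothetical data invented for this example, not taken from any real business.

```python
import statistics

# Hypothetical daily order counts, invented for this example.
daily_orders = [12, 15, 11, 19, 15, 22, 18]

mean_orders = statistics.mean(daily_orders)      # arithmetic average
median_orders = statistics.median(daily_orders)  # middle value when sorted
spread = statistics.stdev(daily_orders)          # sample standard deviation

print(f"mean={mean_orders} median={median_orders} stdev={spread:.2f}")
```

The mean and median describe a typical day, while the standard deviation indicates how much individual days differ from it.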
Visualization:
Machine Learning:
Machine Learning explores the building and study of algorithms that learn
to make predictions about unseen or future data.
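One of the simplest instances of "learning to predict" is fitting a straight line to past observations and using it on a new input. The sketch below is a minimal least-squares fit in plain Python. The `spend` and `units` values are hypothetical, and `fit_line` is an illustrative helper, not part of any library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: advertising spend vs. units sold.
spend = [1.0, 2.0, 3.0, 4.0]
units = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, units)

# Predict the output for an input that was not in the training data.
predicted = slope * 5.0 + intercept
print(predicted)
```

The parameters `slope` and `intercept` are learned from the data rather than coded by hand, which is the essential idea behind the definition above.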
Deep Learning:
Deep Learning is a newer area of machine learning research in which the
algorithm itself selects the analysis model to follow.
1. Discovery:
The Discovery step involves acquiring data from all the identified internal and
external sources that help you to answer the business question.
2. Preparation:
Data can have many inconsistencies, such as missing values, blank columns,
and incorrect data formats, which need to be cleaned. You need to process,
explore, and condition data before modeling. The cleaner your data, the
better your predictions.
3. Model Planning:
In this stage, you need to determine the method and technique to draw the
relationship between input variables. Planning for a model is performed by using
different statistical formulas and visualization tools. SQL Analysis Services, R,
and SAS/ACCESS are some of the tools used for this purpose.
4. Model Building:
In this step, the actual model building process starts. Here, the data scientist
splits the dataset into training and testing sets. Techniques like association,
classification, and clustering are applied to the training data set. Once
prepared, the model is tested against the "testing" dataset.
5. Operationalize:
In this stage, you deliver the final baselined model with reports, code, and
technical documents. The model is deployed into a real-time production
environment after thorough testing.
6. Communicate Results:
In this stage, the key findings are communicated to all stakeholders. This
helps you to decide whether the results of the project are a success or a failure
based on the inputs from the model.
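The Model Building step above (split the data, fit on the training set, check against the testing set) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the `samples` are hypothetical, the classifier is a simple nearest-centroid rule chosen for brevity, and the helper names (`split_by_class`, `fit_centroids`, `predict`) are illustrative rather than taken from any library.

```python
import random

# Hypothetical labelled data: ((feature_1, feature_2), label).
samples = [
    ((1.0, 1.2), "low"), ((0.8, 1.0), "low"), ((1.1, 0.9), "low"), ((0.9, 1.1), "low"),
    ((5.0, 5.2), "high"), ((4.8, 5.1), "high"), ((5.2, 4.9), "high"), ((5.1, 5.0), "high"),
]

def split_by_class(data, test_fraction, seed):
    """Shuffle each class separately, then hold out a fraction of it for testing."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in data:
        by_label.setdefault(label, []).append((features, label))
    train, test = [], []
    for rows in by_label.values():
        rng.shuffle(rows)
        n_test = max(1, int(len(rows) * test_fraction))
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test

def fit_centroids(train):
    """Training: store the mean point (centroid) of each class."""
    sums, counts = {}, {}
    for (x, y), label in train:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}

def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def predict(centroids, point):
    """Classification: return the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], point))

train_set, test_set = split_by_class(samples, test_fraction=0.25, seed=42)
model = fit_centroids(train_set)

# Evaluate only on rows the model never saw during training.
correct = sum(predict(model, features) == label for features, label in test_set)
accuracy = correct / len(test_set)
print(f"accuracy on held-out data: {accuracy:.2f}")
```

Holding the test rows out of training is what makes the accuracy figure a fair estimate of how the model behaves on new data.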
● Data Scientist
● Data Engineer
● Data Analyst
● Statistician
● Data Architect
● Data Admin
● Business Analyst
● Data/Analytics Manager
Now in this Data Science Tutorial, let's learn what each role entails in detail:
Data Scientist:
Role:
Languages:
Data Engineer:
Role:
Languages:
Data Analyst:
Role:
A data analyst is responsible for mining vast amounts of data. They look for
relationships, patterns, and trends in the data. They then deliver compelling
reports and visualizations that help the business take the most viable
decisions.
Languages:
Statistician:
Role:
Data Administrator:
Role:
A data admin should ensure that the database is accessible to all relevant
users. They also make sure that it is performing correctly and is being kept
safe from hacking.
Languages:
Business Analyst:
Role:
Languages:
Internet Search:
Google Search uses Data Science technology to find a specific result within
a fraction of a second.
Recommendation Systems:
Image & Speech Recognition:
Speech recognition systems like Siri, Google Assistant, and Alexa run on
Data Science techniques. Moreover, Facebook recognizes your friend when
you upload a photo with them, with the help of Data Science.
Gaming world:
EA Sports, Sony, and Nintendo are using Data Science technology. This enhances
your gaming experience. Games are now developed using Machine Learning
techniques, so a game can update itself when you move to higher levels.
Summary
● Data Science is the area of study which involves extracting insights from
vast amounts of data by the use of various scientific methods,
algorithms, and processes.
● Statistics, Visualization, Deep Learning, and Machine Learning are
important Data Science concepts.
● The Data Science process goes through Discovery, Data Preparation, Model
Planning, Model Building, Operationalize, and Communicate Results.
● Important Data Science job roles are: 1) Data Scientist 2) Data
Engineer 3) Data Analyst 4) Statistician 5) Data Architect 6) Data Admin
7) Business Analyst 8) Data/Analytics Manager
● R, SQL, Python, and SAS are essential Data Science tools.
● Business Intelligence looks backward, while Data Science looks
forward.
● Important applications of Data Science are 1) Internet Search 2)
Recommendation Systems 3) Image & Speech Recognition 4) Gaming
world 5) Online Price Comparison.
● The high variety of information and data is the biggest challenge of Data
Science technology.
Machine learning combines data with statistical tools to predict an output. This
output is then used by businesses to produce actionable insights. Machine
learning is closely related to data mining and Bayesian predictive modeling.
The machine receives data as input and uses an algorithm to formulate answers.
KEY DIFFERENCE
● Data Science extracts insights from vast amounts of data by the use of
various scientific methods, algorithms, and processes. Machine Learning,
on the other hand, is a system that can learn from data through
self-improvement, without logic being explicitly coded by the
programmer.
● Data Science can work with manual methods, though they are not very
useful, while Machine Learning algorithms are hard to implement manually.
● Data Science is not a subset of Artificial Intelligence (AI), while Machine
Learning is a subset of Artificial Intelligence (AI).
● Data Science techniques help you to create insights from data, dealing
with all real-world complexities, while Machine Learning methods help
you to predict the outcome for new data values.
Data Science | Machine Learning
Data science is an interdisciplinary field that uses scientific methods, algorithms, and systems to extract knowledge from structured and unstructured data. | Machine learning is the scientific study of algorithms and statistical models that are used to perform a specific task.
Data science techniques help you to create insights from data, dealing with all real-world complexities. | Machine learning methods help you to predict the outcome for new data from historical data with the help of mathematical models.
Nearly all of the input data is generated in a human-readable format, which is read or analyzed by humans. | Input data for machine learning is transformed specifically for the algorithms used.
Data science can work with manual methods as well, though they are not very useful. | Machine learning algorithms are hard to implement manually.
Data science is a complete process. | Machine learning is a single step in the entire data science process.
In data science, high RAM and SSDs are used, which helps you to overcome I/O bottleneck problems. | In machine learning, GPUs are used for intensive vector operations.
Challenges of Data Science Technology
● A wide variety of information and data is needed for accurate analysis
● An adequate data science talent pool is not available
● Management does not provide financial support for a data science
team
● Unavailability of, or difficult access to, data
● Data science results are not effectively used by business decision-makers
● Explaining data science to others is difficult
● Privacy issues
● Lack of significant domain experts
● If an organization is very small, it can't have a data science team
Internet Search:
Google Search uses data science technology to find a specific result within
a fraction of a second.
Recommendation Systems:
Image & Speech Recognition:
Speech recognition systems like Siri, Google Assistant, and Alexa run on
data science techniques. Moreover, Facebook recognizes your friend when
you upload a photo with them.
Gaming World:
EA Sports, Sony, and Nintendo are using data science technology. This enhances
your gaming experience. Games are now developed using machine learning
techniques, so a game can update itself when you move to higher levels.
Automation:
Machine learning can work autonomously in many fields without the need for
human intervention. For example, robots perform the essential process steps
in manufacturing plants.
Finance Industry:
Government Organization:
The government makes use of ML to manage public safety and utilities. Take
the example of China, which uses face recognition on a massive scale. The
government uses artificial intelligence to prevent jaywalking.
Healthcare Industry:
Healthcare was one of the first industries to use machine learning with image
detection.
On the other hand, data science can help you to detect fraud using advanced
machine learning algorithms. It also helps you to prevent any significant
monetary losses. It helps you to perform sentiment analysis to gauge
customer brand loyalty.
What is Data Analysis? Research | Types | Methods | Techniques
What is Data Analysis?
Data analysis is defined as a process of cleaning, transforming, and
modeling data to discover useful information for business decision-making.
The purpose of data analysis is to extract useful information from data and
to make decisions based on that analysis.
If your business is not growing, you have to look back, acknowledge
your mistakes, and make a new plan that does not repeat them. And
even if your business is growing, you have to look forward to making it
grow more. All you need to do is analyze your business data and
business processes.
Data Analysis Tools
Data analysis tools make it easier for users to process and manipulate data
and to analyze the relationships and correlations between data sets. They also
help to identify patterns and trends for interpretation. The main types of
data analysis are:
● Text Analysis
● Statistical Analysis
● Diagnostic Analysis
● Predictive Analysis
● Prescriptive Analysis
Text Analysis
Text Analysis is also referred to as Data Mining. It is a method of
data analysis that discovers patterns in large data sets using databases or data
mining tools. It is used to transform raw data into business information. Business
Intelligence tools available in the market are used to take strategic
business decisions. Overall, it offers a way to extract and examine data,
derive patterns, and finally interpret the data.
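A very small example of discovering a pattern in text is counting which terms recur across many comments. The sketch below uses only the standard library. The `comments`, the `STOP_WORDS` set, and the `tokenize` helper are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical customer comments, invented for this example.
comments = [
    "Delivery was late and the box was damaged",
    "Late delivery again",
    "Great product, fast delivery",
]

# Common words that carry little meaning on their own (illustrative list).
STOP_WORDS = {"was", "and", "the", "again"}

def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [word for word in re.findall(r"[a-z']+", text.lower())
            if word not in STOP_WORDS]

term_counts = Counter(word for comment in comments for word in tokenize(comment))
print(term_counts.most_common(2))
```

Here the most frequent terms point at a recurring theme (late deliveries), which is the kind of business information the paragraph above describes.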
Statistical Analysis
Statistical Analysis shows "What happened?" by using past data in the form of
dashboards. Statistical analysis includes the collection, analysis, interpretation,
presentation, and modeling of data. It analyses a set of data or a sample of
data. There are two categories of this type of analysis: Descriptive Analysis
and Inferential Analysis.
Descriptive Analysis
Inferential Analysis
Inferential Analysis analyses a sample drawn from the complete data. In this
type of analysis, you can reach different conclusions from the same data by
selecting different samples.
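The effect of sample choice can be seen by drawing two samples from the same data and comparing their averages. In the sketch below, the `population` is hypothetical (randomly generated order values), and the seed and sample size of 30 are arbitrary choices for illustration.

```python
import random
from statistics import mean

# Hypothetical population: 1,000 order values generated for this example.
rng = random.Random(7)
population = [rng.uniform(10, 100) for _ in range(1000)]

# Two different samples drawn from the same complete data.
sample_a = rng.sample(population, 30)
sample_b = rng.sample(population, 30)

print(f"population mean: {mean(population):.2f}")
print(f"sample A mean:   {mean(sample_a):.2f}")
print(f"sample B mean:   {mean(sample_b):.2f}")
```

Each sample mean is only an estimate of the population mean, so the two samples will usually give slightly different answers.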
Diagnostic Analysis
Diagnostic Analysis shows "Why did it happen?" by finding the cause from the
insight found in statistical analysis. This analysis is useful for identifying behavior
patterns in data. If a new problem arises in your business process, you
can look into this analysis to find similar patterns for that problem, and you
may be able to use similar prescriptions for the new problem.
Predictive Analysis
Predictive Analysis shows "What is likely to happen?" by using previous data.
The simplest example: if last year I bought two dresses
based on my savings, and this year my salary doubles, then I can
buy four dresses. Of course, it is not as easy as this, because you have to
think about other circumstances: clothes prices may increase
this year, or instead of dresses you may want to buy a new bike, or you
may need to buy a house!
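The dress example can be written as a tiny projection. The sketch below assumes purchases scale with income and shrink with price increases. The function `affordable_items` and the 25% price rise are hypothetical, used only to show how an extra circumstance changes the forecast.

```python
def affordable_items(last_year_items, income_ratio, price_ratio=1.0):
    """Scale last year's purchases by income growth, adjusted for price changes."""
    return int(last_year_items * income_ratio / price_ratio)

# The example from the text: two dresses last year, and the salary doubles.
naive_forecast = affordable_items(2, income_ratio=2.0)

# Hypothetical adjustment: prices rise by 25% over the same period.
adjusted_forecast = affordable_items(2, income_ratio=2.0, price_ratio=1.25)

print(naive_forecast, adjusted_forecast)
```

With no price change the forecast is four dresses, and with the assumed 25% price rise it drops to three, which is why a prediction needs more inputs than a single past value.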
Prescriptive Analysis
Prescriptive Analysis combines the insight from all previous analyses to
determine which action to take on a current problem or decision. Most
data-driven companies use prescriptive analysis because predictive
and descriptive analysis are not enough to improve data performance. Based
on current situations and problems, they analyze the data and make
decisions.
Data Collection
After requirement gathering, you will have a clear idea of what you
have to measure and what your findings should be. Now it's time to collect
your data based on the requirements. Once you collect your data, remember that
the collected data must be processed or organized for analysis. As you
collect data from various sources, you must keep a log with the
collection date and source of the data.
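One simple way to keep such a log is to record the source and date every time a batch of data is collected. The sketch below is only an illustration: `CollectionLogEntry`, `record_collection`, the source names, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class CollectionLogEntry:
    source: str         # where the data came from
    collected_on: date  # when it was collected
    row_count: int      # how many records were received

collection_log = []

def record_collection(source, collected_on, rows):
    """Log the provenance of a batch and return the rows unchanged."""
    collection_log.append(CollectionLogEntry(source, collected_on, len(rows)))
    return rows

# Hypothetical batches from two different sources.
survey_rows = record_collection("customer_survey.csv", date(2024, 1, 15),
                                [{"id": 1}, {"id": 2}])
crm_rows = record_collection("crm_export", date(2024, 1, 16), [{"id": 3}])

for entry in collection_log:
    print(entry.collected_on.isoformat(), entry.source, entry.row_count)
```

Keeping the log next to the data makes it possible to trace any later finding back to its source and collection date.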
Data Cleaning
Whatever data is collected may not all be useful or relevant to your aim of
analysis, hence it should be cleaned. The collected data may contain
duplicate records, white spaces, or errors. The data should be cleaned and
error free. This phase must be done before analysis because, with clean
data, your output of analysis will be closer to your expected outcome.
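The three problems named above (duplicate records, white spaces, and errors) can be handled in a single pass. This is a minimal sketch under stated assumptions: the `raw_records` are hypothetical, and treating a non-numeric amount as an error to be set aside is one possible policy, not the only one.

```python
# Hypothetical raw records, invented for this example.
raw_records = [
    {"email": " Ann@Example.com ", "amount": "120"},
    {"email": "ann@example.com", "amount": "120"},    # duplicate once normalized
    {"email": "bob@example.com", "amount": "abc"},    # erroneous amount
    {"email": "cara@example.com ", "amount": " 75"},
]

def clean(records):
    """Strip white space, reject malformed amounts, and drop duplicates."""
    seen = set()
    cleaned, rejected = [], []
    for record in records:
        email = record["email"].strip().lower()
        amount_text = record["amount"].strip()
        if not amount_text.isdigit():
            rejected.append(record)  # keep errors aside for later review
            continue
        key = (email, amount_text)
        if key in seen:
            continue                 # skip duplicate records
        seen.add(key)
        cleaned.append({"email": email, "amount": int(amount_text)})
    return cleaned, rejected

clean_records, rejected_records = clean(raw_records)
print(clean_records)
print(rejected_records)
```

Rejected rows are kept rather than silently discarded so that the cause of each error can be investigated.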
Data Analysis
Once the data is collected, cleaned, and processed, it is ready for analysis.
As you manipulate data, you may find you have the exact information you
need, or you might need to collect more data. During this phase, you can
use data analysis tools and software that will help you to understand,
interpret, and derive conclusions based on the requirements.
Data Interpretation
After analyzing your data, it's finally time to interpret your results. You can
choose how to express or communicate your data analysis: simply in words,
or in a table or chart. Then use the results of your
data analysis process to decide your best course of action.
Data Visualization
Data visualizations are very common in day-to-day life; they often appear in
the form of charts and graphs. In other words, data is shown graphically so that
it is easier for the human brain to understand and process. Data
visualization is often used to discover unknown facts and trends. By observing
relationships and comparing datasets, you can find
meaningful information.
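Even a plain-text bar chart shows how a graphical form makes comparison easier than a list of numbers. The sketch below uses no plotting library. The `monthly_sales` figures and the `bar_chart` helper are hypothetical and meant only as an illustration.

```python
# Hypothetical monthly sales figures, invented for this example.
monthly_sales = {"Jan": 12, "Feb": 18, "Mar": 9, "Apr": 24}

def bar_chart(values, width=24):
    """Render each value as a row of '#' characters scaled to the largest value."""
    largest = max(values.values())
    lines = []
    for label, value in values.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<4}{bar} {value}")
    return "\n".join(lines)

chart = bar_chart(monthly_sales)
print(chart)
```

The longest and shortest bars stand out immediately, whereas the same comparison takes more effort when reading the raw numbers.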
Summary:
Questions:
Q.7 What is Machine Learning? What is the key difference between Data Science and Machine
Learning?