AI UNIT 1 Data Science
Data science has become one of the most in-demand careers of the 21st
century. Organizations of every size are looking for candidates with
knowledge of data science. This tutorial gives an introduction to data
science, covering data science job roles, tools, components,
applications, etc.
So let's start.
Example:
Suppose we want to travel from station A to station B by car. We need to
make some decisions: which route will get us there fastest, which route
will have no traffic jams, and which will be the most cost-effective. All
these decision factors act as input data, and analyzing them gives us an
appropriate answer. This analysis of data is called data analysis, which
is a part of data science.
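The route decision above can be sketched in a few lines of Python. This is only an illustration: the route names, travel times, costs, and the helper best_route are all hypothetical values invented for the example.

```python
# Hypothetical input data for three routes from station A to station B
routes = [
    {"name": "highway", "minutes": 40, "cost": 12.0, "jam": False},
    {"name": "city", "minutes": 55, "cost": 6.0, "jam": True},
    {"name": "ring road", "minutes": 50, "cost": 8.0, "jam": False},
]


def best_route(routes, key):
    """Return the jam-free route with the smallest value for `key`."""
    candidates = [r for r in routes if not r["jam"]]
    return min(candidates, key=lambda r: r[key])


fastest = best_route(routes, "minutes")
cheapest = best_route(routes, "cost")
print("Fastest jam-free route:", fastest["name"])
print("Cheapest jam-free route:", cheapest["name"])
```

Each decision factor (time, cost, traffic) is just a column of input data, and the "analysis" is the rule that turns those columns into an answer.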
Handling such a huge amount of data is a challenging task for every
organization. To handle, process, and analyze it, we need complex,
powerful, and efficient algorithms and technology, and that technology
came into existence as data science. Following are some of the main
reasons for using data science technology:
o With the help of data science technology, we can convert massive
amounts of raw and unstructured data into meaningful insights.
o Data science technology is being adopted by companies of all kinds,
from big brands to startups. Google, Amazon, Netflix, and others that
handle huge amounts of data use data science algorithms to improve the
customer experience.
o Data science is helping to automate transportation, for example in
building self-driving cars, which are seen as the future of transportation.
o Data science can help with predictions of many kinds, such as surveys,
elections, flight ticket confirmation, etc.
Data science Jobs:
As per various surveys, data scientist is becoming the most in-demand
job of the 21st century because of the growing demand for data science.
Some people also call it "the hottest job title of the 21st century".
Data scientists are experts who use various statistical tools and
machine learning algorithms to understand and analyze data.
The average salary for a data scientist is approximately $95,000 to
$165,000 per annum, and according to various studies, about 11.5 million
jobs will be created by the year 2026.
The main job roles in data science are:
1. Data Scientist
2. Data Analyst
3. Machine learning expert
4. Data engineer
5. Data Architect
6. Data Administrator
7. Business Analyst
8. Business Intelligence Manager
1. Data Analyst:
Skills required: To become a data analyst, you need a good background
in mathematics, business intelligence, and data mining, and a basic
knowledge of statistics. You should also be familiar with computer
languages and tools such as MATLAB, Python, SQL, Hive, Pig, Excel, SAS,
R, JS, Spark, etc.
2. Machine Learning Expert:
A machine learning expert works with the various machine learning
algorithms used in data science, such as regression, clustering,
classification, decision trees, random forests, etc.
3. Data Engineer:
A data engineer works with massive amounts of data and is responsible
for building and maintaining the data architecture of a data science
project. A data engineer also creates the data set processes used in
modelling, mining, acquisition, and verification.
4. Data Scientist:
Business intelligence versus data science:
Data source:
o Business intelligence deals with structured data, e.g., a data
warehouse.
o Data science deals with structured and unstructured data, e.g.,
weblogs, feedback, etc.
Skills:
o Statistics and visualization are the two skills required for business
intelligence.
o Statistics, visualization, and machine learning are the required skills
for data science.
o Regression
o Decision tree
o Clustering
o Principal component analysis
o Support vector machines
o Naive Bayes
o Artificial neural network
o Apriori
Here is a brief introduction to a few of the important algorithms:
1. Y= exec
In the decision tree algorithm, we solve the problem using a tree
representation in which each node represents a feature, each branch
represents a decision, and each leaf represents the outcome.
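The node / branch / leaf structure described above can be sketched as a nested Python dictionary. This is a minimal hand-built illustration, not a learning algorithm: the tree, its features (outlook, humidity, windy), and the function predict are hypothetical names chosen for the example.

```python
# Hand-built tree for a hypothetical "should we play outside?" question.
# Inner nodes are dicts (a feature plus its branches); leaves are plain strings.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": {
            "feature": "windy",
            "branches": {True: "no", False: "yes"},
        },
    },
}


def predict(node, sample):
    """Walk from the root to a leaf, following the branch that matches the sample."""
    while isinstance(node, dict):
        value = sample[node["feature"]]   # look up the feature tested at this node
        node = node["branches"][value]    # follow the branch for that value
    return node                           # the leaf holds the outcome


print(predict(tree, {"outlook": "sunny", "humidity": "high"}))
print(predict(tree, {"outlook": "rainy", "windy": False}))
```

In practice the tree is learned from training data rather than written by hand, but prediction always follows this same root-to-leaf walk.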
If we are given a data set of items with certain features and values, and
we need to categorize those items into groups, then this type of problem
can be solved using the k-means clustering algorithm.
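A minimal sketch of k-means in plain Python may help make the idea concrete. It assumes 2-D points, a fixed number of iterations, and the first k points as starting centroids (a simplification chosen to keep the example deterministic; real implementations usually pick the starting centroids randomly). The sample points are invented for illustration.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    return centroids, clusters


# Hypothetical items described by two numeric features
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

The algorithm alternates between assigning items to the nearest group centre and recomputing each centre, until the groups stop changing or the iteration limit is reached.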
Is this A or B?:
This type of problem has only two fixed answers, such as Yes or No, 1 or
0, may or may not. Such problems can be solved using classification
algorithms.
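As one small illustration of a two-answer problem, the sketch below uses a 1-nearest-neighbour classifier, which is only one of many classification algorithms and is chosen here for brevity. The training data (hours studied, with label 1 for pass and 0 for fail) and the function nearest_neighbor are hypothetical.

```python
def nearest_neighbor(train, x):
    """Return the label of the training example whose feature value is closest to x."""
    closest = min(train, key=lambda example: abs(example[0] - x))
    return closest[1]


# Hypothetical training data: (hours studied, 1 = pass / 0 = fail)
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]

print(nearest_neighbor(train, 2.2))  # closest example is (2.0, 0)
print(nearest_neighbor(train, 7.4))  # closest example is (7.0, 1)
```

Whatever the algorithm, the output is always one of the two fixed labels, which is what makes this a classification problem.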
Is this different? :
If you have a problem that deals with how data is organized, it can be
solved using clustering algorithms.
The main phases of the data science life cycle are given below:
1. Discovery: The first phase is discovery, which involves asking the right
questions. When you start any data science project, you need to
determine the basic requirements, priorities, and project budget. In this
phase, we determine all the requirements of the project, such as the
number of people, technology, time, data, and the end goal, and then we
can frame the business problem at a first hypothesis level.
o Data cleaning
o Data reduction
o Data integration
o Data transformation
After performing all the above tasks, we can easily use this data for our
further processes.
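The four tasks listed above can be sketched on a tiny data set in plain Python. Everything here is an assumption made for illustration: the records, the field names (id, amount, note, city), and the choice of min-max scaling as the transformation.

```python
# Two hypothetical sources describing the same customers
orders = [
    {"id": 1, "amount": 120.0, "note": "x"},
    {"id": 2, "amount": None, "note": "y"},   # missing value
    {"id": 3, "amount": 80.0, "note": "z"},
    {"id": 3, "amount": 80.0, "note": "z"},   # duplicate row
]
cities = {1: "Pune", 2: "Delhi", 3: "Mumbai"}

# Data cleaning: drop rows with a missing amount and exact duplicates
seen, cleaned = set(), []
for row in orders:
    key = (row["id"], row["amount"])
    if row["amount"] is not None and key not in seen:
        seen.add(key)
        cleaned.append(row)

# Data reduction: keep only the fields needed later
reduced = [{"id": r["id"], "amount": r["amount"]} for r in cleaned]

# Data integration: merge in the city from the second source
integrated = [{**r, "city": cities[r["id"]]} for r in reduced]

# Data transformation: min-max scale the amount into the range [0, 1]
low = min(r["amount"] for r in integrated)
high = max(r["amount"] for r in integrated)
prepared = [
    {**r, "amount_scaled": (r["amount"] - low) / (high - low)}
    for r in integrated
]

for row in prepared:
    print(row)
```

On real projects these steps are usually done with a data-frame library, but the operations are the same: remove bad rows, drop unneeded fields, join sources, and rescale values.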