Data Science Notes

Uploaded by

Roboticol
Copyright
© All Rights Reserved

Data Science

Class X 10
WHAT IS PROJECT CYCLE
The Project Cycle is a step-by-step process for solving problems using proven
scientific methods and drawing inferences about them.

COMPONENTS OF PROJECT CYCLE

Components of the project cycle are the steps that contribute to completing
the project. The components of the AI Project Cycle are:

Problem Scoping - Understanding the problem
Data Acquisition - Collecting accurate and reliable data
Data Exploration - Arranging the data uniformly
Modelling - Creating models from the data
Evaluation - Evaluating the project

PROBLEM SCOPING

Problem Scoping refers to understanding a problem, finding out the various
factors which affect the problem, and defining the goal or aim of the project.
SUSTAINABLE DEVELOPMENT GOALS

Sustainable Development: To develop for the present without exploiting the
resources of the future.
17 goals announced by the United Nations.
Aim to achieve them by 2030.
Pledge taken by all the member nations of the UN.
The Sustainable Development Goals (SDGs), also known as the Global Goals,
were adopted by all United Nations Member States in 2015 as a universal call to
action to end poverty, protect the planet and ensure that all people enjoy peace
and prosperity.
4 W'S OF PROBLEM SCOPING

The 4Ws of Problem Scoping are Who, What, Where, and Why. These Ws
help in identifying and understanding the problem in a better and more efficient
manner.

Who - The “Who” part helps us in comprehending and categorizing all those who
are affected directly and indirectly by the problem. They are called
the Stakeholders.

What - The “What” part helps us in understanding and identifying the nature
of the problem. Under this block, you also gather evidence to prove
that the problem you have selected exists.

Where - “Where” does the problem arise: the situation, context, and location.

Why - “Why” is the given problem worth solving?

PROBLEM STATEMENT TEMPLATE

The Problem Statement Template helps us to summarize all the key points into
one single template, so that in the future, whenever there is a need to look
back at the basis of the problem, we can take a look at the Problem Statement
Template and understand its key elements.

Have a look at the Problem Statement Template.


The               Stakeholder                                Who
Have a problem    Issue/Problem                              What
When/While        Context/Situation/Location                 Where
Ideal Solution    How the Solution will help Stakeholders    Why

[Problem Statement Template]
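
The template above can also be filled in with a small program. This is only a sketch: the class name ProblemStatement, the sentence wording in summary(), and the example values are all assumptions made for illustration and are not part of the notes.

```python
from dataclasses import dataclass


@dataclass
class ProblemStatement:
    """One filled-in copy of the 4Ws Problem Statement Template."""
    who: str    # the stakeholders
    what: str   # the issue or problem
    where: str  # the context, situation, or location
    why: str    # how the solution will help the stakeholders

    def summary(self) -> str:
        # Join the four blocks into one sentence frame (wording is assumed).
        return (
            f"The {self.who} have a problem: {self.what} "
            f"when/while {self.where}. "
            f"An ideal solution would {self.why}."
        )


# Hypothetical example values, invented for illustration only.
statement = ProblemStatement(
    who="students of Class X",
    what="they cannot find reliable revision notes",
    where="preparing for exams at home",
    why="give them one trusted set of notes",
)
print(statement.summary())
```

Keeping the four Ws as separate fields makes it easy to look back later and check each key element on its own.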

DATA ACQUISITION
Data Acquisition is the process of collecting accurate and reliable data to work with.
Data sets are of 2 types: training data and testing data.

Data features → Refer to the type of data you want to collect.

Ex: salary amount, increment percentage, increment period, bonus, etc.

Big Data → It includes data with sizes that exceed the
capacity of traditional software to process
within an acceptable time and value.

The main focus is on unstructured types of data.

[Figure: Amount of Data Produced, Types of Data Produced, Speed of Data Produced]
DATA SOURCES

Web Scraping
Web Scraping means collecting data from the web using some technologies.
We use it for monitoring prices, news, etc.
Example: Web scraping using Beautiful Soup in Python.

Sensors
Sensors are very important but very simple to understand.
Sensors are a part of IoT (Internet of Things).
Sensors collect physical data and detect changes.

Cameras
A camera captures visual information, and that information, which is called
an image, is used as a source of data.
Cameras are used to capture raw visual data.

Observations
When we observe something carefully, we get some information.
For ex: Scientists observe creatures to study them.
Observation is a time-consuming data source.

API
Application Programming Interface.
An API is a messenger which takes requests, tells the system about the
requests, and gives the response.
Ex: Twitter API, Google Search API

Surveys
The survey is a method of gathering specific information
from a sample of people.
Example: a census survey for analyzing the population.
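
To make the web scraping source more concrete, here is a minimal sketch. The notes mention Beautiful Soup, but this example uses Python's built-in html.parser module instead so that it runs without installing anything. The page content, the "price" class name, and the PriceParser class are hypothetical, and the page is a hard-coded string rather than a real website.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a website.
PAGE = """
<html><body>
  <span class="price">120</span>
  <span class="name">Notebook</span>
  <span class="price">45</span>
</body></html>
"""


class PriceParser(HTMLParser):
    """Collects the text inside every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a price element.
        if self.in_price and data.strip():
            self.prices.append(int(data.strip()))


parser = PriceParser()
parser.feed(PAGE)
print(parser.prices)
```

The same idea, picking out only the elements you care about from a page, is what a price-monitoring scraper does on a larger scale.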
DATA EXPLORATION

Data Exploration is the process of arranging the gathered data uniformly for a
better understanding. Data can be arranged by building a table, plotting a
chart, or making a database.
To analyse the data, you need to visualise it in some user-friendly format so
that you can:
Quickly get a sense of the trends, relationships and patterns
Define a strategy for which model to use at a later stage
Communicate the same to others effectively
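
As a small illustration of arranging data uniformly, the sketch below prints some records as a table and computes a quick summary. The names and numbers are made up for this example; only the feature names (salary, increment percentage) come from the notes.

```python
from statistics import mean

# Hypothetical gathered data: (name, salary, increment percentage).
records = [
    ("Asha", 30000, 10),
    ("Ravi", 45000, 8),
    ("Meena", 39000, 12),
]

# Arrange the data uniformly as a table with aligned columns.
print(f"{'Name':<8}{'Salary':>8}{'Increment %':>13}")
for name, salary, increment in records:
    print(f"{name:<8}{salary:>8}{increment:>13}")

# A quick summary helps to get a sense of the data before modelling.
salaries = [salary for _, salary, _ in records]
average_salary = mean(salaries)
highest_paid = max(records, key=lambda row: row[1])[0]
print("Average salary:", average_salary)
print("Highest paid:", highest_paid)
```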
DATA VISUALISATION TOOLS

The tools used to visualise the acquired data are known as data visualisation
or exploration tools.
AI MODELLING

AI Modelling → 2 ways/approaches: Rule Based Approach and Learning Based Approach

Modelling is the process in which different models based on the visualized
data can be created and even checked for their advantages and disadvantages.

RULE BASED APPROACH

Rule Based Approach refers to AI modelling where the relationships
or patterns in the data are defined by the developer.
That means the machine works on the rules and information given by the
developer and performs the task accordingly.

Ex: You trained your model with 100 images of apples and bananas. Now if you test
it by showing an apple, it will figure out and tell whether it's an apple or not. Here, labeled
images of apples and bananas were fed, due to which the model could detect the fruit.

*Labeled Images: Simply, images for which the model is told what the image is.
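
The idea that the developer defines the rules can be sketched in a few lines. The features used here (colour and shape) and the rules themselves are assumptions chosen for illustration; the point is only that nothing is learned from data.

```python
def classify_fruit(colour: str, shape: str) -> str:
    """Apply hand-written rules; nothing here is learned from data."""
    if colour == "yellow" and shape == "long":
        return "banana"
    if colour in ("red", "green") and shape == "round":
        return "apple"
    # No rule matches, so the machine cannot decide.
    return "unknown"


print(classify_fruit("red", "round"))
print(classify_fruit("yellow", "long"))
print(classify_fruit("purple", "round"))
```

If a new kind of fruit appears, the developer has to add a new rule by hand, which is the main limitation of this approach.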
Data Sets

A dataset is a collection of related sets of information that is composed of
separate elements but can be manipulated by a computer as a unit.

Training Data – A subset required to train the model
Testing Data – A subset required while testing the trained model
LEARNING BASED APPROACH

The Learning Based Approach is based on the machine learning from its
experience with the data fed to it.

Machine Learning (ML)

Machine learning is a subset of artificial intelligence (AI) that gives
machines the ability to learn automatically and improve from experience
without being explicitly programmed for it.

Types of Machine Learning

3 types of Machine Learning:

Supervised Learning
Unsupervised Learning
Semi-supervised or Reinforcement Learning

Supervised Learning → 2 categories: Classification and Regression

Supervised learning is where a computer algorithm is trained on input data that
has been labeled for a particular output.

→ Classification
Here, data is categorized under different labels
according to some parameters given in the input,
and then the labels are predicted for new data.

Example: To predict which of the fruits is an apple
and which is a banana.
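
A minimal sketch of classification, assuming a 1-nearest-neighbour classifier (one simple technique among many; the notes do not name a specific algorithm). The features (weight and length) and all the numbers are hypothetical.

```python
import math

# Hypothetical labelled training data: (weight in grams, length in cm) -> label.
training_data = [
    ((150, 7), "apple"),
    ((170, 8), "apple"),
    ((120, 18), "banana"),
    ((130, 20), "banana"),
]


def predict(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest = min(training_data, key=lambda item: math.dist(item[0], features))
    return nearest[1]


print(predict((160, 7)))
print(predict((125, 19)))
```

The output is always one of the known labels, which is what makes this classification and not regression.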

→ Regression
Regression is a type of supervised learning
which is used to predict a continuous value.

Example: To predict your next salary, put in the
data of your previous salary, any increments, etc.,
and train the model.
Example: Weather prediction using past data.
Here, the data which has been fed to the machine is continuous.
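
The salary example can be sketched with a straight-line fit (ordinary least squares). This is only one possible regression method, and the year and salary figures below are invented for illustration.

```python
# Hypothetical past data: year number and the salary in that year.
years = [1, 2, 3, 4]
salaries = [30000, 33000, 36500, 39500]

mean_x = sum(years) / len(years)
mean_y = sum(salaries) / len(salaries)

# Ordinary least squares for a straight line: salary = slope * year + intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, salaries)) / sum(
    (x - mean_x) ** 2 for x in years
)
intercept = mean_y - slope * mean_x

# Predict a continuous value for a year that is not in the data.
next_salary = slope * 5 + intercept
print(f"Predicted salary for year 5: {next_salary:.0f}")
```

Unlike classification, the prediction here can be any number, not just one of a fixed set of labels.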
Unsupervised Learning

In terms of machine learning, unsupervised learning is where a system learns
from data sets on its own. In this, the training data is not labeled.
Important Points:
An unsupervised learning model works on an unlabelled dataset.
This means that the data which is fed to the machine is random and there
is a possibility that the person who is training the model does not have
any information regarding it.
Unsupervised learning models are used to identify relationships,
patterns and trends in the data which is fed into them.
It helps the user in understanding:
What the data is about
What the major features identified by the machine are
Example: Suppose a boy sees someone performing tricks with a ball, and he
learns the tricks by himself. This is what we call unsupervised learning.

→ Clustering
It is an algorithm which can cluster unknown
data according to the patterns or trends identified
in it.
The patterns observed may already be known to the
developer, or they may be unique.

Note: Classification ≈ Division, Clustering ≈ Grouping
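
A minimal sketch of clustering, assuming the k-means method with two clusters (the notes do not name a specific algorithm). The data values and the choice of starting centres are assumptions for illustration. Note that no labels are given anywhere.

```python
# Hypothetical unlabelled data, for example the heights of some plants in cm.
values = [10, 12, 11, 50, 52, 49]

# Start with two cluster centres at the smallest and largest value.
centres = [min(values), max(values)]

for _ in range(10):  # a few rounds are enough for this tiny data set
    clusters = [[], []]
    for v in values:
        # Put each value in the cluster whose centre is closest.
        index = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
        clusters[index].append(v)
    # Move each centre to the average of the values assigned to it.
    centres = [sum(c) / len(c) for c in clusters]

print(clusters)
print(centres)
```

The machine finds the two groups by itself; it is up to the user to decide what each group means.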


→ Dimensionality Reduction
We can visualize up to 3 dimensions only.
To reduce the dimensions and still be able to make
sense of the data, we use Dimensionality Reduction.
The ball in our hand is 3-dimensional. But if we click
its picture, the data transforms to 2-D.
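
The ball and picture idea can be sketched as dropping one coordinate. This is the simplest possible reduction and is only an assumption for illustration; real dimensionality reduction methods choose more carefully which information to keep. The points are hypothetical.

```python
# Hypothetical 3-D points on a ball: (x, y, z) coordinates.
points_3d = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.6, 0.0, 0.8)]


def project_to_2d(point):
    """Drop the depth (z) coordinate, keeping only x and y."""
    x, y, _z = point
    return (x, y)


points_2d = [project_to_2d(p) for p in points_3d]
print(points_2d)
```

Each point now has 2 values instead of 3, so some information (the depth) is lost, just as in a photograph.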


Reinforcement Learning

Learning through feedback or the trial and error method is called Reinforcement
Learning.
The system works on a Reward or Penalty policy. In this, an agent performs an
action, positive or negative, in the environment. The action is taken as input by the
system, the system then changes the state of the environment, and the agent is
provided with a reward or penalty.

Example: A very good example of
this is a vending machine.
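
The reward or penalty idea can be sketched with a tiny agent that has two possible actions. Everything here is an assumption for illustration: the action names, the reward values, the learning rate, and the simple "try everything once, then pick the best" strategy.

```python
# Hypothetical environment: the reward (+1) or penalty (-1) for each action.
REWARDS = {"left": -1, "right": +1}

# The agent's current estimate of how good each action is.
values = {"left": 0.0, "right": 0.0}
tried = set()
learning_rate = 0.5

for step in range(10):
    untried = [a for a in values if a not in tried]
    if untried:
        action = untried[0]                   # explore: try every action once
    else:
        action = max(values, key=values.get)  # exploit: pick the best so far
    tried.add(action)
    reward = REWARDS[action]                  # feedback from the environment
    # Nudge the estimate for this action towards the reward received.
    values[action] += learning_rate * (reward - values[action])

best_action = max(values, key=values.get)
print(values, best_action)
```

After one penalty for "left", the agent keeps choosing "right", which is learning by trial and error.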
Training vs Testing Data

Base    Training Set                            Testing Set
Use     Used for training the model             Used for testing the model after it is trained
Size    Is a lot bigger than the testing data   Is smaller than the training set
        and constitutes about 70% to 80%        and constitutes about 20% to 30%
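
The split between training and testing data can be sketched as follows. The data set is hypothetical, and the 80/20 ratio is one choice within the 70% to 80% range given above.

```python
import random

# Hypothetical data set of 10 labelled examples.
dataset = [(i, "apple" if i % 2 == 0 else "banana") for i in range(10)]

# Shuffle a copy so the split does not depend on the original order.
shuffled = dataset[:]
random.Random(42).shuffle(shuffled)  # fixed seed so the result is repeatable

split_index = int(len(shuffled) * 0.8)  # 80% for training, 20% for testing
training_set = shuffled[:split_index]
testing_set = shuffled[split_index:]

print(len(training_set), len(testing_set))
```

The testing set is kept aside and never used for training, so that it can show how the model behaves on data it has not seen.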

EVALUATION

Evaluation is the process of understanding the reliability of an AI model by
feeding the test data into the model and comparing its outputs with the actual answers.
There can be different evaluation techniques, depending on the type and
purpose of the model.
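
Comparing outputs with actual answers can be sketched with accuracy, which is one evaluation technique among several. The lists of actual and predicted labels below are made up for illustration.

```python
# Hypothetical test data: the actual answers and what a model predicted.
actual = ["apple", "banana", "apple", "apple", "banana"]
predicted = ["apple", "banana", "banana", "apple", "banana"]

# Count how many predictions match the actual answers.
correct = sum(1 for a, p in zip(actual, predicted) if a == p)
accuracy = correct / len(actual)

print(f"Accuracy: {accuracy:.0%}")
```

Here 4 out of 5 predictions match, so the accuracy is 80%.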
