5-Supervised and Unsupervised
Background:
Unlabeled data is data that is either taken from nature or created by humans, and
that is collected to explore the patterns behind it. Examples of unlabeled data
include photos, audio recordings, videos, news articles, tweets, and x-rays. The
key point is that there is no explanation, label, tag, class, or name attached to
the features in the data.
Labeled data is unlabeled data to which a description, label, or name has been
attached. For example, in a labeled image dataset, each image is labeled as a photo of
a cat or a photo of a dog. Labels are often obtained by asking humans to make judgments
about a given piece of unlabeled data (e.g., "Does this photo contain a horse or a cow?")
and are significantly more expensive to obtain than the raw unlabeled data. Once a
labeled dataset is available, a machine learning model can be trained on it, so that
when new unlabeled data is presented to the model, it can guess or predict a likely
label for that data.
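The idea of learning from labeled data and then predicting a label for new unlabeled
data can be sketched in a few lines of Python. This is only a minimal illustration
using a nearest-neighbour rule; the feature values (height and weight), the dataset,
and the function name predict_label are all hypothetical and chosen for the example.

```python
import math

# Hypothetical labeled dataset: (height_cm, weight_kg) paired with a label.
labeled_data = [
    ((25.0, 4.0), "cat"),
    ((23.0, 3.5), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
]


def predict_label(features, training_data):
    """Return the label of the training example closest to `features`."""
    nearest = min(training_data, key=lambda pair: math.dist(pair[0], features))
    return nearest[1]


# New, unlabeled examples are given a likely label by the model.
print(predict_label((24.0, 4.2), labeled_data))   # prints "cat"
print(predict_label((58.0, 28.0), labeled_data))  # prints "dog"
```

The labels in labeled_data play the role of the human judgments described above; the
prediction is only as good as those labels.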
1.2.1 Regression:
Example: You can use regression to predict the price of a house from training data.
The input variables might be the locality, the size of the house, etc.
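As a rough sketch of the house price example, the code below fits a straight line by
least squares using a single input variable, the size of the house. The sizes, prices,
and function names are made up for illustration; a real model would normally use more
inputs (such as locality) and real training data.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(size, slope, intercept):
    """Predict a price for a house of the given size."""
    return slope * size + intercept


# Hypothetical training data: size in square metres, price in thousands.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [110.0, 170.0, 210.0, 250.0]

slope, intercept = fit_line(sizes, prices)
print(predict_price(90.0, slope, intercept))
```

Unlike classification, the output here is a continuous number (a price) instead of a
category.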
1.2.2 Classification:
Example:
Unsupervised learning algorithms allow you to perform more complex processing tasks
than supervised learning. However, unsupervised learning can be more unpredictable
than other learning methods.
1.5.2 Association
Association rules allow you to establish associations among data objects inside large
databases. This unsupervised technique is about discovering interesting relationships
between variables in large databases. For example, people who buy a new home are
likely to buy new furniture as well.
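One common way to quantify such a rule is with its support (how often the items occur
together) and its confidence (how often the second item occurs when the first does).
The sketch below computes both for the rule "new home -> new furniture" on a small,
invented list of purchases; the data and function names are assumptions for
illustration only.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for transaction in transactions if itemset <= transaction)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate how often `consequent` appears when `antecedent` appears."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical purchase records, one set of items per customer.
transactions = [
    {"new home", "new furniture"},
    {"new home", "new furniture", "tv"},
    {"new home"},
    {"new furniture"},
    {"tv"},
]

print(support({"new home", "new furniture"}, transactions))
print(confidence({"new home"}, {"new furniture"}, transactions))
```

In this toy data, two of the three customers who bought a new home also bought new
furniture, so the confidence of the rule is about 0.67.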
Example:
Clustering: A clustering problem is one where you want to discover the inherent
groupings in the data.
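To make "inherent groupings" concrete, here is a minimal k-means sketch on
one-dimensional data. K-means is just one possible clustering method and is not
named in the text above; the data points, the starting centroids, and the function
name k_means_1d are assumptions chosen for illustration.

```python
def k_means_1d(points, initial_centroids, iterations=10):
    """Group 1-D points around centroids with a basic k-means loop."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(point - centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled measurements with two visible groups.
points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, clusters = k_means_1d(points, initial_centroids=[1.0, 11.0])
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups are found from the values alone.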
Main Drawback:
Supervised learning: Classifying big data can be a real challenge.
Unsupervised learning: You cannot get precise information about how the data is
sorted, and the output is not known in advance, because the data used in
unsupervised learning is unlabeled.
Further study:
https://www.youtube.com/watch?v=cfj6yaYE86U
https://www.youtube.com/watch?v=kE5QZ8G_78c