Module -1 Lecture-1
•Supervised learning
•Reinforcement learning
•Unsupervised learning
•Machine learning enables a computer system to make predictions
or decisions from historical data without being explicitly
programmed for the task.
•Machine learning uses large amounts of structured and
semi-structured data so that a model can produce accurate
results or predictions based on that data.
•Machine learning relies on algorithms that learn on their own from
historical data.
•Machine learning is used in many places, such as online
recommender systems, Google search algorithms, email spam
filters, and Facebook's automatic friend-tagging suggestions.
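As a minimal sketch of "learning from historical data without being explicitly programmed", the example below fits a straight line to a few past observations and uses it to predict a new value. The data (`sizes`, `prices`) and the helper names `fit_line` and `predict` are hypothetical, and ordinary least squares is just one simple learning algorithm chosen for illustration.

```python
# Hypothetical historical data: house size (m^2) and price (in thousands).
sizes = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 210.0, 270.0, 330.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": the rule relating size to price is estimated from the data,
# not written by hand.
slope, intercept = fit_line(sizes, prices)


def predict(size):
    """Predict a price for a size that was not in the historical data."""
    return slope * size + intercept


predicted_price = predict(100.0)
print(predicted_price)
```

No price rule appears anywhere in the code; the slope and intercept come entirely from the historical examples.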
Key differences between Artificial Intelligence (AI) and Machine learning (ML):
How to get datasets for Machine Learning
•A key to succeeding in machine learning, or to becoming a strong
data scientist, is practicing with different types of datasets.
However, finding a suitable dataset for each kind of machine
learning project is difficult.
What is a dataset?
•A dataset is a collection of data arranged in some order. It can
hold anything from a simple array to a database table.
•A tabular dataset can be understood as a database table or
matrix, where each column corresponds to a particular
variable, and each row corresponds to one record of the
dataset. The most widely supported file type for a tabular
dataset is the comma-separated values (CSV) file. For tree-like
(nested) data, a JSON file is usually a better fit.
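The difference between tabular and tree-like data can be sketched with Python's standard `csv` and `json` modules. The column names and values below are made up for illustration, and the data is read from in-memory strings rather than real files.

```python
import csv
import io
import json

# Tabular data: each line is one record (row), each column one variable.
csv_text = "size_m2,bedrooms,price\n50,1,150\n70,2,210\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
column_names = list(rows[0].keys())
# CSV values are read as strings, so numeric columns must be converted.
prices = [float(row["price"]) for row in rows]

# Tree-like data: nesting is natural in JSON but awkward in a flat table.
json_text = """
{"house": {"address": {"city": "Pune"},
           "rooms": [{"name": "kitchen"}, {"name": "bedroom"}]}}
"""
tree = json.loads(json_text)
room_names = [room["name"] for room in tree["house"]["rooms"]]

print(column_names, prices, room_names)
```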
Types of data in datasets
•Numerical data: Such as house price, temperature, etc.
•Categorical data: Such as Yes/No, True/False, Blue/Green, etc.
•Ordinal data: Similar to categorical data, but the categories have a
meaningful order and can be compared (for example, low/medium/high).
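The three data types are usually turned into numbers in different ways before modelling. The sketch below is one simple possibility, using hypothetical records and field names (`price`, `has_garden`, `condition`): numerical values are kept as they are, a two-valued category becomes 0/1, and ordinal levels are mapped to ranks that preserve their order.

```python
# Hypothetical records mixing the three kinds of data.
records = [
    {"price": 150.0, "has_garden": "Yes", "condition": "poor"},
    {"price": 210.0, "has_garden": "No", "condition": "good"},
    {"price": 180.0, "has_garden": "Yes", "condition": "fair"},
]

# Ordinal data: the order of the levels matters, so it is stated explicitly.
CONDITION_ORDER = ["poor", "fair", "good"]
condition_rank = {level: rank for rank, level in enumerate(CONDITION_ORDER)}


def encode(record):
    """Convert one record into a list of numbers."""
    return [
        record["price"],                            # numerical: used as-is
        1 if record["has_garden"] == "Yes" else 0,  # categorical: 0/1 flag
        condition_rank[record["condition"]],        # ordinal: rank in order
    ]


encoded = [encode(record) for record in records]
print(encoded)
```

A category with more than two unordered values (such as colour) would typically need a different scheme, for example one 0/1 column per value, because ranks would imply an order that does not exist.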
Need for a Dataset
•Machine learning projects need a large amount of data, because
ML/AI models cannot be trained without it. Collecting and preparing the
dataset is one of the most crucial parts of creating an ML/AI project.
•The technology behind an ML project cannot work properly if the dataset
is not well prepared and pre-processed.
•During the development of an ML project, developers rely completely on the
datasets. In building ML applications, datasets are divided into two parts:
•Training dataset: used to fit the model.
•Test dataset: held back to evaluate the model on data it has not seen.
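Dividing a dataset into training and test parts can be sketched as below. The helper `split_dataset`, the 25% test fraction, and the fixed seed are assumptions for illustration, not values from the lecture. Libraries such as scikit-learn offer a ready-made equivalent (`train_test_split` in `sklearn.model_selection`).

```python
import random


def split_dataset(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and split them into (train, test) lists."""
    shuffled = list(samples)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 20 samples, identified by index.
samples = list(range(20))
train_set, test_set = split_dataset(samples, test_fraction=0.25, seed=42)

print(len(train_set), len(test_set))
```

Shuffling before splitting avoids a test set that only contains, say, the most recent or the last-sorted records.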
Popular sources for Machine Learning datasets
1. Kaggle Datasets
2. UCI Machine Learning Repository
3. Datasets via AWS
4. Google's Dataset Search Engine
5. Microsoft Datasets
6. Awesome Public Dataset Collection
7. Government Datasets
8. Computer Vision Datasets
9. Scikit-learn dataset
Data Preprocessing in Machine learning
Data preprocessing is the process of preparing raw data and making it suitable for a
machine learning model. It is the first, and a crucial, step in creating a machine learning
model.
Why do we need Data Preprocessing?
Real-world data generally contains noise and missing values, and may be in a format
that cannot be used directly by machine learning models. Data preprocessing is the
set of tasks required to clean the data and make it suitable for a machine learning
model, which also increases the accuracy and efficiency of the model.
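Two common cleaning tasks can be sketched in plain Python: filling in missing values and rescaling a column. Mean imputation and min-max scaling are chosen here only as simple examples and are not prescribed by the lecture; the column `raw_ages` and the helper names are hypothetical.

```python
# Hypothetical raw column with one missing value (None).
raw_ages = [25.0, None, 35.0, 40.0]


def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the smallest is 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


filled = impute_mean(raw_ages)
scaled = min_max_scale(filled)
print(filled)
print(scaled)
```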
Data preprocessing involves the following steps:
What is Deep Learning?
•Artificial Intelligence is the concept of creating
intelligent machines.
•Machine Learning is a subset of artificial intelligence
that helps you build AI-driven applications.
•Deep Learning is a subset of machine learning that uses
vast volumes of data and complex algorithms to train a
model.
Difference between Supervised and Unsupervised Learning