
UNIT 1

INTRODUCTION
SYLLABUS
➢ Why Machine Learning?
➢ Why Python?
➢ Essential libraries and tools
➢ Experiment: Basics of Python libraries

MACHINE LEARNING
➢ Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to
“self-learn” from training data and improve over time, without being explicitly programmed.
➢ Machine learning is about extracting knowledge from data.

WHY MACHINE LEARNING?

Problems Machine Learning Can Solve


Supervised Learning
➢ It takes a known set of input data (the training set) and the known responses to that data (the
output), and forms a model to generate reasonable predictions for the response to new
input data.
Examples of Supervised Machine Learning tasks:
1. Identifying the zip code from handwritten digits on an envelope.
➢ Here the input is a scan of the handwriting, and the desired output is the actual digits in
the zip code.
➢ To create a dataset for building an ML model, collect many envelopes, then read the zip
codes and store the digits as the desired outcomes.
2. Determining whether a tumor is benign based on a medical image
➢ Here the input is the image, and the output is whether the tumor is benign.
➢ To create a dataset for building a model, a database of medical images is needed.
➢ An expert opinion is needed, so a doctor needs to look at all of the images and decide
which tumors are benign and which are not.
➢ It might even be necessary to do additional diagnosis beyond the content of the image to
determine whether the tumor in the image is cancerous or not.
3. Detecting fraudulent activity in credit card transactions
➢ Input is a record of the credit card transaction, and the output is whether it is likely to be
fraudulent or not.
➢ Collecting a dataset means storing all transactions and recording if a user reports any
transaction as fraudulent.
Unsupervised Learning
➢ It is a type of machine learning in which models are trained on an unlabeled dataset and are
allowed to act on that data without any supervision.
Examples of unsupervised learning include:
1. Identifying topics in a set of blog posts
➢ Given a large collection of text data, we may want to summarize it and find the prevalent themes
in it. The topics are not known beforehand, and there may not even be a clue about how many
topics there are. Therefore, there are no known outputs.
2. Segmenting customers into groups with similar preferences
➢ Given a set of customer records, it is required to identify which customers are similar, and
whether there are groups of customers with similar preferences. There is no information about
the groups.
3. Detecting abnormal access patterns to a website
➢ To identify abuse or bugs, it is often helpful to find access patterns that are different from the
norm. Each abnormal pattern might be very different, and there might not be any recorded
instances of abnormal behavior.

Representation of input data that a computer can understand


➢ The data should be thought of as a table.
➢ Each data point is a row, and each property that describes that data point is a column.
➢ Each entity or row is known as a Sample (or data point) in machine learning, while the
columns, the properties that describe these entities, are called Features.
➢ Building a good representation of the data is called Feature Extraction or Feature
Engineering.
➢ No machine learning algorithm will be able to make a prediction on data for which it has no
information.
➢ Ex: If the only feature available for a patient is their last name, no algorithm will be able to
predict their gender.
Adding another feature that contains the patient’s first name will give much better luck, as it
is often possible to tell the gender from a person’s first name.

Knowing the Task and Knowing the Data


➢ Important part in the machine learning process is understanding the data and how it relates to
the task.
➢ It will not be effective to randomly choose an algorithm and throw data at it.
➢ It is necessary to understand what is going on in dataset before building a model.
➢ Each algorithm is different in terms of what kind of data and what problem setting it works
best for.
➢ While building a machine learning solution, answer the following questions:
1. What question(s) am I trying to answer?
2. Do I think the data collected can answer that question?
3. What is the best way to phrase my question(s) as a machine learning problem?
4. Have I collected enough data to represent the problem I want to solve?
5. What features of the data did I extract, and will these enable the right predictions?
6. How will I measure success in my application?
7. How will the machine learning solution interact with other parts of my research or
business product?

WHY PYTHON?
➢ Python combines the power of general-purpose programming languages with the ease of use
of domain-specific scripting languages like MATLAB or R.
➢ Python has libraries for data loading, visualization, statistics, natural language processing,
image processing, and more.
➢ It provides data scientists with a large array of general- and special-purpose functionality.
➢ An advantage of using Python is the ability to interact directly with the code, using a terminal
or other tools like the Jupyter Notebook.
➢ Machine learning and data analysis are iterative processes, in which the data drives the
analysis. It is essential to have tools that allow quick iteration and easy interaction.
➢ As a general-purpose programming language, Python also allows for the creation of complex
graphical user interfaces (GUIs) and web services, and for integration into existing systems.
scikit-learn
➢ scikit-learn is an open source project, it is free to use and distribute, and source code is easily
available. The scikit-learn project is constantly being developed and improved.
➢ It has a very active user community.
➢ It contains a number of state-of-the-art machine learning algorithms, as well as
comprehensive documentation about each algorithm.
➢ It is a very popular tool, and the most prominent Python library for machine learning.
➢ It is widely used in industry and academia, and tutorials and code snippets are available online.
➢ It works well with a number of other scientific Python tools.
➢ scikit-learn depends on two other Python packages, NumPy and SciPy.
➢ For plotting and interactive development, install matplotlib, IPython, and the Jupyter
Notebook.
➢ It is recommended to use one of the following prepackaged Python distributions, which will
provide the necessary packages:
1. Anaconda
2. Enthought Canopy
3. Python(x,y)
1. Anaconda
➢ A Python distribution made for large-scale data processing, predictive analytics, and
scientific computing.
➢ Anaconda comes with NumPy, SciPy, matplotlib, pandas, IPython, Jupyter Notebook, and
scikit-learn.
➢ Available on Mac OS, Windows, and Linux.
➢ It is a very convenient solution for users who do not have an existing installation of the
scientific Python packages.
➢ Anaconda includes the commercial Intel MKL library for free.
➢ MKL can give significant speed improvements for many algorithms in scikit-learn.
2. Enthought Canopy
➢ Python distribution for scientific computing.
➢ This comes with NumPy, SciPy, matplotlib, pandas, and IPython, but the free version does
not come with scikit-learn.
➢ Users at an academic, degree-granting institution can request an academic license and get free
access to the paid subscription version of Enthought Canopy.
➢ Enthought Canopy is available for Python 2.7.x, and works on Mac OS, Windows, and Linux.
3. Python(x,y)
➢ A free Python distribution for scientific computing, specifically for Windows.
➢ Python(x,y) comes with NumPy, SciPy, matplotlib, pandas, IPython, and scikit-learn.

ESSENTIAL LIBRARIES AND TOOLS


➢ scikit-learn is built on top of the NumPy and SciPy scientific Python libraries.
➢ In addition to NumPy and SciPy, pandas and matplotlib can be used.
➢ The Jupyter Notebook is a browser-based interactive programming environment.

Jupyter Notebook
➢ It is an interactive environment for running code in the browser.
➢ It is a great tool for exploratory data analysis and is widely used by data scientists.
➢ It supports many programming languages; here, only the Python support is needed.
➢ The Jupyter Notebook makes it easy to incorporate code, text, and images.

NumPy
➢ It is one of the fundamental packages for scientific computing in Python.
➢ It contains functionality for multidimensional arrays, high-level mathematical functions such
as linear algebra operations and the Fourier transform, and pseudorandom number generators.
➢ In scikit-learn, the NumPy array is the fundamental data structure.
➢ scikit-learn takes in data in the form of NumPy arrays.
➢ Any data you want to use has to be converted to a NumPy array.
➢ The core functionality of NumPy is the ndarray class, a multidimensional (n-dimensional)
array.
➢ All elements of the array must be of the same type.

➢ A NumPy array looks like this:
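The original figure here showed a small example array; a minimal sketch along those lines:

import numpy as np

# Create a 2x3 array of integers
x = np.array([[1, 2, 3], [4, 5, 6]])
print("x:\n{}".format(x))
# x:
# [[1 2 3]
#  [4 5 6]]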

➢ Objects of the NumPy ndarray class are referred to as “NumPy arrays” or just “arrays”.
SciPy
➢ SciPy is a collection of functions for scientific computing in Python.
➢ It provides advanced linear algebra routines, mathematical function optimization, signal
processing, special mathematical functions, and statistical distributions.
➢ The most important part of SciPy here is scipy.sparse: this provides sparse matrices, which are
another representation used for data in scikit-learn.
➢ Sparse matrices are used to store a 2D array that contains mostly zeros:
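A minimal sketch of such a matrix, assuming the usual scipy.sparse API: create a dense identity matrix and convert it to SciPy’s CSR sparse format:

import numpy as np
from scipy import sparse

# Create a 2D NumPy array with a diagonal of ones, and zeros everywhere else
eye = np.eye(4)
# Convert the NumPy array to a SciPy sparse matrix in CSR format;
# only the nonzero entries are stored
sparse_matrix = sparse.csr_matrix(eye)
print("SciPy sparse CSR matrix:\n{}".format(sparse_matrix))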

➢ Usually it is not possible to create dense representations of sparse data (they would not fit
into memory), so sparse representations need to be created directly.
➢ To create the same matrix using the COO format:
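A sketch of the same idea in COO format, where the matrix is built directly from the coordinates and values of its nonzero entries:

import numpy as np
from scipy import sparse

# Coordinates and values of the nonzero entries of a 4x4 identity matrix
data = np.ones(4)
row_indices = np.arange(4)
col_indices = np.arange(4)
eye_coo = sparse.coo_matrix((data, (row_indices, col_indices)))
print("COO representation:\n{}".format(eye_coo))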
matplotlib
➢ It is the primary scientific plotting library in Python.
➢ It provides functions for making publication-quality visualizations such as line charts,
histograms, scatter plots, and so on.
➢ Visualizing data and different aspects of analysis can give important insights.
➢ When working inside the Jupyter Notebook, figures can be shown directly in the browser by
using the %matplotlib notebook and %matplotlib inline commands.
➢ Using %matplotlib notebook provides an interactive environment.
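A minimal plotting sketch (inside Jupyter, one of the %matplotlib commands above would be run first):

import numpy as np
import matplotlib.pyplot as plt

# Generate a sequence of 100 numbers from -10 to 10
x = np.linspace(-10, 10, 100)
# Compute the sine of each value
y = np.sin(x)
# plot makes a line chart of one array against the other
plt.plot(x, y, marker="x")
plt.show()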

pandas
➢ pandas is a Python library for data wrangling and analysis.
➢ It is built around a data structure called the DataFrame that is modeled after the R
DataFrame.
➢ A pandas DataFrame is a table, similar to an Excel spreadsheet.
➢ pandas provides a great range of methods to modify and operate on this table; it allows SQL-
like queries and joins of tables.
➢ pandas allows each column to have a separate type (for example, integers, dates, floating-
point numbers, and strings).
➢ pandas can ingest data from a great variety of file formats and databases, like SQL databases,
Excel files, and comma-separated values (CSV) files.
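A minimal sketch of building a DataFrame and running a SQL-like selection on it (the names and values are purely illustrative):

import pandas as pd

# A simple dataset of people, as a dictionary of columns
data = {"Name": ["John", "Anna", "Peter", "Linda"],
        "Location": ["New York", "Paris", "Berlin", "London"],
        "Age": [24, 13, 53, 33]}
data_pandas = pd.DataFrame(data)
print(data_pandas)

# SQL-like query: select all rows whose Age column is greater than 30
print(data_pandas[data_pandas.Age > 30])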

mglearn
➢ mglearn is a library of utility (helper) functions written to accompany the textbook examples;
it is used mainly for plotting and for loading example datasets.

Application: Classifying Iris Species


Assume that a hobby botanist is interested in distinguishing the species of some iris flowers that she
has found. She has collected some measurements associated with each iris: the length and width of the
petals and the length and width of the sepals, all measured in centimeters. She also has the
measurements of some irises that have been previously identified by an expert botanist as belonging
to the species setosa, versicolor, or virginica. For these measurements, she can be certain of which
species each iris belongs to. Let’s assume that these are the only species the hobby botanist will
encounter in the wild. The goal is to build a machine learning model that can learn from the
measurements of these irises whose species is known, so that it can predict the species for a new iris.
Meet the Data:
➢ It is included in scikit-learn in the datasets module.
➢ It can be loaded by calling the load_iris function:
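A minimal sketch of the call (iris_dataset is just a variable name chosen here):

from sklearn.datasets import load_iris

iris_dataset = load_iris()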

➢ It contains keys and values:
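The returned object behaves much like a dictionary, so its keys can be listed (a sketch; the exact set of keys depends on the scikit-learn version):

print("Keys of iris_dataset:\n{}".format(iris_dataset.keys()))
# Typically includes 'data', 'target', 'target_names', 'DESCR', and 'feature_names'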

➢ The value of the key DESCR is a short description of the dataset.
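A sketch of printing the start of that description:

# Print the first part of the dataset description
print(iris_dataset['DESCR'][:200] + "\n...")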


➢ The value of the key target_names is an array of strings, containing the species of flower that
is to be predicted:
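A sketch:

print("Target names: {}".format(iris_dataset['target_names']))
# Target names: ['setosa' 'versicolor' 'virginica']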

➢ The value of feature_names is a list of strings, giving the description of each feature:
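A sketch:

print("Feature names:\n{}".format(iris_dataset['feature_names']))
# ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']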

➢ The data itself is contained in the target and data fields. data contains the numeric
measurements of sepal length, sepal width, petal length, and petal width in a NumPy array:

➢ The rows in the data array correspond to flowers, while the columns represent the four
measurements that were taken for each flower:

➢ Here are the feature values for the first four samples:
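A sketch inspecting the data array (its shape is 150 samples by 4 features):

print("Type of data: {}".format(type(iris_dataset['data'])))
# <class 'numpy.ndarray'>
print("Shape of data: {}".format(iris_dataset['data'].shape))
# Shape of data: (150, 4)
print("First four rows of data:\n{}".format(iris_dataset['data'][:4]))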

➢ The target array contains the species of each of the flowers that were measured, also as a
NumPy array:

➢ target is a one-dimensional array, with one entry per flower:


➢ The species are encoded as integers from 0 to 2:
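A sketch inspecting the target array:

print("Type of target: {}".format(type(iris_dataset['target'])))
# <class 'numpy.ndarray'>
print("Shape of target: {}".format(iris_dataset['target'].shape))
# Shape of target: (150,)
print("Target:\n{}".format(iris_dataset['target']))
# The meanings of the integers are given by target_names:
# 0 = setosa, 1 = versicolor, 2 = virginica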

Measuring Success: Training and Testing Data:


➢ To measure how well a model generalizes, the labeled data is split into two parts: a training set
used to build the model, and a test set held back to evaluate it.
➢ Call train_test_split on the data and assign the outputs using the standard nomenclature
(X_train, X_test, y_train, y_test):
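A sketch of the split; by default train_test_split puts 75% of the samples into the training set and 25% into the test set:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

print("X_train shape: {}".format(X_train.shape))  # (112, 4)
print("X_test shape: {}".format(X_test.shape))    # (38, 4)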

Steps involved in Machine Learning modelling:


1. Collecting Data:
➢ Machines initially learn from the data. Collect reliable data so that the machine learning
model can find the correct patterns.
➢ The quality of the data that is fed to the machine will determine how accurate the model
is.
➢ If the data is incorrect or outdated, the model gives wrong outcomes or predictions which
are not relevant.
➢ Data should be from a reliable source, as it will directly affect the outcome of the model.
➢ Good data is relevant, contains very few missing and repeated values, and has a good
representation of the various subcategories/classes present.
2. Preparing the Data:
➢ Putting together all the data and randomizing it, so that the data is evenly distributed and the
ordering does not affect the learning process.
➢ Cleaning the data: removing unwanted data, handling missing values, dropping or fixing rows
and columns, removing duplicate values, converting data types, and so on. It may also be
necessary to restructure the dataset, changing the rows and columns or their index.
➢ Visualize the data to understand how it is structured and understand the relationship
between various variables and classes present.
➢ Splitting the cleaned data into two sets: a training set and a testing set. The training set is
the set the model is trained on; the testing set is used to check the accuracy of the model after
training.
3. Choosing a Model:
➢ It is important to choose a model which is relevant to the task. Also check whether the
model is suitable for numerical or categorical data and choose accordingly.
4. Training the Model:
➢ Pass the prepared data to the machine learning model to find patterns and make
predictions. The model learns from the data and accomplishes the task it was set. Over time,
with training, the model gets better at predicting.
5. Evaluating the Model:
➢ Test the performance of the model on previously unseen data: the testing set. If testing were
done on the same data the model was trained on, it would not give an accurate measure,
because the model is already used to that data and finds the same patterns in it as it
previously did; this gives a disproportionately high accuracy.
➢ Evaluating on the testing data gives an accurate measure of how the model will perform,
and of its speed.
6. Parameter Tuning:
➢ Accuracy can be improved by tuning the parameters present in the model.
➢ Parameters here are the variables in the model that the programmer generally sets by hand
(often called hyperparameters).
➢ At particular values of these parameters, the accuracy will be the maximum. Parameter
tuning refers to finding these values.
7. Making Predictions
➢ Finally, use the model on new, unseen data to make predictions accurately.
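To tie the steps together, a minimal end-to-end sketch on the iris data; the choice of a k-nearest neighbors model here is an assumption made for illustration:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Steps 1-2: collect and prepare the data (load iris, split into train/test sets)
iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

# Step 3: choose a model - a k-nearest neighbors classifier with k=1 (an assumed choice)
knn = KNeighborsClassifier(n_neighbors=1)

# Step 4: train the model on the training set
knn.fit(X_train, y_train)

# Step 5: evaluate the model on the held-out test set
print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))

# Step 6: parameter tuning would repeat steps 4-5 for different values of n_neighbors

# Step 7: make a prediction for a new, unseen iris (measurements are illustrative)
X_new = np.array([[5.0, 2.9, 1.0, 0.2]])
prediction = knn.predict(X_new)
print("Predicted species: {}".format(iris_dataset['target_names'][prediction][0]))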
