Welcome to Ducat India
www.ducatindia.com
Data Science Using Scikit-Learn
Several Python libraries offer solid implementations of a range of machine learning algorithms. One of the best known is
Scikit-Learn, a package that provides implementations of a large number of standard algorithms. Scikit-Learn is
characterized by a clean, uniform, and streamlined API, as well as by useful and complete online documentation.
Data Representation in Scikit-Learn
Machine learning is about building models from data. For that reason, we will start by discussing how data can be
represented so that the computer can learn from it. The best way to think about data within Scikit-Learn is in terms
of tables of data.
Data as table
A basic table is a two-dimensional grid of data, in which the rows represent individual elements of the dataset, and the
columns represent quantities associated with each of these elements. For example, consider the Iris dataset,
famously analyzed by Ronald Fisher in 1936. We can download this dataset in the form of a Pandas DataFrame using
the Seaborn library:
In[1]: import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()
Out[1]:    sepal_length  sepal_width  petal_length  petal_width species
        0           5.1          3.5           1.4          0.2  setosa
        1           4.9          3.0           1.4          0.2  setosa
        2           4.7          3.2           1.3          0.2  setosa
        3           4.6          3.1           1.5          0.2  setosa
        4           5.0          3.6           1.4          0.2  setosa
Here, each row of the data refers to a single observed flower, and the number of rows is the total number of
flowers in the dataset. In general, we will refer to the rows of the matrix as samples and the number of rows as
n_samples.
Each column of the data refers to a particular quantitative piece of information that describes each sample. In general,
we will refer to the columns of the matrix as features, and the number of columns as n_features.
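As a small illustration of these two terms, the sketch below rebuilds only the five rows shown above by hand (so it needs no download) and reads n_samples and n_features from the shape of the measurement columns. The variable names iris_head and measurements are chosen here for illustration and are not part of the original example.

```python
import pandas as pd

# The five rows displayed above, typed in by hand (not the full 150-row dataset).
iris_head = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

# Rows are samples; the four measurement columns are features.
measurements = iris_head.drop(columns="species")
n_samples, n_features = measurements.shape
print(n_samples, n_features)  # 5 4
```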
Features matrix
This table layout makes clear that the information can be thought of as a two-dimensional numerical array or
matrix, which we will call the features matrix. By convention, this features matrix is often stored in a variable named
X.
The features matrix is assumed to be two-dimensional, with shape [n_samples, n_features], and is most often contained
in a NumPy array or a Pandas DataFrame, though some Scikit-Learn models also accept SciPy sparse matrices. The
samples (i.e., rows) always refer to the individual objects described by the dataset.
For example, the sample can be a flower, a person, a document, an image, a sound file, a video, an astronomical
object, or anything else we can describe with a set of quantitative measurements. The features (i.e., columns) always
refer to the distinct observations that describe each sample in a quantitative manner. Features are generally
real-valued but can be Boolean or discrete-valued in some cases.
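To make the three container types concrete, here is a minimal sketch that stores the same hypothetical toy data (three samples, two features) as a NumPy array, a Pandas DataFrame, and a SciPy sparse matrix. The values and variable names are invented for illustration. All three report the same [n_samples, n_features] shape.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical toy data: 3 samples (rows), each described by 2 features (columns).
X_array = np.array([[5.1, 3.5],
                    [4.9, 3.0],
                    [4.7, 3.2]])
X_frame = pd.DataFrame(X_array, columns=["sepal_length", "sepal_width"])
X_sparse = sparse.csr_matrix(X_array)

# Each container exposes the same two-dimensional shape.
for X in (X_array, X_frame, X_sparse):
    print(type(X).__name__, X.shape)
```

A sparse representation is mainly worthwhile when most entries are zero, which is not the case for this toy data. It appears here only to show that the shape convention is the same.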
Target array
In addition to the features matrix X, we also generally work with a label or target array, which by convention we will
usually call y. The target array is usually one-dimensional, with length n_samples, and is generally contained in a
NumPy array or Pandas Series.
The target array can have continuous numerical values or discrete classes/labels. While some Scikit-Learn estimators
do handle multiple target values in the form of a two-dimensional [n_samples, n_targets] target array, we will
generally be working with the typical case of a one-dimensional target array.
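The difference between a one-dimensional and a two-dimensional target can be sketched with LinearRegression, one of the Scikit-Learn estimators that accepts both. The data below is randomly generated for illustration only, and the names y_single and y_multi are assumptions of this sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))               # features matrix: [n_samples, n_features]

# One-dimensional target of length n_samples (the typical case).
y_single = X @ np.array([1.0, -2.0, 0.5])

# Two-dimensional target of shape [n_samples, n_targets] with two targets.
y_multi = np.column_stack([y_single, X[:, 0] + X[:, 1]])

single = LinearRegression().fit(X, y_single)
multi = LinearRegression().fit(X, y_multi)

print(single.predict(X).shape)  # (20,)
print(multi.predict(X).shape)   # (20, 2)
```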
For example, in the preceding data, we may wish to build a model that can predict the species of the flower
based on the other measurements; in this case, the species column would be considered the target array.
In[3]: X_iris = iris.drop('species', axis=1)
X_iris.shape
Out[3]: (150, 4)
In[4]: y_iris = iris['species']
y_iris.shape
Out[4]: (150,)
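If downloading through Seaborn is not an option, a similar split can be obtained from the copy of the Iris data bundled with Scikit-Learn. This is a swapped-in alternative, not the method used above: the column names differ (for example "sepal length (cm)"), and the target holds integer class codes rather than species names.

```python
from sklearn.datasets import load_iris

# Bundled copy of the Iris data; as_frame=True returns a DataFrame and a Series.
X_iris, y_iris = load_iris(return_X_y=True, as_frame=True)

print(X_iris.shape)  # (150, 4)
print(y_iris.shape)  # (150,)
```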
Basics of the API
Most commonly, the steps in using the Scikit-Learn estimator API are as follows:
1. Choose a class of model by importing the appropriate estimator class from Scikit-Learn.
2. Choose model hyperparameters by instantiating this class with desired values.
3. Arrange the data into a features matrix and target vector following the discussion from before.
4. Fit the model to our data by calling the fit() method of the model instance.
5. Apply the model to new data:
   For supervised learning, we often predict labels for new data using the predict() method.
   For unsupervised learning, we often transform or infer properties of the data using the transform() or predict()
   method.
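The steps above can be sketched end to end as follows. The choice of GaussianNB for the supervised case, PCA for the unsupervised case, the bundled load_iris data, and the train/test split are all assumptions made for this illustration. Any other estimator follows the same pattern.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB   # choose a class of model

# Arrange the data into a features matrix X and a target vector y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Supervised learning: instantiate, fit, then predict labels for new data.
model = GaussianNB()                 # default hyperparameters
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(y_pred.shape == y_test.shape)  # True

# Unsupervised learning: instantiate with a hyperparameter, fit, then transform.
pca = PCA(n_components=2)            # hyperparameter chosen at instantiation
pca.fit(X)                           # no target array is needed here
X_2d = pca.transform(X)
print(X_2d.shape)                    # (150, 2)
```

Holding out part of the data with train_test_split is not one of the listed steps. It is included only so that predict() is applied to samples the model has not seen during fit().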
Read More: https://tutorials.ducatindia.com/data-science/data-science-using-scikit/