
Transform Input Data Array Using Scikit-Learn Pipelining Tools
Scikit-learn, commonly known as sklearn, is a Python library for implementing machine learning algorithms. It is open-source, so it can be used free of cost.
It provides a wide variety of tools for statistical modelling, including classification, regression, clustering, and dimensionality reduction, through a consistent and stable Python interface.
This library is built on the NumPy, SciPy and Matplotlib libraries.
It can be installed using the ‘pip’ command as shown below:
pip install scikit-learn
This library focuses on data modelling.
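Several processing steps can be chained into a single object using the ‘Pipeline’ class, which applies each transformer in turn and then fits a final estimator. A transformer inside the pipeline can change the dimensions of the array it receives: in the example below, ‘PolynomialFeatures’ expands a single input column into four polynomial feature columns before the regression is fitted.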
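As a minimal sketch of the change in dimensions, the snippet below applies ‘PolynomialFeatures’ on its own, outside any pipeline. The variable names ‘X’ and ‘X_poly’ are chosen here for illustration only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.arange(6)          # 1-D array, shape (6,)
X = x[:, np.newaxis]      # one feature column, shape (6, 1)

# Degree-3 expansion of a single feature yields the columns 1, x, x^2, x^3
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)

print(X.shape)            # (6, 1)
print(X_poly.shape)       # (6, 4)
print(X_poly[2])          # row for x = 2: [1. 2. 4. 8.]
```

The input must be two-dimensional (samples by features), which is why the 1-D array is given an extra axis before it is transformed.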
Following is an example of a complete pipeline:
Example
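from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import numpy as np

print("Creating object of the tool pipeline")
# Step 1 expands x into the columns 1, x, x^2, x^3; step 2 fits a linear model on them.
# fit_intercept=False because PolynomialFeatures already adds the constant column.
Stream_model = Pipeline([('poly', PolynomialFeatures(degree=3)),
                         ('linear', LinearRegression(fit_intercept=False))])

x = np.arange(6)
print("The size of the original ndarray is")
print(x.shape)

# Target values; the x ** 3.5 term cannot be matched exactly by a cubic polynomial
y = 4 - 2 * x + x ** 2 - x ** 3.5

# scikit-learn expects a 2-D input, so x is reshaped into a single column
Stream_model = Stream_model.fit(x[:, np.newaxis], y)
print("Input polynomial coefficients are")
print(Stream_model.named_steps['linear'].coef_)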
Output
Creating object of the tool pipeline
The size of the original ndarray is
(6,)
Input polynomial coefficients are
[ 4.31339202 -7.82933051 7.96372751 -3.39570215]
Explanation
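The required packages are imported; NumPy is given the alias ‘np’ for ease of use.
The ‘Pipeline’ class is used to chain two steps: ‘PolynomialFeatures’ with degree 3, followed by ‘LinearRegression’.
‘LinearRegression’ is created with fit_intercept=False, because ‘PolynomialFeatures’ already adds a constant column that plays the role of the intercept.
The values for data points ‘x’ and ‘y’ are generated using the NumPy library.
The shape of ‘x’ is displayed on the console.
‘x’ is reshaped into a single column with np.newaxis, and the pipeline is fit to the data.
The fitted coefficients of the ‘linear’ step are displayed on the console, ordered from the constant term up to the cubic term. Because ‘y’ contains an x ** 3.5 term, which a cubic polynomial cannot represent exactly, they differ from the coefficients used to generate ‘y’.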
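To make the role of the pipeline more concrete, the sketch below fits the same two steps by hand and compares the result with the pipeline, then uses the fitted pipeline to predict on new inputs. The names ‘model’, ‘manual’ and ‘X_new’, and the new input values 6 and 7, are illustrative assumptions and not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

x = np.arange(6)
X = x[:, np.newaxis]                      # shape (6, 1)
y = 4 - 2 * x + x ** 2 - x ** 3.5

# Pipeline version: transform and fit in a single call
model = Pipeline([
    ("poly", PolynomialFeatures(degree=3)),
    ("linear", LinearRegression(fit_intercept=False)),
]).fit(X, y)

# Manual version: run the transformer first, then fit the estimator on its output
X_poly = PolynomialFeatures(degree=3).fit_transform(X)
manual = LinearRegression(fit_intercept=False).fit(X_poly, y)

# Both approaches should give the same coefficients
same_coef = np.allclose(model.named_steps["linear"].coef_, manual.coef_)
print(same_coef)

# predict() pushes new data through the same transformation before the regression
X_new = np.array([[6], [7]])              # hypothetical new inputs
preds = model.predict(X_new)
print(preds.shape)
```

The benefit of the pipeline is that the transformation does not have to be repeated manually at prediction time: calling predict() on the pipeline applies every step in order.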