How To Convert Sklearn Dataset To Pandas Dataframe In Python
Last Updated: 27 Jun, 2022
In this article, we look at how to convert a sklearn dataset to a pandas DataFrame in Python.
Sklearn (scikit-learn) and pandas are Python libraries that are widely used for data science and machine learning. Pandas focuses on data processing, manipulation, cleaning, and visualization, whereas sklearn provides a large collection of tools and functions for training machine learning models.
Example 1: Convert Sklearn Dataset (iris) To Pandas Dataframe
Here we import the load_iris() function from sklearn.datasets. Calling load_iris() returns the iris dataset, which we store in a variable named iris_data. This variable has the type sklearn.utils._bunch.Bunch, a dictionary-like container that is also importable as sklearn.utils.Bunch. The iris_data object has several attributes: data, target, frame, target_names, DESCR, feature_names, filename, and data_module. We will use the data and feature_names attributes. The data attribute holds the complete feature matrix of the iris dataset (one row per sample, one column per feature), and the feature_names attribute holds the list of column names for that matrix.
Python3
# Import pandas to use the DataFrame class
import pandas as pd

# Import the iris loader from sklearn
from sklearn.datasets import load_iris

# Load the iris dataset by calling the function
iris_data = load_iris()

# Place the iris data in a pandas DataFrame
df = pd.DataFrame(data=iris_data.data,
                  columns=iris_data.feature_names)

# Display the first 5 records of the DataFrame
# (print() is needed when running as a script rather than in a notebook)
print(df.head())
Output:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
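The DataFrame above contains only the feature columns. As a minimal sketch (not part of the original example), the target and target_names attributes listed earlier can be added as extra columns. The column names "target" and "species" are illustrative choices, not anything defined by sklearn.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()

# Build the feature DataFrame exactly as in Example 1
df = pd.DataFrame(data=iris_data.data,
                  columns=iris_data.feature_names)

# Add the integer class labels (0, 1, 2); "target" is an illustrative column name
df["target"] = iris_data.target

# Translate each integer label to its class name by indexing target_names;
# "species" is also an illustrative column name
df["species"] = iris_data.target_names[iris_data.target]

print(df.shape)
print(df["species"].unique())
```

Keeping the labels next to the features makes it easier to group or filter rows by class using ordinary pandas operations.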
Example 2: Convert Sklearn Dataset (diabetes) To Pandas Dataframe
In this example, we create a function named convert_to_dataframe() that converts a sklearn dataset to a pandas DataFrame. The function takes one parameter, sk_data, which is the sklearn dataset (a Bunch object), and returns the dataset's features as a pandas DataFrame. If the argument is not a Bunch, the function raises an exception. We use sklearn's diabetes dataset in this example.
Python3
# Import pandas to use the DataFrame class
import pandas as pd

# Import the diabetes loader and the Bunch container type
from sklearn.datasets import load_diabetes
from sklearn.utils import Bunch


# Create the `convert_to_dataframe()` function
def convert_to_dataframe(sk_data: Bunch) -> pd.DataFrame:
    # Reject anything that is not a sklearn Bunch
    if not isinstance(sk_data, Bunch):
        raise TypeError("Not a sklearn dataset")
    return pd.DataFrame(data=sk_data.data,
                        columns=sk_data.feature_names)


# Load the diabetes data and convert it using
# the `convert_to_dataframe()` function
diabetes_data = convert_to_dataframe(sk_data=load_diabetes())

# Display the first 5 records of the DataFrame
print(diabetes_data.head())
Output:
        age       sex       bmi        bp        s1        s2        s3  \
0  0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163  0.074412
2  0.085299  0.050680  0.044451 -0.005670 -0.045599 -0.034194 -0.032356
3 -0.089063 -0.044642 -0.011595 -0.036656  0.012191  0.024991 -0.036038
4  0.005383 -0.044642 -0.036385  0.021872  0.003935  0.015596  0.008142

         s4        s5        s6
0 -0.002592  0.019907 -0.017646
1 -0.039493 -0.068332 -0.092204
2 -0.002592  0.002861 -0.025930
3  0.034309  0.022688 -0.009362
4 -0.002592 -0.031988 -0.046641
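As an alternative to building the DataFrame by hand, the built-in loaders accept an as_frame=True argument. The sketch below assumes scikit-learn 0.23 or later, where this argument is available. It is not part of the original example. With as_frame=True, the data attribute is already a DataFrame and the frame attribute holds the features together with the target column.

```python
from sklearn.datasets import load_diabetes

# Assumes scikit-learn >= 0.23, where the loaders accept as_frame=True
diabetes = load_diabetes(as_frame=True)

# With as_frame=True, `data` is a pandas DataFrame of the features
features = diabetes.data

# `frame` is a DataFrame with the features plus a "target" column
full = diabetes.frame

print(type(features).__name__)
print(full.shape)
print(full.columns.tolist())
```

This avoids passing the data matrix and feature_names to pd.DataFrame yourself, while a helper such as convert_to_dataframe() remains useful when you already hold a Bunch that was loaded without as_frame=True.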