Randomly Select Columns from Pandas DataFrame
Last Updated: 28 Mar, 2022
In this article, we will discuss how to randomly select columns from a Pandas DataFrame.
The pandas DataFrame.sample() method randomly selects rows or columns, depending on the axis it is given, so it can be used to pick columns at random in whatever number or proportion we need.
Syntax of pandas sample() method:
Return a random selection of elements from an object's axis. For repeatability, you may use the random_state parameter.
DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
Parameters:
- n: int value, the number of items (rows or columns, depending on axis) to return. Defaults to 1 when frac is also not given.
- frac: float value, the fraction of items along the axis to return. frac cannot be used together with n.
- replace: Boolean value, samples with replacement if True, so the same item can be selected more than once.
- weights: str or array-like, optional. The default None gives every item equal probability.
- random_state: int value or numpy.random.RandomState, optional. If set to a particular integer, the same selection is returned on every call.
- axis: 0 or 'index' for rows and 1 or 'columns' for columns.
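As a minimal sketch of how these parameters fit together, the snippet below samples two columns from a small in-memory DataFrame. The data and column names are made up for illustration and are not part of the article's dataset. Passing the same integer random_state should give the same selection each time, and axis=1 is treated the same as axis='columns'.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "c": [7, 8, 9],
    "d": [10, 11, 12],
})

# Same seed and same n on the column axis: the selection is repeatable.
first = df.sample(n=2, axis="columns", random_state=42)
second = df.sample(n=2, axis=1, random_state=42)

print(list(first.columns))
print(first.equals(second))
```

All rows are kept; only the set of columns changes.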
Method 1: Select a single column at random
In this approach, we first import the pandas package and read the CSV file with pd.read_csv(). The df.sample() method can select either rows or columns at random; passing axis='columns' tells it to select columns. When n isn't specified, the method returns one random column by default.
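A self-contained sketch of the same idea, using a hypothetical three-column DataFrame instead of the CSV file. It shows that the result is still a DataFrame with a single column, and one possible way to get the column name or a Series out of it.

```python
import pandas as pd

# Hypothetical toy data standing in for the CSV file.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6]})

# n is not given, so one column is selected.
one_col = df.sample(axis="columns")

# The result is a one-column DataFrame, not a Series.
col_name = one_col.columns[0]
as_series = one_col.squeeze(axis="columns")

print(type(one_col).__name__, one_col.shape)
print(col_name)
```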
The examples below assume the dataset is saved locally as fossilfuels.csv.
Python3
# import packages
import pandas as pd
# reading csv file
df = pd.read_csv('fossilfuels.csv')
pd.set_option('display.max_columns', None)
print(df.head())
# randomly selecting columns
df = df.sample(axis='columns')
print(df)
Output:

Method 2: Select a given number of columns at random
If we want to select more than one column, we use the parameter n. In the example below, we set n to 5, which randomly selects 5 columns from the DataFrame.
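The following sketch uses a hypothetical eight-column DataFrame (the names col0 to col7 are invented for illustration). It draws 5 distinct columns and then, as an optional extra step that the article does not cover, puts the selected columns back into their original left-to-right order.

```python
import pandas as pd

# Hypothetical data: 8 columns named col0..col7, 4 rows each.
df = pd.DataFrame({f"col{i}": range(4) for i in range(8)})

# Draw 5 distinct columns; random_state is only here to make reruns repeatable.
picked = df.sample(n=5, axis="columns", random_state=0)

# Sampled columns come back in sampled order; this restores the original order.
ordered = df[[c for c in df.columns if c in picked.columns]]

print(list(picked.columns))
print(list(ordered.columns))
```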
Python3
# import packages
import pandas as pd
# reading csv file
df = pd.read_csv('fossilfuels.csv')
pd.set_option('display.max_columns', None)
print(df.head())
print()
# randomly selecting columns
df = df.sample(n=5, axis='columns')
print(df.head())
Output:

Method 3: Allow a random selection of the same column more than once (by setting replace=True)
In this approach, the same column is allowed to be selected more than once, which is known as sampling with replacement. To enable it, set the replace parameter to True in the df.sample() method. In the original run of the example below, the column 'Bunkerfields' was selected twice.
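One practical consequence of replace=True is that more columns can be requested than the DataFrame actually has. The sketch below, on a hypothetical three-column DataFrame, draws 5 columns with replacement and then shows that the same request without replacement raises a ValueError.

```python
import pandas as pd

# Hypothetical toy data with only three columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# With replacement, 5 draws from 3 columns must repeat at least one column.
with_repl = df.sample(n=5, axis="columns", replace=True, random_state=1)
print(list(with_repl.columns))

# Without replacement, asking for more columns than exist is an error.
try:
    df.sample(n=5, axis="columns")
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

Repeated columns keep their original label, so the result can contain duplicate column names.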
Python3
# import packages
import pandas as pd
# reading csv file
df = pd.read_csv('fossilfuels.csv')
pd.set_option('display.max_columns', None)
print(df.head())
print()
# randomly selecting columns
df = df.sample(n=5, axis='columns', replace=True)
print(df.head())
Output:

Method 4: Select a portion of the total number of columns at random
In this approach, if we want to select a fraction of the columns rather than a fixed count, we use the frac parameter. In the example below, the dataset has 10 columns. 0.25 of 10 is 2.5, which is rounded to 2, so two columns are returned. In the original run, these were the Year and GasFlaring columns.
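The arithmetic above can be checked on a hypothetical ten-column DataFrame (the names c0 to c9 are invented). This sketch assumes, as current pandas versions appear to do, that the column count is obtained by applying Python's round() to frac times the number of columns, which rounds 2.5 down to 2. It also shows that passing n and frac together is rejected.

```python
import pandas as pd

# Hypothetical data: 10 columns named c0..c9, 2 rows each.
df = pd.DataFrame({f"c{i}": [0, 1] for i in range(10)})

# 0.25 * 10 = 2.5, rounded to 2 columns; 0.5 * 10 = 5 columns.
quarter = df.sample(frac=0.25, axis="columns", random_state=7)
half = df.sample(frac=0.5, axis="columns", random_state=7)
print(quarter.shape[1], half.shape[1])

# n and frac cannot be combined.
try:
    df.sample(n=2, frac=0.25, axis="columns")
    both_rejected = False
except ValueError:
    both_rejected = True
print(both_rejected)
```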
Python3
# import packages
import pandas as pd
# reading csv file
df = pd.read_csv('fossilfuels.csv')
pd.set_option('display.max_columns', None)
print(df.head())
print()
# randomly selecting columns
df = df.sample(frac=0.25, axis='columns')
print(df.head())
Output:
