
2-Introduction To Data Cleaning P02

The document provides an introduction to various DataFrame methods used for data cleaning in Python. It includes functions for summarizing data, counting non-null values, retrieving column names, and calculating cumulative sums, among others. Each method is briefly described, highlighting its purpose and functionality.

Uploaded by

mymopop

Introduction to Data Cleaning

1. df.info()
Displays a summary of the DataFrame, including the number of non-null
values per column, data types, and memory usage.

2. df.count()
Counts the number of non-null (non-missing) values in each column.
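
As a minimal sketch of the two methods above, the example below builds a small DataFrame and inspects it. The column names and values are invented for illustration. Note that info() prints its summary rather than returning it, so the sketch captures the output through the buf parameter.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data with a few missing values
df = pd.DataFrame({
    "name": ["Ana", "Ben", None, "Dee"],
    "age": [23, np.nan, 31, 45],
    "city": ["Lima", "Oslo", "Rome", None],
})

# info() writes its summary to stdout by default; redirect it to a buffer
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# count() returns a Series with one non-null count per column
non_null = df.count()
print(non_null)
```

Comparing count() with the total number of rows is a quick way to spot which columns contain missing values.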

3. df.columns
Returns an Index object holding the column names of the DataFrame (use
df.columns.tolist() to get a plain list).

4. df.cumsum()
Computes the cumulative sum for each numerical column.
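
A short sketch of column names and cumulative sums, using made-up sales figures. The column names "units" and "revenue" are assumptions chosen only for this example.

```python
import pandas as pd

# Hypothetical data: three days of sales
df = pd.DataFrame({"units": [1, 2, 3], "revenue": [10.0, 20.0, 30.0]})

# tolist() converts the column labels to a plain Python list
column_names = df.columns.tolist()
print(column_names)

# cumsum() keeps the original shape; each cell holds the running total
# of its column from the first row down to that row
running = df.cumsum()
print(running)
```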

5. df.nsmallest(n, column_name)
Returns the n rows with the smallest values in the specified column,
sorted in ascending order.

6. df.nlargest(n, column_name)
Returns the n rows with the largest values in the specified column,
sorted in descending order.
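
The following sketch shows both methods on an invented price table. The "product" and "price" columns are hypothetical.

```python
import pandas as pd

# Hypothetical product prices
df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "price": [30, 10, 40, 20],
})

# Two cheapest products, lowest price first
cheapest = df.nsmallest(2, "price")
print(cheapest)

# Two most expensive products, highest price first
priciest = df.nlargest(2, "price")
print(priciest)
```

Both calls return whole rows, so the other columns come along with the ranked column.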

7. df.sum()
Computes the sum of values in each column. Pass numeric_only=True to
limit the result to numerical columns.
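
A minimal sketch of column totals on a mixed-type DataFrame. The data is invented, and numeric_only=True is passed explicitly so the text column is skipped.

```python
import pandas as pd

# Hypothetical order lines with one text column and two numeric columns
df = pd.DataFrame({
    "item": ["x", "y"],
    "qty": [2, 3],
    "cost": [1.5, 2.5],
})

# Sum only the numeric columns; "item" is left out of the result
totals = df.sum(numeric_only=True)
print(totals)
```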

8. df.idxmax()
Returns the index (row label) of the maximum value in each column.

9. df.idxmin()
Returns the index (row label) of the minimum value in each column.
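
To illustrate, the sketch below uses a DataFrame with string row labels so the returned labels are easy to read. The weather data is made up.

```python
import pandas as pd

# Hypothetical daily readings, indexed by weekday
df = pd.DataFrame(
    {"temp": [21.0, 35.0, 18.0], "rain": [5.0, 0.0, 12.0]},
    index=["mon", "tue", "wed"],
)

# Row label of the largest value in each column
hottest = df.idxmax()
print(hottest)

# Row label of the smallest value in each column
coldest = df.idxmin()
print(coldest)
```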

10. df.shape
Returns the number of rows and columns in the DataFrame as a tuple.
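
Because shape is a tuple, it can be unpacked directly, as in this small sketch with invented data.

```python
import pandas as pd

# Hypothetical DataFrame with 3 rows and 2 columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape is an attribute, not a method, so no parentheses are used
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```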

11. df.head(n=5)
Returns the first n rows of the DataFrame (default is 5).

12. df.tail(n=5)
Returns the last n rows of the DataFrame (default is 5).

13. df.sample(frac=0.5)
Returns a random sample containing the fraction frac of the rows (here
0.5, i.e. 50%).
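
The three row-selection methods above can be sketched as follows. The data is invented, and random_state is an optional parameter added here only to make the sample reproducible.

```python
import pandas as pd

# Hypothetical DataFrame with ten rows
df = pd.DataFrame({"value": range(10)})

first_three = df.head(3)   # rows 0, 1, 2
last_three = df.tail(3)    # rows 7, 8, 9

# Half of the rows, chosen at random; random_state fixes the seed
half = df.sample(frac=0.5, random_state=0)
print(half)
```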

14. df.index
Returns an Index object representing the row labels of the DataFrame.
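
A brief sketch of reading the row labels and using one of them to look up a row. The names and scores are hypothetical.

```python
import pandas as pd

# Hypothetical scores with custom row labels
df = pd.DataFrame({"score": [88, 92]}, index=["alice", "bob"])

# The Index object can be iterated or converted to a list
labels = df.index
print(list(labels))

# Row labels are what .loc uses to select rows
row = df.loc["bob"]
print(row["score"])
```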

15. df.value_counts()
Counts the number of occurrences of each unique value. Called on a whole
DataFrame, it counts unique rows (combinations of column values); for a
single column, use df['column_name'].value_counts().
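
A minimal sketch of counting values in a single column. The "color" column and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical categorical column
df = pd.DataFrame({"color": ["red", "blue", "red", "green", "red", "blue"]})

# Result is a Series indexed by the unique values,
# sorted from most to least frequent
counts = df["color"].value_counts()
print(counts)
```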

16. df.isnull().sum()
Counts the number of missing (NaN) values in each column.
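
As a sketch of how this chained call works: isnull() produces a DataFrame of True/False flags, and sum() then adds them up per column, counting each True as 1. The data below is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in two of the three columns
df = pd.DataFrame({
    "a": [1, np.nan, 3],
    "b": [np.nan, np.nan, "x"],
    "c": [1, 2, 3],
})

# True marks a missing cell; summing the flags counts them per column
missing = df.isnull().sum()
print(missing)
```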
