GR Py 14
SRN No 31242529
Roll No 110
Division E
Assignment No 14
Project Based Learning-Python
Assignment - 14
Title/Problem Statement:
Write a program to create a DataFrame from a dictionary of lists. Use methods like head(), tail(), info(), and
describe() to explore and summarize the DataFrame.
Description:
In this exercise, you will write a program to create a DataFrame using a dictionary of lists in Python's
pandas library. You will utilize methods such as `head()` to view the first few rows,
`tail()` to see the last few rows, `info()` to get a summary of the DataFrame’s structure and data types, and
`describe()` to generate descriptive statistics. This exercise helps in understanding DataFrame creation and
basic data exploration techniques.
Theory:
Creating a DataFrame from a dictionary of lists is a common task in data analysis using the pandas library
in Python. A DataFrame is a two-dimensional labeled data structure with columns of potentially different
types, similar to a table in a database or an Excel spreadsheet. Here’s a step-by-step guide to achieve this:
1. Importing pandas: First, you need to import the pandas library, which provides the
DataFrame structure.
import pandas as pd
2. Creating a Dictionary of Lists: Construct a dictionary where the keys are column names and the
values are lists representing the data for each column.
3. Creating the DataFrame: Use the pandas `DataFrame` constructor to create a DataFrame from
the dictionary.
df = pd.DataFrame(data)
4. Exploring the DataFrame: Utilize various methods to explore and summarize the DataFrame:
head(): Displays the first few rows of the DataFrame (default is 5).
print(df.head())
tail(): Displays the last few rows of the DataFrame (default is 5).
print(df.tail())
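info(): Prints a concise summary of the DataFrame, including the index dtype
and column dtypes, non-null counts, and memory usage. It prints directly and returns None, so call it without print().
df.info()
describe(): Generates descriptive statistics (count, mean, std, min, quartiles, max) for numeric columns.
print(df.describe())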
These methods are fundamental for initial data exploration, helping you understand the structure and basic
statistics of the DataFrame. This process is essential in the data analysis pipeline, providing insights and
guiding further data cleaning, processing, and analysis steps.
By following these steps, you can efficiently create, explore, and summarize data using pandas.
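The steps above can be sketched as follows. This is a minimal illustration, not the original assignment's program: the column names follow the report, but the values in `data` are hypothetical sample values chosen only for demonstration.

```python
import pandas as pd

# Hypothetical sample data (the values are assumptions, not from the assignment).
data = {
    "Name": ["Asha", "Ben", "Chitra", "Dev", "Esha", "Farid"],
    "Age": [25, 32, 47, 51, 38, 29],
    "Salary": [50000, 64000, 82000, 91000, 73000, 58000],
}

# Each key becomes a column; each list supplies that column's values.
df = pd.DataFrame(data)

first_rows = df.head()   # first 5 rows by default
last_rows = df.tail(2)   # an explicit count overrides the default of 5

print(first_rows)
print(last_rows)
```

All lists in the dictionary must have the same length; otherwise the constructor raises a ValueError.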
Experimental Setup / Experimental Outcome:
# Write a program to create a DataFrame from a dictionary of lists. Use methods like head(), tail(),
# info(), and describe() to explore and summarize the DataFrame.
import pandas as pd
df = pd.DataFrame(data)
OUTPUT:
Explanation:
1. Creating the Dictionary: The data dictionary contains three keys ('Name', 'Age', 'Salary'),
each mapped to a list of values.
2. Creating the DataFrame: The pd.DataFrame(data) function converts the dictionary into a
DataFrame.
3. Exploring the DataFrame:
o head(): Shows the first 5 rows of the DataFrame.
o tail(): Shows the last 5 rows of the DataFrame.
o info(): Provides a summary of the DataFrame, including column data types and non-null
counts.
o describe(): Computes and displays descriptive statistics for numeric columns, such as
count, mean, standard deviation, minimum, maximum, and quartiles.
This program provides a comprehensive way to initialize a DataFrame and analyze its basic structure and
statistics, making it easier to understand the dataset's characteristics.
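As a small illustration of what describe() reports, the sketch below uses hypothetical values chosen so the statistics are easy to check by hand. It also shows that the non-numeric 'Name' column is skipped by default, and that the range is not reported directly but can be derived from the min and max rows.

```python
import pandas as pd

# Hypothetical values (assumptions for illustration only).
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chitra", "Dev"],
    "Age": [20, 30, 40, 50],
    "Salary": [40000, 50000, 60000, 70000],
})

# By default describe() summarizes only the numeric columns.
stats = df.describe()
print(stats)

# The range is not a row in the result; compute it from max and min.
age_range = stats.loc["max", "Age"] - stats.loc["min", "Age"]
print("Age range:", age_range)
```

The std row is the sample standard deviation (divisor n - 1), which is why it is about 12.91 for Age here rather than the population value.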
1. Creating a DataFrame: The program starts by defining a dictionary called data with keys representing
column names ('Name', 'Age', and 'Salary') and values as lists containing the data for each column. Then,
pd.DataFrame(data) creates a DataFrame from this dictionary.
2. Displaying the First Few Rows: Using df.head() shows the first 5 rows of the DataFrame, giving a quick
look at the initial entries.
3. Displaying the Last Few Rows: Using df.tail() shows the last 5 rows, useful for checking the final
entries in the dataset.
4. Displaying DataFrame Information: df.info() displays metadata, such as column names, data types,
number of non-null entries, and memory usage. This is helpful for understanding the structure and format
of the data.
5. Displaying Basic Statistics: df.describe() calculates summary statistics (count, mean, std, min, max, etc.)
for numeric columns ('Age' and 'Salary' in this case), providing a quick statistical overview of the data's
distribution.
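To make the role of info() concrete, the sketch below uses a small hypothetical DataFrame with one missing salary (the values are assumptions for illustration). It shows that info() writes its summary to an output stream and returns None, and that the non-null count reveals the missing entry.

```python
import io
import pandas as pd

# Hypothetical data with one missing Salary value (assumption for illustration).
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chitra"],
    "Age": [25, 32, 47],
    "Salary": [50000.0, None, 82000.0],
})

# info() prints its summary; pass buf= to capture the text instead.
buffer = io.StringIO()
result = df.info(buf=buffer)   # result is None
summary = buffer.getvalue()
print(summary)

# Count missing values per column as a cross-check on the non-null counts.
missing_salary = int(df["Salary"].isna().sum())
print("Missing salaries:", missing_salary)
```

Because the return value is None, wrapping the call in print() adds a stray "None" line after the summary; calling df.info() on its own avoids that.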
Conclusion:
In conclusion, creating a DataFrame from a dictionary of lists and utilizing methods like `head()`, `tail()`,
`info()`, and `describe()` provides a powerful way to manage and analyze data in Python using the pandas library.
By constructing a DataFrame, you can effectively organize data into a structured format, and these methods offer
valuable tools for initial data exploration. `head()` and `tail()` allow you to quickly view the beginning and end of
your dataset, while `info()` provides a summary of the DataFrame's structure and data types. The `describe()`
method gives a statistical overview of numeric data, helping to understand data distribution and key metrics.
Together, these techniques facilitate a comprehensive understanding of your data, supporting effective analysis and
decision-making.
Overall, this program demonstrates how to create a DataFrame from a dictionary of lists and explore it using
pandas methods. By displaying the first and last few rows (head() and tail()), examining structural information
(info()), and summarizing numeric data (describe()), we gain a quick yet comprehensive view of the dataset. This
process helps identify data patterns, check for missing values, and understand the distribution of numeric columns.
Such initial exploration is essential for effectively preparing data for further analysis or visualization.