Acknowledgement

I, Debangshu Karmakar of class XII (Science), would like to extend
my heartfelt gratitude to everyone who guided and supported
me in completing this project on the topic "DataFrame".

I am grateful to my teacher, Mr. Neeraj Sharsar, for providing
valuable insights and fostering a learning environment that
encourages creativity and exploration. I also want to thank my
parents for their unwavering encouragement, patience, and
understanding, which have been the foundations of my
achievements.

Lastly, I appreciate my friends for their valuable ideas and
perspectives that contributed to the project’s enrichment.
Thank you all for making this project a rewarding learning
experience.

Debangshu Karmakar
“XII” – Science

Index

Sl No.  Topic
1       Introduction
2       Programs
3       Conclusion
4       Bibliography

Introduction

What is a DataFrame?

A DataFrame is a structured data format that organizes information
into a two-dimensional table of rows and columns, resembling a
spreadsheet. This structure is one of the most widely used in modern
data analytics, as it provides a flexible and intuitive way to handle and
analyze data.
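
As a minimal sketch of this table-like layout, the snippet below builds a small pandas DataFrame and inspects its shape, column labels and row labels. The figures are borrowed from the first program in this project, and the variable names are chosen only for illustration.

```python
import pandas as pd

# Illustrative data, borrowed from the first program in this project
data = {'City': ['Delhi', 'Mumbai'], 'Hospitals': [189, 208]}
table = pd.DataFrame(data)

print(table.shape)           # (number of rows, number of columns)
print(list(table.columns))   # column labels
print(list(table.index))     # default row labels start at 0
```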

Each DataFrame has a "schema," which acts as a blueprint that outlines
the name and data type of each column. DataFrames in Spark, for
example, can contain simple data types like StringType and
IntegerType, as well as structured types, such as StructType.
Missing or incomplete values are typically stored as null values in a
DataFrame. In pandas, the library used in this project, the column
types are listed by the dtypes attribute and missing values appear as
NaN.
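
The idea of a schema can be sketched in pandas as follows. The marks table is hypothetical (the names are reused from a later program), and None stands for a value that was not recorded.

```python
import pandas as pd

# Hypothetical marks table; None marks a value that was not recorded
marks = pd.DataFrame({
    'Name': ['Pavni', 'Rishi', 'Preet'],
    'Marks': [97.5, None, 98.5],
})

print(marks.dtypes)                  # data type of each column
print(marks['Marks'].isna().sum())   # number of missing values in Marks
```

Here pandas stores the missing mark as NaN, which is why the Marks column is held as floating-point numbers.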

A useful analogy is to think of a DataFrame as a digital spreadsheet
with labeled columns. Unlike traditional spreadsheets, though,
distributed DataFrames such as Spark's can span thousands of
computers, allowing large-scale data analytics and computation across
distributed clusters. A pandas DataFrame, by contrast, is held in the
memory of a single machine.

Data is distributed across multiple computers either because the
volume is too large to fit on a single machine or because performing
the calculations on a single machine would take too long.

Programs

import pandas as pd

# Straight quotes are required; the curly quotes in the original are a syntax error
dt = {'City': ['Delhi', 'Mumbai', 'Kolkata', 'Chennai'],
      'Hospitals': [189, 208, 149, 157],
      'schools': [7916, 8508, 7226, 7617]}

dtf=pd.DataFrame(dt)
print(dtf)

import pandas as pd
dt={'Yr1':[34500,56000,47000,49000],'Yr2':[44900,46100,57000,59000]}
dtf=pd.DataFrame(dt,index=['Qtr1','Qtr2','Qtr3','Qtr4'])
print(dtf)

import pandas as pd
dt={'Rollno':[115,236,307,422],'Name':['Pavni','Rishi','Preet','Parul'],'Marks':[97.5,98.0,98.5,98.0]}
dtf=pd.DataFrame(dt)
print(dtf)

import pandas as pd
dt={'Zone1':[56000,58000],'Zone2':[70000,68000],'Zone3':[75000,78000],'Zone4':[60000,61000]}
dtf=pd.DataFrame(dt,index=['Target','Sales'])
print(dtf)

import pandas as pd
r1=[101,113,124]
r2=[130,140,200]
r3=[115,216,217]
combine=[r1,r2,r3]
df=pd.DataFrame(combine)
print(df)

import pandas as pd
df = {'city': ['Delhi', 'Bengaluru', 'Chennai', 'Mumbai'],
      'Maxtemp': [40, 31, 35, 29],
      'Mintemp': [32, 25, 27, 21],
      'Rainfall': [24.1, 36.2, 40.8, 35.2]}
temp=pd.DataFrame(df)
print(temp)

import pandas as pd
data = {
'A': [50,110],
'B': [80,120],
'C': [120,130],
'D': [180,140],
}
df = pd.DataFrame(data)
print(df)
df['E'] = [14, 220]
print("DataFrame after adding column E:")
print(df)
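new_row = {'A': 2, 'B': 130, 'C': 140, 'D': 150, 'E': 300}
# DataFrame.append was removed in pandas 2.0; pd.concat is the public way to add a row
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)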
print("DataFrame after adding a new row:")
print(df)
df = df.drop(columns=['A', 'C'])
print("DataFrame after removing columns A and C:")
print(df)
df = df.drop([0, 1])
print("DataFrame after removing the first and second rows:")
print(df)

import pandas as pd

data = {
    'Product': ['cpu', 'mouse', 'keyboard', 'printer', 'hdd', 'cd', 'scanner', 'speaker'],
    'Company': ['compaq', 'compaq', 'dell', 'hp', 'sony', 'sony', 'hp', 'dell'],
    'qty': [40, 20, 10, 2, 500, 1000, 4, 6],
    'price': [9000, 400, 700, 20000, 450, 25, 5500, 900]
}
index = [101, 102, 103, 104, 105, 106, 107, 108]
df1 = pd.DataFrame(data, index=index)
print(df1)

# Select rows by label with loc, and by position with iloc
print("Details of records 102, 104, and 106:")
print(df1.loc[[102, 104, 106]])
print("Product and Company details of records 101 and 104:")
print(df1.loc[[101, 104], ['Product', 'Company']])
print("First and third records of df1:")
print(df1.iloc[[0, 2]])
print("Quantity and Company details of all records:")
print(df1[['qty', 'Company']])

# Update a single cell, then whole columns for selected rows
df1.at[104, 'price'] = 50000
print("Updated DataFrame with modified price for record 104:")
print(df1)
print("Details of record 104:")
print(df1.loc[104])
df1.loc[[101, 102], 'Company'] = 'acer'
df1.loc[[101, 102], 'qty'] = 400
print("Updated DataFrame for company name and quantity of records 101 and 102:")
print(df1)

# Label 108 already belongs to the speaker record, so the new record uses 109
df1.loc[109] = ["mic", "dell", 100, 450]
print("DataFrame after adding new record:")
print(df1)

import pandas as pd
data2 = {
'Bno': [1, 2, 3, 4],
'name': ['Sunil Grover', 'sourav ganguli', 'virat kohli', 'rahul dravid'],
'score1': [60, 65, 70, 80],
'score2': [70, 45, 90, 70]
}
batsman = pd.DataFrame(data2)
print(batsman)
batsman['total'] = batsman['score1'] + batsman['score2']

print('Dataframe after adding total column is')
print(batsman)
print('lowest score of score 1 is', batsman['score1'].min())
print("Highest score of score2:", batsman['score2'].max())
batsman.index = ['player1', 'player2', 'player3', 'player4']
print('DataFrame with new index:')
print(batsman)
print("Details of batsmen with score1 < 75:")
print(batsman[batsman['score1'] < 75])
print("Names of batsmen with score1 < 75:")
print(batsman.loc[batsman['score1'] < 75, 'name'])
print("Name and score1 of batsmen with score1 < 75:")
print(batsman.loc[batsman['score1'] < 75, ['name', 'score1']])
batsman_sorted = batsman.sort_values(by='score2', ascending=False)
print("DataFrame in descending order of score2:")
print(batsman_sorted)
batsman.columns = ['batsmanno', 'bname', 's1', 's2', 'sum']
print("DataFrame after renaming columns:")
print(batsman)
batsman.loc[batsman['s2'] > 75, 's1'] += 5
print("DataFrame after adding 5 to s1 where s2 > 75:")
print(batsman)

import pandas as pd

data_df1 = {'mark1': [10, 40, 15, 40, 10], 'mark2': [15, 45, 30, 70, 50]}

data_df2 = {'mark1': [30,20,20,40,50], 'mark2': [20, 25, 30, 10, 30]}

df1 = pd.DataFrame(data_df1,index=[0,1,2,3,5])

df2 = pd.DataFrame(data_df2,index=[0,1,2,4,3])

print('df1')

print(df1)

print('df2')

print(df2)

df_sum = df1 + df2

print("Result of adding df1 and df2:")

print(df_sum)

df1 += 10

print("DataFrame df1 after adding 10 to all values:")

print(df1)

df1['mark1'] += 5

print("DataFrame df1 after adding 5 to mark1 column:")

print(df1)

d2 = df1.add(df2, fill_value=0)

print("Result of adding df1 into df2:")

print(d2)
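
The two additions in the program above behave differently because pandas matches rows by label before adding. The sketch below shows this on two hypothetical one-column DataFrames whose row labels only partly overlap.

```python
import pandas as pd

# Hypothetical one-column frames whose row labels only partly overlap
a = pd.DataFrame({'mark': [10, 20]}, index=[0, 1])
b = pd.DataFrame({'mark': [1, 2]}, index=[1, 2])

plain = a + b                     # labels 0 and 2 have no partner, so they become NaN
filled = a.add(b, fill_value=0)   # a missing partner is treated as 0 instead

print(plain)
print(filled)
```

With the + operator, only label 1 gets a number; with add and fill_value=0, every label keeps a value.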

import matplotlib.pyplot as plt

overs = [5, 10, 15, 20]


runs = [45, 79, 145, 234]
plt.figure(figsize=(8, 5))
plt.plot(overs, runs, marker='o', linestyle='-', color='b', label='Runs')
plt.xlabel('Overs')
plt.ylabel('Runs')
plt.title('Run Rate of T20 Match')
plt.legend()
plt.grid(True)
plt.show()

Conclusion

The DataFrame is a highly versatile and widely adopted data structure that serves
as a cornerstone in data manipulation and analysis across various programming
languages and frameworks. In Python, it is a central component of the pandas
library, which is one of the most popular tools for data analysis and manipulation
in the data science ecosystem.

A DataFrame can be thought of as a table-like structure, similar to a spreadsheet
or a SQL table, where data is organized into rows and columns. It allows users to
store and manipulate large datasets efficiently while providing powerful
functionality for tasks such as data cleaning, filtering, aggregation, and
visualization. The ease of use and flexibility of DataFrames make them an essential
tool for data scientists, analysts, and engineers.
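
Three of these tasks can be sketched in a few lines of pandas. The sales records are hypothetical, and the threshold of 3 is chosen only for illustration.

```python
import pandas as pd

# Hypothetical sales records; one quantity is missing
sales = pd.DataFrame({
    'Company': ['dell', 'hp', 'dell', 'hp'],
    'qty': [10, 2, None, 4],
})

cleaned = sales.dropna()                           # cleaning: drop incomplete rows
large = cleaned[cleaned['qty'] > 3]                # filtering: keep qty above 3
totals = cleaned.groupby('Company')['qty'].sum()   # aggregation: total qty per company

print(large)
print(totals)
```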

Beyond Python, the concept of DataFrames is also implemented in other
programming environments. For example, the R programming language offers a
data.frame structure that has been a fundamental part of statistical computing for
decades. Similarly, the Apache Spark framework in Scala (and other languages)
provides a DataFrame API designed for large-scale data processing and distributed
computing. These implementations share the common goal of enabling users to
handle structured data intuitively while maintaining high performance.

The universality of DataFrames across different programming ecosystems
underscores their importance in the field of data analysis, providing a consistent
and powerful toolset regardless of the specific language or framework being used.
This consistency simplifies the learning curve for users transitioning between tools
while fostering collaboration among data professionals working in diverse
environments.

Bibliography

1. Databricks - DataFrame Documentation
2. Pandas - Python Data Analysis Library
3. DataCamp - Online Data Science Learning Platform
