IP Project
Teacher's signature:
I, Kristi Saha of class XII-Sci, would like to express my
sincere gratitude to everyone who mentored and supported
me in completing this project on the topic "Dataframe".
I thank my teacher, Mr. Neeraj Sharsar, for providing me
with invaluable insights and direction, and for fostering an
environment of learning and creativity within our school.
To my parents: your constant encouragement, patience, and
understanding have been the pillars of my success. I am
grateful to my friends, who contributed ideas and
perspectives that enriched the project. Thank you, everyone,
for shaping this project and enhancing my learning
experience.
Kristi Saha
Class: XII "Sci"
Sl. No.  Content
1.       Dataframe
2.       Programs
3.       Conclusion
4.       Bibliography
Dataframe
A DataFrame is a data structure that organizes data into a
two-dimensional table of rows and columns, much like a
spreadsheet. DataFrames are one of the most common
data structures used in modern data analytics because they
are a flexible and intuitive way of storing and working with
data.
Every DataFrame contains a blueprint, known as a
schema, that defines the name and data type of each
column. Spark DataFrames can contain universal data
types like StringType and IntegerType, as well as data
types that are specific to Spark, such as StructType.
Missing or incomplete values are stored as null values in
the DataFrame.
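The schema described above is Spark terminology. In pandas, the closest counterpart is probably the `dtypes` attribute, and missing values usually appear as NaN rather than null. A minimal sketch, assuming made-up sample data (the names and ages are hypothetical):

```python
import pandas as pd

# Hypothetical sample data; None marks a missing value.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "age": [17, None, 18],
})

# dtypes plays the role of a schema: one data type per column.
print(df.dtypes)

# isna() flags the missing entries; sum() counts them per column.
missing = df.isna().sum()
print(missing)
```

Because one age is missing, pandas stores the `age` column as floating-point numbers so that it can hold NaN.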
A simple analogy is that a DataFrame is like a spreadsheet
with named columns. However, the difference between
them is that while a spreadsheet sits on one computer in
one specific location, a Spark DataFrame can span
thousands of computers. A pandas DataFrame, by contrast,
is held in the memory of a single machine.
In this way, DataFrames make it possible to do analytics
on big data, using distributed computing clusters. The
reason for putting the data on more than one computer
should be intuitive: either the data is too large to fit on one
machine or it would simply take too long to perform that
computation on one machine.
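The idea of spreading work over several machines can be imitated on a single computer. The sketch below is only an illustration, not how Spark is implemented: it assumes a small made-up `sales` column, splits the rows into three partitions, computes a partial sum for each, and then combines the partial results.

```python
import pandas as pd

# Hypothetical data: 10 rows of sales figures (1 to 10).
df = pd.DataFrame({"sales": list(range(1, 11))})

# Split the rows into 3 partitions, as a cluster might place them on 3 machines.
n_parts = 3
partitions = [df.iloc[i::n_parts] for i in range(n_parts)]

# Each "machine" computes a partial result on its own partition.
partial_sums = [part["sales"].sum() for part in partitions]

# The partial results are combined into the final answer.
total = sum(partial_sums)
print(partial_sums, total)
```

The combined total matches `df["sales"].sum()`, which is why this kind of computation can be divided among machines.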
Programs
import pandas as pd
dt={'City': ['Delhi','Mumbai','Kolkata','Chennai'],
'Hospitals': [189,208,149,157],
'schools': [7916,8508,7226,7617]}
dtf=pd.DataFrame(dt)
print(dtf)
import pandas as pd
dt={'Yr1':[34500,56000,47000,49000],
'Yr2':[44900,46100,57000,59000]}
dtf=pd.DataFrame(dt,index=['Qtr1','Qtr2','Qtr3','Qtr4'])
print(dtf)
import pandas as pd
dt={'Rollno':[115,236,307,422],
'Name':['Pavni','Rishi','Preet','Parul'],
'Marks':[97.5,98.0,98.5,98.0]}
dtf=pd.DataFrame(dt)
print(dtf)
import pandas as pd
dt={'Zone1':[56000,58000],
'Zone2':[70000,68000],
'Zone3':[75000,78000],
'Zone4':[60000,61000]}
dtf=pd.DataFrame(dt,index=['Target','Sales'])
print(dtf)
import pandas as pd
r1=[101,113,124]
r2=[130,140,200]
r3=[115,216,217]
combine=[r1,r2,r3]
df=pd.DataFrame(combine)
print(df)
import pandas as pd
df={'city':['Delhi','Bengaluru','Chennai','Mumbai'],
'Maxtemp':[40,31,35,29],
'Mintemp':[32,25,27,21],
'Rainfall':[24.1,36.2,40.8,35.2]}
temp=pd.DataFrame(df)
print(temp)
import pandas as pd
data = { 'A': [50,110], 'B': [80,120], 'C': [120,130], 'D': [180,140], }
df = pd.DataFrame(data)
print(df)
df['E'] = [14, 220]
print("DataFrame after adding column E:")
print(df)
new_row = {'A': 2, 'B': 130, 'C': 140, 'D': 150, 'E': 300}
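# DataFrame.append was removed in pandas 2.0; pd.concat adds the new row instead
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)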
print("DataFrame after adding a new row:")
print(df)
df = df.drop(columns=['A', 'C'])
print("DataFrame after removing columns A and C:")
print(df)
df = df.drop([0, 1])
print("DataFrame after removing the first and second rows:")
print(df)
import pandas as pd
data = {'Product': ['cpu', 'mouse', 'keyboard', 'printer', 'hdd', 'cd',
                    'scanner', 'speaker'],
        'Company': ['compaq', 'compaq', 'dell', 'hp',
                    'sony', 'sony', 'hp', 'dell'],
        'qty': [40, 20, 10, 2, 500, 1000, 4, 6],
        'price': [9000, 400, 700, 20000, 450, 25, 5500, 900]}
index = [101, 102, 103, 104, 105, 106, 107, 108]
df1 = pd.DataFrame(data, index=index)
print(df1)
print("Details of records 102, 104, and 106:")
print(df1.loc[[102, 104, 106]])
print("Product and Company details of records 101 and 104:")
print(df1.loc[[101, 104], ['Product', 'Company']])
print("First and third records of df1:")
print(df1.iloc[[0, 2]])
print("Quantity and Company details of all records:")
print(df1[['qty', 'Company']])
df1.at[104, 'price'] = 50000
print("Updated DataFrame with modified price for record 104:")
print(df1)
print("Details of record 104:")
print(df1.loc[104])
df1.loc[[101, 102], 'Company'] = 'acer'
df1.loc[[101, 102], 'qty'] = 400
print("Updated DataFrame for company name and quantity of records 101 and 102:")
print(df1)
# Label 109 is new; assigning to the existing label 108 would overwrite that record
df1.loc[109] = ["mic", "dell", 100, 450]
print("DataFrame after adding new record:")
print(df1)
import pandas as pd
data2 = {'Bno': [1, 2, 3, 4],
         'name': ['Sunil Grover', 'sourav ganguli', 'virat kohli', 'rahul dravid'],
         'score1': [60, 65, 70, 80],
         'score2': [70, 45, 90, 70]}
batsman = pd.DataFrame(data2)
print(batsman)
batsman['total'] = batsman['score1'] + batsman['score2']
print('Dataframe after adding total column is')
print(batsman)
print('lowest score of score 1 is', batsman['score1'].min())
print("Highest score of score2:", batsman['score2'].max())
batsman.index = ['player1', 'player2', 'player3', 'player4']
print('DataFrame with new index:')
print(batsman)
print("Details of batsmen with score1 < 75:")
print(batsman[batsman['score1'] < 75])
print("Names of batsmen with score1 < 75:")
print(batsman.loc[batsman['score1'] < 75, 'name'])
print("Name and score1 of batsmen with score1 < 75:")
print(batsman.loc[batsman['score1'] < 75, ['name', 'score1']])
batsman_sorted = batsman.sort_values(by='score2',
ascending=False)
print("DataFrame in descending order of score2:")
print(batsman_sorted)
batsman.columns = ['batsmanno', 'bname', 's1', 's2', 'sum']
print("DataFrame after renaming columns:")
print(batsman)
batsman.loc[batsman['s2'] > 75, 's1'] += 5
print("DataFrame after adding 5 to s1 where s2 > 75:")
print(batsman)
import pandas as pd
data_df1 = {'mark1': [10, 40, 15, 40, 10],
'mark2': [15, 45, 30, 70, 50]}
data_df2 = {'mark1': [30,20,20,40,50],
'mark2': [20, 25, 30, 10, 30]}
df1 = pd.DataFrame(data_df1,index=[0,1,2,3,5])
df2 = pd.DataFrame(data_df2,index=[0,1,2,4,3])
print('df1')
print(df1)
print('df2')
print(df2)
df_sum = df1 + df2
print("Result of adding df1 and df2:")
print(df_sum)
df1 += 10
print("DataFrame df1 after adding 10 to all values:")
print(df1)
df1['mark1'] += 5
print("DataFrame df1 after adding 5 to mark1 column:")
print(df1)
d2 = df1.add(df2, fill_value=0)
print("Result of adding df1 into df2:")
print(d2)
import matplotlib.pyplot as plt
overs = [5, 10, 15, 20]
runs = [45, 79, 145, 234]
plt.figure(figsize=(8, 5))
plt.plot(overs, runs, marker='o', linestyle='-', color='b',
label='Runs')
plt.xlabel('Overs')
plt.ylabel('Runs')
plt.title('Run Rate of T20 Match')
plt.legend()
plt.grid(True)
plt.show()
Conclusion
The concept of a DataFrame is common across many
different languages and frameworks. DataFrames are the
main data type used in pandas, the popular Python data
analysis library, and DataFrames are also used in R, Scala,
and other languages.
Bibliography
> www.databricks.com
> docs.python.org
> pandas.pydata.org
> realpython.com
> www.datacamp.com