
Data Frame Creation

A DataFrame is a 2D collection of rows and columns used in Pandas for data manipulation. The document outlines how to create a DataFrame, perform operations such as adding columns, handling null values, and merging columns. It also covers scaling categorical values and generating random data for DataFrames.

Uploaded by

qubefexe

DATA FRAME CREATION

WHAT IS A DATA FRAME?

• A DataFrame is a collection of rows and columns; data is loaded into a 2D row-and-column format.

• Pandas reads the files we upload (for example, CSV files) into DataFrame format.
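As a minimal sketch of the second point, the snippet below reads CSV text into a DataFrame with `pd.read_csv`. The in-memory text and its column names are made up for illustration; in practice a file path would usually be passed instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for an uploaded file
csv_text = "name,score\nann,90\nbob,85\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column names taken from the header row
```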
CREATING A DATA FRAME

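Creating a DataFrame

import pandas as pd

# From a list of rows, with explicit row labels and column names
df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=[4, 5, 6], columns=['A', 'B', 'C'])
df

# From a dictionary; the Series is aligned on the index
d = {'col1': [0, 1, 2, 3], 'col2': pd.Series([2, 3], index=[2, 3])}
df = pd.DataFrame(data=d, index=[0, 1, 2, 3])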
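A short sketch of what the dictionary-based construction produces: the Series only carries labels 2 and 3, so pandas aligns it on the index and leaves col2 empty (NaN) for rows 0 and 1.

```python
import pandas as pd

d = {'col1': [0, 1, 2, 3], 'col2': pd.Series([2, 3], index=[2, 3])}
df = pd.DataFrame(data=d, index=[0, 1, 2, 3])

# Rows 0 and 1 have no matching label in the Series, so col2 is NaN there
print(df)
missing = df['col2'].isna().tolist()
```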
OPERATIONS ON A DATA FRAME
Use existing columns to create a new one

df['new_column'] = df['column_1'] + df['column_2']
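To make the operation concrete, here is a small sketch with hypothetical sample values; the column names follow the slide.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'column_1': [1, 2, 3], 'column_2': [10, 20, 30]})

# The addition is applied element-wise, row by row
df['new_column'] = df['column_1'] + df['column_2']
print(df)
```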


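ADDING A ROW OR COLUMN TO A DATAFRAME

# Add a row. DataFrame.append was removed in pandas 2.0, so use pd.concat
df = pd.concat([df, pd.DataFrame([{'column_1': 1, 'column_2': 2}])], ignore_index=True)

# Add a column at position 1; value needs one entry per row
df.insert(loc=1, column="Stars", value=[2, 2, 3, 4])

df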
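The following sketch runs both steps on a hypothetical three-row frame, so that the four values passed to `insert` match the four rows present after the new row is added.

```python
import pandas as pd

# Hypothetical starting data with three rows
df = pd.DataFrame({'column_1': [0, 5, 7], 'column_2': [9, 8, 6]})

# Add a row: wrap the dict in a one-row DataFrame and concatenate
new_row = pd.DataFrame([{'column_1': 1, 'column_2': 2}])
df = pd.concat([df, new_row], ignore_index=True)

# Add a column at position 1; the list has one value per row (4 here)
df.insert(loc=1, column='Stars', value=[2, 2, 3, 4])
print(df)
```

Note that `insert` modifies the DataFrame in place, while `pd.concat` returns a new one that has to be assigned back.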
NULL VALUES HANDLING
Find the number of null values in each column

df.isnull().sum()

Drop rows with missing values

df.dropna(inplace=True)

Drop the row with index label 0 (returns a new DataFrame)

df.drop(index=0)
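A minimal sketch of these calls on hypothetical data. It uses the non-inplace forms so that each result can be inspected separately.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing entries
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, np.nan, 6.0]})

null_counts = df.isnull().sum()   # number of nulls per column
cleaned = df.dropna()             # keeps only rows with no nulls
without_first = df.drop(index=0)  # drops the row labelled 0

print(null_counts)
print(cleaned)
```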
FILLING MISSING VALUES IN A DATAFRAME

Fill the missing values with the column mean

df.fillna(df.mean(numeric_only=True))

Rename a column

df.rename(columns={'old_name': 'new_name'})

Drop a particular column from the DataFrame

df.drop(['column3'], axis=1, inplace=True)
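The sketch below shows mean filling on hypothetical data that mixes numeric and text columns. Passing `numeric_only=True` is an addition here: it keeps the mean from failing on the text column in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric column with a gap, one text column
df = pd.DataFrame({'x': [1.0, np.nan, 3.0],
                   'label': ['a', 'b', 'c'],
                   'column3': [0, 0, 0]})

# Mean of the numeric columns only; 'label' is skipped
means = df.mean(numeric_only=True)

# Each gap is filled with the mean of its own column
filled = df.fillna(means)

# Drop a column by name
filled = filled.drop(['column3'], axis=1)
print(filled)
```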
HANDLING DATA SET
• Merging two columns into a single column
df['new_column'] = df['column_1'] + df['column_2']

• Setting the index to a default integer index starting from 0
df.reset_index()

• Renaming a column
df.rename(columns={'old_name': 'new_name'})
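A small sketch chaining the three steps on hypothetical text columns. Both `reset_index` and `rename` return a new DataFrame, so the result is assigned back; `drop=True` is an optional choice here that discards the old index instead of keeping it as a column.

```python
import pandas as pd

# Hypothetical text data with a non-default index
df = pd.DataFrame({'column_1': ['a', 'b'], 'column_2': ['x', 'y']},
                  index=[4, 5])

# For strings, + concatenates the two columns row by row
df['new_column'] = df['column_1'] + df['column_2']

# Replace the index with 0, 1, ... and discard the old labels
df = df.reset_index(drop=True)

# Rename returns a new DataFrame with the updated column name
df = df.rename(columns={'new_column': 'merged'})
print(df)
```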
HANDLING CATEGORICAL VALUES

• StandardScaler removes the mean and scales each feature/variable to unit variance.
• Values must be numeric before scaling. For example, if column1 holds values such as REM001, replace the 'rem' prefix with no value first (the match is case-sensitive, so use the casing found in the data).

df['column1'] = df['column1'].str.replace('rem', '')

from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
df['column'] = ss.fit_transform(df['column'].values.reshape(-1, 1))
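To show what the scaling step computes, the sketch below standardises a column by hand with pandas alone rather than calling scikit-learn. It assumes hypothetical lowercase values with a 'rem' prefix, and uses the population standard deviation (`ddof=0`), which is what StandardScaler uses.

```python
import pandas as pd

# Hypothetical values with a text prefix
df = pd.DataFrame({'column1': ['rem001', 'rem002', 'rem003']})

# Strip the prefix and convert the remaining digits to numbers
df['column1'] = df['column1'].str.replace('rem', '', regex=False).astype(float)

# Standardise: subtract the mean, divide by the population std (ddof=0)
col = df['column1']
df['scaled'] = (col - col.mean()) / col.std(ddof=0)
print(df)
```

After this step the scaled column has mean 0 and unit variance.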


Generating multiple values
import random
import pandas as pd

# Fixed weights for the seven random inputs
w1, w2, w3, w4, w5, w6, w7 = 0.367, -0.0327, 0.509, 0.491, -0.226, 1.142, -0.169
vals = []
for i in range(1000000):
    # Draw each input from its own integer range (both ends inclusive)
    x1 = random.randint(1, 10)
    x2 = random.randint(1, 5)
    x3 = random.randint(0, 100)
    x4 = random.randint(0, 70)
    x5 = random.randint(1, 2)
    x6 = random.randint(1, 10)
    x7 = random.randint(1, 2)
    # Weighted sum of the inputs, stored as the eighth column
    eq = w1*x1 + w2*x2 + w3*x3 + w4*x4 + w5*x5 + w6*x6 + w7*x7
    vals.append([x1, x2, x3, x4, x5, x6, x7, eq])
df = pd.DataFrame(vals, columns=['Column1', 'Column2', 'Column3', 'Column4',
                                 'Column5', 'Column6', 'Column7', 'Column8'])
df.to_csv('File Name.csv', index=False)
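For a quick check of the same idea, here is a more compact sketch that keeps the weights and ranges from the slide but generates only 1,000 rows and skips the CSV export. The fixed seed and the `bounds` list are assumptions added for repeatability and brevity.

```python
import random
import pandas as pd

random.seed(0)  # assumption: fixed seed so runs are repeatable

weights = [0.367, -0.0327, 0.509, 0.491, -0.226, 1.142, -0.169]
# (low, high) range for each of the seven inputs, both ends inclusive
bounds = [(1, 10), (1, 5), (0, 100), (0, 70), (1, 2), (1, 10), (1, 2)]

rows = []
for _ in range(1000):  # far fewer rows than the slide's 1,000,000
    xs = [random.randint(lo, hi) for lo, hi in bounds]
    # Weighted sum of the inputs becomes the eighth column
    rows.append(xs + [sum(w * x for w, x in zip(weights, xs))])

columns = [f'Column{i}' for i in range(1, 9)]
df = pd.DataFrame(rows, columns=columns)
print(df.head())
```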
