
session-1 DataFrame

Data Frame

Uploaded by

Ssk Kumar
Copyright
© © All Rights Reserved

EDA

Exploratory Data Analysis (EDA) is a crucial step in the data analysis process.

In [ ]: ====================== Data Analysis =======================


1.Pandas ------- Dataframe read and write operations
2.Numpy ------- Numerical python Math operations
3.Matplotlib ------- plots , graphs , visualization
4.Seaborn ------- plots
5.Plotly ------- plots
6.Bokeh ------- plots

====================== Machine Learning =====================


7.Scikit-learn (sklearn) ------ Model development
8.stats packages ------ Linear Regression

====================== Web scraping and Database connection ======


9.Sqlite ------ SQL Connection
10.Beautiful soup ------ scrape the data
11.websocket ------ scrape the data

====================== Deep Learning ==========================


12.Tensorflow ------ Deep learning models development(google)
13.keras
14.pytorch ------ develop by
15.Opencv ------ computer vision(reading and writing images)
16.Pillow ------ reading images

====================== NLP ======================================


17.NLTK ----- Natural language tool kit
18.SpaCy ----- NLP Models
19.wordcloud -----

====================== Web development - API ======================


20.Flask
21.Django
22.FastAPI
23.Gradio

====================== Apps creation ==============================


24.Streamlit

====================== Transformers BERT (NLP models) ==============


25.Transformers ------ Hugging Face (BERT itself is from Google)

====================== DL: Pretrained Models, Object Detection =======


26.vgg16
27.Mobilenet
28.Yolo ----- Ultralytics

====================== NLP pretrained Models ========================


29.Word2Vec ----- Google
30.GloVe ----- Stanford University

====================== Model save ==================================


31.Pickle
32.Joblib

====================== GenAI LLM ====================================


33.Azure openAI
34.Google Gemini
35.Amazon BedRock
36.LLAMA Meta
37.Langchain Framework
====================== Model Deployment ================================
38.MLFlow

====================== Cloud Services ==================================


39.Azure ML Related packages
40.GCP vertex ai packages
41.Amazon sagemaker packages

====================== Allen NLP ======================================


42. Allen NLP packages

====================== ML using Pyspark ================================


43.MLlib package

====================== Small packages ==================================


44.random
45.math
46.time
47.logging

Step-1 : Import Packages

In [1]: import pandas as pd


import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Step-2 : Create a DataFrame using List

In [7]: import pandas as pd


pd.DataFrame

Out[7]: pandas.core.frame.DataFrame

In [9]: import pandas as pd


pd.DataFrame()

Out[9]:

In [13]: import pandas as pd


data=pd.DataFrame()
data

# We created a DataFrame,
# but it has no data (no rows and no columns).
# We stored the DataFrame under the name 'data'.

Out[13]:

Step-3 : Provide The Data

In [16]: name=['Navya','Sneha','Yamu']
pd.DataFrame()
Out[16]:

In [18]: name=['Navya','Sneha','Yamu']
pd.DataFrame(name)

Out[18]: 0

0 Navya

1 Sneha

2 Yamu

In [20]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
pd.DataFrame(zip(name,age))

Out[20]: 0 1

0 Navya 20

1 Sneha 21

2 Yamu 22

In [22]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
pd.DataFrame(zip(name,age,city))

Out[22]: 0 1 2

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune

In [24]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
data=[name,age,city]
pd.DataFrame(data)

Out[24]: 0 1 2

0 Navya Sneha Yamu

1 20 21 22

2 Hyd Delhi Pune
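
In the output above, each inner list became a row, so the three names sit side by side instead of one person per row. A minimal sketch of two ways to get one person per row, assuming the same three lists: zip the lists (as in the other cells), or transpose the frame with .T. The variable names by_rows, by_zip and by_transpose are only for illustration.

```python
import pandas as pd

name = ['Navya', 'Sneha', 'Yamu']
age = [20, 21, 22]
city = ['Hyd', 'Delhi', 'Pune']

# A list of lists is read row by row: 3 inner lists give 3 rows
by_rows = pd.DataFrame([name, age, city])

# zip pairs up the i-th items, so each tuple describes one person (one row)
by_zip = pd.DataFrame(zip(name, age, city))

# Transposing the first frame swaps its rows and columns
by_transpose = by_rows.T

print(by_zip)
print(by_transpose)
```

Both results hold the same values. The zip version is usually preferable because each column keeps a single type, while the transposed frame is built from mixed-type columns.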

In [26]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
df=pd.DataFrame(zip(name,age,city))
df
Out[26]: 0 1 2

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune

Step-4 : Provide The Columns

Column names are provided as a list

The number of column names must exactly match the number of columns in the data

Here we have 3 columns, so we need a list with 3 names

In [30]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
df=pd.DataFrame(zip(name,age,city),columns=cols)
df

Out[30]: Names Age City

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune
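
To see why the number of names must match, here is a small sketch, assuming the same three lists, that passes only 2 names for 3 columns of data. pandas is expected to raise a ValueError. The name mismatch_error is only for illustration.

```python
import pandas as pd

name = ['Navya', 'Sneha', 'Yamu']
age = [20, 21, 22]
city = ['Hyd', 'Delhi', 'Pune']

# Three columns of data but only two column names
try:
    pd.DataFrame(zip(name, age, city), columns=['Names', 'Age'])
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

# Show the error type and the message pandas produced
print(type(mismatch_error).__name__, mismatch_error)
```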

Step-5 : Provide the Index

In [33]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=[1,2,3]
df=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df

Out[33]: Names Age City

1 Navya 20 Hyd

2 Sneha 21 Delhi

3 Yamu 22 Pune

In [35]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df
Out[35]: Names Age City

A Navya 20 Hyd

B Sneha 21 Delhi

C Yamu 22 Pune

Step-6 : How to add a New Column to an existing DataFrame

Here we already have a DataFrame named df

It has 3 columns

Now we want to add a new column, Marks

We need to create a new array or list

The length of that list must equal the number of rows

Here we have 3 rows, so the new list must also have 3 values

In [ ]: # df['<new column name>']=<list>

In [38]: marks=[100,200,300]
df['Marks']=marks
df

Out[38]: Names Age City Marks

A Navya 20 Hyd 100

B Sneha 21 Delhi 200

C Yamu 22 Pune 300
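
The length rule can be checked directly. A minimal sketch, assuming a small frame with 3 rows: a list with 2 values is rejected with a ValueError, while a single scalar is repeated for every row. The column name Section and the variable length_error are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {'Names': ['Navya', 'Sneha', 'Yamu'], 'Age': [20, 21, 22]},
    index=['A', 'B', 'C'],
)

# Two values for three rows: pandas refuses the assignment
try:
    df['Marks'] = [100, 200]
    length_error = None
except ValueError as err:
    length_error = err

# A single scalar is allowed and is repeated for every row
df['Section'] = 'X'

print(length_error)
print(df)
```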

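Step-7 : Create a DataFrame starting from an empty DataFrame

In the steps above, we created lists and built the DataFrame by passing them to pd.DataFrame

Here we create an empty DataFrame first and then add the columns one by one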

In [41]: df1=pd.DataFrame()
df1

Out[41]:

In [43]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
df1['Name']=name
df1['Age']=age
df1['City']=city
df1
Out[43]: Name Age City

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune

Step-8 : Create a DataFrame using Dictionary

In [50]: dict1={'Names':['Navya','Sneha','Yamu'],'Age':[20,21,22],'City':['Hyd','Delhi','Pune']}
dict1

Out[50]: {'Names': ['Navya', 'Sneha', 'Yamu'],


'Age': [20, 21, 22],
'City': ['Hyd', 'Delhi', 'Pune']}

In [52]: df2=pd.DataFrame(dict1)
df2

Out[52]: Names Age City

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune

In [54]: df2=pd.DataFrame(dict1,index=['A','B','C'])
df2

Out[54]: Names Age City

A Navya 20 Hyd

B Sneha 21 Delhi

C Yamu 22 Pune

Keys behave as column names

Values behave as the column data, one entry per row

In [57]: dict2={'Name':'Navya','Age':20,'City':'Hyd'}
dict2

Out[57]: {'Name': 'Navya', 'Age': 20, 'City': 'Hyd'}

In [61]: df3=pd.DataFrame(dict2)
df3
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[61], line 1
----> 1 df3=pd.DataFrame(dict2)
2 df3

File ~\anaconda3\Lib\site-packages\pandas\core\frame.py:778, in DataFrame.__init__(self, data, index, columns, dtype, copy)
772 mgr = self._init_mgr(
773 data, axes={"index": index, "columns": columns}, dtype=dtype, copy=copy
774 )
776 elif isinstance(data, dict):
777 # GH#38939 de facto copy defaults to False only in non-dict cases
--> 778 mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
779 elif isinstance(data, ma.MaskedArray):
780 from numpy.ma import mrecords

File ~\anaconda3\Lib\site-packages\pandas\core\internals\construction.py:503, in dict_to_mgr(data, index, columns, dtype, typ, copy)
499 else:
500 # dtype check to exclude e.g. range objects, scalars
501 arrays = [x.copy() if hasattr(x, "dtype") else x for x in arrays]
--> 503 return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)

File ~\anaconda3\Lib\site-packages\pandas\core\internals\construction.py:114, in arrays_to_mgr(arrays, columns, index, dtype, verify_integrity, typ, consolidate)
111 if verify_integrity:
112 # figure out the index, if necessary
113 if index is None:
--> 114 index = _extract_index(arrays)
115 else:
116 index = ensure_index(index)

File ~\anaconda3\Lib\site-packages\pandas\core\internals\construction.py:667, in _extract_index(data)
664 raise ValueError("Per-column arrays must each be 1-dimensional")
666 if not indexes and not raw_lengths:
--> 667 raise ValueError("If using all scalar values, you must pass an index")
669 if have_series:
670 index = union_indexes(indexes)

ValueError: If using all scalar values, you must pass an index

In [63]: dict2={'Name':'Navya','Age':20,'City':'Hyd'}
pd.DataFrame(dict2,index=[1])

# If using all scalar values, you must pass an index

Out[63]: Name Age City

1 Navya 20 Hyd

In [65]: dict2={'Name':'Navya','Age':20,'City':'Hyd'}
pd.DataFrame(dict2,index=[1,2])
Out[65]: Name Age City

1 Navya 20 Hyd

2 Navya 20 Hyd
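
Passing an index is one way around the scalar error. Another option, sketched here under the assumption that each dictionary describes one record, is to wrap the dictionary in a list, so pandas treats it as one row. dict3 is a hypothetical second record added for illustration.

```python
import pandas as pd

dict2 = {'Name': 'Navya', 'Age': 20, 'City': 'Hyd'}

# Wrapping the dict in a list makes it one record, i.e. one row
one_row = pd.DataFrame([dict2])

# Several dicts in the list give several rows
dict3 = {'Name': 'Sneha', 'Age': 21, 'City': 'Delhi'}  # hypothetical second record
two_rows = pd.DataFrame([dict2, dict3])

print(one_row)
print(two_rows)
```

With this form the keys are still the column names, and the index defaults to 0, 1, 2, ...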

Array-like data can be represented in 3 ways :

list : built-in Python list

numpy : NumPy array (NumPy package)

tensor : tensor (TensorFlow)

In [68]: l1=[1,2,3]
import numpy as np
np.array(l1)

Out[68]: array([1, 2, 3])

In [70]: l1=[1,2,3]
l2=[11,12,13]
l1+l2

Out[70]: [1, 2, 3, 11, 12, 13]

In [72]: import numpy as np


np.array(l1)
np.array(l2)
np.array(l1+l2)

Out[72]: array([ 1, 2, 3, 11, 12, 13])

In [74]: l1=[1,2,3]
a=np.array(l1)
l2=[11,12,13]
b=np.array(l2)
a+b

Out[74]: array([12, 14, 16])

In [76]: l1=[1,2,3]
a=np.array(l1)
l2=[11,12,13]
b=np.array(l2)
a*b

Out[76]: array([11, 24, 39])

In [78]: l1=[1,2,3]
a=np.array(l1)
l2=[11,12,13]
b=np.array(l2)
a+b,a*b

Out[78]: (array([12, 14, 16]), array([11, 24, 39]))


Step-9 : Drop the column

To drop a column, we use the drop method

DataFrame methods are called on the DataFrame object, just as string methods are called on a string

It mainly takes 3 arguments

1.Label (column name or row index label)

2.axis

axis = 1 represents columns

axis = 0 represents rows

3.inplace

By default, drop returns a new DataFrame and leaves the original unchanged

The modified DataFrame can be saved under the same name or a different name

To modify the existing DataFrame under the same name, pass inplace=True

In [ ]: # create a dataframe and drop any column

In [81]: df4=pd.DataFrame()
df4

Out[81]:

In [ ]: df4.drop()  # raises ValueError: drop needs at least one label (labels, index or columns)

In [87]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df4=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df4

Out[87]: Names Age City

A Navya 20 Hyd

B Sneha 21 Delhi

C Yamu 22 Pune

In [97]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df4=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df4.drop('City',axis=1)
Out[97]: Names Age

A Navya 20

B Sneha 21

C Yamu 22

In [103]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df4=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df4.drop('A',axis=0)

Out[103]: Names Age City

B Sneha 21 Delhi

C Yamu 22 Pune

In [107]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df4=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df4.drop('A',axis=0,inplace=True)

In [109]: df4

Out[109]: Names Age City

B Sneha 21 Delhi

C Yamu 22 Pune

In [4]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df4=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df4.drop('A',axis=0,inplace=False)

Out[4]: Names Age City

B Sneha 21 Delhi

C Yamu 22 Pune
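
The difference between the default behaviour and inplace=True is easy to miss in the outputs above. A minimal sketch, assuming the same df4: without inplace=True the original keeps all rows and columns, and the result has to be stored in a variable. The names without_city and without_a are only for illustration. The columns= and index= keywords are an alternative to passing a label together with axis.

```python
import pandas as pd

df4 = pd.DataFrame(
    zip(['Navya', 'Sneha', 'Yamu'], [20, 21, 22], ['Hyd', 'Delhi', 'Pune']),
    index=['A', 'B', 'C'],
    columns=['Names', 'Age', 'City'],
)

# Without inplace=True, drop returns a new DataFrame
without_city = df4.drop('City', axis=1)

# index= (or columns=) can be used instead of a label plus axis
without_a = df4.drop(index='A')

# df4 itself is unchanged by both calls
print(df4.shape, without_city.shape, without_a.shape)
```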

In [ ]: # create two dataframes df1 and df2


# add those dataframes

############# df1 ###########


Names Age City
Ramesh 20 Hyd
############ df2 ###########
Names Age City
Suresh 21 Blr

Names Age City


Ramesh 20 Hyd
Suresh 21 Blr

append (removed in pandas 2.0)
concat
join

In [34]: dict1={'Name':'Ramesh','Age':20,'City':'Hyd'}
df5=pd.DataFrame(dict1,index=[1])
dict2={'Name':'Suresh','Age':21,'City':'Blr'}
df6=pd.DataFrame(dict2,index=[2])
result=pd.concat([df5,df6],ignore_index=True)
print(result)

Name Age City


0 Ramesh 20 Hyd
1 Suresh 21 Blr
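
The ignore_index=True argument is what renumbers the rows from 0. A short sketch, assuming the same two one-row frames, comparing the result with and without it. The names kept and renumbered are only for illustration.

```python
import pandas as pd

df5 = pd.DataFrame({'Name': 'Ramesh', 'Age': 20, 'City': 'Hyd'}, index=[1])
df6 = pd.DataFrame({'Name': 'Suresh', 'Age': 21, 'City': 'Blr'}, index=[2])

# Default: each row keeps the index label it had in its own frame
kept = pd.concat([df5, df6])

# ignore_index=True discards those labels and numbers the rows from 0
renumbered = pd.concat([df5, df6], ignore_index=True)

print(kept)
print(renumbered)
```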

Step-10 : How to overwrite an existing column

We already have a DataFrame

Now we want to replace all the values of a specific column with new values

First create a list with the new values

Then assign it to the column, in the same way as creating a new column

df[new col]=data , to create a new column

df[old col]=new data , to overwrite the old column

In [8]: df4['Age']=[33,44,34]
df4

Out[8]: Names Age City

A Navya 33 Hyd

B Sneha 44 Delhi

C Yamu 34 Pune

In [48]: df4['Names']=['anshu','chinni','adya']
df4

Out[48]: Names Age City Name

A anshu 33 Hyd anshu

B chinni 44 Delhi chinni

C adya 34 Pune adya
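
The new values do not have to be typed by hand. A minimal sketch, assuming a small frame like df4, where the replacement values are computed from the existing column and assigned back under the same column name.

```python
import pandas as pd

df4 = pd.DataFrame(
    {'Names': ['Navya', 'Sneha', 'Yamu'], 'Age': [20, 21, 22]},
    index=['A', 'B', 'C'],
)

# Overwrite Age with a value computed from the old Age
df4['Age'] = df4['Age'] + 1

# Overwrite Names with an upper-case version of itself
df4['Names'] = df4['Names'].str.upper()

print(df4)
```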


Step-11 : How to save the DataFrame

We can save the DataFrame in 2 ways

csv : comma separated values

excel

For csv : to_csv , extension = .csv

For excel : to_excel , extension = .xlsx

In [51]: # create a dataframe

name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df

Out[51]: Names Age City

A Navya 20 Hyd

B Sneha 21 Delhi

C Yamu 22 Pune

Csv Format

In [56]: # DataFramename.methodname
# where you want to save
# in what name you want to save

df.to_csv('data12.csv')

Excel sheet

In [61]: df.to_excel('data13.xlsx')

Step-12 : Read the data

read_csv

read_excel

Both are available in pandas

In [65]: pd.read_csv('data12.csv')
Out[65]: Unnamed: 0 Names Age City

0 A Navya 20 Hyd

1 B Sneha 21 Delhi

2 C Yamu 22 Pune

In [69]: pd.read_excel('data13.xlsx')

Out[69]: Unnamed: 0 Names Age City

0 A Navya 20 Hyd

1 B Sneha 21 Delhi

2 C Yamu 22 Pune

Step-13 : How to avoid the extra column

The extra 'Unnamed: 0' column is the DataFrame index that was written to the file

When saving the data, to_csv and to_excel have an argument named index

Pass index=False

In [74]: # Give the different name , provide index=False

df.to_csv('data21.csv',index=False)
pd.read_csv('data21.csv')

Out[74]: Names Age City

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune

In [76]: df.to_excel('data31.xlsx',index=False)
pd.read_excel('data31.xlsx')

Out[76]: Names Age City

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune
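
index=False throws the index away. If the labels A, B, C should survive the round trip, one option is to keep the index when saving and tell read_csv which column holds it. A minimal sketch, assuming a CSV file written to a temporary folder; the file name with_index.csv is hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {'Names': ['Navya', 'Sneha', 'Yamu'],
     'Age': [20, 21, 22],
     'City': ['Hyd', 'Delhi', 'Pune']},
    index=['A', 'B', 'C'],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'with_index.csv')  # hypothetical file name
    # The index is written as the first column of the file
    df.to_csv(path)
    # index_col=0 reads that first column back as the index
    restored = pd.read_csv(path, index_col=0)

print(restored)
```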
