session-1 DataFrame

Data Frame

Uploaded by

Ssk Kumar


EDA

Exploratory Data Analysis (EDA) is a crucial step in the data analysis process.

In [ ]: ====================== Data Analysis =======================

1.Pandas ------- Dataframe read and write operations
2.Numpy ------- Numerical python Math operations
3.Matplotlib ------- plots , graphs , visualization
4.Seaborn ------- plots
5.Plotly ------- plots
6.Bokeh ------- plots

====================== Machine Learning =====================

7.Scikit-learn (sklearn) ------ Model development
8.stats packages ------ Linear Regression

====================== Web scraping and Database connection ======

9.Sqlite ------ SQL Connection
10.Beautiful soup ------ scrape the data
11.websocket ------ scrape the data

====================== Deep Learning ==========================

12.Tensorflow ------ Deep learning models development(google)
13.keras
14.pytorch ------ developed by Meta (Facebook)
15.Opencv ------ computer vision(reading and writing images)
16.Pillow ------ reading images

====================== NLP ======================================

17.NLTK ----- Natural language tool kit
18.SpaCy ----- NLP Models
19.wordcloud -----

====================== Web development - API ======================

20.Flask
21.Django
22.FastAPI
23.Gradio

====================== Apps creation ==============================

24.Streamlit

====================== Transformers BERT (NLP models) ==============

25.Transformers ------ Hugging Face (BERT itself was released by Google)

====================== DL: Pretrained Models, Object Detection =======

26.vgg16
27.Mobilenet
28.Yolo ----- Ultralytics

====================== NLP pretrained Models ========================

29.Word2Vec ----- Google
30.GloVe ----- Stanford University

====================== Model save ==================================

31.Pickle
32.Joblib

====================== GenAI LLM ====================================

33.Azure openAI
34.Google Gemini
35.Amazon BedRock
36.LLAMA Meta
37.Langchain Framework
====================== Model Deployment ================================
38.MLFlow

====================== Cloud Services ==================================

39.Azure ML Related packages
40.GCP vertex ai packages
41.Amazon sagemaker packages

====================== AllenNLP ======================================

42.AllenNLP packages

====================== ML using Pyspark ================================

43.MLlib package

====================== Small packages ==================================

44.random
45.math
46.time
47.logging

Step-1 : Import Packages

In [1]: import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Step-2 : Create a DataFrame using List

In [7]: import pandas as pd

pd.DataFrame

Out[7]: pandas.core.frame.DataFrame

In [9]: import pandas as pd

pd.DataFrame()

Out[9]:

In [13]: import pandas as pd

data=pd.DataFrame()
data

# we created a DataFrame
# But no data (no rows and no columns)
# we saved our DataFrame with a name 'data'

Out[13]:
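The empty output above can be checked programmatically. The following is a minimal sketch, assuming only that pandas is installed; it uses the `shape` and `empty` attributes to confirm that the DataFrame has no rows and no columns.

```python
import pandas as pd

data = pd.DataFrame()

print(data.shape)          # (number of rows, number of columns)
print(data.empty)          # True when the frame holds no data
print(list(data.columns))  # column labels, none yet
```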

Step-3 : Provide The Data

In [16]: name=['Navya','Sneha','Yamu']
pd.DataFrame()
Out[16]:

In [18]: name=['Navya','Sneha','Yamu']
pd.DataFrame(name)

Out[18]: 0

0 Navya

1 Sneha

2 Yamu

In [20]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
pd.DataFrame(zip(name,age))

Out[20]: 0 1

0 Navya 20

1 Sneha 21

2 Yamu 22

In [22]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
pd.DataFrame(zip(name,age,city))

Out[22]: 0 1 2

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune

In [24]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
data=[name,age,city]
pd.DataFrame(data)

Out[24]: 0 1 2

0 Navya Sneha Yamu

1 20 21 22

2 Hyd Delhi Pune

In [26]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
df=pd.DataFrame(zip(name,age,city))
df
Out[26]: 0 1 2

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune
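The difference between Out[22] and Out[24] comes from orientation: pandas treats each item of the outer sequence as one row. The sketch below illustrates this with the same lists; the names `by_rows` and `by_zip` are chosen here only for illustration.

```python
import pandas as pd

name = ['Navya', 'Sneha', 'Yamu']
age = [20, 21, 22]
city = ['Hyd', 'Delhi', 'Pune']

# Each inner list becomes one ROW: a row of names, a row of ages, a row of cities.
by_rows = pd.DataFrame([name, age, city])

# zip() regroups the lists into (name, age, city) tuples, one tuple per person,
# so each person becomes one row.
by_zip = pd.DataFrame(list(zip(name, age, city)))

# Transposing the first frame gives the same layout as the zipped one.
print(by_rows.T.values.tolist() == by_zip.values.tolist())
```

Zipping is usually the more convenient choice when each list holds one attribute, because every resulting column then contains a single kind of value.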

Step-4 : Provide The Columns

The column names are passed as a list through the columns argument.

The number of names must exactly match the number of columns in the data.

Here the data has 3 columns, so the list needs 3 names.

In [30]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
df=pd.DataFrame(zip(name,age,city),columns=cols)
df

Out[30]: Names Age City

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune
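To see why the number of names has to match, the sketch below deliberately passes too few names and catches the resulting error. The variable `mismatch_error` is introduced here only for illustration, and the exact error message may differ between pandas versions.

```python
import pandas as pd

name = ['Navya', 'Sneha', 'Yamu']
age = [20, 21, 22]
city = ['Hyd', 'Delhi', 'Pune']

rows = list(zip(name, age, city))   # 3 values per row

try:
    # Only 2 column names for 3 columns of data.
    pd.DataFrame(rows, columns=['Names', 'Age'])
    mismatch_error = None
except (ValueError, AssertionError) as err:
    mismatch_error = err

print(type(mismatch_error).__name__, mismatch_error)

# With 3 names the construction succeeds.
df = pd.DataFrame(rows, columns=['Names', 'Age', 'City'])
print(df.columns.tolist())
```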

Step-5 : Provide the Index

In [33]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=[1,2,3]
df=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df

Out[33]: Names Age City

1 Navya 20 Hyd

2 Sneha 21 Delhi

3 Yamu 22 Pune

In [35]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df
Out[35]: Names Age City

A Navya 20 Hyd

B Sneha 21 Delhi

C Yamu 22 Pune
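A custom index is mostly useful for looking rows up by label. The sketch below is one possible illustration, using `loc` for label-based access and `iloc` for position-based access on the same data as above.

```python
import pandas as pd

df = pd.DataFrame(
    {'Names': ['Navya', 'Sneha', 'Yamu'],
     'Age': [20, 21, 22],
     'City': ['Hyd', 'Delhi', 'Pune']},
    index=['A', 'B', 'C'],
)

row_b = df.loc['B']            # whole row with label 'B' (returned as a Series)
city_b = df.loc['B', 'City']   # single value: row 'B', column 'City'
first_row = df.iloc[0]         # position-based: first row, whatever its label is

print(city_b)
print(first_row['Names'])
```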

Step-6 : How to add a New Column to an existing DataFrame

We already have a DataFrame named df.

It has 3 columns.

Now we want to add a new column, Marks.

We need to create a new array or list.

The length of that list must equal the number of rows.

Here df has 3 rows, so the new list must also have 3 values.

In [ ]: # df['<new column name>']=<list>

In [38]: marks=[100,200,300]
df['Marks']=marks
df

Out[38]: Names Age City Marks

A Navya 20 Hyd 100

B Sneha 21 Delhi 200

C Yamu 22 Pune 300
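The length rule can be checked directly. In this sketch a list that is too short is rejected with a ValueError, while a single scalar is accepted and repeated for every row. The column name `Passed` and the variable `length_error` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {'Names': ['Navya', 'Sneha', 'Yamu'],
     'Age': [20, 21, 22],
     'City': ['Hyd', 'Delhi', 'Pune']},
    index=['A', 'B', 'C'],
)

try:
    df['Marks'] = [100, 200]     # only 2 values for 3 rows
    length_error = None
except ValueError as err:
    length_error = err

print(length_error)

df['Marks'] = [100, 200, 300]    # correct length: one value per row
df['Passed'] = True              # a scalar is repeated for every row
print(df)
```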

Step-7 : Create a DataFrame using empty DataFrame

In the steps above we created the lists first and passed them to pd.DataFrame.

Here we start from an empty DataFrame and add the lists one column at a time.

In [41]: df1=pd.DataFrame()
df1

Out[41]:

In [43]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
df1['Name']=name
df1['Age']=age
df1['City']=city
df1
Out[43]: Name Age City

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune

Step-8 : Create a DataFrame using Dictionary

In [50]: dict1={'Names':['Navya','Sneha','Yamu'],'Age':[20,21,22],'City':['Hyd','Delhi','Pune']}
dict1

Out[50]: {'Names': ['Navya', 'Sneha', 'Yamu'],

'Age': [20, 21, 22],
'City': ['Hyd', 'Delhi', 'Pune']}

In [52]: df2=pd.DataFrame(dict1)
df2

Out[52]: Names Age City

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune

In [54]: df2=pd.DataFrame(dict1,index=['A','B','C'])
df2

Out[54]: Names Age City

A Navya 20 Hyd

B Sneha 21 Delhi

C Yamu 22 Pune

Keys behave as column names.

Values behave as the data of each column, one entry per row.
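As a quick check that the dictionary route and the zip route from the earlier steps describe the same table, the sketch below builds both and compares them. The names `from_zip` and `from_dict` are illustrative only.

```python
import pandas as pd

name = ['Navya', 'Sneha', 'Yamu']
age = [20, 21, 22]
city = ['Hyd', 'Delhi', 'Pune']

# Row-wise construction: one tuple per row, column names given separately.
from_zip = pd.DataFrame(list(zip(name, age, city)),
                        columns=['Names', 'Age', 'City'])

# Column-wise construction: each key is a column name, each value is that column.
from_dict = pd.DataFrame({'Names': name, 'Age': age, 'City': city})

print(from_zip.equals(from_dict))
```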

In [57]: dict2={'Name':'Navya','Age':20,'City':'Hyd'}
dict2

Out[57]: {'Name': 'Navya', 'Age': 20, 'City': 'Hyd'}

In [61]: df3=pd.DataFrame(dict2)
df3
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[61], line 1
----> 1 df3=pd.DataFrame(dict2)
2 df3

File ~\anaconda3\Lib\site-packages\pandas\core\frame.py:778, in DataFrame.__init__(self, data, index, columns, dtype, copy)
772 mgr = self._init_mgr(
773 data, axes={"index": index, "columns": columns}, dtype=dtype, copy=copy
774 )
776 elif isinstance(data, dict):
777 # GH#38939 de facto copy defaults to False only in non-dict cases
--> 778 mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
779 elif isinstance(data, ma.MaskedArray):
780 from numpy.ma import mrecords

File ~\anaconda3\Lib\site-packages\pandas\core\internals\construction.py:503, in dict_to_mgr(data, index, columns, dtype, typ, copy)
499 else:
500 # dtype check to exclude e.g. range objects, scalars
501 arrays = [x.copy() if hasattr(x, "dtype") else x for x in arrays]
--> 503 return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)

File ~\anaconda3\Lib\site-packages\pandas\core\internals\construction.py:114, in arrays_to_mgr(arrays, columns, index, dtype, verify_integrity, typ, consolidate)
111 if verify_integrity:
112 # figure out the index, if necessary
113 if index is None:
--> 114 index = _extract_index(arrays)
115 else:
116 index = ensure_index(index)

File ~\anaconda3\Lib\site-packages\pandas\core\internals\construction.py:667, in _extract_index(data)
664 raise ValueError("Per-column arrays must each be 1-dimensional")
666 if not indexes and not raw_lengths:
--> 667 raise ValueError("If using all scalar values, you must pass an index")
669 if have_series:
670 index = union_indexes(indexes)

ValueError: If using all scalar values, you must pass an index

In [63]: dict2={'Name':'Navya','Age':20,'City':'Hyd'}
pd.DataFrame(dict2,index=[1])

# If using all scalar values, you must pass an index

Out[63]: Name Age City

1 Navya 20 Hyd

In [65]: dict2={'Name':'Navya','Age':20,'City':'Hyd'}
pd.DataFrame(dict2,index=[1,2])
Out[65]: Name Age City

1 Navya 20 Hyd

2 Navya 20 Hyd
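Passing an index is one way around the scalar-values error. Two other common options are sketched below: wrapping the whole dictionary in a list, or wrapping each scalar in a one-element list. Both give a single row with the default index 0; the names `one_row` and `as_lists` are illustrative.

```python
import pandas as pd

dict2 = {'Name': 'Navya', 'Age': 20, 'City': 'Hyd'}

# Option 1: a list of dictionaries, where each dictionary is one row.
one_row = pd.DataFrame([dict2])

# Option 2: turn every scalar into a one-element list, so each value is a column.
as_lists = pd.DataFrame({key: [value] for key, value in dict2.items()})

print(one_row)
print(as_lists)
```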

Array-like data can be represented in 3 ways:

list : plain Python

numpy : array from the NumPy package

tensor : TensorFlow

In [68]: l1=[1,2,3]
import numpy as np
np.array(l1)

Out[68]: array([1, 2, 3])

In [70]: l1=[1,2,3]
l2=[11,12,13]
l1+l2

Out[70]: [1, 2, 3, 11, 12, 13]

In [72]: import numpy as np

np.array(l1)
np.array(l2)
np.array(l1+l2)

Out[72]: array([ 1, 2, 3, 11, 12, 13])

In [74]: l1=[1,2,3]
a=np.array(l1)
l2=[11,12,13]
b=np.array(l2)
a+b

Out[74]: array([12, 14, 16])

In [76]: l1=[1,2,3]
a=np.array(l1)
l2=[11,12,13]
b=np.array(l2)
a*b

Out[76]: array([11, 24, 39])

In [78]: l1=[1,2,3]
a=np.array(l1)
l2=[11,12,13]
b=np.array(l2)
a+b,a*b

Out[78]: (array([12, 14, 16]), array([11, 24, 39]))
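The cells above show that `+` means different things for lists and for NumPy arrays. The sketch below puts the two side by side and adds the `*` operator for comparison; variable names such as `list_repeat` are chosen for the example.

```python
import numpy as np

l1 = [1, 2, 3]
l2 = [11, 12, 13]
a = np.array(l1)
b = np.array(l2)

list_concat = l1 + l2     # lists: + joins the two lists end to end
list_repeat = l1 * 2      # lists: * repeats the list

array_sum = a + b         # arrays: + adds element by element
array_scaled = a * 2      # arrays: * multiplies every element

# Element-wise addition with plain lists needs an explicit loop.
list_sum = [x + y for x, y in zip(l1, l2)]

print(list_concat, list_repeat)
print(array_sum, array_scaled)
print(list_sum)
```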

Step-9 : Drop the column

To drop a column we use the drop method.

DataFrame methods are called on the DataFrame's name with a dot, the same way string methods are called on a string.

drop mainly takes 3 arguments:

1.Label : the column name (or row label) to drop

2.axis

axis = 1 refers to columns

axis = 0 refers to rows

3.inplace

By default (inplace=False) drop returns a modified copy and leaves the original DataFrame unchanged.

Save the returned DataFrame under the same or a different name, or pass inplace=True to modify the original DataFrame directly.

In [ ]: # create a dataframe and drop any column

In [81]: df4=pd.DataFrame()
df4

Out[81]:

In [ ]: df4.drop()   # raises ValueError: drop needs at least one label, index or column to remove

In [87]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df4=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df4

Out[87]: Names Age City

A Navya 20 Hyd

B Sneha 21 Delhi

C Yamu 22 Pune

In [97]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df4=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df4.drop('City',axis=1)
Out[97]: Names Age

A Navya 20

B Sneha 21

C Yamu 22

In [103… name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df4=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df4.drop('A',axis=0)

Out[103… Names Age City

B Sneha 21 Delhi

C Yamu 22 Pune

In [107… name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df4=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df4.drop('A',axis=0,inplace=True)

In [109… df4

Out[109… Names Age City

B Sneha 21 Delhi

C Yamu 22 Pune

In [4]: name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df4=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df4.drop('A',axis=0,inplace=False)

Out[4]: Names Age City

B Sneha 21 Delhi

C Yamu 22 Pune
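Besides `axis`, `drop` also accepts the keyword arguments `columns=` and `index=`, which some readers find easier to remember. The sketch below uses them and also shows the reassignment pattern as an alternative to `inplace=True`; the names `no_city` and `no_a` are illustrative.

```python
import pandas as pd

df4 = pd.DataFrame(
    {'Names': ['Navya', 'Sneha', 'Yamu'],
     'Age': [20, 21, 22],
     'City': ['Hyd', 'Delhi', 'Pune']},
    index=['A', 'B', 'C'],
)

no_city = df4.drop(columns=['City'])   # same result as df4.drop('City', axis=1)
no_a = df4.drop(index=['A'])           # same result as df4.drop('A', axis=0)

# drop returned new DataFrames, so the original still has every row and column.
print(df4.shape)

# Reassigning to the same name keeps the change without inplace=True.
df4 = df4.drop(columns=['City'])
print(df4.columns.tolist())
```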

In [ ]: # create two dataframes df1 and df2

# add those dataframes

############# df1 ###########

Names Age City
Ramesh 20 Hyd
############ df2 ###########
Names Age City
Suresh 21 Blr

Names Age City

Ramesh 20 Hyd
Suresh 21 Blr

append (DataFrame.append was removed in pandas 2.0; use concat instead)
concat
join

In [34]: dict1={'Name':'Ramesh','Age':20,'City':'Hyd'}
df5=pd.DataFrame(dict1,index=[1])
dict2={'Name':'Suresh','Age':21,'City':'Blr'}
df6=pd.DataFrame(dict2,index=[2])
result=pd.concat([df5,df6],ignore_index=True)
print(result)

Name Age City

0 Ramesh 20 Hyd
1 Suresh 21 Blr
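The `ignore_index=True` argument is what renumbers the rows from 0 in the result above. The sketch below compares both settings on the same two DataFrames; the names `kept` and `renumbered` are chosen for the example.

```python
import pandas as pd

df5 = pd.DataFrame({'Name': 'Ramesh', 'Age': 20, 'City': 'Hyd'}, index=[1])
df6 = pd.DataFrame({'Name': 'Suresh', 'Age': 21, 'City': 'Blr'}, index=[2])

# Default: each row keeps the index label it had in its source DataFrame.
kept = pd.concat([df5, df6])

# ignore_index=True discards the old labels and numbers the rows 0, 1, ...
renumbered = pd.concat([df5, df6], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```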

Step-10 : How to overwrite an existing column

We already have a DataFrame.

Now we want to replace all the values of a specific column with new values.

First create a list with the new values.

Then assign it to the column, the same way a new column is created.

df[new col]=data creates a new column

df[old col]=new data overwrites the old column

In [8]: df4['Age']=[33,44,34]
df4

Out[8]: Names Age City

A Navya 33 Hyd

B Sneha 44 Delhi

C Yamu 34 Pune

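In [48]: df4['Names']=['anshu','chinni','adya']
df4

# Note: this cell only overwrites 'Names'; the extra 'Name' column in the output
# is not created here and presumably comes from an earlier cell that is not shown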

Out[48]: Names Age City Name

A anshu 33 Hyd anshu

B chinni 44 Delhi chinni

C adya 34 Pune adya
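The new values do not have to be typed out as a list; they can also be computed from the column's current values. The sketch below is one such case, assuming the same three-row data as earlier: it adds 1 to every age and upper-cases every city.

```python
import pandas as pd

df = pd.DataFrame(
    {'Names': ['Navya', 'Sneha', 'Yamu'],
     'Age': [20, 21, 22],
     'City': ['Hyd', 'Delhi', 'Pune']},
    index=['A', 'B', 'C'],
)

# Overwrite 'Age' with a value computed from the old 'Age' column.
df['Age'] = df['Age'] + 1

# Overwrite 'City' using a string method applied to every entry.
df['City'] = df['City'].str.upper()

print(df)
```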

Step-11 : How to save the DataFrame

We can save the DataFrame in 2 formats:

csv : comma-separated values

excel

For csv : to_csv extension = .csv

For excel : to_excel extension = .xlsx

In [51]: # create a dataframe

name=['Navya','Sneha','Yamu']
age=[20,21,22]
city=['Hyd','Delhi','Pune']
cols=['Names','Age','City']
id=['A','B','C']
df=pd.DataFrame(zip(name,age,city),index=id,columns=cols)
df

Out[51]: Names Age City

A Navya 20 Hyd

B Sneha 21 Delhi

C Yamu 22 Pune

Csv Format

In [56]: # DataFramename.methodname
# where you want to save
# in what name you want to save

df.to_csv('data12.csv')

Excel sheet

In [61]: df.to_excel('data13.xlsx')

Step-12 : Read the data

read_csv

read_excel

Both are available in pandas.

In [65]: pd.read_csv('data12.csv')
Out[65]: Unnamed: 0 Names Age City

0 A Navya 20 Hyd

1 B Sneha 21 Delhi

2 C Yamu 22 Pune

In [69]: pd.read_excel('data13.xlsx')

Out[69]: Unnamed: 0 Names Age City

0 A Navya 20 Hyd

1 B Sneha 21 Delhi

2 C Yamu 22 Pune

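Step-13 : How to avoid the extra column

When saving the data, to_csv and to_excel accept an argument named index.

Pass index=False so the index is not written to the file; otherwise it is read back as the extra 'Unnamed: 0' column.

In [74]: # Save under a different file name and pass index=False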

df.to_csv('data21.csv',index=False)
pd.read_csv('data21.csv')

Out[74]: Names Age City

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune

In [76]: df.to_excel('data31.xlsx',index=False)
pd.read_excel('data31.xlsx')

Out[76]: Names Age City

0 Navya 20 Hyd

1 Sneha 21 Delhi

2 Yamu 22 Pune
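If the index labels are worth keeping, another option is to save them as usual and tell `read_csv` which column holds them with `index_col=0`. The sketch below compares the three cases; it writes to a temporary directory, and the file names are made up for the example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {'Names': ['Navya', 'Sneha', 'Yamu'],
     'Age': [20, 21, 22],
     'City': ['Hyd', 'Delhi', 'Pune']},
    index=['A', 'B', 'C'],
)

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, 'with_index.csv')        # hypothetical name
    without_index = os.path.join(tmp, 'without_index.csv')  # hypothetical name

    df.to_csv(with_index)                    # index written as the first column
    df.to_csv(without_index, index=False)    # index left out of the file

    default_read = pd.read_csv(with_index)               # index shows up as a column
    restored = pd.read_csv(with_index, index_col=0)      # first column used as index
    dropped = pd.read_csv(without_index)                 # labels A, B, C are gone

print(default_read.columns.tolist())
print(restored.index.tolist())
print(dropped.columns.tolist())
```

Using `index=False` is the simpler choice when the index is just the default 0, 1, 2 numbering; `index_col=0` is worth considering when the labels carry meaning.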
