This document is a Jupyter notebook that processes a Facebook posts dataset with pandas: it reads a CSV file, inspects the structure and missing values, creates and merges column subsets, sorts posts by likes, removes outliers with an IQR rule, and draws boxplots of selected columns to analyze user engagement on posts.


TI18_DSBDA_4th

February 5, 2025

[5]: import pandas as pd

[6]: df = pd.read_csv("/home/dpl11/facebook.csv")

[7]: df.head()

[7]:    Unnamed: 0  Page total likes    Type  Category  Post Month  Post Weekday  \
     0           0            139441   Photo         2          12             4
     1           1            139441  Status         2          12             3
     2           2            139441   Photo         3          12             3
     3           3            139441   Photo         2          12             2
     4           4            139441   Photo         2          12             2

        Post Hour  Paid  Lifetime Post Total Reach  \
     0          3   0.0                       2752
     1         10   0.0                      10460
     2          3   0.0                       2413
     3         10   1.0                      50128
     4          3   0.0                       7244

        Lifetime Post Total Impressions  Lifetime Engaged Users  \
     0                             5091                     178
     1                            19057                    1457
     2                             4373                     177
     3                            87991                    2211
     4                            13594                     671

        Lifetime Post Consumers  Lifetime Post Consumptions  \
     0                      109                         159
     1                     1361                        1674
     2                      113                         154
     3                      790                        1119
     4                      410                         580

        Lifetime Post Impressions by people who have liked your Page  \
     0                                                           3078
     1                                                          11710
     2                                                           2812
     3                                                          61027
     4                                                           6228

        Lifetime Post reach by people who like your Page  \
     0                                               1640
     1                                               6112
     2                                               1503
     3                                              32048
     4                                               3200

        Lifetime People who have liked your Page and engaged with your post  \
     0                                                                   119
     1                                                                  1108
     2                                                                   132
     3                                                                  1386
     4                                                                   396

        comment    like  share  Total Interactions
     0        4    79.0   17.0                 100
     1        5   130.0   29.0                 164
     2        0    66.0   14.0                  80
     3       58  1572.0  147.0                1777
     4       19   325.0   49.0                 393
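The first column, Unnamed: 0, is just the row index that was written into the CSV when the file was saved. A minimal sketch of how it could be removed (an optional step that is not applied in this notebook, so later outputs still show 20 columns; df_clean is a hypothetical name):

# 'Unnamed: 0' is a leftover row index from the CSV export; it can be dropped,
# or the file can be re-read with index_col=0.
df_clean = df.drop(columns=['Unnamed: 0'])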

[8]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 20 columns):
 #   Column                                                                Non-Null Count  Dtype
---  ------                                                                --------------  -----
 0   Unnamed: 0                                                            500 non-null    int64
 1   Page total likes                                                      500 non-null    int64
 2   Type                                                                  500 non-null    object
 3   Category                                                              500 non-null    int64
 4   Post Month                                                            500 non-null    int64
 5   Post Weekday                                                          500 non-null    int64
 6   Post Hour                                                             500 non-null    int64
 7   Paid                                                                  499 non-null    float64
 8   Lifetime Post Total Reach                                             500 non-null    int64
 9   Lifetime Post Total Impressions                                       500 non-null    int64
 10  Lifetime Engaged Users                                                500 non-null    int64
 11  Lifetime Post Consumers                                               500 non-null    int64
 12  Lifetime Post Consumptions                                            500 non-null    int64
 13  Lifetime Post Impressions by people who have liked your Page          500 non-null    int64
 14  Lifetime Post reach by people who like your Page                      500 non-null    int64
 15  Lifetime People who have liked your Page and engaged with your post   500 non-null    int64
 16  comment                                                               500 non-null    int64
 17  like                                                                  499 non-null    float64
 18  share                                                                 496 non-null    float64
 19  Total Interactions                                                    500 non-null    int64
dtypes: float64(3), int64(16), object(1)
memory usage: 78.2+ KB

[9]: df.shape

[9]: (500, 20)

[10]: df.dtypes

[10]: Unnamed: 0                                                               int64
      Page total likes                                                         int64
      Type                                                                    object
      Category                                                                 int64
      Post Month                                                               int64
      Post Weekday                                                             int64
      Post Hour                                                                int64
      Paid                                                                   float64
      Lifetime Post Total Reach                                                int64
      Lifetime Post Total Impressions                                          int64
      Lifetime Engaged Users                                                   int64
      Lifetime Post Consumers                                                  int64
      Lifetime Post Consumptions                                               int64
      Lifetime Post Impressions by people who have liked your Page             int64
      Lifetime Post reach by people who like your Page                         int64
      Lifetime People who have liked your Page and engaged with your post      int64
      comment                                                                  int64
      like                                                                   float64
      share                                                                  float64
      Total Interactions                                                       int64
      dtype: object

[11]: df.isnull().sum()

[11]: Unnamed: 0 0
Page total likes 0
Type 0
Category 0
Post Month 0
Post Weekday 0
Post Hour 0
Paid 1
Lifetime Post Total Reach 0
Lifetime Post Total Impressions 0
Lifetime Engaged Users 0
Lifetime Post Consumers 0
Lifetime Post Consumptions 0
Lifetime Post Impressions by people who have liked your Page 0
Lifetime Post reach by people who like your Page 0
Lifetime People who have liked your Page and engaged with your post 0
comment 0
like 1
share 4
Total Interactions 0
dtype: int64
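Only a handful of values are missing: one in Paid, one in like, and four in share. A minimal sketch of filling them, assuming 0 is an acceptable default for these count-like columns (this step is not part of the original run, so later outputs still show the NaNs; df_filled is a hypothetical name):

# Fill the few missing values with 0 (assumption: 0 is a sensible default here)
df_filled = df.fillna({'Paid': 0, 'like': 0, 'share': 0})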

[12]: # Creating a subset of the data
      df_subset1 = df[['like', 'share']]

[13]: df_subset1

[13]:       like  share
      0     79.0   17.0
      1    130.0   29.0
      2     66.0   14.0
      3   1572.0  147.0
      4    325.0   49.0
      ..     ...    ...
      495   53.0   26.0
      496   53.0   22.0
      497   93.0   18.0
      498   91.0   38.0
      499   91.0   28.0

      [500 rows x 2 columns]

[14]: df_subset2 = df[['comment','Type']]

[15]: df_subset2

[15]:      comment    Type
      0          4   Photo
      1          5  Status
      2          0   Photo
      3         58   Photo
      4         19   Photo
      ..       ...     ...
      495        5   Photo
      496        0   Photo
      497        4   Photo
      498        7   Photo
      499        0   Photo

      [500 rows x 2 columns]

[16]: # Merging the DataFrames
      merged_data = pd.merge(df_subset2, df_subset1, left_on='comment', right_on='like')
      merged_data

[16]: comment Type like share


0 4 Photo 4.0 2.0
1 4 Photo 4.0 1.0
2 4 Photo 4.0 0.0
3 4 Photo 4.0 1.0
4 4 Status 4.0 2.0
… … … … …
1462 56 Photo 56.0 17.0
1463 56 Photo 56.0 8.0
1464 56 Photo 56.0 12.0
1465 56 Photo 56.0 9.0
1466 56 Photo 56.0 25.0

[1467 rows x 4 columns]
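Because the merge uses left_on='comment' and right_on='like', it is an inner join on values: each row of df_subset2 is paired with every row of df_subset1 whose like count equals its comment count, which is why the result has 1467 rows instead of 500. If the intent was simply to place the two subsets side by side again, a positional combination would do that; a minimal sketch of this alternative (not what the notebook does, recombined is a hypothetical name):

# Recombine the two subsets column-wise by row index (500 rows, 4 columns)
# instead of value-joining 'comment' against 'like'
recombined = pd.concat([df_subset2, df_subset1], axis=1)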

[17]: # Sort by 'like' in descending order
sorted_df = df.sort_values(by='like', ascending=False)
print("\nSorted Data by Likes (Descending):")
print(sorted_df)

Sorted Data by Likes (Descending):


Unnamed: 0 Page total likes Type Category Post Month Post Weekday \
244 244 130791 Photo 2 7 3
379 379 111620 Photo 3 4 1
349 349 117764 Photo 3 5 5
168 168 135428 Photo 1 9 3
3 3 139441 Photo 2 12 2
.. … … … … … …
21 21 138414 Photo 1 12 7
100 100 137020 Photo 1 10 4
441 441 98195 Photo 1 3 5
417 417 104070 Photo 1 3 3
111 111 136736 Photo 1 10 6

Post Hour Paid Lifetime Post Total Reach \


244 5 1.0 180480
379 14 1.0 105632
349 13 0.0 81856
168 10 0.0 41984
3 10 1.0 50128
.. … … …
21 10 0.0 1384
100 9 1.0 1357
441 4 1.0 1845
417 10 0.0 1874
111 8 0.0 1261

Lifetime Post Total Impressions Lifetime Engaged Users \


244 319133 8072
379 147918 3984
349 124753 3000
168 68290 3370
3 87991 2211
.. … …
21 2467 15
100 2453 37
441 2670 9
417 2474 25
111 2158 37

Lifetime Post Consumers Lifetime Post Consumptions \

244 4010 6242
379 2254 3391
349 1637 2718
168 2420 4074
3 790 1119
.. … …
21 15 20
100 37 55
441 9 9
417 25 31
111 37 49

Lifetime Post Impressions by people who have liked your Page \


244 108752
379 48575
349 52477
168 34802
3 61027
.. …
21 2196
100 2154
441 1614
417 1483
111 1911

Lifetime Post reach by people who like your Page \


244 51456
379 27328
349 27392
168 20928
3 32048
.. …
21 1172
100 1120
441 1008
417 1062
111 1077

Lifetime People who have liked your Page and engaged with your post \
244 3316
379 1936
349 1756
168 2126
3 1386
.. …
21 15
100 32
441 9

417 15
111 33

comment like share Total Interactions


244 372 5172.0 790.0 6334
379 51 1998.0 128.0 2177
349 45 1639.0 122.0 1806
168 144 1622.0 208.0 1974
3 58 1572.0 147.0 1777
.. … … … …
21 0 0.0 0.0 0
100 0 0.0 0.0 0
441 0 0.0 0.0 0
417 0 0.0 0.0 0
111 0 NaN NaN 0

[500 rows x 20 columns]
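By default sort_values places rows with a missing 'like' value last (na_position='last'), which is why row 111 with NaN appears at the bottom. If only the most-liked posts are needed, nlargest is a compact alternative; a minimal sketch (not part of the original notebook):

# Top 5 posts by likes; nlargest skips rows whose 'like' value is NaN
top5 = df.nlargest(5, 'like')
print(top5[['Type', 'like', 'share', 'Total Interactions']])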

[18]: # Transpose the DataFrame
      merged_data.transpose()

[18]: 0 1 2 3 4 5 6 7 8 \
comment 4 4 4 4 4 4 4 4 4
Type Photo Photo Photo Photo Status Status Status Status Status
like 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
share 2.0 1.0 0.0 1.0 2.0 1.0 0.0 1.0 2.0

9 … 1457 1458 1459 1460 1461 1462 1463 1464 \


comment 4 … 51 51 51 146 146 56 56 56
Type Status … Photo Photo Photo Photo Photo Photo Photo Photo
like 4.0 … 51.0 51.0 51.0 146.0 146.0 56.0 56.0 56.0
share 1.0 … 11.0 6.0 6.0 9.0 15.0 17.0 8.0 12.0

1465 1466
comment 56 56
Type Photo Photo
like 56.0 56.0
share 9.0 25.0

[4 rows x 1467 columns]

[19]: merged_data.T  # .T is shorthand for .transpose()

[19]: 0 1 2 3 4 5 6 7 8 \
comment 4 4 4 4 4 4 4 4 4
Type Photo Photo Photo Photo Status Status Status Status Status
like 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0

share 2.0 1.0 0.0 1.0 2.0 1.0 0.0 1.0 2.0

9 … 1457 1458 1459 1460 1461 1462 1463 1464 \


comment 4 … 51 51 51 146 146 56 56 56
Type Status … Photo Photo Photo Photo Photo Photo Photo Photo
like 4.0 … 51.0 51.0 51.0 146.0 146.0 56.0 56.0 56.0
share 1.0 … 11.0 6.0 6.0 9.0 15.0 17.0 8.0 12.0

1465 1466
comment 56 56
Type Photo Photo
like 56.0 56.0
share 9.0 25.0

[4 rows x 1467 columns]

[20]: # Shape
      df.shape

[20]: (500, 20)

[21]: # Reshape: build a small example frame
      df_temp = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                              'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                              'baz': [1, 2, 3, 4, 5, 6],
                              'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

[22]: df_temp

[22]: foo bar baz zoo


0 one A 1 x
1 one B 2 y
2 one C 3 z
3 two A 4 q
4 two B 5 w
5 two C 6 t
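df_temp is only constructed here; the reshape it is usually used to demonstrate is a pivot from long to wide form. A minimal sketch of that step (not part of the original notebook; pivoted is a hypothetical name):

# Pivot: one row per 'foo', one column per 'bar', cell values taken from 'baz'
pivoted = df_temp.pivot(index='foo', columns='bar', values='baz')
print(pivoted)  # a 2 x 3 table: rows one/two, columns A/B/C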

[23]: import numpy as np
      import matplotlib.pyplot as plt
      import seaborn as sns
      # scikit-learn imports (not used in the cells that follow)
      from sklearn.preprocessing import LabelEncoder
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import accuracy_score, confusion_matrix

9
[25]: def remove_outliers(column):
          # IQR rule: values more than 1.5 * IQR beyond the quartiles are outliers
          Q1 = column.quantile(0.25)
          Q3 = column.quantile(0.75)
          IQR = Q3 - Q1
          threshold = 1.5 * IQR
          outlier_mask = (column < Q1 - threshold) | (column > Q3 + threshold)
          return column[~outlier_mask]

[27]: df = df.drop_duplicates()

[29]: col_name = ['Category', 'Post Month', 'Post Weekday', 'Post Hour', 'Paid']

      for col in col_name:
          df[col] = remove_outliers(df[col])
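Note that remove_outliers returns a shorter Series, so the assignment df[col] = remove_outliers(df[col]) relies on pandas index alignment: positions where an outlier was removed become NaN rather than the whole row being dropped. A minimal sketch of an alternative that drops the offending rows instead, applied to the original data in place of the loop above (a hypothetical variant, not what this notebook does):

# Keep only rows whose values fall inside the 1.5 * IQR fences for every column
mask = pd.Series(True, index=df.index)
for col in col_name:
    Q1, Q3 = df[col].quantile(0.25), df[col].quantile(0.75)
    iqr = Q3 - Q1
    mask &= df[col].between(Q1 - 1.5 * iqr, Q3 + 1.5 * iqr)
df_no_outliers = df[mask]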

[30]: plt.figure(figsize=(10, 6))  # adjust the figure size if needed
      for col in col_name:
          sns.boxplot(data=df[col])
          plt.title(col)
          plt.show()

[Output: one boxplot figure per column in col_name (Category, Post Month, Post Weekday, Post Hour, Paid).]
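The loop above produces one figure per column. A minimal sketch of an optional variant that draws all five boxplots in a single figure with subplots (not from the original notebook):

# One figure, one subplot per column
fig, axes = plt.subplots(1, len(col_name), figsize=(15, 4))
for ax, col in zip(axes, col_name):
    sns.boxplot(y=df[col], ax=ax)
    ax.set_title(col)
plt.tight_layout()
plt.show()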
[ ]:

