
Sma 3

The document outlines a Python script for cleaning and processing social media data from a CSV file, focusing on Facebook and Instagram posts. It includes steps for data cleaning, feature engineering, and storing the cleaned data in a MongoDB database. Additionally, it provides a sample dataset and demonstrates how to verify data insertion and calculate average engagement rates by platform.

Uploaded by Ganesh Panigrahi

EXPERIMENT NO: 3

import pandas as pd
from pymongo import MongoClient

# Step 1: Load the raw social media data (assuming a CSV file)
df = pd.read_csv('social_media_data.csv')

# Step 2: Data Cleaning

# 2.1 Handle missing values
# For simplicity, we can drop rows with missing values (or use imputation)
df.dropna(inplace=True)
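Dropping rows is the simplest option, but it discards data; when rows are scarce, median imputation is a common alternative. A minimal sketch with made-up values (not part of the original dataset):

```python
import pandas as pd
import numpy as np

# Toy frame with gaps, mirroring the numeric columns used above
toy = pd.DataFrame({
    "likes": [350, np.nan, 300],
    "shares": [50, 75, np.nan],
})

# Fill numeric gaps with each column's median instead of dropping rows
toy = toy.fillna(toy.median(numeric_only=True))
print(toy["likes"].tolist())  # [350.0, 325.0, 300.0]
```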

# 2.2 Remove duplicates
df.drop_duplicates(inplace=True)

# 2.3 Filter data (e.g., we may only be interested in Facebook and Instagram posts)
df = df[df['platform'].isin(['Facebook', 'Instagram'])]

# 2.4 Correct column types (e.g., ensure 'post_date' is in datetime format)
df['post_date'] = pd.to_datetime(df['post_date'])
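By default `pd.to_datetime` raises on unparseable strings; if the CSV might contain malformed dates, `errors="coerce"` turns them into `NaT` so they can be filtered out afterwards. A sketch with invented values:

```python
import pandas as pd

dates = pd.Series(["2024-01-01", "not a date"])
parsed = pd.to_datetime(dates, errors="coerce")  # malformed rows become NaT
print(parsed.isna().tolist())  # [False, True]
```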

# 2.5 Remove posts with no engagement at all (zero likes, shares, and comments)
df = df[(df['likes'] > 0) | (df['shares'] > 0) | (df['comments'] > 0)]

# Step 3: Feature Engineering (Optional)

# 3.1 Calculate engagement rate: (likes + shares + comments) / followers
df['engagement_rate'] = (df['likes'] + df['shares'] + df['comments']) / df['followers']
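As a sanity check, with the first row of the sample data generated below (350 likes, 50 shares, 120 comments, 15000 followers), the rate works out to (350 + 50 + 120) / 15000 ≈ 0.0347, i.e. roughly 3.5% engagement:

```python
likes, shares, comments, followers = 350, 50, 120, 15000
engagement_rate = (likes + shares + comments) / followers
print(round(engagement_rate, 4))  # 0.0347
```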

# 3.2 Extract relevant date parts (optional)
df['year'] = df['post_date'].dt.year
df['month'] = df['post_date'].dt.month
df['day'] = df['post_date'].dt.day
df['weekday'] = df['post_date'].dt.weekday
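Note that `.dt.weekday` is zero-based: Monday = 0 through Sunday = 6. For example, 2024-01-01 (the first date in the sample data) was a Monday:

```python
import pandas as pd

d = pd.to_datetime(pd.Series(["2024-01-01", "2024-01-07"]))
print(d.dt.weekday.tolist())  # [0, 6]  (Monday, Sunday)
```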

# Step 4: Connect to MongoDB and store the cleaned data

# Create a connection to MongoDB (local or cloud)
client = MongoClient('mongodb://localhost:27017/') # MongoDB connection string
db = client['social_media_data_db']  # Database (created lazily on first insert)
collection = db['posts']  # Collection (created lazily on first insert)

# Convert the cleaned dataframe to a dictionary format
records = df.to_dict(orient='records')
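`orient='records'` produces one dict per row, which is exactly the shape `insert_many` expects. A quick illustration with a hypothetical one-row frame:

```python
import pandas as pd

demo = pd.DataFrame({"platform": ["Facebook"], "likes": [350]})
print(demo.to_dict(orient="records"))  # [{'platform': 'Facebook', 'likes': 350}]
```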

# Insert the cleaned data into MongoDB
collection.insert_many(records)

# Step 5: Verify data insertion
print(f"Inserted {len(records)} records into MongoDB.")

# You can verify by querying MongoDB directly or using Python:
result = collection.find().limit(5)  # Display the first 5 records
for record in result:
    print(record)

# Print the first 5 Facebook posts
for post in collection.find({"platform": "Facebook"}).limit(5):
    print(post)

# Print the average engagement rate by platform
avg_engagement = collection.aggregate([
    {"$group": {"_id": "$platform",
                "avg_engagement_rate": {"$avg": "$engagement_rate"}}}
])
for record in avg_engagement:
    print(record)
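The same per-platform average can be cross-checked in pandas without a running MongoDB instance; a sketch with invented engagement rates:

```python
import pandas as pd

rates = pd.DataFrame({
    "platform": ["Facebook", "Instagram", "Facebook"],
    "engagement_rate": [0.030, 0.060, 0.050],
})
# Equivalent of the $group/$avg pipeline above
avg = rates.groupby("platform")["engagement_rate"].mean()
print(avg.to_dict())
```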

dbcreate.py
import pandas as pd

# Define the sample data as a dictionary
data = {
"post_date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05",
"2024-01-06", "2024-01-07", "2024-01-08", "2024-01-09", "2024-01-10"],
"platform": ["Facebook", "Instagram", "Twitter", "Instagram", "Facebook",
"Twitter", "Instagram", "Facebook", "Twitter", "Instagram"],
"likes": [350, 500, 300, 450, 400, 280, 550, 600, 320, 470],
"shares": [50, 75, 40, 80, 60, 50, 90, 100, 45, 85],
"comments": [120, 200, 100, 150, 110, 90, 250, 130, 110, 180],
"followers": [15000, 12000, 18000, 13000, 16000, 17000, 14000, 15000, 16000, 12500],
"hashtags": ["#business #growth", "#marketing #innovation", "#growth #success",
"#socialmedia #strategy", "#branding #entrepreneur",
"#leadership #business", "#digitalmarketing #startup",
"#innovation #content", "#productivity #success", "#inspiration #growth"]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Save the DataFrame to a CSV file
df.to_csv("social_media_data.csv", index=False)

print("Sample social media data saved to 'social_media_data.csv'")
