0% found this document useful (0 votes)
3 views5 pages

Merge Append Python Code

The document provides a guide on merging datasets using Pandas in Python, specifically focusing on user usage and device information. It demonstrates how to combine two DataFrames, user_usage and user_device, to analyze usage patterns across different devices. The document also includes examples of reading CSV files and performing group operations on the merged data.

Uploaded by

paulrajarshi7
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views5 pages

Merge Append Python Code

The document provides a guide on merging datasets using Pandas in Python, specifically focusing on user usage and device information. It demonstrates how to combine two DataFrames, user_usage and user_device, to analyze usage patterns across different devices. The document also includes examples of reading CSV files and performing group operations on the merged data.

Uploaded by

paulrajarshi7
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

2/24/2020 4.

3_Merge_append - Jupyter Notebook

Merge DataFrames

“Merging” two datasets is the process of bringing two datasets together into one, and aligning the rows from
each based on common attributes or columns.

In [1]:

import numpy as np
import pandas as pd

In [2]:

user_usage = pd.read_csv('data/user_usage.csv')
user_usage.head(3)

Out[2]:

outgoing_mins_per_month outgoing_sms_per_month monthly_mb use_id

0 21.97 4.82 1557.33 22787

1 1710.08 136.88 7267.55 22788

2 1710.08 136.88 7267.55 22789

In [6]:

user_usage.shape

Out[6]:

(240, 4)

In [3]:

user_device = pd.read_csv('data/user_device.csv')
user_device.head(3)

Out[3]:

use_id user_id platform platform_version device use_type_id

0 22782 26980 ios 10.2 iPhone7,2 2

1 22783 29628 android 6.0 Nexus 5 3

2 22784 28473 android 5.1 SM-G903F 1

In [7]:

user_device.shape

Out[7]:

(272, 6)

localhost:8888/notebooks/Machine Learning/Data Analytics/4.3_Merge_append.ipynb 1/5


2/24/2020 4.3_Merge_append - Jupyter Notebook

In [4]:

android_devices = pd.read_csv('data/android_devices.csv')
android_devices.head(3)

Out[4]:

Unnamed: Unnamed: U
use_id user_id platform platform_version device use_type_id
6 7

0 22782 26980 ios 10.2 iPhone7,2 2 NaN NaN

1 22783 29628 android 6.0 Nexus 5 3 NaN NaN

SM-
2 22784 28473 android 5.1 1 NaN NaN
G903F

Sample problem

We would like to determine if the usage patterns for users differ between different devices. For example, do
users using Samsung devices use more call minutes than those using LG devices?

We want to form a single dataframe with columns for user usage figures (calls per month, sms per month etc)
and also columns with device information (model, manufacturer, etc). We will need to “merge” (or “join”) our
sample datasets together into one single dataset for analysis.

Merging user_usage with user_devices

Lets see how we can correctly add the “device” and “platform” columns to the user_usage dataframe using the
Pandas Merge command.

In [5]:

result = pd.merge(user_usage,
user_device[['use_id', 'platform', 'device']],
on='use_id')
result.head()

Out[5]:

outgoing_mins_per_month outgoing_sms_per_month monthly_mb use_id platform device

GT-
0 21.97 4.82 1557.33 22787 android
I9505

SM-
1 1710.08 136.88 7267.55 22788 android
G930F

SM-
2 1710.08 136.88 7267.55 22789 android
G930F

3 94.46 35.17 519.12 22790 android D2303

SM-
4 71.59 79.26 1557.33 22792 android
G361F

localhost:8888/notebooks/Machine Learning/Data Analytics/4.3_Merge_append.ipynb 2/5


2/24/2020 4.3_Merge_append - Jupyter Notebook

In [8]:

result.shape

Out[8]:

(159, 6)

localhost:8888/notebooks/Machine Learning/Data Analytics/4.3_Merge_append.ipynb 3/5


2/24/2020 4.3_Merge_append - Jupyter Notebook

In [21]:

agg_device = result.groupby("device").count()['platform']
agg_device

Out[21]:

device
A0001 2
C6603 1
D2303 2
D5503 2
D5803 1
D6603 2
E6653 1
EVA-L09 1
F3111 4
GT-I8190N 1
GT-I9195 3
GT-I9300 2
GT-I9505 11
GT-I9506 1
GT-I9515 3
GT-N7100 2
HTC Desire 510 5
HTC Desire 530 1
HTC Desire 620 1
HTC Desire 626 2
HTC Desire 825 3
HTC One M9 1
HTC One S 2
HTC One mini 2 3
HTC One_M8 1
HUAWEI CUN-L01 1
HUAWEI VNS-L31 1
LG-H815 1
Lenovo K51c78 1
Moto G (4) 4
MotoE2(4G-LTE) 1
Nexus 5X 1
ONE A2003 2
ONEPLUS A3003 7
SM-A300FU 4
SM-A310F 2
SM-A500FU 1
SM-G360F 2
SM-G361F 5
SM-G531F 1
SM-G800F 1
SM-G900F 30
SM-G903F 2
SM-G920F 5
SM-G925F 4
SM-G930F 3
SM-G935F 5
SM-J320FN 5
SM-N9005 1
SM-N910F 5
VF-795 1
Vodafone Smart ultra 6 1
X11 2
localhost:8888/notebooks/Machine Learning/Data Analytics/4.3_Merge_append.ipynb 4/5
2/24/2020 4.3_Merge_append - Jupyter Notebook

iPhone6,2 1
iPhone7,2 1
Name: platform, dtype: int64

Append Dataframes

Join df1 and df2 into a single DataFrame called df

In [23]:

#df = df1.append(df2)

In [ ]:

localhost:8888/notebooks/Machine Learning/Data Analytics/4.3_Merge_append.ipynb 5/5

You might also like