Python Assignment 03- Sivakumar -91241460014

The document outlines an assignment for C. Sivakumar, focusing on creating datasets in CSV or JSON format and applying various Pandas and Matplotlib methods. It includes source code for reading datasets, handling missing values, and visualizing data through plots. The assignment is due on August 19, 2024.

Uploaded by

Deepa Sivakumar

Python Assignment 03 - July 2024

Name: C. Sivakumar
Registration Code: 91241460014
Subject Code: 222MDS2405
Submission Date: 19-08-2024

1. Create your own dataset in a CSV or JSON file and apply all Pandas methods.

Source Code:
import pandas as pd

#Read the dataset


df = pd.read_csv(r'C:\Users\anil1.kn\OneDrive - Reliance Corporate IT Park Limited\Desktop\Assignments MSC DS\customers.csv')
df.head()

#Read the dataset through URL


df_url = pd.read_csv("https://raw.githubusercontent.com/kimfetti/Videos/master/Pandas_Tips/data/customers.csv")
df_url.head()

#Index by a column: index_col


df = pd.read_csv(
    'customers.csv',
    index_col = 'ID'
)
df.head()

# Specify missing value characters: na_values

df = pd.read_csv(
    r'C:\Users\anil1.kn\OneDrive - Reliance Corporate IT Park Limited\Desktop\Assignments MSC DS\customers.csv',
    index_col = 'ID',   #index by ID column
    na_values = '?'     #treat ? characters as missing
)

df.head()

df.Phone.value_counts()

df.LTV.value_counts()

df.info()

df.Phone.value_counts()

df.LTV.value_counts()
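
Once "?" is read as a missing value, value_counts() no longer lists it, because it skips missing entries by default. The sketch below shows one way to keep the missing entries visible. It assumes a small made-up table read from an in-memory string: only the column names ID, Phone, and LTV come from the assignment, and every row is hypothetical.

```python
import io

import pandas as pd

# Hypothetical rows standing in for customers.csv; only the column
# names (ID, Phone, LTV) are taken from the assignment.
csv_text = (
    "ID,Phone,LTV\n"
    "1,555-0101,1200\n"
    "2,?,850\n"
    "3,555-0101,?\n"
    "4,?,430\n"
)

df = pd.read_csv(io.StringIO(csv_text), index_col="ID", na_values="?")

# Default behaviour: missing values are left out of the counts.
phone_counts = df["Phone"].value_counts()

# dropna=False adds a row for the missing values.
phone_counts_with_nan = df["Phone"].value_counts(dropna=False)

# Number of missing values in each column.
missing_per_column = df.isna().sum()

print(phone_counts)
print(phone_counts_with_nan)
print(missing_per_column)
```

Comparing the two counts is a quick check that na_values picked up the intended placeholder character.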

#Limiting the number of rows

df.shape

df.tail()

df = pd.read_csv(
    'customers.csv',
    index_col = 'ID',   #index by ID column
    na_values = '?',    #treat ? characters as missing
    nrows = 100         #read only the first 100 rows
)

df.shape

df.tail()

Output:
Customers.CSV
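
The read_csv options used above (index_col, na_values, nrows) can be tried without the original file. The sketch below is only an illustration under stated assumptions: the CSV text is held in memory with io.StringIO, the helper name load_customers is hypothetical, and the rows are made up. Only the column names ID, Phone, and LTV come from the assignment.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for customers.csv.
CSV_TEXT = """ID,Phone,LTV
1,555-0101,1200
2,?,850
3,555-0103,?
4,555-0104,430
"""


def load_customers(nrows=None):
    """Read the sample CSV with the same options as the assignment code."""
    return pd.read_csv(
        io.StringIO(CSV_TEXT),
        index_col="ID",   # use the ID column as the row index
        na_values="?",    # treat ? characters as missing
        nrows=nrows,      # None reads every row
    )


df_all = load_customers()
df_first_two = load_customers(nrows=2)

print(df_all.shape, df_first_two.shape)
print(df_all.tail())
```

Because ID becomes the index, shape reports two columns rather than three, and nrows limits the number of data rows read after the header line.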

2. Create your own dataset and apply Matplotlib methods.

Source Code:

from matplotlib import pyplot as plt


plt.xkcd()

ages_x = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
          34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
          50, 51, 52, 53, 54, 55]

py_dev_y = [20046, 17100, 20000, 24744, 30500, 37732, 41247, 45372,
            48876, 53850, 57287, 63016, 65998, 70003, 70000, 71496,
            75370, 83640, 84666, 84392, 78254, 85000, 87038, 91991,
            100000, 94796, 97962, 93302, 99240, 102736, 112285, 100771,
            104708, 108423, 101407, 112542, 122870, 120000]

plt.plot(ages_x, py_dev_y, label='Python')

plt.show()

js_dev_y = [16446, 16791, 18942, 21780, 25704, 29000, 34372, 37810,
            43515, 46823, 49293, 53437, 56373, 62375, 66674, 68745,
            68746, 74583, 79000, 78508, 79996, 80403, 83820, 88833,
            91660, 87892, 96243, 90000, 99313, 91660, 102264, 100000,
            100000, 91660, 99240, 108000, 105000, 104000]

plt.plot(ages_x, js_dev_y, label='JavaScript')

plt.show()

dev_y = [17784, 16500, 18012, 20628, 25206, 30252, 34368, 38496,
         42000, 46752, 49320, 53200, 56000, 62316, 64928, 67317,
         68748, 73752, 77232, 78000, 78508, 79536, 82488, 88935,
         90000, 90056, 95000, 90000, 91633, 91660, 98150, 98964,
         100000, 98988, 100000, 108923, 105000, 103117]

plt.plot(ages_x, dev_y, color='#444444', linestyle='--', label='All Devs')

plt.show()
plt.xlabel('Ages')

plt.ylabel('Median Salary (USD)')

plt.title('Median Salary (USD) by Age')

plt.legend()

plt.tight_layout()

plt.savefig('plot.png')

plt.show()
Output:
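
When the code above runs as a plain script rather than cell by cell, each plt.show() typically displays and then discards the current figure, so the final legend and saved image may not contain all three lines. The sketch below shows one way to draw every series on a single figure before adding the labels. It is an illustration under stated assumptions: it uses only the first five points of each list above, the non-interactive Agg backend so that it runs without a display, and the variable names ages and series are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available, so render off-screen

from matplotlib import pyplot as plt

# First five values of ages_x, py_dev_y, js_dev_y and dev_y from the code above.
ages = [18, 19, 20, 21, 22]
series = {
    "Python": [20046, 17100, 20000, 24744, 30500],
    "JavaScript": [16446, 16791, 18942, 21780, 25704],
    "All Devs": [17784, 16500, 18012, 20628, 25206],
}

fig, ax = plt.subplots()

for label, salaries in series.items():
    # Same styling as above: the overall median is a grey dashed line.
    style = {"color": "#444444", "linestyle": "--"} if label == "All Devs" else {}
    ax.plot(ages, salaries, label=label, **style)

ax.set_xlabel("Ages")
ax.set_ylabel("Median Salary (USD)")
ax.set_title("Median Salary (USD) by Age")
ax.legend()
fig.tight_layout()

print(len(ax.get_lines()), "lines on one figure")
```

With an interactive backend, a single plt.show() (or fig.savefig) placed after these calls would then show all three lines together with the legend.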
