DMV - 1 - Jupyter Notebook
In [5]: print(csv_data.info())
print(excel_data.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2823 entries, 0 to 2822
Data columns (total 25 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ORDERNUMBER 2823 non-null int64
1 QUANTITYORDERED 2823 non-null int64
2 PRICEEACH 2823 non-null float64
3 ORDERLINENUMBER 2823 non-null int64
4 SALES 2823 non-null float64
5 ORDERDATE 2823 non-null object
6 STATUS 2823 non-null object
7 QTR_ID 2823 non-null int64
8 MONTH_ID 2823 non-null int64
9 YEAR_ID 2823 non-null int64
10 PRODUCTLINE 2823 non-null object
11 MSRP 2823 non-null int64
12 PRODUCTCODE 2823 non-null object
13 CUSTOMERNAME 2823 non-null object
14 PHONE 2823 non-null object
15 ADDRESSLINE1 2823 non-null object
16 ADDRESSLINE2 302 non-null object
17 CITY 2823 non-null object
18 STATE 1337 non-null object
19 POSTALCODE 2747 non-null object
20 COUNTRY 2823 non-null object
21 TERRITORY 1749 non-null object
22 CONTACTLASTNAME 2823 non-null object
23 CONTACTFIRSTNAME 2823 non-null object
24 DEALSIZE 2823 non-null object
dtypes: float64(2), int64(7), object(16)
memory usage: 551.5+ KB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 390 entries, 0 to 389
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Postcode 390 non-null int64
1 Sales_Rep_ID 390 non-null int64
2 Sales_Rep_Name 390 non-null object
3 Year 390 non-null int64
4 Value 390 non-null float64
dtypes: float64(1), int64(3), object(1)
memory usage: 15.4+ KB
None
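Each summary above ends with a stray `None` because `DataFrame.info()` prints its report itself and returns `None`, which the surrounding `print()` then echoes. Calling `info()` on its own avoids this. A minimal sketch on a small hypothetical frame (column names mimic `excel_data`, values are made up) shows the return value and how the `buf` parameter captures the report as a string instead of printing it:

```python
import io

import pandas as pd

# Hypothetical two-row frame; the values are illustrative only.
df = pd.DataFrame({"Postcode": [2121, 2092], "Value": [1.5, 2.0]})

# info() writes to stdout and returns None, so print(df.info())
# shows the summary followed by "None".
result = df.info()
print(result)  # None

# Passing a text buffer captures the summary instead of printing it.
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()
print(summary)
```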
In [6]: print(json_data.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 9999 non-null int64
1 email 9999 non-null object
2 first 9999 non-null object
3 last 9999 non-null object
4 company 9999 non-null object
5 created_at 9999 non-null datetime64[ns, UTC]
6 country 9999 non-null object
dtypes: datetime64[ns, UTC](1), int64(1), object(5)
memory usage: 546.9+ KB
None
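`created_at` arrives as `datetime64[ns, UTC]` even though JSON has no date type: by default `pd.read_json` tries to parse date-like columns (a column name ending in `_at` is one of the patterns it recognises), and timestamps that carry a UTC offset become timezone-aware. The sketch below, on two hypothetical records (made-up emails; timestamps taken from the first two rows of `json_data.head()`), switches that heuristic off with `convert_dates=False` and converts explicitly. The ISO-8601 `Z` format of the raw strings is an assumption, since the original file is not shown.

```python
import io

import pandas as pd

# Two hypothetical records in the same shape as json_data.
raw = """[
  {"id": 1, "email": "user1@example.com", "created_at": "2014-12-25T04:06:27.981Z"},
  {"id": 2, "email": "user2@example.com", "created_at": "2014-07-03T16:08:17.044Z"}
]"""

# Disable read_json's automatic date detection so the conversion is explicit.
users = pd.read_json(io.StringIO(raw), convert_dates=False)
print(users["created_at"].dtype)  # the column still holds strings here

# Parse the ISO-8601 strings; utc=True yields a timezone-aware UTC column.
users["created_at"] = pd.to_datetime(users["created_at"], utc=True)
print(users["created_at"].dt.tz)  # UTC
```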
localhost:8888/notebooks/BE_PRACTICALS/DMV_1.ipynb 1/4
10/6/24, 6:55 PM DMV_1 - Jupyter Notebook
In [7]: csv_data.head()
Out[7]:
   ORDERNUMBER  QUANTITYORDERED  PRICEEACH  ORDERLINENUMBER    SALES        ORDERDATE   STATUS  QTR_ID  MONTH_ID  YEAR_ID  ...  ADDRESS
1        10121               34      81.35                5  2765.90    5/7/2003 0:00  Shipped       2         5     2003  ...  59 … l'A
2        10134               41      94.74                2  3884.34    7/1/2003 0:00  Shipped       3         7     2003  ...  27 … Colone
3        10145               45      83.26                6  3746.70   8/25/2003 0:00  Shipped       3         8     2003  ...  78934 …
4        10159               49     100.00               14  5205.27  10/10/2003 0:00  Shipped       4        10     2003  ...  7734 Str
5 rows × 25 columns
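`csv_data.info()` reports `ORDERDATE` as `object`: the values are month/day/year strings such as `5/7/2003 0:00`. A minimal sketch, using the four rows recovered from the `head()` output above, parses them with an explicit format and cross-checks the result against the dataset's own `YEAR_ID`, `MONTH_ID` and `QTR_ID` columns. The month-first reading is inferred from `MONTH_ID` (`5/7/2003` has `MONTH_ID` 5).

```python
import pandas as pd

# Four rows transcribed from the csv_data.head() output above
# (only the columns needed for the date check).
orders = pd.DataFrame({
    "ORDERNUMBER": [10121, 10134, 10145, 10159],
    "ORDERDATE": ["5/7/2003 0:00", "7/1/2003 0:00", "8/25/2003 0:00", "10/10/2003 0:00"],
    "QTR_ID": [2, 3, 3, 4],
    "MONTH_ID": [5, 7, 8, 10],
    "YEAR_ID": [2003, 2003, 2003, 2003],
})

# Month/day/year plus hour:minute; %m, %d and %H accept values without zero padding.
orders["ORDERDATE"] = pd.to_datetime(orders["ORDERDATE"], format="%m/%d/%Y %H:%M")

# Cross-check the parsed dates against the pre-computed calendar columns.
consistent = (
    (orders["ORDERDATE"].dt.year == orders["YEAR_ID"])
    & (orders["ORDERDATE"].dt.month == orders["MONTH_ID"])
    & (orders["ORDERDATE"].dt.quarter == orders["QTR_ID"])
)
print(consistent.all())  # True
```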
In [8]: csv_data.columns
In [9]: excel_data.head()
Out[9]:
Postcode Sales_Rep_ID Sales_Rep_Name Year Value
In [15]: excel_data.columns
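`excel_data` stores `Postcode` as `int64`. Numeric storage drops any leading zero, so a code such as `0800` would be read back as `800` when the source cells hold it as text. Whether any postcodes here are affected is not visible in the printout; as a hedged illustration, the `dtype` argument (accepted by both `pd.read_excel` and `pd.read_csv`) keeps such identifiers as strings. The sketch uses `read_csv` on an in-memory string so it runs without an Excel engine; the rows are hypothetical.

```python
import io

import pandas as pd

# Hypothetical rows; "0800" has a leading zero.
raw = "Postcode,Sales_Rep_ID,Value\n0800,101,1.5\n2121,102,2.0\n"

# Default type inference turns the postcodes into integers.
as_int = pd.read_csv(io.StringIO(raw))
print(as_int["Postcode"].tolist())  # [800, 2121]

# Declaring the column as text preserves the original codes.
as_text = pd.read_csv(io.StringIO(raw), dtype={"Postcode": str})
print(as_text["Postcode"].tolist())  # ['0800', '2121']
```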
In [10]: json_data.head()
Out[10]:
id email first last company created_at country
0 1 [email protected] Torrey Veum Hilll, Mayert and Wolf 2014-12-25 04:06:27.981000+00:00 Switzerland
1 2 [email protected] Micah Sanford Stokes-Reichel 2014-07-03 16:08:17.044000+00:00 Democratic People's Republic of Korea
2 3 [email protected] Hollis Swift Rodriguez, Cartwright and Kuhn 2014-08-18 06:15:16.731000+00:00 Tunisia
3 4 [email protected] Perry Leffler Sipes, Feeney and Hansen 2014-07-10 11:31:40.235000+00:00 Chad
In [14]: json_data.columns
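The sources spell the country column differently (`COUNTRY` in `csv_data`, `country` in `json_data`), while `combined_data` has a single `Country` column. The cells that build `combined_data` are not part of this printout; one plausible way to align the names before combining is `DataFrame.rename`, sketched here on hypothetical miniature frames:

```python
import pandas as pd

# Hypothetical miniature versions of the two sources.
csv_part = pd.DataFrame({"ORDERNUMBER": [10121], "SALES": [2765.90], "COUNTRY": ["France"]})
json_part = pd.DataFrame({"email": ["user1@example.com"], "first": ["Torrey"], "country": ["Chad"]})

# Map each source-specific spelling onto one shared column name.
csv_part = csv_part.rename(columns={"COUNTRY": "Country"})
json_part = json_part.rename(columns={"country": "Country"})

# After renaming, Country is the only column the two frames share,
# so it is the natural key for a later concat or merge.
shared = set(csv_part.columns) & set(json_part.columns)
print(shared)  # {'Country'}
```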
In [56]: combined_data.head()
Out[56]:
Country ORDERNUMBER SALES Year Value email first
In [57]: combined_data.tail()
Out[57]:
Country ORDERNUMBER SALES Year Value email first
In [58]: combined_data.shape
Out[58]: (82156, 7)
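`combined_data` has 82,156 rows, far more than the 2,823 + 390 + 9,999 = 13,212 rows of the three sources. A result larger than the inputs' total is what a join on a non-unique key produces, since every left row is paired with every matching right row. Assuming the sources were joined on a shared column such as `Country` (an inference; the combining cells are not shown), the sketch reproduces the effect on hypothetical frames and shows two pandas safeguards: `indicator=True` to see where each row came from, and `validate=` to fail fast when a key is not unique.

```python
import pandas as pd

# Hypothetical frames; "France" appears twice on each side.
orders = pd.DataFrame({"Country": ["France", "France", "USA"],
                       "SALES": [2765.90, 3884.34, 5205.27]})
users = pd.DataFrame({"Country": ["France", "France", "Chad"],
                      "email": ["a@example.com", "b@example.com", "c@example.com"]})

# Outer join on a non-unique key: 2 x 2 France pairs + 1 USA-only + 1 Chad-only row.
merged = pd.merge(orders, users, on="Country", how="outer", indicator=True)
print(len(merged))  # 6, although each input has 3 rows
print(merged["_merge"].value_counts())

# validate= raises MergeError instead of silently multiplying rows.
try:
    pd.merge(orders, users, on="Country", how="outer", validate="one_to_one")
    rejected = False
except pd.errors.MergeError:
    rejected = True
print(rejected)  # True
```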
In [59]: combined_data.isna().sum()
Out[59]: Country 0
ORDERNUMBER 9700
SALES 9700
Year 9310
Value 81766
email 1538
first 1538
dtype: int64
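Raw counts are easier to compare as shares of the 82,156 rows. `Value`, for instance, is missing in 81,766 rows (about 99.5%), leaving exactly 390 filled values, the same as the row count of `excel_data`. A small helper, sketched here with a hypothetical name (`missing_report`) and made-up demo data, reports both the count and the percentage per column, worst first:

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return report.sort_values("missing", ascending=False)


# Hypothetical four-row frame; the values are made up for illustration.
demo = pd.DataFrame({
    "Country": ["France", "USA", "Chad", "France"],
    "SALES": [2765.90, np.nan, np.nan, 3884.34],
    "Value": [np.nan, np.nan, 1.5, np.nan],
})
print(missing_report(demo))

# In the notebook this would be called as missing_report(combined_data).
```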
In [60]: combined_data.dtypes
In [64]: combined_data.describe()
Out[64]:
ORDERNUMBER SALES Year
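By default `describe()` summarises only numeric columns, which is why `Out[64]` lists numeric fields such as `ORDERNUMBER`, `SALES` and `Year` but none of the text columns (`Country`, `email`, `first`). Passing `include="all"` adds `unique`, `top` and `freq` rows for the non-numeric columns. A sketch on a hypothetical three-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing text and numeric columns.
demo = pd.DataFrame({
    "Country": ["France", "USA", "France"],
    "SALES": [2765.90, 3884.34, np.nan],
    "Year": [2003, 2003, 2004],
})

numeric_only = demo.describe()             # default: numeric columns only
everything = demo.describe(include="all")  # also summarises the text column

print(numeric_only.columns.tolist())       # ['SALES', 'Year']
print(everything.loc["top", "Country"])    # France
```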
In [ ]: