Data Visualization Practicum
https://www.machinelearningplus.com/plots/top-50-matplotlib-visualizations-the-master-plots-python/
Correlation
1. Scatter plot
A scatter plot is a classic, fundamental plot for examining the relationship between variables. If
your data contains multiple groups, plotnine lets you draw each group in a different color.
Preparing the Data
In [3]: g = (
    ggplot(mpg, aes(x='hwy', y='cty', color='trans'))  # x axis = hwy, y axis = cty, color by the variable trans
    + labs(x="Highway Miles per Gallon", y="City Mileage in Miles per Gallon")  # name the x and y axes
    + labs(color='transmission')  # name the legend
    + ggtitle("Fuel Consumption")  # add a title
)
g
Out[3]: <ggplot: (-9223371880167378888)>
You can apply additional settings to your plot with theme.
You do not need to write every theme option. Specify only the ones you need. EXAMPLE:
In [7]: (
    g + geom_point()
    + theme(panel_background=element_rect(fill='gray', alpha=.2),
            figure_size=(12, 12),
            aspect_ratio=1/3)
)
https://plotnine.readthedocs.io/en/stable/generated/plotnine.themes.theme.html
BUBBLE PLOT
This is the same as the previous scatter plot, except that the dot size is mapped to a
numeric variable. For example, to size the dots in the scatter plot above by the variable cty,
write the following code:
In [15]: (
    ggplot(mpg, aes(x='hwy', y='cty', color='trans', size='cty'))  # add size here
    + labs(x="Highway Miles per Gallon", y="City Mileage in Miles per Gallon")
    + labs(color='Transmission')
    + ggtitle("Fuel Consumption")
    + guides(size=False)  # size uses the same variable as the y axis, so it needs no legend
    + geom_point()
)
In [16]: (
    ggplot(mpg, aes(x='hwy', y='cty', color='cyl', size='cty'))
    + labs(x="Highway Miles per Gallon", y="City Mileage in Miles per Gallon")
    + labs(color='Cylinder')
    + ggtitle("Fuel Consumption")
    + guides(size=False)
    + geom_point()
    + scale_color_gradient(low='green', high='lightgreen')  # set the color gradient
)
In [17]: (
    ggplot(mpg, aes(x='hwy', y='cty', color='cyl', size='cyl', shape='class'))
    + labs(x="Highway Miles per Gallon", y="City Mileage in Miles per Gallon")
    + labs(size='Cylinder')
    + labs(shape='Class')
    + ggtitle("Fuel Consumption")
    + guides(size=False)
    + theme(legend_position=(.7, .2), legend_direction='horizontal',
            legend_background=element_rect(color='gray', size=2, fill='gray', alpha=.0),
            panel_background=element_rect(fill='gray', alpha=.2), figure_size=(12, 12),
            aspect_ratio=1/1
    )
    + geom_point(alpha=0.5)  # set dot transparency so overlapping points are easier to see
)
Out[17]: <ggplot: (-9223371880123197472)>
The x-axis labels in the bar chart above overlap. We can fix this by swapping the positions of
manufacturer and count with coord_flip().
In [19]: (
ggplot(mpg,aes(x='manufacturer', fill='manufacturer'))
+ geom_bar(show_legend=False)
+ coord_flip()
)
In [20]: (
ggplot(mpg,aes(x='manufacturer', fill='manufacturer'))
+ geom_bar(show_legend=False)
+ coord_flip()
+ theme_classic()
)
In [21]: (
ggplot(mpg,aes(x='manufacturer', fill='manufacturer'))
+ geom_bar(show_legend=False)
+ coord_flip()
+ theme_minimal()
)
Out[21]: <ggplot: (-9223371880123198552)>
In [22]: (
ggplot(mpg,aes(x='manufacturer', fill='manufacturer'))
+ geom_bar(show_legend=False)
+ coord_flip()
+ theme_xkcd()
)
Out[22]: <ggplot: (-9223371880122933564)>
WRAP
In [23]: (
ggplot(mpg,aes(x='manufacturer', fill='manufacturer'))
+ facet_wrap("year")
+ geom_bar(show_legend=False)
+ coord_flip()
)
Out[23]: <ggplot: (-9223371880122518516)>
In [24]: (
ggplot(mpg,aes(x='manufacturer', fill='class'))
+ facet_grid("year~cyl")
+ geom_bar()
+ coord_flip()
)
Out[24]: <ggplot: (-9223371880122904608)>
Out[25]:
manufacturer model displ year cyl trans drv cty hwy fl class
... ... ... ... ... ... ... ... ... ... ... ...
In [28]: mpg2['manufacturer']
Out[28]: 0 audi
1 audi
2 audi
3 audi
4 audi
...
229 volkswagen
230 volkswagen
231 volkswagen
232 volkswagen
233 volkswagen
Name: manufacturer, Length: 234, dtype: category
Categories (15, object): [dodge < toyota < volkswagen < ford ... pontiac < mercury < land rover < lincoln]
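The cells that built `mpg2` are missing from this copy. The output above shows `manufacturer` as an ordered category whose order appears to follow the number of cars per manufacturer. One way to get such a column, sketched on a small made-up DataFrame (the data and the names `cars` and `cars2` are hypothetical, not the original cells):

```python
import pandas as pd

# Hypothetical toy data standing in for mpg
cars = pd.DataFrame({
    "manufacturer": ["audi", "dodge", "dodge", "dodge", "toyota", "toyota"],
})

# Manufacturers ordered from most to least frequent
order = cars["manufacturer"].value_counts().index.tolist()

# Copy the data and turn the column into an ordered categorical
cars2 = cars.copy()
cars2["manufacturer"] = pd.Categorical(
    cars2["manufacturer"], categories=order, ordered=True
)
print(cars2["manufacturer"].cat.categories.tolist())
```

plotnine places the bars of a categorical x variable in category order, so a column prepared like this controls the order of the bars in the next plot.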
In [29]: (
ggplot(mpg2,aes(x='manufacturer', fill='manufacturer'))
+ geom_bar(show_legend=False)
+ coord_flip()
+ theme_xkcd()
+ ggtitle('Number of Cars by Manufacturer')
)
Out[29]: <ggplot: (-9223371880122834348)>
Out[44]:
date location new_cases new_deaths total_cases total_deaths
0 2019-12-31 Afghanistan 0 0 0 0
1 2020-01-01 Afghanistan 0 0 0 0
2 2020-01-02 Afghanistan 0 0 0 0
3 2020-01-03 Afghanistan 0 0 0 0
4 2020-01-04 Afghanistan 0 0 0 0
0 2019-12-31 Indonesia 0 0 0 0
1 2020-01-01 Indonesia 0 0 0 0
2 2020-01-02 Indonesia 0 0 0 0
3 2020-01-03 Indonesia 0 0 0 0
4 2020-01-04 Indonesia 0 0 0 0
87 rows × 6 columns
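The cells that read the data and selected Indonesia are missing here. A minimal sketch of that step, assuming the CSV has the columns shown in the table above. The inline sample and the name `full_data` are made up so that the block runs offline:

```python
import io
import pandas as pd

# Hypothetical three-row sample with the same columns as the table above
csv_text = """date,location,new_cases,new_deaths,total_cases,total_deaths
2019-12-31,Afghanistan,0,0,0,0
2019-12-31,Indonesia,0,0,0,0
2020-01-01,Indonesia,0,0,0,0
"""

# In practice, pass the path or URL of the CSV file instead of StringIO
full_data = pd.read_csv(io.StringIO(csv_text))

# Keep only the rows for Indonesia and renumber them from 0
Indonesia = full_data[full_data["location"] == "Indonesia"].reset_index(drop=True)
print(Indonesia)
```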
C:\Users\Tissar\Anaconda3\lib\site-packages\plotnine\geoms\geom_path.py:83: PlotnineWarning: geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
  "group aesthetic?", PlotnineWarning)
Out[46]: <ggplot: (-9223371942254681300)>
CUSTOMIZE DATE
In [48]: IndonesiaTS = Indonesia.copy()  # copy the Indonesia DataFrame
IndonesiaTS.date = pd.to_datetime(IndonesiaTS.date)  # convert date to datetime
IndonesiaTS.set_index('date', inplace=True)  # make date the index
IndonesiaTS
Out[48]:
location new_cases new_deaths total_cases total_deaths
date
2019-12-31 Indonesia 0 0 0 0
2020-01-01 Indonesia 0 0 0 0
2020-01-02 Indonesia 0 0 0 0
2020-01-03 Indonesia 0 0 0 0
2020-01-04 Indonesia 0 0 0 0
87 rows × 5 columns
Out[49]:
location new_cases new_deaths total_cases total_deaths
date
2020-03-01 Indonesia 0 0 0 0
2020-03-02 Indonesia 2 0 2 0
2020-03-07 Indonesia 2 0 4 0
2020-03-09 Indonesia 2 0 6 0
2020-03-11 Indonesia 13 0 19 0
2020-03-12 Indonesia 15 1 34 1
2020-03-14 Indonesia 35 3 69 4
2020-03-15 Indonesia 27 0 96 4
2020-04-02 Indonesia 149 21 1677 157
date location new_cases new_deaths total_cases total_deaths
0 2020-03-01 Indonesia 0 0 0 0
1 2020-03-02 Indonesia 2 0 2 0
2 2020-03-07 Indonesia 2 0 4 0
3 2020-03-09 Indonesia 2 0 6 0
4 2020-03-11 Indonesia 13 0 19 0
5 2020-03-12 Indonesia 15 1 34 1
6 2020-03-14 Indonesia 35 3 69 4
7 2020-03-15 Indonesia 27 0 96 4
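The cells that produced the March-onward tables are not shown in this copy. With a DatetimeIndex, one way to select a date range is label slicing with `.loc`, and `reset_index()` then turns `date` back into an ordinary column for plotting. A sketch on made-up data (`ts` and its values are hypothetical, not the original cells):

```python
import pandas as pd

# Hypothetical series indexed by date
ts = pd.DataFrame(
    {"new_cases": [0, 0, 2, 13]},
    index=pd.to_datetime(["2020-02-28", "2020-02-29", "2020-03-02", "2020-03-11"]),
)
ts.index.name = "date"

# Keep rows from 1 March 2020 onward (the string label is parsed as a date)
ts_march = ts.loc["2020-03-01":]

# Move the date index back into a regular column
ts_march_reset = ts_march.reset_index()
print(ts_march_reset)
```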
In [51]: (
ggplot(IndonesiaTSDate_resetindex, aes(x='date', y='new_cases'))
+ geom_line(color='red')
+ ggtitle('Corona New Cases in Indonesia')
+ theme(
    # legend position; try changing it and watch the legend move
    legend_position=(.7, .175),
    # legend direction: 'vertical' or 'horizontal'. The default is vertical,
    # so for a vertical legend you can omit legend_direction
    legend_direction='horizontal',
    # legend border color, border thickness, background color, background transparency
    legend_background=element_rect(color='gray', size=10, fill='gray', alpha=.1),
    # plot background color and transparency
    panel_background=element_rect(fill='gray', alpha=.2),
    figure_size=(12, 12),  # plot size in inches
    aspect_ratio=1/1
)
)
Out[51]: <ggplot: (-9223371942257239412)>
CREATING A TIME SERIES PLOT WITH TWO
DATA SCALES
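In [17]: import matplotlib.pyplot as plt
x = Indonesia['date']
y1 = Indonesia['new_cases']
y2 = Indonesia['total_cases']
# Plot Line1 (Left Y Axis)
fig, ax1 = plt.subplots(1, 1, figsize=(12, 6), dpi=180)
ax1.plot(x, y1, color='tab:red')
# Plot Line2 (Right Y Axis): a second scale that shares the same x axis
ax2 = ax1.twinx()
ax2.plot(x, y2, color='tab:blue')
# Decorations
# ax1 (left Y axis)
ax1.set_xlabel('Date', fontsize=20)
ax1.tick_params(axis='x', rotation=0, labelsize=12)
ax1.set_ylabel('New Cases', color='tab:red', fontsize=20)
ax1.tick_params(axis='y', rotation=0, labelcolor='tab:red')
ax1.grid(alpha=.4)
# ax2 (right Y axis)
ax2.set_ylabel('Total Cases', color='tab:blue', fontsize=20)
ax2.tick_params(axis='y', labelcolor='tab:blue')
fig.tight_layout()
plt.show()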
TWO-SCALE TIME SERIES PLOT FROM 2
MARCH 2020 TO TODAY
In [15]: IndonesiaTSDate_resetindex = IndonesiaTSDate.reset_index()
IndonesiaTSDate_resetindex
Out[15]:
date location new_cases new_deaths total_cases total_deaths
0 2020-03-01 Indonesia 0 0 0 0
1 2020-03-02 Indonesia 2 0 2 0
2 2020-03-07 Indonesia 2 0 4 0
3 2020-03-09 Indonesia 2 0 6 0
4 2020-03-11 Indonesia 13 0 19 0
5 2020-03-12 Indonesia 15 1 34 1
6 2020-03-14 Indonesia 35 3 69 4
7 2020-03-15 Indonesia 27 0 96 4
In [54]: x = IndonesiaTSDate_resetindex['date']
y1 = IndonesiaTSDate_resetindex['new_cases']
y2 = IndonesiaTSDate_resetindex['total_cases']
# Plot Line1 (Left Y Axis)
fig, ax1 = plt.subplots(1, 1, figsize=(12, 6), dpi=180)
ax1.plot(x, y1, color='tab:red')
# Plot Line2 (Right Y Axis): a second scale that shares the same x axis
ax2 = ax1.twinx()
ax2.plot(x, y2, color='tab:blue')
# Decorations
# ax1 (left Y axis)
ax1.set_xlabel('Date', fontsize=20)
ax1.tick_params(axis='x', rotation=0, labelsize=12)
ax1.set_ylabel('New Cases', color='tab:red', fontsize=20)
ax1.tick_params(axis='y', rotation=0, labelcolor='tab:red')
ax1.grid(alpha=.2)
# ax2 (right Y axis)
ax2.set_ylabel('Total Cases', color='tab:blue', fontsize=20)
ax2.tick_params(axis='y', labelcolor='tab:blue')
fig.tight_layout()
plt.show()
Data source: https://covid.ourworldindata.org/data/ecdc/full_data.csv