
Altair Basic

Uploaded by

Trà Thanh

altair-basic

November 13, 2024

1 Installation
[ ]: import warnings
warnings.filterwarnings('ignore')
# suppress warnings

[ ]: pip install altair

Requirement already satisfied: altair in /usr/local/lib/python3.10/dist-packages (4.2.2)
Requirement already satisfied: entrypoints in /usr/local/lib/python3.10/dist-
packages (from altair) (0.4)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages
(from altair) (3.1.4)
Requirement already satisfied: jsonschema>=3.0 in
/usr/local/lib/python3.10/dist-packages (from altair) (4.23.0)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages
(from altair) (1.26.4)
Requirement already satisfied: pandas>=0.18 in /usr/local/lib/python3.10/dist-
packages (from altair) (2.2.2)
Requirement already satisfied: toolz in /usr/local/lib/python3.10/dist-packages
(from altair) (0.12.1)
Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.10/dist-
packages (from jsonschema>=3.0->altair) (24.2.0)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in
/usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair)
(2024.10.1)
Requirement already satisfied: referencing>=0.28.4 in
/usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair) (0.35.1)
Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-
packages (from jsonschema>=3.0->altair) (0.20.0)
Requirement already satisfied: python-dateutil>=2.8.2 in
/usr/local/lib/python3.10/dist-packages (from pandas>=0.18->altair) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-
packages (from pandas>=0.18->altair) (2024.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/dist-
packages (from pandas>=0.18->altair) (2024.2)
Requirement already satisfied: MarkupSafe>=2.0 in
/usr/local/lib/python3.10/dist-packages (from jinja2->altair) (3.0.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-
packages (from python-dateutil>=2.8.2->pandas>=0.18->altair) (1.16.0)

#Basic Chart
[ ]: import pandas as pd
data = pd.DataFrame({'a': list('CCCDDDEEE'),
'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]})

[ ]: data

[ ]: a b
0 C 2
1 C 7
2 C 4
3 D 1
4 D 2
5 D 6
6 E 8
7 E 4
8 E 7

[ ]: import altair as alt


chart = alt.Chart(data)

[ ]: alt.Chart(data).mark_point()

[ ]: alt.Chart(…)

[ ]: alt.Chart(data).mark_point().encode(
x='a'
)

[ ]: alt.Chart(…)

[ ]: alt.Chart(data).mark_point().encode(
x='a',
y='b'
)

[ ]: alt.Chart(…)

2 Data Transformation: Aggregation
[ ]: alt.Chart(data).mark_point().encode(
x='a',
y='average(b)' # y is the mean of b within each category of a
)

[ ]: alt.Chart(…)

[ ]: alt.Chart(data).mark_bar().encode(
x='a',
y='average(b)'
)

[ ]: alt.Chart(…)

[ ]: alt.Chart(data).mark_bar().encode(
y='a',
x='average(b)'
)

[ ]: alt.Chart(…)
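To see what the `average(b)` shorthand computes, the same aggregation can be reproduced on the pandas side. This is only a cross-check under the assumption that the encoding groups by `a` and takes the mean of `b`; Altair itself performs the aggregation inside the chart specification, not in pandas.

```python
import pandas as pd

# Same toy data as in the Basic Chart cells
data = pd.DataFrame({'a': list('CCCDDDEEE'),
                     'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]})

# Group by the category column and average b, mirroring y='average(b)'
means = data.groupby('a')['b'].mean()
print(means)
```

Each value in `means` should match the height of the corresponding bar, for example 3.0 for category D.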

3 Customizing your Visualization


[ ]: alt.Chart(data).mark_bar(color='firebrick').encode(
alt.Y('a'), alt.X('average(b)') # y-axis and x-axis channels
)

[ ]: alt.Chart(…)

4 Example
[ ]: !pip install vega_datasets

Requirement already satisfied: vega_datasets in /usr/local/lib/python3.10/dist-packages (0.9.0)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages
(from vega_datasets) (2.2.2)
Requirement already satisfied: numpy>=1.22.4 in /usr/local/lib/python3.10/dist-
packages (from pandas->vega_datasets) (1.26.4)
Requirement already satisfied: python-dateutil>=2.8.2 in
/usr/local/lib/python3.10/dist-packages (from pandas->vega_datasets) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-
packages (from pandas->vega_datasets) (2024.2)

Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/dist-
packages (from pandas->vega_datasets) (2024.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-
packages (from python-dateutil>=2.8.2->pandas->vega_datasets) (1.16.0)

4.1 Simple Scatter Plot with Tooltips


[ ]: import altair as alt
from vega_datasets import data

source = data.cars()

alt.Chart(source).mark_circle(size=60).encode(
x='Horsepower',
y='Miles_per_Gallon',
color='Origin',
tooltip=['Name', 'Origin', 'Horsepower', 'Miles_per_Gallon']
).interactive()

[ ]: alt.Chart(…)

[ ]: source.head()

[ ]: Name Miles_per_Gallon Cylinders Displacement \


0 chevrolet chevelle malibu 18.0 8 307.0
1 buick skylark 320 15.0 8 350.0
2 plymouth satellite 18.0 8 318.0
3 amc rebel sst 16.0 8 304.0
4 ford torino 17.0 8 302.0

Horsepower Weight_in_lbs Acceleration Year Origin


0 130.0 3504 12.0 1970-01-01 USA
1 165.0 3693 11.5 1970-01-01 USA
2 150.0 3436 11.0 1970-01-01 USA
3 150.0 3433 12.0 1970-01-01 USA
4 140.0 3449 10.5 1970-01-01 USA

[ ]: alt.Chart(source).mark_circle(size=60).encode(
x='Weight_in_lbs',
y='Miles_per_Gallon',
color='Cylinders',
tooltip=['Name','Weight_in_lbs', 'Miles_per_Gallon']
).interactive()

[ ]: alt.Chart(…)

[ ]: alt.Chart(source).mark_circle(size=60).encode(
x='Cylinders',
y='Miles_per_Gallon',
color = 'Origin',
tooltip=['Name','Miles_per_Gallon', 'Origin']
)

[ ]: alt.Chart(…)

4.2 Simple Bar Chart


[ ]: import altair as alt
import pandas as pd

source = pd.DataFrame({
'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

alt.Chart(source).mark_bar().encode(
x='a', y='b'
).interactive()

[ ]: alt.Chart(…)

[ ]:

4.3 Simple heat map


[ ]: import altair as alt
import numpy as np
import pandas as pd

# Compute x^2 + y^2 across a 2D grid


x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x ** 2 + y ** 2

# Convert this grid to columnar data expected by Altair


source = pd.DataFrame({'x': x.ravel(),
'y': y.ravel(),
'z': z.ravel()})

alt.Chart(source).mark_rect().encode(
x='x:O',
y='y:O',
color='z:Q'

)

[ ]: alt.Chart(…)
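The step from a 2D grid to "columnar data" is easier to follow on a small grid. The sketch below repeats the same `meshgrid` and `ravel` calls on a 3 x 3 grid (the smaller range is chosen here only for illustration) so that every row of the resulting table can be read by eye.

```python
import numpy as np
import pandas as pd

# 3 x 3 grid covering -1, 0, 1 on both axes
x, y = np.meshgrid(range(-1, 2), range(-1, 2))
z = x ** 2 + y ** 2

# ravel() flattens each 2D array row by row, giving one table row per grid cell
source = pd.DataFrame({'x': x.ravel(),
                       'y': y.ravel(),
                       'z': z.ravel()})

print(x.shape, source.shape)
print(source)
```

Each of the 9 grid cells becomes one row with its own `x`, `y`, and `z`, which is the layout `mark_rect()` expects: one record per rectangle.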

[ ]: import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# Compute x^2 + y^2 across a 2D grid
x, y = np.meshgrid(range(-50, 50), range(-50, 50))
z = x ** 2 + y ** 2
ax.plot_surface(x, y, z, cmap=plt.cm.YlGnBu_r)
plt.show()

4.4 Simple Histogram


[ ]: import altair as alt
from vega_datasets import data

source = data.movies.url

alt.Chart(source).mark_bar().encode(
alt.X("IMDB_Rating:Q", bin=True),
y='count()',
)
# mark_bar() (a bar chart) with y='count()' produces a histogram

[ ]: alt.Chart(…)

[ ]: source

[ ]: 'https://cdn.jsdelivr.net/npm/[email protected]/data/movies.json'

[ ]: # Draw a histogram for the following data:


import pandas as pd
data = {'Diem':range(11),
'Soluong':[5, 6, 8, 8, 10, 10, 20, 21, 15, 14, 10]}

df = pd.DataFrame(data)
df

[ ]: Diem Soluong
0 0 5
1 1 6
2 2 8
3 3 8
4 4 10
5 5 10
6 6 20
7 7 21
8 8 15
9 9 14
10 10 10

[ ]: import altair as alt


alt.Chart(df).mark_bar().encode(
alt.X("Diem:Q", bin=True),
y='Soluong'
)

[ ]: alt.Chart(…)
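Unlike the movies example, this table is already aggregated: `Soluong` stores the count for each score, so the chart plots the stored values instead of counting rows with `count()`. As a rough cross-check, the sketch below expands the table back into raw scores and counts them with NumPy. The unit-width bin edges 0, 1, ..., 11 are an assumption made here; `bin=True` in Altair chooses its own bin boundaries.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Diem': range(11),
                   'Soluong': [5, 6, 8, 8, 10, 10, 20, 21, 15, 14, 10]})

# Repeat each score as many times as it was observed
raw_scores = np.repeat(df['Diem'].to_numpy(), df['Soluong'].to_numpy())

# Count the raw scores in unit-width bins (assumed edges: 0, 1, ..., 11)
counts, edges = np.histogram(raw_scores, bins=np.arange(0, 12))
print(counts)
```

With these bins the recovered counts equal the `Soluong` column, so plotting `Soluong` directly gives the same bar heights as a histogram of the raw scores would.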

4.5 Simple line chart


[ ]: import altair as alt
import numpy as np
import pandas as pd

x = np.arange(100)  # integers 0 through 99
source = pd.DataFrame({
'x': x,
'f(x)': np.sin(x / 5)  # sin, cos, exp, ... are provided by NumPy
})

alt.Chart(source).mark_line().encode(
x='x',
y='f(x)'
)

[ ]: alt.Chart(…)

[ ]: # Plot y = sin(x) + cos(x) over the range 0 to 10 using Altair


import numpy as np
import altair as alt
import pandas as pd

x = range(0, 11)
y = np.sin(x)+np.cos(x)
df = pd.DataFrame({'x':x, 'f(x)':y})
df

alt.Chart(df).mark_line().encode(
x='x',
y='f(x)'
)

[ ]: alt.Chart(…)
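With `range(0, 11)` the curve is sampled at only 11 integer points, so the line is drawn as a few straight segments. A denser grid is one way to get a smoother curve; the sketch below uses `np.linspace` with 201 points (an arbitrary choice made for this illustration) and builds the table only. The names `df_dense` and `peak` are illustrative.

```python
import numpy as np
import pandas as pd

# 201 evenly spaced points from 0 to 10, i.e. a step of 0.05
x = np.linspace(0, 10, 201)
df_dense = pd.DataFrame({'x': x, 'f(x)': np.sin(x) + np.cos(x)})

# The sampled maximum should now sit close to the true maximum sqrt(2)
peak = df_dense.loc[df_dense['f(x)'].idxmax()]
print(len(df_dense), peak['x'], peak['f(x)'])
```

`df_dense` can be passed to `alt.Chart(...).mark_line().encode(x='x', y='f(x)')` in the same way as `df`.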

[ ]: source.head()

[ ]: x f(x)
0 0 0.000000
1 1 0.198669
2 2 0.389418
3 3 0.564642
4 4 0.717356

4.6 Simple Stacked Area Chart


[ ]: import altair as alt
from vega_datasets import data

source = data.iowa_electricity()

alt.Chart(source).mark_area().encode(

x="year:T",
y="net_generation:Q",
color="source:N"
)
# mark_area(): area chart

[ ]: alt.Chart(…)

[ ]: source

[ ]: year source net_generation


0 2001-01-01 Fossil Fuels 35361
1 2002-01-01 Fossil Fuels 35991
2 2003-01-01 Fossil Fuels 36234
3 2004-01-01 Fossil Fuels 36205
4 2005-01-01 Fossil Fuels 36883
5 2006-01-01 Fossil Fuels 37014
6 2007-01-01 Fossil Fuels 41389
7 2008-01-01 Fossil Fuels 42734
8 2009-01-01 Fossil Fuels 38620
9 2010-01-01 Fossil Fuels 42750
10 2011-01-01 Fossil Fuels 39361
11 2012-01-01 Fossil Fuels 37379
12 2013-01-01 Fossil Fuels 34873
13 2014-01-01 Fossil Fuels 35250
14 2015-01-01 Fossil Fuels 32319
15 2016-01-01 Fossil Fuels 28437
16 2017-01-01 Fossil Fuels 29329
17 2001-01-01 Nuclear Energy 3853
18 2002-01-01 Nuclear Energy 4574
19 2003-01-01 Nuclear Energy 3988
20 2004-01-01 Nuclear Energy 4929
21 2005-01-01 Nuclear Energy 4538
22 2006-01-01 Nuclear Energy 5095
23 2007-01-01 Nuclear Energy 4519
24 2008-01-01 Nuclear Energy 5282
25 2009-01-01 Nuclear Energy 4679
26 2010-01-01 Nuclear Energy 4451
27 2011-01-01 Nuclear Energy 5215
28 2012-01-01 Nuclear Energy 4347
29 2013-01-01 Nuclear Energy 5321
30 2014-01-01 Nuclear Energy 4152
31 2015-01-01 Nuclear Energy 5243
32 2016-01-01 Nuclear Energy 4703
33 2017-01-01 Nuclear Energy 5214
34 2001-01-01 Renewables 1437
35 2002-01-01 Renewables 1963

36 2003-01-01 Renewables 1885
37 2004-01-01 Renewables 2102
38 2005-01-01 Renewables 2724
39 2006-01-01 Renewables 3364
40 2007-01-01 Renewables 3870
41 2008-01-01 Renewables 5070
42 2009-01-01 Renewables 8560
43 2010-01-01 Renewables 10308
44 2011-01-01 Renewables 11795
45 2012-01-01 Renewables 14949
46 2013-01-01 Renewables 16476
47 2014-01-01 Renewables 17452
48 2015-01-01 Renewables 19091
49 2016-01-01 Renewables 21241
50 2017-01-01 Renewables 21933

[ ]: data = {'Nam':[2020, 2021, 2022, 2023, 2024, 2020, 2021, 2022, 2023, 2024],
'Diem':[5, 6, 6, 8, 9, 5, 5, 7, 6, 5],
'Mon':['T', 'T', 'T', 'T', 'T', 'V', 'V', 'V', 'V', 'V']}

df = pd.DataFrame(data)
df

[ ]: Nam Diem Mon


0 2020 5 T
1 2021 6 T
2 2022 6 T
3 2023 8 T
4 2024 9 T
5 2020 5 V
6 2021 5 V
7 2022 7 V
8 2023 6 V
9 2024 5 V

4.7 Simple Strip plot


[ ]: import altair as alt
from vega_datasets import data

source = data.cars()

alt.Chart(source).mark_tick().encode(
x='Horsepower:Q',
y='Cylinders:O'
)

[ ]: alt.Chart(…)

For more examples, see the Altair gallery: https://altair-viz.github.io/gallery/index.html

5 IMDB data
[ ]: !pip install vega_datasets

import pandas as pd
import altair as alt

Requirement already satisfied: vega_datasets in /usr/local/lib/python3.10/dist-packages (0.9.0)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages
(from vega_datasets) (2.2.2)
Requirement already satisfied: numpy>=1.22.4 in /usr/local/lib/python3.10/dist-
packages (from pandas->vega_datasets) (1.26.4)
Requirement already satisfied: python-dateutil>=2.8.2 in
/usr/local/lib/python3.10/dist-packages (from pandas->vega_datasets) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-
packages (from pandas->vega_datasets) (2024.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/dist-
packages (from pandas->vega_datasets) (2024.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-
packages (from python-dateutil>=2.8.2->pandas->vega_datasets) (1.16.0)

[ ]: # Importing the Vega Dataset


from vega_datasets import data as vega_data

movies_df = pd.read_json(vega_data.movies.url)

# Checking the type of data that we get


print("movies_df is of the type: ", type(movies_df))

print("movies_df: ", movies_df.shape)

movies_df is of the type: <class 'pandas.core.frame.DataFrame'>


movies_df: (3201, 16)

[ ]: movies_df.head(5)

[ ]: Title US_Gross Worldwide_Gross US_DVD_Sales \


0 The Land Girls 146083.0 146083.0 NaN
1 First Love, Last Rites 10876.0 10876.0 NaN
2 I Married a Strange Person 203134.0 203134.0 NaN
3 Let's Talk About Sex 373615.0 373615.0 NaN
4 Slam 1009819.0 1087521.0 NaN

Production_Budget Release_Date MPAA_Rating Running_Time_min Distributor \
0 8000000.0 Jun 12 1998 R NaN Gramercy
1 300000.0 Aug 07 1998 R NaN Strand
2 250000.0 Aug 28 1998 None NaN Lionsgate
3 300000.0 Sep 11 1998 None NaN Fine Line
4 1000000.0 Oct 09 1998 R NaN Trimark

Source Major_Genre Creative_Type Director \


0 None None None None
1 None Drama None None
2 None Comedy None None
3 None Comedy None None
4 Original Screenplay Drama Contemporary Fiction None

Rotten_Tomatoes_Rating IMDB_Rating IMDB_Votes


0 NaN 6.1 1071.0
1 NaN 6.9 207.0
2 NaN 6.8 865.0
3 13.0 NaN NaN
4 62.0 3.4 165.0

[ ]: movies_df.columns # names of the columns (variables)

[ ]: Index(['Title', 'US_Gross', 'Worldwide_Gross', 'US_DVD_Sales',


'Production_Budget', 'Release_Date', 'MPAA_Rating', 'Running_Time_min',
'Distributor', 'Source', 'Major_Genre', 'Creative_Type', 'Director',
'Rotten_Tomatoes_Rating', 'IMDB_Rating', 'IMDB_Votes'],
dtype='object')

[ ]: def extract_year(value):
         # Parse a date such as 'Jun 12 1998' and keep only its year
         return pd.to_datetime(value, format='%b %d %Y').year

movies_df["Year"] = movies_df["Release_Date"].apply(extract_year)
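Applying `extract_year` row by row works, but the same result can usually be obtained in one vectorized call. The sketch below is an alternative, not the notebook's method; it uses the five `Release_Date` values visible in the `head()` output, and `release_dates` is an illustrative name.

```python
import pandas as pd

# Release_Date values taken from the first five rows shown above
release_dates = pd.Series(['Jun 12 1998', 'Aug 07 1998', 'Aug 28 1998',
                           'Sep 11 1998', 'Oct 09 1998'])

# Convert the whole column at once, then read the year through the .dt accessor
years = pd.to_datetime(release_dates, format='%b %d %Y').dt.year
print(years.tolist())
```

On the full table this would read `movies_df["Year"] = pd.to_datetime(movies_df["Release_Date"], format='%b %d %Y').dt.year`, assuming every date follows the same format.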

[ ]: movies_df.columns

[ ]: Index(['Title', 'US_Gross', 'Worldwide_Gross', 'US_DVD_Sales',


'Production_Budget', 'Release_Date', 'MPAA_Rating', 'Running_Time_min',
'Distributor', 'Source', 'Major_Genre', 'Creative_Type', 'Director',
'Rotten_Tomatoes_Rating', 'IMDB_Rating', 'IMDB_Votes', 'Year'],
dtype='object')

[ ]: movies_df.head()

[ ]: Title US_Gross Worldwide_Gross US_DVD_Sales \
0 The Land Girls 146083.0 146083.0 NaN
1 First Love, Last Rites 10876.0 10876.0 NaN
2 I Married a Strange Person 203134.0 203134.0 NaN
3 Let's Talk About Sex 373615.0 373615.0 NaN
4 Slam 1009819.0 1087521.0 NaN

Production_Budget Release_Date MPAA_Rating Running_Time_min Distributor \


0 8000000.0 Jun 12 1998 R NaN Gramercy
1 300000.0 Aug 07 1998 R NaN Strand
2 250000.0 Aug 28 1998 None NaN Lionsgate
3 300000.0 Sep 11 1998 None NaN Fine Line
4 1000000.0 Oct 09 1998 R NaN Trimark

Source Major_Genre Creative_Type Director \


0 None None None None
1 None Drama None None
2 None Comedy None None
3 None Comedy None None
4 Original Screenplay Drama Contemporary Fiction None

Rotten_Tomatoes_Rating IMDB_Rating IMDB_Votes Year


0 NaN 6.1 1071.0 1998
1 NaN 6.9 207.0 1998
2 NaN 6.8 865.0 1998
3 13.0 NaN NaN 1998
4 62.0 3.4 165.0 1998

[ ]: movies_df["Year"].value_counts()

[ ]: Year
2006 220
2005 210
2002 208
2004 192
2000 188

1929 1
2020 1
1946 1
2043 1
1943 1
Name: count, Length: 91, dtype: int64

[ ]: movies_2000 = movies_df[movies_df["Year"] == 2000]


movies_2000.shape

[ ]: (188, 17)

5.1 Chart
[ ]: alt.Chart(movies_2000).mark_point().encode(
alt.X('Production_Budget'),
alt.Y('Worldwide_Gross')
)

[ ]: alt.Chart(…)

[ ]: alt.Chart(movies_2000).mark_point(filled=True).encode(
alt.X('Production_Budget'),
alt.Y('Worldwide_Gross'),
alt.Size('US_Gross')
)

[ ]: alt.Chart(…)

[ ]: alt.Chart(movies_2000).mark_point(filled=True).encode(
alt.X('Production_Budget'),
alt.Y('Worldwide_Gross'),
alt.Size('US_Gross'),
alt.Color('Major_Genre'),
alt.OpacityValue(0.7)
)

[ ]: alt.Chart(…)

[ ]: alt.Chart(movies_2000).mark_point(filled=True).encode(
alt.X('Production_Budget'),
alt.Y('Worldwide_Gross'),
alt.Size('US_Gross'),
alt.Color('Major_Genre'),
alt.OpacityValue(0.7),
tooltip = [alt.Tooltip('Title'),
alt.Tooltip('Production_Budget'),
alt.Tooltip('Worldwide_Gross'),
alt.Tooltip('US_Gross')
]
)

[ ]: alt.Chart(…)

[ ]: alt.Chart(movies_2000).mark_point(filled=True).encode(
alt.X('Production_Budget'),
alt.Y('Worldwide_Gross'),

alt.Size('US_Gross'),
alt.Color('Major_Genre'),
alt.OpacityValue(0.7),
tooltip = [alt.Tooltip('Title'),
alt.Tooltip('Production_Budget'),
alt.Tooltip('Worldwide_Gross'),
alt.Tooltip('US_Gross')
]
).interactive()

[ ]: alt.Chart(…)

[ ]: select_year = alt.selection_single(
name='Select', fields=['Year'], init={'Year': 1928},
bind=alt.binding_range(min=1928, max=2046, step=10)
)

alt.Chart(movies_df).mark_point(filled=True).encode(
alt.X('Production_Budget'),
alt.Y('Worldwide_Gross'),
alt.Size('US_Gross'),
alt.Color('Major_Genre'),
alt.OpacityValue(0.7),
tooltip = [alt.Tooltip('Title:N'),
alt.Tooltip('Production_Budget:Q'),
alt.Tooltip('Worldwide_Gross:Q'),
alt.Tooltip('US_Gross:Q')
]
).add_selection(select_year).transform_filter(select_year)

[ ]: alt.Chart(…)

[ ]: select_year = alt.selection_single(
name='Select', fields=['Year'], init={'Year': 1968},
bind=alt.binding_range(min=1968, max=2008, step= 1)
)

alt.Chart(movies_df).mark_point(filled=True).encode(
alt.X('Production_Budget'),
alt.Y('Worldwide_Gross'),
alt.Size('US_Gross'),
alt.Color('Major_Genre'),
alt.OpacityValue(0.7),
tooltip = [alt.Tooltip('Title:N'),
alt.Tooltip('Production_Budget:Q'),
alt.Tooltip('Worldwide_Gross:Q'),
alt.Tooltip('US_Gross:Q')

]
).add_selection(select_year).transform_filter(select_year)

[ ]: alt.Chart(…)
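The two sliders differ mainly in which years they can land on. The sketch below is a rough pandas-side check, assuming the slider only takes the values `min + k * step` (the usual behaviour of an HTML range input) and using only the ten years visible in the truncated `value_counts()` output above. The names `coarse_stops`, `fine_stops`, `on_coarse`, and `on_fine` are illustrative.

```python
import pandas as pd

# Year counts copied from the value_counts() output (visible rows only)
year_counts = pd.Series({2006: 220, 2005: 210, 2002: 208, 2004: 192, 2000: 188,
                         1929: 1, 2020: 1, 1946: 1, 2043: 1, 1943: 1})

# Years reachable by each slider, assuming values of the form min + k * step
coarse_stops = list(range(1928, 2047, 10))  # min=1928, max=2046, step=10
fine_stops = list(range(1968, 2009))        # min=1968, max=2008, step=1

# Movies (among the listed years) whose year matches a slider position
on_coarse = int(year_counts[year_counts.index.isin(coarse_stops)].sum())
on_fine = int(year_counts[year_counts.index.isin(fine_stops)].sum())
print(on_coarse, on_fine)
```

Under these assumptions none of the listed years falls on a stop of the step-10 slider, while the step-1 slider reaches the five most frequent years, which suggests why the second version is the more useful filter.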
