Ds 1
Saving search engine data csv to search engine data (1) csv
data = pd.read_csv("search_engine_data.csv")
https://colab.research.google.com/drive/19ampz5xV5VES2zya95Vg5q8TPhhIMi5E#scrollTo=09dd5HyzTQuA&printMode=true 1/22
11/20/24, 11:44 PM Untitled45.ipynb - Colab
WARNING:matplotlib.legend:No artists with labels found to put in legend. Note that art
# Plot heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Correlation Matrix Heatmap")
plt.show()
# Summing usage percentages for each search engine across the dataset
global_market_share = data.iloc[:, 1:].sum() # Skip 'Date' column
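Summed percentages grow with the number of rows, so the raw totals are hard to read as a market share. A minimal sketch, assuming a hypothetical frame with a `Date` column followed by one column per search engine (the numbers are made up for illustration), of rescaling those totals so they add up to 100:

```python
import pandas as pd

# Made-up numbers purely for illustration; the real CSV is not reproduced here.
data = pd.DataFrame({
    "Date": ["2024-01", "2024-02", "2024-03"],
    "Google": [90.0, 91.0, 92.0],
    "Bing": [6.0, 5.0, 4.0],
    "Yahoo": [4.0, 4.0, 4.0],
})

# Same selection as the cell above: every column except the first ('Date').
totals = data.iloc[:, 1:].sum()

# Rescale the totals so they add up to 100, which reads as an overall share.
overall_share = totals / totals.sum() * 100
print(overall_share.round(2))
```

Taking the column mean instead of the sum would give the same ranking. The rescaled version is only a convenience for plotting, for example in a pie or bar chart.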
# Define subsets
sizes = [10, 100, 1000, 10000]
point_sizes = [50, 20, 10, 5] # Experiment with these for better visualization
# Generate plots
for i, style in enumerate(styles, start=1):
    plt.figure(figsize=(8, 6))
    plt.plot(x, y, color=style.get('color', 'blue'),
             linestyle=style.get('linestyle', '-'),
             linewidth=style.get('linewidth', 1),
             marker=style.get('marker', None),
             alpha=style.get('alpha', 1))
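The loop above relies on a `styles` list that this printout does not show. The sketch below is one possible shape for it, assuming each entry is a plain dictionary. The `styles` contents and the `resolve_style` helper are hypothetical; only the keys and default values come from the loop.

```python
# Hypothetical style dictionaries; the notebook's own `styles` list is not shown.
styles = [
    {"color": "red", "linestyle": "--", "linewidth": 2},
    {"color": "green", "marker": "o", "alpha": 0.5},
    {},  # an empty dict falls back to every default
]


def resolve_style(style):
    """Return plot keyword arguments, filling in the same defaults as the loop."""
    return {
        "color": style.get("color", "blue"),
        "linestyle": style.get("linestyle", "-"),
        "linewidth": style.get("linewidth", 1),
        "marker": style.get("marker", None),
        "alpha": style.get("alpha", 1),
    }


for i, style in enumerate(styles, start=1):
    print(i, resolve_style(style))
```

Collecting the defaults in one helper keeps them in a single place. The resulting dictionary could then be passed on as `plt.plot(x, y, **resolve_style(style))`.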
I've reviewed several examples of charts and visualizations from news and science websites. Below are ten examples, split equally into "good" and "bad," critiqued on the specified dimensions. Here's an overview:

(a) Effective in presenting data clearly and interactively. (b) No evident bias; purely data-driven. (c) Minimal chartjunk, focused design. (d) Axes aren't applicable, but labels are clear. (e) Excellent use of gradients to represent intensity. (f) Improve by adding key explanations for better accessibility【7】.

Interactive Train Travel Time Map (Netherlands)
(a) Simplifies complex travel-time data effectively. (b) Neutral presentation. (c) No chartjunk. (d) Clear, precise labeling. (e) Excellent use of color coding for time zones. (f) Add additional context for international users【7】.

City Heatmap Activity (Berlin)
(a) Clearly highlights urban activity patterns. (b) Objective. (c) Minimal distractions. (d) Informative and appropriately scaled axes. (e) Vibrant but not overwhelming. (f) Add interactivity to explore specific areas【7】.

Air Pollution Coiled Chart (China)
(a) Illustrates improvement effectively. (b) Possible accidental bias favoring improvement. (c) Aesthetically dense but functional. (d) Labeled effectively. (e) Thoughtful use of soft tones for trends. (f) Simplify the coiled design to enhance readability【7】.

Global Twitter Reply Map
(a) Excellent at visualizing global communication. (b) Neutral. (c) Clean and elegant. (d) Minimal, but clear context provided. (e) Colors are intuitive for connections. (f) Incorporate filters for data density【7】.

Bad Charts

3D Bar Chart for Population Growth
(a) Misleading perspective distortions. (b) Unintended bias due to poor scaling. (c) Chartjunk in unnecessary 3D effects. (d) Axis labels hard to interpret. (e) Overly bright, difficult to read. (f) Redesign as a simple 2D chart【7】.

Glucose Levels and Activity Chart (Diabetic Study)
(a) Overly dense; hard to discern trends. (b) Neutral. (c) Cluttered design. (d) Axis labels unclear in parts. (e) Color choices don't help distinguish key data. (f) Simplify layers and highlight key correlations【7】.

Hurricane Instagram Photos Chart
(a) Ineffective at presenting meaningful insights. (b) Possible accidental emphasis on social media usage. (c) Excessive visuals without purpose. (d) Poor labeling of axes. (e) Overuse of bright, distracting colors. (f) Focus on relevant data and simplify design【7】.

Daily Routine Visualization
(a) Too complex to interpret at a glance. (b) None. (c) Distracting artistic elements. (d) Labeled inconsistently. (e) Mixed palette reduces clarity. (f) Remove artistic overlays, focus on utility【7】.

Sunset Streets (NYC)
(a) Poorly scaled; lacks practical use. (b) Accidental bias toward aesthetics. (c) Contains unnecessary details. (d) Axis labels unclear. (e) Overly bright, lacks depth. (f) Make design functional, reduce decorative elements【7】.
1. 3D Pie Chart
Why it’s bad: Pie charts are already hard enough to interpret, but a 3D pie chart takes the confusion
to a whole new level. The 3D effect distorts the proportions of the slices, making it nearly
impossible to compare them accurately. Why it’s amusing: It's like trying to read a map with a giant
smudge over the place you're trying to find. You don’t even know if the pieces are true to size
because the angles are messing with your perception. It’s like asking someone to guess the size of
a cake by looking at a warped photo of it.
2. Bar Chart with Random Height Bars Why it’s bad: Imagine a bar chart where the bars are
randomly scattered across the graph—some very tall, some tiny, with no logic behind the
numbers they represent. Why it’s amusing: It's like they took a perfectly good idea (a bar
chart) and decided, "Let’s make it as random as possible, because why not?" This chart turns
data analysis into an exercise in complete confusion—like someone trying to explain a map
using abstract art.
3. "Data" Filled with 3D Clip Art Why it’s bad: Some visualizations use 3D clip art instead of
actual data points, like putting a picture of a house in place of an actual number. The chart
has pictures of things like houses or cars instead of bar heights or plotted points. Why it’s
amusing: It’s like trying to make sense of a spreadsheet that’s been replaced with emojis. It
looks cool, but you have absolutely no idea what’s going on. Imagine trying to get an answer
from a data visualization that looks like a picture book.
4. Unlabeled Axis on a Line Graph Why it’s bad: A line graph without axis labels is like a race
without a finish line. Without labels or a title, you’re left to guess what the lines represent,
making the entire chart pointless. Why it’s amusing: It's like showing someone a picture of a
desert with no context, and saying, "Look, here's the data!" You have no idea what the graph
is about or even what units you're looking at—it's like a puzzle with no clues.
5. A Venn Diagram with Too Many Circles Why it’s bad: Venn diagrams are meant to show
relationships between two or three categories. But sometimes you’ll see a Venn diagram
with 10 or 20 circles, all overlapping, making it impossible to figure out where the
commonalities lie. Why it’s amusing: It’s like a child trying to draw a simple diagram but
adding so many circles that it turns into a chaotic mess. The result is a confusing swirl of
overlap where logic is lost, and all you can do is laugh at the absurdity.
Saving Bengaluru House Data csv to Bengaluru House Data csv
import pandas as pd
data = pd.read_csv("Bengaluru_House_Data.csv")
11/20/24, 11:38 PM Untitled37.ipynb - Colab
q6) Analyze the movie data set (of your choice). Which types of movies are most likely to succeed in the market?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import files # Only needed for Google Colab; skip if not using Colab
uploaded = files.upload() # Opens a file dialog for upload
https://colab.research.google.com/drive/1kEwkjoBG6JG9jir-bTUWtQ6QGGfElfEP?authuser=1#printMode=true 1/12
Saving movie_dataset.csv to movie_dataset (3).csv
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 24 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 index 4803 non-null int64
1 budget 4803 non-null int64
2 genres 4775 non-null object
3 homepage 1712 non-null object
4 id 4803 non-null int64
5 keywords 4391 non-null object
6 original_language 4803 non-null object
7 original_title 4803 non-null object
8 overview 4800 non-null object
9 popularity 4803 non-null float64
10 production_companies 4803 non-null object
11 production_countries 4803 non-null object
12 release_date 4802 non-null object
13 revenue 4803 non-null int64
14 runtime 4801 non-null float64
15 spoken_languages 4803 non-null object
16 status 4803 non-null object
17 tagline 3959 non-null object
18 title 4803 non-null object
19 vote_average 4803 non-null float64
20 vote_count 4803 non-null int64
21 cast 4760 non-null object
22 crew 4803 non-null object
23 director 4773 non-null object
dtypes: float64(3), int64(5), object(16)
memory usage: 900.7+ KB
None
index budget genres \
0 0 237000000 Action Adventure Fantasy Science Fiction
1 1 300000000 Adventure Fantasy Action
2 2 245000000 Action Adventure Crime
3 3 250000000 Action Crime Drama Thriller
4 4 260000000 Action Adventure Science Fiction
homepage id \
0 http://www.avatarmovie.com/ 19995
1 http://disney.go.com/disneypictures/pirates/ 285
2 http://www.sonypictures.com/movies/spectre/ 206647
3 http://www.thedarkknightrises.com/ 49026
4 http://movies.disney.com/john-carter 49529
keywords original_language \
0 culture clash future space war space colony so... en
1 ocean drug abuse exotic island east india trad... en
2 spy based on novel secret agent sequel mi6 en
3 dc comics crime fighter terrorist secret ident... en
4 based on novel mars medallion space travel pri... en
original_title \
0 Avatar
1 Pirates of the Caribbean: At World's End
2 Spectre
3 The Dark Knight Rises
4 John Carter
spoken_languages status \
0 [{"iso_639_1": "en", "name": "English"}, {"iso... Released
1 [{"iso_639_1": "en", "name": "English"}] Released
2 [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... Released
3 [{"iso_639_1": "en", "name": "English"}] Released
4 [{"iso_639_1": "en", "name": "English"}] Released
tagline \
0 Enter the World of Pandora.
1 At the end of the world, the adventure begins.
2 A Plan No One Escapes
3 The Legend Ends
4 Lost in our world, found in another.
title vote_average vote_count \
0 Avatar 7.2 11800
1 Pirates of the Caribbean: At World's End 6.9 4500
2 Spectre 6.3 4466
3 The Dark Knight Rises 7.6 9106
4 John Carter 6.1 2124
cast \
0 Sam Worthington Zoe Saldana Sigourney Weaver S...
1 Johnny Depp Orlando Bloom Keira Knightley Stel...
2 Daniel Craig Christoph Waltz L\u00e9a Seydoux ...
3 Christian Bale Michael Caine Gary Oldman Anne ...
4 Taylor Kitsch Lynn Collins Samantha Morton Wil...
crew director
0 [{'name': 'Stephen E. Rivkin', 'gender': 0, 'd... James Cameron
1 [{'name': 'Dariusz Wolski', 'gender': 2, 'depa... Gore Verbinski
2 [{'name': 'Thomas Newman', 'gender': 2, 'depar... Sam Mendes
3 [{'name': 'Hans Zimmer', 'gender': 2, 'departm... Christopher Nolan
4 [{'name': 'Andrew Stanton', 'gender': 2, 'depa... Andrew Stanton
[5 rows x 24 columns]
# Remove rows with missing 'budget' or 'revenue' since they are critical to analysis
movies.dropna(subset=['budget', 'revenue'], inplace=True)
index 0
budget 0
genres 28
homepage 3091
id 0
keywords 412
original_language 0
original_title 0
overview 3
popularity 0
production_companies 0
production_countries 0
release_date 1
revenue 0
runtime 2
spoken_languages 0
status 0
tagline 844
title 0
vote_average 0
vote_count 0
cast 43
crew 0
director 30
dtype: int64
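The counts above show no missing values in `budget` or `revenue`, so the `dropna` call leaves every row in place. If the dataset records unknown amounts as 0 rather than NaN (an assumption worth checking before relying on it), a filter on positive values may be closer to the intent. A minimal sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

# Tiny made-up frame; only the column names follow the movie dataset above.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "budget": [100, 0, 50, np.nan],
    "revenue": [300, 20, 0, 10],
})

# Count missing values per column, as in the output above.
print(movies.isnull().sum())

# Assumption: a budget or revenue of 0 means "unknown", not "free".
# Comparisons with NaN evaluate to False, so NaN rows are dropped as well.
valid = (movies["budget"] > 0) & (movies["revenue"] > 0)
cleaned = movies[valid]
print(cleaned)
```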
plt.ylabel('Average Revenue')
plt.show()
average_revenue average_profit_margin
director
Chris Buck 1.274219e+09 7.494793
Kyle Balda 1.156731e+09 14.631499
Lee Unkrich 1.066970e+09 4.334849
Joss Whedon 9.879437e+08 3.307678
Chris Renaud 8.759583e+08 10.679444
James Cameron 8.405099e+08 6.560511
Roger Allers 7.882418e+08 16.516484
Tim Miller 7.831130e+08 12.501948
Colin Trevorrow 7.587683e+08 6.716957
Robert Stromberg 7.585398e+08 3.214110
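The cell that produced the table above did not survive in this printout. One way such a table could be built is sketched below, assuming `average_profit_margin` is the mean of (revenue − budget) / budget per film. That definition is an assumption, and the rows are made up.

```python
import pandas as pd

# Made-up rows; only the column names come from the dataset above.
movies = pd.DataFrame({
    "director": ["X", "X", "Y"],
    "budget": [100.0, 200.0, 50.0],
    "revenue": [300.0, 400.0, 100.0],
})

# Assumed definition: profit relative to budget, computed per film.
movies["profit_margin"] = (movies["revenue"] - movies["budget"]) / movies["budget"]

# Average both measures per director and rank by average revenue.
director_stats = (
    movies.groupby("director")
    .agg(
        average_revenue=("revenue", "mean"),
        average_profit_margin=("profit_margin", "mean"),
    )
    .sort_values("average_revenue", ascending=False)
)
print(director_stats.head(10))
```

Directors with a single film in the data can dominate a ranking like this, so a minimum film count per director may be worth adding before drawing conclusions.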
q9) Build an interactive exploration widget for your favorite data set, using appropriate libraries and tools. Start simple, but be as creative as you
want to be.
print(df.info())
print("\nPreview of the dataset:")
print(df.head())
Saving social-media-impact-on-suicide-rates.csv to social-media-impact-on-suicide-rates (1).csv
Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 year 30 non-null int64
1 sex 30 non-null object
2 Suicide Rate % change since 2010 30 non-null float64
3 Twitter user count % change since 2010 30 non-null float64
4 Facebook user count % change since 2010 30 non-null float64
dtypes: float64(3), int64(1), object(1)
memory usage: 1.3+ KB
None
print("\nCorrelation Matrix:")
correlation_matrix = numeric_df.corr()
print(correlation_matrix)
Correlation Matrix:
year \
year 1.000000
Suicide Rate % change since 2010 -0.963467
Twitter user count % change since 2010 0.917765
Facebook user count % change since 2010 0.998266
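The correlation cell uses a `numeric_df` whose definition is not shown here. A minimal sketch, assuming it is simply the numeric columns of `df` (the `sex` column is text, so it cannot enter a Pearson correlation); the values below are made up:

```python
import pandas as pd

# Made-up values; the column names are taken from the df.info() output above.
df = pd.DataFrame({
    "year": [2010, 2011, 2012, 2013],
    "sex": ["both", "both", "both", "both"],
    "Suicide Rate % change since 2010": [100.0, 98.0, 97.0, 95.0],
    "Twitter user count % change since 2010": [100.0, 210.0, 330.0, 440.0],
})

# Assumption: numeric_df is just the numeric columns of df.
numeric_df = df.select_dtypes(include="number")

correlation_matrix = numeric_df.corr()
print(correlation_matrix["year"])
```

With only 30 rows and every series trending over time, a high correlation with `year` is expected. It mostly reflects shared trends, not a causal link between the columns.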
plot_trends()
interact(
    explore_relationship,
    y_axis=widgets.Dropdown(
        options=['Twitter user count % change since 2010', 'Facebook user count % change since 2010'],
        value='Twitter user count % change since 2010',
        description='Select Metric:'
    )
)
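The body of `explore_relationship` is not included in this printout. The sketch below is a hypothetical stand-in, not the notebook's own function. It takes the dropdown value as `y_axis` and reports how that metric correlates with the suicide-rate column, using made-up data and no widget so that it runs anywhere.

```python
import pandas as pd

# Made-up values; the column names are taken from the dataset above.
df = pd.DataFrame({
    "Suicide Rate % change since 2010": [100.0, 98.0, 97.0, 95.0],
    "Twitter user count % change since 2010": [100.0, 210.0, 330.0, 440.0],
    "Facebook user count % change since 2010": [100.0, 140.0, 175.0, 205.0],
})

TARGET = "Suicide Rate % change since 2010"


def explore_relationship(y_axis):
    """Hypothetical callback: print and return Pearson r for the chosen metric."""
    r = df[TARGET].corr(df[y_axis])
    print(f"Pearson r between '{TARGET}' and '{y_axis}': {r:.3f}")
    return r


explore_relationship("Twitter user count % change since 2010")
```

In a notebook with ipywidgets available, `interact` would call a function like this again each time the dropdown selection changes, passing the selected option as `y_axis`.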
8) Analyze the Olympic data set. What can you say about the
relationship between a country’s population and the number
of medals it wins?
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Merge the population data with the Olympic data based on 'Country'
merged_data = pd.merge(data, population_df, left_on='NOC', right_on='Country', how=
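Joining on `NOC` against `Country` only works where the two columns hold the same strings, so it helps to check which rows fail to match before comparing medals with population. A minimal sketch with made-up rows; the `Medals` and `Population` column names and the `how="left"` choice are assumptions of this sketch, not taken from the notebook:

```python
import pandas as pd

# Made-up rows; the real Olympic and population files are not reproduced here.
data = pd.DataFrame({"NOC": ["AAA", "BBB", "CCC"], "Medals": [10, 4, 1]})
population_df = pd.DataFrame(
    {"Country": ["AAA", "BBB"], "Population": [5_000_000, 1_000_000]}
)

# how="left" is this sketch's choice; indicator=True adds a '_merge' column.
merged = pd.merge(
    data, population_df,
    left_on="NOC", right_on="Country",
    how="left", indicator=True,
)

# Rows tagged 'left_only' found no population match.
unmatched = merged.loc[merged["_merge"] == "left_only", "NOC"].tolist()
print("No population match:", unmatched)

# Normalise medal counts by population (medals per million inhabitants).
merged["medals_per_million"] = merged["Medals"] / (merged["Population"] / 1e6)
print(merged[["NOC", "Medals", "Population", "medals_per_million"]])
```

A per-capita measure like this makes small and large countries comparable. Raw medal counts plotted against population answer the question as posed.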