HW 1

The document outlines the homework assignment HW-1 for Math 189, due on January 24, 2024, and includes instructions for submission and academic integrity certification. It consists of various questions requiring data analysis using pandas and visualization with seaborn and matplotlib, focusing on a dataset of student responses. The tasks include generating insights, creating plots, and performing calculations related to data types, statistics, and matrix operations.

Uploaded by

dande.t.lion

1/25/24, 12:02 AM hw-1

HW-1 • Math 189 • Wi 2024

Due Date: Wed, Jan 24
NAME: <Dylan Oquendo>

PID: <A17054351>

Instructions
Submit your solutions online on Gradescope
Look at the detailed instructions here
I certify that the following write-up is my own work, and have abided by the UCSD Academic Integrity Guidelines.
Yes
No

Question 1
For this question you will use the class data from HW-0 to generate insights with the help of pandas
The dataset student_data_189.csv is available on Github here or on Canvas in the Files tab.

In [ ]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

a. Read the dataset as a pandas dataframe and print the first 5 rows of the dataframe.
In [ ]: df = pd.read_csv('student_data_189.csv')
df.head()  # first 5 rows by default

b. Print the number of variables and the number of observations in the dataset.
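In [ ]: # df.size counts cells (rows * columns); the number of observations is the number of rows, df.shape[0]
print('There are', df.shape[1], 'variables, and', df.shape[0], 'observations in the dataset.')

There are 11 variables, and 275 observations in the dataset.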
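The difference between `df.size` and the row count is easy to check on a toy frame. The following is a minimal sketch; the frame `toy` and its values are made up for illustration and are not the class dataset. For the class data, 3025 cells spread over 11 columns corresponds to 275 rows.

```python
import pandas as pd

# Hypothetical 3-row, 2-column frame (not the class data).
toy = pd.DataFrame({"sex": ["F", "M", "F"], "wi24_credits": [16, 12, 20]})

n_obs, n_vars = toy.shape   # (number of rows, number of columns)
n_cells = toy.size          # rows * columns, i.e. the total number of cells

print(n_obs, "observations,", n_vars, "variables,", n_cells, "cells")
```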

c. Describe the type for each variable you answered in your survey.
In [ ]: possible_types = ['categorical', 'ordinal', 'discrete quantitative', 'continuous quantitative']
df.columns
print('Based on names of the columns, name would be categorical, fav_color would be categorical, math183_ex

Based on names of the columns, name would be categorical, fav_color would be categorical, math183_excited would be ordinal,
seat_comfort would be ordinal, year would be discrete quantitative, major would be categorical, wi24_credits would be discrete quantitative,
time_reading would be continuous quantitative, time_physical would be continuous quantitative, time_online would be continuous quantitative, and sex would be categorical.
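Column names alone can be misleading, so it may help to look at the stored dtype and the number of distinct values before settling on a variable type. This is only a sketch on a made-up frame (`toy` and its values are assumptions, not the class data); the final classification is still a judgment call about what the variable means.

```python
import pandas as pd

# Hypothetical sample with one categorical, one ordinal and one continuous column.
toy = pd.DataFrame({
    "fav_color": ["blue", "red", "blue"],
    "seat_comfort": [3, 5, 4],
    "time_online": [2.5, 6.0, 4.25],
})

# One row per column: how it is stored and how many distinct values it takes.
summary = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "n_unique": toy.nunique(),
})
print(summary)
```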

d. Create a boxplot of the number of hours of physical activity by sex. Do you see any differences?
In [ ]: sns.boxplot(df, x = 'sex', y = 'time_physical')

/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seaborn/categorical.py:640:
FutureWarning: SeriesGroupBy.grouper is deprecated and will be removed in a future version of pandas.
positions = grouped.grouper.result_index.to_numpy(dtype=float)
Out[ ]: <Axes: xlabel='sex', ylabel='time_physical'>


The two boxplots look very similar. The female box (the middle 50% of the values) sits slightly lower than the male box, and the female group has the most extreme outlier, while the male group has more outliers overall.
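A numeric summary can back up what a boxplot shows, since the box is drawn from the quartiles of each group. Below is a minimal sketch on invented numbers (the frame `toy` is an assumption, not the class data) that computes the quartiles and the interquartile range per group.

```python
import pandas as pd

# Hypothetical hours of physical activity for two groups.
toy = pd.DataFrame({
    "sex": ["F", "F", "F", "M", "M", "M"],
    "time_physical": [2.0, 4.0, 6.0, 3.0, 5.0, 10.0],
})

# Quartiles per group; unstack() turns the quantile level into columns.
stats_by_sex = (
    toy.groupby("sex")["time_physical"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
# Interquartile range = height of the box in a boxplot.
stats_by_sex["iqr"] = stats_by_sex[0.75] - stats_by_sex[0.25]
print(stats_by_sex)
```

The same call with `"wi24_credits"` in place of `"time_physical"` would give the matching summary for the credits boxplot.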
e. Create a boxplot of the number of credits taken by sex. Do you see any differences?
In [ ]: sns.boxplot(df, x = 'sex', y = 'wi24_credits')


The bulk of the female distribution sits higher than the male distribution. The male group has fewer outliers, but also a higher maximum.
f. Create a scatterplot of the number of hours of physical activity vs. the number of hours online. Do you see any patterns?
In [ ]: sns.scatterplot(df, y = 'time_physical', x = 'time_online')

Out[ ]: <Axes: xlabel='time_online', ylabel='time_physical'>


There seems to be an overall positive correlation between the two variables, with most points clustered near the smaller values and a few outliers.
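A visual impression of correlation can be quantified with a correlation coefficient. The sketch below uses made-up numbers (the frame `toy` is an assumption, not the class data) and computes the Pearson coefficient, plus a rank-based version that is less sensitive to outliers.

```python
import pandas as pd

# Hypothetical hours online and hours of physical activity.
toy = pd.DataFrame({
    "time_online": [1.0, 2.0, 3.0, 4.0, 5.0],
    "time_physical": [1.0, 3.0, 2.0, 5.0, 4.0],
})

# Pearson correlation: strength of the linear relationship.
pearson = toy["time_online"].corr(toy["time_physical"])

# Rank correlation: Pearson correlation computed on the ranks of the values.
rank_corr = toy["time_online"].rank().corr(toy["time_physical"].rank())

print(pearson, rank_corr)
```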
g. Create a bar chart for the overall comfort in the classroom's seating.
In [ ]: #sns.barplot(df, y = 'seat_comfort')
sns.histplot(df, x = 'seat_comfort')

Out[ ]: <Axes: xlabel='seat_comfort', ylabel='Count'>
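For a rating variable, a bar chart needs one count per rating level, in rating order. A minimal sketch of that tabulation is shown here on invented ratings (the series `toy` is an assumption, not the class data); seaborn's `sns.countplot` draws this kind of chart directly from the raw column.

```python
import pandas as pd

# Hypothetical seat-comfort ratings.
toy = pd.Series([3, 5, 4, 3, 3, 5], name="seat_comfort")

# Count each rating, then order the bars by rating rather than by frequency.
counts = toy.value_counts().sort_index()
print(counts)

# counts.plot(kind="bar") would draw one bar per rating (needs matplotlib).
```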


h. Create another column called fav_color_simplified which keeps the five most popular fav_color values as is, but changes every other color to other. Create a bar chart of the new column fav_color_simplified.
In [ ]: # determine the five most popular colors here
# Hint: you can use the .value_counts() for this
popular_colors = df['fav_color'].value_counts()[:5].index.tolist()

Changed to histplot for better visualization

In [ ]: df['fav_color_simplified'] = df['fav_color'].apply(lambda x: x if x in popular_colors else 'other')
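The same relabeling can be written without a Python-level lambda by combining `isin` and `where`. This is a sketch on a made-up series (`colors` and the cutoff of 2 are assumptions chosen to keep the example small). Note that when several colors tie at the cutoff, which of them lands in the top group depends on the ordering `value_counts` returns.

```python
import pandas as pd

# Hypothetical favourite colours.
colors = pd.Series(["blue", "red", "blue", "teal", "red", "pink"], name="fav_color")

# Labels of the two most frequent colours (the homework uses five).
top = colors.value_counts().head(2).index

# Keep values that are in the top group, replace everything else by "other".
simplified = colors.where(colors.isin(top), "other")
print(simplified.tolist())
```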

In [ ]: #sns.barplot(df, y = 'fav_color_simplified')
sns.histplot(df, x = 'fav_color_simplified')

Out[ ]: <Axes: xlabel='fav_color_simplified', ylabel='Count'>

Question 2
Consider the following list:
In [ ]: my_list = [
"+0.07",
"-0.07",
"+0.25",
"-0.84",
"+0.32",
"-0.24",
"-0.97",
"-0.36",
"+1.76",
"-0.36"
]

a. What data type does the list contain?

In [ ]: type(my_list[0])

Out[ ]: str

The above code confirms that the list contains data of type str (string).
b. Create three new objects called my_list_float, my_vec_int and my_array which convert my_list to float, integer and numpy array types, respectively.
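In [ ]: my_list_float = [float(x) for x in my_list]
my_vec_int = [int(float(x)) for x in my_list]  # int() cannot parse "+0.07" directly; it truncates the float toward zero
my_array = np.array(my_list)  # still an array of strings; convert with .astype(float) before doing arithmetic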
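The effect of each conversion is easiest to see on a few values. The following sketch uses a three-element sample (the names `sample`, `as_float`, `as_int`, `as_str_array` and `as_num_array` are illustrative, not part of the assignment).

```python
import numpy as np

sample = ["+0.07", "-0.84", "+1.76"]

as_float = [float(s) for s in sample]      # signed decimal strings parse directly
as_int = [int(float(s)) for s in sample]   # int() drops the fractional part (truncates toward zero)
as_str_array = np.array(sample)            # numpy keeps the elements as strings
as_num_array = as_str_array.astype(float)  # explicit conversion to a numeric array

print(as_float, as_int, as_str_array.dtype.kind, as_num_array.dtype)
```

Truncation means every value strictly between -1 and 1 becomes 0, so the integer version loses almost all of the information in this list.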

c. what is the difference between my_list_float and my_array ? e.g., what happens when you multiply them by 2?
In [ ]: floattimestwo = my_list_float * 2
arraytimestwo = my_array.astype(float) * 2
print(floattimestwo)
print(arraytimestwo)

[0.07, -0.07, 0.25, -0.84, 0.32, -0.24, -0.97, -0.36, 1.76, -0.36, 0.07, -0.07, 0.25, -0.84, 0.32, -0.24, -0.97, -0.36, 1.76, -0.36]
[ 0.14 -0.14 0.5 -1.68 0.64 -0.48 -1.94 -0.72 3.52 -0.72]

Multiplying the float list by 2 repeats the list: the result has 20 elements (the original 10 followed by a copy), and no value changes. The array must first be converted to a numeric dtype, after which multiplying by 2 acts elementwise and doubles every entry, like scalar multiplication of a vector.
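The contrast between the two behaviours can be reproduced on a short list. This is a small sketch with illustrative names (`values`, `repeated`, `doubled_list`, `doubled_array`); a list comprehension is one way to get elementwise doubling without numpy.

```python
import numpy as np

values = [0.07, -0.07, 0.25]

repeated = values * 2                   # list repetition: the list followed by a copy of itself
doubled_list = [2 * v for v in values]  # elementwise doubling with plain Python
doubled_array = np.array(values) * 2    # elementwise doubling via numpy broadcasting

print(repeated)
print(doubled_list)
print(doubled_array)
```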
d. Let's call my_array as x. Compute the $\ell_2$ and $\ell_1$ norm of x, and compute the dot product of x with itself.


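In [ ]: x = my_array.astype(float)  # my_array holds strings, so convert once up front
l2_norm = np.linalg.norm(x, ord=2)  # Euclidean norm: sqrt(sum(x_i^2))
l1_norm = np.linalg.norm(x, ord=1)  # sum(|x_i|)
x_dot_x = np.dot(x, x)              # equals l2_norm ** 2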
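Both norms can be written out from their definitions, which also shows how the dot product relates to the $\ell_2$ norm. The sketch below uses a two-element vector chosen so the arithmetic is easy to follow by hand (the vector and variable names are illustrative).

```python
import numpy as np

x = np.array([3.0, -4.0])

l1 = np.sum(np.abs(x))        # |3| + |-4| = 7
l2 = np.sqrt(np.sum(x ** 2))  # sqrt(9 + 16) = 5
dot = np.dot(x, x)            # 9 + 16 = 25, the square of the l2 norm

# The hand-written versions agree with np.linalg.norm.
print(l1, np.linalg.norm(x, ord=1))
print(l2, np.linalg.norm(x, ord=2))
print(dot, l2 ** 2)
```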

e. Let $A$ be the following matrix:

In [ ]: np.random.seed(42)
A = np.random.randn(1000, 10)

Find the row-wise and column-wise mean of $A$.

In [ ]: row_mean = np.mean(A, axis=1)

col_mean = np.mean(A, axis=0)
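The `axis` argument names the dimension that gets collapsed, which is easy to mix up. A small sketch on a 2-by-3 matrix (the matrix `M` is illustrative) makes the shapes explicit.

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

row_means = M.mean(axis=1)  # collapse the columns: one mean per row, shape (2,)
col_means = M.mean(axis=0)  # collapse the rows: one mean per column, shape (3,)

print(row_means, col_means)
```

For the 1000-by-10 matrix in the homework, this gives 1000 row means and 10 column means.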

f. Find the top 2 eigenvalues and eigenvectors of $A^\top A$.

In [ ]: AtA = np.dot(A.T, A)
eigenvalues, eigenvectors = np.linalg.eigh(AtA)
sorted_indices = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[sorted_indices]
eigenvectors = eigenvectors[:, sorted_indices]
top2_eigenvalues = eigenvalues[:2]
top2_eigenvectors = eigenvectors[:, :2]
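One way to sanity-check this kind of computation is to compare against the singular values of the matrix, since the eigenvalues of $B^\top B$ are the squared singular values of $B$. The sketch below does this on a small random matrix (`B`, its shape and the seed are assumptions for illustration) and also verifies the eigenvector equation for the top two pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((50, 4))
G = B.T @ B  # symmetric 4x4 Gram matrix

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
evals, evecs = np.linalg.eigh(G)
order = np.argsort(evals)[::-1]          # re-sort to descending
evals, evecs = evals[order], evecs[:, order]

# Singular values of B come back in descending order.
sing = np.linalg.svd(B, compute_uv=False)

top2_vals, top2_vecs = evals[:2], evecs[:, :2]
print(top2_vals, sing[:2] ** 2)
```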

g. Let $v$ be the vector obtained by summing the squares of the rows of $A$. Plot the histogram of $v$ with the Y-axis to show the normalized frequency of each bin.

In [ ]: v = np.sum(A**2, axis=1)

fig, ax = plt.subplots(1, 1, figsize=(5, 5))

ax.hist(v, bins=30, density=True, color='skyblue', edgecolor='black')

Out[ ]: (array([0.00543086, 0.00868938, 0.03801604, 0.04670542, 0.06191184,
0.08906615, 0.0912385 , 0.11296195, 0.1107896 , 0.08037677,
0.07060122, 0.07711825, 0.04779159, 0.04561925, 0.04453308,
0.0358437 , 0.02498197, 0.02280962, 0.01629259, 0.0119479 ,
0.01086173, 0.00868938, 0.00760321, 0.00325852, 0.00543086,
0. , 0.00217235, 0.00325852, 0. , 0.00217235]),
array([ 1.09471143, 2.01537543, 2.93603943, 3.85670343, 4.77736743,
5.69803143, 6.61869543, 7.53935943, 8.46002343, 9.38068743,
10.30135143, 11.22201543, 12.14267943, 13.06334343, 13.98400743,
14.90467143, 15.82533543, 16.74599943, 17.66666343, 18.58732743,
19.50799143, 20.42865543, 21.34931943, 22.26998343, 23.19064743,
24.11131143, 25.03197543, 25.95263943, 26.87330342, 27.79396742,
28.71463142]),
<BarContainer object of 30 artists>)

h. Using the same fig, ax objects from part (g), overlay the probability density function of the $\chi^2(10)$ distribution, i.e. the chi-squared distribution with 10 degrees of freedom.

In [ ]: !pip install scipy

%matplotlib inline
import scipy.stats as stats

Requirement already satisfied: scipy in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (1.12.0)
Requirement already satisfied: numpy<1.29.0,>=1.22.4 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from scipy) (1.26.3)

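In [ ]: x_range = np.linspace(0, 30, 1000)
y = stats.chi2.pdf(x_range, df=10)
ax.plot(x_range, y, color='red', label='chi2(10) pdf')
ax.legend()
# The inline backend already rendered and closed the figure in part (g), so plt.show() has nothing to draw here.
# Ending the cell with the figure object re-displays it with the overlay.
fig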

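i. What do you observe in the previous plot? Why do you think this is the case?
I could not get the plot to show when I ran the cell above. Once the overlay is displayed, the histogram of $v$ should closely follow the $\chi^2(10)$ density: each entry of $v$ is the sum of the squares of 10 independent standard normal entries in one row of $A$, which is exactly how a $\chi^2$ random variable with 10 degrees of freedom is defined.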
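The relationship between $v$ and the $\chi^2(10)$ distribution can also be checked numerically, without relying on a plot. The sketch below regenerates `A` and `v` as in parts (e) and (g), writes the $\chi^2(k)$ density out from its formula (the helper `chi2_pdf` is a hypothetical stand-in for `scipy.stats.chi2.pdf`), and compares sample moments and histogram heights with the theoretical values. A $\chi^2(k)$ variable has mean $k$ and variance $2k$.

```python
import math
import numpy as np

np.random.seed(42)
A = np.random.randn(1000, 10)
v = np.sum(A ** 2, axis=1)  # one sum of 10 squared standard normals per row

k = 10  # degrees of freedom = number of columns of A

def chi2_pdf(x, k):
    """Density of the chi-squared distribution with k degrees of freedom (x > 0)."""
    return x ** (k / 2 - 1) * np.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

# Normalized histogram heights, as in ax.hist(..., density=True).
heights, edges = np.histogram(v, bins=30, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_gap = np.max(np.abs(heights - chi2_pdf(centers, k)))

print("sample mean:", v.mean(), "theory:", k)
print("sample variance:", v.var(), "theory:", 2 * k)
print("largest gap between histogram and pdf:", max_gap)
```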
