Assignment 1 - Part B

The assignment requires the creation of a Jupyter Notebook using the 'grad-students.csv' dataset, focusing on data manipulation and visualization. Key tasks include loading the dataset, modifying columns, filtering data, and plotting specific metrics. Submissions must include the notebook, a screenshot of the plot, and a README file, all adhering to specified naming conventions and deadlines.

Uploaded by

Sami Jahangir

BUSS 5802 – Business Analytics

Assignment 1B
Due September 23, 2024, at 8pm Atlantic time
Individual Submission to Moodle

You need to download the dataset from Moodle called “grad-students.csv”. You will find it with the
files for Assignment 1.

The provenance of the dataset is:
https://github.com/fivethirtyeight/data/blob/master/college-majors/grad-students.csv

It is recommended that you use the “Learning_Jupyter_Notebook.ipynb” that was shown in class to
help you.

Required:

Create a Jupyter Notebook that does the following:

1. Loads the grad-students.csv file
2. Shows the data types
3. Identifies the columns
4. Drops all of the columns beginning with “Nongrad”
5. Concatenates (joins together) the Major_code and Major_category columns into a new
column called Major_cat_code. There should be a space between the Major_code and the
major. For example, 5601 Construction Services, and NOT 5601Construction Services.
6. Creates a new data frame that contains only Major codes below 1110
   a. Drops the column called Grad_premium
   b. Slices out Grad_median, Grad_P25, and Grad_P75
7. Creates a plot with Major_code (for all major codes) on the X-axis and Grad_median,
Grad_P25, and Grad_P75 on the Y-axis
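
As a rough illustration of steps 1 to 6, the sketch below runs the same kinds of pandas operations on a tiny in-memory table. The rows and numbers are made up for illustration and only mimic a few columns of grad-students.csv; the variable names (SAMPLE_CSV, low_codes, salaries) are hypothetical. It also assumes that "slice out" means selecting the three columns, which is one possible reading of step 6b.

```python
import io

import pandas as pd

# Made-up rows that mimic a few columns of grad-students.csv (values are illustrative only).
SAMPLE_CSV = """Major_code,Major,Major_category,Grad_median,Grad_P25,Grad_P75,Nongrad_median,Nongrad_P25,Grad_premium
5601,CONSTRUCTION SERVICES,Industrial Arts & Consumer Services,75000,53000,110000,65000,47000,0.15
1101,AGRICULTURE PRODUCTION AND MANAGEMENT,Agriculture & Natural Resources,67000,41600,100000,55000,38000,0.22
1100,GENERAL AGRICULTURE,Agriculture & Natural Resources,68000,45000,100000,50000,34000,0.36
"""

# Step 1: load the data (for the real file, pass its path instead of the StringIO object).
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Step 2: show the data type of each column.
print(df.dtypes)

# Step 3: list the column names.
print(df.columns.tolist())

# Step 4: drop every column whose name begins with "Nongrad".
nongrad_cols = [col for col in df.columns if col.startswith("Nongrad")]
df = df.drop(columns=nongrad_cols)

# Step 5: join the code and the category with a single space in between.
df["Major_cat_code"] = df["Major_code"].astype(str) + " " + df["Major_category"]

# Step 6: keep only the rows with major codes below 1110 in a new data frame.
low_codes = df[df["Major_code"] < 1110].copy()

# Step 6a: drop the Grad_premium column from the new data frame.
low_codes = low_codes.drop(columns="Grad_premium")

# Step 6b: select the three earnings columns (assumed meaning of "slice out").
salaries = low_codes[["Grad_median", "Grad_P25", "Grad_P75"]]
print(salaries)
```

Converting Major_code with astype(str) before joining avoids a type error, because an integer column cannot be added directly to a text column.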

Once you load the data, everything should be done in Jupyter Notebook. That means that you can’t
alter the Excel file as a workaround after the fact. Therefore, carefully think about your data
preparation and what you are being asked to do in this assignment. You should not be using
ChatGPT (or any other LLM) as I have shown you everything you need with the exception of
concatenating and sorting. You will need to do some research on those and try to figure them out
yourself, but ChatGPT (or any other LLM) should not be your place of research. You need to learn to
do this on your own without the help of an LLM. So, look to the internet, Stack Overflow, the Python
books that you were recommended in the syllabus, etc.
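
Sorting matters mostly for the plot in step 7: if the rows are not in Major_code order, a line plot zigzags across the X-axis. The sketch below is one possible approach, assuming pandas' sort_values and the built-in DataFrame.plot wrapper around matplotlib. The three rows are made-up values, and df_sorted is a hypothetical name.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values, deliberately listed out of Major_code order.
df = pd.DataFrame(
    {
        "Major_code": [5601, 1100, 2100],
        "Grad_median": [75000, 68000, 80000],
        "Grad_P25": [53000, 45000, 56000],
        "Grad_P75": [110000, 100000, 105000],
    }
)

# Sort the rows by Major_code so the X-axis runs in ascending order.
df_sorted = df.sort_values("Major_code")

# Draw one line per earnings column against Major_code.
ax = df_sorted.plot(x="Major_code", y=["Grad_median", "Grad_P25", "Grad_P75"])

# Label the axes so the screenshot can be read on its own.
ax.set_xlabel("Major_code")
ax.set_ylabel("Grad_median, Grad_P25, Grad_P75")
```

In a Jupyter Notebook the figure normally appears below the cell once the cell finishes running.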

Your code should be appropriately commented with one sentence for each command indicating
what the command does. Do not use the amount of commenting provided in the
Learning_Jupyter_Notebook.ipynb as a guide as that file was heavily commented to help your
learning.

You should also include comments for any sources that you looked at to help you with the coding.
You should indicate where in the code you have used someone else’s code as inspiration by
indicating in a comment the URL or source that you looked at, and how you used the code. If you
use code from the Learning_Jupyter_Notebook that I provided, you can put “Per class framework”
as your source. The references in the code should not be formal (formal references go in the
README). You can simply put your brief explanation along with the URL or source.
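
The commenting style described above might look something like the sketch below: one sentence per command, plus an informal source note where outside material was consulted. The data values are made up, and the source notes are only examples of the format, not a statement of where any particular code came from.

```python
import pandas as pd

# Create a two-row data frame with made-up values. Source: Per class framework.
df = pd.DataFrame({"Major_code": [1100, 5601], "Grad_median": [68000, 75000]})

# Convert Major_code from integer to text.
# Source: https://pandas.pydata.org/docs/reference/api/pandas.Series.astype.html (adapted the astype(str) call).
code_as_text = df["Major_code"].astype(str)

# Display the converted column.
print(code_as_text)
```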

If you use any sources in your work, appropriate referencing should be done using any standard
reference format such as APA, IEEE, Chicago, Vancouver, etc.

What to submit:

1. Share your Jupyter Notebook solution with me:

Your Jupyter Notebook file should be named as follows: STaylor_Assignment1B.ipynb


You will store this in your OneDrive and share that file with me. If you are unsure of how to share the
file from OneDrive, please review this link:
https://support.microsoft.com/en-us/office/share-onedrive-files-and-folders-9fcc2f7d-de0c-4cec-93b0-a82024800c07#:~:text=Just%20right%2Dclick%20the%20file,select%20Share%20a%20OneDrive%20link.
My email address is on the course syllabus – you will need this to share your .ipynb file
with me. You also need to put “BUSS 5802 – Assignment 1” in the “Add a message” section before
sending. This notification will act as your submission date and time. So, please don’t forget it. If you
do not share the notebook with me correctly and the deadline passes, your submission will be
considered late and will not be marked if it is past the 29-minute grace period.

I will run your notebook on my computer. The only thing that I will change is the file path so that it
will run on my computer. The rest of the programming will be marked as is.

2. Submit a screenshot of your plot and discuss if you have altered the dataset in any way. This
discussion should clearly explain what you have done and why. If you have not altered the dataset,
then only a screenshot of your plot is needed. The screenshot and your discussion (if needed)
should be submitted in a Word file. If you submit any other file type, you will lose marks. The Word
document should be named as follows: STaylor_Assignment1B.docx (or .doc) and should be
submitted to Moodle.

3. A README should also be submitted to Moodle. You can find an example in the Group Project
files on Moodle. I would recommend copying that file and then overwriting it with your information.
This should also be submitted to Moodle and should be named as follows: STaylor_README.txt

Note: The File Naming Convention uses my name as an example. Please don’t forget to use your
name instead of mine.
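
Because the notebook will be run on another computer with only the file path changed (see item 1 above), it can help to keep that path in a single variable near the top of the notebook. The sketch below shows one way to do this; DATA_PATH and load_grad_students are hypothetical names, and the path shown assumes the CSV sits next to the notebook.

```python
from pathlib import Path

import pandas as pd

# Assumed location of the dataset; this is the one line that would be edited on another computer.
DATA_PATH = Path("grad-students.csv")


def load_grad_students(path):
    """Read the CSV at the given path into a data frame."""
    return pd.read_csv(path)


# Load the data only if the file is present at the assumed location.
if DATA_PATH.exists():
    df = load_grad_students(DATA_PATH)
    print(df.shape)
```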

Please remember that you only have a 29-minute grace period. The suggested solution
will be posted by 8:30pm on September 23, 2024. No late submissions will be
accepted. So, make sure that you submit by 8:29pm on September 23, 2024.
