Portfolio Spring 25

The document outlines the portfolio requirements for the 'Introduction to Statistics and Data Science' course for Spring 2025, including submission details and academic integrity guidelines. It consists of multiple tasks involving statistical analysis and R programming, with specific instructions on variable naming and data handling. Students must complete individual assignments by the specified deadline and adhere to academic misconduct policies.

INTRODUCTION TO STATISTICS AND DATA SCIENCE (4ECON006C_n)

PORTFOLIO

Semester: Spring 2025


Weight: 20%
Date available: February 19th, 9:00 am
Deadline of submission: by 11:59 pm on February 20th, 2025
Link of submission for zipped R file:
https://intranet.wiut.uz/Coursework/UploadsDT?courseworkID=4337

Link of submission of the Report (.doc or .pdf):
https://intranet.wiut.uz/Coursework/UploadsDT?courseworkID=4399
(You can also find the submission pages by visiting Module Intranet -> Course Work -> Portfolio and Report)

Instructions to students. Please read them carefully before you start.

1. This is individual work, and you are solely responsible for submitting your own work. Collaboration of any
kind with others is not allowed and will be subject to academic misconduct policies.
2. Do not change the variable names (such as task_1a, task_1b, etc.) in the R file. This is very important.
Your solutions will be checked by those variable names.
3. Most of the exercises require you to insert the last 5 digits of your student ID in place of the 𝒂𝒃𝒄𝒅𝒆
values. Check several times to ensure you are using the correct values.
For example, a student with an ID of 00014040 has the following values:
𝒂 = 1, 𝒃 = 4, 𝒄 = 0, 𝒅 = 4, 𝒆 = 0.
A student with an ID of 00009540 has the following values:
𝒂 = 0, 𝒃 = 9, 𝒄 = 5, 𝒅 = 4, 𝒆 = 0.
4. You must replace the 𝒂𝒃𝒄𝒅𝒆 values with the correct values from your own ID in each exercise. Do not
provide solutions that refer to the a, b, c, d, e values (i.e. your code should not contain these letters;
replace them with your own ID values). For example, if you declare these values at the beginning of your
R file and continue to refer to these letters in the exercises, the work is not acceptable.
5. Do not round your answers unless the question says so. Provide your answers as R code.
Watch the recorded Portfolio instructions before you start.
6. Submission instructions:
• R file (zipped): You must use the R Template file to write your code, rename it with your student ID
and upload it in compressed (zipped) format. Ensure that the zipped archive contains the correct file.
• Report: This file must contain your own explanations for each task. Plagiarism and use of AI to
generate explanations are not allowed and will be subject to Academic Misconduct regulations.
7. Late submissions are not acceptable. Please upload your work at least 30 minutes before the deadline. The
Intranet might become unresponsive when many students start uploading at the same time.
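
As a quick way to double-check the digit mapping in instruction 3, the sketch below extracts the last five digits of an ID string. The helper name `last_five_digits` is hypothetical and is not part of the R Template. It is meant only for checking your values; per instruction 4, the submitted R file must contain the literal numbers, not the letters.

```r
# Hypothetical checking helper (an assumption, not part of the R Template).
# It returns the last five digits of a student ID, labelled a to e.
last_five_digits <- function(student_id) {
  chars <- strsplit(student_id, "")[[1]]   # split the ID into single characters
  digits <- as.integer(tail(chars, 5))     # keep the last five as integers
  names(digits) <- c("a", "b", "c", "d", "e")
  digits
}

# The two example IDs from instruction 3
print(last_five_digits("00014040"))
print(last_five_digits("00009540"))
```

The printed values can be compared against the two worked examples in instruction 3.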

TASKS
Task 1. [10 marks] Oriyat FM, a radio station, finds that the distribution of the lengths of time listeners are
tuned to the station follows the normal distribution. The mean of the distribution is (30+𝒂) minutes and the
standard deviation is (8+𝒃) minutes.
a. What is the probability that a particular listener will tune in between (30+𝒄) and (45+𝒅)
minutes? [5 marks]
b. What percent of the listeners tune in for more than (28+𝒂) minutes? Do not include the
percentage sign (%) in your solution. [5 marks]
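
A minimal sketch of the kind of R calls this type of question involves, using made-up numbers (mean 50, standard deviation 10) that are not derived from any student ID. The variable names are illustrative only and are not the names used in the R Template.

```r
# Illustrative values only (assumptions): X ~ Normal(mean = 50, sd = 10)
mu <- 50
sigma <- 10

# P(45 < X < 60): difference of two cumulative probabilities
p_between <- pnorm(60, mean = mu, sd = sigma) - pnorm(45, mean = mu, sd = sigma)

# Percentage of values above 40, reported as a number without the % sign
pct_above <- 100 * pnorm(40, mean = mu, sd = sigma, lower.tail = FALSE)

print(p_between)
print(pct_above)
```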

Task 2. [10 marks] Light bulbs are tested for their life-span. It is found that (𝒄+ 𝒅 +5)% of the light bulbs are
rejected. A random sample of (20+𝒂) bulbs is taken from the stock and tested. The random variable X is the
number of bulbs that are rejected.
a. What is the probability that the value of X is at least (2+𝒂)? [5 marks]
b. What is the (80+𝒆)th percentile value for X? [5 marks]
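
A sketch of how "at least" probabilities and percentiles can be obtained for a binomial count in R. The sample size (20) and rejection rate (10%) are made-up illustrative values, and the variable names are not those of the R Template.

```r
# Illustrative values only (assumptions): X ~ Binomial(n = 20, p = 0.10)
n_bulbs <- 20
p_reject <- 0.10

# P(X >= 3) is the upper tail beyond 2, i.e. P(X > 2)
p_at_least_3 <- pbinom(2, size = n_bulbs, prob = p_reject, lower.tail = FALSE)

# 90th percentile: smallest k such that P(X <= k) >= 0.90
q90 <- qbinom(0.90, size = n_bulbs, prob = p_reject)

print(p_at_least_3)
print(q90)
```

Note the off-by-one step: because X is discrete, "at least 3" is the upper tail starting after 2, not after 3.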

Task 3. [10 marks] Suppose the closing stock price (X, in USD) of Bank of America Corp follows the following
continuous distribution in one year:
p(X) = 1/35   for 20 + 𝐜 < X < 55 + 𝐜
p(X) = 0      for other values of X.

a. What is the probability that the stock price will close above $(30 + 𝒃) on a randomly chosen
trading day? [5 marks]
b. Suppose there are 252 trading days in 2025. How many days should we expect the stock price
to close below $(40 + 𝒅)? Round your answer to the nearest integer value. [5 marks]
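
A sketch of the same two calculations for a continuous uniform distribution, using a made-up interval from 10 to 60 that is not derived from any student ID. Variable names are illustrative and are not those of the R Template.

```r
# Illustrative values only (assumptions): X ~ Uniform(10, 60)
lower <- 10
upper <- 60

# P(X > 25): upper-tail probability of the uniform distribution
p_above <- punif(25, min = lower, max = upper, lower.tail = FALSE)

# Expected number of days (out of 252) with X < 40, rounded to the nearest integer
expected_days <- round(252 * punif(40, min = lower, max = upper))

print(p_above)
print(expected_days)
```

The expected count is the number of trading days multiplied by the per-day probability.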

Task 4. [10 marks] In a cafe, customers arrive at a mean rate of (𝒄+4) per 12 minutes. The variance
of customer arrivals is equal to (5*𝒄+20) per hour.
a. Find the probability that at most (𝒂+1) customers arrive in the next minute. [5 marks]
b. Let x denote the number of customer arrivals per 12 minutes.
Find P(𝒄+2 ≤ x < 𝒄+6). [5 marks]
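
Equal mean and variance per unit of time is what a Poisson model would give, so the sketch below assumes Poisson arrivals and shows how the rate is rescaled to the time window in question. The rate of 6 per 12 minutes is a made-up illustrative value, and the variable names are not those of the R Template.

```r
# Illustrative value only (assumption): 6 arrivals per 12 minutes, Poisson model
rate_per_12min <- 6
rate_per_min <- rate_per_12min / 12    # rescale to a 1-minute window
rate_per_hour <- rate_per_12min * 5    # rescale to a 60-minute window

# P(at most 2 arrivals in the next minute)
p_at_most_2 <- ppois(2, lambda = rate_per_min)

# P(4 <= x < 8) per 12 minutes, i.e. P(x <= 7) - P(x <= 3)
p_range <- ppois(7, lambda = rate_per_12min) - ppois(3, lambda = rate_per_12min)

print(p_at_most_2)
print(p_range)
```

Because the count is discrete, a strict upper bound "x < 8" ends at 7 and a lower bound "4 ≤ x" subtracts the probability up to 3.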

Task 5. [20 marks: 2 marks for each correct answer]


a. Create a numeric vector with a sequence of only even numbers from (𝒂 +10) to (𝒃+25).
b. Create a vector of consecutive integer numbers which starts from 1 and has the same length as the
vector from part 5a.
c. Add the vectors from part 5a and 5b and store the new vector under the given name in R file.
d. Sum up the elements of the vector from part 5c.
e. Find the median value of the vector created in part 5c.
f. Find the mean of the vector created in part 5c.
g. Find the sample standard deviation of the vector created in part 5c.
h. Find the (40+𝒂)th percentile of the vector created in part 5c.
i. Find the IQR of the vector created in part 5c. Use built-in R function(s) to find the IQR.
j. Calculate the absolute difference between the (80+𝐜)th and (20+𝐝)th percentiles of the vector from part 5c.
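
A sketch of the vector operations listed in Task 5, using made-up endpoints (11 and 30) and made-up percentile levels. The variable names are illustrative and are not the `task_...` names from the R Template. Starting the sequence at the first even number at or above the lower bound is one way to handle an odd lower endpoint.

```r
# Illustrative endpoints only (assumptions)
lower <- 11
upper <- 30

# Even numbers in [lower, upper]: start at the first even number >= lower
first_even <- lower + lower %% 2
evens <- seq(first_even, upper, by = 2)

# Consecutive integers 1, 2, ... with the same length as `evens`
idx <- seq_along(evens)

# Element-wise sum and summary statistics
combined <- evens + idx
total <- sum(combined)
med <- median(combined)
avg <- mean(combined)
s <- sd(combined)                          # sample standard deviation (n - 1)
p45 <- quantile(combined, probs = 0.45)    # percentile, default type 7
iqr <- IQR(combined)                       # built-in interquartile range
spread <- abs(unname(quantile(combined, 0.85) - quantile(combined, 0.25)))

print(combined)
print(c(total = total, median = med, mean = avg, sd = s))
```

`quantile()` uses its default interpolation method (type 7) here; whether the course expects a different type is not stated in this brief.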
Task 6. [10 marks]
A man has two bags. Bag A contains (2+ 𝐝) keys and bag B contains (12+𝐞) keys. Only one of those
keys fits the lock which he is trying to open. The man selects a bag at random, picks out a key from the bag
at random and tries that key in the lock. What is the probability that the key he has chosen fits the lock?

Task 7. [10 marks]


A company wants to compare the quality of its current customer service with that of two
years ago. To do so, the company's research team randomly selected customer enquiries and looked at the
summary statistics of waiting times (in minutes) for those enquiries. The following table provides the data:

Year    Average waiting time    Sample standard deviation    Sample size
2024    (7 + 𝑏)                 (3 + 𝑐)                      (35 + 𝑑)
2022    (7.5 + 𝑏)               (4 + 𝑐)                      (40 + 𝑒)

Compute the p-value of the hypothesis test of whether the average waiting time decreased in 2024
compared to 2022.
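
A sketch of a one-sided two-sample test computed from summary statistics alone. The brief does not say which test is expected, so the choice of Welch's unequal-variance t-test here is an assumption; a pooled-variance t-test or a z-test would give a slightly different p-value. All numbers are made up for illustration, and the variable names are not those of the R Template.

```r
# Illustrative summary statistics only (assumptions)
mean_new <- 8.0; sd_new <- 3.0; n_new <- 40   # more recent year
mean_old <- 9.0; sd_old <- 4.0; n_old <- 45   # earlier year

# Standard error of the difference in means (unequal variances)
se <- sqrt(sd_new^2 / n_new + sd_old^2 / n_old)

# Test statistic for H0: mu_new = mu_old against H1: mu_new < mu_old
t_stat <- (mean_new - mean_old) / se

# Welch-Satterthwaite degrees of freedom
df_welch <- se^4 / ((sd_new^2 / n_new)^2 / (n_new - 1) +
                    (sd_old^2 / n_old)^2 / (n_old - 1))

# Lower-tail p-value, because the alternative is "decreased"
p_value <- pt(t_stat, df = df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```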

Task 8. [20 marks]


Read the following college data into your RStudio using the url option:
Link: "https://s3.amazonaws.com/itao-30230/college.csv"
Remove the following rows from the dataset all at once: (𝒂+5), (𝒂+15), (𝒃+105), (𝒃+255), (𝒄+405),
(𝒄+455), (𝒅+600), (𝒅+700), (𝒆+1001).
This is your unique college data and you will be working with this dataset for the following questions:
a. Find the average tuition for private universities with more than $(12000 + 300*d + 400*e) tuition rate
in the state of New York. [3 marks]
b. How many universities from the South region have an acceptance rate higher than 40% and SAT
average above 1050? [3 marks]
c. How many university names in the data contain the string "Virginia" (Consider all letter cases, such as
"virginia", "VIRGINIA" if any)? [3 marks]
d. Group the universities by state and compute the average of tuition by each state.
Which state has the minimum value? Your answer code should ONLY produce the two-letter
abbreviation from the state column. [3 marks]
e. Create a set of boxplots of tuition rates based on 4 regions in a 2 by 2 frame.
Make sure to have 4 different colors for your boxplots using only hex color codes.
Your student ID must be shown in the main title of the plot. [4 marks]
f. Create the histograms of tuition rates for private and public universities in overlapped plots in two
different transparent colors (blue for private and green for public). Your ID must be given in the title of
the plot (see the graph on the next page). [4 marks]
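
A sketch of three of the data-handling steps on a tiny made-up data frame: removing several rows in one step, a case-insensitive name search, and a grouped average. The toy data and the column names (`name`, `state`, `tuition`) are assumptions for illustration; check the actual column names after reading the CSV.

```r
# Toy data (assumption): stands in for the real college data
colleges <- data.frame(
  name = c("Alpha College", "University of Virginia Example",
           "Beta Institute", "WEST VIRGINIA Sample U", "Gamma University"),
  state = c("NY", "VA", "NY", "WV", "CA"),
  tuition = c(30000, 12000, 20000, 7000, 40000),
  stringsAsFactors = FALSE
)

# Remove rows 1 and 5 in a single step using negative indexing
my_colleges <- colleges[-c(1, 5), ]

# Count names containing "virginia" in any letter case
n_virginia <- sum(grepl("virginia", my_colleges$name, ignore.case = TRUE))

# Average tuition per state, then the state with the lowest average
state_means <- aggregate(tuition ~ state, data = my_colleges, FUN = mean)
min_state <- state_means$state[which.min(state_means$tuition)]

print(n_virginia)
print(min_state)
```

Dropping all rows in one call matters because removing them one at a time would shift the row positions of the later removals.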
