Assignment 1
M = 3×3
97.0593 80.0280 91.5736
95.7167 14.1886 79.2207
48.5376 42.1761 95.9492
1. The ans variable should be a matrix in which every number bigger than 42 is replaced by 0 and every other number is left
untouched.
M .* (M <= 42)
ans = 3×3
0 0 0
0 14.1886 0
0 0 0
2. The ans variable should be a matrix whose entries are: 1 when an entry is bigger than 100 and 0 if an entry is less than or equal to 100.
M > 100
3. For each row, the ans variable should be 0 if the row contains an even number of entries bigger than 42. Otherwise, it should be 1.
M>42
mod(sum(M > 42,2),2)
ans = 3×1
1
0
1
4. The ans variable should be 1 if all elements in M are unique; otherwise the ans variable should be 0.
length(M(:)) == length(unique(M(:)))
ans = logical
1
5. The ans variable should be the index of the row that has the maximum product of its elements. Assume no row is entirely zeros.
prod(M,2)
ans = 3×1
1.0e+05 *
7.1129
1.0759
1.9642
ans =
1
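The output above shows the row products, but the step that produces the final `ans = 1` is not shown. A minimal sketch of one way to get it, assuming the second output of `max` is used to pick the row (the names `row_products` and `row_index` are illustrative, not from the original):

```matlab
% M as displayed at the top of the assignment
M = [97.0593 80.0280 91.5736;
     95.7167 14.1886 79.2207;
     48.5376 42.1761 95.9492];

row_products = prod(M, 2);            % product of the entries in each row (3x1)
[~, row_index] = max(row_products);   % second output of max is the position of the maximum
row_index                             % 1 for this M, matching the displayed answer
```

If two rows tie for the largest product, `max` returns the first of them.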
A saddle point is an element in a matrix M that is the smallest in its row and the largest in its column.
6. The ans variable must be a list of (row,column) indices of all the saddle points in M .
ans =
4
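The displayed result, 4, is the column-major linear index of entry (1,2) in a 3×3 matrix, while the task asks for (row, column) pairs. A minimal sketch that returns both forms, assuming implicit expansion is available (MATLAB R2016b or later); the variable names are illustrative:

```matlab
% M as displayed at the top of the assignment
M = [97.0593 80.0280 91.5736;
     95.7167 14.1886 79.2207;
     48.5376 42.1761 95.9492];

row_min = min(M, [], 2);      % smallest entry of each row (3x1)
col_max = max(M, [], 1);      % largest entry of each column (1x3)

% An entry is a saddle point when it equals both its row minimum
% and its column maximum.
is_saddle = (M == row_min) & (M == col_max);

[r, c] = find(is_saddle);
saddle_points = [r, c]            % one (row, column) pair per line: [1 2] for this M
linear_index  = find(is_saddle)   % 4 for this M, matching the displayed answer
```

If M has no saddle point, both results are empty.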
SCRIPT
NOTE: " TRIDEV " --- clc;clear;close;
1. Write a script in MATLAB to import an Excel sheet of student data with a column of heights and a column of weights. Now write a script that
(a) computes the sample means and standard deviations of the heights and weights.
i. computes the OLS slope and intercept using your standard deviations and means.
ii. plots the regression line and the input data points on a clearly labelled figure with legends. Show the value of R^2 on the plot.
The catch is: you cannot use any built-in function for the mean, standard deviation, slope, intercept, or R^2 computations. You have to build the script from scratch.
Of course, liberally use MATLAB functions for plotting.
%Import Data: Either Using Open Or Using Import Data Tab (Home)
%Don't forget to change the output type to vector instead of table
N=length(Height);
% Mean
mean_x = sum(Height)/N;
mean_y = sum(Weight)/N;
% SD
sd_h = sqrt(sum((Height - mean_x).^2) / (N-1)); % sample SD: divide by (N-1), in parentheses
sd_w = sqrt(sum((Weight - mean_y).^2) / (N-1));
% Covariance
Sxy = sum((Height - mean_x).*(Weight - mean_y));
% OLS _______________________________________________________________
% Slope
Sx2 = sum((Height - mean_x).^2); % sum of squared deviations of Height
beta1 = Sxy / Sx2 ;
% Intercept
beta0 = mean_y - beta1*mean_x ;
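% R^2 _______________________________________________________________
y_pred = beta0 + beta1*Height;    % fitted values
SSE = sum((Weight - y_pred).^2);  % residual sum of squares
SST = sum((Weight - mean_y).^2);  % total sum of squares
R2 = 1 - SSE/SST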
R2 =
0.0071
% Plotting __________________________________________________________
figure;
scatter(Height,Weight);
hold on;
plot(Height,y_pred);
legend("Data Points" , "Regression Line");
fprintf("R^2 = %f ",R2) % The "%f" works as a placeholder here
R^2 = 0.007050
% Cross-check against the built-in fitlm
mdl=fitlm(Height,Weight,"linear")
mdl =
Linear regression model:
y ~ 1 + x1
Estimated Coefficients:
Estimate SE tStat pValue
________ _______ ______ _______
Number of observations: 20, Error degrees of freedom: 18
Root Mean Squared Error: 16.6
R-squared: 0.00705, Adjusted R-Squared: -0.0481
F-statistic vs. constant model: 0.128, p-value = 0.725
plot(mdl)
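`fitlm` is part of the Statistics and Machine Learning Toolbox. Where that toolbox is not installed, the hand-computed slope and intercept can still be checked against `polyfit`, which ships with base MATLAB. A minimal sketch, using made-up stand-in values rather than the students' sheet:

```matlab
% Stand-in data (hypothetical values, not the imported Height/Weight columns)
x = [150; 160; 165; 170; 180];
y = [50; 58; 63; 66; 75];
n = length(x);

% From-scratch OLS, same formulas as in the script above
mean_x = sum(x)/n;
mean_y = sum(y)/n;
Sxy = sum((x - mean_x).*(y - mean_y));   % sum of cross-deviations
Sx2 = sum((x - mean_x).^2);              % sum of squared deviations of x
slope = Sxy / Sx2;
intercept = mean_y - slope*mean_x;

% Built-in check: polyfit returns [slope, intercept] for degree 1
p = polyfit(x, y, 1);
max_diff = max(abs([slope, intercept] - p))
```

`max_diff` should be at the level of floating-point round-off; anything larger points to a mistake in the hand-written formulas.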
2. Convert the above into a function that takes two input vectors of equal length X, Y and returns an array [m, c, R2] which is the slope, intercept
and R2. In other words, this is the Linear Regression function created entirely by you!
function result = linear_regression(X, Y) % function name is an assumption
n=length(X);
% Mean
mean_x = sum(X)/n;
mean_y = sum(Y)/n;
% SD
% Covariance
Sxy = sum((X - mean_x).*(Y - mean_y));
% OLS
% Slope
Sx2 = sum((X - mean_x).^2); % sum of squared deviations of X
m = Sxy / Sx2 ;
% Intercept
c = mean_y - m*mean_x ;
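% R^2
y_pred = c + m*X;                 % fitted values
SSE = sum((Y - y_pred).^2);       % residual sum of squares
SST = sum((Y - mean_y).^2);       % total sum of squares
R2 = 1 - SSE/SST;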
result = [m,c,R2];
end
3. You are given a vector data containing 10,000 readings of a thermostat of a hall from MSE in summer. Write a script that removes all readings that
are more than 3 sample standard deviations from the mean. The final answer should be a shorter array.
• The first step is to generate some dummy data: temperatures in Chennai. The commented-out line assumes a mean of 25 degrees Celsius; the active line uses a mean of 20 and a standard deviation of 10.
clc;clear;close;
N=10000;
%data = 25 + 3 * randn(10000, 1) % Mean = 25, Std Dev = 3
data = (randn(N , 1) * 10) + 20
data = 10000×1
26.7150
7.9251
27.1724
36.3024
24.8889
30.3469
27.2689
16.9656
22.9387
12.1272
lower_bound =
-9.7044
upper_bound =
49.7078
% Remove Outliers
filtered_data = data(data >= lower_bound & data <= upper_bound);
% Results
fprintf('Original Data Size: %d ', N);
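The lines that compute `lower_bound` and `upper_bound` are not shown above. A minimal end-to-end sketch of the 3-sigma filter, assuming the bounds are the sample mean plus or minus three sample standard deviations (the names `mu` and `sigma` and the fixed seed are additions for illustration):

```matlab
rng(0);                               % fixed seed, only so the sketch is reproducible
N = 10000;
data = randn(N, 1)*10 + 20;           % dummy readings: mean 20, std dev 10

mu = sum(data)/N;                             % sample mean
sigma = sqrt(sum((data - mu).^2)/(N - 1));    % sample standard deviation

lower_bound = mu - 3*sigma;
upper_bound = mu + 3*sigma;

% Keep only the readings inside the 3-sigma band
filtered_data = data(data >= lower_bound & data <= upper_bound);

fprintf('Original Data Size: %d\n', N);
fprintf('Filtered Data Size: %d\n', numel(filtered_data));
```

For normally distributed data roughly 0.3% of readings fall outside the band, so the filtered array is only slightly shorter than the original.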
4. You are given an array data of size 1,000,000 containing noisy measurements. Find the locations of the peaks.
N = 1000000;
data_array = randn(N, 1) * 100 + 50; % Random data with noise
peaks = [];
peaks
peaks = 333476×1
2
5
9
12
15
17
19
22
26
31
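The loop that fills `peaks` is not shown above. A minimal vectorized sketch, assuming a peak is an interior point strictly greater than both of its neighbours (that definition and the fixed seed are assumptions; the original criterion is not visible):

```matlab
rng(0);                                   % fixed seed, only for reproducibility
N = 1000000;
data_array = randn(N, 1)*100 + 50;        % random data with noise

left  = data_array(1:end-2);              % neighbour to the left
mid   = data_array(2:end-1);              % candidate points
right = data_array(3:end);                % neighbour to the right

is_peak = (mid > left) & (mid > right);   % strictly greater than both neighbours
peaks = find(is_peak) + 1;                % +1 because mid starts at index 2

num_peaks = numel(peaks)
```

For independent noise each interior point is the largest of its three-point window with probability 1/3, so about N/3 peaks are expected, which is consistent with the 333476 reported above.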
SIMULATION
1. Simulate daily closing stock prices for a company over 5 years (252 days per year). Assume:
(b) Daily return follows a normal distribution with mean 0.05% and standard deviation 1%.
(b) The year with the highest average price, from your simulated data.
clc;clear;close;
% Parameters:
initial_price = 10000;
mu=0.0005;
sigma=0.01;
days=252
days =
252
years=5
years =
5
total_days=years*days;
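% Simulation ___________________________________________________________________________________
daily_returns = mu + sigma*randn(total_days, 1); % normally distributed daily returns
stock_prices = zeros(total_days, 1);
stock_prices(1) = initial_price;                 % day 1 starts at the initial price
for t = 2:total_days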
stock_prices(t) = stock_prices(t-1) * (1 + daily_returns(t)); % Compute new price
end
highest_price = max(stock_prices);
lowest_price = min(stock_prices);
% for y = 1:years
% start_index = (y-1) * days + 1;
% end_index = y * days;
% avg_prices_per_year(y) = mean(stock_prices(start_index:end_index));
% end
% Reshape stock prices into a matrix with each column representing one year
stock_prices_reshaped = reshape(stock_prices, days, years)
stock_prices_reshaped = 252×5
1.0e+04 *
1.0000 0.9709 1.3200 1.2233 1.5404
0.9913 0.9562 1.3302 1.2259 1.5278
0.9983 0.9516 1.3443 1.2024 1.5293
1.0069 0.9568 1.3464 1.2223 1.5287
0.9926 0.9613 1.3432 1.2207 1.5026
0.9985 0.9678 1.3808 1.2040 1.4832
0.9999 0.9711 1.3768 1.2104 1.4828
0.9890 0.9666 1.3682 1.2182 1.4585
0.9996 0.9681 1.3388 1.2159 1.4558
0.9947 0.9760 1.3359 1.2231 1.4561
avg_prices_per_year = 1×5
1.0e+04 *
1.0115 1.1580 1.2438 1.3295 1.4480
best_avg_price =
1.4480e+04
best_year =
5
% find(max(avg_prices_per_year) == avg_prices_per_year)
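The lines that produce `avg_prices_per_year`, `best_avg_price` and `best_year` are not shown above. A minimal sketch of the whole pipeline, assuming the returns are drawn with `randn` and using `cumprod` in place of the loop (the seed and the name `yearly` are additions for illustration):

```matlab
rng(0);                                   % fixed seed, only for reproducibility
initial_price = 10000;
mu = 0.0005;                              % mean daily return (0.05%)
sigma = 0.01;                             % std dev of daily return (1%)
days = 252;
years = 5;
total_days = years*days;

daily_returns = mu + sigma*randn(total_days, 1);

% Same recurrence as the loop: price(t) = price(t-1)*(1 + return(t)),
% with price(1) = initial_price
stock_prices = initial_price*cumprod([1; 1 + daily_returns(2:end)]);

yearly = reshape(stock_prices, days, years);   % column y holds year y
avg_prices_per_year = mean(yearly, 1);         % 1x5, one average per year
[best_avg_price, best_year] = max(avg_prices_per_year)
```

Because `reshape` fills column by column, the first 252 prices land in column 1, the next 252 in column 2, and so on, which is why the averages are taken along dimension 1.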
2. You are given a 5000 × 24 matrix likes, where each row represents a user and each column represents the number of “likes” they gave in a
specific hour of the day. Find
(b) The most active user (row with the highest total likes).
likes = 5000×24
24 83 53 73 98 39 44 10 32 37 66 93 43
89 91 14 11 89 63 95 40 95 40 49 70 78
89 63 6 57 42 84 15 25 62 20 24 34 61
51 91 69 73 73 81 24 25 37 43 1 93 66
45 89 27 98 10 30 11 5 29 20 92 59 13
79 24 75 81 8 73 68 29 7 26 4 90 91
36 93 59 100 61 66 98 0 29 51 25 50 32
86 25 15 60 0 27 87 14 94 92 17 64 62
87 58 74 34 65 70 42 10 29 62 83 39 76
16 93 66 98 43 0 11 91 65 25 89 11 96
likes_per_hour = sum(likes, 1)
likes_per_hour = 1×24
249869 251115 247444 249015 250104 250500
total_likes_per_user = sum(likes, 2)
total_likes_per_user = 5000×1
1307
1182
1035
1352
1072
1305
1163
1089
1163
1179
[max_likes, most_active_user] = max(total_likes_per_user) % Find max and its index
max_likes =
1697
most_active_user =
4377
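The line that creates `likes` is not shown above. A minimal sketch with dummy data, assuming integer like counts drawn uniformly from 0 to 100 (an assumption that matches the range of the displayed values); it also reads the busiest hour off `likes_per_hour`, which is one natural use of that vector and not necessarily what the missing part of the question asks:

```matlab
rng(0);                                   % fixed seed, only for reproducibility
likes = randi([0 100], 5000, 24);         % dummy data: 5000 users x 24 hours

likes_per_hour = sum(likes, 1);           % 1x24, total likes in each hour
[max_hour_likes, busiest_hour] = max(likes_per_hour);

total_likes_per_user = sum(likes, 2);     % 5000x1, total likes per user
[max_likes, most_active_user] = max(total_likes_per_user);

fprintf('Busiest hour (column): %d with %d likes\n', busiest_hour, max_hour_likes);
fprintf('Most active user (row): %d with %d likes\n', most_active_user, max_likes);
```

`sum(likes, 1)` collapses the rows (users) and `sum(likes, 2)` collapses the columns (hours), so the dimension argument decides which question is being answered.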
3. Simulate the flow of vehicles through a highway section with 100 cars. Each car has a randomly assigned speed between 20 and 120 km/h. Calculate:
(b) The percentage of cars exceeding the speed limit (100 km/h).
num_cars = 100;
speeds = 100×1
71
28
29
94
93
23
98
90
44
61
average_speed = mean(speeds)
average_speed =
69.8200
% (b) Percentage of cars exceeding 100 km/h _____________________________________
percentage_exceeding =
18
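The lines that generate `speeds` and compute `percentage_exceeding` are not shown above. A minimal sketch, assuming integer speeds drawn uniformly with `randi` (consistent with the whole-number speeds displayed); the name `speed_limit` and the seed are additions:

```matlab
rng(0);                                   % fixed seed, only for reproducibility
num_cars = 100;
speed_limit = 100;                        % km/h

speeds = randi([20 120], num_cars, 1);    % integer speeds between 20 and 120 km/h

average_speed = mean(speeds);

% speeds > speed_limit is a logical vector; summing it counts the offenders
percentage_exceeding = 100*sum(speeds > speed_limit)/num_cars;

fprintf('Average speed: %.2f km/h\n', average_speed);
fprintf('Cars above %d km/h: %.0f%%\n', speed_limit, percentage_exceeding);
```

With 101 equally likely integer speeds, 20 of them (101 to 120) exceed the limit, so the expected share is about 19.8%, close to the 18 reported above.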
4. Simulate the walk of a drunk dog moving on the plane in steps of 1 unit, starting from (0, 0). Each second, the dog moves 1 unit left, right, up,
or down, with equal probability. Do the following:
(b) Plot the distribution of the dog’s position at the 100th second. Is the shape similar at the 1000th second?
clc; clear;
N = 100;
x(1) = 0;
y(1) = 0;
figure;
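% The loop header and the first three branches are reconstructed;
% drawing the direction with randi is an assumption.
for time = 1:N
    step = randi(4); % 1..4, each with probability 1/4
    if step == 1
        x(time+1) = x(time) - 1; % Move Left
        y(time+1) = y(time);
    elseif step == 2
        x(time+1) = x(time) + 1; % Move Right
        y(time+1) = y(time);
    elseif step == 3
        y(time+1) = y(time) + 1; % Move Up
        x(time+1) = x(time);
    else
        y(time+1) = y(time) - 1; % Move Down
        x(time+1) = x(time);
    end
end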
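A single walk gives one path, not a distribution. To see the distribution of the position at the 100th second, many independent walks have to be simulated and their end points collected. A minimal sketch, assuming 5000 walks (the names `num_walks`, `step_x`, `step_y` and the seed are illustrative):

```matlab
rng(0);                                   % fixed seed, only for reproducibility
num_walks = 5000;                         % number of independent dogs
N = 100;                                  % seconds per walk (try 1000 as well)

step_x = [-1 1 0 0];                      % x change for left, right, up, down
step_y = [ 0 0 1 -1];                     % y change for left, right, up, down

steps = randi(4, num_walks, N);           % one direction per walk per second
final_x = sum(step_x(steps), 2);          % x position after N seconds
final_y = sum(step_y(steps), 2);          % y position after N seconds

figure;
histogram2(final_x, final_y);             % 2-D histogram of the end points
xlabel('x position');
ylabel('y position');
zlabel('number of walks');
title(sprintf('Position after %d seconds', N));
```

Each step changes x by +1 or -1 with probability 1/4 each and leaves it unchanged otherwise, so the variance of x grows by 1/2 per second. The end points therefore form a bell-shaped mound centred on the origin with a spread of about sqrt(N/2) in each direction: roughly 7 units at N = 100 and 22 units at N = 1000. The shape is similar, only wider.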