Assignment_Stats 1
Assignment: Assignment 1
PG ID: 62210493
Study Group: F 15
• The standard error of the mean is the sample standard deviation divided by the square root of
the sample size; here it is 2090.7614.
• An interesting observation is that the price has outliers towards the higher end.
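As a reminder of how the standard error of the mean is obtained, here is a minimal Python sketch. The function name and the price list are made up for illustration and are not the assignment data.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Return s / sqrt(n), where s is the sample standard deviation."""
    n = len(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)

# Hypothetical prices, for illustration only (not the assignment dataset)
sample_prices = [120000, 150000, 165000, 180000, 310000]
sem = standard_error_of_mean(sample_prices)
print(round(sem, 2))
```

The standard error shrinks as the sample grows, which is why it is much smaller than the standard deviation of the prices themselves.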
2. Does the normal model provide a good description of the prices? Use a Normal Quantile plot
to frame your response.
• The normal model does not provide a good description of the prices because:
• The points on the normal quantile plot do not follow the straight reference line; they form a
concave curve (shown via the yellow line). The distribution is skewed and has many outliers.
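The idea behind a normal quantile plot can be sketched in a few lines of Python: sort the data and pair each value with the quantile a fitted normal model would predict. The prices below are hypothetical, and the plotting positions (i - 0.5) / n are one common convention, assumed here; statistical packages may use a slightly different one.

```python
from statistics import NormalDist, mean, stdev

def normal_quantile_pairs(values):
    """Pair each sorted observation with its theoretical normal quantile."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # Plotting positions (i - 0.5) / n are an assumed convention
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))

# Hypothetical right-skewed prices, for illustration only
prices = [95000, 110000, 125000, 140000, 150000,
          165000, 185000, 210000, 320000, 480000]
pairs = normal_quantile_pairs(prices)
for theo, obs in pairs:
    print(f"{theo:12.0f} {obs:12.0f}")
```

If the data were normal, the two columns would track each other closely. With right-skewed data, the largest observations sit well above their theoretical quantiles, which is what bends the plot away from the straight line.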
3. Irrespective of your response to Q2, assume that Price ~ N(164K, (68K)^2). Given this:
a. Calculate the following probabilities – P(Price > 92.8K), P(Price < 255.5K). Do these
numbers agree with what you see in the data?
P(Price > 92.8K):          P(Price < 255.5K):
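Under the stated model, both probabilities follow from the normal CDF. A minimal sketch using Python's standard library, with prices in thousands (the variable names are illustrative):

```python
from statistics import NormalDist

# Model stated in the question: Price ~ N(164K, (68K)^2), in thousands
price_model = NormalDist(mu=164, sigma=68)

p_above_92_8 = 1 - price_model.cdf(92.8)   # P(Price > 92.8K)
p_below_255_5 = price_model.cdf(255.5)     # P(Price < 255.5K)

print(f"P(Price > 92.8K)  = {p_above_92_8:.4f}")
print(f"P(Price < 255.5K) = {p_below_255_5:.4f}")
```

These come out at roughly 0.85 and 0.91 respectively. They can then be compared with the fraction of houses in the data above 92.8K and below 255.5K.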
b. Once again, assuming the above normal distribution, what percentage of houses should
have a value less than 232K? Does that agree with the data?
The number agrees with the data, since the percentage of houses below 232K is somewhat smaller
than the percentage below 255.5K.
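Since 232K = 164K + 68K, this threshold is exactly one standard deviation above the mean, so the model predicts about 84% of houses below it. The sketch below computes that figure and shows how an empirical share could be compared with it. The price list and the helper `share_below` are hypothetical, not the assignment data.

```python
from statistics import NormalDist

def share_below(values, threshold):
    """Empirical fraction of observations strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)

price_model = NormalDist(mu=164, sigma=68)  # thousands, as stated in Q3
theoretical = price_model.cdf(232)          # 232K is one sd above the mean

# Hypothetical prices in thousands, for illustration only
prices = [95, 110, 125, 140, 150, 165, 185, 210, 320, 480]
empirical = share_below(prices, 232)

print(f"theoretical: {theoretical:.2%}, empirical: {empirical:.2%}")
```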
c. Based on the theoretical model, what do you expect should be the price of a house that is
exactly on the 3rd quartile (75th percentile)? How does that compare to the actual?
In a normal distribution, the 75th percentile corresponds to a z-score of 0.6745. So, the
corresponding price is 164,000 + 68,000 × 0.6745 = 209,866.
Therefore, under the model, a house exactly on the 3rd quartile (75th percentile) should be
priced at 209,866. In the data given to us, the 75th percentile is around 205,397.
Hence, the estimated value is about 4,469 higher than the real value and does not match it.
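The z-score and the theoretical third quartile can be checked with the inverse normal CDF. A minimal sketch, assuming the model from Q3 and taking the actual quartile from the answer above:

```python
from statistics import NormalDist

price_model = NormalDist(mu=164_000, sigma=68_000)  # model stated in Q3

z_75 = NormalDist().inv_cdf(0.75)            # z-score of the 75th percentile
q3_theoretical = price_model.inv_cdf(0.75)   # mu + sigma * z_75
q3_actual = 205_397                          # value reported from the data

print(f"z-score:        {z_75:.4f}")
print(f"theoretical Q3: {q3_theoretical:,.0f}")
print(f"difference:     {q3_theoretical - q3_actual:,.0f}")
```

The small discrepancy from the hand calculation comes only from rounding the z-score to four decimals.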
4. Create a histogram and boxplot for the Living Area variable. Is the distribution symmetric?
Check the skewness measure to see if it is consistent with your observation.
The distribution is not symmetric: it has a longer tail on the right side, so it is right-skewed.
The skewness of 0.807 from the data is positive, which is consistent with this observation.
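For reference, a skewness measure can be computed from the second and third central moments. The sketch below uses the simple moment-based formula; statistical packages often apply a small-sample adjustment, so their output may differ slightly. The living-area values are hypothetical.

```python
from statistics import mean

def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical living areas in square feet, for illustration only
living_area = [900, 1100, 1250, 1400, 1500, 1650, 1800, 2100, 2900, 4200]
print(round(sample_skewness(living_area), 3))
```

A positive result indicates a longer right tail, a negative one a longer left tail, and a value near zero indicates rough symmetry.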
5. Create a new column in the dataset by taking the logarithm of the Living Area variable. Is the
normal distribution a better fit for this variable or the original (Living Area) variable? Why do
you think this is the case?