Description: Hint: Perform Steps As Mentioned Below
Consider an automobile data set with values such as the name of the car model, its
mileage, number of cylinders, number of gears and so on.
Here’s a preview of the data under consideration:
1. Load the data.
2. Use all the columns except ‘mpg’ and ‘model’ as predictors (x).
3. Use ‘mpg’ as the response column (y).
4. Perform Lasso and Ridge regression on this data. Use all the rows as
training data for performing regression.
5. Perform cross validation on both regression models for the above x and y
with cv=10 and default scoring.
6. Calculate and print the difference obtained by subtracting the mean score of
ridge from the mean score of lasso regression.
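The steps above can be sketched roughly as follows. This is only an illustration, assuming scikit-learn and NumPy are available; the automobile file and its path are not given in this description, so synthetic data stands in for the real predictors and response, and both models use their default settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the automobile data (assumption: the real file is
# not shown here). X plays the role of the predictors, y the role of 'mpg'.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 3.0]) + rng.normal(scale=0.5, size=40)

lasso = Lasso()  # default regularisation strength
ridge = Ridge()  # default regularisation strength

# Fit on all rows, as the task asks.
lasso.fit(X, y)
ridge.fit(X, y)

# 10-fold cross validation with default scoring (R^2 for regressors).
lasso_scores = cross_val_score(lasso, X, y, cv=10)
ridge_scores = cross_val_score(ridge, X, y, cv=10)

# Mean lasso score minus mean ridge score.
difference = lasso_scores.mean() - ridge_scores.mean()
print(round(difference, 2))
```

With the real data, X and y would instead come from the loaded table, with ‘mpg’ and ‘model’ dropped from the predictors.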
Input Format:
Output Format:
You have to perform the operations as described above and write the
value of difference in the mean scores as stated above in a file
named output.csv, which should be present at the
location /code/output/output.csv
output.csv should contain the value rounded to 2 decimals in the first row
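One possible way to write the result is sketched below. The helper name `write_result` is hypothetical, and the demo writes to a temporary directory rather than to /code/output/output.csv so that it can run anywhere. It also assumes that plain `round(value, 2)` is acceptable; if a fixed two-digit form such as 0.10 is expected, string formatting would be needed instead.

```python
import csv
import os
import tempfile

def write_result(value, path):
    """Write one value, rounded to 2 decimals, as the first row of a CSV file."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow([round(value, 2)])

# Demo in a temporary directory; in the assignment the target would be
# /code/output/output.csv.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "output", "output.csv")
    write_result(-0.0312, demo_path)
    with open(demo_path) as f:
        written = f.read()

print(written.strip())
```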
Sample Output:
DESCRIPTION
Question:
2. Perform cross validation on this model for the specified x & y values with cv
as 5 and scoring as accuracy.
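As a rough illustration of this step: the model and data are not specified in this fragment, so the sketch below assumes a scikit-learn classifier (`LogisticRegression` is only a stand-in) and a synthetic data set. Accuracy scoring applies to classifiers, which is why a classifier is assumed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the x & y set up by the starter code (assumption).
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Stand-in model; the assignment's actual model is not shown here.
model = LogisticRegression()

# 5-fold cross validation scored by accuracy.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```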
Input Format:
Refer to the starter code provided in the CODE section to load the data
and set the predictors & response variables.
Output Format:
Sample Output:
DATASETS
EXECUTION TIME LIMIT
Default.
IQ Data - (Assignment 4 - Question 1)
Subject: Machine Learning / AI
Points: 15
DESCRIPTION
The IQ data set containing 10000 data points is present at the location
(/data/training/iqdata.csv)
The data set contains only the IQ values of people who participated in the survey
across the world in a single column without header.
Based on this data, create Python programs to perform the required analysis as
described below:
1. Load the 10000 point data into a 1-D array. Then recalculate
its mean & standard deviation to obtain their exact values. Print these two
values.
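Step 1 might look like the sketch below, which uses only the standard library. An in-memory string stands in for /data/training/iqdata.csv, the helper name `load_single_column` is hypothetical, and the population standard deviation (dividing by N) is assumed because the full data set is treated as the population. With NumPy, `numpy.loadtxt` would give a true 1-D array instead of a list.

```python
import csv
import io
import statistics

def load_single_column(f):
    """Read one float per row from a header-less, single-column CSV."""
    return [float(row[0]) for row in csv.reader(f) if row]

# Stand-in for open("/data/training/iqdata.csv"); the real file has 10000 rows.
demo_file = io.StringIO("100\n115\n85\n130\n70\n")
iq = load_single_column(demo_file)

mean = statistics.fmean(iq)
std = statistics.pstdev(iq)  # population SD (assumption: divide by N)
print(mean)
print(std)
```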
3. Samples drawn from this data are stored in different files such
as iqsample1.csv, iqsample2.csv and so on. Read the name of the file
(<file_name>) from testcaseiq.txt and then read the corresponding file
from /data/training/<file_name>.csv
Consider a Null hypothesis that the mean of the sample is equal to the
population mean of the above 10000 point data set. Test and decide whether the
hypothesis can be accepted or rejected based on the p-value as:
- If p-value < 0.05, print as "Reject"
- Else print as "Accept"
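The decision rule above can be sketched as follows. The assignment does not name the test, so this assumes a two-sided one-sample z-test, which is reasonable when the population mean and standard deviation are known from the full data set. A one-sample t-test (for example `scipy.stats.ttest_1samp`) is a common alternative and may give slightly different p-values for small samples. The function name `z_test_decision` is hypothetical.

```python
import math
from statistics import NormalDist, fmean

def z_test_decision(sample, pop_mean, pop_std, alpha=0.05):
    """Two-sided one-sample z-test; returns ("Reject" or "Accept", p_value)."""
    n = len(sample)
    z = (fmean(sample) - pop_mean) / (pop_std / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return ("Reject" if p_value < alpha else "Accept"), p_value

# Sample whose mean equals the population mean: hypothesis is kept.
decision_same, p_same = z_test_decision([100, 102, 98, 101, 99], 100, 15)

# Sample far from the population mean: hypothesis is rejected.
decision_far, p_far = z_test_decision([130] * 25, 100, 15)

print(decision_same)
print(decision_far)
```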
Input Format:
The first file to be read will be iqdata.csv, which contains the data as
mentioned above. This file is in .csv format and is present at the location
(/data/training/iqdata.csv)
The second file to be read is testcaseiq.txt which is present at
(/data/training/testcaseiq.txt)
testcaseiq.txt has the following lines:
- The first line contains the number of test cases T
- From the second line onward, every set of three lines contains the lower
value of the desired IQ range, the upper value of the desired IQ range and the
name of the file containing the sample to be used in the Null
Hypothesis test, such as iqsample1
- Then read the sample data from the corresponding file, for example (/data/training/iqsample1.csv)
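A small sketch of parsing this layout is shown below. The function name `parse_testcases` is hypothetical, and the text is passed in as a string so the example runs without the real /data/training/testcaseiq.txt. The range bounds are converted to floats as an assumption.

```python
def parse_testcases(text):
    """Parse T followed by T groups of (lower, upper, sample file name)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    t = int(lines[0])
    cases = []
    for i in range(t):
        low, high, name = lines[1 + 3 * i : 4 + 3 * i]
        cases.append((float(low), float(high), name))
    return cases

# Same layout as a two-case testcaseiq.txt.
sample_text = "2\n80\n140\niqsample1\n70\n120\niqsample2\n"
cases = parse_testcases(sample_text)
print(cases)
```

Each file name would then be turned into a path of the form /data/training/<file_name>.csv.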
Output Format:
Note: These two values are calculated for the entire data in all the cases
The third line should contain the percentage value such as 34.567 of
people with IQ in the specified range. The value should be rounded to 3 decimal
places
The fourth line should contain the result of the Null Hypothesis in the
format stated above
outputn.csv should consist of the values on four separate rows one
below the other
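The percentage and the four-row file could be produced along these lines. This is a sketch under assumptions: the range is treated as inclusive at both ends, the first two rows are taken to be the mean and standard deviation, the helper names are hypothetical, and the demo writes to a temporary directory instead of the real output location.

```python
import csv
import os
import tempfile

def percent_in_range(values, low, high):
    """Percentage of values with low <= v <= high, rounded to 3 decimals."""
    inside = sum(1 for v in values if low <= v <= high)
    return round(100.0 * inside / len(values), 3)

def write_case_output(path, mean, std, percent, decision):
    """Write the four results on four separate rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for item in (mean, std, percent, decision):
            writer.writerow([item])

percent = percent_in_range([70, 85, 100, 115, 130, 145], 80, 140)

# Demo in a temporary directory with made-up values for the other rows.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "output1.csv")
    write_case_output(demo_path, 100.0, 15.0, percent, "Accept")
    with open(demo_path, newline="") as f:
        rows = list(csv.reader(f))

print(rows)
```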
2
80
140
iqsample1
70
120
iqsample2
Sample Output:
Example: output1.csv will have data looking like this:
DATASETS
Training dataset