Description: Hint: Perform Steps As Mentioned Below

This document describes analyzing IQ data from several datasets: 1. Load 10000 IQ scores and recalculate the mean and standard deviation, printing the results. 2. Using a normal distribution, calculate the percentage of scores between values from a test file and print the result. 3. Read sample data files specified in the test file, test if the sample mean equals the population mean, and print "Reject" or "Accept".

Uploaded by

Anish Kumar
Copyright
© All Rights Reserved

DESCRIPTION

Consider an automobile data set with values such as the name of the car model, its
mileage, number of cylinders, number of gears and so on.
 
Here’s a preview of the data under consideration: 

The data is in the file named mtcars.csv, which is present at the
location /data/training/mtcars.csv
 
Write Python code to calculate the difference between the mean 10-fold cross
validation scores of ridge regression and lasso regression, each with alpha as 1.0
                
Hint: Perform steps as mentioned below:

- Load the data
- Use all the columns except ‘mpg’ & ‘model’ as predictors (x)
- Use ‘mpg’ as the response column (y)
- Perform Lasso & Ridge regression on this data. Use all the rows as training data for performing regression.
- Perform cross validation on both regression models for the above x & y with cv=10 and default scoring
- Calculate and print the difference by subtracting the mean score of ridge from the mean score of lasso regression
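
The steps above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: it assumes pandas and scikit-learn are available, the helper name lasso_minus_ridge_cv is hypothetical, and the column names 'mpg' and 'model' are taken from the hint. With default scoring, cross_val_score uses the regressor's own score method, which for Ridge and Lasso is R^2.

```python
import os

import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score


def lasso_minus_ridge_cv(df, response="mpg", drop=("mpg", "model"), cv=10):
    """Mean CV score of lasso minus mean CV score of ridge, both with alpha=1.0.

    Hypothetical helper written for this illustration.
    """
    x = df.drop(columns=list(drop))  # every column except the response and the car name
    y = df[response]

    ridge = Ridge(alpha=1.0)
    lasso = Lasso(alpha=1.0)

    # Fit on all rows, as the steps ask. cross_val_score refits clones internally,
    # so these fitted models do not influence the scores below.
    ridge.fit(x, y)
    lasso.fit(x, y)

    # Default scoring: the estimator's score method (R^2 for these regressors).
    ridge_scores = cross_val_score(ridge, x, y, cv=cv)
    lasso_scores = cross_val_score(lasso, x, y, cv=cv)
    return np.mean(lasso_scores) - np.mean(ridge_scores)


path = "/data/training/mtcars.csv"
if os.path.exists(path):  # only runs where the assignment data is present
    print(round(lasso_minus_ridge_cv(pd.read_csv(path)), 2))
```

The subtraction order (lasso minus ridge) follows the last step of the hint.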

Input Format:

Read the input file /data/training/mtcars.csv

Output Format:

- Perform the operations described above and write the difference in the mean scores to a file named output.csv, which should be present at the location /code/output/output.csv
- output.csv should contain the value rounded to 2 decimals in the first row
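
One way to produce such a file is sketched below, using only the standard library. The helper name write_first_row is hypothetical, and creating the output directory when it is missing is an assumption about the grading environment.

```python
import csv
import os


def write_first_row(value, path="/code/output/output.csv"):
    """Write one value, rounded to 2 decimals, as the first row of a CSV file."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)  # assumption: the folder may not exist yet
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerow([round(value, 2)])
```

Note that round() does not pad with zeros, so a value such as 0.5 is written as 0.5 rather than 0.50.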
Sample Output:

Example: output.csv will have data looking like this:


Iris Data - (Assignment 4 - Question 3)
Subject: Machine Learning / AI
Points: 5

DESCRIPTION
Question:

Perform logistic regression on iris data set as follows:

1. Load the iris data set from sklearn.datasets

- Hint: To load the dataset, use:

                from sklearn import datasets

                iris = datasets.load_iris()
                x = iris.data
                y = iris.target

- To perform logistic regression, use the function from sklearn.linear_model with default values of its parameters

2. Perform cross validation on this model for the specified x & y values with cv as 5 and scoring as accuracy.

- Hint: Use the function cross_val_score
- This generates accuracy scores, one for each of the 5 iterations performed
- Print the mean accuracy score rounded to 2 decimal places
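
Steps 1 and 2 might look like the sketch below. It assumes the intended function is scikit-learn's LogisticRegression, which the question implies but does not name. With default parameters on unscaled iris data, some scikit-learn versions may emit a convergence warning; the scores are still returned.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Load predictors and response exactly as in the hint.
iris = datasets.load_iris()
x = iris.data
y = iris.target

model = LogisticRegression()  # default parameter values, as the question asks

# One accuracy score per fold, five folds in total.
scores = cross_val_score(model, x, y, cv=5, scoring="accuracy")

mean_accuracy = round(scores.mean(), 2)
print(mean_accuracy)
```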

Input Format:

- Refer to the starter code provided in the CODE section to load the data and set the predictors & response variables.

Output Format:

- Write the value of the mean accuracy score to a file named output.csv, which should be present at the location /code/output/output.csv
- Write the value rounded to 2 decimal places in the first row

Sample Output:

Example: output.csv will have data looking like this:

DATASETS
EXECUTION TIME LIMIT
Default.
IQ Data - (Assignment 4 - Question 1)
Subject: Machine Learning / AI
Points: 15

DESCRIPTION
The IQ data set, containing 10000 data points, is present at the location
(/data/training/iqdata.csv)
The data set contains only the IQ values of people across the world who participated
in the survey, in a single column without a header.

Here's a preview of the data under consideration:

It contains IQ values with the below specifications:

- The average IQ is around 110
- There are a few super-intelligent people whose IQ is 192
- There are a few people with an IQ as low as 34
- The standard deviation is around 20
- The data points follow a normal distribution

 
Based on this data, create Python programs to perform the required analysis as
described below:
 
1. Load the 10000-point data into a 1-D array. Then recalculate its mean & standard deviation to obtain their exact values. Print these two values.

- Hint: Use functions from the numpy library
- Note: These two values are calculated for the entire data in all the cases
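
A minimal sketch of step 1 with numpy follows. The helper name mean_and_std is hypothetical. It uses numpy's default standard deviation (ddof=0), which treats the 10000 points as the whole population; that choice is an assumption, since the question does not say which estimator to use.

```python
import os

import numpy as np


def mean_and_std(values):
    """Return (mean, standard deviation) of the values as a 1-D float array."""
    data = np.asarray(values, dtype=float).ravel()  # force a 1-D array
    return data.mean(), data.std()  # std uses ddof=0 (population formula)


path = "/data/training/iqdata.csv"
if os.path.exists(path):  # only runs where the assignment data is present
    iq = np.loadtxt(path)  # single column, no header
    mean, std = mean_and_std(iq)
    print(round(mean, 2), round(std, 2))
```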
  
2. Calculate what percentage of people should have an IQ value between the two values specified in /data/training/testcaseiq.txt

- Hint: Since the data follows a normal distribution, use an appropriate function of norm from the scipy.stats library
- Using this function, calculate the probability of an IQ score being smaller than the upper value specified in testcaseiq.txt
- Similarly, calculate the probability of an IQ score being smaller than the lower value specified in testcaseiq.txt
- Subtract the second probability from the first to obtain the probability of an IQ score falling between the lower and upper values
- Finally, print the result as a percentage without the % sign
- Do this for all the test cases provided in testcaseiq.txt
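
The calculation in step 2 can be sketched with norm.cdf, which is presumably the "appropriate function" the hint refers to. The helper name percent_between is hypothetical, and the mean of 110 and standard deviation of 20 in the usage line are only the approximate figures quoted earlier; a real run would use the values recalculated from the data.

```python
from scipy.stats import norm


def percent_between(lower, upper, mean, std):
    """Percentage of a normal(mean, std) population lying between lower and upper."""
    p_upper = norm.cdf(upper, loc=mean, scale=std)  # P(IQ < upper)
    p_lower = norm.cdf(lower, loc=mean, scale=std)  # P(IQ < lower)
    return (p_upper - p_lower) * 100


# Illustration with the approximate figures from the description (not the exact ones).
print(round(percent_between(80, 140, mean=110, std=20), 3))
```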

  
3. Samples drawn from this data are stored in different files such as iqsample1.csv, iqsample2.csv and so on. Read the name of the file (<file_name>) from testcaseiq.txt and then read the corresponding file from /data/training/<file_name>.csv

- Consider a Null hypothesis that the mean of the sample is equal to the population mean of the above 10000-point data set. Test and decide whether the hypothesis can be accepted or rejected based on the p-value as:
    - If p-value < 0.05, print "Reject"
    - Else print "Accept"
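
The question does not name a specific test. One common choice for comparing a sample mean against a known population mean is a one-sample t-test, sketched here with scipy.stats.ttest_1samp. Both the choice of test and the helper name null_hypothesis_decision are assumptions made for illustration.

```python
from scipy.stats import ttest_1samp


def null_hypothesis_decision(sample, population_mean, alpha=0.05):
    """Return "Reject" if the one-sample t-test p-value is below alpha, else "Accept"."""
    result = ttest_1samp(sample, population_mean)
    return "Reject" if result.pvalue < alpha else "Accept"


# Made-up samples, purely to show both outcomes.
print(null_hypothesis_decision([100, 105, 110, 115, 120], 110))
print(null_hypothesis_decision([150, 152, 148, 151, 149], 110))
```

Here population_mean would be the mean recalculated from the full 10000-point data set in step 1.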

Input Format:

- The first file to be read is iqdata.csv, which contains the data as mentioned above. This file is in .csv format and is present at the location (/data/training/iqdata.csv)
- The second file to be read is testcaseiq.txt, which is present at (/data/training/testcaseiq.txt)
- testcaseiq.txt has the following lines:
    o The first line contains the number of test cases T
    o From the second line onward, every set of three lines contains the lower value of the desired IQ range, the upper value of the desired IQ range, and the name of the file containing the samples to be used in the Null Hypothesis testing, such as iqsample1
    o Then read the sample data from (/data/training/iqsample1.csv)
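
The file layout described above can be parsed with a few lines of standard-library Python. This is a sketch only: the function name parse_test_cases is hypothetical, and reading the bounds as floats is an assumption (the sample file shows integers).

```python
def parse_test_cases(text):
    """Parse testcaseiq.txt content into a list of (lower, upper, sample_name) tuples."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    count = int(lines[0])  # first line: number of test cases T
    cases = []
    for i in range(count):
        # Each test case occupies three consecutive lines after the first.
        lower, upper, name = lines[1 + 3 * i : 4 + 3 * i]
        cases.append((float(lower), float(upper), name))
    return cases


sample_text = "2\n80\n140\niqsample1\n70\n120\niqsample2\n"
print(parse_test_cases(sample_text))
# prints [(80.0, 140.0, 'iqsample1'), (70.0, 120.0, 'iqsample2')]
```

Each returned name would then be combined with the folder and extension, for example /data/training/iqsample1.csv.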
Output Format:

- For each test case, create an output file: output1.csv, output2.csv, ..., outputn.csv, where n represents the test case number
- outputn.csv should be present at the location (/code/output/outputn.csv). The first two rows of this file should contain the Mean and the Standard Deviation, one per row, both rounded to 2 decimal places.

            Note: These two values are calculated for the entire data in all the cases

- The third row should contain the percentage of people with IQ in the specified range, such as 34.567. The value should be rounded to 3 decimal places
- The fourth row should contain the result of the Null Hypothesis test in the format stated above
- outputn.csv should consist of these values on four separate rows, one below the other
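
A possible way to assemble the four rows is sketched below with the standard library. The function names are hypothetical. Using fixed-width formatting (so that 110 is written as 110.00) is an assumption, since the question only says "rounded".

```python
import os


def format_output_rows(mean, std, percent, decision):
    """Build the four rows: mean and std to 2 decimals, percentage to 3, then the decision."""
    return [f"{mean:.2f}", f"{std:.2f}", f"{percent:.3f}", decision]


def write_case_output(case_number, rows, out_dir="/code/output"):
    """Write the rows to output<case_number>.csv, one value per row, and return the path."""
    os.makedirs(out_dir, exist_ok=True)  # assumption: the folder may not exist yet
    path = os.path.join(out_dir, f"output{case_number}.csv")
    with open(path, "w") as handle:
        handle.write("\n".join(rows) + "\n")
    return path
```

The mean and standard deviation rows are identical in every output file, because they always describe the full data set.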

Sample Test Cases:


testcaseiq.txt contains the following data:

2
80
140
iqsample1
70
120
iqsample2

Sample Output:
Example: output1.csv will have data looking like this:

DATASETS

Training dataset

EXECUTION TIME LIMIT

Default.
