Week Two Assignment A

The validation partition is used to assess candidate models and select the best one, and in some algorithms it also helps tune the model. The test partition assesses the final model's performance on new data and plays no part in building or selecting the model. Data mining is typically performed on a sample rather than the complete database, since accurate models can often be built with as few as several thousand records. For the randomly selected bank database sample used as the training set, the proposed next step is reducing the mortgage and improving the performance of the securities account. If records with any missing value are removed from a dataset with 1000 records, 50 variables, and 5% randomly missing values, about 92% of the records (around 923) would be expected to be removed.

Uploaded by

Ravalika

2.2 Describe the difference in roles assumed by the validation partition and the test partition.

The validation partition is used to assess the performance of each supervised learning model so
that we can compare models and pick the best one. In some algorithms (e.g., classification and
regression trees, k-nearest neighbors) the validation partition may be used in automated
fashion to tune and improve the model. This means that the validation data are actually used to
help build the model. The test data partition is used for assessing the performance of the final
chosen model on new data. The test data are not used to build models, or to further tweak the
model or improve its fit. (If the test data were used for these purposes, they would play a role
in building or selecting the best model and would no longer provide an unbiased assessment of
the chosen model's performance with completely new data.)
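
The division of roles described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the 50/30/20 split, the two candidate models, and the helper names `partition` and `mean_abs_error` are all hypothetical and do not come from the assignment.

```python
import random

def partition(records, train_frac=0.5, valid_frac=0.3, seed=1):
    """Shuffle the records and split them into training, validation, and test partitions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

def mean_abs_error(predict, data):
    """Average absolute prediction error of a model on (x, y) pairs."""
    return sum(abs(predict(x) - y) for x, y in data) / len(data)

# Toy data (assumed for illustration): y is roughly 2x plus noise.
rng = random.Random(0)
records = [(x, 2 * x + rng.gauss(0, 1)) for x in range(100)]
train, valid, test = partition(records)

# Fit two candidate models on the training partition only.
mean_y = sum(y for _, y in train) / len(train)
slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
candidates = {
    "mean_only": lambda x: mean_y,
    "line_through_origin": lambda x: slope * x,
}

# The validation partition compares the candidates and picks the best one.
valid_errors = {name: mean_abs_error(model, valid) for name, model in candidates.items()}
best_name = min(valid_errors, key=valid_errors.get)

# The test partition is touched once, only to assess the chosen model.
test_error = mean_abs_error(candidates[best_name], test)
print(best_name, round(test_error, 3))
```

Because the test partition is never consulted while fitting or choosing between the candidates, `test_error` remains an unbiased estimate of how the chosen model would perform on new data.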
2.3 Consider the sample from a database of credit applicants in Table 2.15. Comment on the
likelihood that it was sampled randomly, and whether it is likely to be a useful sample.
Typically, we perform data mining on less than the complete database. Data mining algorithms
will have varying limitations on what they can handle in terms of the numbers of records and
variables, limitations that may be specific to computing power and capacity as well as software
limitations. Even within those limits, many algorithms will execute faster with smaller samples.
Accurate models can often be built with as few as several thousand records. Hence, we will
want to sample a subset of records for credit applications.
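
As a rough sketch of drawing such a subset, the snippet below takes a simple random sample without replacement. The stand-in database of 100,000 applicants, the sample size of 5,000, and the helper name `sample_records` are assumptions made for illustration; they are not taken from Table 2.15.

```python
import random

def sample_records(records, n, seed=1):
    """Draw a simple random sample of n records without replacement."""
    return random.Random(seed).sample(records, n)

# Hypothetical stand-in for the full database of credit applicants.
database = [{"applicant_id": i} for i in range(100_000)]

# A few thousand records is often enough to build an accurate model.
sample = sample_records(database, 5_000)
print(len(sample))
```

Fixing the seed makes the sample reproducible, which helps when the same partition must be reused across several modeling runs.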
2.4 Consider the sample from a bank database shown in Table 2.16; it was selected randomly
from a larger database to be the training set. Personal Loan indicates whether a solicitation
for a personal loan was accepted and is the response variable. A campaign is planned for a
similar solicitation in the future and the bank is looking for a model that will identify likely
responders. Examine the data carefully and indicate what your next step would be.
After carefully examining the sample, which was selected randomly from the larger bank
database to serve as the training set, note that Personal Loan indicates whether a
solicitation for a personal loan was accepted and is the response variable. The next step
would be reducing the mortgage so that it is easily available to both high-income and
low-income customers, and improving the performance of the securities account.
2.7 A dataset has 1000 records and 50 variables with 5% of the values missing, spread
randomly throughout the records and variables. An analyst decides to remove records with
missing values. About how many records would you expect to be removed?
For a record to have all values present, it must avoid having a missing value (P = 0.95) for each
of the 50 variables. The chance that a given record will escape having a missing value for two
variables is 0.95 * 0.95 = 0.9025. The chance that a given record will escape having a missing
value for all 50 variables is 0.95^50 = 0.076945. This implies that 1 - 0.076945 = 0.9231 (92.31%) of
all records, or about 923 of the 1000, will have at least one missing value and would be deleted.
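
The arithmetic above can be checked with a short Python sketch. It assumes, as the question states, that missing values fall independently and at random across records and variables; the variable names and the simulation helper `simulate_removed` are illustrative only.

```python
import random

n_records = 1000
n_variables = 50
p_missing = 0.05

# Probability that a single record has no missing value in any of its 50 variables.
p_complete = (1 - p_missing) ** n_variables

# Expected number of records with at least one missing value.
expected_removed = n_records * (1 - p_complete)
print(round(p_complete, 6), round(expected_removed))

def simulate_removed(seed=1):
    """Simulate one dataset and count the records that contain a missing value."""
    rng = random.Random(seed)
    removed = 0
    for _ in range(n_records):
        if any(rng.random() < p_missing for _ in range(n_variables)):
            removed += 1
    return removed

simulated_removed = simulate_removed()
print(simulated_removed)
```

A single simulated dataset will not match the expected count exactly, but it should land close to it, which is a useful sanity check on the formula.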
