CS 633 Data Mining

Final Project
Mr. Nardi
Winter 2023
Introduction…
• Data Mining is an Interesting and Exciting Area of Computer
Science…
• It is a Combination of Technology, Business, Data, and
Imagination…
• So Let’s Combine All of Those Items for Your Final Project…
Your Project…
• For Your Final Project, You are Going to Build an Existing Data
Mining Engine…Then Adapt It to Perform New or Similar Data
Mining Functions…and Then Build the Business Case for It…
• Got It?...
• “Ok…But Really…What is It We REALLY Have to Do?”…
• Let’s Break This Down…
Part 1 – The Fun Part!...
• For Part 1, You Need to Find an Existing Data Mining Program
and Get It to Work…
• There Are Numerous Web Sites That Give You Complete Data
Mining Programs…
• Many Give Step-by-Step Instructions for Completing the Code…
• One Such Website is Below:
https://www.interviewbit.com/blog/data-mining-projects/
Part 1 – The Fun Part!...
• That Website Contains 14 Data Mining Projects of Varying Degrees
of Difficulty…
• There Are Several Other Sites That Have Similar Projects…
• For Your Project, You Are to Select One of These Projects and Get It
to Work Per the ORIGINAL Spec…
• Keep in Mind, You Are Going to Have to Explain the Project and the
Code…This is NOT a Copy/Paste Exercise…
Part 2 – The Not So Fun Part!...
• Now That You Have the ORIGINAL Code Working, You Need to
Adapt the Original to Do Something Different…
• YOU MUST USE THE ORIGINAL CODE BASE FOR THE UPDATED FUNCTIONALITY…
For Example…
• One Group Found a Data Mining Program That Did Earthquake
Predictions…After Getting the Earthquake Program to Work,
They Updated the Code to Predict Heart Attacks…
• Another Group Found a Program That Could Differentiate
Between Pictures of Cats and Dogs…After Getting the Cats and
Dogs Program to Work, They Updated the Code to Differentiate
Between Various Anime Characters…
What?!...One More Time…
• Part 1…
▪ One Group Found a Data Mining Program (and Data) That Would Predict Earthquakes…
▪ They Were Able to Get the Code Up and Running…
▪ They Were Able to Explain the Code, Display and Explain Charts and Graphs (More on
That Later)…
• Part 2…
▪ They Then Adapted That SAME PROGRAM to Try to Predict If Someone Was Susceptible
to Heart Attacks…
▪ They Used the Earthquake Code as a Base…and Then Updated That Code (and of Course
the Data) to Perform the Heart Attack Analysis…
• Make Sense?...
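One way to picture "use the SAME PROGRAM as a base" is to keep the modeling code in one function and change only the data that goes into it. The sketch below is illustrative, not a required structure: it assumes pandas and scikit-learn are installed, the function name `run_pipeline` is hypothetical, and the bundled iris and breast-cancer datasets are stand-ins for a project's original and updated data.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def run_pipeline(df: pd.DataFrame, target: str, max_depth: int = 3, seed: int = 42) -> float:
    """Hypothetical shared core: split the data, train a decision tree, return test accuracy."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Part 1: the "original" data (iris is only a stand-in for the project's real dataset).
original = load_iris(as_frame=True).frame
original_acc = run_pipeline(original, target="target")

# Part 2: the same code with new data (breast cancer is a stand-in for the new domain).
updated = load_breast_cancer(as_frame=True).frame
updated_acc = run_pipeline(updated, target="target")

print(f"original accuracy: {original_acc:.3f}")
print(f"updated accuracy:  {updated_acc:.3f}")
```

In this sketch only the loading lines change between Part 1 and Part 2; in a real project the preprocessing usually needs adjusting for the new data as well.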
Part 3 – The Really Not So Fun Part!...
• As Data Mining is as Much a Business Exercise as a Technology
Exercise, You Need to Create CRISP-DM Type Documentation for
Your Updated Project…
A Few Notes…
• You May Work Alone or in Groups of TWO…
• As With All Projects, You MUST Get My Approval on BOTH the Original and the
Updated Project BEFORE YOU BEGIN…If You Do Not, I Will NOT ACCEPT THE
PROJECT…
• You MUST Have a LARGE Set of Training and Testing Data...We Will Work Together to
Determine Exactly What "Large" Means...
• If I Find Your Final Code Online, You Will Receive a ZERO For the Project With NO
OPPORTUNITY to Make It Up…
• So Now…What EXACTLY Do You Have to Turn In…
So What EXACTLY Am I Turning In?...
Project - What to Turn In…
• Part 1…
▪ Original Python Working Code With Output…Be Sure the Code Has a Link to the Dataset…
▪ Documentation and Output as Described in the Following Slides…
• Part 2…
▪ Updated Python Working Code With Output…Be Sure the Code Has a Link to the Dataset…
▪ Documentation and Output as Described in the Following Slides…
• Part 3…
▪ CRISP-DM Type Documentation for the UPDATED Project…
• Let’s Take a Closer Look…
Part 1 – Original Project - Code…
• Pick an ORIGINAL Project…
▪ REMEMBER…You MUST Get My Approval BEFORE YOU BEGIN…
▪ REMEMBER…the Point of the Project Is NOT to Copy/Paste Code From a Website…It Is Important
That You UNDERSTAND What the Code Does and What the Project is Trying to Accomplish…

• You MUST Have a LARGE Set of Data…
▪ NOTE: Some Projects On the Website Have Datasets That Go Into the Millions…
▪ If You Have to Create Your Own Dataset, We Can Discuss What “Large” Actually Means…
• Run the Code Several Times and Show the Intended Output…
• TURN IN the Original Python Code With the Output as Documented on the Website…
• Update the Original Code to Include the Items on the Next Slide…
Part 1 – Original Project - Code…
• Dimensions of the Data…
• Sample of the Data…
• Statistical Summary of the Data…
• Class Distribution…
• One Univariate and One Multivariate Diagram…Explain What They Mean…
• Decision Tree…Explain the Best Depth for the Project…Why is it the Best?...
• Confusion Matrix for Training and New Data With an 80%-20% and a 50%-50%
Split…
• All of the Above Should Include Explanations…
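The first four items in the list above (dimensions, sample, statistical summary, class distribution) are each roughly one pandas call. This is a minimal sketch assuming pandas and scikit-learn are installed; the iris data and the `species` column name are stand-ins for your own dataset and label column.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in data: replace with the project's own dataset (e.g., pd.read_csv on the dataset link).
iris = load_iris(as_frame=True)
df = iris.frame.copy()
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))
df = df.drop(columns=["target"])

# Dimensions of the data: (rows, columns)
print(df.shape)

# Sample of the data: the first 5 rows
print(df.head())

# Statistical summary: count, mean, std, min, quartiles, max for each numeric column
print(df.describe())

# Class distribution: number of rows per class label
class_counts = df.groupby("species").size()
print(class_counts)
```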
Part 1 – Original Project - Documentation…
• BRIEFLY Explain What Your Code Is Supposed to Be Doing, Including:
• Major Components of Code…
• Any Particularly Interesting or Difficult Areas of the Code or Data or
Project…Why You Thought They Were Interesting or Difficult…
• EXPLAIN the Output…
• Discuss Issues, Problems, Lessons Learned…
Part 2 – Updated Project - Code…
• Now That You Have a Working Base of Code, Let’s Apply It to a “Real World” Scenario…
• UPDATE the Original Application So That It “Works” For the NEW Application…
• You MUST Have a LARGE Set of Data…
▪ If You Have to Create Your Own Dataset, We Can Discuss What “Large” Actually Means…

• Run the Code Several Times to Compare Results and Determine Consistency…
• TURN IN the Updated Python Code With the NEW Intended Output…
• Your Updated Code Must Include the Items on the Next Slide…
• NOTE: YOU MUST UPDATE THE ORIGINAL CODE…If You Submit an Entirely New Code Base,
You Will Receive a ZERO For This Part of the Project…
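"Run the Code Several Times … Determine Consistency" can be made concrete by repeating the split and training with different random seeds and reporting the spread of the scores. A minimal sketch, assuming scikit-learn and NumPy; the iris data, `max_depth=3`, and ten runs are arbitrary stand-ins.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the project's dataset

scores = []
for seed in range(10):  # ten runs; the number of runs is an arbitrary choice
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = DecisionTreeClassifier(max_depth=3, random_state=seed).fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))  # test-set accuracy for this run

scores = np.array(scores)
print(f"accuracy per run: {np.round(scores, 3)}")
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```

A small standard deviation suggests the results are consistent across runs; a large one suggests they depend heavily on which rows land in the test set.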
Part 2 – Updated Project - Code…
• Dimensions of the Data…
• Sample of the Data…
• Statistical Summary of the Data…
• Class Distribution…
• One Univariate and One Multivariate Diagram…Explain What They Mean…
• Decision Tree…Explain the Best Depth for the Project…Why is it the Best?...
• Confusion Matrix for Training and New Data With an 80%-20% and a 50%-50%
Split…
• All of the Above Should Include Explanations…
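The confusion-matrix item asks for matrices on both the training data and the new (held-out) data under two splits. A minimal sketch, assuming scikit-learn; the iris data, `max_depth=3`, and the seed are stand-ins.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the project's dataset

results = {}
for label, test_size in (("80%-20%", 0.2), ("50%-50%", 0.5)):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=1, stratify=y
    )
    model = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_train, y_train)
    # Rows are true classes, columns are predicted classes.
    cm_train = confusion_matrix(y_train, model.predict(X_train))
    cm_test = confusion_matrix(y_test, model.predict(X_test))
    results[label] = (cm_train, cm_test)
    print(f"{label} split, training data:\n{cm_train}")
    print(f"{label} split, new (test) data:\n{cm_test}")
```

Comparing the two matrices within a split shows how much the tree overfits; comparing across splits shows how much less training data costs.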
Part 2 – Updated Project - Documentation…
• BRIEFLY Explain What Your Code Is Supposed to Be Doing, Including:
• Major Components of Code…
• Any Particularly Interesting or Difficult Areas of the Code or Data or
Project…Why You Thought They Were Interesting or Difficult…
• EXPLAIN the Output…
• Discuss Issues, Problems, Lessons Learned…
Part 3 – Updated Project - Documentation…
• Data Mining is Not Done Just for the Sake of Doing It – There Must Be Business
Reasons for Doing It…
• For Part 3, You Will Provide CRISP-DM Documentation for the UPDATED Project…
• The Following Slide Shows ALL of the Items That Are Included in the CRISP-DM
Documentation…
• Do Only Those Items That Are NOT Crossed Out…
• Each Task Does NOT Need to Be 5 Pages…In Most Cases a Page (or Two) Will Be
Fine…Just Make Sure You Are Complete…
• NOTE: You Might Want to Review This Documentation BEFORE You Decide on What
Projects (Original and Updated) to Work On…
Data Mining Project Tasks and Deliverables…
• PHASE 1.0 : Business Understanding…
✓ Task 1.1 : Determine Business Objectives…
▪ Deliverable : Project Scope Document – Part 1…
✓ Task 1.2 : Assess the Situation…
▪ Deliverable : Project Scope Document – Part 2…
✓ Task 1.3 : Determine Data-Mining Goals…
▪ Deliverable : Data-Mining Scope Document…
✓ Task 1.4 : Produce a Project Plan…
▪ Deliverable : Data Mining Project/Resource Plan…
• PHASE 2.0 : Data Understanding…
✓ Task 2.1 : Gathering Data…
▪ Deliverable : Data Collection Report…
✓ Task 2.2 : Describing Data…
▪ Deliverable : Data Description Report…
✓ Task 2.3 : Exploring Data…
▪ Deliverable : Data Exploration Report…
✓ Task 2.4 : Verifying Data Quality…
▪ Deliverable : Data Quality Report…
• PHASE 3.0 : Data Preparation…
✓ Task 3.1 : Selecting Data…
▪ Deliverable : Data Rationale Report…
✓ Task 3.2 : Cleaning Data…
▪ Deliverable : Data Cleansing Report…
✓ Task 3.3 : Constructing Data…
▪ Deliverable 1 : Data Attribute Report…
▪ Deliverable 2 : Data Generation Report…
✓ Task 3.4 : Integrating Data…
▪ Deliverable : Merged Data Set…
✓ Task 3.5 : Formatting Data…
▪ Deliverable : Final Formatted Dataset…
• PHASE 4.0 : Modeling…
✓ Task 4.1 : Selecting Modeling Techniques…
▪ Deliverable 1 : Defined Modeling Technique(s)…
▪ Deliverable 2 : Defined Modeling Assumptions…
✓ Task 4.2 : Designing Tests…
▪ Deliverable : Test Design Document…
✓ Task 4.3 : Building Model(s)…
▪ Deliverable 1 : Parameter Definitions…
▪ Deliverable 2 : Model Descriptions…
▪ Deliverable 3 : Data Models…
✓ Task 4.4 : Assessing Model(s)…
▪ Deliverable 1 : Model Assessment…
▪ Deliverable 2 : Revised Parameter Settings…
• PHASE 5.0 : Evaluation…
✓ Task 5.1 : Evaluating Results…
▪ Deliverable 1 : Result Assessment…
▪ Deliverable 2 : Model Approval…
✓ Task 5.2 : Reviewing the Process…
▪ Deliverable : Process Evaluation Report…
✓ Task 5.3 : Determining the Next Steps…
▪ Deliverable 1 : Possible Actions…
▪ Deliverable 2 : Final Decision…
• PHASE 6.0 : Deployment…
✓ Task 6.1 : Planning Deployment…
▪ Deliverable : Deployment Plan…
✓ Task 6.2 : Planning Monitoring and Maintenance…
▪ Deliverable : Monitoring and Maintenance Plan…
✓ Task 6.3 : Reporting Final Results…
▪ Deliverable 1 : Final Report…
▪ Deliverable 2 : Final Presentation…
✓ Task 6.4 : Review Project…
▪ Deliverable : Team Experience Assessment…
PHASE 1.0 : Business Understanding…
✓Task 1.1 : Determine Business Objectives…
▪ Deliverable : Project Scope Document – Part 1…
✓Task 1.2 : Assess the Situation…
▪ Deliverable : Project Scope Document – Part 2…
✓Task 1.3 : Determine Data-Mining Goals…
▪ Deliverable : Data-Mining Scope Document…
PHASE 2.0 : Data Understanding…
✓Task 2.1 : Gathering Data…
▪ Deliverable : Data Collection Report…
✓Task 2.2 : Describing Data…
▪ Deliverable : Data Description Report…
✓Task 2.3 : Exploring Data…
▪ Deliverable : Data Exploration Report…
✓Task 2.4 : Verifying Data Quality…
▪ Deliverable : Data Quality Report…
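For Task 2.4, much of a Data Quality Report can start from a per-column table of types, missing values, and distinct values, plus a duplicate-row count. A minimal sketch, assuming pandas and NumPy; the toy heart-attack table is hypothetical and deliberately contains one missing age, one missing cholesterol value, and one duplicated row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with deliberate quality problems; substitute the project's dataset.
df = pd.DataFrame({
    "age": [54, 61, np.nan, 47, 61],
    "cholesterol": [230, 290, 210, np.nan, 290],
    "sex": ["M", "F", "F", "M", "F"],
    "heart_attack": [0, 1, 0, 0, 1],
})

# One row per column: data type, missing count, missing percentage, distinct (non-null) values.
quality = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
    "missing_pct": (df.isna().mean() * 100).round(1),
    "unique": df.nunique(),
})
# Rows that exactly repeat an earlier row.
duplicate_rows = int(df.duplicated().sum())

print(quality)
print(f"duplicate rows: {duplicate_rows}")
```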
PHASE 3.0 : Data Preparation…
✓ Task 3.1 : Selecting Data…
▪ Deliverable : Data Rationale Report…
✓ Task 3.2 : Cleaning Data…
▪ Deliverable : Data Cleansing Report…
✓ Task 3.3 : Constructing Data…
▪ Deliverable 1 : Data Attribute Report…
▪ Deliverable 2 : Data Generation Report…
✓ Task 3.4 : Integrating Data…
▪ Deliverable : Merged Data Set…
✓ Task 3.5 : Formatting Data…
▪ Deliverable : Final Formatted Dataset…
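Tasks 3.2 and 3.5 can be illustrated on the same kind of toy table: drop exact duplicates, fill numeric gaps, and encode categorical columns so a model can use them. This is a minimal sketch assuming pandas and NumPy; the data is hypothetical, and median imputation and one-hot encoding are only common choices, not the required method. Each choice made here is what a Data Cleansing Report would document.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; substitute the project's dataset.
raw = pd.DataFrame({
    "age": [54, 61, np.nan, 47, 61],
    "cholesterol": [230, 290, 210, np.nan, 290],
    "sex": ["M", "F", "F", "M", "F"],
    "heart_attack": [0, 1, 0, 0, 1],
})

# Task 3.2 (cleaning): remove exact duplicate rows.
clean = raw.drop_duplicates().copy()

# Task 3.2 (cleaning): fill missing numeric values with that column's median.
for col in clean.select_dtypes(include="number").columns:
    clean[col] = clean[col].fillna(clean[col].median())

# Task 3.5 (formatting): one-hot encode the categorical column as 0/1 integers.
formatted = pd.get_dummies(clean, columns=["sex"], dtype=int)
print(formatted)
```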
PHASE 4.0 : Modeling …
✓ Task 4.1 : Selecting Modeling Techniques…
▪ Deliverable 1 : Defined Modeling Technique(s)…
▪ Deliverable 2 : Defined Modeling Assumptions…
✓ Task 4.2 : Designing Tests…
▪ Deliverable : Test Design Document…
✓ Task 4.3 : Building Model(s)…
▪ Deliverable 1 : Parameter Definitions…
▪ Deliverable 2 : Model Descriptions…
▪ Deliverable 3 : Data Models…
✓ Task 4.4 : Assessing Model(s)…
▪ Deliverable 1 : Model Assessment…
▪ Deliverable 2 : Revised Parameter Settings…
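Choosing the "best depth" for the decision tree (asked for in Parts 1 and 2) is a natural example of a Revised Parameter Setting. One common approach, sketched here under the assumption that scikit-learn is available, scores each candidate `max_depth` with 5-fold cross-validation. The iris data and the 1 to 10 depth range are stand-ins.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the project's dataset

# Mean 5-fold cross-validated accuracy for each candidate depth.
depth_scores = {}
for depth in range(1, 11):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    depth_scores[depth] = cross_val_score(model, X, y, cv=5).mean()

# max() returns the first key with the top score, i.e. the shallowest tree among ties.
best_depth = max(depth_scores, key=depth_scores.get)

for depth, score in depth_scores.items():
    print(f"max_depth={depth:2d}  mean 5-fold accuracy={score:.3f}")
print(f"best depth: {best_depth}")
```

Preferring the shallowest tree among equally scoring depths keeps the model simpler and easier to explain, which helps answer "Why is it the Best?".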
PHASE 5.0 : Evaluation …
✓Task 5.1 : Evaluating Results…
▪ Deliverable 1 : Result Assessment…
▪ Deliverable 2 : Model Approval…
✓Task 5.2 : Reviewing the Process…
▪ Deliverable : Process Evaluation Report (Lessons Learned)…
✓Task 5.3 : Determining the Next Steps…
▪ Deliverable 1 : Possible Actions…
▪ Deliverable 2 : Final Decision…
Questions On Project?…
Final Presentation…
• This Presentation Will Be Considered Your Final Exam…
• Your Final Presentation is a Review of the Original AND Updated
Applications That You Created…
• Your Presentation Should Last Between 15 and 20 Minutes…
• Your Presentations Are Scheduled for Wednesday, APRIL 5…
• Your Projects Are Due on Wednesday, APRIL 12…
Final Presentation…
• Part 1 - Original Code…
▪ Overview of the Original Project…
▪ Demonstration of the Code…
▪ Review of the Code…
✓I Do Not Expect You To Go Through This Line By Line…But You Should Hit All
the Major Components, and Discuss Any Particularly Interesting or Difficult
Areas of the Code or the Project…
✓Use the “Iris” PowerPoint as a Guide For Presenting Your Code…
▪ Review and Discuss Your Output…
▪ Discuss Issues, Problems, Lessons Learned…
Final Presentation…
• Part 2 - Updated Code…
▪ Overview of the Updated Project…
▪ Your Approach to Updating the Base Code…
▪ Demonstration of the New Code…
▪ Review of the Code…See Notes From the Prior Slide…
▪ Review and Discuss Your Output…
▪ Discuss Issues, Problems, Lessons Learned…
Final Presentation…
• Part 3 – CRISP-DM…
▪ Summarize the Key Findings By Phase…
▪ You Can Show Samples of Your Documentation But You Do NOT Have to
Read Through Each Task…
▪ Discuss Issues, Problems, Lessons Learned…
▪ Final Thoughts…
When?...
Final Presentation – April 5…
Time… Team and Project…
06:00pm
06:20pm
06:40pm
07:00pm
07:20pm
07:40pm
08:00pm
08:20pm
08:40pm
Questions on Final Presentation?…
Questions on Final Project?…
Questions on ANYTHING?…