Predict Intro Slides

This document provides guidance and resources for students completing a SQL data normalization predictive assessment. It includes an overview of the problem domain, deliverables, rules, and starter materials. Students are asked to normalize an unnormalized retail database into third normal form and answer multiple choice questions testing their work. The document also details the original database tables, provides entity relationship diagrams for each normalization step, and answers frequently asked questions about the predictive assessment.

Uploaded by

Asmaa Hassan

SQL for Data Science

Predict
Contents

Problem Context/Domain

Predict Rules + Instructions

Student Starter Pack

Important Packages

Data ERD Guide

Predict FAQs

Problem Context/Domain - Retail (Online Retailing Business)

Problem Statement:
The Bhejane Trading store is an online retailer specialising in the sale of COVID-related essential items. As a
consultant hired by the company, you have been tasked with normalising the database of the
store's inventory management system.
You are provided with an unnormalised database and are expected to normalise its contents to bring it into Third
Normal Form (3NF). The database has two tables (products and transactions), which are summarised here.

Deliverables:
After having normalised the DB, you will be required to answer several multiple-choice questions which test
your completed work, and your practical SQL skills gained in the course.

NB: The following deliverables must also be uploaded to Athena:

- Your notebook, which should also contain the queries used to answer the MCQ.
- A SQLite '.db' database file of the normalised database.

© Explore Data Science Academy

Predict Rules + Instructions

● This project is an individual project; your work needs to reflect your understanding of the course content.

● You are free to share ideas with colleagues and classmates. However, you are not allowed to share your code, solutions, or
submissions with any other individual. Plagiarism will not be tolerated.

● You are required to submit all of the code you use both to normalise the given dataset and to answer the related MCQ
assessment. Submit your completed starter notebook, along with any other material, as a zipped file under the 'Upload Predict
File' tab on Athena.

● The official due date of the Predict will be displayed on Athena. No submissions after 23:59 on this date will be accepted for
marking.

Student Starter Pack - Getting You on the Right Track

In order to help you get your bearings within the Predict, we’ve prepared a ‘starter pack’ which contains
essential material to guide your work. This material includes:

● Base notebook: A Jupyter notebook containing code and instructions to begin work on the Predict.
Continue developing this file to use for final solution submission to Athena.
● The unnormalised data: Two .csv files containing the unnormalised data.
○ ‘bhejane_covid_essentials_Products.csv’
○ ‘bhejane_covid_essentials_Transactions.csv’
● A description of the various data fields found in the database.
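
As a starting point, the two .csv files can be copied into a SQLite database before any normalisation is attempted. The sketch below is one possible way to do that with Python's standard library; the helper name `load_csv`, the table names `Products` and `Transactions`, and the choice to store every column as TEXT are assumptions made for illustration, and the base notebook may already provide its own loading code.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv(conn, csv_path, table):
    """Copy a CSV file into a new table, one TEXT column per header field."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        # Store empty CSV cells as NULL rather than as empty strings.
        rows = ([cell if cell != "" else None for cell in row] for row in reader)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()


# ":memory:" keeps the sketch side-effect free; pass a file name ending in .db
# instead to produce a database file on disk.
conn = sqlite3.connect(":memory:")
for csv_name, table in [
    ("bhejane_covid_essentials_Products.csv", "Products"),  # assumed table name
    ("bhejane_covid_essentials_Transactions.csv", "Transactions"),  # assumed table name
]:
    if Path(csv_name).exists():  # skip when the starter files are stored elsewhere
        load_csv(conn, csv_name, table)
```

Loading everything as TEXT keeps the raw data untouched; proper data types can then be applied when the normalised tables are created.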

Making Sense of our Queries

Within this Predict we'll be writing a lot of SQL statements. To make your SQL queries more
human-readable and to help you along, we will install the ipython-sql package, which lets you run SQL directly in
notebook cells with syntax highlighting.

● Install the ipython-sql package by entering the following command into your terminal:
pip install ipython-sql
● Once the extension is loaded in your notebook (%load_ext sql), you can use the %%sql magic command at the
start of each cell when writing your SQL queries and the syntax will be highlighted.

Detailing the Data - Original Database Tables

To help familiarise yourself with the data in the original database, we provide the following ERD, showing the
various fields for the Products and Transactions tables respectively.

You are required to use the principles of database normalisation to transform these tables into the 3NF schema.
Subsequent slides will detail the normalisation process.

*NB: Be wary of handling NULL values in the dataset.
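
The NULL warning is easiest to appreciate with a small experiment. The sketch below uses a made-up table (`demo_products`, with hypothetical `barcode` and `colour` columns) to show three common pitfalls in SQLite: comparing with `= NULL`, counting a column that contains NULLs, and empty strings that look like missing values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and columns, used only for illustration.
conn.execute("CREATE TABLE demo_products (barcode TEXT, colour TEXT)")
conn.executemany(
    "INSERT INTO demo_products VALUES (?, ?)",
    [("A1", "Blue"), ("A2", None), ("A3", "")],
)

# "= NULL" never evaluates to true, so this matches no rows.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM demo_products WHERE colour = NULL"
).fetchone()[0]

# IS NULL is the correct test for missing values.
is_null = conn.execute(
    "SELECT COUNT(*) FROM demo_products WHERE colour IS NULL"
).fetchone()[0]

# COUNT(column) skips NULLs, while COUNT(*) counts every row.
count_all, count_colour = conn.execute(
    "SELECT COUNT(*), COUNT(colour) FROM demo_products"
).fetchone()

# NULLIF turns empty strings into NULL so both kinds of "missing" are treated alike.
missing = conn.execute(
    "SELECT COUNT(*) FROM demo_products WHERE NULLIF(colour, '') IS NULL"
).fetchone()[0]

print(equals_null, is_null, count_all, count_colour, missing)  # 0 1 3 2 2
```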

Detailing the Data - 1NF Entity Relationship Diagram

Throughout the Predict you will be given the target ERD for each normalisation step.

To the right is the ERD sketch for the 1st Normal Form to get you started.

Pay attention to field attributes such as data types, primary keys, composite keys, foreign keys and the
relationships that exist amongst them.
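
To make the key terminology concrete, here is a minimal sketch of a composite primary key and a foreign key in SQLite. The tables `demo_items` and `demo_sales` and their columns are hypothetical; the real names and data types should be taken from the ERD sketch. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been run on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

# Hypothetical tables; take the real names and columns from the ERD sketch.
conn.executescript(
    """
    CREATE TABLE demo_items (
        barcode TEXT PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE demo_sales (
        invoice_id INTEGER NOT NULL,
        barcode TEXT NOT NULL,
        quantity INTEGER NOT NULL,
        PRIMARY KEY (invoice_id, barcode),  -- composite key: two columns together
        FOREIGN KEY (barcode) REFERENCES demo_items (barcode)
    );
    """
)
conn.execute("INSERT INTO demo_items VALUES ('A1', 'Face mask')")
conn.execute("INSERT INTO demo_sales VALUES (1, 'A1', 3)")

try:
    # 'ZZ' is not in demo_items, so the foreign key rejects this row.
    conn.execute("INSERT INTO demo_sales VALUES (1, 'ZZ', 1)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(fk_rejected)  # True
```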

Detailing the Data - 2NF Entity Relationship Diagram

You are encouraged to use the AUTOINCREMENT property when creating new
fields that are going to be used as primary keys.
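
A short sketch of what this looks like in SQLite follows. The lookup table `demo_colours` and its columns are hypothetical; the point is only that the key column is omitted from the INSERT and SQLite assigns the values itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical lookup table; the real table and column names come from the 2NF ERD.
conn.execute(
    """
    CREATE TABLE demo_colours (
        colour_id INTEGER PRIMARY KEY AUTOINCREMENT,
        colour TEXT NOT NULL UNIQUE
    )
    """
)
# The key column is left out of the INSERT, so SQLite assigns 1, 2, 3, ...
conn.executemany(
    "INSERT INTO demo_colours (colour) VALUES (?)",
    [("Blue",), ("Green",), ("Red",)],
)
colour_rows = conn.execute(
    "SELECT colour_id, colour FROM demo_colours ORDER BY colour_id"
).fetchall()
print(colour_rows)  # [(1, 'Blue'), (2, 'Green'), (3, 'Red')]
```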

Detailing the Data - 3NF Entity Relationship Diagram

Hint:
As you progress through the different normal forms, you may find it easier to
populate the tables of the current normal form using the tables of the
previous normal form.
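
One way to follow this hint is the `INSERT INTO ... SELECT` pattern, which fills a new table directly from an existing one. The sketch below is illustrative only: `demo_previous`, `demo_colour_lookup` and `demo_next` are hypothetical stand-ins for the tables of one normal form and the next, not the tables in the ERD sketches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for one normal form and the next.
conn.executescript(
    """
    CREATE TABLE demo_previous (barcode TEXT, description TEXT, colour TEXT);
    INSERT INTO demo_previous VALUES
        ('A1', 'Face mask', 'Blue'),
        ('A2', 'Face shield', 'Blue'),
        ('A3', 'Sanitiser', NULL);

    CREATE TABLE demo_colour_lookup (
        colour_id INTEGER PRIMARY KEY AUTOINCREMENT,
        colour TEXT NOT NULL UNIQUE
    );
    CREATE TABLE demo_next (
        barcode TEXT PRIMARY KEY,
        description TEXT,
        colour_id INTEGER REFERENCES demo_colour_lookup (colour_id)
    );

    -- Fill the lookup table with each distinct, non-NULL value once.
    INSERT INTO demo_colour_lookup (colour)
    SELECT DISTINCT colour FROM demo_previous WHERE colour IS NOT NULL;

    -- Rebuild the main table, swapping the text value for its key.
    INSERT INTO demo_next (barcode, description, colour_id)
    SELECT p.barcode, p.description, c.colour_id
    FROM demo_previous AS p
    LEFT JOIN demo_colour_lookup AS c ON c.colour = p.colour;
    """
)
migrated = conn.execute(
    "SELECT barcode, colour_id FROM demo_next ORDER BY barcode"
).fetchall()
print(migrated)  # [('A1', 1), ('A2', 1), ('A3', None)]
```

The LEFT JOIN keeps rows whose value is NULL in the previous table; an inner join would silently drop them.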

Predict-related FAQs

This page will be updated periodically with common predict-related questions which may arise during the
Sprint. Consider consulting this space before asking your course facilitator a question.

Considerations to keep in mind when completing the predict, before answering the predict questions:

1. The aim of the predict is to understand and implement normalization on the dataset provided. This includes
understanding the separation of entities (tables which serve a single purpose), maintaining relationships, and
enforcing normalization through data integrity.

2. Following the normalization process step by step is important in order to answer the predict
questions effectively.

3. Having an understanding of your problem and data can be very helpful in guiding your thinking. At each stage
of the normalization, take some time to reflect on what changed from the previous normal form and why those
transformations were made.

I am getting the following error - ModuleNotFoundError: No module named 'ipython-sql'. What should I do?
● Please make sure that you have installed ipython-sql using the following command: pip install ipython-sql

I cannot make changes to my table creation code; I get the following error every time I try - OperationalError:
table <TableName> already exists
● You are advised to first drop the old table before re-creating the table with your new changes:
○ DROP TABLE IF EXISTS [TableName];

● You can drop and create the tables as many times as you want; just remember to keep the table naming
convention consistent with the ERD sketches that are provided.
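
The drop-then-create pattern can be sketched as follows. The table `demo_table` and the helper `recreate_demo_table` are hypothetical names used for illustration; the idea is simply that the same cell can be run repeatedly without the "already exists" error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def recreate_demo_table(conn):
    """Drop and rebuild a hypothetical table so the cell can be re-run safely."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS demo_table;
        CREATE TABLE demo_table (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL
        );
        """
    )


recreate_demo_table(conn)
recreate_demo_table(conn)  # a second run does not raise "table already exists"
```

If foreign key enforcement is switched on, it is usually safest to drop the tables that hold foreign keys before the tables they reference.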

What do 'PK' and 'FK' stand for when looking at the ERD sketches?
● PK: Primary Key
● FK: Foreign Key

I am constantly getting errors and debugging is a nightmare

● SQL by nature requires one to be pedantic - so pay special attention to syntax and formatting. If your SQL queries
generally look like the below - may the debugging gods be with you…

● SQL doesn't have any formatting rules (unlike Python, where indentation matters), so it will allow you to run the above query
with no issues at all. It is, however, recommended that you practise good SQL hygiene and stay away from this practice.
Although there is no book of all truths for SQL formatting, it should generally take the following form:
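
One common layout is sketched below, offered as an illustration rather than an official style guide: upper-case keywords, one clause per line, and an indented column list. The table `demo_sales` and its data are hypothetical; the comparison at the end shows that the cramped and the readable versions return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, only here so the queries have something to run against.
conn.executescript(
    """
    CREATE TABLE demo_sales (item TEXT, quantity INTEGER);
    INSERT INTO demo_sales VALUES ('mask', 3), ('mask', 2), ('gloves', 4), ('soap', 1);
    """
)

# Valid SQL, but hard to read and hard to debug.
cramped = "select item,sum(quantity) as total from demo_sales where quantity>1 group by item order by total desc"

# The same query: upper-case keywords, one clause per line, indented column list.
readable = """
SELECT
    item,
    SUM(quantity) AS total
FROM demo_sales
WHERE quantity > 1
GROUP BY item
ORDER BY total DESC
"""

same_result = conn.execute(cramped).fetchall() == conn.execute(readable).fetchall()
print(same_result)  # True
```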

How can I compare my normalised database to the reference ERD diagrams?

● ERAlchemy is a useful package for viewing relationship diagrams within Jupyter.
ERAlchemy requires Python and GraphViz to generate the graphs. Both are available for Windows, Mac and Linux.

● Within a Jupyter code cell, execute the render_er Python function to see your relationship diagram.

● Or be more specific about the tables you want to include in the output.
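
Where installing GraphViz is not an option, the schema can also be checked as plain text. The sketch below does not use ERAlchemy; it is a dependency-free alternative built on SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`. The helper `describe_schema` and the `parent`/`child` demo tables are hypothetical names introduced for illustration.

```python
import sqlite3


def describe_schema(conn):
    """Return {table: {"columns": [...], "foreign_keys": [...]}} for a SQLite database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            (name, col_type, bool(pk))
            for _, name, col_type, _, _, pk in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [
            (row[3], row[2], row[4])
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema


# Small hypothetical database to show the output shape.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent (id)
    );
    """
)
for table, details in describe_schema(demo).items():
    print(table, details)
```

Each column is reported as (name, declared type, is part of the primary key) and each foreign key as (local column, referenced table, referenced column), which can be ticked off against the PK and FK markers in the reference ERD.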
