CIS 5500 Database

Uploaded by

tanishk

CIS 5500 Project Milestone 2

Section I - Motivation
We chose this project because we saw a need for a streamlined way to find healthcare providers
by specific characteristics: proximity, specialty, cost, and ratings. Currently, a Google search for
healthcare providers near you returns a page cluttered with ads and sponsored websites, so you
do not always find the most appropriate provider for what you specifically need. Our website is
meant to make that search straightforward. Additionally, because the search already requires
personal information, we thought it would be beneficial to use that information to tell people
which conditions they might be predisposed to and educate them on what can be done. People
often do not know which conditions they are predisposed to or what steps they need to take to
avoid potentially life-altering conditions.

Section II - Features that WILL be implemented

● Health Risk Assessment


○ When the user loads the website, a pop-up survey asks them to fill in
important personal info: weight, age, height, town/county, exercise
habits, nutritional habits, substance usage (alcohol, smoking, drugs, etc.),
vaccinations, (*potentially insurance plan via API Find a Provider)
○ Based on the user’s demographic and geographical information, the application
calculates potential health risks and identifies conditions common in the user’s
area and demographic.

● Healthcare provider matching


○ On a separate page, a search bar with sliders/dropdowns/checkboxes to select
options for healthcare providers based on criteria
○ Criteria: radius (from town), provider type, cost, quality, (*do they accept user’s
insurance via API Find a Provider)
○ Output: list of providers that meet requirements
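
The radius and provider-type criteria above could be sketched as follows. This is a minimal illustration, not our final matching algorithm: it assumes each provider record carries a latitude and longitude, measures the radius in kilometers with the haversine formula, and uses hypothetical field names (name, specialty, lat, lon) and made-up sample providers.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_providers(providers, town, radius_km, specialty=None):
    """Return names of providers within radius_km of town, nearest first.

    town is a (lat, lon) tuple; specialty=None means "any provider type".
    """
    matches = []
    for p in providers:
        if specialty is not None and p["specialty"] != specialty:
            continue
        distance = haversine_km(town[0], town[1], p["lat"], p["lon"])
        if distance <= radius_km:
            matches.append((distance, p["name"]))
    return [name for _, name in sorted(matches)]


# Hypothetical sample data for illustration only.
providers = [
    {"name": "Provider A", "specialty": "Cardiology", "lat": 39.9526, "lon": -75.1652},
    {"name": "Provider B", "specialty": "Cardiology", "lat": 40.7128, "lon": -74.0060},
    {"name": "Provider C", "specialty": "Dermatology", "lat": 39.9500, "lon": -75.1700},
]
town = (39.9522, -75.1932)

nearby_cardiologists = match_providers(providers, town, radius_km=25, specialty="Cardiology")
```

With this sample data, only Provider A is a cardiologist within 25 km of the town. Cost, quality, and insurance filters would be added as further conditions in the same loop or, more likely, in the SQL query itself.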

Section III - Features that MIGHT be implemented


● Appointment helper
○ If the user likes a certain healthcare provider, they can click a button such as
“Make an appointment”, which will then use something like Google Assistant to
help set up an appointment for the user
● Auth0
○ Auth0 implementation to secure user data (entered personal details). We need to
decide whether to completely delete a user’s data or store it so
they can use it again in the future

Section IV - List of Pages


● Home page
○ This page will serve as a home base with buttons that will bring you to the
subpages and an overall overview of our application.
● Health Risk Assessment
○ When this page is opened (for the first time with a new account; *see possible
features on storing data), a pop-up survey will be shown to the user, who can choose
what data they would like to submit (most of the inputs should be optional so
users can opt out of some things if they choose)
○ After completing the survey the page will then display conditions the user is
predisposed to and accompanying websites about the condition and early
prevention steps
● Healthcare provider matching
○ A simple search page, similar to the HW3 songs page, where the user can select multiple
options (outlined above) and then run a query on healthcare providers based on
the given criteria
● Credits
○ A simple page with acknowledgments to anything and everything used
(technologies, TA help, etc.) as well as credits to us as the authors

Section V - Relational Schema ER Diagram


Section VI - DDL

User Table
CREATE TABLE Users (
UserID INT PRIMARY KEY,
DemographicInfo VARCHAR(255),
GeographicInformation VARCHAR(255)
);

Insurance Table
CREATE TABLE Insurance (
InsuranceID INT PRIMARY KEY,
GeographicArea VARCHAR(255),
PlanName VARCHAR(255),
PlanBenefits TEXT,
APILink VARCHAR(255)
);

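Subscriber Relationship Table
CREATE TABLE Subscriber (
UserID INT,
InsuranceID INT,
-- Suggested composite key: prevents duplicate (user, plan) rows
PRIMARY KEY (UserID, InsuranceID),
FOREIGN KEY (UserID) REFERENCES Users(UserID),
FOREIGN KEY (InsuranceID) REFERENCES Insurance(InsuranceID)
);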

Healthcare Provider Table
CREATE TABLE HealthcareProvider (
ProviderID INT PRIMARY KEY,
Name VARCHAR(255),
MedicalLicenseNo VARCHAR(50),
GeographicInformation VARCHAR(255),
InstitutionAffiliation VARCHAR(255),
EducationalCredentials TEXT,
PracticingSpecialty VARCHAR(255)
);

Health Condition Table
CREATE TABLE HealthCondition (
ConditionID INT PRIMARY KEY,
MostAffectedDemographic VARCHAR(255),
MostAffectedGeography VARCHAR(255),
ProviderCareSpeciality VARCHAR(255),
PreventiveMeasures TEXT,
RiskFactors TEXT,
LinkToWHO VARCHAR(255)
);

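Disease Relationship Table
CREATE TABLE Disease (
UserID INT,
ConditionID INT,
-- Suggested composite key: prevents duplicate (user, condition) rows
PRIMARY KEY (UserID, ConditionID),
FOREIGN KEY (UserID) REFERENCES Users(UserID),
FOREIGN KEY (ConditionID) REFERENCES HealthCondition(ConditionID)
);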

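Condition Provider Relationship Table
CREATE TABLE ConditionProvider (
ProviderID INT,
ConditionID INT,
-- Suggested composite key: prevents duplicate (provider, condition) rows
PRIMARY KEY (ProviderID, ConditionID),
FOREIGN KEY (ProviderID) REFERENCES HealthcareProvider(ProviderID),
FOREIGN KEY (ConditionID) REFERENCES HealthCondition(ConditionID)
);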
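
To show how these tables might be queried together, here is a small sketch that loads a trimmed copy of three of the tables into an in-memory SQLite database and finds providers who treat conditions common in a given area. It is only an illustration under several assumptions: SQLite stands in for MySQL, the tables keep only the columns the query needs, the rows are made up, and the function name providers_for_geography is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed copies of the DDL above (only the columns this query uses).
conn.executescript("""
CREATE TABLE HealthcareProvider (
    ProviderID INT PRIMARY KEY,
    Name VARCHAR(255),
    PracticingSpecialty VARCHAR(255)
);
CREATE TABLE HealthCondition (
    ConditionID INT PRIMARY KEY,
    MostAffectedGeography VARCHAR(255),
    PreventiveMeasures TEXT
);
CREATE TABLE ConditionProvider (
    ProviderID INT,
    ConditionID INT,
    FOREIGN KEY (ProviderID) REFERENCES HealthcareProvider(ProviderID),
    FOREIGN KEY (ConditionID) REFERENCES HealthCondition(ConditionID)
);
""")

# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO HealthcareProvider VALUES (?, ?, ?)",
    [(10, "Provider A", "Cardiology"), (11, "Provider B", "Pulmonology")],
)
conn.executemany(
    "INSERT INTO HealthCondition VALUES (?, ?, ?)",
    [(1, "County X", "Regular exercise"), (2, "County Y", "Avoid smoking")],
)
conn.executemany(
    "INSERT INTO ConditionProvider VALUES (?, ?)",
    [(10, 1), (11, 1), (11, 2)],
)


def providers_for_geography(conn, geography):
    """Names of providers who treat any condition most common in `geography`."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.Name
        FROM HealthCondition c
        JOIN ConditionProvider cp ON cp.ConditionID = c.ConditionID
        JOIN HealthcareProvider p ON p.ProviderID = cp.ProviderID
        WHERE c.MostAffectedGeography = ?
        ORDER BY p.Name
        """,
        (geography,),
    ).fetchall()
    return [name for (name,) in rows]


county_x_providers = providers_for_geography(conn, "County X")
```

The same join should carry over to MySQL largely unchanged. Matching on MostAffectedDemographic would add one more condition to the WHERE clause.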

Section VII - Cleaning Explanation


There are several steps that we could use to pre-process and clean our data:

1. Dealing with the missing values: For essential fields that cannot be imputed (e.g., NPI,
Provider Last Name, Provider First Name), we will consider removing rows with missing
values. For non-essential fields, we can also fill missing values with a placeholder (e.g.,
"Unknown" for categorical data, or the column's median for numerical data).
2. Deduplication: Identify and remove duplicate entries to avoid redundancy. This can be
particularly important for providers of healthcare insurance services listed multiple times
with slight variations in their address or other details since that could give incorrect
results.
3. Standardization: Standardize the formatting of key fields such as names, addresses, and
phone numbers to ensure consistency. This might include converting text to title case,
removing extraneous characters from phone numbers, and standardizing address
formats.
4. Data Type Conversions: We will need to ensure that each column is of the appropriate
data type. For example, ZIP codes should be treated as strings to preserve leading
zeros, and graduation years should be integers.
5. Normalization: We also plan to normalize the dataset to ensure that similar data points
are represented uniformly. This might involve unifying similar specialty names or
grouping them into broader categories to facilitate easier analysis and matching.
This matters especially for the dataset of insurance services per region: to match
users with services, entries need to be grouped efficiently by distance and
proximity.
6. Feature Engineering: We will create new features that could be useful for our application.
For example, we can extract or compute the provider's years of experience from the
graduation year, or create flags indicating if the provider offers telehealth services based
on the Telehealth field.
7. Handling Specialties: The dataset contains
multiple columns for specialties (pri_spec, sec_spec_1, sec_spec_2, etc.). We can
aggregate these into a single column or a structured format (like a list) associated with
each provider to simplify querying and analysis.
8. Geographical Data: For fields like City/Town, State, and ZIP Code, we will ensure these are
correctly formatted and consider creating a combined location field if useful for the
application's geolocation features. This step is very important because provider
matching depends on those features.
9. Binary/Indicator Fields: For fields like Telehlth, ind_assgn, and grp_assgn, we will ensure
they are consistently coded (e.g., Y/N converted to True/False) to facilitate analysis and
filtering.
10. Validation: Lastly, to make sure our data is consistent and correct, we will also run a
validity check on all the geographic information and specialties.
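
Several of the steps above (dropping rows with missing essential fields, deduplication, standardization, ZIP codes as strings, experience from graduation year, specialty aggregation, and Y/N flags) could be sketched as below. This is a rough illustration in plain Python rather than our actual pandas pipeline. The column names NPI, pri_spec, sec_spec_1, sec_spec_2, and Telehlth come from the dataset as described above; first_name, last_name, zip, phone, and grad_year are hypothetical names, and deduplicating on (NPI, ZIP) is an assumption.

```python
import re

ESSENTIAL = ("NPI", "first_name", "last_name")  # name columns are hypothetical
SPECIALTY_COLS = ("pri_spec", "sec_spec_1", "sec_spec_2")


def text(row, col):
    """Field as a stripped string; missing or None becomes ''."""
    return (row.get(col) or "").strip()


def clean_row(row, current_year):
    """Return a cleaned copy of one provider row, or None if an essential field is missing."""
    if any(not text(row, col) for col in ESSENTIAL):
        return None
    zip_digits = re.sub(r"\D", "", text(row, "zip"))
    if not zip_digits:
        zip_code = "Unknown"
    elif len(zip_digits) > 5:
        zip_code = zip_digits.zfill(9)[:5]  # ZIP+4: keep the five-digit part
    else:
        zip_code = zip_digits.zfill(5)      # restore leading zeros
    grad = text(row, "grad_year")
    return {
        "NPI": text(row, "NPI"),
        "first_name": text(row, "first_name").title(),
        "last_name": text(row, "last_name").title(),
        "zip": zip_code,
        "phone": re.sub(r"\D", "", text(row, "phone")) or "Unknown",
        "telehealth": text(row, "Telehlth").upper() == "Y",
        "specialties": [text(row, c).title() for c in SPECIALTY_COLS if text(row, c)],
        "years_experience": current_year - int(grad) if grad.isdigit() else None,
    }


def clean_rows(rows, current_year):
    """Clean all rows and drop duplicates of the same (NPI, ZIP) pair."""
    seen, result = set(), []
    for row in rows:
        cleaned = clean_row(row, current_year)
        if cleaned is None:
            continue
        key = (cleaned["NPI"], cleaned["zip"])
        if key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


# Hypothetical rows: a provider, a near-duplicate of it, and a row missing its NPI.
raw = [
    {"NPI": "1234567890", "first_name": "JANE", "last_name": "DOE", "zip": "8540",
     "phone": "(215) 555-0100", "Telehlth": "Y", "pri_spec": "CARDIOLOGY",
     "sec_spec_1": "", "sec_spec_2": "INTERNAL MEDICINE", "grad_year": "2010"},
    {"NPI": "1234567890", "first_name": "Jane", "last_name": "Doe", "zip": "08540-1234",
     "phone": "215-555-0100", "Telehlth": "Y", "pri_spec": "CARDIOLOGY", "grad_year": "2010"},
    {"NPI": "", "first_name": "JOHN", "last_name": "ROE", "zip": "19104"},
]
cleaned = clean_rows(raw, current_year=2024)
```

Here the second row collapses into the first once both ZIP codes normalize to the same value, and the third row is dropped for lacking an NPI. In pandas, the same ideas map onto operations such as dropna, drop_duplicates, and string methods on columns.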

Section VIII - Technologies that will most likely be used


● React.js
● Node.js
● MySQL
● AWS
● JavaScript
● HTML
● CSS
● Python
○ pandas (pre-cleaning)
○ numpy (pre-cleaning)
● GitHub
○ GitHub Pages (automatic deployment?)
● Auth0 (potentially)
● Hugo (website templater, potentially?)

Section IX - Responsibilities


1) Tanish Kelkar | [email protected] | GitHub: TanishKelkar
a) Great at UI design.
b) Will work on overall design, clean data and manage integrations
2) Max Mercado | [email protected] | GitHub: maxmerc
a) Best at SQL queries, front-end (HTML, CSS, React)
b) Will work on back-end and creating the provider matching algorithm
3) Ryan Kertzner | [email protected] | GitHub: rkertz
a) Great at front-end and React
b) Will work on developing the pages for the Health Risk Assessment
4) Seher Taneja | [email protected] | GitHub: sehertaneja
a) Great at algorithms and overall optimization
b) Will work on creating the page for healthcare provider matching and survey.
