Uid - Bda Report
CHAPTER 1
INTRODUCTION
A health care insurance company is facing challenges in growing its revenue and
understanding its customers, so it wants to use the Big Data and User Interface ecosystem
to analyze competitor data received from a variety of sources, namely web scraping and
third-party providers. This analysis will help the company track the behavior and condition
of customers so that it can customize insurance offers for them and calculate
royalties for customers who bought policies in the past, which in turn will increase its revenue.
Health care insurance analysis involves the examination and evaluation of various aspects
related to health insurance policies, coverage, costs, and outcomes. As the healthcare landscape
continues to evolve, understanding and analyzing health care insurance is crucial for
individuals, healthcare providers, insurance companies, and policymakers.
This examination includes evaluating coverage options, understanding policy terms, analyzing
premium costs, and considering factors such as deductibles and copayments. By delving into
these components, individuals and organizations can optimize their health insurance choices,
ensuring adequate coverage while managing overall healthcare expenses. This process is
crucial for making informed decisions that align with one's health needs and
financial considerations.
CHAPTER 2
OBJECTIVES
1. The goal of the project is to create data pipelines for the health care insurance company
that help it form appropriate business strategies to increase revenue
by analyzing customer behavior and sending offers and royalties to customers accordingly.
2. Increase revenue: use competitor data and customer insights to tailor insurance
policies and offers effectively, resulting in revenue growth.
CHAPTER 3
PROJECT ARCHITECTURE
Fig 3.1: The above figure describes the architecture of “HEALTH CARE INSURANCE ANALYSIS”
CHAPTER 4
PROBLEM STATEMENT
● Parse and infer the schema of the ingested data, which arrives in XML and CSV formats.
● Perform general data cleaning steps such as replacing empty strings
with actual NULLs, data type checks (including date formats) with corrections or
rejections, file name checks, empty file checks, and malformed record checks with
rejection.
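The schema inference step in the first bullet can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline code. The function names, the column names, the `record_tag` parameter, and the assumed `YYYY-MM-DD` date format are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from datetime import datetime


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not present:
        return "string"  # nothing to infer from, so fall back to string

    def all_parse(parser):
        # True only if the parser accepts every non-empty value.
        try:
            for v in present:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "int"
    if all_parse(float):
        return "double"
    # Assumed date format; a real feed may need several candidate formats.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


def infer_csv_schema(text):
    """Infer {column: type} from CSV text whose first row is a header."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return {c: infer_type([r.get(c) for r in rows]) for c in (reader.fieldnames or [])}


def infer_xml_schema(text, record_tag):
    """Infer {field: type} from XML where each <record_tag> element is one record."""
    root = ET.fromstring(text)
    columns = {}
    for record in root.iter(record_tag):
        for child in record:
            columns.setdefault(child.tag, []).append(child.text)
    return {name: infer_type(vals) for name, vals in columns.items()}


# Hypothetical sample data, for illustration only.
csv_text = (
    "claim_id,claim_amount,claim_date,status\n"
    "1,1200.50,2021-03-04,Y\n"
    "2,,2021-04-11,N\n"
)
xml_text = (
    "<patients>"
    "<patient><id>7</id><name>Asha</name></patient>"
    "<patient><id>8</id><name/></patient>"
    "</patients>"
)
csv_schema = infer_csv_schema(csv_text)
xml_schema = infer_xml_schema(xml_text, "patient")
print(csv_schema)
print(xml_schema)
```

Empty values are ignored during inference so that a column with missing entries can still be typed from the values that are present.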
Once the data is ready for analysis, we perform the analysis on a batch
basis.
Fig 4.1 describes the schema design for the SQL database that stores the information for “HEALTH
CARE INSURANCE ANALYSIS”.
CHAPTER 5
METHODOLOGY
User Interface Design: We designed the front end using Python's tkinter toolkit
in Visual Studio Code.
DATASET CREATION: A data set is a collection of data. Data sets can also consist of a
collection of documents or files. A database is an organized collection of structured information,
or data, typically stored electronically in a computer system. A database is usually controlled
by a database management system (DBMS).
DATA CLEANING: Data cleaning is the process of fixing or removing incorrect, corrupted,
incorrectly formatted, duplicate, or incomplete data within a dataset.
LOADING TO DATABASE: Data loading refers to the "load" component of ETL. After
data is retrieved and combined from multiple sources, cleaned, and formatted, it is then loaded
into the database.
HIVE: Apache Hive is data warehouse software built on top of Hadoop. It supports
querying and analyzing large datasets with a SQL-like language (HiveQL), which makes it
well suited to recurring, batch-style analysis of big data.
DATA VISUALIZATION:
Data visualization is the representation of data through use of common graphics, such as charts,
CHAPTER 6
CODE TEMPLATES
Data Processing
For each raw file we checked for null values, duplicate values and other parameters, and
then converted it into a processed dataset. Here are some code samples.
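As an illustration of the checks described above, the following sketch cleans one raw CSV file with the Python standard library. It is a simplified example under stated assumptions, not the project's actual code. The column list `EXPECTED_COLUMNS`, the function name `clean_records`, and the date format are hypothetical.

```python
import csv
import io
from datetime import datetime

# Hypothetical column layout for one raw file.
EXPECTED_COLUMNS = ["claim_id", "patient_id", "claim_amount", "claim_date"]


def clean_records(text, date_format="%Y-%m-%d"):
    """Split CSV text into (clean_rows, rejected_rows)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    # An empty file has no header, so it fails this check as well.
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {header}")

    clean, rejected, seen = [], [], set()
    for fields in reader:
        # Malformed record: wrong number of fields.
        if len(fields) != len(EXPECTED_COLUMNS):
            rejected.append((fields, "wrong field count"))
            continue
        # Replace empty or whitespace-only strings with None (NULL).
        row = {c: (v.strip() or None) for c, v in zip(EXPECTED_COLUMNS, fields)}
        if row["claim_id"] is None:
            rejected.append((fields, "missing claim_id"))
            continue
        if row["claim_id"] in seen:
            rejected.append((fields, "duplicate claim_id"))
            continue
        # Data type checks, including the date format.
        try:
            if row["claim_amount"] is not None:
                row["claim_amount"] = float(row["claim_amount"])
            if row["claim_date"] is not None:
                row["claim_date"] = datetime.strptime(row["claim_date"], date_format).date()
        except ValueError:
            rejected.append((fields, "bad amount or date"))
            continue
        seen.add(row["claim_id"])
        clean.append(row)
    return clean, rejected


# Hypothetical sample data, for illustration only.
sample = (
    "claim_id,patient_id,claim_amount,claim_date\n"
    "1,10,250.0,2021-01-05\n"
    "2,11,,2021-02-30\n"
    "1,10,250.0,2021-01-05\n"
    "3,12\n"
    "4,13, ,2021-03-01\n"
)
clean, rejected = clean_records(sample)
print(len(clean), "clean rows;", len(rejected), "rejected rows")
```

Rejected rows are kept together with a reason rather than silently dropped, so they can be written to a separate reject file for inspection.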
We used Sqoop to import the data from the RDBMS into Hive, where we can perform the
necessary tasks to get the outputs.
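A Sqoop import of this kind is normally launched from the command line. The sketch below only builds such a command as an argument list in Python and prints it, without running it. The JDBC URL, user name, password file path, and table names are placeholders, not values from the project; the flags used are standard Sqoop 1 import options.

```python
import shlex


def sqoop_import_command(jdbc_url, username, password_file, table, hive_table, mappers=1):
    """Build the argument list for a Sqoop import of one RDBMS table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC connection string of the source RDBMS
        "--username", username,
        "--password-file", password_file,  # file holding the password, kept out of the command
        "--table", table,                  # source table in the RDBMS
        "--hive-import",                   # load the imported data into Hive
        "--hive-table", hive_table,        # target Hive table
        "--num-mappers", str(mappers),     # number of parallel map tasks
    ]


# All values below are hypothetical placeholders.
cmd = sqoop_import_command(
    jdbc_url="jdbc:mysql://localhost:3306/insurance",
    username="etl_user",
    password_file="/user/etl/.db_password",
    table="claims",
    hive_table="insurance.claims",
)
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can later be passed to `subprocess.run` without shell quoting problems.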
After uploading the data into HDFS we connected Spark and analyzed the data with
Python. The results come out in tabular form and are used to
visualize our use cases.
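One of the report's use cases, counting claims that were accepted or rejected, is a group-by count. The project runs its analysis on Spark; the sketch below shows the same logic in plain Python so that it runs without a cluster. The field name `claim_status` and the status labels are assumptions for illustration.

```python
from collections import Counter


def count_claims_by_status(rows, status_field="claim_status"):
    """Count claims per status; rows with a missing status are counted as UNKNOWN."""
    counts = Counter()
    for row in rows:
        status = row.get(status_field)
        counts[status if status else "UNKNOWN"] += 1
    return dict(counts)


# Hypothetical sample rows, for illustration only.
claims = [
    {"claim_id": "1", "claim_status": "accepted"},
    {"claim_id": "2", "claim_status": "rejected"},
    {"claim_id": "3", "claim_status": "accepted"},
    {"claim_id": "4", "claim_status": None},
]
counts = count_claims_by_status(claims)

# Print the result as a small table, one status per line.
for status, n in sorted(counts.items()):
    print(f"{status:<10} {n:>5}")
```

In Spark the equivalent step would be a group-by on the status column followed by a count, producing the same two-column table.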
CHAPTER 7
OUTPUT SCREENS
Use Case 1: User Interface Design for Health Care Insurance Analysis.
Use Case 3: Number of people whose claim either got accepted or rejected.
CHAPTER 7
CONCLUSION
We collected data from various third-party sources, processed it, and used
Big Data tools to compute the results needed to visualize some of the necessary use cases. Based
on the above analysis, the health care insurance company can create a new business strategy
to acquire more customers, increase engagement, and send offers. The system also fetches company
and customer details and provides easy access to customer information.
Balancing affordability, inclusivity, and quality of services is crucial for crafting effective
healthcare policies that cater to diverse needs. As healthcare systems continue to evolve, a
collaborative approach involving policymakers, insurers, and healthcare providers is
necessary to address emerging challenges and enhance the overall effectiveness of health
insurance programs.
CHAPTER 8
FURTHER ENHANCEMENTS
This project has wide scope for future work in this field. We developed it to
our client's requirements, but it can be generalized in future. With the required resources,
we can get more accurate results. Various further use cases can be achieved with this
project. Some future directions are listed below:
Real-time data can be used for real-time processing.
The whole procedure can be automated so that data is processed as soon as it
arrives from the sources.