
Uid - Bda Report

The document outlines a project aimed at enhancing revenue for a health care insurance company through the analysis of customer behavior and competitor data using Big Data tools. It details objectives such as creating data pipelines, increasing revenue, and improving customer understanding, along with methodologies for data processing, cleaning, and visualization. The conclusion emphasizes the importance of collaborative approaches in healthcare policy while suggesting future enhancements for real-time data processing and broader applications beyond healthcare.

HEALTH CARE INSURANCE ANALYSIS

CHAPTER 1

INTRODUCTION

A health care insurance company is struggling to grow its revenue and to understand its
customers. It therefore wants to use Big Data tools and a user interface ecosystem to analyze
competitor data gathered from a variety of sources, namely web scraping and third-party
providers. This analysis will help the company track customer behavior and health conditions
so that it can customize offers that encourage customers to buy insurance policies, and
calculate royalties for customers who bought policies in the past. Together, these measures
are expected to increase revenue.

Health care insurance analysis involves the examination and evaluation of various aspects
related to health insurance policies, coverage, costs, and outcomes. As the healthcare landscape
continues to evolve, understanding and analyzing health care insurance is crucial for
individuals, healthcare providers, insurance companies, and policymakers.

This examination includes evaluating coverage options, understanding policy terms, analyzing
premium costs, and considering factors such as deductibles and copayments. By delving into
these components, individuals and organizations can optimize their health insurance choices,
ensuring adequate coverage while managing overall healthcare expenses. This process is
crucial for making informed decisions that align with one's health needs and
financial considerations.

Dept of ISE, DSATM 2023-2024 Page 1



CHAPTER 2

OBJECTIVES

1. Create data pipelines: Build data pipelines for the health care insurance company so that
it can form appropriate business strategies, analyze customer behavior, and send offers and
royalties to the right customers.

2. Increase revenue: Leverage competitor data and customer insights to tailor insurance
policies and offers effectively, resulting in revenue growth.

3. Enhance customer understanding: Gain a deeper understanding of customer behavior,
preferences, and health conditions to provide more relevant and personalized insurance
solutions.


CHAPTER 3

PROJECT ARCHITECTURE

Fig 3.1: Architecture of “HEALTH CARE INSURANCE ANALYSIS”


CHAPTER 4

PROBLEM STATEMENT

Problem 1- Data Pre-processing, Enrichment and Load into Database

● Parse the ingested XML and CSV data and infer its schema.

● Apply general data cleaning steps: replace empty strings with actual NULLs, check
data types (including date formats) and correct or reject bad values, check file
names, detect empty files, and detect and reject malformed records.
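
The cleaning steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code. The column names in EXPECTED_COLUMNS and the date format are assumptions, since the report does not list the real schema, and file name checks are left out.

```python
import csv
import io
from datetime import datetime

# Hypothetical column layout; the report does not list the real schema.
EXPECTED_COLUMNS = ["customer_id", "name", "join_date", "monthly_premium"]
DATE_FORMAT = "%Y-%m-%d"  # assumed target date format


def clean_record(row):
    """Return (cleaned_row, None), or (None, reason) for a rejected row."""
    # Malformed record check: DictReader stores surplus fields under the
    # key None and fills missing fields with the value None.
    if None in row or any(row.get(col) is None for col in EXPECTED_COLUMNS):
        return None, "malformed record"
    # Empty-string replacement: blank values become a real NULL (None).
    cleaned = {col: (row[col].strip() or None) for col in EXPECTED_COLUMNS}
    if cleaned["customer_id"] is None:
        return None, "missing customer_id"
    # Data type checks, including the date format.
    try:
        if cleaned["join_date"] is not None:
            datetime.strptime(cleaned["join_date"], DATE_FORMAT)
        if cleaned["monthly_premium"] is not None:
            cleaned["monthly_premium"] = float(cleaned["monthly_premium"])
    except ValueError:
        return None, "bad data type"
    return cleaned, None


def clean_csv(text):
    """Split CSV text into accepted rows and (line_number, reason) rejections."""
    # Empty file check.
    if not text.strip():
        raise ValueError("empty file")
    accepted, rejected = [], []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        cleaned, reason = clean_record(row)
        if cleaned is None:
            rejected.append((reader.line_num, reason))
        else:
            accepted.append(cleaned)
    return accepted, rejected
```

Keeping the rejected rows together with a reason, instead of silently dropping them, makes it possible to audit how much of each source file was discarded.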

Problem 2 - Data Analysis (Spark/Hive)

Once we have made the data ready for analysis, we have to perform analysis on a batch
basis.

● Schema Design for SQL Database:

Fig 4.1 describes the schema design for the SQL database that stores the information for
“HEALTH CARE INSURANCE ANALYSIS”.


CHAPTER 5

METHODOLOGY

User Interface Design: We designed the front end using Python's tkinter toolkit in
Visual Studio Code.

DATASET CREATION: A data set is a collection of data, and can also consist of a
collection of documents or files. A database is an organized collection of structured
information, or data, typically stored electronically in a computer system and usually
controlled by a database management system (DBMS).

DATA CLEANING: Data cleaning is the process of fixing or removing incorrect, corrupted,
incorrectly formatted, duplicate, or incomplete data within a dataset.

LOADING TO DATABASE: Data loading refers to the "load" component of ETL (extract,
transform, load). After data is retrieved and combined from multiple sources, cleaned, and
formatted, it is loaded into a storage system such as a cloud data warehouse or a
relational database.
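
The load step can be sketched as below. SQLite is used here only as a stand-in so the example runs anywhere; the report does not name the RDBMS it used, and the customers table and its columns are hypothetical.

```python
import sqlite3


def load_customers(conn, rows):
    """Create the table if needed, load cleaned rows, and return the row count."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS customers (
            customer_id     TEXT PRIMARY KEY,
            name            TEXT,
            join_date       TEXT,
            monthly_premium REAL
        )
        """
    )
    # The connection as a context manager commits on success and rolls back
    # if any insert fails, so a batch is loaded completely or not at all.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO customers "
            "VALUES (:customer_id, :name, :join_date, :monthly_premium)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Because the insert uses INSERT OR REPLACE on the primary key, reloading the same batch does not create duplicate customers. Python's None values are stored as SQL NULL.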

HIVE: Apache Hive is data warehouse software built on top of Hadoop. It supports routine
analysis of big data by letting users query large datasets with HiveQL, an SQL-like
language.

DATA VISUALIZATION: Data visualization is the representation of data through common
graphics such as charts, plots, infographics, and even animations.


CHAPTER 6

CODE TEMPLATES

Data Processing

6.1 Conversion of raw data to processed data:

For each raw file we checked for null values, duplicate values, and other parameters, and
then converted it into a processed dataset. Here are some code samples.
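
The original code screenshots are not reproduced here. As an illustration only, a per-file check for null and duplicate values might look like the sketch below. The function name profile_csv is hypothetical, and the sketch assumes CSV input with a header row.

```python
import csv
import io
from collections import Counter


def profile_csv(text):
    """Count rows, per-column empty values, and exact duplicate rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # assumes the file has a header row
    rows = [tuple(row) for row in reader if row]  # skip blank lines
    null_counts = {column: 0 for column in header}
    for row in rows:
        for column, value in zip(header, row):
            if not value.strip():
                null_counts[column] += 1
    # A row seen n times contributes n - 1 duplicates.
    occurrences = Counter(rows)
    duplicates = sum(count - 1 for count in occurrences.values())
    return {"rows": len(rows), "nulls": null_counts, "duplicates": duplicates}
```

A report like this can be produced for every raw file before deciding which rows to fix, drop, or reject.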


6.2 Processed Dataset


Some snippets of the processed dataset, which is then used to create the RDBMS tables.


6.3 Hive and Sqoop

We used Sqoop to import the data from the RDBMS into Hive, where we can run the queries
needed to produce the outputs.

Here is the HEALTHCARE_SYSTEM database created in Hive.
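
A Sqoop import into Hive is normally launched from the command line. The sketch below only builds and prints such a command so that it can be reviewed before running. The JDBC URL, user name, password file path, and table name are hypothetical; only the Hive database name comes from this report.

```python
import shlex


def sqoop_import_command(jdbc_url, username, password_file, table, hive_database):
    """Build (but do not run) a Sqoop 1 command that imports one table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # avoids a password on the command line
        "--table", table,
        "--hive-import",
        "--hive-database", hive_database,
        "--num-mappers", "1",
    ]


command = sqoop_import_command(
    jdbc_url="jdbc:mysql://localhost:3306/healthcare",  # hypothetical source database
    username="etl_user",                                # hypothetical user
    password_file="/user/etl_user/.db_password",        # hypothetical HDFS path
    table="customers",                                  # hypothetical table
    hive_database="healthcare_system",
)
print(shlex.join(command))
```

The printed line can be pasted into a shell on a machine where Sqoop is installed, or the list can be passed to subprocess.run.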

6.4


6.5 Apache Spark

After uploading the data into HDFS, we connected Spark to it. We analyze the data with
Python, obtain the desired results in tabular form, and use those results to visualize our
use cases.
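
As an illustration of the kind of aggregation involved, the sketch below computes the average monthly premium per subgroup in plain Python, not in Spark, so that it runs without a cluster. In PySpark the same result corresponds to a groupBy on the subgroup column followed by an avg aggregation. The field names subgroup and monthly_premium are assumptions.

```python
from collections import defaultdict


def average_premium_by_subgroup(records):
    """Return {subgroup: average monthly premium}, ignoring missing premiums."""
    totals = defaultdict(lambda: [0.0, 0])  # subgroup -> [sum, count]
    for record in records:
        premium = record["monthly_premium"]
        if premium is None:
            continue  # like SQL AVG, skip NULL values
        bucket = totals[record["subgroup"]]
        bucket[0] += premium
        bucket[1] += 1
    return {
        subgroup: total / count
        for subgroup, (total, count) in sorted(totals.items())
    }
```

The returned dictionary is already in tabular form (one row per subgroup) and can be passed directly to a charting library.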


CHAPTER 7
OUTPUT SCREENS
Use Case-1: User Interface Design for Health Care Insurance Analysis.

Use Case-2: Average monthly premium for each subgroup


Use Case-3: Number of people whose claims were accepted or rejected.

Use Case-4: Which disease has the maximum number of claims
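
The two claim use cases above reduce to simple counts. The sketch below shows one way to compute them in plain Python. The field names patient_id, status, and disease are assumptions, since the claims schema is not given in this report.

```python
from collections import Counter, defaultdict


def people_by_claim_status(claims):
    """Count distinct people per claim status (e.g. accepted, rejected)."""
    people = defaultdict(set)
    for claim in claims:
        people[claim["status"]].add(claim["patient_id"])
    return {status: len(ids) for status, ids in people.items()}


def disease_with_most_claims(claims):
    """Return (disease, claim_count) for the most claimed disease, or None."""
    counts = Counter(claim["disease"] for claim in claims)
    if not counts:
        return None
    return counts.most_common(1)[0]
```

Counting distinct patient IDs, not claim rows, matches the wording "number of people". If two diseases tie for the maximum, this sketch returns only one of them.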


Use Case-5: Which company/group is most profitable

Use Case-6: Average monthly premium paid by each subgroup


CHAPTER 7
CONCLUSION

We collected data from various third-party sources, processed it, and used Big Data
tools to compute and visualize the necessary use cases. Based on this analysis, the health
care insurance company can create a new business strategy to acquire more customers,
increase engagement, and send offers. The system also fetches company and customer details
and provides easy access to customer information.

Balancing affordability, inclusivity, and quality of services is crucial for crafting effective
healthcare policies that cater to diverse needs. As healthcare systems continue to evolve, a
collaborative approach involving policymakers, insurers, and healthcare providers is
necessary to address emerging challenges and enhance the overall effectiveness of health
insurance programs.


CHAPTER 8

FURTHER ENHANCEMENTS

This project has considerable scope for future work. We developed it to meet our client's
requirements, but it can be generalized. With additional resources, we could obtain more
accurate results. The project can also support various further use cases. Some future
directions are listed below:

● Real-time data can be used for real-time processing.

● The whole procedure can be automated so that data is processed as soon as it
arrives from the sources.

● Beyond the healthcare industry, the whole procedure can be generalized to other
sectors such as cars and online education.


