Data Mining Assignment 1: Group 3: Ankita (BLP008), Arnab (BLP013), Kaustav (BLP025), Pubali (BLP040)

The document discusses a life insurance company in Germany wanting to expand into new markets. It analyzed socioeconomic data from 168 countries to identify variables to group similar countries. It used dimension reduction, clustering, and factor analysis. This identified 4 clusters - countries with high health spending but low economic factors in Cluster 1, high spending and economic factors in Cluster 2 (including Germany), high inflation countries in Cluster 3, and poorly performing countries in Cluster 4. The initial expansion strategy is to target Western European countries like Austria and France in Cluster 2 due to their strong economies and health sectors.

DATA MINING ASSIGNMENT 1

Group 3: Ankita (BLP008), Arnab (BLP013), Kaustav (BLP025), Pubali (BLP040)
Business Formulation
A life insurance company, "Live Life Pvt Ltd.", based in Germany wanted to expand into other countries after doing
fairly well in its home market. The company specialised in health insurance policies with a wide range of benefits
for different customer segments. It already had a good reputation in Germany, and over the last few years it had
seen steady revenue growth of 8%. The company wanted to invest its earnings in expanding to other countries.

Objective
The company wanted to expand its business into new countries. To do so, it collected socio-economic data on
168 countries from different organisations under the UNO and shortlisted nine variables (deciding factors) on the
basis of which it would be able to design its insurance plans. The company then wanted to group the countries
that were similar in terms of these social and economic parameters, identify the countries that were similar to
Germany, and from them determine a list of target markets.

Methodology
• First, the reliability of the data was checked using Cronbach's alpha.
• Then the number of predictors was reduced using dimension reduction methods (PCA and EFA).
• Finally, the records (countries) were grouped using the k-means clustering algorithm.

Variables identified
Column Name   Description
country       Name of the country
child_mort    Deaths of children under 5 years of age per 1000 live births
exports       Exports of goods and services per capita, given as a percentage of the GDP per capita
health        Total health spending per capita, given as a percentage of the GDP per capita
imports       Imports of goods and services per capita, given as a percentage of the GDP per capita
income        Net income per person
inflation     The measurement of the annual growth rate of the total GDP
life_expec    The average number of years a newborn child would live if the current mortality patterns remain the same
total_fer     The number of children that would be born to each woman if the current age-fertility rates remain the same
gdpp          The GDP per capita, calculated as the total GDP divided by the total population

Procedure and Analysis

R output

cd3<-read.csv("D:/Misc_study_files/Analytics/Data Mining and Machine Learning/Country-data.csv")
cd2<-cd3[,-c(1,4,7)]
We dropped the categorical variable country (column 1), as well as health (column 4) and inflation (column 7),
because the uniqueness of these two variables in the factor analysis was high (90% and 75% respectively), meaning
that the common factors explain only a small part of their variance.
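As an illustration of how uniqueness can flag a variable for removal, here is a minimal sketch on synthetic stand-in data (not the country data). The variable names, the simulated structure and the 0.7 cut-off are all assumptions made for the example; the source does not state the threshold it used.

```r
# Synthetic stand-in data, NOT the country data: six variables driven by
# two hidden factors, plus one variable that is pure noise.
set.seed(11)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  a1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  a2 = 0.8 * f1 + rnorm(n, sd = 0.6),
  a3 = 0.8 * f1 + rnorm(n, sd = 0.6),
  b1 = 0.8 * f2 + rnorm(n, sd = 0.6),
  b2 = 0.8 * f2 + rnorm(n, sd = 0.6),
  b3 = 0.8 * f2 + rnorm(n, sd = 0.6),
  noise = rnorm(n)
)

fa_toy <- factanal(toy, factors = 2)

# Uniqueness = share of a variable's variance NOT explained by the factors.
uniq <- fa_toy$uniquenesses
print(round(uniq, 2))

# Hypothetical cut-off of 0.7: variables above it are candidates for removal.
high_uniq <- names(uniq)[uniq > 0.7]
print(high_uniq)
```

Variables flagged this way can still be kept outside the factor model, which is what the analysis here does with health and inflation when it adds them back before clustering.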
alpha(cd2)

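The alpha value is 0.55. This is below the commonly cited threshold of 0.7, so the reliability of the data is modest at best; we proceed with the analysis with that caveat.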
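For reference, Cronbach's alpha can be computed directly from its textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the row totals). The sketch below uses base R and synthetic stand-in data; the helper name cronbach_alpha and the toy data are assumptions for illustration and are not part of the original analysis, which calls alpha() on cd2.

```r
# Cronbach's alpha from its textbook formula, base R only.
# cronbach_alpha is a hypothetical helper, not a library function.
cronbach_alpha <- function(x) {
  x <- as.matrix(x)
  k <- ncol(x)                      # number of items (columns)
  item_vars <- apply(x, 2, var)     # variance of each item
  total_var <- var(rowSums(x))      # variance of the per-row totals
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Synthetic stand-in data (NOT the country data): three noisy copies of one signal.
set.seed(1)
signal <- rnorm(200)
toy <- data.frame(
  a = signal + rnorm(200, sd = 0.5),
  b = signal + rnorm(200, sd = 0.5),
  c = signal + rnorm(200, sd = 0.5)
)
alpha_toy <- cronbach_alpha(toy)
print(alpha_toy)

# Sanity check: identical items are perfectly consistent, so alpha is 1.
alpha_identical <- cronbach_alpha(cbind(1:10, 1:10, 1:10))
print(alpha_identical)
```

Because the formula works on raw variances, items measured on very different scales are dominated by the largest-scale item, which is worth keeping in mind when reading an alpha computed on unscaled data.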
KMOS(cd2)

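The KMO criterion is about 0.685, which is above the commonly used minimum of 0.5. KMO measures sampling adequacy rather than a share of variability, so this value indicates that the data are adequately suited to factor analysis.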
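To show what the KMO criterion measures, the following sketch computes the overall KMO value from the correlation matrix and the partial correlations, using base R only. The helper name kmo_overall is hypothetical, and the built-in mtcars columns are stand-in data chosen for the example, not the country data.

```r
# Overall Kaiser-Meyer-Olkin measure, base R only.
# kmo_overall is a hypothetical helper written for this illustration.
kmo_overall <- function(x) {
  r <- cor(x)                       # correlation matrix
  r_inv <- solve(r)                 # its inverse
  d <- sqrt(diag(r_inv))
  partial <- -r_inv / outer(d, d)   # partial correlations between pairs
  off <- row(r) != col(r)           # off-diagonal positions only
  sum(r[off]^2) / (sum(r[off]^2) + sum(partial[off]^2))
}

# Stand-in data: five numeric columns of the built-in mtcars data set.
demo <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
kmo_demo <- kmo_overall(demo)
print(kmo_demo)
```

The value lies between 0 and 1 and approaches 1 when the partial correlations are small relative to the ordinary correlations, that is, when the variables share common factors.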
bart_spher(cd2)

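The p-value here is less than 0.05, i.e. the 5% level of significance, so we reject H0 (that the correlation matrix is an identity matrix). The variables are therefore significantly correlated with one another, which makes the data suitable for dimension reduction.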
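Bartlett's test of sphericity can be reproduced from the determinant of the correlation matrix: the statistic is -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom. A minimal base R sketch follows; the helper name bartlett_sphericity and the mtcars stand-in data are assumptions for illustration, not the original code or data.

```r
# Bartlett's test of sphericity, base R only.
# bartlett_sphericity is a hypothetical helper written for this illustration.
bartlett_sphericity <- function(x) {
  n <- nrow(x)                      # number of observations
  p <- ncol(x)                      # number of variables
  r <- cor(x)
  chi_sq <- -(n - 1 - (2 * p + 5) / 6) * log(det(r))
  df <- p * (p - 1) / 2
  list(statistic = chi_sq,
       df = df,
       p.value = pchisq(chi_sq, df, lower.tail = FALSE))
}

# Stand-in data: five numeric columns of the built-in mtcars data set.
demo <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
bart_demo <- bartlett_sphericity(demo)
print(bart_demo)
```

A small p-value means the correlation matrix differs from the identity matrix, so there is shared variance for PCA or factor analysis to summarise.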
pc<-princomp(cd2, cor = TRUE)
summary(pc)

Here the standard deviations of components 1 and 2 are 1.98 and 1.22. The eigenvalues are the squares of these
values, and both are greater than 1, so according to the rule of thumb (eigenvalue greater than 1) these two
components should be retained. From the proportion of variance we can also see that components 1 and 2 explain
56.4% and 21.5% of the variability in the data, 77.9% in total.
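The relationship between the standard deviations printed by summary(), the eigenvalues and the proportion of variance can be checked directly. With the seven variables in cd2, 1.98^2 / 7 is roughly 0.56, which agrees with the reported 56.4% up to rounding. The sketch below shows the same calculation on stand-in data (built-in mtcars columns, not the country data); the object names are assumptions for the example.

```r
# Stand-in data: five numeric columns of the built-in mtcars data set.
demo <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
pc_demo <- princomp(demo, cor = TRUE)

eigenvalues <- pc_demo$sdev^2                    # eigenvalue = (standard deviation)^2
prop_var <- eigenvalues / sum(eigenvalues)       # share of variance per component
retained <- names(eigenvalues)[eigenvalues > 1]  # eigenvalue-greater-than-1 rule

print(round(cbind(sdev = pc_demo$sdev,
                  eigenvalue = eigenvalues,
                  prop_var = prop_var), 3))
print(retained)
```

With cor = TRUE the eigenvalues sum to the number of variables, which is why an eigenvalue above 1 means a component explains more than a single original variable would.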
loadings(pc)

fa<-factanal(cd2, factors = 2, rotation = "varimax", scores = "regression")
fa

Here we map the predictors to the factors.

Factor 1 – Child mortality, Life expectancy, Total fertility (Health Factor)
Factor 2 – Income, Exports, Imports, GDPP (Economic Factor)
The other two variables, health and inflation, are kept as separate variables.
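One simple way to do this kind of mapping programmatically is to assign each variable to the factor on which it has the largest absolute loading. The sketch below does this on synthetic stand-in data with a known two-factor structure; the variable names and the simulated data are assumptions, and the rule is a common convention rather than necessarily the procedure used for the mapping above.

```r
# Synthetic stand-in data, NOT the country data: h1-h3 share one hidden
# factor and e1-e3 share another.
set.seed(7)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  h1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  h2 = 0.8 * f1 + rnorm(n, sd = 0.6),
  h3 = 0.8 * f1 + rnorm(n, sd = 0.6),
  e1 = 0.8 * f2 + rnorm(n, sd = 0.6),
  e2 = 0.8 * f2 + rnorm(n, sd = 0.6),
  e3 = 0.8 * f2 + rnorm(n, sd = 0.6)
)

fa_toy <- factanal(toy, factors = 2, rotation = "varimax")
load_mat <- unclass(loadings(fa_toy))            # plain numeric matrix of loadings

# Assign each variable to the factor with the largest absolute loading.
assigned <- apply(abs(load_mat), 1, which.max)
print(round(load_mat, 2))
print(assigned)
```

The sign of a loading still matters for interpretation: a variable can belong to a factor while moving in the opposite direction to the other variables on it.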
fa1<-fa$scores
fa1

i<-data.frame(cd3$country, cd3$health, cd3$inflation, fa1)


head(i)

Here we create a data frame consisting of the country name, health, inflation and the two factor scores.
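# drop the country name; k-means needs numeric columns only
i1<-i[,-c(1)]
# standardise the columns so that no variable dominates the distances
scaled<-scale(i1)
# fix the random seed so that the cluster assignment is reproducible
set.seed(24)
result<-kmeans(scaled, 4)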
result$centers

result$size

table(cd3$country, result$cluster)
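The source does not say how k = 4 was chosen. One common check, shown here only as a hedged sketch, is to compare the total within-cluster sum of squares for several values of k and look for the point where the improvement levels off. The example uses the built-in USArrests data as a stand-in (not the country data), and the range 1 to 8 and nstart = 10 are assumptions.

```r
# Stand-in data: built-in USArrests, standardised as in the analysis above.
scaled_demo <- scale(USArrests)

# For k = 1 the within-cluster sum of squares is the total sum of squares
# (the columns are already centred by scale()).
total_ss <- sum(scaled_demo^2)

set.seed(24)
wss <- c(total_ss,
         sapply(2:8, function(k) {
           kmeans(scaled_demo, centers = k, nstart = 10)$tot.withinss
         }))
names(wss) <- 1:8
print(round(wss, 1))
```

Plotting wss against k, for example with plot(1:8, wss, type = "b"), makes the levelling-off point easier to see.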

Cluster Analysis

From the data we observed:


• Cluster 1 groups the countries that have a high value of the Health Factor (Factor 1) and low values of the
Economic Factor (Factor 2) and of spending on health infrastructure.
• Cluster 2 groups the countries that spend heavily on health infrastructure and have a high value of the
Economic Factor.
• Cluster 3 groups the countries with high inflation.
• Cluster 4 groups the countries that perform poorly on all the factors.
• The numbers of countries in clusters 1 to 4 are 30, 28, 27 and 82 respectively.
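Cluster descriptions like the ones above are usually read off the cluster centres. A minimal sketch of building such a profile on the original (unscaled) units is given below; it uses the built-in USArrests data as a stand-in, not the country data, and the object names are assumptions for the example.

```r
# Stand-in data: built-in USArrests.
scaled_demo <- scale(USArrests)
set.seed(24)
km <- kmeans(scaled_demo, centers = 4, nstart = 10)

# Mean of every variable per cluster, in the original units,
# together with the number of members in each cluster.
profile <- aggregate(USArrests, by = list(cluster = km$cluster), FUN = mean)
profile$size <- as.vector(table(km$cluster))
print(profile)
```

Reading the profile in original units is often easier than reading result$centers, which is expressed in standard deviations from the overall mean.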

Strategies
• The analysis shows that Germany is in Cluster 2.
• There are 28 countries in Cluster 2, spread over all continents.
• Our initial strategy will be to expand into the Western European countries in Cluster 2, such as Austria, Belgium,
Denmark, Finland and France, for ease of operations.
• The policies to be introduced should focus on providing premium health-sector facilities to the beneficiaries
and should be priced higher in order to earn more profit. Since the economies of these countries are strong,
customers there can afford the higher premiums.
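The list of candidate markets can be pulled out of a k-means result by selecting every row that shares the reference row's cluster. The sketch below shows the idea on the built-in USArrests data, with "Texas" as a hypothetical stand-in for the home market; with the country data the same lookup would be applied to the cluster that contains Germany.

```r
# Stand-in data: built-in USArrests (rows are named by state).
scaled_demo <- scale(USArrests)
set.seed(24)
km <- kmeans(scaled_demo, centers = 4, nstart = 10)

reference <- "Texas"                      # hypothetical stand-in for the home market
ref_cluster <- km$cluster[reference]      # cluster label of the reference row

# All other rows that fall in the same cluster as the reference.
peers <- setdiff(names(km$cluster)[km$cluster == ref_cluster], reference)
print(peers)
```

The lookup relies on the rows being named; for a data frame such as i1, that would mean setting the row names to the country names before scaling.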
