Applying Data Mining Techniques in Predicting Index and Non-Index Crimes
International Journal of Machine Learning and Computing, Vol. 9, No. 4, August 2019
... cluster. These processes are repeated until the latest centroids no longer change [22].
The clustering experiment was implemented using the KNIME (Konstanz Information Miner) [23] analytics platform. Fig. 2 shows the node structure of the K-means clustering executed in KNIME. The K-Means node is connected to, and positioned after, the node that imports the CSV file of the dataset. The Color Manager node comes next, as it visually distinguishes the results to be generated later. The Scatter Plot node displays a scatter plot of the clusters, while the Interactive Table node is used to view the result in tabular form.
Fig. 3 shows that among the city and all municipalities in Surigao del Norte, Surigao City has the highest crime rate, with a total of 7,267 recorded index and non-index crimes from 2013 to 2017.
In Fig. 4, violation of special laws is the top recorded crime in Surigao City. It is followed by other non-index crimes, theft, physical injury, and robbery. The least recorded crimes in the city are cattle rustling, homicide, and reckless imprudence resulting in homicide.
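The stopping rule mentioned earlier in this section (repeat the assignment and update steps until the centroids stop changing) can be illustrated with a minimal Python sketch. This is not the KNIME workflow used in the paper; the function names, the sample points, and the starting centroids are all hypothetical and chosen only to show the iteration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(dim) / len(cluster) for dim in zip(*cluster))

def kmeans(points, centroids, max_iter=100):
    """Lloyd-style K-means starting from caller-supplied centroids."""
    centroids = [tuple(c) for c in centroids]
    k = len(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [mean_point(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:  # centroids unchanged: converged
            break
        centroids = updated
    return centroids, clusters

# Hypothetical (index, non-index) crime counts for nine localities.
points = [(2, 3), (3, 2), (2, 2), (40, 45), (42, 44), (41, 47),
          (90, 95), (92, 98), (91, 96)]
init = [(2, 3), (40, 45), (90, 95)]  # assumed starting centroids
centroids, clusters = kmeans(points, init)
print(centroids)
```

The result depends on the starting centroids, which is why they are passed in explicitly here; a poor start can leave K-means in a local optimum.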
... index crimes, while the ARIMA(1,0,7) model was used in this paper to determine the occurrence of index crimes for the next five years. Fig. 7 to Fig. 14 show the graphs of the predicted index crimes from 2018 to 2020 with 80% and 95% prediction intervals.
An autoregressive integrated moving average (ARIMA) model predicts time series values based on prior values (AR terms) as well as the errors made by previous predictions (MA terms). This allows the model to adjust itself to sudden changes in the time series. The ARIMA forecasting equation for a stationary time series is therefore a linear regression equation in which the predictors are lags of the dependent variable and/or lags of the prediction errors. This model is explained in more detail in [18], [24]. In this paper, the method was implemented in RStudio using the R language.
Fig. 7 shows a decrease in recorded murder cases in the province of Surigao del Norte from 2016 to 2017. The highest predicted number of murder cases is 57, in the years 2019 and 2022. The total predicted number of murder cases for 2018-2022 is 279, which is 4% lower than the 290 actual murder cases of the past five years.
... from the actual homicide data of the past five years. Fig. 8 shows a predicted increase in homicide from 2017 to 2018 but a decreasing trend from 2020 to 2022.
The total forecasted number of rape cases from 2018-2022 in the province of Surigao del Norte is 306, which is 5% higher than the actual rape data for 2013-2017. A considerable increase in rape cases is forecasted from 2017 to 2018, followed by a decrease in the years after. A slightly increasing pattern from 2020 to 2022 is also evident in Fig. 10.
Fig. 10. Forecasted rape from 2018-2022.
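For reference, the ARIMA(1,0,7) specification named in this section can be written out explicitly. With one autoregressive lag, no differencing, and seven moving-average lags, the forecasting equation takes the form below. The symbols are assumptions introduced here for illustration and do not appear in the paper: y_t is the crime count at time t, c is a constant, \phi_1 is the AR coefficient, \theta_j are the MA coefficients, and \varepsilon_t is the prediction error at time t.

```latex
y_t = c + \phi_1 \, y_{t-1} + \varepsilon_t + \sum_{j=1}^{7} \theta_j \, \varepsilon_{t-j}
```

The first two terms are the "lags of the dependent variable" part of the regression, and the sum is the "lags of the prediction errors" part.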
Fig. 9 shows a rapid increase in physical injury cases from 2017 to 2018, with a crime count of 608, which is also the highest. A decrease is forecasted from 2018 to 2019, followed by an increase a year after. The total forecasted number of physical injury cases from 2018 to 2022 in the province of Surigao del Norte is 2,508, which is 26% higher than in the past five years.
Fig. 11 shows a decreasing trend in robbery from 2013 to 2017. Meanwhile, the forecasted robbery data show a rapid increase from 62 actual cases in 2017 to 146 forecasted cases in 2018. An increasing pattern is shown from 2017 to 2020 and a decreasing pattern from 2020 to 2022. The total forecasted number of robbery cases from 2018-2022 in the province of Surigao del Norte is 1,203, which is 7% higher than in the past.
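The forecasts in this section are reported as point values with 80% and 95% intervals. As a rough illustration of how such bands arise, the sketch below fits a plain AR(1) model by least squares and widens the interval with the forecast horizon. It is a deliberate simplification, not the ARIMA(1,0,7) model or the R code used in the paper: the MA terms are dropped, parameter uncertainty is ignored, errors are assumed normal, and the series and function names are hypothetical.

```python
from statistics import NormalDist, mean

def fit_ar1(y):
    """Least-squares fit of y[t] = c + phi * y[t-1] + e[t]."""
    prev, curr = y[:-1], y[1:]
    m_prev, m_curr = mean(prev), mean(curr)
    phi = (sum((a - m_prev) * (b - m_curr) for a, b in zip(prev, curr))
           / sum((a - m_prev) ** 2 for a in prev))
    c = m_curr - phi * m_prev
    resid = [b - (c + phi * a) for a, b in zip(prev, curr)]
    sigma2 = sum(r * r for r in resid) / (len(resid) - 2)  # residual variance
    return c, phi, sigma2

def forecast_ar1(y, steps, levels=(0.80, 0.95)):
    """Point forecasts plus normal-theory prediction intervals."""
    c, phi, sigma2 = fit_ar1(y)
    results, point, var = [], y[-1], 0.0
    for h in range(1, steps + 1):
        point = c + phi * point                # iterate the AR(1) recursion
        var += sigma2 * phi ** (2 * (h - 1))   # h-step forecast error variance
        bands = {}
        for level in levels:
            z = NormalDist().inv_cdf(0.5 + level / 2)
            half_width = z * var ** 0.5
            bands[level] = (point - half_width, point + half_width)
        results.append((point, bands))
    return results

# Hypothetical crime counts per period (not data from the paper).
series = [52, 48, 55, 60, 57, 63, 59, 66, 61, 58, 64, 62]
forecast = forecast_ar1(series, steps=5)
for step, (point, bands) in enumerate(forecast, start=1):
    print(step, round(point, 1), [round(v, 1) for v in bands[0.95]])
```

The 95% band always contains the 80% band, and both widen (or at least do not shrink) as the horizon grows, which matches the fan shape typical of forecast plots.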
... crime data of carnapping from 2018-2022 in the province of Surigao del Norte is 492, which is 24% higher than in the past five years.
The total predicted data for reckless imprudence resulting in physical injury from 2018-2022 in the province of Surigao del Norte is 1,482, which is 24% higher than the 1,123 actual data of the past five years. Fig. 16 shows a predicted increase from 2017 to 2020 but a decreasing trend from 2020 to 2022.
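The percentage comparisons in this section depend on which total is used as the base. The short sketch below works through the two totals quoted above (1,123 actual and 1,482 forecasted) under both conventions. The function name and its `base` parameter are hypothetical. The paper does not state which base it used, so this is only a reading aid.

```python
def pct_change(actual, forecast, base="actual"):
    """Percentage difference between a forecast total and an actual total.

    base="actual" divides by the historical total (the usual convention);
    base="forecast" divides by the forecast total instead.
    """
    denom = actual if base == "actual" else forecast
    return 100.0 * (forecast - actual) / denom

actual_total, forecast_total = 1123, 1482  # totals quoted in the text
vs_actual = pct_change(actual_total, forecast_total)
vs_forecast = pct_change(actual_total, forecast_total, base="forecast")
print(round(vs_actual, 1), round(vs_forecast, 1))
```

With these inputs, dividing the difference of 359 by the forecast total gives about 24%, which matches the reported figure, while dividing by the actual total gives about 32%.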
... the forecasted crime data show a rapid increase from 256 actual cases in 2017 to 344 forecasted cases in 2018. An increasing pattern is shown from 2017 to 2020 and a decreasing pattern from 2020 to 2022. The total forecasted number of other non-index crimes from 2018-2022 in the province of Surigao del Norte is 2,530, which is 23% higher than the 1,950 actual cases from 2014-2017.
Fig. 19. Forecasted other non-index crimes from 2018-2022.
IV. CONCLUSION
With the use of the K-Means clustering algorithm, it became possible to determine groupings of municipalities with similar traits and values. In cluster 1, Surigao City ranked first in Surigao del Norte, with the highest number of reported index and non-index crimes. In cluster 2, the municipalities of Placer, Claver, and Dapa have the highest crime rates. Meanwhile, in cluster 3, Del Carmen, Sta. Monica, and Pilar were identified.
Among the index crimes in the province of Surigao del Norte, theft had the highest number of recorded cases, with a total of 2,565 from 2013-2017 and the highest occurrence in 2014. Furthermore, the highest predicted index crime for 2018-2022 is physical injury, with a predicted value of 2,508, a 26% increase from 2014-2017. Moreover, the least reported crime in the province is cattle rustling.
For the non-index crimes, violation of special laws was identified as the most reported incident in the province, with the highest occurrence in 2014. Moreover, violation of special laws has the highest predicted value of 3,959, a 25% increase from the 2014-2017 data, with the highest occurrence in 2020.

[4] K. Rajalakshmi, S. S. Dhenakaran, and N. Roobini, “Comparative analysis of K-means algorithm in disease prediction,” Int. J. Sci. Eng. Technol. Res., vol. 4, no. 7, pp. 2697–2699, 2015.
[5] J. Agarwal, “Crime analysis using K-means clustering,” Int. J. Comput. Appl., vol. 83, no. 4, pp. 975–8887, 2013.
[6] O. Vaidya, S. Mitra, R. Kumbhar, S. Chavan, and R. Patil, Comprehensive Comparative Analysis of Methods For Crime, pp. 715–718, 2018.
[7] D. Kaur and K. Jyoti, Enhancement in the Performance of K-means Algorithm, vol. 2, no. 1, pp. 29–32, 2013.
[8] A. Bansal, M. Sharma, and S. Goel, “Improved K-mean clustering algorithm for prediction analysis using classification technique in data mining,” Int. J. Comput. Appl., vol. 157, no. 6, pp. 975–8887, 2017.
[9] E. Şuşnea, “Using data mining techniques in higher education,” High. Educ., vol. 1, no. 1, pp. 68–72, 1996.
[10] R. Kitchin, “Big Data, new epistemologies and paradigm shifts,” Big Data Soc., vol. 1, no. 1, p. 205395171452848, 2014.
[11] J. Chan and L. B. Moses, “Is big data challenging criminology?” Theor. Criminol., vol. 20, no. 1, pp. 21–39, 2016.
[12] C. Yu, M. W. Ward, M. Morabito, and W. Ding, Crime Forecasting Using Data Mining Techniques, 2011.
[13] P. Gupta, A. S. Sabitha, and T. Choudhury, Terrorist Attacks Analysis Using Clustering Algorithm, © Springer Nat. Singapore Pte Ltd., pp. 317–328, 2018.
[14] J. Azeez and D. J. Aravindhar, “Hybrid approach to crime prediction using deep learning,” in Proc. 2015 Int. Conf. Adv. Comput. Commun. Informatics, 2015, pp. 1701–1710.
[15] A. Malik, R. Maciejewski, S. Towers, S. Mccullough, and D. S. Ebert, “Proactive spatiotemporal resource allocation and predictive visual analytics for community policing and law enforcement,” IEEE Trans. Vis. Comput. Graph., vol. 20, no. 1, pp. 1863–1872, 2014.
[16] W. Gorr, A. Olligschlaeger, and Y. Thompson, “Short-term forecasting of crime,” Int. J. Forecast., vol. 19, no. 4, pp. 579–594, 2003.
[17] P. Chen, H. Yuan, and X. Shu, “Forecasting crime using the ARIMA model,” in Proc. 5th Int. Conf. Fuzzy Syst. Knowl. Discov. FSKD, 2008, vol. 5, pp. 627–630.
[18] E. Cesario, C. Catlett, and D. Talia, “Forecasting crimes using autoregressive models,” in Proc. 2016 IEEE 14th Int. Conf. Dependable, Auton. Secur. Comput., pp. 795–802, 2016.
[19] M. Y. Orong, A. M. Sison, and A. A. Hernandez, “Mitigating vulnerabilities through forecasting and crime trend analysis,” in Proc. 2018 5th Int. Conf. Bus. Ind. Res., 2018, pp. 57–62.
[20] Y. Kang, R. J. Hyndman, and K. Smith-Miles, “Visualising forecasting algorithm performance using time series instance spaces,” Int. J. Forecast., vol. 33, no. 2, pp. 345–358, 2017.
[21] A. Ben Ayed, M. Ben Halima, and A. M. Alimi, “Survey on clustering methods: Towards fuzzy clustering for big data,” in Proc. 6th Int. Conf. Soft Comput. Pattern Recognition, 2015, pp. 331–336.
[22] A. Thammano and A. K. Algorithm, Enhancing K-means Algorithm for Solving Classification Problems, pp. 1652–1656, 2013.
[23] L. Feltrin, “KNIME an open source solution for predictive analytics in the geosciences [software and data sets],” IEEE Geosci. Remote Sens. Mag., vol. 3, no. 4, 2015.
[24] A. Rege et al., “Predicting adversarial cyber intrusion stages using autoregressive neural networks,” IEEE Intell. Syst., 2018.