Cyber Threat Detection From Twitter
Abstract: To be prepared against cyberattacks, most organizations resort to security information and event management systems to monitor their infrastructures. These systems depend

as a natural aggregator of multiple sources [5]. This social media platform offers a large and diverse pool of users, high
arXiv:1904.01127v1 [cs.LG] 1 Apr 2019
Model               A1     A2     A3     B1     B2     B3     C1     C2     C3
Stanford CRF        1.000  0.917  0.852  0.999  0.879  0.810  0.999  0.899  0.906
BiLSTM + Random     0.999  0.932  0.906  0.999  0.919  0.882  0.987  0.925  0.928
BiLSTM + GloVe      0.999  0.926  0.894  0.997  0.932  0.888  0.998  0.924  0.932
BiLSTM + Word2Vec   1.000  0.893  0.864  0.998  0.927  0.890  0.984  0.928  0.934
TABLE VI
VULNERABILITIES PUBLISHED ON TWITTER PRIOR TO BEING DISCLOSED IN NVD

Case | NVD date | Tweet date | Tweet | NER labels | CVSS
A3 | 08/06/2017 | 27/03/2017 | VMware Player 12.x < 12.5.4 Drag-and-Drop Feature Guest-to-Host Code Execution (VMSA-2017-0005) (Linux) https://t.co/xMIP5JlvOZ | VER, VUL, PRO, ID, PRO | 9.9
A3 | 27/03/2017 | 24/03/2017 | Vuln: Broadcom BCM4339 SoC CVE-2017-6957 Stack-Based Buffer Overflow Vulnerability https://t.co/vR6EznOsBi | PRO, ID, ID, VUL | 8.1
B2 | 20/03/2017 | 29/01/2017 | Vuln: Apache Tomcat CVE-2016-6816 Security Bypass Vulnerability https://t.co/PfOdfDGIfy | PRO, ID, VUL | 7.1
B3 | 27/07/2018 | 01/03/2017 | Vuln: Red Hat CloudForms Management Engine CVE-2017-2632 Privilege Escalation Vulnerability https://t.co/Vm0fMMM1Rc | PRO, ID, VUL | 4.9
B3 | 20/03/2017 | 16/03/2017 | Vuln: Apache Tomcat CVE-2016-6816 Security Bypass Vulnerability https://t.co/FK5nXKcfy8 #bugtraq | PRO, ID, VUL | 7.1
C2 | 16/03/2017 | 06/02/2017 | #Vuln: #Microsoft #Windows CVE-2017-0016 Memory Corruption Vulnerability https://t.co/ZR3DVVgx3j #bugtraq | ORG, PRO, ID, VUL | 5.9
C2 | 15/02/2017 | 14/02/2017 | ZDI-17-109 : Adobe Flash Player MessageChannel Type Confusion Remote Code Execution Vulnerability https://t.co/hTaiCS671W | ID, PRO, VUL | 8.8
C3 | 27/07/2018 | 01/03/2017 | Vuln: Red Hat CloudForms Management Engine CVE-2017-2632 Privilege Escalation Vulnerability https://t.co/Vm0fMMM1Rc | PRO, ID, VUL | 4.9
C3 | 11/06/2018 | 08/03/2017 | #Vuln: #Mozilla #Firefox MFSA 2017-05 Multiple Security Vulnerabilities https://t.co/POFeaWjREj #bugtraq | ORG, PRO, ID, VUL | 7.5
Thus, the choice of the best model may depend on whether one favors the testing-set F1-score over the validation-set F1-score, or considers an average of both preferable for making the decision.

Regarding the dimension of the character embedding vector and the character-level hidden state of the BiLSTM networks, the best performing models favored (8 out of 9) the highest available value of 100. For the word-level hidden state vector dimension, most models (8 out of 9) used the smallest available value of 100. Finally, regarding the dropout layers, no clear trend could be observed in the results.

VI. ANALYSIS OF INDICATORS OF COMPROMISE

Regarding the applicability of Twitter for cyberthreat awareness, we analyzed the tweets labeled relevant by the classifier in the validation and testing sets. By using the ID label, we analyzed the corresponding NVD vulnerability entry to verify the existence of tweets mentioning these vulnerabilities prior to the disclosure date, and to find their severity according to the Common Vulnerability Scoring System (CVSS) [31].

A sample of such tweets is displayed in Table VI. Each entry shows the tweet date and the NVD disclosure date, the tweet with the labels identified by the NER model, and the CVSS score. The number of days from tweet publication to NVD disclosure ranged from 1 to 148, clearly showing the timeliness with which the deep neural network-based OSINT processing pipeline can provide vulnerability information to organizations' SOCs. The tweets' CVSS scores range from a medium 4.9 to a critical 9.9, thus showing the relevance of the information found.

The BiLSTM NER model recognized the most important aspects of these tweets, such as the infrastructure asset, the vulnerability, and useful identifiers such as the Common Vulnerabilities and Exposures (CVE) identifier. These identified entities could then have been used to issue a security warning or to fill an IoC in a threat sharing platform.

Although our current datasets are not large, the results obtained and the relevance and timeliness of the information justify the possibility of using Twitter as an OSINT source for cyberthreat discovery. Furthermore, even though we did not identify any case in the datasets used where a tweet references a zero-day exploit without mentioning a CVE or similar identifier, such a case occurred in late August of 2018. A Twitter user made public a zero-day vulnerability in Microsoft Windows' task scheduler, providing a proof-of-concept exploit.² We sent the original tweet through our pipeline, and the classifier and NER models correctly labeled the tweet as relevant and identified the asset in question to be Microsoft Windows. The exploit was officially made public only on the 9th of September, even though the original tweet appeared on the 24th of August.³

Thus, by combining the timeliness of the Twitter information stream with the ability of deep neural architectures to accurately detect relevant tweets and identify useful pieces of information therein, OSINT-based threat intelligence platforms can improve significantly on the current state of the art and provide targeted, timely, and relevant threat intelligence.

² https://www.zdnet.com/article/windows-zero-day-vulnerability-disclosed-through-twitter/
³ https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/CVE-2018-8440
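
The lead-time and severity analysis of Section VI can be illustrated with a few lines of Python applied to rows of Table VI. This is a minimal sketch and not the authors' tooling: all function names are hypothetical, a regular expression stands in for the ID label produced by the NER model (and only covers CVE-style identifiers), and the severity bands assume the CVSS v3.x qualitative rating scale, which is the version that defines a Critical band.

```python
import re
from datetime import date, datetime
from typing import List

# Assumption: a regex stands in for the NER model's ID label.
# It only matches CVE-style identifiers (CVE-YYYY-NNNN...).
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


def parse_date(text: str) -> date:
    """Parse a DD/MM/YYYY date as printed in Table VI."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def lead_time_days(tweet_date: str, nvd_date: str) -> int:
    """Days from the tweet to the NVD disclosure (positive if the tweet came first)."""
    return (parse_date(nvd_date) - parse_date(tweet_date)).days


def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def extract_cve_ids(tweet: str) -> List[str]:
    """Return every CVE identifier mentioned in the tweet text."""
    return CVE_PATTERN.findall(tweet)


# Three rows taken from Table VI: (NVD date, tweet date, tweet text, CVSS score).
SAMPLE_ROWS = [
    ("08/06/2017", "27/03/2017",
     "VMware Player 12.x < 12.5.4 Drag-and-Drop Feature Guest-to-Host "
     "Code Execution (VMSA-2017-0005) (Linux)", 9.9),
    ("27/03/2017", "24/03/2017",
     "Vuln: Broadcom BCM4339 SoC CVE-2017-6957 Stack-Based Buffer "
     "Overflow Vulnerability", 8.1),
    ("15/02/2017", "14/02/2017",
     "ZDI-17-109 : Adobe Flash Player MessageChannel Type Confusion "
     "Remote Code Execution Vulnerability", 8.8),
]

if __name__ == "__main__":
    for nvd, tweeted, text, score in SAMPLE_ROWS:
        print(lead_time_days(tweeted, nvd),
              cvss_v3_severity(score),
              extract_cve_ids(text))
```

Tweets that carry only a vendor or ZDI identifier yield an empty list from `extract_cve_ids`, which suggests why a learned ID label is more general than a fixed pattern.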
VII. CONCLUSIONS

This paper proposes deep neural network architectures to implement the core tasks of a processing pipeline to obtain timely, relevant and targeted security-related information from Twitter. The proposed system is capable of gathering tweets from a set of accounts, filtering them based on a set of keywords defining an infrastructure to monitor, selecting the tweets containing relevant information, and identifying useful pieces of information in these tweets. For that, we implemented convolutional and bidirectional long short-term memory neural networks. We compared the performance of the proposed approach to well-established methodologies, verifying that the deep neural network architectures outperform those methodologies. Three case studies specified by one nation-wide and two world-wide private organizations were used to validate the approach. Across the three case studies, the convolutional neural network binary classifier achieved an average TPR and TNR of 92%, while the named entity recognition BiLSTM model achieved an average F1-score of 92% in detecting specified labels.

Future research will focus on exploring multi-task learning architectures to shape our pipeline into a fully end-to-end neural network and to evaluate its impact on the models' performance and on the requirements for pipeline adaptation over time.

ACKNOWLEDGMENT

This work was partially supported by the EC through funding of the H2020 DiSIEM project (H2020-700692), and by the LASIGE Research Unit (UID/CEC/00408/2019).

REFERENCES

[1] Center for Strategic and International Studies (CSIS) and McAfee, “Economic Impact of Cybercrime - No Slowing Down Report,” https://www.csis.org/analysis/economic-impact-cybercrime, accessed: Nov. 2018.
[2] Q. Le Sceller, E. B. Karbab, M. Debbabi, and F. Iqbal, “SONAR: Automatic Detection of Cyber Security Events over the Twitter Stream,” in Proc. of the 12th International Conference on Availability, Reliability and Security (ARES). Association for Computing Machinery, 2017.
[3] X. Liao, K. Yuan, X. Wang, Z. Li, L. Xing, and R. Beyah, “Acing the IOC Game: Toward Automatic Discovery and Analysis of Open-Source Cyber Threat Intelligence,” in Proc. of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS). Association for Computing Machinery, 2016.
[4] C. Sabottke, O. Suciu, and T. Dumitras, “Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits,” in Proc. of the 24th USENIX Security Symposium (USENIX Security 15). USENIX Association, 2015.
[5] A. Attarwala, S. Dimitrov, and A. Obeidi, “How efficient is Twitter: Predicting 2012 U.S. presidential elections using Support Vector Machine via Twitter and comparing against Iowa Electronic Markets,” in Intelligent Systems Conference, 2017.
[6] Y. Kim, “Convolutional Neural Networks for Sentence Classification,” in Proc. of the 2014 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2014.
[7] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, “Neural architectures for named entity recognition,” in Proc. of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2016.
[8] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, 2015.
[9] S. Zhou, Z. Long, L. Tan, and H. Guo, “Automatic identification of indicators of compromise using neural-based sequence labelling,” Computing Research Repository, 2018.
[10] A. Severyn and A. Moschitti, “Twitter Sentiment Analysis with Deep Convolutional Neural Networks,” in Proc. of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, 2015.
[11] P. Badjatiya, S. Gupta, M. Gupta, and V. Varma, “Deep Learning for Hate Speech Detection in Tweets,” in Proc. of the 26th International Conference on World Wide Web Companion (WWW). International World Wide Web Conferences Steering Committee, 2017.
[12] A. J. Yepes and A. MacKinlay, “NER for medical entities in Twitter using sequence to sequence neural networks,” in Proc. of the Australasian Language Technology Association Workshop 2016, 2016.
[13] C. Wagner, A. Dulaunoy, G. Wagener, and A. Iklody, “MISP: The Design and Implementation of a Collaborative Threat Intelligence Sharing Platform,” in Proc. of the 2016 ACM on Workshop on Information Sharing and Collaborative Security (WISCS). Association for Computing Machinery, 2016.
[14] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global Vectors for Word Representation,” in Proc. of the Empirical Methods in Natural Language Processing, 2014.
[15] T. Mikolov, K. Chen, G. S. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” 2013.
[16] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, “Natural Language Processing (Almost) from Scratch,” Journal of Machine Learning Research, 2011.
[17] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” Journal of Machine Learning Research, 2014.
[18] S. Wager, S. Wang, and P. S. Liang, “Dropout Training as Adaptive Regularization,” in Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2013.
[19] ENISA, “Risk Management - Glossary,” https://www.enisa.europa.eu/topics/threat-risk-management/risk-management/current-risk/risk-management-inventory/glossary, accessed: Sept. 2018.
[20] J. D. Lafferty, A. McCallum, and F. C. N. Pereira, “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data,” in Proc. of the 18th International Conference on Machine Learning (ICML). Morgan Kaufmann Publishers Inc., 2001.
[21] Information Technology Laboratory, “National Vulnerability Database,” https://nvd.nist.gov/, accessed: Jan. 2019.
[22] C. Cortes and V. Vapnik, “Support-Vector Networks,” Machine Learning, 1995.
[23] F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain,” Psychological Review, 1958.
[24] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” Tech. Rep., 1986.
[25] J. R. Finkel, T. Grenager, and C. Manning, “Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling,” in Proc. of the 43rd Annual Meeting on Association for Computational Linguistics (ACL). Association for Computational Linguistics, 2005.
[26] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: A System for Large-scale Machine Learning,” in Proc. of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI). USENIX Association, 2016.
[27] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” in Proc. of the 3rd International Conference on Learning Representations (ICLR), 2015.
[28] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen et al., “MLlib: Machine Learning in Apache Spark,” Journal of Machine Learning Research, 2016.
[29] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proc. of COMPSTAT’2010. Springer, 2010.
[30] D. C. Liu and J. Nocedal, “On the limited memory BFGS method for large scale optimization,” Mathematical Programming, 1989.
[31] Forum of Incident Response and Security Teams, “Common Vulnerability Scoring System,” https://www.first.org/cvss/, accessed: Jan. 2019.