TeamDL at SemEval-2018 Task 8: Cybersecurity Text Analysis using Convolutional Neural Network and Conditional Random Fields

Manikandan R 1,*, Krishna Madgula 2, Snehanshu Saha 1,2
1 CAMMS, Dept of CSE, PESIT-Bangalore South Campus
2 PESIT-Bangalore South Campus
[email protected]
[email protected]
[email protected]

* Work performed during weekend part-time assistantship at CAMMS.
Abstract

In this paper we present our participation in SemEval-2018 Task 8, subtasks 1 and 2. We developed a Convolutional Neural Network system for malware sentence classification (subtask 1) and a Conditional Random Fields system for malware token label prediction (subtask 2). We experimented with a couple of word embedding strategies and feature sets, and achieved competitive performance across the two subtasks. Code is made available at https://bitbucket.org/vishnumani2009/securenlp

1 Introduction

Cybersecurity risks and malware threats are becoming common and increasingly dangerous, requiring analysis of large repositories of malware-related information in real time to understand their capabilities and mount an effective defense. The sheer volume of data and its potential applications alone have increased traction among NLP researchers in recent times. In this line, SemEval-2018 Task 8 offers 4 subtasks addressing text classification and token, relation and attribute label prediction in the cybersecurity domain using MalwareTextDB (Lim et al., 2017). While subtask 1 focuses on predicting the relevance of sentences to malware, subtasks 2, 3 and 4 focus on predicting token, relation and attribute labels for the malware text from subtask 1. More details about each of the subtasks can be found in Lim et al. (2017).

Concerning subtask 1, which was inherently formulated as a text classification problem, very few works have been done to date in the cybersecurity domain (Lim et al., 2017; Zhang et al., 2016). In the general domain, however, the problem of text classification is well addressed, with extensive usage of deep learning approaches (Zhou et al., 2016; Liang and Zhang, 2016; Kim, 2014; Kalchbrenner et al., 2014; Zhang et al., 2015), support vector machines, logistic regression (Genkin et al., 2007; Jiang et al., 2016) and tree-based approaches (Bouaziz et al., 2014). On the other hand, subtask 2 was formulated as a sequence tagging problem, which has to date been addressed by CRFs (Finkel et al., 2005; R. et al., 2016, 2017), deep learning approaches (Chiu and Nichols, 2016; Ma and Hovy, 2016; Lample et al., 2016) and SVMs (Ekbal and Bandyopadhyay, 2012).

In this paper, we describe our system that addresses subtasks 1 and 2, involving malware sentence classification and malware token label prediction. We designed these systems by adapting various insights from previous works on text classification and sequence tagging. We submitted a Convolutional Neural Network (CNN) based system for subtask 1 and a Conditional Random Field (CRF) based system for subtask 2.

The rest of the paper is organized as follows. In Section 2, we discuss the dataset and preprocessing. In Section 3, we describe the algorithms and features used in the process of model development. In Section 4, we describe our results and some of our findings. Finally, in Section 5, we conclude with a summary and possible implications for future work.

2 Dataset and Preprocessing

The MalwareTextDB corpus used for this work consists of APT reports describing malware-related information taken from APTnotes (https://github.com/aptnotes/). We designed an end-to-end pipeline consisting of three modules which process input text across multiple stages. In stage 1, the input sentence is fed to a preprocessing module which pre-processes the …
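As an illustration of what such a stage-1 preprocessing module might look like, the following is a minimal sketch assuming NLTK tokenization (NLTK is cited in the references); the PATH/EXE placeholder normalization and the function name are assumptions for illustration, not the exact steps of the submitted pipeline.

# Hypothetical sketch of a stage-1 preprocessing module for APT report
# sentences; the exact cleaning steps of the system are not specified here.
import re
import nltk

nltk.download("punkt", quiet=True)  # tokenizer models used by word_tokenize

def preprocess(sentence):
    """Lowercase, normalize file-system artifacts, and tokenize a sentence."""
    sentence = sentence.lower()
    # Collapse Windows-style paths and executable names into placeholder
    # tokens (Section 4 notes PATH / EXE patterns as an error source).
    sentence = re.sub(r"[a-z]:\\\S+", " PATH ", sentence)
    sentence = re.sub(r"\S+\.exe\b", " EXE ", sentence)
    return nltk.word_tokenize(sentence)

print(preprocess(r"The dropper writes C:\temp\payload.exe to disk."))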
         CRF-Strict           CRF-Relaxed
         P     R     F        P     R     F
test17   0.51  0.26  0.34     0.45  0.36  0.40
dev18    0.18  0.25  0.21     0.38  0.22  0.29
test18   0.29  0.23  0.25     0.42  0.30  0.36

Table 5: Results of subtask 2 on Conditional Random Fields (P: precision, R: recall, F: F1) under strict and relaxed evaluation.
GloVe embeddings consistently outperformed Word2Vec embeddings. This is in line with the work of Kim (2014). We initially hypothesized that, since "the context of the malware texts are different from normal English texts", task-specific embeddings would improve the results of subtask 1. However, we observed that task-specific embeddings produced lower results compared to native embeddings. Examination of the results revealed a high number of false-negative predictions, i.e., relevant sentences predicted as non-malware texts; we believe this may be attributed to the limited dataset used for developing the task-specific embeddings, unlike the native embeddings, which were created from a very large corpus. This result also agrees with the general observation that the size of the training corpus often has a greater impact on results than its strict match with the target domain (Tourille et al., 2017).
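To make the two embedding strategies concrete, the sketch below contrasts task-specific embeddings trained with gensim's Word2Vec on a toy stand-in for the malware corpus against native pretrained GloVe vectors loaded from a text file; the file name, dimensions and hyperparameters are illustrative assumptions, not the exact settings used.

# Hedged sketch: native pretrained embeddings vs. task-specific ones.
# File path, dimensions and hyperparameters are illustrative assumptions.
import numpy as np
from gensim.models import Word2Vec

# Task-specific embeddings: trained only on the (small) malware corpus.
corpus = [["the", "dropper", "contacts", "its", "c2", "server"],
          ["the", "malware", "encrypts", "files"]]  # toy stand-in
task_model = Word2Vec(sentences=corpus, vector_size=100, window=5,
                      min_count=1, epochs=20)

# Native embeddings: load pretrained GloVe vectors from a text file
# (e.g. glove.6B.100d.txt), which were trained on a very large corpus.
def load_glove(path):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors

# glove_vectors = load_glove("glove.6B.100d.txt")  # assumed local file
print(task_model.wv["malware"][:5])  # task-specific vector for "malware"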
For subtask 1, we achieved an accuracy of 0.50 and were 7% behind the top-performing systems. We identified three different sources of errors across the sentences, in line with previous works (Lim et al., 2017), namely misclassification of i) sentences consisting of malware-related keywords without implication on actions; ii) sentences describing attacker actions; and additionally we also found iii) misclassification of sentences containing specific patterns such as the presence of PATH and EXE. Further, we had initially hoped that the multichannel architecture would prevent overfitting (Kim, 2014) and thus work better than the single-channel model, especially on small datasets like MalwareTextDB. The results, however, showed the opposite, and hence further work on regularizing the training process and a simpler single-channel architecture is warranted.
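For reference, below is a minimal Keras sketch of the Kim (2014) style multichannel idea discussed here: one frozen ("static") embedding channel and one trainable ("non-static") channel feed parallel convolutions whose outputs are merged. The vocabulary size, dimensions and filter settings are illustrative assumptions, and unlike Kim's original, each channel here has its own convolution filters for simplicity.

# Hedged sketch of a Kim (2014)-style multichannel CNN for sentence
# classification; sizes and filter settings are illustrative assumptions.
from tensorflow.keras import layers, models

VOCAB, DIM, MAXLEN = 20000, 100, 50  # assumed sizes

inp = layers.Input(shape=(MAXLEN,))
# Channel 1: static (frozen) embeddings; in practice, pretrained weights
# would be loaded here instead of the random initialization.
static = layers.Embedding(VOCAB, DIM, trainable=False)(inp)
# Channel 2: non-static embeddings, fine-tuned during training.
nonstatic = layers.Embedding(VOCAB, DIM, trainable=True)(inp)

convs = []
for channel in (static, nonstatic):
    c = layers.Conv1D(filters=100, kernel_size=3, activation="relu")(channel)
    convs.append(layers.GlobalMaxPooling1D()(c))

merged = layers.concatenate(convs)
merged = layers.Dropout(0.5)(merged)  # regularization (Srivastava et al., 2014)
out = layers.Dense(1, activation="sigmoid")(merged)  # malware vs. non-malware

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()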
For subtask 2, during analysis we found that there were multiple previously unseen malware names and felt that orthographic features alone would be insufficient. Hence, in addition to the commonly used features, we also included gazette features with words that qualify a malware entity. However, during evaluation on the development set we found a high drop in precision when we used the gazette features, owing to their deterministic nature. Hence, we submitted the CRF with only the common features described in Section 3.2.1 for the final evaluation. With this system we achieved results of 0.25 and 0.36 in strict and relaxed evaluation respectively (Table 5). Our accuracy is 3.5% (avg.) behind the top-performing system across the evaluations. We identified the following sources of errors: i) tagging of tokens in sentences containing only actions but not entities; these are sentences with only attacker actions, in line with the error from subtask 1; ii) lack of sensitivity to context, where some tokens in the test documents are given the same label seen in training irrespective of context; and iii) mis-tagging of some tokens with common suffixes. For subtask 2 we experimented with a simple CRF architecture and basic features, hence we believe further exploration of feature engineering is needed to reduce the context-related errors. As far as addressing the rest of the errors is concerned, we plan to explore a combination of rule-based and deep learning approaches.
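To make the feature setup concrete, the sketch below shows common orthographic token features together with an optional gazette lookup of the kind described above, using sklearn-crfsuite; the feature names, the toy gazette and the training call are illustrative assumptions, not the submitted system's exact configuration.

# Hedged sketch: CRF token features with an optional gazette lookup.
# Feature names and the toy gazette are illustrative assumptions.
import sklearn_crfsuite

GAZETTE = {"stuxnet", "duqu", "flame"}  # hypothetical malware gazette

def token_features(tokens, i, use_gazette=False):
    w = tokens[i]
    feats = {
        "word.lower": w.lower(),
        "word.isupper": w.isupper(),
        "word.istitle": w.istitle(),
        "word.isdigit": w.isdigit(),
        "suffix3": w[-3:],  # common-suffix errors noted above
        "prev": tokens[i - 1].lower() if i > 0 else "BOS",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "EOS",
    }
    if use_gazette:
        # Deterministic lookup; as reported above, this hurt dev precision.
        feats["in.gazette"] = w.lower() in GAZETTE
    return feats

def sent2features(tokens, use_gazette=False):
    return [token_features(tokens, i, use_gazette)
            for i in range(len(tokens))]

# Toy training example with BIO-style labels.
X = [sent2features(["Stuxnet", "infects", "PLCs"])]
y = [["B-Entity", "B-Action", "O"]]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))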
5 Conclusion

In this work, we developed CNN and CRF systems for malware text classification and token label prediction, achieving competitive results. For subtask 1, we experimented with a couple of word embedding strategies and found native GloVe embeddings to be useful. For subtask 2, we used a CRF with simple features, achieving results close to the top-performing system and above the official benchmark. Further, we described the various sources of errors identified in the course of the analysis. In future, we plan to further improve our system based on the above observations.

Acknowledgments

We thank the task organizers for providing access to the MalwareTextDB corpus and for organizing the shared task. Further, we would like to thank the various authors for open-sourcing the code of the algorithms used in this work.

References

Ameni Bouaziz, Christel Dartigues-Pallez, Célia da Costa Pereira, Frédéric Precioso, and Patrick Lloret. 2014. Short text classification using semantic random forest. In DaWaK.
Jason P. C. Chiu and Eric Nichols. 2016. Named entity recognition with bidirectional LSTM-CNNs. TACL, 4:357–370.

François Chollet et al. 2015. Keras. https://github.com/fchollet/keras.

Asif Ekbal and Sivaji Bandyopadhyay. 2012. Named entity recognition using support vector machine: A language independent approach.

Jenny Rose Finkel, Trond Grenager, and Christopher D. Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In ACL.

Alexander Genkin, David D. Lewis, and David Madigan. 2007. Large-scale Bayesian logistic regression for text categorization. Technometrics, 49:291–304.

Mingyang Jiang, Yanchun Liang, Xiaoyue Feng, Xiaojing Fan, Zhili Pei, Yu Xue, and Renchu Guan. 2016. Text classification based on deep belief network and softmax regression. Neural Computing and Applications, pages 1–10.

Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. 2014. A convolutional neural network for modelling sentences. In ACL.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In EMNLP.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR, abs/1412.6980.

Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In HLT-NAACL.

Quoc V. Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In ICML.

Depeng Liang and Yongdong Zhang. 2016. AC-BLSTM: Asymmetric convolutional bidirectional LSTM networks for text classification. CoRR, abs/1611.01884.

Swee Kiat Lim, Aldrian Obaja Muis, Wei Lu, and Ong Chen Hui. 2017. MalwareTextDB: A database for annotated malware articles. In ACL.

Edward Loper and Steven Bird. 2002. NLTK: The Natural Language Toolkit. CoRR, cs.CL/0205028.

Xuezhe Ma and Eduard H. Hovy. 2016. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. CoRR, abs/1603.01354.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In ACL.

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jacob VanderPlas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Edouard Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In EMNLP.

Sarath P. R., Manikandan R, and Yoshiki Niwa. 2016. Hitachi at SemEval-2016 Task 12: A hybrid approach for temporal information extraction from clinical notes. In SemEval@NAACL-HLT.

Sarath P. R., Manikandan R, and Yoshiki Niwa. 2017. Hitachi at SemEval-2017 Task 12: System for temporal information extraction from clinical notes. In SemEval@ACL.

Radim Řehůřek and Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta. ELRA. http://is.muni.cz/publication/884893/en.

Frank Seide and Amit Agarwal. 2016. CNTK: Microsoft's open-source deep-learning toolkit. In KDD.

Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958.

Julien Tourille, Olivier Ferret, Xavier Tannier, and Aurélie Névéol. 2017. LIMSI-COT at SemEval-2017 Task 12: Neural architecture for temporal information extraction from clinical narratives. In SemEval@ACL.

Wenpeng Yin, Katharina Kann, Mo Yu, and Hinrich Schütze. 2017. Comparative study of CNN and RNN for natural language processing. CoRR, abs/1702.01923.

Xiang Zhang, Junbo Jake Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In NIPS.

Ye Zhang and Byron C. Wallace. 2017. A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification. In IJCNLP.

Yunan Zhang, Qingjia Huang, Xinjian Ma, Zeming Yang, and Jianguo Jiang. 2016. Using multi-features and ensemble learning method for imbalanced malware classification. 2016 IEEE Trustcom/BigDataSE/ISPA, pages 965–973.
Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao, and Bo Xu. 2016. Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. In COLING.

Zhi-Hua Zhou and Xu-Ying Liu. 2006. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on Knowledge and Data Engineering, 18:63–77.