Rule-ATT&CK Mapper (RAM): Mapping SIEM Rules to TTPs Using LLMs
Prasanna N. Wudali, Moshe Kravchik, Ehud Malul, Parth A. Gandhi, Yuval Elovici, Asaf Shabtai
Ben-Gurion University of the Negev; Rafael Advanced Defense Systems
arXiv:2502.02337v1 [cs.CR] 4 Feb 2025

Abstract
The growing frequency of cyberattacks has heightened the demand for accurate and efficient threat detection systems. Security information and event management (SIEM) platforms are important for analyzing log data and detecting adversarial activities through rule-based queries, also known as SIEM rules. The efficiency of the threat analysis process relies heavily on mapping these SIEM rules to the relevant attack techniques in the MITRE ATT&CK framework. Inaccurate annotation of SIEM rules can result in the misinterpretation of attacks, increasing the likelihood that threats will be overlooked. Such misinterpretation can expose an organization's systems and networks to potential damage and security breaches. Existing solutions for annotating SIEM rules with MITRE ATT&CK technique and sub-technique labels have notable limitations: manual annotation of SIEM rules is both time-consuming and prone to errors, and machine learning-based approaches mainly focus on annotating unstructured free text sources (e.g., threat intelligence reports) rather than structured data like SIEM rules. Structured data often contains limited information, further complicating the annotation process and making it a challenging task. To address these challenges, we propose Rule-ATT&CK Mapper (RAM), a novel framework that leverages large language models (LLMs) to automate the mapping of structured SIEM rules to MITRE ATT&CK techniques. RAM's multi-stage pipeline, which was inspired by the prompt chaining technique, enhances mapping accuracy without requiring LLM pre-training or fine-tuning. Using the Splunk Security Content dataset, we evaluate RAM's performance using several LLMs, including GPT-4-Turbo, Qwen, IBM Granite, and Mistral. Our evaluation highlights GPT-4-Turbo's superior performance, which derives from its enriched knowledge base, and an ablation study emphasizes the importance of external contextual knowledge in overcoming the limitations of LLMs' implicit knowledge for domain-specific tasks. These findings demonstrate RAM's potential in automating cybersecurity workflows and provide valuable insights for future advancements in this field.

Keywords
SIEM rules, LLMs, MITRE ATT&CK

1 Introduction
The rapid advancement of technology and widespread adoption of digital applications have resulted in a significant increase in cyberattacks [5]. To gain visibility into their digital ecosystems, organizations deploy security information and event management (SIEM) systems in their networks. These systems store and analyze log data generated by various digital entities in the network [11]. SIEM systems enable threat detection by allowing users to execute search queries, referred to as rules, on the ingested log data.

Each SIEM platform employs its own rule definition language (RDL), a schema-based structure for defining these rules that standardizes the creation and execution of SIEM rules, making them inherently structured data and a foundational component of modern cybersecurity operations. Examples of such schemas include the search processing language (SPL) from Splunk, the Lucene query language by Elasticsearch, and the Kusto query language (KQL) by Microsoft.

Security alerts are triggered when the execution of SIEM rules yields search results. When such alerts are generated, security analysts must examine each alert individually, performing tasks such as triage, analysis, and interpretation, and determine whether the alert corresponds to an actual attack. A critical aspect of effective threat detection and hunting is the precise mapping and understanding of the tactics, techniques, and procedures (TTPs) employed by adversaries, as defined in the MITRE ATT&CK framework.1 Incorporating MITRE ATT&CK techniques in the analysis provides valuable insights, enabling analysts to discern potential attack flows. Such mapping enhances security professionals' ability to anticipate and mitigate the strategies employed by cyber adversaries.

Mapping SIEM rules to specific MITRE ATT&CK techniques is a complex manual process that is prone to errors and can be time-consuming. Cyrebro, a leading cybersecurity company, reported [8] that "organizations collect sufficient log data to potentially detect 94% of techniques outlined in the MITRE ATT&CK framework; however, only 24% of these techniques are effectively covered due to gaps in detection rules, with an additional 12% of SIEM rules rendered non-functional or misconfigured." In its best practices guide [7] to MITRE ATT&CK mapping, CISA, an American cyber defense agency, listed (i) leaping to conclusions (i.e., prematurely deciding on a mapping based on insufficient evidence or examination of the facts), (ii) missing opportunities (i.e., not considering, being unaware of, or overlooking other potential technique mappings based on implied or unclear information), and (iii) miscategorization (i.e., the selection of incorrect techniques due to misinterpreting, misreading, or inadequately understanding the techniques, specifically the difference between two techniques) as common mistakes committed by security analysts when manually performing the mapping task. Given the above, there is a need to automate the mapping process and thereby reduce the workload on security analysts and increase the speed and accuracy of threat detection.

1 https://attack.mitre.org/
Recent cybersecurity research has explored various techniques for mapping unstructured data from cyber threat intelligence (CTI) reports to the MITRE ATT&CK framework [3, 4, 19, 22, 30]. While these methods have demonstrated effectiveness in handling unstructured data, they have a limited ability to adapt to structured data use cases, such as intrusion detection system and SIEM rules. Also, these methods use supervised learning-based approaches to classify structured data (i.e., intrusion detection system and SIEM rules) into MITRE ATT&CK technique classes, which requires retraining when new threats emerge. Their reliance on retraining limits their scalability and efficiency in dynamic threat landscapes. Mărmureanu et al. [20] proposed a method to map structured data, specifically Splunk rules, to the MITRE ATT&CK framework. This approach utilizes a BERT model trained as a classifier to categorize Splunk rules into 14 high-level MITRE ATT&CK tactic classes. However, this method shares the same limitations as other supervised learning approaches discussed earlier, particularly the need for retraining with updated data to address new threats. Furthermore, the task of mapping rules to high-level tactics is comparatively easier than mapping them to MITRE ATT&CK techniques and sub-techniques, which involve around 670 distinct classes and present a much greater challenge. Despite focusing on this simplified task, the method failed to achieve high performance in their evaluation, due to its inherent limitations. In a recent study, Fayyazi et al. [12] employed large language models (LLMs) to map CTIs in the form of unstructured text to MITRE ATT&CK techniques, while Nir et al. [9] employed them to map Snort intrusion detection rules to MITRE ATT&CK techniques.

These investigations highlight the potential of LLMs in cybersecurity tasks but also underscore their limitations. Solely relying on the implicit knowledge of LLMs has proven insufficient for addressing the domain-specific requirements of cybersecurity. This gap highlights the need for more adaptable and scalable methodologies tailored to the dynamic nature of cyber threats. To produce accurate and reliable predictions, LLMs require additional contextual information that is not inherently available to them.

To address these shortcomings, we propose RAM, a novel LLM-based framework for analyzing SIEM rules and recommending relevant MITRE ATT&CK techniques. RAM eliminates dependence on training data, utilizes LLM agents to retrieve supplementary contextual information, and transforms structured rules into unstructured natural language in a way that preserves the syntactic and semantic meaning of the rule. This approach ensures reliable and accurate predictions while overcoming the limitations of existing methods.

LLMs, with their advanced natural language processing (NLP) capabilities, can process and analyze structured data, automatically identify patterns, and understand the syntactic meaning of the data, but they often fall short in understanding its semantic meaning. This study leverages LLMs to autonomously map structured data in the form of SIEM rules to MITRE ATT&CK techniques, enabling the automation of cybersecurity threat detection and investigation.

RAM is a multi-stage AI agent pipeline (see Figure 1) inspired by the prompt chaining technique [27] and designed to enhance the understanding and application of SIEM rules. The pipeline begins with the extraction of indicators of compromise (IoCs) from the rule (e.g., process names, file names, registry keys and values, IP addresses, network ports). Then, a web search LLM agent retrieves additional contextual information related to the IoCs identified in the rule. Leveraging the information gathered in the preceding stages, the next AI agent translates the rule into natural language text, providing a comprehensive description. This textual description is then used by an LLM to identify the data source [15] of the logs or the mitigation strategy being applied upon which the rule operates. This natural language representation, along with the data source or mitigation-related information, serves as input to another LLM that maps the rule in question to probable MITRE ATT&CK techniques. In the final stage, the pipeline refines the mapping and provides reasoning by extracting the most relevant techniques from the list of potential matches, facilitating precise alignment of the rule with the MITRE ATT&CK framework.

We conducted a comprehensive series of experiments to evaluate RAM's ability to map SIEM rules to the MITRE ATT&CK framework. The evaluation focused on common metrics such as precision and recall, which are indicators of the method's accuracy and completeness in correctly classifying the SIEM rules to relevant techniques within the framework. Various LLMs were examined, including Qwen, IBM Granite, Mistral, and GPT-4-Turbo, and we evaluated RAM's effectiveness when each LLM was employed in the pipeline. We used the threat detection rules published in the Splunk Security Content dataset2 in our experiments; to ensure that the rules were not already known to the LLM, we carefully selected rules for the dataset based on their creation or modification dates. Specifically, we included only those rules with dates later than the knowledge cut-off date of the LLMs utilized in our experiments.

Using various configuration settings, we aimed to identify the optimal strategies for maximizing the performance of these models. Our study not only demonstrates the potential of LLMs in automating threat analysis but also provides insights into the most effective configurations for deploying these models in real-world cybersecurity environments. This study is among the first to explore leveraging LLMs to map structured data to the MITRE ATT&CK framework, and our results, which highlight RAM's potential, leave room for further refinement in future research. We also provide valuable insights regarding the challenges encountered during this study, which can guide subsequent advancements in this domain, for example, the lack of a completely labeled SIEM rules dataset.

The main contributions of this paper are as follows:
• We demonstrate the feasibility of using LLMs to automate the mapping of SIEM rules to MITRE ATT&CK techniques and provide reasoning, which could significantly enhance the capabilities of current cybersecurity tools.
• We propose an AI agent-based framework that utilizes both implicit and explicit knowledge in automating the mapping of structured SIEM rules to MITRE ATT&CK techniques.
• We demonstrate the effective utilization of LLMs without the need for pretraining or fine-tuning, thereby eliminating the need for any training data.
• We provide a practical guide for deploying LLMs in cybersecurity by identifying the optimal configurations for these models.
• We present valuable insights regarding the challenges encountered during the experimentation process, providing increased understanding of the obstacles and considerations that shaped our research and findings.

2 https://github.com/splunk/security_content/tree/develop/detections
neural network), employs a two-step classification process: first, classifying the unstructured text into MITRE ATT&CK tactics and then further classifying it into MITRE ATT&CK techniques. In other work, Alves et al. [4] explored the application of BERT models for the classification of unstructured text into MITRE ATT&CK techniques. The study uses eleven different BERT models to map unstructured texts to the MITRE ATT&CK framework, aiming to enhance automation in cyber threat intelligence.

Similarly, Alam et al. [3] proposed LADDER, a framework designed to enhance cybersecurity by automatically extracting attack patterns from CTI sources. LADDER uses different BERT-based models for extracting attack patterns from unstructured texts and then mapping these patterns to the MITRE ATT&CK framework. Rani et al. [22] proposed TTPXHunter, a method designed for the automated mapping of attack patterns extracted from cyber threat reports to the MITRE ATT&CK framework. This method is an extension of TTPHunter [21], improving its ability to cover a broader range of techniques from the MITRE ATT&CK framework and its precision with the help of a cyber domain-specific language model called SecureBERT. Sentences are transformed into embeddings using SecureBERT and then sent to a linear classifier for TTP prediction.

Fayyazi et al. [12] evaluated how well LLMs, specifically encoder-only (e.g., RoBERTa) and decoder-only (e.g., GPT-3.5) models, can summarize and map cyberattack procedures to the appropriate ATT&CK tactics. The authors compared various mapping approaches. However, they focused on mapping cyberattack procedures to MITRE ATT&CK tactics (which represent higher-level categorizations in the MITRE ATT&CK framework). While this is useful, it is comparatively easier than mapping cyberattack procedures to MITRE ATT&CK techniques and sub-techniques, which are more granular and detailed, offering deeper insights into the specific actions and methods employed in an attack.

Fengrui et al. [14] introduced a method combining data augmentation and instruction supervised fine-tuning using LLMs to classify TTPs effectively in scenarios with limited data. Similarly, Zhang et al. [30] introduced a novel framework for constructing attack knowledge graphs (KGs) from CTI reports by leveraging LLMs.

While these methods demonstrate remarkable progress in handling unstructured data, their applicability to structured data use cases, such as mapping SIEM rules, is limited. These approaches are specifically designed for unstructured data, where relationships between entities and contextual information are often explicitly defined, simplifying the mapping process to the MITRE ATT&CK framework. Adapting these methods to structured formats like SIEM rules would require extensive modifications, reducing their effectiveness and suitability for such scenarios.

To the best of our knowledge, only two studies have specifically focused on mapping structured data, such as IDS rules and SIEM rules, to the relevant MITRE ATT&CK techniques (or sub-techniques) using language models.

Nir et al. [9] investigated the integration of LLMs, specifically ChatGPT, into cybersecurity workflows to automate the association of network intrusion detection system (NIDS) rules with corresponding MITRE ATT&CK techniques. While their method represents one of the first applications of LLMs for this purpose, their findings highlight the necessity of incorporating additional contextual information to enhance the accuracy and reliability of LLM predictions. Mărmureanu et al. [20] proposed a method to map structured data in the form of Splunk rules to the MITRE ATT&CK framework. The authors proposed the use of ML classifiers to map the Splunk rules to tactics specified in the MITRE ATT&CK framework. A significant limitation of these methods is their dependence on supervised learning approaches to train the machine learning models within their frameworks. These models are unable to dynamically adapt to evolving threat landscapes or newly introduced MITRE ATT&CK techniques without undergoing retraining with updated datasets. This retraining process is not only time-consuming but also resource-intensive, substantially restricting the methods' ability to keep pace with the rapid evolution of cyber threats.

In summary, the limitations of prior studies can be categorized as follows: (1) reliance on supervised learning tasks, (2) inability to adapt to structured texts, and (3) dependence on additional contextual information to effectively interpret rules. To address these shortcomings, we propose RAM, a novel LLM-based approach that eliminates the need for training data, coherently processes structured text into natural language, and employs LLM agents to retrieve supplementary contextual information, enabling reliable and accurate predictions.

4 Methodology
Each SIEM system uses its own RDL to define threat detection rules, and each RDL has its own schema. For example, the Splunk SIEM uses the SPL to define its threat detection rules. The task of understanding threat detection rules and recommending relevant MITRE ATT&CK techniques (or sub-techniques) requires complex reasoning skills. In the case of LLMs, this can be achieved with a technique called prompt chaining, in which a task is divided into multiple sub-tasks in order to capture the complex reasoning behind it. Therefore, we employ a multi-phase architecture based on prompt chaining that leverages the power of LLMs to take a SIEM rule defined in any RDL and map it to relevant MITRE ATT&CK techniques. Our approach is based on the following intuitions:
• LLMs' implicit knowledge: LLMs possess a deep understanding of diverse RDLs. This enables them to interpret any rule, regardless of the RDL it is defined in, and convert it into comprehensible natural language text.
• LLMs' similarity comparison capability: LLMs are adept at analyzing and comparing textual descriptions. They can intelligently assess the similarity between two textual inputs to establish a meaningful connection.
RAM has two main phases: (1) the rule to text translation phase, and (2) the MITRE ATT&CK techniques recommendation phase. These two phases in the pipeline include six key steps to determine relevant TTPs, as illustrated in Figure 1.
Table 1: Summary of previous work.
Our method (RAM) | Techniques | SIEM rules (structured) | Use prompt engineering techniques and implement LLM agents to enhance LLM performance | Coverage of all technique-level classes; achieved a recall of 0.75 on the test rules.
Although LLMs excel at translating SIEM rules into natural language, they often lack critical domain-specific contextual information related to the IoCs in the rules. To overcome this limitation, the rule to text translation phase includes three steps: IoC extraction, contextual information retrieval, and natural language translation. The workflow begins with the extraction of the IoCs (for example, processes, log sources, event codes, and file names) that the rule searches for in the logs (step (1)). In the next step, a web search agent obtains additional contextual information about the IoCs discovered (step (2)). By incorporating this additional domain-specific information, the pipeline enhances the language translation, resulting in a more accurate and meaningful interpretation of SIEM rules. The rule itself and the IoCs' additional contextual information from the previous stage are then used to translate the rule from RDL to natural language (step (3)).

The MITRE ATT&CK techniques recommendation phase of the pipeline includes the following three steps. The rule is processed in the data source identification step, in which the probable origin of the data is identified (step (4)). The description of the rule is then used to determine probable MITRE ATT&CK techniques based on the implicit knowledge of the LLM (step (5)). Finally, using chain-of-thought [25] prompting, the most relevant techniques are extracted from the list of probable techniques (step (6)). Each step of our method is further described in detail below.
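To make the chaining concrete, the sketch below strings the six steps together as successive LLM calls whose outputs feed the next prompt. It is an illustrative outline only: llm, web_search, and retrieve_attack_docs are assumed placeholder callables, and the prompt wording is not the one used in RAM.

def map_rule_to_techniques(rule, llm, web_search, retrieve_attack_docs):
    # Phase 1: rule to text translation.
    iocs = llm("Extract all IoCs from this SIEM rule as a dictionary of "
               "IoC type -> list of values:\n" + rule)                      # step (1)
    context = web_search(iocs)                                              # step (2)
    description = llm("Describe in plain English the behavior this rule detects.\n"
                      "Rule:\n" + rule + "\nIoC context:\n" + context)      # step (3)

    # Phase 2: MITRE ATT&CK techniques recommendation.
    data_source = llm("Given these ATT&CK data component/mitigation notes:\n"
                      + retrieve_attack_docs(description)
                      + "\nwhich one best matches this description?\n" + description)  # step (4)
    candidates = llm("List the most probable ATT&CK techniques (ID, name, description) for:\n"
                     + description + "\nIdentified data source: " + data_source)       # step (5)
    return llm("From these candidate techniques:\n" + candidates
               + "\nkeep only those that clearly match the rule description below, "
               "and explain the reasoning step by step:\n" + description)   # step (6)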
4.1 IoC Extraction
The context associated with a SIEM detection rule is crucial for its accurate interpretation and effective application. Obtaining this contextual understanding requires comprehensive analysis of the embedded IoCs in the SIEM rule. In the first step, RAM systematically identifies and extracts all IoCs, identifying the types of IoCs and their corresponding values that form the foundational elements of the detection rules. Leveraging the LLM's inherent understanding of rule structures and IoCs, we employ a zero-shot prompting approach for this task. Zero-shot prompting enables the direct extraction of IoCs from the rules without requiring extensive pre-training on specific datasets.

The result of this stage is a dictionary structure, where:
• Keys represent types of IoC, such as processes, files, IP addresses, and log sources.
• Values are lists containing specific IoC details, such as process names, file names, IP addresses, and log source identifiers.

In the example depicted in Figure 2(a), the pipeline processes a rule for which relevant MITRE ATT&CK techniques need to be recommended. The IoC extractor LLM produces a dictionary structure as output, organizing the IoCs in a structured format to support subsequent stages in the analysis pipeline.
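For instance, for a rule that looks for command-line executions of soaphound.exe in Endpoint.Processes logs (the example of Figure 2(a)), the extractor's output might take a shape like the following; the exact keys and values are hypothetical and depend on the rule.

# Hypothetical IoC-extractor output for the soaphound.exe rule of Figure 2(a);
# keys are IoC types, values are lists of the corresponding IoC values.
iocs = {
    "processes": ["soaphound.exe"],
    "command_line_arguments": ["--buildcache", "--certdump"],
    "log_sources": ["Endpoint.Processes"],
    "files": [],
    "ip_addresses": [],
}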
4.2 Contextual Information Retrieval
In this step, an LLM agent is employed to retrieve relevant information pertaining to the IoCs extracted from the rule. A ReAct agent [1] was used in this case to generate both reasoning traces and task-specific actions in an interleaved manner. ReAct agents interact with external tools to retrieve additional information that leads to more factual and reliable responses. The LLM agent conducts a systematic search across web resources to gather additional contextual information for each IoC value present in the rule. This step addresses LLMs' lack of up-to-date knowledge or specialized domain expertise (which is critical to understanding the role and significance of the IoCs in the rule), without the need for retraining or fine-tuning. Figure 2(b) presents an example in which the rule includes the process name soaphound.exe as an IoC. As can be seen, the web search results indicate that soaphound.exe is used for active directory (AD) enumeration, which is important for understanding the attack.
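A simplified version of this retrieval loop, following the ReAct pattern of interleaving reasoning with tool calls, is sketched below. The llm and web_search callables are assumed placeholders; the actual agent is implemented with the frameworks listed in Section 5.2.

def enrich_iocs(iocs, llm, web_search, max_steps=3):
    # Gather web context for each IoC value via a minimal ReAct-style loop
    # (thought -> search action -> observation); illustrative only.
    context = {}
    for ioc_type, values in iocs.items():
        for value in values:
            notes = ""
            for _ in range(max_steps):
                # Thought: let the model decide what (if anything) to look up next.
                query = llm("IoC (" + ioc_type + "): " + value
                            + "\nNotes so far:\n" + notes
                            + "\nPropose one web search query that would clarify its role "
                              "in an attack, or answer DONE if enough is known.")
                if query.strip().upper() == "DONE":
                    break
                # Action + Observation: run the search and accumulate the findings.
                notes += web_search(query) + "\n"
            context[value] = notes
    return context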
4.3 Natural Language Translation
The translation of detection rules into natural language textual descriptions fulfills three key objectives:
(1) Ensures that RAM is format-agnostic: It converts rules defined in various RDL formats into a generic, unstructured text format, ensuring compatibility with different SIEM systems, regardless of the specific rule format.
(2) Provides contextual explanation: It includes all relevant contextual information to produce a concise and comprehensible explanation of the rule.
(3) Enhances comprehension for LLMs: It enables LLMs to more effectively compare the translated rule with descriptions in the MITRE ATT&CK framework by providing a unified textual representation.

To achieve these objectives, a zero-shot prompting technique is employed. The input to the LLM comprises two components:
• Syntactical information: The rule itself, providing the structural and operational details.
• Contextual information: Details of the IoCs extracted from the rule, providing semantic insights into the rule's intent and function.

The LLM utilizes these inputs to generate a natural language textual description of the rule. This transformation not only ensures a more interpretable representation but also facilitates further steps of analysis and comparison, particularly in aligning the rule with MITRE ATT&CK techniques and sub-techniques.
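A minimal sketch of such a translation prompt, combining the syntactical and contextual components, is shown below; the template is illustrative and does not reproduce RAM's actual prompt.

def translate_rule(rule, ioc_context, llm):
    # Zero-shot prompt that pairs the rule (syntactical information) with the
    # retrieved IoC context (contextual information); llm is an assumed callable.
    prompt = ("You are a security analyst. Rewrite the following SIEM detection rule "
              "as a concise natural language description of the behavior it detects.\n\n"
              "Rule (RDL):\n" + rule + "\n\n"
              "Context about the IoCs in the rule:\n" + str(ioc_context) + "\n")
    return llm(prompt)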
4.4 Data Source or Mitigation Identification
Identifying the most relevant data component or mitigation associated with the rule description in this step is critical for filtering out irrelevant MITRE ATT&CK techniques (or sub-techniques) in subsequent steps of the pipeline. In the MITRE ATT&CK framework, data sources represent various categories of information that can be gathered from sensors or logs. These data sources include data components, which are specific attributes or properties within a data source that are directly relevant to detecting a particular technique or sub-technique. For example, in the context of the rule described in Figure 2(a), the term Endpoint.Processes indicates that the activity is happening on an endpoint. The presence of terms such as soaphound.exe, --buildcache, and --certdump indicates that the rule searches for command line execution of an executable named soaphound.exe with specific parameters. Therefore, the appropriate data source in this example is Command, with the corresponding data component being Command Execution. Additionally, mitigations are defined as categories of technologies or strategies that can prevent or reduce the impact of specific techniques or sub-techniques. The MITRE ATT&CK framework explicitly establishes relationships between data components, mitigations, and techniques (or sub-techniques), enabling a systematic approach for identifying relevant elements.

To identify the most relevant data component or mitigation associated with a given rule description, we utilize agentic retrieval augmented generation (RAG), an AI agent-based implementation of the RAG framework. Data from the MITRE ATT&CK framework, specifically related to data components and mitigations, is stored in a vector database (e.g., ChromaDB). The process begins with the rule description from the previous stage, which serves as the input to the AI agent. The LLM-powered agent automatically generates a search query tailored to retrieve relevant information from the RAG database.

For each query, the system retrieves the five most similar documents from the database, each containing contextual information about data components or mitigations. These documents are then utilized by the LLM agent to contextualize the rule description. By comparing the content of these retrieved documents with the rule description, the LLM agent determines and outputs the most relevant data component or mitigation, along with a chain-of-thought explanation of why the data component or mitigation is related to the rule.
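The retrieval portion of this step can be sketched with ChromaDB as follows. The collection contents and the use of Chroma's default embedding function are simplifying assumptions for illustration, not RAM's exact configuration.

# Illustrative top-5 retrieval of data-component/mitigation descriptions from a
# ChromaDB collection (default embedding function; contents are toy examples).
import chromadb

client = chromadb.Client()
collection = client.create_collection(name="attack_components_and_mitigations")

# In RAM this would be populated from the MITRE ATT&CK data components and
# mitigations; two toy documents are added here so the sketch is self-contained.
collection.add(
    documents=[
        "Command Execution: the invocation of commands, e.g., via cmd.exe or PowerShell.",
        "Process Creation: the creation of a new process on an endpoint.",
    ],
    ids=["command-execution", "process-creation"],
)

def retrieve_candidates(rule_description, n_results=5):
    # Return the documents most similar to the rule description.
    results = collection.query(query_texts=[rule_description], n_results=n_results)
    return results["documents"][0]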
4.5 Probable Technique Recommendation
In this step, an LLM agent is utilized to propose probable MITRE ATT&CK techniques (and sub-techniques) that may be relevant to the description of the provided rule. We used a ReAct agent in this step as well, to utilize both implicit and explicit knowledge during reasoning. For explicit knowledge, the agent searches the MITRE ATT&CK framework to obtain the list of probable techniques (and sub-techniques). The natural language description of the rule from the previous step serves as input to the LLM agent. The output of this stage consists of a list of JSON objects, each containing the MITRE technique ID, technique name, and technique description, as seen in Figure 2(c).

Throughout our experiments, we observed that as the number of recommendations (k) increases, both the framework's average recall and precision initially improve; however, beyond a certain threshold of k, the precision begins to decline. Based on these observations (see Table 5), we selected a k-value of 11 to ensure a high recall.
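The shape of this output, for the running soaphound.exe example, might resemble the following list; the entries shown are hypothetical illustrations rather than the model's actual recommendations.

# Hypothetical shape of the candidate list produced in this step (truncated);
# in RAM the list contains k = 11 entries.
probable_techniques = [
    {
        "technique_id": "T1087.002",
        "technique_name": "Account Discovery: Domain Account",
        "technique_description": "Adversaries may attempt to get a listing of domain accounts...",
    },
    {
        "technique_id": "T1059.001",
        "technique_name": "Command and Scripting Interpreter: PowerShell",
        "technique_description": "Adversaries may abuse PowerShell commands and scripts...",
    },
    # ... additional candidates up to k = 11
]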
4.6 Relevant Technique Extraction
In this step, RAM refines the set of probable MITRE ATT&CK techniques identified in the previous stage by eliminating irrelevant entries. This step in the pipeline serves two primary purposes: (1) to enhance precision while maintaining the recall achieved in the previous step, and (2) to provide a clear rationale for the selection of the labels, ensuring transparency and interpretability of the mapping process. This refinement process is grounded in the assumption that LLMs are effective for text similarity matching tasks.

The process comprises two key steps:
• Rule-technique comparison: The description of each technique in the set of probable techniques is compared with the rule description. A chain-of-thought technique is then applied to elucidate the reasoning behind the association of each technique with the rule.
• Confidence calculation: The generated chain-of-thought rationale for each technique (or sub-technique) is compared with the rule description to compute a relevance (or confidence) score, as done in prior work [16].

Techniques with higher confidence scores are deemed more relevant to the rule. Conversely, techniques with scores falling below a predefined threshold are excluded. The techniques retained after this filtering step represent the most relevant techniques corresponding to the given rule's description.

The chain-of-thought (CoT) rationale generated during the comparison of each rule to its probable techniques is also provided as an output in this step. This rationale offers a detailed natural language explanation, articulating why a particular technique is relevant to the given rule. Such explanations are highly valuable for security analysts, as they provide clear and transparent reasoning behind the mapping, enabling analysts to better understand and validate the association between the rule and the technique. Other classification models proposed in previous works within this domain also suffer from the limitation of being black-box models, which lack the ability to provide clear reasoning or explanations. Unlike RAM, these models fail to generate transparent CoT rationales that explain why a particular rule is mapped to a specific technique, making them less interpretable and less useful for security analysts.
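A simplified rendering of this two-step filtering logic is given below; the scoring prompt and the 0.5 threshold are illustrative assumptions, not RAM's actual parameters.

def filter_relevant(candidates, rule_description, llm, threshold=0.5):
    # Keep only candidate techniques whose LLM-derived confidence score clears
    # the threshold; llm is an assumed callable and the prompts are illustrative.
    relevant = []
    for tech in candidates:
        # Step 1: chain-of-thought comparison of the technique with the rule description.
        rationale = llm("Explain step by step whether technique "
                        + tech["technique_id"] + " (" + tech["technique_name"] + ") "
                        "matches this rule description:\n" + rule_description)
        # Step 2: score how strongly the rationale supports the mapping (0.0 to 1.0).
        score = float(llm("On a scale of 0.0 to 1.0, how strongly does the following "
                          "rationale support mapping the rule to the technique? "
                          "Answer with a number only.\n" + rationale))
        if score >= threshold:
            relevant.append({**tech, "rationale": rationale, "confidence": score})
    return relevant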
5 Evaluation
5.1 Dataset
The Splunk Security Content dataset is a comprehensive repository of security resources dedicated to improving the detection and mitigation of threats, maintained by Splunk Inc.'s research team. This dataset features over 1,600 analytic rules, all written in Splunk's Search Processing Language (SPL), to effectively identify malicious behaviors.

These analytic rules are systematically organized into distinct domains, such as endpoint, network, and application, based on their intended scope of applicability. For instance, the endpoint domain comprises rules specifically crafted to detect malicious activities occurring on endpoint devices. Moreover, each rule is annotated with corresponding MITRE ATT&CK technique identifiers, thereby offering a ground-truth framework for our experiments. We chose the endpoint domain as it presents a higher diversity of MITRE ATT&CK technique labels than other domains in the dataset.

To ensure experimental integrity, we carefully evaluated the knowledge cut-off dates of the models utilized. Among the hosted models, GPT-4-Turbo had the most recent cut-off date of December 2023,3 while Granite, the latest local model, was released in October 2024. To prevent any potential data leakage, we exclusively analyzed rules from the Splunk Security Content dataset that were created or modified on or after November 2024. During our review of the Splunk Security Content repository in December 2024, we identified 360 rules within the endpoint domain that met our criteria. Note that none of the hosted or local models employed in our experiments were capable of inherently performing web searches, since they were used via an API [6].

5.2 Evaluation Setup
We implemented our method RAM in Python 3.9, leveraging multiple libraries and frameworks for model interaction. We utilized the transformers library [26] for local models, and for hosted models we employed the LangChain and LangGraph frameworks. The hosted models used in our experiments are GPT-4-Turbo, GPT-4o, and GPT-4o-mini, whereas the local models are Mistral-7B, IBM Granite-3.0-8B, and Qwen-2.5-7B. Table 2 provides a summary of all the models used in our experiments.

3 https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions#gpt-4-and-gpt-4-turbo-models
Table 2: Summary of the models used in our experiments.
Model Type | Model Name | Parameter Size | Context Window | Max Tokens | Knowledge Cut-off
Hosted | GPT-4-Turbo | ∼1.8 Trillion | 128,000 | 4,096 | December 2023
Hosted | GPT-4o | ∼2 Trillion | 128,000 | 16,384 | October 2023
Hosted | GPT-4o-mini | ∼8 Billion | 128,000 | 16,384 | October 2023
Local | IBM Granite | ∼8 Billion | 128,000 | 8,192 | Not available
Local | Mistral | ∼7 Billion | 32,000 | 4,096 | Not available
Local | Qwen | ∼3 Billion | 128,000 | 8,192 | Not available
compute the AP, we first calculated the precision for each sample in the dataset, and then the AP was determined by averaging the precision values across all samples in the dataset.

In addition, the task of mapping a SIEM rule to the MITRE ATT&CK framework is inherently a multi-class classification problem, where the number of techniques associated with each sample varies. Appendix A provides an analysis of the distribution of samples based on the number of techniques they contain. Therefore, we also compute the Weighted Average Precision (WAP) and Weighted Average Recall (WAR) metrics. These metrics apply weights based on the relative number of techniques for each sample.
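Under the assumption that each sample's prediction and ground truth are sets of technique IDs, these metrics can be computed as in the sketch below; this is an interpretation of the definitions above, not the paper's evaluation code.

def precision_recall(predicted, ground_truth):
    # Per-sample precision and recall over sets of ATT&CK technique IDs.
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

def evaluate(samples):
    # samples: list of (predicted_set, ground_truth_set) pairs.
    scores = [precision_recall(p, g) for p, g in samples]
    n = len(scores)
    weights = [len(g) for _, g in samples]        # weight = number of ground-truth labels
    total_weight = sum(weights)
    return {
        "AP":  sum(p for p, _ in scores) / n,
        "AR":  sum(r for _, r in scores) / n,
        "WAP": sum(w * p for w, (p, _) in zip(weights, scores)) / total_weight,
        "WAR": sum(w * r for w, (_, r) in zip(weights, scores)) / total_weight,
    }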
5.5 Results
Comparison of RAM with baselines. We compared RAM with the baseline methods described in the previous section, and the results of this comparison are presented in Table 3. As evident from the table, RAM outperformed the baseline methods when utilizing GPT-4-Turbo and GPT-4o as the underlying models. These findings underscore the critical role of both implicit and explicit domain-specific knowledge in enabling LLMs to deliver optimal results for the task of mapping SIEM rules to the MITRE ATT&CK framework.

RAM's performance with different LLMs. RAM's pipeline was executed using several LLMs to evaluate its ability to recommend MITRE ATT&CK techniques (or sub-techniques) for a given SIEM rule. As can be seen in Table 3, GPT-4-Turbo demonstrated superior performance, delivering the most accurate and relevant recommendations when compared to the other hosted and local models. These results provide insights into how the model configuration, including model size and architectural differences, influences the quality of recommendations generated by RAM.

Effect of model's size and context window length on performance. As can be seen in Table 3, RAM's performance is significantly influenced by the size of the LLMs used in its pipeline (see Table 2 for the different models used and their sizes and context window lengths). Larger models, such as GPT-4-Turbo and GPT-4o, demonstrated superior performance compared to their smaller counterparts like GPT-4o-mini or local models with fewer parameters. This performance disparity highlights the ability of larger models to better understand complex relationships and patterns in the input data. The large number of parameters and greater context window allow them to capture nuanced information that smaller models might overlook, leading to more accurate and reliable recommendations.

The context window of an LLM refers to the maximum amount of text, measured in tokens, that the model can process at a time. This parameter plays a crucial role in determining the model's ability to handle tasks requiring extensive context. In the case of RAM, the performance using the Mistral model was poorer compared to other models. This can be attributed to the significantly smaller context window of the Mistral model, which limited its ability to process and utilize the full breadth of information required for effective recommendations. In contrast, GPT-4o-mini, a similarly sized model with a larger context window, performed better, as it could handle more comprehensive inputs, leading to improved accuracy and reliability in mapping SIEM rules to MITRE ATT&CK techniques.

Table 3: RAM's performance using various hosted and local models.
Model Type | Model Name | AR (WAR) | AP (WAP)
Hosted | GPT-4-Turbo | 0.75 (0.724) | 0.52 (0.51)
Hosted | GPT-4o | 0.71 (0.69) | 0.49 (0.47)
Hosted | GPT-4o-mini | 0.38 (0.36) | 0.29 (0.275)
Local | IBM Granite | 0.34 (0.31) | 0.28 (0.24)
Local | Mistral | 0.12 (0.09) | 0.08 (0.05)
Local | Qwen | 0.23 (0.19) | 0.16 (0.15)
Baselines | Zero-shot LLM | 0.46 (0.43) | 0.31 (0.30)
Baselines | BERT | 0.68 (0.65) | 0.39 (0.38)
Baselines | CodeBERT | 0.65 (0.63) | 0.47 (0.45)
Baselines | TTPxHunter [22] | 0.59 (0.56) | 0.42 (0.41)

Ablation study. In order to analyze the importance of the different components and steps in RAM, we performed an ablation study, mainly focusing on the impact of language translation and additional contextual information on the recommendations made by the LLM. We performed the ablation study by running the experiment in three different scenarios. In the first scenario, the rule was provided as-is to the next stage of the pipeline; in this case, RAM achieved an AR of approximately 0.46 and an AP of around 0.39. In the second scenario, the rule was first translated into a natural language description, without adding any contextual information, before being passed to subsequent stages of the pipeline. This improved the AR to around 0.54 and the AP to approximately 0.42. Finally, in the third scenario, the rule was translated into a natural language description enriched with contextual information before being processed by the later stages of the pipeline. This setup yielded the best results, with an AR of around 0.75 and an
References
[1] [n. d.]. ReAct Prompting. https://www.promptingguide.ai/techniques/react
[2] Bader Al-Sada, Alireza Sadighian, and Gabriele Oligeri. 2024. MITRE ATT&CK: State of the art and way forward. Comput. Surveys 57, 1 (2024), 1–37.
[3] Md Tanvirul Alam, Dipkamal Bhusal, Youngja Park, and Nidhi Rastogi. 2023. Looking beyond IoCs: Automatically extracting attack patterns from external CTI. In Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses. 92–108.
[4] Paulo MMR Alves, PR Geraldo Filho, and Vinícius P Gonçalves. 2022. Leveraging BERT's Power to Classify TTP from Unstructured Text. In 2022 Workshop on Communication Networks and Power Systems (WCNPS). IEEE, 1–7.
[5] Checkpoint. [n. d.]. Check Point Research. https://blog.checkpoint.com/research/check-point-research-reports-highest-increase-of-global-cyber-attacks-seen-in-last-two-years-a-30-increase-in-q2-2024-global-cyber-attacks/
[6] Xiang Chen, Chaoyang Gao, Chunyang Chen, Guangbei Zhang, and Yong Liu. 2024. An Empirical Study on Challenges for LLM Application Developers. arXiv:2408.05002 [cs.SE] https://arxiv.org/abs/2408.05002
[7] CISA. [n. d.]. Best Practices for Mapping to MITRE ATT&CK. https://www.cisa.gov/news-events/news/best-practices-mitre-attckr-mapping
[8] Cyrebro. [n. d.]. SIEM Optimization Through MITRE ATT&CK. https://www.cyrebro.io/blog/siem-optimization-through-mitre-attck-staying-ahead-of-threats-with-cyrebro/
[9] Nir Daniel, Florian Klaus Kaiser, Anton Dzega, Aviad Elyashar, and Rami Puzis. 2023. Labeling NIDS Rules with MITRE ATT&CK Techniques Using ChatGPT. In European Symposium on Research in Computer Security. Springer, 76–91.
[10] Jacob Devlin. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[11] Exabeam. [n. d.]. What is SIEM and How Does it Work? https://www.exabeam.com/explainers/siem-tools/siem-solutions/
[12] Reza Fayyazi, Rozhina Taghdimi, and Shanchieh Jay Yang. 2023. Advancing TTP Analysis: Harnessing the Power of Encoder-Only and Decoder-Only Language Models with Retrieval Augmented Generation. arXiv preprint arXiv:2401.00280 (2023).
[13] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al. 2020. CodeBERT: A pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155 (2020).
[14] Yu Fengrui and Yanhui Du. 2024. Few-Shot Learning of TTPs Classification Using Large Language Models. (2024).
[15] MITRE ATT&CK framework. [n. d.]. Data Sources. https://attack.mitre.org/datasources/
[16] Scott Freitas, Jovan Kalajdjieski, Amir Gharib, and Robert McCann. 2024. AI-Driven Guided Response for Security Operation Centers with Microsoft Copilot for Security. arXiv:2407.09017 [cs.LG] https://arxiv.org/abs/2407.09017
[17] Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Hao Wang, Defu Lian, Yasheng Wang, Ruiming Tang, and Enhong Chen. 2024. Understanding the planning of LLM agents: A survey. arXiv:2402.02716 [cs.AI] https://arxiv.org/abs/2402.02716
[18] Roman Kryukov, Vladimir Zima, Elena Fedorchenko, Evgenia Novikova, and Igor Kotenko. 2022. Mapping the Security Events to the MITRE ATT&CK Attack Patterns to Forecast Attack Propagation. In International Workshop on Attacks and Defenses for Internet-of-Things. Springer, 165–176.
[19] Chenjing Liu, Junfeng Wang, and Xiangru Chen. 2022. Threat intelligence ATT&CK extraction based on the attention transformer hierarchical recurrent neural network. Applied Soft Computing 122 (2022), 108826.
[20] Marius Mărmureanu and Ciprian Oprişa. 2023. MITRE Tactics Inference from Splunk Queries. In 2023 IEEE 19th International Conference on Intelligent Computer Communication and Processing (ICCP). 277–283. doi:10.1109/ICCP60212.2023.10398612
[21] Nanda Rani, Bikash Saha, Vikas Maurya, and Sandeep Kumar Shukla. 2023. TTPHunter: Automated Extraction of Actionable Intelligence as TTPs from Narrative Threat Reports. In Proceedings of the 2023 Australasian Computer Science Week (Melbourne, VIC, Australia) (ACSW '23). Association for Computing Machinery, New York, NY, USA, 126–134. doi:10.1145/3579375.3579391
[22] Nanda Rani, Bikash Saha, Vikas Maurya, and Sandeep Kumar Shukla. 2024. TTPXHunter: Actionable Threat Intelligence Extraction as TTPs from Finished Cyber Threat Reports. arXiv preprint arXiv:2403.03267 (2024).
[23] Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, and Aman Chadha. 2024. A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv preprint arXiv:2402.07927 (2024).
[24] Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune BERT for text classification?. In Chinese computational linguistics: 18th China national conference, CCL 2019, Kunming, China, October 18–20, 2019, proceedings 18. Springer, 194–206.
[25] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
[26] Thomas Wolf. 2020. Transformers: State-of-the-Art Natural Language Processing. arXiv preprint arXiv:1910.03771 (2020).
[27] Tongshuang Wu, Ellen Jiang, Aaron Donsbach, Jeff Gray, Alejandra Molina, Michael Terry, and Carrie J Cai. 2022. PromptChainer: Chaining large language model prompts through visual programming. In CHI Conference on Human Factors in Computing Systems Extended Abstracts. 1–10.
[28] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629 [cs.CL] https://arxiv.org/abs/2210.03629
[29] Yizhe You, Jun Jiang, Zhengwei Jiang, Peian Yang, Baoxu Liu, Huamin Feng, Xuren Wang, and Ning Li. 2022. TIM: threat context-enhanced TTP intelligence mining on unstructured threat data. Cybersecurity 5, 1 (2022), 3.
[30] Yongheng Zhang, Tingwen Du, Yunshan Ma, Xiang Wang, Yi Xie, Guozheng Yang, Yuliang Lu, and Ee-Chien Chang. 2024. AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models. arXiv preprint arXiv:2405.04753 (2024).

A Labels Distribution
Figure 6: Distribution of length of labels in test samples.