Splunk Validated Architectures: October 2020
WHITE PAPER
Table of Contents
Introduction
    Document Structure
    Reasons to Use Splunk Validated Architectures
    Pillars of Splunk Validated Architectures
    What to Expect From Splunk Validated Architectures
    Roles and Responsibilities
Available Indexing and Search Topologies
    Splunk Cloud Deployment Architecture (CLOUD)
    Single Server Deployment (S1)
    Distributed Non-Clustered Deployment (D1 / D11)
    Distributed Clustered Deployment - Single Site (C1 / C11)
    Distributed Clustered Deployment + SHC - Single Site (C3 / C13)
    Distributed Clustered Deployment - Multi-Site (M2 / M12)
    Distributed Clustered Deployment + SHC - Multi-Site (M3 / M13)
    Distributed Clustered Deployment + SHC - Multi-Site (M4 / M14)
    Splunk Indexer Architecture Options
    Classic Indexer Architecture Using File System Storage
    SmartStore Indexer Architecture Using Object Storage
    Data Collection Architecture
    Important Architectural Considerations and Why They Matter
    General Data Collection Architecture Guidance
    Data Collection Topology Overview
Data Collection Components
    (UF) Universal Forwarder
    (HF) Heavy Forwarder
    (HEC) HTTP Event Collector
    (DCN) Heavy Forwarder as Data Collection Node
    (IF) Intermediary Forwarding Tier
    (KAFKA) Consuming Log Data From Kafka Topics
    (KINESIS) Consuming Log Data From Amazon Kinesis Firehose
    (METRICS) Metrics Collection
    (SYSLOG) Syslog Data Collection
    Splunk Connect for Syslog (SC4S - recommended best practice)
    Syslog (File monitoring in conjunction with a syslog server)
    Splunk UDP Input
    High-Availability Considerations for Forwarding Tier Components
Design Principles and Best Practices
    Deployment Tiers
    Aligning Your Topology With Best Practices
    Best Practices: Tier-Specific Recommendations
    Search Tier Recommendations
    Indexing Tier Recommendations
    Collection Tier Recommendations
    Management / Utility Tier Recommendations
Summary & Next Steps
Appendix
    Appendix "A": SVA Pillars Explained
    Appendix "B": Topology Components
Introduction
Splunk Validated Architectures (SVAs) are proven reference architectures for stable, efficient and repeatable
Splunk deployments. Many of Splunk's existing customers have experienced rapid adoption and expansion,
leading to certain challenges as they attempt to scale. At the same time, new Splunk customers are increasingly
looking for guidelines and certified architectures to ensure that their initial deployment is built on a solid
foundation. SVAs have been developed to help our customers with these growing needs.
Whether you are a new or existing Splunk customer, SVAs will help you build an environment that is easier to
maintain and simpler to troubleshoot. SVAs are designed to provide you with the best possible results while
minimizing your total cost of ownership. Additionally, your entire Splunk foundation will be based on a repeatable
architecture which will allow you to scale your deployment as your needs evolve over time.
SVAs offer topology options that consider a wide array of organizational requirements, so you can easily
understand and find a topology that is right for your requirements. The Splunk Validated Architectures selection
process will help you match your specific requirements to the topology that best meets your organization's needs.
If you are new to Splunk, we recommend implementing a Validated Architecture for your initial deployment. If you
are an existing customer, we recommend that you explore the option of aligning with a Validated Architecture
topology. Unless you have unique requirements that make it necessary to build a custom architecture, it is very
likely that a Validated Architecture will fulfill your requirements while remaining cost effective.
This document contains all SVA topologies that are available at time of publication. For a custom
document that meets your specific requirements, please use the Interactive Splunk Validated
Architecture (iSVA) tool available here. The custom document will reflect the best practice approach
to search, indexing and data collection architecture, given your specific requirements identified when
working with the tool.
It is always recommended that you involve Splunk or a trusted Splunk partner to ensure that the
recommendations in this document meet your needs.
If you need assistance implementing a Splunk Validated Architecture, contact Splunk Professional Services.
Document Structure
SVAs are broken into three major content areas:
1. Indexing and search topology
2. Data collection architecture components
3. Design principles and best practices
Indexing and search covers the architecture tiers that provide the core indexing and search capabilities of a
Splunk deployment. The data collection component section guides you in choosing the right data collection
mechanism for your requirements.
Design principles and best practices apply to your architecture as a whole and will help you make the correct
choices when working out the details of your deployment.
Pillars of Splunk Validated Architectures
SVAs are built on five foundational pillars: availability, performance, scalability, security and manageability (see Appendix "A" for details). These pillars are in direct support of the Platform Management & Support Service in the Splunk Center of Excellence model.
What to Expect From Splunk Validated Architectures
SVAs provide:
• Clustered and non-clustered deployment options
• Diagrams of the reference architecture
• Guidelines to help you select the architecture that is right for you

SVAs do not provide:
• Implementation choices (OS, bare metal vs. virtual vs. Cloud, etc.)
• Deployment sizing
• A prescriptive approval of your architecture. Note: SVAs provide recommendations and guidelines, so you can ultimately make the right decision for your organization.
Roles and Responsibilities

Role                       Description
Enterprise Architects      Responsible for architecting Splunk deployments to meet enterprise needs.
Consultants                Responsible for providing services for Splunk architecture, design, and implementation.
Managed Service Providers  Entities that deploy and run Splunk as a service for customers.
Available Indexing and Search Topologies

Splunk Cloud Deployment Architecture (CLOUD)
When you are choosing Splunk Cloud, all deployment decisions regarding your indexing and search topologies have already been made for you. The Splunk Cloud team will build and operate your dedicated (single-tenant) AWS environment in a way that allows for meeting Splunk's compliance requirements and our service SLAs with you.

The indexers that support your cloud environment are distributed across multiple fault domains to ensure a highly available service within a single region (for a list of supported regions, refer to the Splunk Cloud documentation).

Besides listening on the standard TCP port (9997) for forwarder traffic, you can also send data via the HTTP Event Collector (HEC; see details on HEC in the data collection section of this document). You may also request access to the REST API endpoints on port 8089 via a support ticket to support programmatic access.

The search head(s) are deployed into their own security group. Depending on your Splunk Cloud service agreement, that will either be a single search head or a search head cluster.

If you license a premium app (like the Splunk App for Enterprise Security [ES]), Splunk Cloud will provision a dedicated search head as appropriate to meet the premium app's requirements. If you intend to run ES, you can forward adaptive response actions to your on-prem environment using the Adaptive Response Relay as described here.

All aspects of operating and managing this AWS environment are the responsibility of the Splunk Cloud Operations team, so you can focus on onboarding data and getting value out of your Splunk deployment.

Data archiving and restoring is supported.

Considerations:
• All apps need to be vetted and certified by Splunk to ensure security of your environment. At the time of this writing, 900+ apps available on Splunkbase have been vetted and are Splunk Cloud Certified.
• Hybrid Search support is limited to an on-prem Search Head searching Splunk Cloud indexers, not vice versa.
• The IDM (Inputs Data Manager) shown in the diagram is the Splunk Cloud-managed implementation of a Data Collection Node (DCN) that supports scripted and modular inputs only. For data collection needs beyond that, you can deploy and manage a DCN in your environment using a Splunk Heavy Forwarder.
• Splunk Cloud will not manage or monitor your on-prem Splunk components.
Single Server Deployment (S1)
This deployment topology provides you with a very cost-effective solution if your environment meets all of the following criteria:
a) You do not have any requirements to provide high availability or automatic disaster recovery for your Splunk deployment,
b) Your daily data ingest is under ~300GB/day, and
c) You have a small number of users with non-critical search use cases.

This topology is typically used for smaller, non-business-critical use cases (often departmental in nature). Appropriate use cases include data onboarding test environments, small DevOps use cases, application test and integration environments and similar scenarios.

The primary benefits of this topology include easy manageability, good search performance for smaller data volumes and a fixed TCO.

Multiple independent single-instance deployments can be managed by a single management tier, as needed.

Limitations:
• No High Availability for Search/Indexing
• Scalability limited by hardware capacity (straightforward migration path to a distributed deployment)
Distributed Non-Clustered Deployment (D1 / D11)
You require a distributed topology in either of the following situations:
a) Your daily data volume to be sent to Splunk exceeds the capacity of a single-server deployment, or
b) You want/need to provide highly available data ingest.

Limitations:
• No High Availability for Search Tier
• Limited High Availability for indexing tier; node failure may cause incomplete search results for historic searches, but will not impact data ingest.
Distributed Clustered Deployment - Single Site (C1 / C11)
This topology introduces indexer clustering in conjunction with an appropriately configured data replication policy. This provides high availability of data in case of indexer peer node failure. However, you should be aware that this applies only to the indexing tier and does not protect against search head failure. Multiple independent search heads can be used for availability/capacity reasons or to run Splunk premium app solutions, like ES, although the recommended approach to scale search capacity is to employ Search Head Clustering.

Note for ES customers: If your category code is C11 (i.e. you intend to deploy the Splunk App for Enterprise Security), a single dedicated search head is required to deploy the app (this is not pictured in the topology diagram).

This topology requires an additional Splunk component named the Cluster Manager (CM). The CM is responsible for coordination and enforcement of the configured data replication policy. The CM also serves as the authoritative source for available cluster peers (indexers). Search head configuration is simplified by configuring the CM instead of individual search peers.

You can optionally configure the forwarding tier to discover available indexers via the CM. This simplifies the management of the forwarding tier.

Be aware that data is replicated within the cluster in a non-deterministic way. You will not have control over where requested copies of each event are stored.

Additionally, while scalability is linear, there are limitations with respect to total cluster size (~50PB of searchable data under ideal conditions).

Splunk recommends deployment of the Monitoring Console (MC) to monitor the health of your Splunk environment.

Limitations:
• No High Availability for Search Tier
• Total number of unique buckets in indexer cluster limited to 20MM (V8.x), 40MM total buckets
• No automatic DR capability in case of data center outage
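To make the cluster manager and indexer discovery concepts above concrete, the following is a minimal configuration sketch using Splunk Enterprise 8.x conventions; it is illustrative only, not a prescriptive implementation. Hostnames, shared secrets and the replication/search factor values are placeholders, and newer releases also accept the equivalent manager/peer terminology for these settings.

server.conf on the Cluster Manager:

    [clustering]
    mode = master
    replication_factor = 3
    search_factor = 2
    pass4SymmKey = <shared-cluster-secret>

server.conf on each indexer (cluster peer):

    [replication_port://9887]

    [clustering]
    mode = slave
    master_uri = https://cm.example.com:8089
    pass4SymmKey = <shared-cluster-secret>

outputs.conf on forwarders, using indexer discovery via the CM instead of a static server list:

    [indexer_discovery:cluster1]
    master_uri = https://cm.example.com:8089
    # Must match the indexer discovery secret configured on the CM
    pass4SymmKey = <indexer-discovery-secret>

    [tcpout:cluster1_indexers]
    indexerDiscovery = cluster1

    [tcpout]
    defaultGroup = cluster1_indexers

With indexer discovery enabled, peers added to or removed from the cluster are picked up by the forwarding tier automatically, which is what simplifies forwarder management as described above.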
Distributed Clustered Deployment + SHC - Single Site (C3 / C13)
Search Head Clustering (SHC) adds horizontal scalability and removes the single point of failure from the search tier. A minimum of three search heads is required to implement a SHC.

To manage the SHC configuration, an additional Splunk component called the Search Head Cluster Deployer is required for each SHC. This component is necessary in order to deploy changes to configuration files in the cluster. The Search Head Cluster Deployer has no HA requirements (no runtime role).

The SHC provides the mechanism to increase available search capacity beyond what a single search head can provide. Additionally, the SHC allows for scheduled search workload distribution across the cluster. The SHC also provides optimal user failover in case of a search head failure.

A network load-balancer that supports sticky sessions is required in front of the SHC members to ensure proper load balancing of users across the cluster.

Note for ES customers: If your category code is C13 (i.e. you intend to deploy the Splunk App for Enterprise Security), a dedicated search head cluster is required to deploy the app (this is not pictured in the topology diagram). The search tier can contain clustered and non-clustered Search Heads depending on your capacity and organizational needs (this is also not pictured in the topology diagram).

Limitations:
• No DR capability in case of data center outage
• ES requires a dedicated SH/SHC
• Professional services recommended for implementing ES on a SHC
• A SHC cannot have more than 100 nodes
Distributed Clustered Deployment - Multi-Site (M2 / M12)
To provide near-automatic disaster recovery in case of a catastrophic event (like a data center outage), multi-site clustering is the deployment architecture of choice. A healthy multi-site cluster requires acceptable inter-site network latency as specified in the Splunk documentation.

This topology allows you to deterministically replicate data to two or more groups (= sites) of indexer cluster peers by configuring the site replication and search factor. The site replication factor allows you to specify where replica copies are sent and ensures data is distributed across multiple failure domains.

A multi-site cluster is still managed by a single cluster manager, which has to be failed over to the DR site in case of a disaster.

Multi-site clustering provides data redundancy across physically separated distributed locations, with the possibility for geographically separated distribution (subject to the latency limits mentioned above).

Available search peer (indexer) capacity across sites can be utilized for search execution in an active/active model. Site affinity can be configured to ensure that users logged on to a specific site's search head(s) will only search local indexers.

Note for ES customers: If your category code is M12 (i.e. you intend to deploy the Splunk App for Enterprise Security), a single dedicated search head is required to deploy the app (this is not pictured in the topology diagram). For the ES search head, failover involves setting up a "shadow" search head in the failover site that is only activated and used in a DR situation. Please engage Splunk Professional Services to design and implement a site failover mechanism for your Enterprise Security deployment.

Limitations:
• No sharing of available Search Head capacity and no search artifact replication across sites
• Failure of management functions needs to be handled outside of Splunk in case of site failure
• Cross-site latency for index replication must be within recommended limits
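As an illustration of the site replication and search factor described above, here is a minimal multi-site sketch (Splunk Enterprise 8.x syntax); the site names, counts and hostnames are placeholders to adapt to your own failure domains.

server.conf on the cluster manager:

    [general]
    site = site1

    [clustering]
    mode = master
    multisite = true
    available_sites = site1,site2
    # Two copies in the originating site, three copies in total across sites
    site_replication_factor = origin:2,total:3
    site_search_factor = origin:1,total:2
    pass4SymmKey = <shared-cluster-secret>

server.conf on each indexer (peer), with the site value set per data center:

    [general]
    site = site2

    [clustering]
    mode = slave
    master_uri = https://cm.example.com:8089
    pass4SymmKey = <shared-cluster-secret>

Search heads join the cluster with mode = searchhead and their own site setting in the [general] stanza, which is also what enables the site affinity behavior described above.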
Distributed Clustered Deployment + SHC - Multi-Site (M3 / M13)
This topology utilizes a search head cluster to add horizontal scalability and remove the single point of failure from the search tier in each site. A minimum of three search heads is required to implement a SHC (per site).

One or more independent SHCs can be deployed to meet specific requirements, for example to run some of Splunk's premium apps that require dedicated search environments.

To manage the SHC configuration, an additional Splunk component called the Search Head Cluster Deployer is required for each SHC. This component is necessary in order to deploy changes to configuration files in the cluster. The Search Head Cluster Deployer has no HA requirements (no runtime role).

The SHC provides the following benefits:
a) Increased available search capacity beyond what a single search head can provide,
b) Scheduled search workload distribution across the cluster, and
c) Optimal user failover in case of a search head failure.

A network load-balancer that supports sticky sessions is required in front of the SHC members in each site to ensure proper load balancing of users across the cluster.

Note for ES customers: If your category code is M13 (i.e. you intend to deploy the Splunk App for Enterprise Security), a single dedicated search head cluster contained within a site is required to deploy the app (this is not explicitly pictured in the topology diagram). To be able to recover an ES SH environment from a site failure, third-party technology can be used to perform a failover of the search head instances, or a "warm standby" ES SH can be provisioned and kept in sync with the primary ES environment. It is strongly recommended to engage with Splunk Professional Services when deploying ES in a HA/DR environment.

Limitations:
• No search artifact replication across sites; SHCs are standalone
• Cross-site latency for index replication must be within documented limits
• A single SHC cannot contain more than 100 nodes
Splunk Indexer Architecture Options

Classic Indexer Architecture Using File System Storage
Note: Please refer to the Splunk documentation for information on supported storage and file system types.
This architecture tightly couples indexer compute and storage, is able to provide a consistent search performance profile
across all data at rest and has few external dependencies. While scaling out is relatively straightforward in this model, scaling
down is not easily possible and requires potentially time-consuming procedures. In clustered indexer topologies, Splunk
maintains multiple copies of the data across the configured retention period. This requires a potentially significant amount of
storage, especially when requirements for long-term data retention exist, and increases the TCO for the solution accordingly.
This architecture is recommended when you have requirements for either
• Short-term data retention (<=3 months) or
• Long-term retention and performance-critical search use cases that frequently access older historic data
SmartStore Indexer Architecture Using Object Storage
All data reliability responsibility is transferred to the object store to ensure lossless storage. That means the redundant bucket copies created in the classic architecture no longer need to be created by Splunk in the object store, leading to the substantial storage savings for clustered deployments mentioned above.
Note: You can find a more detailed architecture diagram for SmartStore here.
Splunk's own analysis of customer search profiles has shown that 95%+ of all searches are run over time periods of 7 days or less. If your use cases fall into the same category, SmartStore can be a great choice to provide you with more flexibility in scaling compute and storage at a lower TCO. For environments with small total data volume and short data retention, or environments where elasticity is not required, it is typically more economical to use the classic architecture.
To utilize the SmartStore indexer architecture, you will need an S3 API-compliant object store. This is readily available in AWS, but may not be available in other cloud and on-prem environments. This object store needs to provide at least the same availability as your indexing tier.
For details on how to implement SmartStore, please refer to the documentation here. For a
list of current SmartStore restrictions, please refer to the documentation here.
Important Note: If you intend to deploy SmartStore on-prem with an S3-compliant object store in a multi-site deployment
spanning data centers, please make sure you understand the specific requirements for the object store documented here.
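For orientation, the following indexes.conf sketch shows how an index is typically pointed at a remote object store with SmartStore. This is only an illustration under assumed names: the volume name, bucket, endpoint and index name are placeholders, and authentication (IAM role or access keys) depends on your environment. In clustered deployments this configuration is pushed from the cluster manager.

    # Define the remote object storage volume
    [volume:remote_store]
    storageType = remote
    path = s3://your-smartstore-bucket
    remote.s3.endpoint = https://s3.us-east-1.amazonaws.com

    # Point an index at the remote volume; local storage then acts as a cache
    [app_events]
    remotePath = volume:remote_store/$_index_name
    homePath = $SPLUNK_DB/app_events/db
    coldPath = $SPLUNK_DB/app_events/colddb
    thawedPath = $SPLUNK_DB/app_events/thaweddb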
Data Collection Architecture

Important Architectural Considerations and Why They Matter
Data is ingested properly (timestamps, line breaking, truncation): If data is not ingested properly because event timestamps and line breaking are not correctly configured, searching this data will become very difficult. This is because event boundaries have to be enforced at search time. Incorrect or missing timestamp extraction configurations can cause unwanted implicit timestamp assignment. This will confuse your users and make getting value out of your data much more difficult than it needs to be.

Data is optimally distributed across available indexers: The importance of ideal event distribution across indexers cannot be overstated. The indexing tier works most efficiently when all available indexers are equally utilized. This is true for both data ingest as well as search performance. A single indexer that handles significantly more data ingest compared to its peers can negatively impact search response times. For indexers with limited local disk storage, uneven event distribution may also cause data to be prematurely aged out before meeting the configured data retention policy.

All data reaches the indexing tier reliably and without loss: Any log data that is collected for the purpose of reliable analytics needs to be complete and valid, such that searches performed on the data provide valid and accurate results.

All data reaches the indexing tier with minimum latency: Delays in data ingest will increase the time between a potentially critical event occurring and the ability to search for and react to it. Minimal ingest latency is often crucial for monitoring use cases that trigger alerts to staff or incur automated action.

Data is secured while in transit: If the data is either sensitive or has to be protected while being sent over non-trusted networks, encryption of data may be required to prevent unauthorized third-party interception. Generally, we recommend all connections between Splunk components to be SSL enabled.

Network resource use is minimized: The network resource impact of log data collection must be minimized so as not to impact other business-critical network traffic. For leased-line networks, minimizing network utilization also contributes to a lower TCO of your deployment.

Authenticate/authorize data sources: To prevent rogue data sources from affecting your indexing environment, consider implementing connection authentication/authorization. This may be covered by using network controls, or by employing application-level mechanisms (e.g., SSL/TLS).
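As a concrete illustration of the "data is ingested properly" items above, a sourcetype is typically given explicit line-breaking, truncation and timestamp settings in props.conf on the parsing tier (heavy forwarders or indexers). This is only a sketch; the sourcetype name, patterns and formats are placeholders for your own data.

    [my_app:log]
    # Each line is one event; avoid expensive line merging
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)
    # Guard against runaway events
    TRUNCATE = 10000
    # Explicit timestamp extraction avoids implicit timestamp assignment
    TIME_PREFIX = ^
    TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
    MAX_TIMESTAMP_LOOKAHEAD = 30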
Because of its vital role for your deployment, the guidance in this document focuses on architectures that support ideal event distribution. When a Splunk environment does not provide the expected search performance, the cause is in almost all cases either unmet minimum storage performance requirements, uneven event distribution that limits search parallelization, or both.
Before we take a look at the recommended data collection architecture components, let's share some general guidance for the data collection tier.
Data Collection Topology Overview
The diagram above shows the Deployment Server (DS) in the management tier, which is used to manage the
configurations on data collection components. Also, the License Manager (LM) is shown here since data
collection nodes require access to the LM to enable Splunk Enterprise features. The Cluster Manager (CM), if
available, can be used by forwarders for indexer discovery, removing the need to manage available indexers in
the forwarder output configuration.
In the above diagram, AutoLB represents the Splunk built-in auto-load balancing mechanism. This mechanism is
used to ensure proper event distribution for data sent using the Splunk proprietary S2S protocol (default port
9997). Note: Using an external network load-balancer for S2S traffic is currently not supported and not
recommended.
To load-balance traffic from data sources that communicate with an industry-standard protocol (like HTTP or
syslog), a Layer 7 network load balancer is used to ensure even load and event distribution across indexers in the
indexing tier.
Data Collection Components

(HF) Heavy Forwarder
In general, HFs are not installed on endpoints for the purpose of data collection. Instead, they are used on
standalone systems to implement data collection nodes (DCN) or intermediary forwarding tiers. Use a HF only
when requirements to collect data from other systems cannot be met with a UF.
Examples of such requirements include:
• Reading data from RDBMS for the purpose of ingesting it into Splunk (database inputs)
• Collecting data from systems that are only reachable via an API (cloud services, VMware monitoring, proprietary systems, etc.)
• Providing a dedicated tier to host the HTTP event collector service (HEC)
• Implementing an intermediary forwarding tier that requires a parsing forwarder for routing/filtering/masking
(HEC) HTTP Event Collector

The management tier contains the license manager (required by HF) as well as the deployment server to manage the HTTP inputs on the listening components. Note: If the indexing tier is clustered and receives HEC traffic directly, HEC configuration is managed via the cluster manager instead of the deployment server.
The decision for which deployment topology you choose depends largely on your specific workload. For example,
if you need to provide connectivity for a very large number of HTTP clients, a dedicated listener tier that can
provide the required system resources may be needed to not starve indexers for those resources. On the other
hand, if a large volume of data is coming in from a relatively small number of producers, ingesting this traffic
directly on indexers is preferable. Also, consider that a dedicated HEC listener tier introduces another
architectural component into your deployment, so avoid it unless your requirements dictate otherwise. On the
positive side, it can be scaled independently and provides a level of isolation from the indexing tier from a
management perspective. Also, since the dedicated HEC tier requires a HF, it will parse all inbound traffic, removing some or all of that workload from the indexers.
On the other hand, hosting the HEC listener directly on the indexers will likely ensure better event distribution
across the indexing tier, because HTTP is a well-understood protocol for all network load balancers and the
appropriate load balancing policy can help ensure that incoming data is evenly spread across available indexers.
In the spirit of deploying the simplest possible architecture that meets your requirements, we recommend you
consider hosting your HEC listener on the indexers, assuming you have sufficient system capacity to do so (see
above). This decision can easily be reverted later if the need arises, simply by deploying an appropriately sized
and configured HF tier and changing the LB configuration to use the HF's IP addresses instead of the indexers'.
That change should be transparent to client applications.
Note: If you do require indexer acknowledgment for data sent via HEC, a dedicated HEC listener tier is
recommended to minimize duplicate messages caused by rolling indexer restarts.
Note: This HEC deployment architecture is also used for providing the transport for some of the other data
collection components discussed later, specifically Syslog and metrics data collection.
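To illustrate the HEC listener configuration discussed above, here is a minimal inputs.conf sketch for the listening tier (indexers or a dedicated HF tier; in a clustered indexing tier this would be pushed from the cluster manager). The token name, index and sourcetype are placeholders.

    [http]
    disabled = 0
    port = 8088
    enableSSL = 1

    [http://my_application_token]
    token = <generated-token-guid>
    index = app_events
    sourcetype = my_app:json
    # Set useACK = 1 only if you need indexer acknowledgment (see note above)
    useACK = 0

A client, or a member behind the load balancer, would then post events to the collector endpoint, for example:

    curl -k https://hec.example.com:8088/services/collector/event \
      -H "Authorization: Splunk <generated-token-guid>" \
      -d '{"event": "hello world", "sourcetype": "my_app:json"}'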
(IF) Intermediary Forwarding Tier
In a scenario with a single intermediary forwarder, all endpoints connect to this single forwarder (potentially
thousands), and the intermediary forwarder in turn only connects to one indexer at any given time. This is not an
optimal scenario because the following consequences are likely to occur:
• A large data stream from many endpoints is funneled through a single pipe that exhausts your system and
network resources
• Limited failover targets for the endpoints in case of IF failure (your outage risk is inversely proportional to the
number of IFs)
• A small number of indexers are served at any given point in time. Searches over short time periods will not
benefit from parallelization as much as they could otherwise
Intermediary forwarders also add an additional architecture tier to your deployment which can complicate
management and troubleshooting and adds latency to your data ingest path. Try to avoid using intermediary
forwarding tiers unless this is the only option to meet your requirements. You may consider using an intermediary
tier if you have:
• Sensitive data that needs to be obfuscated/removed before sending across the network to indexers. An
example is when you must use a public network
• Strict security policies that do not allow for direct connections between endpoints and indexers such as multi-
zone networks or cloud-based indexers
• Bandwidth constraints between endpoints and indexers requiring a significant subset of events to be filtered
• Requirements to perform event-based routing to dynamic targets
Consider sizing and configuration needs for any intermediary forwarding tier to ensure availability of this tier,
provide sufficient processing capacity to handle all traffic and support good event distribution across indexers.
The IF tier has the following requirements:
• Sufficient number of data processing pipelines overall
• Redundant IF infrastructure
• Properly tuned Splunk load-balancing configuration (for example autoLBVolume, EVENT_BREAKER, EVENT_BREAKER_ENABLE and, if needed, forceTimebasedAutoLB); a configuration sketch follows below

The general guideline suggests having twice as many IF processing pipelines as indexers in the indexing tier. Note: A processing pipeline does not equate to a physical IF server. Provided sufficient system resources (CPU cores, memory and NIC bandwidth) are available, a single IF can be configured with multiple processing pipelines.
If you need an IF tier (see questionnaire), default to using UFs for the tier, since they provide higher throughput at a lower resource footprint for both the system and the network. Use HFs only if the UF capabilities do not meet your requirements.
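A minimal sketch of the settings named above, as they might appear on an intermediary universal forwarder, follows. Server names, the sourcetype and the thresholds are illustrative only and should be tuned to your data volumes.

outputs.conf:

    [tcpout:primary_indexers]
    server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
    # Switch targets after roughly 1 MB to improve event distribution
    autoLBVolume = 1048576
    # Only if event-breaker-based switching is not sufficient
    # forceTimebasedAutoLB = true

props.conf (lets the UF switch indexers safely on event boundaries):

    [my_app:log]
    EVENT_BREAKER_ENABLE = true
    EVENT_BREAKER = ([\r\n]+)

server.conf (adds a second ingestion pipeline, assuming spare CPU cores are available):

    [general]
    parallelIngestionPipelines = 2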
(KAFKA) Consuming Log Data From Kafka Topics
The diagram shows Kafka publishers sending messages to the Kafka bus. The tasks hosted in the Kafka Connect cluster consume those messages via Splunk Connect for Kafka and send the data to the HEC listening service through a network load balancer. Again, the HEC listening service can be either hosted directly on the indexers, or on a dedicated HEC listener
tier. Please refer to the HEC section for details. Management tier components are only required if a dedicated HF tier is
deployed to host HEC listeners.
(KINESIS) Consuming Log Data From Amazon Kinesis Firehose
The diagram shows AWS log sources being sent using a Kinesis stream to the Firehose, which with proper
configuration will send the data to the HEC listening service via an AWS ELB. Again, the HEC listening service
can be either hosted directly on the indexers, or on a dedicated HEC listener tier. Please refer to the HEC section
for details.
Management tier components shown are only required if a dedicated HF tier is deployed to host HEC listeners.
(METRICS) Metrics Collection
Statsd currently supports UDP and TCP transport, which you can use as a direct input on a Splunk forwarder or indexer. However, it is not a best practice to send TCP/UDP traffic directly to forwarders in production, as that architecture is not resilient and is prone to event loss (see the syslog collection section) caused by required Splunk forwarder restarts.
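For illustration, a direct statsd input on an indexer or dedicated collection instance might look like the following sketch. The port and index names are placeholders, and the target index must be created as a metrics index.

    [udp://8125]
    sourcetype = statsd
    index = app_metrics
    # Keep the original payload intact rather than prepending a timestamp/host header
    no_appending_timestamp = true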
(SYSLOG) Syslog Data Collection

Splunk Connect for Syslog (SC4S - recommended best practice)
The diagram below shows syslog sources sending data to the SC4S instance that is physically closest to them. Events from a source for which SC4S has a 'filter' log path configuration will be identified, parsed and formatted into JSON before being streamed to Splunk via HTTP/HTTPS. Best practice is to implement a network load balancer between SC4S and the indexers to achieve the best data distribution possible. The events hitting the indexer are already prepared, so in most cases there is no need for an add-on on the indexer.
Syslog (File monitoring in conjunction with a syslog server)
The diagram shows syslog sources sending data using TCP or UDP on port 514 to a load-balanced pool of syslog
servers. Multiple servers ensure HA for the collection tier and can prevent data loss during maintenance
operations. Each syslog server is configured to apply rules to the syslog stream that result in syslog events being
written to dedicated files/directories for each source type (firewall events, OS syslog, network switches, IPS, etc.).
The UF that is deployed to each server monitors those files and forwards the data to the indexing tier for
processing into the appropriate index. Splunk AutoLB is used to distribute the data evenly across the available
indexers.
The deployment server shown in the management tier can be used to centrally manage the UF configuration.
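A minimal inputs.conf sketch for the UF on each syslog server follows. The directory layout, sourcetypes, indexes and the host_segment value are placeholders that assume the syslog daemon writes one directory per device class and one subdirectory per sending host.

    # e.g. /var/log/remote/firewall/<hostname>/firewall.log
    [monitor:///var/log/remote/firewall/*/firewall.log]
    sourcetype = fw:syslog
    index = network
    # Use the 5th path segment (the hostname directory) as the event host
    host_segment = 5

    # e.g. /var/log/remote/os/<hostname>/messages
    [monitor:///var/log/remote/os/*/messages]
    sourcetype = syslog
    index = oslogs
    host_segment = 5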
Splunk UDP Input
Finally, where possible, UDP is preferred over TCP for syslog sources. While TCP can recover in some cases
(single packet loss at high throughput), the recovery is more than likely to result in the loss of more than one
event.
High-Availability Considerations for Forwarding Tier Components

Forwarding tier
At the forwarding (endpoint) tier, HA for the agent itself is dependent upon the underlying OS. At the very
minimum, you should ensure that any services that implement forwarding functionality are restarted automatically
when the host OS restarts. Outside of that, best practices for the forwarders would involve the configuration and
proper use of AutoLB from the forwarders to multiple indexers. This also may involve use of the indexer
acknowledgement feature in order to guarantee data arrives at the indexing tier at least once (potential duplicates
with indexer acknowledgement must be handled when searching).
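The AutoLB and indexer acknowledgement settings mentioned above live in outputs.conf on the forwarder; a minimal sketch with placeholder indexer names follows.

    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
    # Rotate across indexers to support even event distribution
    autoLBFrequency = 30
    # Guarantee at-least-once delivery; expect and handle occasional duplicates at search time
    useACK = true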
Design Principles and Best Practices
Deployment Tiers
SVA design principles cover all of the following deployment tiers:
Tier                    Definition
Search                  Search heads
Indexing                Indexers
Collection              Forwarders, modular inputs, network inputs, HEC (HTTP Event Collector), etc.
Management / Utility    CM, DS, LM, DMS, SHC-D
Search Tier Recommendations

Design principles / best practices (your requirements will determine which practices apply to you):
1. Keep the search tier close (in network terms) to the indexing tier. Any network delays between the search and indexing tiers have a direct impact on search performance.

2. Avoid using multiple independent search heads. Independent search heads do not allow sharing of Splunk artifacts created by users. They also do not scale well with respect to resource utilization across the search tier. Unless there is a specific need to have isolated search head environments, Search Head Clustering is a better option to scale.

3. Exploit Search Head Clustering when scaling the search tier. A search head cluster replicates user artifacts across the cluster and allows intelligent search workload scheduling across all members of the cluster. It also provides a high availability solution.

4. Forward all search heads' internal logs to the indexing tier. All indexed data should be stored on the indexing tier only. This removes the need to provide high-performing storage on the search head tier and simplifies management. Note: This also applies to any other Splunk roles.

5. Consider using LDAP authentication whenever possible. Centrally managing user identities for authentication purposes is a general enterprise best practice, simplifies management of your Splunk deployment and increases security.

6. Ensure enough cores to cover concurrent search needs. Every search requires a CPU core to execute. If no cores are available to run a search, the search will be queued, resulting in search delays for the user. Note: Applicable to the indexing tier as well.

7. Utilize scheduled search time windows where possible and smooth the scheduled search load. Often, scheduled searches run at specific points in time (on the hour, 5/15/30 minutes after the hour, at midnight). Providing a time window that your search can run in helps avoid search concurrency hotspots (a scheduling sketch follows below).
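As an illustration of practice 7, a scheduled search can be given a schedule window in savedsearches.conf, which lets the scheduler shift its start time to avoid concurrency hotspots. The search name, cron schedule and SPL below are purely illustrative.

    [Hourly Error Summary]
    enableSched = 1
    cron_schedule = 0 * * * *
    # Allow the scheduler to start this search up to 15 minutes late if needed
    schedule_window = 15
    dispatch.earliest_time = -1h
    dispatch.latest_time = now
    search = index=app_events log_level=ERROR | stats count by component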
Summary & Next Steps
This whitepaper has also covered the three-step Splunk Validated Architectures selection process:
• Definition of requirements,
• Choosing a topology, and
• Applying design principles and best practices,
and has covered both the search and indexing tiers as well as the data collection tier.
Now that you are familiar with the multiple benefits of Splunk Validated Architectures, we hope you are ready to
move forward with the process of choosing a suitable deployment topology for your organization.
Next Steps
So, what comes after choosing a Validated Architecture? The next steps on your journey to a working
environment include:
• Customizations
Consider any necessary customizations your chosen topology may need to meet specific requirements
• Deployment Model
Decide on deployment model (bare metal, virtual, cloud)
• System
Select your technology (servers, storage, operating systems) according to Splunk system requirements
• Sizing
Gather all the relevant data you will need to size your deployment (data ingest, expected search volume, data
retention needs, replication, etc.) and consult with a Splunk expert to size your environment
• Staffing
Evaluate your staffing needs to implement and manage your deployment. This is an essential part of building
out a Splunk Center of Excellence
We are here to assist you throughout the Validated Architectures process and with next steps. Please feel free to
engage your Splunk Account Team with any questions you might have. Your Account Team will have access to
the full suite of technical and architecture resources within Splunk and will be happy to provide you with further
information.
Happy Splunking!
Appendix
This section contains additional reference information used in the SVAs.
Appendix "A": SVA Pillars Explained

Availability
The ability to be continuously operational and able to recover from planned and unplanned outages or disruptions.
1. Eliminate single points of failure / add redundancy
2. Detect planned and unplanned failures/outages
3. Tolerate planned/unplanned outages, ideally automatically
4. Plan for rolling upgrades

Performance
The ability to effectively use available resources to maintain an optimal level of service under varying usage patterns.
1. Add hardware to improve performance: compute, storage, memory
2. Eliminate bottlenecks 'from the bottom up'
3. Exploit all means of concurrent processing
4. Exploit locality (i.e. minimize distribution of components)
5. Optimize for the common case (80/20 rule)
6. Avoid unnecessary generality
7. Time-shift computation (pre-compute, lazily compute, share/batch compute)
8. Trade certainty and accuracy for time (randomization, sampling)

Scalability
The ability to ensure that the system is designed to scale on all tiers and handle increased workloads effectively.
1. Scale vertically and horizontally
2. Separate functional components that need to be scaled individually
3. Minimize dependencies between components
4. Design for known future growth as early as possible
5. Introduce hierarchy in the overall system design

Security
The ability to ensure that the system is designed to protect data as well as configurations/assets while continuing to deliver value.
1. Design for a secure system from the start
2. Employ state-of-the-art protocols for all communications
3. Allow for broad-level and granular access to event data
4. Employ centralized authentication
5. Implement auditing procedures
6. Reduce the attack or malicious use surface area

Manageability
The ability to ensure the system is designed to be centrally operable and manageable across all tiers.
1. Provide a centralized management function
2. Manage configuration object lifecycle (source control)
3. Measure and monitor/profile application (Splunk) usage
4. Measure and monitor system health
Appendix "B": Topology Components
License Master (LM): The license master is required by other Splunk components to enable licensed features and track daily data ingest volume. The license master role has minimal capacity and availability requirements and can be colocated with other management functions. It can be virtualized for easy failure recovery.

Search Head Cluster Deployer (SHC-D): The search head cluster deployer is needed to bootstrap a SHC and manage Splunk configuration deployed to the cluster. The SHC-D is not a runtime component and has minimal system requirements. It can be colocated with other management roles. Note: Each SHC requires its own SHC deployer function. It can be virtualized for easy failure recovery.

Search Head (SH), search tier: The search head provides the UI for Splunk users and coordinates scheduled search activity. Search heads are dedicated Splunk instances in distributed deployments. Search heads can be virtualized for easy failure recovery, provided they are deployed with appropriate CPU and memory resources.

Search Head Cluster (SHC), search tier: A search head cluster is a pool of at least three clustered search heads. It provides horizontal scalability for the search head tier and transparent user failover in case of outages. Search head clusters require dedicated servers of ideally identical system specifications. Search head cluster members can be virtualized for easy failure recovery, provided they are deployed with appropriate CPU and memory resources.
Indexer, indexing tier: Indexers are the heart and soul of Splunk. They process and index incoming data and also serve as search peers to fulfill search requests initiated on the search tier. Indexers must always be on dedicated servers in distributed or clustered deployments. In a single-server deployment, the indexer will also provide the search UI and license master functions. Indexers perform best on bare metal servers or in dedicated, high-performance virtual machines, if adequate resources can be guaranteed.

Forwarders and other data collection components, data collection tier: General icon for any component involved in data collection. This includes universal and heavy forwarders, network data inputs and other forms of data collection (HEC, Kafka, etc.).

Splunk Connect for Syslog (SC4S), data collection tier: SC4S is the current best practice approach for syslog data collection. We created a dedicated icon for SC4S to reflect a fundamentally different, containerized deployment model for this data collection tier component.
Splunk, Splunk>, Data-to-Everything, D2E and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names or trademarks belong to their respective owners. © 2020 Splunk Inc. All rights reserved.