Selecting The Correct Workstation Dell
While professional workstations are similar to PCs in some ways, there are major design differences that address the professional technical user and make workstations a vital tool to deliver maximum productivity for mission-critical applications. Dell Precision workstations deliver maximum performance, reliability and maintainability for engineers and designers to deliver the highest return on investment possible for their organizations.

Workstation technology is constantly evolving to bring new levels of performance in each generation, so it is important to understand how to take advantage of this evolution to maximize your productivity. But not all workstation feature sets are the same. That’s because the types of applications for which workstations are used vary depending on the discipline and applications. For example, engineers using computer-aided design or engineering (CAD/CAE) tools need different capabilities than financial analysts using software to model new derivative types.

This e-guide aims to help those making workstation purchase decisions for their organizations. First, it provides background on how workstations can enhance user productivity for engineering simulation workflows and why that’s important. It briefly looks at the latest generational advancements that might drive upgrade or replacement decisions. Then, it identifies the core principles to help guide buying decisions and ensure the right workstation fit.

In short, these principles support this overarching consideration: Application and workload requirements should define workstation specifications.
Every computer user inherently experiences productivity losses while waiting for their computer to respond: the coffee cup syndrome. Losing your train of thought is a sure way to derail your concentration and creativity. By maintaining focus, a user becomes much more productive, as illustrated in Figure 1.

FIGURE 1: The quicker a workstation’s response time, the more productive its user can be. (Axes: productivity versus response time.)

No matter what the phenomenon’s name, system latency can reduce productivity and limit users’ creativity. That’s because they are less likely to try out different approaches if they know that they’ll incur a time penalty for attempting one. For example, running an analysis of a product design can take several minutes or even hours depending on a workstation’s processing power. With sufficient processing power, a workstation can minimize this time and liberate the user to try different designs to find optimal solutions.

The implication is that, extended over an entire working year and an entire organization, companies can save months of engineering time by using the latest generation workstations. The result would be better design, faster time-to-market and first-mover competitive advantages.

WORK TRANSFORMATIONS
Today’s professionals operating in complex fields of architecture, engineering, construction, energy, financial services, biotech/pharma, and media and entertainment, among many others, are experiencing digital transformations in how they work. Emerging technologies, such as artificial intelligence, machine learning, advanced analytics, GPU acceleration, augmented reality and virtual reality, are changing the competitive playing field by enabling these workers to pursue new methods, processes and even business models.

With ever more powerful workstations, whether in fixed or mobile configurations, the stage has been set to blend digital and physical workspaces. For example, CAD/CAE applications with virtual-reality features can help designers and engineers visualize their models in full-scale 3D, so they can evaluate and share the effectiveness of their designs in a much more realistic fashion.

This blending can help remove much of the latency inherent not only in human-machine interactions, but also along entire processes. This can greatly accelerate time-to-market for a product or service, sharpen its competitive positioning, and ultimately improve the profitability of the product or service throughout its lifecycle. That’s especially critical going forward as the lifecycles of products and services are increasingly compressed, shortening their profit windows, as illustrated in Figure 2.

FIGURE 2: Shorter profit window due to compressed product lifecycle. (Chart of cost and cash flow across the introduction, growth, maturity and decline phases, with the shorter profit window marked.)

BEYOND CAD/CAE
To use another example, imagine the digital transformation of a product development process through the eyes of a product design and engineering team.

As always, the team will use CAD and CAE in their work. Now, however, a digital twin (a digital replica of a physical asset including processes and systems) can integrate machine learning and analytics on their workstations to make rapid suggestions about the most efficient designs, given parameters of available components, costs, manufacturing feasibility, and lifecycle data.
The Key Criteria for Selecting the
Right Workstation for Simulation Workflows
The most important factor in procuring the right workstation is knowing
how it will be used. The intended purpose determines which components
are critical to performance and which are optional or even unnecessary. In
addition, the more you know about how the workstation will be used, the
more performance can be achieved per dollar spent. By first identifying the
various modes of use and then weighing the importance and frequency of
those tasks, you can better determine the right workstation for the job.
(Figure: simulation workflow stages: Pre-Processing, Solving, Post-Processing.)
FIGURE 5: Weighing core count versus frequency. (Chart labels: Life of the Workstation; Interactive; Computational; Needs more cores; Needs higher frequency; Needs both.)

Figure 6 illustrates the effects of the CPU architecture of the Intel® Xeon® processor on graphics performance. For example, the Intel Xeon W Series and Intel Core™ i processor product families of CPUs typically include the latest micro-architecture and higher frequencies.

Note, the Intel Xeon processor SP family is usually either one micro-architecture revision or one die shrink behind the Core i Series of CPUs at any given point in time.

Finally, the dual-socket Intel Xeon processor SP family of CPUs incurs a slight performance penalty in single-threaded workloads, such as interactive applications feeding the GPU. While it is best to maximize core counts for computational workloads, interactive usage models provide the best performance with the highest CPU frequency. This is because interactivity (as measured by frames per second) is often limited by the efficiency of a single core to feed the GPU with instructions and data.

Most modern graphics programming interfaces today can only feed data and instructions to the GPU using a single thread, despite the GPU driver being multi-threaded. As a result, performance benefits with increasing core count are negligible beyond four cores. The more time spent in interactive usage models, the more of the workstation budget should be used to increase the maximum CPU frequency.

(Figure 6 chart: Graphics Performance in frames per second for Xeon W-Series, Xeon SP-Series and Dual Xeon SP-Series.)

For most simulation applications or computational workloads, it makes sense to sacrifice some graphics performance using Xeon SP Series processors to gain the [...]

[...] frequencies are possible when only a single core is active; the lower frequencies are used when many or even all cores are active.

This dynamic clocking allows interactive workloads to operate at peak Turbo frequencies, while computational multi-core workloads will operate at lower frequencies, which allows a single workstation to provide a balanced solution. This is important because comparing the nominal frequency of two CPUs (or their “rated frequency,” commonly quoted alongside the model name) isn’t always representative of the frequency they will be operating at most of the time.

To more precisely compare CPUs, you should compare the low-frequency mode (LFM), high-frequency mode (HFM), minimum Turbo frequency (all cores loaded), and maximum Turbo frequency with one core loaded. More specifically, here is what’s important to consider:

• LFM: Decide if power efficiency at idle is critical. If the CPU isn’t doing any work, how important is it that it consumes as little power as possible?
• HFM: If the CPU doesn’t support Turbo acceleration, this comparison is important.
• Minimum Turbo frequency: Compare this spec if the CPU will spend most of its time running computational or multi-threaded workloads.
• Maximum Turbo frequency: Compare this spec if the CPU will spend most of its time running interactive or single-threaded workloads.
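The frequency comparison rules in the CPU discussion above can be sketched in a few lines of Python. This is only an illustration: the `CpuSpec` fields, the two example CPUs and their frequencies, and the 50 percent cut-off used to decide what "most of its time" means are all assumptions made for the example, not vendor data.

```python
from dataclasses import dataclass


@dataclass
class CpuSpec:
    """Frequency figures for one CPU, in GHz (hypothetical example inputs)."""
    name: str
    lfm_ghz: float        # low-frequency mode, CPU idle
    hfm_ghz: float        # high-frequency mode, the rated frequency
    min_turbo_ghz: float  # Turbo frequency with all cores loaded
    max_turbo_ghz: float  # Turbo frequency with one core loaded
    supports_turbo: bool = True


def key_frequency(cpu: CpuSpec, interactive_share: float) -> float:
    """Return the frequency spec that matters most for the given usage mix.

    interactive_share is the fraction of time (0.0 to 1.0) spent in
    interactive or single-threaded work. The 0.5 cut-off is an assumption.
    """
    if not 0.0 <= interactive_share <= 1.0:
        raise ValueError("interactive_share must be between 0 and 1")
    if not cpu.supports_turbo:
        return cpu.hfm_ghz        # no Turbo: compare HFM
    if interactive_share >= 0.5:
        return cpu.max_turbo_ghz  # mostly interactive: one-core Turbo
    return cpu.min_turbo_ghz      # mostly computational: all-core Turbo


def pick_cpu(cpus: list[CpuSpec], interactive_share: float) -> CpuSpec:
    """Pick the CPU with the highest relevant frequency for this usage mix."""
    return max(cpus, key=lambda cpu: key_frequency(cpu, interactive_share))


if __name__ == "__main__":
    cpu_a = CpuSpec("cpu_a", 1.2, 3.0, 3.4, 4.6)  # hypothetical numbers
    cpu_b = CpuSpec("cpu_b", 1.2, 2.4, 3.6, 4.0)  # hypothetical numbers
    print(pick_cpu([cpu_a, cpu_b], interactive_share=0.8).name)
    print(pick_cpu([cpu_a, cpu_b], interactive_share=0.2).name)
```

The sketch deliberately ignores core count, cache and price, which a real comparison would also weigh; it only shows how the usage mix changes which published frequency is worth comparing.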
CRITERIA #4: GPU Selection
In general, graphics cards’ speed correlates to price. Graphics speed is most commonly associated with real-time rendering performance as measured in frames per second. The greater the frames per second rates are in an application, the more fluid users’ interactions will be, boosting productivity of those users. Computational capabilities aside, finding the right graphics solution for a workstation depends on the desired frames per second in the applications that will be most used.

A good rule of thumb for graphics performance is to look for a GPU card that can deliver more than 30 frames per second in the most important applications, referencing data models and rendering modes most like those in a workstation’s day-to-day use.

While the persistence of vision phenomenon suggests that 25 frames per second is the minimum required to maintain the illusion of smooth animation, more is usually better. Note there are special considerations for virtual and augmented reality, where frame rates should consistently be above 90 FPS to prevent VR sickness.

If a graphics card can deliver more than 100 frames per second in a particular rendering method using a specific model size and type, it is reasonable to assume that the complexity and size of a model can be increased, and users will still be able to interact with that model without observable stuttering.

Figure 7 illustrates that graphics performance can vary across the same application type. To select the best GPU choice for graphics, it is best to evaluate performance within the application or applications used on the workstation.

FIGURE 7: Graphics performance can vary depending on a particular application. (Chart: Graphics Performance in frames per second for several GPUs across 3dsmax, CATIA, Creo and SolidWorks.)

While graphics performance generally scales well moving up to higher-class graphics cards (e.g., higher core clocks, faster memory, more GPU cores), you cannot look at aggregate performance across many applications and decide based on this factor alone. This is because two graphics cards in the same class can provide very different performance levels with the same application.

In fact, graphics cards in the same class can perform quite differently in the same application simply by changing the complexity of the data set or the rendering mode. The recommended solution is to identify the class of GPU targeted for the workstation, and then measure each of the cards in that class to determine which provides the best performance for a specific use.

BENCHMARKING WITH SPECviewperf
One tool to assist in this performance comparison is SPECviewperf (available at SPEC.org). It provides a good benchmark for comparing different workstation graphics cards because it measures the frames per second of several varied workloads using rendering methods that mirror those of many popular workstation applications.

With this benchmark, anyone can view the detailed frames per second measurements of several different methods of rendering and compare graphics card performance based on published results. The benchmark also provides representative screen captures of the image quality of these methods. For example, if someone is a frequent user of PTC Creo, a 3D CAD program, this benchmarking data could be used to compare how one GPU card performs versus another, not just with Creo, but also specifically with a data model and rendering mode that best represent their particular use of Creo. See Figure 7 for details.

When considering which GPU card is best for a particular workstation, it’s important to consider the amount of time that the workstation will spend in either highly interactive work or in computational work that utilizes the GPU. The more time spent in these usage types, the more of the workstation budget should be spent on graphics. Conversely, the less time spent in these usage types, the more of the workstation budget should be spent on other components such as the CPU, memory, and storage.

GPU ACCELERATED COMPUTATION
Like the question of whether a second CPU socket is necessary, some computational workloads may scale in performance by using the GPU as a computational resource. That’s why it’s important to determine if an application can use the GPU for computation and what type of GPU might be required. Some applications that utilize GPU acceleration include MathWorks MATLAB, ANSYS Fluent, Mechanical, Discovery Live (see next page), Autodesk MoldFlow, Dassault Systèmes SIMULIA, LSTC LS-DYNA Implicit, MSC NASTRAN and Altair OptiStruct/RADIOSS™.

With the advent of General Purpose GPU computing (sometimes referred to as GPGPU), software applications are increasingly supporting compute by porting existing solver methods to GPU, as well as including new methods and algorithms that are well-suited to GPU parallelism. Apart from computing, GPUs with large frame buffers also help in visualization by handling large datasets in the pre-processing and post-processing stages of simulation. Customer adoption has grown as simulation visualization and computing needs can be effectively handled with a single CPU socket and a GPU, as opposed to dual socket CPUs with many cores. Also, with the given software investment, more simulations can be run, thus helping boost innovation. To learn more about all the CAE applications for modeling, visualization and computing, please see GPU Accelerated Applications catalog.
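The frame-rate rules of thumb in the GPU selection criteria above can be captured in a small helper. The following Python sketch is an illustration only: the card names, workload names and measured values in the example are hypothetical placeholders for results you would gather yourself, for instance from published SPECviewperf scores or your own application tests.

```python
def assess_frame_rate(fps: float, vr: bool = False) -> str:
    """Classify a measured frame rate using the guide's rules of thumb."""
    if vr:
        # VR/AR: frame rates should stay above 90 FPS.
        return "ok for VR/AR" if fps > 90 else "too low for VR/AR"
    if fps > 100:
        # Above 100 fps there is room to grow model size and complexity.
        return "headroom for larger models"
    if fps > 30:
        return "meets the 30 fps rule of thumb"
    if fps >= 25:
        return "minimum for smooth animation"
    return "below smooth-animation minimum"


def best_card(results: dict[str, dict[str, float]], workload: str) -> str:
    """Return the card with the highest fps for one specific workload.

    results maps card name -> {workload name -> measured fps}.
    """
    return max(results, key=lambda card: results[card][workload])


if __name__ == "__main__":
    # Hypothetical measurements for two cards in the same class.
    measured = {
        "gpu_a": {"creo_shaded": 58.0, "catia_wireframe": 112.0},
        "gpu_b": {"creo_shaded": 71.0, "catia_wireframe": 95.0},
    }
    for workload in ("creo_shaded", "catia_wireframe"):
        winner = best_card(measured, workload)
        fps = measured[winner][workload]
        print(workload, winner, assess_frame_rate(fps))
```

Comparing per workload rather than on an aggregate score reflects the point made above: cards in the same class can rank differently from one application or rendering mode to the next.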
BRINGING REAL-TIME SIMULATION EARLY INTO DESIGN
In early 2018, engineering simulation software developer ANSYS announced availability of an innovative product called Discovery Live. This new tool offers users the ability to view in real time how variations to their design will affect simulation so they can quickly explore dozens of design options early in the design process.

Because it is easy to use, Discovery Live brings simulation capability to a broader range of engineers, not just simulation experts, so that all users can perform rapid, simple simulation to assess product design modifications. Testing multiple iterations quickly prior to submitting models for final validation with complex CAE simulation helps designers and engineers to optimize product designs and reduce costs. One example is evaluating the effects of reducing material thickness to “lightweight” the design of a complex bracket. By introducing simulation during the conceptual design phase, manufacturers can shorten the product development process, save money, and speed time to market.

FIGURE 8: ANSYS Discovery Live instantaneous structural analysis

This breakthrough software is GPU-accelerated and powered by NVIDIA CUDA™ technology. That means performance scales up across the NVIDIA Quadro GPU product line. The more powerful the GPU, the faster Discovery Live runs. ANSYS recommends a dedicated NVIDIA GPU with a minimum 4GB of frame buffer, although 8GB is preferred. This is something to keep in mind when selecting the right workstation for simulation.

Since the GPU is doing most of the processing, a CPU with a faster clock speed and fewer cores (four) is ideal for maximum performance. This configuration is, for the most part, very similar to a standard interactive CAD workstation recommendation, but with a higher-end graphics card.
CRITERIA #5: Memory Selection
It’s often said that users can never have too much random-access memory (RAM). Although that adage may be true for modern multi-core systems running massively multi-threaded applications, it is still important to weigh other factors when considering which type of memory to include in the workstation.

For computational workloads, workstations should typically have the maximum amount of memory bandwidth available to the processing cores. So, if given the choice between populating eight DIMMs, each with 8GB capacity, or four DIMMs, each with 16GB capacity, choose the option that populates more DIMM slots. The increase in available memory bandwidth will reduce the likelihood that memory bandwidth becomes a bottleneck for computational workloads, shifting the computational burden back to the CPU cores, frequency, and cache.

Choosing the right frequency of RAM is also important and varies depending on the workload. In applications requiring maximum memory bandwidth, it is best to populate all available DIMM slots with the highest-frequency memory. In some cases, however, applications may require the least latency possible, regardless of available bandwidth, so populating all available DIMM slots with the lower-frequency memory is a better approach. Although memory bandwidth remains important, the lower latency of slower memory speeds can provide benefits to these types of random reads and writes.

Figure 9 illustrates this concept using SPECwpc computational workloads. It shows how a Dell Precision workstation with dual Intel® Xeon® processors can produce significantly higher scores as the number of DIMMs increases, all else being equal.

FIGURE 9: SPECwpc improvements with increased DIMM slot population. (Chart: Memory Bandwidth Comparison, in percent of baseline, for 4 x 8GB 1866 MHz, 8 x 2GB 1600 MHz and 16 x 8GB 1866 MHz configurations across the Prod. Dev. Rodinia, Prod. Dev. WPCcfd, Life Sci. Lammps and Energy FFTW workloads.)

MORE IS BETTER
In practical terms, this translates to faster times to complete jobs such as rendering, finite element analysis, and computational fluid dynamics. It should be noted that the benefits are specific to computational workloads. However, the benefits of populating more DIMM slots for interactive workloads are difficult to measure, because most graphics workloads fit into graphics memory and most graphics cards have dedicated memory on the card.

Finally, when the integrity of data used in individual computations is paramount to the result, Error Correcting Code (ECC) memory should be used. Intel Xeon CPUs support this important function (although Intel Core™ processors do not). For example, one mistake missed in early computations can have a dramatic impact on the outcome when iterating across a large data set where the outputs of computations are continually provided as inputs into another computational sequence.

In Dell Precision workstations, ECC is enhanced using Dell’s exclusive, patented Reliable Memory Technology (RMT) Pro. This feature uses the ECC data to identify specific locations in the DIMM where errors are occurring (if they occur). This is yet another example of how workstations differ from PCs.

RMT Pro can also identify an error at a memory-bit location and automatically mask that location in memory to ensure that a healthy location is used for subsequent reads and writes. The effect of this is that, rather than replacing an entire DIMM as ECC errors become overwhelming, RMT masks only the small regions of concern. This allows the DIMM to operate normally, increasing system availability and reliability and eliminating “blue screen” reboot situations. It also extends the usable life of the memory by allowing the user to continue to work using a DIMM that would otherwise require immediate replacement.

To learn more see Dell Reliable Memory Technology

CRITERIA #6: Storage Selection
Many different storage performance considerations depend completely on a workstation’s usage model. Is the data on the network or stored locally on the workstation? If on the network, how frequently are updates committed to the network resource? If locally, how much capacity is required? And is redundancy required for critical data?

These factors are important in determining the right storage components for the workstation. Network bandwidth, frequency of updates and check-in/check-out procedures can affect the use cases of local data storage. The following are three common local storage use cases that a single user of the workstation will have, usually with some blend of them:
• Office Productivity – reading and writing small files with occasional large-file transfers
• Interactive Workstation – opening and saving a wide variety of file sizes
• Computational Workstation – iterating across big data sets, generating large temporary files

Each of these has its own requirements. Fortunately, the following storage technologies have evolved to help address them:

Office Productivity storage
Optimizing for the Office Productivity use case is as simple as weighing anticipated capacity needs with the highest-performing drive class within the budget. While rotational hard-disk drives (HDDs) have traditionally dominated this segment, in recent years the decreasing cost of MLC (multilevel cell) storage and controllers has brought the more favorable solid-state drives (SSDs) and hybrid (HDD-SSD) drives within reach of more users.

In general, hybrid hard drives provide the best price/performance for this use case, while SSDs provide the best outright performance. Hybrid hard drives function by storing the most frequently used data in cache, because it is faster to access than from the rotating media in the drive. As long as the files in use are relatively small, data is kept on the flash memory, resulting in faster performance.

Figure 10 shows the typical scaling one might expect across the various usage models and drive types. Traditional spinning hard drives (HDD) provide large capacities but are the slowest in performance. Their differences are primarily related to disk interface type, rotational speed and buffer size.

FIGURE 10: Single-drive storage performance comparison. (Chart: Single Drive Performance, relative to baseline, for SATA HDD, SAS HDD, Hybrid SSHD, SATA SSD and PCIe SSD across the Office Productivity, Interactive Workstation and Computational Workstation use cases.)

For detailed information see Precision Workstation Storage Classification

For hybrid drives, performance can vary greatly depending on the workload. The more deterministic and repetitive the workload, the better a hybrid storage model performs. However, hybrid drives are limited by the size of their flash cache, which means that computational workloads that iterate across a large data set in a non-deterministic way can provide less performance benefit than office productivity uses.

Solid-state drives provide significant benefits in read and write performance over rotational drives, but do not offer the same capacity as HDDs. Current SATA interface SSDs are limited by the SATA 3.0 specification to 6 Gb/sec (about 600 MB/sec). A PCIe interface SSD can enhance performance significantly due to the increased bus speed. SSDs are highly recommended for workstation applications.

Interactive Workstation storage
An Interactive Workstation usage model requires greater storage performance, which is where SSDs, SAS drives and RAID arrays can play more important roles. For starters, if a single SSD can provide the capacity needs of both Office Productivity and Interactive Workstation use cases, it will be the best-performing workstation option short of a multi-drive RAID 0 array.

RAID arrays enable the creation of a large virtual drive that spans one or more physical (or logical) drives. Depending on the RAID type, greater performance and new features such as redundancy are possible. If redundancy is more important than performance (or just as important), having a RAID array configured as RAID 1, 10 or 5 would be a better choice.

If that’s the case, a decision must be made between available matching drives to build the array. Note, however, that implementing a RAID storage model can increase related costs considerably, making it prohibitively expensive to include high-performing drives in the array. One way to mitigate this cost while maintaining high performance is to use an SSD boot drive with the operating system and applications on it, while building a RAID array out of lower-cost rotational HDDs to store larger data sets.

Computational Workstation storage
Computational Workstation use cases typically require the manipulation of large data sets. If the data set size exceeds a single drive capacity, the only option is a RAID array composed of large drives. By using RAID technology to combine smaller-capacity drives into a single large volume, an application can use all this capacity as if it were a single large drive.

Multiple drives in RAID 0 will maximize performance and capacity but provide no redundancy, and, therefore, the array can be less reliable than a single drive. Multiple drives in RAID 1 provide redundancy but don’t maximize performance or capacity. RAID 10 increases performance and capacity, and it adds redundancy, but it is the costliest in terms of the number of drives required.

RAID 5 provides increased performance and capacity, while adding redundancy with fewer drives than RAID 10. But RAID 5 requires more processing overhead to manage the array due to the computation of parity data that is then
distributed across the array. Note: RAID 5 is not recommended on modern workstations with large capacity storage due to chance of rebuild failures.

CONSIDER A CONTROLLER
When considering whether to add a fourth drive to an integrated storage controller and creating RAID 10, consider the option to upgrade to a discrete RAID controller with on-board memory and move to RAID 5. Typically, the higher capacity and the addition of a discrete RAID controller can translate to higher performance across all three workstation storage use cases.

[...] workstations and conventional PCs is that workstations are certified to run specific applications, such as those for CAD, CAE, machine learning, artificial intelligence and advanced analytics. This is especially important given the complexity of a workstation’s software environment. It can involve a vast number of variations in operating system and application versions, as well as in hardware, including CPUs and GPUs, firmware and driver versions. All these variables can affect application stability and performance.

Professional workstation GPUs are typically the only GPUs that are certified, because they have professional drivers and applications are highly tuned to deliver performance, reliability and stability beyond consumer cards.

[...] performance via customer application profiles.

2. Determining workstation component utilization. This can be very useful for identifying bottlenecks in your current system and helping to specify your next purchase by generating systems reports based on your real-world usage.

3. Keeping your system up to date with the latest profiles and drivers.

See Dell Precision Optimizer for more details.

FIGURE 11: Hardware RAID performance across the three workstation storage use-case types. (Chart categories: Office Productivity, Interactive Workstation, Computational Workstation.)
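The capacity and redundancy trade-offs between the RAID levels discussed in the storage criteria can be made concrete with a short calculation. The Python sketch below uses the standard definitions of RAID 0, 1, 5 and 10 and assumes identical drives; the function name and the drive counts and sizes in the example are illustrative choices, and the sketch says nothing about performance, which depends on the controller and drives.

```python
def raid_summary(level: int, drives: int, drive_tb: float) -> tuple[float, int]:
    """Return (usable capacity in TB, drive failures guaranteed survivable).

    Assumes all drives are identical. Uses the standard RAID definitions:
      RAID 0  - striping, no redundancy
      RAID 1  - mirroring, every drive holds a full copy
      RAID 5  - striping with one drive's worth of distributed parity
      RAID 10 - striped set of two-drive mirrors
    """
    if level == 0:
        if drives < 2:
            raise ValueError("RAID 0 needs at least 2 drives")
        return drives * drive_tb, 0
    if level == 1:
        if drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_tb, drives - 1
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb, 1
    if level == 10:
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        # More failures may be survivable if they hit different mirrors,
        # but only one failure is guaranteed to be safe.
        return (drives // 2) * drive_tb, 1
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    # Example: four hypothetical 2 TB drives under each level that allows it.
    for level in (0, 5, 10):
        usable, tolerated = raid_summary(level, drives=4, drive_tb=2.0)
        print(f"RAID {level}: {usable} TB usable, survives {tolerated} failure(s)")
```

With four drives, RAID 5 yields more usable capacity than RAID 10 for the same guaranteed fault tolerance, which is the trade-off the text describes; the rebuild-failure caveat noted for large-capacity RAID 5 arrays still applies and is not modeled here.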
Conclusion: Invest for Today’s Application Requirements with
an Eye Toward Tomorrow’s
For many organizations worldwide, today’s workstation investments
will help drive their digital transformation, which includes fundamental
workforce and IT transformations. Together, these will enable quantum
gains in both individual and organizational performance. By adding
powerful productivity and collaboration enhancements — among them
artificial intelligence, machine learning, advanced analytics, AR and VR —
work teams can accelerate time-to-market, improve responsiveness to
new market opportunities and get more done in less time.
For detailed hardware recommendations, see Dell Precision Workstation Advisor.

These benefits are why it’s important to let
application requirements, rather than budget
considerations, define workstation specifications.
Too often, the latter approach will result in
under-equipped, under-powered workstations
that fall short of what users need for their
primary applications. In these cases, workstation
investments can result in the emergence of the
“coffee cup syndrome” and “invisible lost time”
phenomena described earlier, undermining user
productivity instead of increasing it.
JUSTIFIED INVESTMENTS
While properly matching workstation
specifications to application requirements
might require additional investments, those
investments can usually be justified. Consider a
single professional user earning $100,000 a year
in total compensation. Eliminating workstation
latencies alone could boost productivity by as
much as 10 percent. That can translate into
annual labor savings of $10,000. Conversely, an
under-equipped, under-powered workstation,
even a new one, can actually cause organizations
to incur hidden costs in less-than-optimal
productivity.
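The arithmetic behind this justification is simple enough to check directly. The Python sketch below reproduces the $100,000 and 10 percent figures from the text; the payback calculation and the $3,000 upgrade cost in the example are hypothetical additions for illustration, not figures from this guide.

```python
def annual_labor_savings(total_compensation: float, gain_percent: float) -> float:
    """Annual value of a productivity gain for one user."""
    return total_compensation * gain_percent / 100


def payback_months(upgrade_cost: float, annual_savings: float) -> float:
    """Months needed for the savings to cover the extra workstation cost."""
    return upgrade_cost / (annual_savings / 12)


if __name__ == "__main__":
    savings = annual_labor_savings(100_000, 10)  # figures from the text
    print(f"Annual labor savings: ${savings:,.0f}")
    # Hypothetical extra spend on a better-matched configuration.
    print(f"Payback: {payback_months(3_000, savings):.1f} months")
```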