
Internal Note
Report number CERN-IT-Note-2011-005
Title Evaluation of the Intel Nehalem-EX server processor
Author(s) Jarp, S ; Lazzaro, A ; Leduc, J (CERN) ; Nowak, A (CERN)
Corporate author(s) CERN. Geneva. IT Department
Publication 2010
Collaboration CERN openlab Collaboration
Imprint 01 May 2010
Number of pages 24
Subject category Computing and Computers
Keywords x86 ; x86-64 ; multi-core ; multi-threading ; many-core ; manycore ; multicore ; multithreading ; processor ; cpu ; intel ; amd ; architecture ; shrink ; WLCG ; hep ; C++ ; vector ; moore's law ; TDP ; Nehalem ; Westmere ; Sandy Bridge ; numa ; MIC ; scc ; threading ; WSM ; NHM ; WSM-EP ; NHM-EX ; Nehalem-EX ; Westmere-EP ; Geant4 ; ROOT ; SPEC ; HEPSPEC ; HEPSPEC06 ; ParFullCMS ; ParFullCMSmt ; benchmark ; benchmarking ; power consumption ; scalability ; scaling ; hyper-threading ; hyper threading ; smt
Abstract In this paper we report on a set of benchmark results recently obtained by the CERN openlab by comparing the 4-socket, 32-core Intel Xeon X7560 server with the previous generation 4-socket server, based on the Xeon X7460 processor. The Xeon X7560 processor represents a major change in many respects, especially the memory sub-system, so it was important to make multiple comparisons. In most benchmarks the two 4-socket servers were compared. It should be underlined that both servers represent the “top of the line” in terms of frequency. However, in some cases, it was important to compare systems that integrated the latest processor features, such as QPI links, Simultaneous Multithreading and over-clocking via Turbo mode, and in such situations the X7560 server was compared to a dual-socket L5520-based system with an identical frequency of 2.26 GHz. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores via Simultaneous Multithreading (SMT), the cache sizes available, the configured memory topology, as well as the power configuration if throughput per watt is to be measured. We have tried to do a good job of comparing like with like. In summary, we saw a broad range of results. Our variant of the SPEC benchmark rate, “HEPSPEC”, gave a stunning 3x overall improvement on the new server, thanks to good scaling with the 32 cores and a 26% additional gain when enabling SMT. In-house data analysis and simulation benchmarks showed throughput increases in the range of 11 to 60%. Oracle database tests will follow. Finally, it should be mentioned that the 4-socket server can be equipped with 32 memory cards (DIMMs), which correspond to 512 GB total memory. This is a very impressive amount of memory that also comes with a very significant thermal load.
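As an illustration of the "like with like" requirement in the abstract, the listed control features could be recorded per run and compared before any two results are set side by side. The following is only a minimal sketch: the class name, field names, and helper functions are hypothetical and do not come from the note, and the cache, topology, and power values are placeholders. Only the 2.26 GHz frequency, the 32 cores, and the 32 DIMM / 512 GB figures are taken from the abstract.

```python
from dataclasses import dataclass, asdict, replace


@dataclass(frozen=True)
class BenchmarkConfig:
    """Settings to pin down before comparing runs (hypothetical field names)."""
    frequency_ghz: float
    turbo_enabled: bool
    physical_cores: int
    smt_enabled: bool
    cache_mb: float
    memory_topology: str
    power_profile: str


def differing_settings(a: BenchmarkConfig, b: BenchmarkConfig) -> list[str]:
    """Return the sorted names of settings whose values differ between two runs."""
    da, db = asdict(a), asdict(b)
    return sorted(name for name in da if da[name] != db[name])


def gb_per_dimm(total_gb: float, dimm_count: int) -> float:
    """Capacity of each DIMM when total_gb is spread evenly over dimm_count."""
    return total_gb / dimm_count


# Frequency and core count are from the abstract; the rest are placeholders.
base = BenchmarkConfig(
    frequency_ghz=2.26,
    turbo_enabled=False,
    physical_cores=32,
    smt_enabled=False,
    cache_mb=0.0,                    # placeholder, not stated in the abstract
    memory_topology="placeholder",   # placeholder, not stated in the abstract
    power_profile="placeholder",     # placeholder, not stated in the abstract
)
with_smt = replace(base, smt_enabled=True)

print(differing_settings(base, with_smt))  # only the SMT setting differs
print(gb_per_dimm(512, 32))                # implied capacity per DIMM in GB
```

A comparison in which `differing_settings` returns more than the one setting under study would mix several effects, which is the situation the abstract warns against.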

 


Record created 2011-01-28, last modified 2015-01-15

