
How much better is the SPEC performance of AMD's second-generation server CPUs than Intel's?



Although SPEC CPU2006 has been superseded by SPEC CPU2017, we have accumulated a lot of experience with SPEC2006, and considering the problems we encounter in data center infrastructure, it remains our best choice for a first round of raw performance analysis.

Single-threaded performance is still very important, especially in maintenance and setup situations. In many cases it is a long bash script, a very complex SQL query, or the configuration of new software, and the user is not using all the cores at all.

Although SPEC CPU2006 is more oriented towards high-performance computing and workstations, it contains a wide variety of integer workloads. We firmly believe in trying to mimic how performance-critical software is actually compiled, instead of chasing the highest score. To this end, we:

Use 64-bit GCC: currently the most commonly used compiler on Linux and a very good all-round compiler for integer workloads; it does not try to "break" benchmarks (libquantum...), nor does it favor one specific architecture;
Use GCC 7.4 and 8.3: the standard compilers shipped with Ubuntu 18.04 LTS and 19.04;
Use -Ofast -fno-strict-aliasing: a good balance between performance and keeping things simple;
Add "-std=gnu89" to the portability settings so that some tests compile at all (the flags are illustrated right after this list).

The ultimate goal is to measure performance in applications that have not been actively optimized, the kind where, for whatever reason, a multi-thread-unfriendly task keeps us waiting. The downside is that there are still quite a few cases where GCC generates suboptimal code, which can look dramatic next to the results of ICC or AOCC, compilers that are tuned to find specific optimizations in the SPEC code.

First, the single-threaded results. Note that, thanks to turbo, all processors clock higher than their base clock speed.

The Xeon E5-2699 v4 ("Broadwell") boosts up to 3.6 GHz. Note: these are older results compiled with GCC 5.4;
The Xeon 8176 ("Skylake-SP") boosts up to 3.8 GHz;
The EPYC 7601 ("Naples") boosts up to 3.2 GHz;
The EPYC 7742 ("Rome") boosts up to 3.4 GHz. These results were compiled with GCC 7.4 and 8.3.

Unfortunately, we could not get an Intel Xeon 8280 tested in time. However, the Xeon 8280 should deliver very similar results; the main difference is that it clocks about 5% higher (4.0 GHz vs 3.8 GHz), so we expect its scores to come in 3-5% above the Xeon 8176's.

Per SPEC's licensing rules, since these results have not been officially submitted to the SPEC database, we must declare them estimates.



SPEC CPU analysis is always complicated: it mixes the quality of the code the compiler generates with the capabilities of the CPU architecture.



First, the most interesting data point: the code generated by GCC 8 appears to be a sizable improvement for the EPYC processors. We repeated the single-threaded tests three times, and the results were consistent.

Hmmer is one of the most branch-intensive benchmarks, and the other two workloads that improved the most, gobmk and sjeng, are also strongly affected by branch prediction (their branch-miss percentages are slightly higher). The new TAGE predictor in the second-generation EPYC clearly performs better on them.
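
As a hypothetical illustration (our own, not taken from the benchmarks), the loop below branches on a pattern that repeats every 64 iterations; a predictor that tracks long global history, such as TAGE, can learn such a pattern, while a short-history predictor keeps missing.

    /* branchy.c: hypothetical sketch of a history-dependent branch. */
    #include <stdio.h>

    int main(void)
    {
        /* An irregular 64-entry pattern: too long for a simple two-bit
           counter per branch, but learnable from long global history. */
        static const int pattern[64] = {
            1,0,0,1,1,1,0,1, 0,0,1,0,1,1,0,0,
            1,1,0,1,0,0,0,1, 0,1,1,0,0,1,0,1,
            0,0,0,1,1,0,1,1, 1,0,1,0,0,1,1,0,
            1,0,1,1,0,1,0,0, 0,1,0,0,1,0,1,1
        };
        long i, hits = 0;

        for (i = 0; i < 100000000L; i++)
            if (pattern[i & 63])   /* outcome repeats with period 64 */
                hits++;

        printf("%ld\n", hits);
        return 0;
    }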

Why the low-IPC omnetpp ("network sim") shows no improvement is a mystery to us; we expected the larger L3 cache to help. This is, however, a test that really loves a big cache, so the Intel Xeons, whose 38.5-55 MB of L3 is a single unified cache rather than 16 MB slices per CCX, have a real advantage here.

The video encoding benchmark h264ref also relies on the L3 cache to some extent, but it depends even more on DRAM bandwidth, and the EPYC 7002 clearly has more of that.
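
For a sense of what "bandwidth-bound" means here, below is a hypothetical STREAM-style triad (our own sketch): the arrays are far larger than any L3 cache, so the loop's speed is set almost entirely by DRAM bandwidth.

    /* stream_triad.c: hypothetical sketch of a DRAM-bandwidth-bound loop.
       Three 512 MB arrays blow past any L3, so every element travels
       over the memory controllers, which is where Rome's eight DDR4
       channels per socket pay off. */
    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
        long n = 1L << 26;                 /* 64M doubles = 512 MB per array */
        double *a = malloc(n * sizeof *a);
        double *b = malloc(n * sizeof *b);
        double *c = malloc(n * sizeof *c);
        long i;

        if (!a || !b || !c) return 1;      /* ~1.5 GB total; bail if allocation fails */

        for (i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }

        for (i = 0; i < n; i++)
            a[i] = b[i] + 3.0 * c[i];      /* triad: two loads + one store per element */

        printf("%f\n", a[n / 2]);          /* keep the work from being optimized away */
        free(a); free(b); free(c);
        return 0;
    }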

The pointer-chasing benchmarks (XML processing and pathfinding) performed poorly on the previous-generation EPYC compared to the Xeons, but show a very significant improvement on the EPYC 7002.
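
What "pointer chasing" means in practice: each memory access depends on the result of the previous one, so memory latency, not bandwidth, dominates. A hypothetical sketch of our own, not the SPEC code:

    /* chase.c: hypothetical pointer-chasing loop, the access pattern behind
       XML processing and pathfinding workloads. Each load's address comes
       from the previous load, serializing the memory accesses. */
    #include <stdlib.h>
    #include <stdio.h>

    struct node { struct node *next; long pad[7]; };  /* one 64-byte cache line per node */

    int main(void)
    {
        enum { N = 1 << 20 };                    /* 64 MB of nodes, larger than L3 */
        struct node *pool = malloc(N * sizeof *pool);
        long *order = malloc(N * sizeof *order);
        struct node *p;
        long i, steps = 0;

        if (!pool || !order) return 1;

        /* Link the nodes in shuffled order so the prefetcher cannot follow. */
        for (i = 0; i < N; i++) order[i] = i;
        srand(1);
        for (i = N - 1; i > 0; i--) {
            long j = rand() % (i + 1), t = order[i];
            order[i] = order[j]; order[j] = t;
        }
        for (i = 0; i < N - 1; i++)
            pool[order[i]].next = &pool[order[i + 1]];
        pool[order[N - 1]].next = NULL;

        for (p = &pool[order[0]]; p != NULL; p = p->next)  /* chain of dependent loads */
            steps++;

        printf("%ld\n", steps);
        free(order); free(pool);
        return 0;
    }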

Multi-core SPEC CPU2006

For the record, we believe the standard SPEC CPU rate metric is of limited value for estimating server CPU performance: most applications do not run as many completely independent processes in parallel; there is at least some interaction between threads.



We need to emphasize it again: SPECint rate testing is not very realistic. Launching 112 to 256 instances creates a massive bandwidth bottleneck, with zero synchronization and a 100% evenly spread CPU load, all of which is very unrealistic for most integer applications.
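
As a hypothetical sketch (our own), a rate run essentially amounts to this: N forked copies of the same work, sharing nothing and never synchronizing.

    /* rate_sketch.c: hypothetical sketch of what a SPEC "rate" run amounts
       to. N fully independent copies of the same work, no shared data,
       no synchronization; real server workloads rarely look like this. */
    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void work(void)                 /* stand-in for one benchmark instance */
    {
        volatile double x = 0.0;
        long i;
        for (i = 0; i < 100000000L; i++)
            x += (double)i * 0.5;
    }

    int main(int argc, char **argv)
    {
        int copies = argc > 1 ? atoi(argv[1]) : 4;   /* e.g. 128 or 256 on these CPUs */
        int i;

        for (i = 0; i < copies; i++) {
            if (fork() == 0) {             /* each child runs in total isolation */
                work();
                _exit(0);
            }
        }
        for (i = 0; i < copies; i++)
            wait(NULL);
        return 0;
    }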

The SPECint rate estimates emphasize all of the new EPYC's strengths: more cores, more bandwidth. At the same time, they gloss over one minor drawback: higher internal latency. So this is close to an ideal scenario for the EPYC processors.

Still, even accounting for AMD's 45% memory bandwidth advantage, and for the fact that Intel's latest chip (the 8280) would score roughly 7-8% higher than the part we tested, the result is astonishing: on average, the SPECint rate of the EPYC 7742 is twice that of the best Intel Xeon available.

Interestingly, most rate benchmarks run at the P1 clock, or one p-state below it. For example, this is what we see when running libquantum:



Some benchmarks, such as h264ref, run at a lower clock.
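
One way to spot-check what clock the cores actually settle at during such a run (a hypothetical helper of our own, assuming a Linux cpufreq driver exposes the sysfs node):

    /* readclk.c: samples the kernel's view of core 0's current clock via
       cpufreq sysfs. The path assumes a Linux system with cpufreq enabled. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", "r");
        long khz;

        if (f == NULL || fscanf(f, "%ld", &khz) != 1) {
            fprintf(stderr, "cpufreq not available\n");
            return 1;
        }
        fclose(f);
        printf("cpu0: %.2f GHz\n", khz / 1e6);   /* sysfs reports kHz */
        return 0;
    }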




Our current servers do not allow accurate power measurements, but it would be quite astonishing if the AMD EPYC 7742 stayed within its 225 W rating while running integer workloads on all cores at 3.2 GHz. Long story short: the new EPYC 7742 appears to sustain higher all-core clocks on integer workloads than comparable Intel models.
