





## TENSORFLOW\* OPTIMIZED FOR INTEL® XEON™

### Niranjan Hasabnis, Intel 24<sup>th</sup> May, 2018



\*Other names and brands may be claimed as the property of others



### OUTLINE

1. Current status
2. Intel-TensorFlow optimization details
3. Using Intel-optimized TensorFlow\*




### **INTEL-OPTIMIZED TENSORFLOW**






### **INTEL-OPTIMIZED TENSORFLOW PERFORMANCE AT A GLANCE**

#### **TRAINING THROUGHPUT**

# 14X

Intel-optimized TensorFlow ResNet50 training performance compared to default TensorFlow for CPU

Inference and training throughput measurements use FP32 instructions.

#### **INFERENCE THROUGHPUT**



Intel-optimized TensorFlow InceptionV3 inference throughput compared to default TensorFlow for CPU

#### System configuration:

- CPU: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz; Thread(s) per core: 2; Core(s) per socket: 28; Socket(s): 2; NUMA node(s): 2; CPU family: 6; Model: 85; Stepping: 4; HyperThreading: ON; Turbo: ON
- Memory: 376GB (12 x 32GB), 24 slots, 12 occupied, 2666 MHz
- Disks: Intel RS3WC080 x 3 (800GB, 1.6TB, 6TB)
- BIOS: SE5C620.86B.00.01.0004.071220170215
- OS: CentOS Linux 7.4.1708 (Core), Kernel 3.10.0-693.11.6.el7.x86\_64

#### TensorFlow source:

https://github.com/tensorflow/tensorflow, TensorFlow commit ID: 926fc13f7378d14fa7980963c4fe774e5922e336

#### TensorFlow benchmarks:

https://github.com/tensorflow/benchmarks

### Unoptimized TensorFlow may not exploit the best performance from Intel CPUs.





Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit <u>http://www.intel.com/performance</u>. Copyright © 2018, Intel Corporation

### **INTEL-OPTIMIZED TENSORFLOW TRAINING PERFORMANCE**

Chart: training improvement with Intel-optimized TensorFlow over the default (Eigen) CPU backend, with one series for NHWC and one for NCHW.


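| Model       | Data_format | Intra_op | Inter_op | OMP_NUM_THREADS | KMP_BLOCKTIME |
|-------------|-------------|----------|----------|-----------------|---------------|
| VGG16       | NCHW        | 56       | 1        | 56              | 1             |
| InceptionV3 | NCHW        | 56       | 2        | 56              | 1             |
| ResNet50    | NCHW        | 56       | 2        | 56              | 1             |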


### **INTEL-OPTIMIZED TENSORFLOW INFERENCE PERFORMANCE**



Chart: improvement with Intel-optimized TensorFlow, with one series for NHWC and one for NCHW.


| Model       | Data_format | Intra_op | Inter_op | OMP_NUM_THREADS | KMP_BLOCKTIME |
|-------------|-------------|----------|----------|-----------------|---------------|
| VGG16       | NCHW        | 56       | 1        | 56              | 1             |
| InceptionV3 | NCHW        | 56       | 2        | 56              | 1             |
| ResNet50    | NCHW        | 56       | 2        | 56              | 1             |


### **PERFORMANCE GAINS REPORTED BY OTHERS**

Intel TensorFlow Scalability Results Presented by Google @TF Summit March 30, '18

#### TensorFlow with Intel® MKL-DNN integration

"By making use of [Intel's] open source library [MKL-DNN], we were able to achieve a 3x performance benefit and great scaling efficiency on training. This is an example of how important it is to have strong collaborations with companies like Intel."

Matt Wood: New optimized TensorFlow build for EC2 C5 instances (7.4x training performance improvement over stock TF 1.6) - now available on the #AWS Deep Learning AMI, Ubuntu, and Amazon Linux.

Linked article (aws.amazon.com): Faster training with optimized TensorFlow 1.6 on Amazon EC2 C5 and P3 inst... The AWS Deep Learning AMIs come with latest pip packages of popular deep learning frameworks pre-installed in separate virtual environments so that develo...

### **INSIDE INTEL-OPTIMIZED TENSORFLOW**



### **INTEL-TENSORFLOW OPTIMIZATIONS**

1. Operator optimizations
2. Graph optimizations
3. System optimizations



### **OPERATOR OPTIMIZATIONS**

- In TensorFlow, the computation graph is a data-flow graph.



### **OPERATOR OPTIMIZATIONS**

- Replace default (Eigen) kernels with highly optimized kernels (using Intel<sup>®</sup> MKL-DNN).
- Intel<sup>®</sup> MKL-DNN has optimized a set of TensorFlow operations.
- The library is open source (<u>https://github.com/intel/mkl-dnn</u>) and is downloaded automatically when building TensorFlow.

| Forward         | Backward                    |
|-----------------|-----------------------------|
| Conv2D          | Conv2DGrad                  |
| Relu, TanH, ELU | ReLUGrad, TanHGrad, ELUGrad |
| MaxPooling      | MaxPoolingGrad              |
| AvgPooling      | AvgPoolingGrad              |
| BatchNorm       | BatchNormGrad               |
| LRN             | LRNGrad                     |
| MatMul, Concat  |                             |
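The kernel replacement described above can be pictured as a rewrite over operator names. The following is a minimal sketch in plain Python, not TensorFlow's actual rewrite pass: the `MKL_KERNELS` table and the `rewrite_ops` helper are hypothetical, and the `Mkl`-prefixed names are illustrative, loosely mirroring the operator names that appear in the ResNet50 timeline in this deck.

```python
# Hypothetical lookup table: default op name -> MKL-DNN-backed op name.
# The entries are illustrative and do not list every supported operator.
MKL_KERNELS = {
    "Conv2D": "MklConv2D",
    "Conv2DBackpropInput": "MklConv2DBackpropInput",
    "Conv2DBackpropFilter": "MklConv2DBackpropFilter",
    "Relu": "MklRelu",
    "ReluGrad": "MklReluGrad",
    "Add": "MklAdd",
    "AddN": "MklAddN",
}


def rewrite_ops(op_names):
    """Swap each op for its MKL counterpart when one exists; keep the rest."""
    return [MKL_KERNELS.get(name, name) for name in op_names]


graph = ["Conv2D", "Relu", "Pad", "AddN"]
print(rewrite_ops(graph))
```

Operators without an entry (here `Pad`) keep their default kernel, which is consistent with the timeline showing a mix of `Mkl`-prefixed and unprefixed operators.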



### **OPERATOR OPTIMIZATIONS IN RESNET50**

#### Intel-optimized TensorFlow timeline

Trace file `timeline.json`, device `/job:localhost/replica:0/task:0/device:CPU:0`, 3005 slices selected.

| Name                       | Wall Duration | Self time  |
|----------------------------|---------------|------------|
| MklConv2DBackpropFilter    | 545.502 ms    | 545.502 ms |
| MklConv2DBackpropInput     | 440.090 ms    | 440.090 ms |
| MklConv2D                  | 391.094 ms    | 391.094 ms |
| _MklFusedBatchNormGrad     | 184.920 ms    | 184.920 ms |
| _MklFusedBatchNormWithRelu | 158.366 ms    | 158.366 ms |
| MklReluGrad                | 155.874 ms    | 155.874 ms |
| MklAdd                     | 109.858 ms    | 109.858 ms |
| MklAddN                    | 103.248 ms    | 103.248 ms |
| Slice                      | 84.905 ms     | 84.905 ms  |
| Pad                        | 38.684 ms     | 38.684 ms  |
| ApplyMomentum              | 32.977 ms     | 32.977 ms  |
| L2Loss                     | 28.264 ms     | 28.264 ms  |
| MklToTf                    | 22.379 ms     | 22.379 ms  |
| VariableV2                 | 19.422 ms     | 19.422 ms  |

#### Default TensorFlow timeline

Trace file `rn50.eigen.json`, device `/job:localhost/replica:0/task:0/device:CPU:0`, 1490 slices selected.

| Name                 | Wall Duration | Self time    |
|----------------------|---------------|--------------|
| FusedBatchNormGrad   | 7,933.108 ms  | 7,933.108 ms |
| Conv2DBackpropInput  | 3,139.385 ms  | 3,139.385 ms |
| Conv2DBackpropFilter | 2,539.365 ms  | 2,539.365 ms |
| FusedBatchNorm       | 873.292 ms    | 873.292 ms   |
| Conv2D               | 640.633 ms    | 640.633 ms   |
| ReluGrad             | 74.733 ms     | 74.733 ms    |
| AddN                 | 68.955 ms     | 68.955 ms    |
| Add                  | 38.213 ms     | 38.213 ms    |
| Relu                 | 38.010 ms     | 38.010 ms    |
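To make the two timelines easier to compare, the per-operator durations can be turned into ratios. This is a rough sketch: it assumes both traces cover a comparable amount of work, which the deck does not state (the slice counts differ, 3005 versus 1490), so the ratios are indicative only. The `speedups` helper is a name introduced here for illustration.

```python
# Wall durations in milliseconds, copied from the two timelines above.
default_ms = {
    "FusedBatchNormGrad": 7933.108,
    "Conv2DBackpropInput": 3139.385,
    "Conv2DBackpropFilter": 2539.365,
    "Conv2D": 640.633,
}
mkl_ms = {
    "FusedBatchNormGrad": 184.920,    # _MklFusedBatchNormGrad
    "Conv2DBackpropInput": 440.090,   # MklConv2DBackpropInput
    "Conv2DBackpropFilter": 545.502,  # MklConv2DBackpropFilter
    "Conv2D": 391.094,                # MklConv2D
}


def speedups(baseline, optimized):
    """Return baseline/optimized duration ratio for every op in baseline."""
    return {op: baseline[op] / optimized[op] for op in baseline}


ratios = speedups(default_ms, mkl_ms)
for op, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{op:22s} {ratio:5.1f}x")
```

Under that assumption, batch-norm gradients show the largest ratio and the forward convolution the smallest of the four operators compared.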



### **GRAPH OPTIMIZATIONS: FUSION**






### **GRAPH OPTIMIZATIONS: LAYOUT PROPAGATION**

- What is layout?
  - How we represent an N-D tensor as a 1-D array.
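As a small illustration of what "layout" means, the sketch below computes the position of one tensor element in the flat 1-D array for the two data formats named in this deck, NCHW and NHWC. The function names are chosen here for illustration; the formulas are the standard row-major offsets for each dimension order.

```python
def offset_nchw(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) when the order is batch, channel, height, width."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Flat index of the same element when channels are the innermost dimension."""
    return ((n * H + h) * W + w) * C + c


# A tiny tensor: 3 channels, 2 x 2 spatial size.
C, H, W = 3, 2, 2
print(offset_nchw(0, 1, 0, 1, C, H, W))
print(offset_nhwc(0, 1, 0, 1, C, H, W))
```

The same logical element lands at different flat positions, so a kernel written for one layout cannot read a buffer stored in the other without a conversion.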



### **GRAPH OPTIMIZATIONS: LAYOUT PROPAGATION**

- Converting to/from an optimized layout can be less expensive than operating on an un-optimized layout.
- All MKL-DNN operators use highly-optimized layouts for TensorFlow tensors.

Figure: the initial graph (Input, Filter, Conv2D, ReLU, Shape) next to the graph after layout conversions (Input, Filter, Convert nodes, MklConv2D, MklReLU, Shape).
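One plausible reading of "layout propagation" is that conversions which cancel each other out between two MKL operators can be dropped, so that data stays in the optimized layout. The sketch below shows that idea on a linear chain of node names. It is an assumption-laden toy, not the actual TensorFlow graph pass: the node names `ToMkl` and `ToTf` and the helper `drop_redundant_converts` are hypothetical.

```python
def drop_redundant_converts(chain):
    """Remove each 'ToTf' that is immediately followed by 'ToMkl'."""
    result = []
    for node in chain:
        if node == "ToMkl" and result and result[-1] == "ToTf":
            result.pop()  # the pair cancels out: stay in the MKL layout
        else:
            result.append(node)
    return result


# Conversions around every MKL operator, as in the "after layout conversions" graph.
chain = ["ToMkl", "MklConv2D", "ToTf", "ToMkl", "MklReLU", "ToTf", "Shape"]
print(drop_redundant_converts(chain))
```

In this toy chain only the conversion in front of `Shape`, which is not an MKL operator, survives between the operators.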




### **SYSTEM OPTIMIZATIONS: LOAD BALANCING**

- TensorFlow graphs offer opportunities for parallel execution.
- Threading model
  1. `inter_op_parallelism_threads` = max number of operators that can be executed in parallel
  2. `intra_op_parallelism_threads` = max number of threads to use for executing an operator
  3. `OMP_NUM_THREADS` = MKL-DNN equivalent of `intra_op_parallelism_threads`
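The benchmark settings tables in this deck use 56 for both the intra-op pool and `OMP_NUM_THREADS`, which matches the number of physical cores on the 2-socket, 28-cores-per-socket test system. The sketch below works through that arithmetic. The helper names are introduced here for illustration, and tying the values to the physical core count is an observation about the tables rather than a rule stated on the slide.

```python
def physical_cores(sockets, cores_per_socket):
    """Number of physical cores in the machine."""
    return sockets * cores_per_socket


def logical_cpus(sockets, cores_per_socket, threads_per_core):
    """Number of hardware threads the OS sees (with HyperThreading on)."""
    return physical_cores(sockets, cores_per_socket) * threads_per_core


def threading_settings(sockets, cores_per_socket, inter_op=1):
    """Collect the three threading parameters, sized to the physical core count."""
    cores = physical_cores(sockets, cores_per_socket)
    return {
        "intra_op_parallelism_threads": cores,
        "inter_op_parallelism_threads": inter_op,
        "OMP_NUM_THREADS": cores,
    }


settings = threading_settings(sockets=2, cores_per_socket=28, inter_op=2)
print(settings)
print(logical_cpus(sockets=2, cores_per_socket=28, threads_per_core=2))
```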





### **SYSTEM OPTIMIZATIONS: LOAD BALANCING**

- Incorrect setting of threading model parameters can lead to over- or under-subscription, leading to poor performance.
- Solution:
  - Set these parameters for your model manually.
  - Guidelines on TensorFlow webpage

OMP: Error #34: System unable to allocate necessary resources for OMP thread:

OMP: System error #11: Resource temporarily unavailable

OMP: Hint: Try decreasing the value of OMP\_NUM\_THREADS.
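A back-of-the-envelope check can flag settings likely to over-subscribe the machine before a run. The sketch below uses a deliberately simple model, an assumption made here and not a statement from the slides: every operator running in parallel may start its own OpenMP team, so the worst-case thread demand is roughly `inter_op` times `OMP_NUM_THREADS`. Both function names are hypothetical.

```python
def worst_case_openmp_threads(inter_op, omp_num_threads):
    """Upper bound on OpenMP worker threads under the simple model above."""
    return inter_op * omp_num_threads


def subscription_ratio(inter_op, omp_num_threads, hardware_threads):
    """Above 1 suggests over-subscription; below 1 suggests idle hardware threads."""
    return worst_case_openmp_threads(inter_op, omp_num_threads) / hardware_threads


# 2 sockets x 28 cores x 2 threads per core, as in the benchmark system.
HARDWARE_THREADS = 2 * 28 * 2
for inter_op, omp in [(1, 56), (2, 56), (56, 56)]:
    ratio = subscription_ratio(inter_op, omp, HARDWARE_THREADS)
    print(f"inter_op={inter_op:2d} OMP_NUM_THREADS={omp} ratio={ratio:.1f}")
```

A ratio far above 1, as in the last case, is the kind of configuration that could plausibly lead to the resource errors quoted above.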



### **SYSTEM OPTIMIZATIONS: MEMORY ALLOCATION**

- Neural network operators (such as Conv2D) in TensorFlow can allocate large chunks of memory.
- The default CPU allocator did not handle this scenario well: frequent alloc/dealloc -> frequent mmap/munmap.
- We implemented a pool allocator to fix the problem.
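The idea behind a pool allocator can be shown in a few lines: released buffers are kept and handed out again instead of being returned to the system each time. This is a toy sketch in Python, not the allocator implemented in TensorFlow; the `BufferPool` class and its counter are hypothetical and exist only to show the reuse pattern.

```python
from collections import defaultdict


class BufferPool:
    """Keep released buffers around and reuse them for same-sized requests."""

    def __init__(self):
        self._free = defaultdict(list)  # size in bytes -> idle buffers
        self.fresh_allocations = 0      # how often we had to ask the system

    def allocate(self, size):
        if self._free[size]:
            return self._free[size].pop()  # reuse: no new system allocation
        self.fresh_allocations += 1
        return bytearray(size)

    def release(self, buf):
        self._free[len(buf)].append(buf)


pool = BufferPool()
for _ in range(100):
    buf = pool.allocate(1 << 20)  # a 1 MiB scratch buffer per "operator call"
    pool.release(buf)
print(pool.fresh_allocations)
```

With the pool, one hundred allocate/release cycles of the same size need a single fresh allocation, which is the behavior that avoids repeated mmap/munmap calls.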



### **RUNNING YOUR NEURAL NETWORK MODEL WITH INTEL-OPTIMIZED TENSORFLOW**

https://ai.intel.com/tensorflow



### **STEP 1: GETTING INTEL-OPTIMIZED TENSORFLOW**

## It is easy.



### **GETTING INTEL-OPTIMIZED TENSORFLOW: USING PIP**

    # Python 2.7
    pip install https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp27-cp27mu-linux_x86_64.whl

    # Python 3.5
    pip install https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp35-cp35m-linux_x86_64.whl

    # Python 3.6
    pip install https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp36-cp36m-linux_x86_64.whl

### **GETTING INTEL-OPTIMIZED TENSORFLOW: USING INTEL DISTRIBUTION OF PYTHON (IDP)**

- If IDP is installed: `conda install tensorflow -c intel`
- Install and activate the full IDP package: `conda create -n idpFull -c intel intelpython3_full`, then `activate idpFull`



### **GETTING INTEL-TENSORFLOW: BUILD FROM SOURCE**

- `$ git clone https://github.com/tensorflow/tensorflow.git`
- `$ cd tensorflow`
- `$ ./configure`
- `$ bazel build --config=opt --config=mkl //tensorflow/tools/pip_package:build_pip_package`
- `$ bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/path_to_save_wheel`
- `$ pip install --upgrade --user ~/path_to_save_wheel/<wheel_name.whl>`



## I got Intel-optimized TensorFlow, do I run my model now?



### **STEP 2: PERFORMANCE GUIDE**

Excerpt from the TensorFlow Performance Guide, section "TensorFlow with Intel® MKL DNN" (screenshot):

[...] Intel® has added support for the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) to TensorFlow. While the name is not completely accurate, these optimizations are often simply referred to as 'MKL' or 'TensorFlow with MKL'. [...]

The two configurations listed below are used to optimize CPU performance by adjusting the thread pools.

- `intra_op_parallelism_threads`: Nodes that can use multiple threads to parallelize their execution will schedule the individual pieces into this pool.
- `inter_op_parallelism_threads`: All ready nodes are scheduled in this pool.

These configurations are set via the `tf.ConfigProto` and passed to `tf.Session` in the `config` attribute. For both configuration options, if they are unset or set to 0, will default to the number of logical CPU cores. Testing has shown that the default is effective for systems ranging from one CPU with 4 cores to multiple CPUs [...] number of threads in both [...] to the number of physical cores rather than logical cores.

The guide's snippet creates a `tf.ConfigProto()`, sets both `intra_op_parallelism_threads` and `inter_op_parallelism_threads` to 44, and passes the result to the session as `config=config`.

[...] The optimizations also provide speedups for the consumer line of processors, e.g. i5 and i7 Intel processors. The Intel published paper TensorFlow\* Optimizations on Modern Intel® Architecture contains additional details on the implementation.

https://www.tensorflow.org/performance/performance_guide#tensorflow_with_intel_mkl_dnn
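The thread-pool settings from the guide excerpt can be applied along the following lines. This is a sketch assuming the TensorFlow 1.x API (`tf.ConfigProto`, `tf.Session`); the helper names `apply_thread_pools` and `make_session` are introduced here for illustration, and the value 44 is the one visible in the guide's snippet, not a recommendation for any particular machine.

```python
def apply_thread_pools(config, intra_op, inter_op):
    """Set both thread-pool sizes on a ConfigProto-like object and return it."""
    config.intra_op_parallelism_threads = intra_op
    config.inter_op_parallelism_threads = inter_op
    return config


def make_session(intra_op=44, inter_op=44):
    """Create a TF 1.x session whose thread pools are set explicitly."""
    import tensorflow as tf  # imported lazily; requires a TensorFlow 1.x install

    config = apply_thread_pools(tf.ConfigProto(), intra_op, inter_op)
    return tf.Session(config=config)
```

Calling `make_session()` returns a session that can then be used to run the model as usual; the two pool sizes can be passed as arguments to try other values.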



### **PERFORMANCE TIPS**

1. Use a pre-built wheel with MKL-DNN optimizations (method 1).
2. Set the threading model correctly.
   - We provide best settings for popular CNN models (<u>https://ai.intel.com/tensorflow-optimizations-intel-xeon-scalable-processor</u>).

Tuning MKL for the best performance

This section details the different configurations and environment variables that can be used to tune the MKL to get optimal performance. Before tweaking various environment variables make sure the model is using the NCHW (channels\_first) data format. The MKL is optimized for NCHW and Intel is working to get near performance parity when using NHWC.

MKL uses the following environment variables to tune performance:

- KMP\_BLOCKTIME Sets the time, in milliseconds, that a thread should wait, after completing the execution of a parallel region, before sleeping.
- KMP\_AFFINITY Enables the run-time library to bind threads to physical processing units.
- KMP\_SETTINGS Enables (true) or disables (false) the printing of OpenMP\* run-time library environment variables during program execution.
- OMP\_NUM\_THREADS Specifies the number of threads to use.

https://www.tensorflow.org/performance/performance_guide#tensorflow_with_intel_mkl_dnn
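The four environment variables listed above can be set from Python before the model runs. The sketch below is illustrative: the `mkl_env` helper is hypothetical, the thread count and blocktime reuse the values from the benchmark tables in this deck (56 and 1), and the `KMP_AFFINITY` string is an example value assumed here, to be tuned per machine.

```python
import os


def mkl_env(omp_num_threads, blocktime_ms=1,
            affinity="granularity=fine,verbose,compact,1,0",
            print_settings=True):
    """Build the environment-variable mapping as strings."""
    return {
        "OMP_NUM_THREADS": str(omp_num_threads),
        "KMP_BLOCKTIME": str(blocktime_ms),
        "KMP_AFFINITY": affinity,  # example value; an assumption, not a rule
        "KMP_SETTINGS": "true" if print_settings else "false",
    }


# To be safe, set the variables before TensorFlow is imported,
# so they are in place when the OpenMP runtime starts up.
env = mkl_env(omp_num_threads=56)
os.environ.update(env)
for name in sorted(env):
    print(f"{name}={os.environ[name]}")
```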



### **SUMMARY**

- Intel-optimized TensorFlow improves TensorFlow CPU performance by up to 14X.
- Getting Intel-optimized TensorFlow is easy.
- The TensorFlow performance guide is the best source of performance tips.
- Stay tuned for updates: <u>https://ai.intel.com/tensorflow</u>



### **LEGAL DISCLAIMERS**

- Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: Learn About Intel® Processor Numbers, http://www.intel.com/products/processor\_number
- Some results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
- Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
- Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.
- Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates with the performance improvements reported.
- SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjbb, SPECompG, SPEC MPI, and SPECjEnterprise\* are trademarks of the Standard Performance Evaluation Corporation. See http://www.spec.org for more information.
- TPC Benchmark, TPC-C, TPC-H, and TPC-E are trademarks of the Transaction Processing Council. See http://www.tpc.org for more information.
- No computer system can provide absolute reliability, availability or serviceability. Requires an Intel<sup>®</sup> Xeon<sup>®</sup> processor E7-8800/4800/2800 v2 product families or Intel<sup>®</sup> Itanium<sup>®</sup> 9500 series-based system (or follow-on generations of either.) Built-in reliability features available on select Intel<sup>®</sup> processors may require additional software, hardware, services and/or an internet connection. Results may vary depending upon configuration. Consult your system manufacturer for more details.

For systems also featuring Resilient System Technologies: No computer system can provide absolute reliability, availability or serviceability. Requires an Intel® Run Sure Technology-enabled system, including an enabled Intel processor and enabled technology(ies). Built-in reliability features available on select Intel® processors may require additional software, hardware, services and/or an Internet connection. Results may vary depending upon configuration. Consult your system manufacturer for more details.

For systems also featuring Resilient Memory Technologies: No computer system can provide absolute reliability, availability or serviceability. Requires an Intel® Run Sure Technology-enabled system, including an enabled Intel® processor and enabled technology(ies). Built-in reliability features available on select Intel® processors may require additional software, hardware, services and/or an Internet connection. Results may vary depending upon configuration. Consult your system manufacturer for more details.



### **OPTIMIZATION NOTICE**


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.

Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804



