oneAPI DevSummit at SC 2021

November 14, 2021 | 9 a.m.–6:30 p.m. CT

Join us for hands-on tutorials, tech talks, and panels spanning the oneAPI programming model, AI analytics, performance analysis tools, and libraries with global industry experts from Berkeley, Argonne, NASA, Codeplay, University of Lisbon, University of Edinburgh, and more. Get the latest information on Intel® oneAPI Toolkits since their initial production release in late 2020.

Agenda

DPC++
AI Analytics/ FPGA
Libraries

DPC++

9:00 - 9:15 AM CT

Introduction/Opening

9:15 - 10:00 AM CT

Global Experts on eXtreme Performance Panel

The Intel eXtreme Performance Users Group (IXPUG) will present a brief overview of the organization and its activities, followed by a panel discussion focused on the expected adoption, support and application of oneAPI at various computing sites around the world. Experts from the various sites will discuss ongoing work with oneAPI and plans for its support and application at their site, in addition to elucidating…
Presenting

10:30 - 10:45 AM CT Break

10:45 - 11:45 AM CT

Developing for Nvidia GPUs using SYCL with oneAPI

Support from the community for SYCL is growing, with some of the most powerful supercomputers in the world (including Aurora, Perlmutter and Frontier) adopting the programming model for cutting-edge research. By migrating your code from CUDA to SYCL, it’s not only possible to still target Nvidia GPUs, but it’s also possible to deploy to a wider set of GPUs from different companies including Intel…
Presenting

11:45 - 12:45 PM CT Lunch

12:45 - 1:15 PM CT

Experience in Moving CUDA Optimized FUN3D Kernels to Intel GPUs using Intel oneAPI

This presentation provides an overview of recent efforts to port existing CUDA kernels relevant to unstructured-grid computational fluid dynamics to the oneAPI framework for execution on Intel GPUs. Differences between the programming models are examined and ongoing challenges are discussed.
Presenting

1:15 - 1:45 PM CT

Acceleration of Integrated Circuit Simulation using SYCL and oneAPI

Simulation of integrated circuits consists of solving matrix-based equations. As the size of modern circuits increases, the computation time and resources for a simulation have significantly increased. The recent progress in heterogeneous hardware platforms has created an opportunity to increase the efficiency of these simulations. In this project, we demonstrate the acceleration of LU decomposition as the core algorithm in solving circuits using SYCL…
Presenting

1:45 - 2:15 PM CT Break

2:15 - 2:45 PM CT

Performance of DPC++ on Representative Structured/Unstructured Mesh

In this session we will give an overview of performance achieved with DPC++ on Intel server CPUs on MG-CFD, an unstructured-mesh CFD mini-app, and OpenSBLI, a structured mesh academic CFD code. We will contrast results to OpenMP implementations and explore key differences and bottlenecks based on VTune and Advisor feedback.
Presenting

2:45 - 3:15 PM CT

Enabling NAMD for Intel Xe

NAMD is a prominent parallel molecular dynamics application designed for high performance computing of large biomolecular systems. This session focuses on the development of NAMD for Intel GPUs using oneAPI/DPC++ by porting the efficient NAMD CUDA implementation and improving it with flexible vectorization for portable performance. We will also discuss the implementation in NAMD of relative debugging techniques across architectures and programming languages.
Presenting

3:15 - 3:30 PM CT Break

3:30 - 4:00 PM CT

Performance portability and evaluation of heterogeneous components of SeisSol targeted to upcoming Intel HPC GPUs

We will present our recent results of integrating the oneAPI programming model into SeisSol, a software package for simulating seismic waves and earthquake dynamics. During the talk, we are going to demonstrate a set of comparisons of various SeisSol-specific benchmarks compiled and executed with oneAPI, hipSYCL, and CUDA. At the end, we are going to present performance of the whole application obtained with 2 Nvidia…
Presenting

4:00 - 4:30 PM CT

Enhancing Online Planning on low-power CPU-GPU SoCs via Bloom Filter Based Memory

This work proposes a new design for online planning for intelligent agents modelled as POMDPs. We introduce an online planner enhanced with Bloom filter memory which we implement and evaluate on a low-power CPU+GPU SoC. Using the DPC++ parallel execution model of the most computing-intensive kernel of our Bloom filter implementation, we reduce the overall planning time by 3.5x to 7.5x for three representative benchmarks…
Presenting

4:30 - 5:30 PM CT

The oneAPI Software Abstraction for Heterogeneous Computing

oneAPI is a cross-industry, open, standards-based unified programming model. The oneAPI specification extends existing developer programming models to enable a diverse set of hardware through language, a set of library APIs, and a low-level hardware interface to support cross-architecture programming. It builds upon industry standards and provides an open, cross-platform developer stack to improve productivity and innovation. At the core of oneAPI is the DPC++…
Presenting

5:30 - 6:30 PM CT

Happy Hour

AI Analytics/ FPGA

9:00 - 9:15 AM CT

Introduction/Opening

9:15 - 10:00 AM CT

Global Experts on eXtreme Performance Panel

The Intel eXtreme Performance Users Group (IXPUG) will present a brief overview of the organization and its activities, followed by a panel discussion focused on the expected adoption, support and application of oneAPI at various computing sites around the world. Experts from the various sites will discuss ongoing work with oneAPI and plans for its support and application at their site, in addition to elucidating…
Presenting

10:30 - 10:45 AM CT Break

11:45 - 12:45 PM CT Lunch

12:45 - 1:15 PM CT

Spatial DPC++ constructs for algorithm acceleration with FPGAs

Field programmable gate arrays (FPGAs) have gained increasing mindshare as an architecture through which workloads can be accelerated in a power-efficient way, particularly when existing accelerators aren’t tuned for or well matched with a workload of interest. They allow a custom architecture to be built for the algorithm of interest without resorting to costly ASIC design, and therefore bridge a gap in performance between a…
Presenting

1:15 - 1:45 PM CT

oneAPI AI Analytics – End to End

Using an end-to-end machine learning platform to build and deploy Intel AI models at scale. Bridge science and engineering teams in a clear and collaborative machine learning management environment in which to communicate and reproduce results with interactive workspaces, dashboards, dataset organization, experiment tracking and visualization, a model repository and API to consume them. All possible through a unique open source platform, cnvrg.io and Intel AI…
Presenting

1:45 - 2:15 PM CT Break

2:15 - 2:45 PM CT

Using Arhat framework with Intel® oneDNN library and OpenVINO™ toolkit for object detection applications

Arhat is a cross-platform deep learning framework that converts neural network descriptions into lean standalone executable code. This approach provides significant benefits because of a simple and straightforward deployment process. Arhat is integrated with Intel oneAPI deep learning libraries. The Arhat backend for Intel generates C++ code that directly calls the oneDNN API. Furthermore, Arhat provides a module that consumes models produced by the OpenVINO Model Optimizer.…
Presenting

2:45 - 3:15 PM CT

The Great CEED Bake-off: DPC++ Edition

The CEED Bake-off Problems are a collection of benchmarks representing important compute-intensive kernels and solvers relevant to high-order finite and spectral element methods, such as those used in the Nek5000 CFD code. In this talk we present a DPC++ implementation of the CEED Bake-off Problems. Benchmark results are given for Intel CPUs and GPUs. Intel Advisor is used to conduct cache-aware roofline analysis and understand…
Presenting

3:15 - 3:30 PM CT Break

3:30 - 3:45 PM CT

Accelerating Deep Learning with Intel Extension for PyTorch: a MedMNIST Classification Decathlon example

We showcase how to use Intel Extension for PyTorch (IPEX) for training and inference on the MedMNIST datasets, a collection of 10 MNIST-like open datasets on various medical imaging classification tasks such as pathology images, chest X-rays, and OCT images. The demo runs on the Intel DevCloud for oneAPI on Ice Lake. We compare the performance with stock PyTorch and observe the performance gain that Intel…
Presenting

3:45 - 4:00 PM CT

Inference with ArrayFire and oneAPI

This session will demonstrate a simple ML inference pipeline using the OpenCL interop of oneAPI. The ArrayFire library and the derivative Flashlight project will be introduced and used as motivating examples. Data will flow from the oneAPI Video Processing Library to these existing libraries as an example of integrating oneAPI with existing GPU codebases.
Presenting

4:00 - 4:30 PM CT

Edge Intelligence and Its Application in CAVs

The proliferation of the Internet of Things and the success of rich cloud services have pushed the horizon of a new computing paradigm, Edge Computing, which calls for processing the data at the edge of the network. Edge computing has the potential to address the concerns of response time requirement, battery life constraint, bandwidth cost saving, as well as data safety and privacy. In this talk,…
Presenting

4:30 - 5:30 PM CT

The oneAPI Software Abstraction for Heterogeneous Computing

oneAPI is a cross-industry, open, standards-based unified programming model. The oneAPI specification extends existing developer programming models to enable a diverse set of hardware through language, a set of library APIs, and a low-level hardware interface to support cross-architecture programming. It builds upon industry standards and provides an open, cross-platform developer stack to improve productivity and innovation. At the core of oneAPI is the DPC++…
Presenting

Libraries

9:00 - 9:15 AM CT

Introduction/Opening

9:15 - 10:00 AM CT

Global Experts on eXtreme Performance Panel

The Intel eXtreme Performance Users Group (IXPUG) will present a brief overview of the organization and its activities, followed by a panel discussion focused on the expected adoption, support and application of oneAPI at various computing sites around the world. Experts from the various sites will discuss ongoing work with oneAPI and plans for its support and application at their site, in addition to elucidating…
Presenting

10:30 - 10:45 AM CT Break

10:45 - 11:45 AM CT

Multi-GPU Programming - Scale-Up and Scale-Out made easy, using the Intel MPI Library

For shared memory programming of GPGPU systems, users either have to manually run their domain decomposition across the available GPUs and GPU tiles, or leverage implicit scaling mechanisms that transparently scale their offload code across multiple GPU tiles. The former approach can be cumbersome and the latter approach is not always the best performing one. The Intel MPI library can take that burden from users…
Presenting

11:45 - 12:45 PM CT Lunch

12:45 - 1:15 PM CT

Getting Ready for the Aurora exa-scale supercomputer using Intel Advisor Roofline on Intel CPUs and GPUs

Aurora at Argonne National Laboratory is one of the US DOE’s exa-scale supercomputers that will be deployed in 2022. oneAPI provides all essential components for porting applications to Aurora with optimal performance. The Intel Advisor roofline features in oneAPI provide intuitive performance analysis results on Intel GPUs, and useful insights about performance bottlenecks for further optimization. We present our Advisor use-cases from our workloads including MD (molecular dynamics)…
Presenting

1:15 - 1:45 PM CT

Visual Analysis Challenges in the Age of Data

Ninety percent of all data in the world has been created in the past two years alone, at a rate of exabytes per day. New data of all kinds — structured, unstructured, quantitative, qualitative, spatial, and temporal — is growing exponentially and in every way. Given the vast amount of data being produced, one of our greatest scientific challenges is to effectively understand and make…
Presenting

1:45 - 2:15 PM CT Break

2:15 - 2:45 PM CT

A Synergistic Approach for Abstracting Hardware Heterogeneity and Reducing Algorithmic Complexity: Powering HiCMA with oneAPI for HPC Scientific Applications

We leverage performance of HPC scientific applications using tile low-rank matrix computations. The idea consists of revisiting tile algorithms using low-rank matrix approximations by exploiting the data sparsity of the dense operator coming from computational astronomy, seismic imaging, and climate/weather prediction applications. We rely on the HiCMA software library for providing sequential numerical kernels and the oneAPI runtime system for orchestrating the resulting computational tasks onto…
Presenting

2:45 - 3:15 PM CT

Driving a New Era of Accelerated Computing using OpenMP* with Intel® oneAPI Compilers

You are already deeply invested in OpenMP for Multicore, so now just a few additions will launch your code into the xPU era! OpenMP* is a popular, portable, and widely supported programming model. OpenMP provides capabilities for threaded and task-based parallelism for multicore, data-parallel programming using Single Instruction Multiple Data (SIMD) for vector architectures, and most recently, support for a programming model for offload to…
Presenting

3:15 - 3:30 PM CT Break

3:30 - 4:00 PM CT

Accelerating epistasis detection on Intel CPUs and discrete GPUs with Intel® Advisor

In this tutorial, we will introduce the Cache-aware Roofline Model (CARM) and expose its basic principles when modelling the performance upper-bounds of Intel CPU and GPU devices. For this purpose, we will rely on epistasis detection as a case-study, which is an important application in bioinformatics. By using DPC++ to deploy the application on Intel Iris Xe MAX (DG1), we will show how Intel® Advisor…
Presenting

4:00 - 4:30 PM CT

Exploiting Heterogeneous Computing with Intel® oneAPI Threading Building Blocks (oneTBB)

This session will discuss how to utilize Intel® oneAPI Threading Building Blocks (oneTBB) to balance workloads across heterogeneous compute resources. As XPU programming grows, applications should be able to utilize CPU + other devices to maximize throughput.
Presenting

4:30 - 5:30 PM CT

The oneAPI Software Abstraction for Heterogeneous Computing

oneAPI is a cross-industry, open, standards-based unified programming model. The oneAPI specification extends existing developer programming models to enable a diverse set of hardware through language, a set of library APIs, and a low-level hardware interface to support cross-architecture programming. It builds upon industry standards and provides an open, cross-platform developer stack to improve productivity and innovation. At the core of oneAPI is the DPC++…
Presenting