11 March 2019
9:00 am - 6:00 pm
Room 302, Level 3, Suntec Singapore Convention & Exhibition Centre

INSTRUCTOR NAME & BIO

MR. GILAD SHAINER (CHAIRMAN, HPC-AI ADVISORY COUNCIL)

Gilad Shainer is an HPC evangelist who focuses on high-performance computing, high-speed interconnects, leading-edge technologies and performance characterizations. He serves as a board member of the OpenPOWER, CCIX, OpenCAPI and UCF organizations, is a member of the IBTA and is a contributor to the PCI-SIG PCI-X and PCIe specifications. Mr. Shainer holds multiple patents in the field of high-speed networking. He is also a recipient of the 2015 R&D 100 Award for his contribution to the CORE-Direct collective offload technology. Mr. Shainer holds M.Sc. and B.Sc. degrees in Electrical Engineering from the Technion – Israel Institute of Technology.

ABSTRACT

PRESENTATION: PAVE THE WAY TO EXASCALE

TBC

INSTRUCTOR NAME & BIO

MR. JEFFREY ADIE (PRINCIPAL SOLUTIONS ARCHITECT, APJI REGION, NVIDIA)

Jeff is an HPC specialist with over 25 years of experience in developing, tuning and porting scientific codes and architecting HPC solutions. Jeff’s primary areas of expertise are computational fluid dynamics (CFD) and numerical weather prediction (NWP). He previously worked at the New Zealand Oceanographic Institute (now NIWA) and Toyota Motor Corporation, and performed FEA/CFD analysis of America’s Cup class yachts for Team New Zealand. Prior to joining NVIDIA, Jeff worked for SGI in Asia for 16 years. Before that, he worked for various post-production companies in his native New Zealand in roles including visual effects artist, technical director and software developer. Jeff holds a postgraduate diploma in Computer Science from the University of Auckland, specialising in parallel programming and computer graphics.

ABSTRACT

PRESENTATION: ENGINEERING AN HPC CLUSTER SOLUTION FOR GPU-ACCELERATED WORKLOADS

GPU-accelerated computing has become an integral part of HPC over the last few years. When GPUs are coupled with a high-performance InfiniBand interconnect, the solution must be properly architected to maximise productivity. This talk will cover the requirements for GPU accelerators and present best practices for designing and deploying GPU-based HPC solutions that deliver optimal results.

INSTRUCTOR NAME & BIO

DR. OREN LAADAN (CTO, EXCELERO)

Oren Laadan serves as APAC technical lead and CTO at Excelero. Oren has extensive experience in research, innovation and technological leadership, gained over a professional career of more than 20 years in computer systems, broadly defined. Prior to Excelero, he co-founded Cellrox, a provider of mobile virtualization solutions for security, privacy and isolation use cases in the Android ecosystem, and served as its Chief Technology Officer. Before co-founding Cellrox, he was a researcher at Columbia University with a focus on computer systems, virtualization, operating systems, cloud systems, security and mobile computing. A graduate of the Israel Defense Forces’ elite “Talpiot” program, Oren pioneered R&D on nascent cloud computing technologies. Dr. Laadan holds a Ph.D. in Computer Science from Columbia University, as well as an M.Sc. in Computer Science and a B.Sc. in Physics and Mathematics from the Hebrew University.

ABSTRACT

PRESENTATION: EXCELERO NVMESH IS A STORAGE GAME CHANGER FOR SUPERCOMPUTING

Excelero’s NVMesh enables supercomputing centers to build high-performance, low-latency storage that leverages distributed NVMe for a variety of HPC use cases, including burst buffer, fast scratch and Nastran analytics. NVMesh enables shared NVMe across any network and supports any parallel file system. Distributed workloads can leverage the full performance of NVMe SSDs with the convenience of centralized storage, while avoiding proprietary hardware lock-in and reducing overall storage TCO. NVMesh enables SciNet to build a petabyte-scale unified pool of distributed high-performance NVMe that serves as a burst buffer for checkpointing. The SciNet NVMe pool delivers 230 GB/s of throughput and well over 20 million random 4K IOPS, and enables SciNet to meet its availability SLAs.

INSTRUCTOR NAME & BIO

MR. AVI TELYAS (DIRECTOR, SYSTEM ENGINEERING, MELLANOX)

Avi Telyas is a Director of System Engineering at Mellanox Technologies, leading the APAC Sales Engineering and FAE teams. Based in Tokyo, Avi is deeply involved in large HPC, machine learning and AI deployments in Japan and APAC. In his free time, Avi writes code on AI frameworks and gets too excited talking about it. Avi holds a B.Sc. (summa cum laude) in Computer Science from the Technion – Israel Institute of Technology.

ABSTRACT

PRESENTATION: IN-NETWORK COMPUTING IN HPC SYSTEM

The latest revolution in HPC is the move to a co-design architecture, a collaborative effort among industry, academia and manufacturers to reach Exascale performance by taking a holistic, system-level approach to fundamental performance improvements. Co-design recognizes that the CPU has reached the limits of its scalability, and offers In-Network Computing to share the responsibility for handling and accelerating application workloads, offloading work from the CPU. By placing data-related algorithms on an intelligent network, we can dramatically improve data center and application performance.

INSTRUCTOR NAME & BIO

DR. RICHARD GRAHAM (HPC SCALE SPECIAL INTEREST GROUP CHAIR, HPC-AI ADVISORY COUNCIL)

Dr. Richard Graham is Senior Director, HPC Technology at Mellanox Technologies, Inc. His primary focus is on HPC network software and hardware capabilities for current and future HPC technologies. Prior to moving to Mellanox, Rich spent thirteen years at Los Alamos National Laboratory and Oak Ridge National Laboratory in technical and administrative computer science roles, with a technical focus on communication libraries and application analysis tools. He is a co-founder of the Open MPI collaboration and was chairman of the MPI 3.0 standardization effort.

ABSTRACT

PRESENTATION: MPI ACCELERATION IN HPC SYSTEM

TBC

INSTRUCTOR NAME & BIO

DR. YANG JIAN (FELLOW, AMD)

Dr. Yang Jian received his Ph.D. from the State Key Lab of CAD&CG in 2002. His previous industry experience includes work on 3D graphics acceleration at several IC companies: Trident Multimedia Co. Ltd., Centrality Communications Co. Ltd. and S3 Graphics Co. Ltd. In 2006, Dr. Yang joined ATI/AMD, where he has built a strong team for performance verification, analysis and optimization of modern GPUs. The team has completed more than 40 ASIC tape-outs. Dr. Yang currently concentrates on computer architecture for HPC and artificial intelligence, deep learning algorithm optimization, AMD’s ROCm open-source platform and HPC applications.

ABSTRACT

PRESENTATION: AMD RADEON INSTINCT™ PLATFORMS FOR HPC AND MACHINE INTELLIGENCE

AMD accelerates hardware and software platforms for virtualization, HPC and machine intelligence with the 7nm “Rome” CPU and the 7nm MI60 and MI50 GPUs. The AMD Radeon Instinct™ MI60 delivers 7.4 TFLOPS of FP64 compute, 64 GB/s of PCIe Gen4 bandwidth and 200 GB/s Infinity Fabric Links. ROCm over OpenUCX provides low latency and high bandwidth for intra-node and inter-node MPI communication. The rapidly evolving open-source ROCm software stack supports fast porting of HPC applications and many machine intelligence frameworks. Many math libraries and machine intelligence primitives have been developed and optimized in ROCm for AMD Radeon Instinct GPUs. AMD is working with many partners to promote ROCm in the computing market.

INSTRUCTOR NAME & BIO

MR. ASHRUT AMBASTHA (SR. STAFF SOLUTION ARCHITECT, MELLANOX)

Ashrut Ambastha is a Sr. Staff Architect at Mellanox, responsible for defining network fabrics for large-scale InfiniBand clusters and high-performance data center fabrics. He is also a member of the application engineering team that works on product designs with Mellanox silicon devices. Ashrut’s professional interests include network topologies, routing algorithms and PHY signal integrity analysis and simulation. He holds an M.Sc. and an M.Tech. in Electrical Engineering from the Indian Institute of Technology Bombay.

ABSTRACT

PRESENTATION: GPU DIRECT ACCELERATE HPC SYSTEM

This talk is aimed at professionals interested in discussing the role of upcoming interconnect technologies and network topologies in HPC and artificial intelligence. We will start by analysing the latest “in-network computing” architecture of Mellanox network ASICs and software layers. We will then discuss network topologies and the associated resiliency mechanisms that meet the demands of high-performance yet flexible computing and AI systems. To conclude, we will dwell on a few offloading technologies built into the network components that can be applied to accelerate HPC and cloud-native workloads as well as storage systems.

INSTRUCTOR NAME & BIO

MR. ZIVAN ORI (CEO AND CO-FOUNDER, E8 STORAGE)

Mr. Zivan Ori is the co-founder and CEO of E8 Storage. Before founding E8 Storage, Mr. Ori was IBM XIV R&D Manager, responsible for developing the IBM XIV high-end, grid-scale storage system, and served as Chief Architect at Stratoscale, a provider of hyper-converged infrastructure. Prior to IBM XIV, Mr. Ori headed Software Development at Envara (acquired by Intel) and served as VP R&D at Onigma (acquired by McAfee).

ABSTRACT

PRESENTATION: ACCELERATING MACHINE LEARNING WITH NVME OVER FABRICS FOR GPU CLUSTERS

GPU clusters are the basic building block for machine learning, but typical GPU servers have little room for internal storage. External storage such as NAS or FC SAN does not deliver the performance the GPUs need, especially during the training phase. NVMe over Fabrics to the rescue! By connecting shared NVMe enclosures over 100G Ethernet or InfiniBand, it is now possible to saturate the GPUs’ bandwidth, so storage is no longer the bottleneck. E8 Storage will demo shared NVMe for a GPU cluster and its impact on machine learning performance.

  • 09:00 Opening
  • 09:10 Pave the Way to Exascale by Mr. Gilad Shainer, Chairman, HPC-AI Advisory Council
  • 10:00 Engineering an HPC cluster solution for GPU-accelerated workloads by Mr. Jeffrey Adie, Principal Solutions Architect, APJI Region, NVIDIA
  • 10:30 Tea Break
  • 11:00 Excelero NVMesh is a storage game changer for supercomputing by Dr. Oren Laadan, Chief Technology Officer, Excelero
  • 11:30 In-Network Computing in HPC System by Mr. Avi Telyas, Director of System Engineering, Mellanox
  • 12:00 MPI Acceleration in HPC System by Dr. Richard Graham, HPC Scale Special Interest Group Chair, HPC-AI Advisory Council
  • 12:30 Lunch
  • 13:30 AMD Radeon Instinct™ Platforms For HPC and Machine Intelligence by Dr. Yang Jian, Fellow, AMD
  • 14:00 GPU Direct Accelerate HPC System by Mr. Ashrut Ambastha, Sr. Staff Solution Architect, Mellanox
  • 14:30 Accelerating Machine Learning with NVMe over Fabrics for GPU Clusters by Mr. Zivan Ori, Chief Executive Officer and Co-founder, E8 Storage
  • 15:00 In-Network Computing in HPC System by Dr. Richard Graham, HPC Scale Special Interest Group Chair, HPC-AI Advisory Council
  • 15:30 Tea Break
  • 16:00 Exascale HPC Fabric Topology by Mr. Ashrut Ambastha, Sr. Staff Solution Architect, Mellanox
  • 16:30 Exascale HPC Fabric Optimization by Mr. Qingchun Song, HPC-AI Advisory Council
  • 17:00 Panel Discussion and Q&A
  • 17:45 Lucky Draw & Closing by Mr. Qingchun Song, HPC-AI Advisory Council (15 mins)