IPDPS 2024 Conference

IPDPS 2024 Advance Program

Please visit the IPDPS website regularly for updates, including schedule revisions.

For workshop programs and schedules, follow the link to each workshop's website below.

Authors who have corrections should send email to contact@ipdps.org giving full details.

(Updated as of 15 May 2024)

MONDAY - 27 May 2024

DAYS • Monday • Tuesday • Wednesday • Thursday • Friday

MONDAY
Workshops

ALL DAY

See each individual
workshop program
for schedule details

1	HCW	Heterogeneity in Computing Workshop
2	RAW	Reconfigurable Architectures Workshop
3	APDCM	Advances in Parallel and Distributed Computational Models
4	AsHES	Accelerators and Hybrid Emerging Systems
5	EduPar	NSF/TCPP Workshop on Parallel and Distributed Computing Education
6	ESSA	Extreme-Scale Storage and Analysis
7	GrAPL	Graphs, Architectures, Programming, and Learning
8	HiCOMB	High Performance Computational Biology
9	PAISE	Parallel AI and Systems for the Edge

Reception
6:00 PM - 7:30 PM

IPDPS - TCPP Welcome Reception

TUESDAY - 28 May 2024

Opening Session
8:15 AM - 8:30 AM

Opening Session

Keynote Session
8:30 AM - 9:30 AM

KEYNOTE SPEECH

Session Chair: DK Panda

AuroraGPT: Exploring AI Assistant for Science

Franck Cappello *
Argonne National Laboratory

* Recipient of the 2024 IEEE Charles Babbage Award

Morning Break 9:30 AM - 10:00 AM

All Day

Main Conference Poster-Accept Papers

See listing here. Posters on Display in Ballroom Foyer

Parallel Technical
Sessions 1A & 1B

10:00 AM - 12:00 PM

Session 1A: Numerical Linear Algebra

Session Chair: Grey Ballard

PckGNN: Optimizing Aggregation Operators with Packing Strategies in Graph Neural Networks
Zhengding Hu, Jingwei Sun, Zhongyang Li, Guangzhong Sun (University of Science and Technology of China)
VNEC: A Vectorized Non-Empty Column Format for SpMV on CPUs
Luhan Wang, Haipeng Jia, Lei Xu, Cunyang Wei (Institute of Computing Technology, Chinese Academy of Sciences); Kun Li (Microsoft Research); Xianmeng Jiang, Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences)
Two-Stage Block Orthogonalization to Improve Performance of s-step GMRES
Ichitaro Yamazaki (SNL); Andrew J. Higgins (Temple University); Erik G. Boman (SNL); Daniel B. Szyld (Temple University)
Alternative Basis Matrix Multiplication is Fast and Stable
Oded Schwartz (The Hebrew University of Jerusalem); Sivan Toledo (Tel Aviv University); Noa Vaknin (The Hebrew University of Jerusalem); Gal Wiernik (Tel Aviv University)
Fast multiplication of random dense matrices with sparse matrices
Tianyu Liang, Riley Murray, Aydin Buluc, James Demmel (UC Berkeley)
A Cholesky QR Type Algorithm for Computing Tall-Skinny QR Factorization with Column Pivoting
Takeshi Fukaya (Hokkaido University); Yuji Nakatsukasa (University of Oxford); Yusaku Yamamoto (The University of Electro-Communications)

Session 1B: Containers and Serverless Computing

Session Chair: Alfredo Goldman

CKSM: An Efficient Memory Deduplication Method for Container-based Cloud Computing Systems
Yunfei Gu, Yihui Lu, Chentao Wu, Jie Li, Minyi Guo (Shanghai Jiao Tong University)
Tackling Cold Start in Serverless Computing with Multi-Layer Container Reuse
Amelie Chi Zhou (Hong Kong Baptist University); Rongzheng Huang (Shenzhen University); Zhoubin Ke (Shenzhen University); Yusen Li (Nankai University); Yi Wang, Rui Mao (Shenzhen University)
PALDIA: Enabling SLO-Compliant and Cost-Effective Serverless Computing on Heterogeneous Hardware
Vivek M. Bhasi, Aakash Sharma, Shruti Mohanty, Mahmut Taylan Kandemir, Chita R. Das (The Pennsylvania State University)
Application-Attuned Memory Management for Containerized HPC Workflows
Moiz Arif, Avinash Maurya, M. Mustafa Rafique (Rochester Institute of Technology); Dimitrios S. Nikolopoulos (Virginia Tech); Ali R. Butt (Rochester Institute of Technology)
FEDGE: An Interference-Aware QoS Prediction Framework for Black-Box Scenario in IaaS Clouds with Domain Generalization
Yunlong Cheng, Xiuqi Huang, Zifeng Liu, Jiadong Chen, Xiaofeng Gao (Shanghai Jiao Tong University); Zhen Fang, Yongqiang Yang (Huawei)
Software Resource Disaggregation for HPC with Serverless Computing
Marcin Copik, Marcin Chrapek (ETH Zürich); Larissa Schmid (Karlsruhe Institute of Technology); Alexandru Calotoiu, Torsten Hoefler (ETH Zürich)

12:00 PM – 1:30 PM

Lunch & PhD Program

Parallel Technical
Sessions 2A & 2B

1:30 PM – 2:30 PM

Session 2A: Algorithms on Trees

Session Chair: Cynthia Phillips

AMST: Accelerating Large-Scale Graph Minimum Spanning Tree Computation on FPGA
Haishuang Fan, Rui Meng, Qichu Sun, Jingya Wu, Xiaowei Li, Guihai Yan (State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences)
Wait-free trees supporting asymptotically efficient range queries
Ilya Kokorin (ITMO University); Victor Yudov (ITMO University); Vitaly Aksenov (City, University of London); Dan Alistarh (ISTA)
Low-Depth Spatial Tree Algorithms
Yves Baumann, Tal Ben-Nun, Maciej Besta, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski (ETH Zurich)

Session 2B: Federated and Distributed Learning

Session Chair: Amelie Zhou

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices
Juntao Zhao, Borui Wan (The University of Hong Kong); Yanghua Peng, Haibin Lin, Yibo Zhu (ByteDance Inc.); Chuan Wu (The University of Hong Kong)
Enhancing the Generalization of Personalized Federated Learning with Multi-head Model and Ensemble Voting
Van An Le (National Institute of Advanced Industrial Science and Technology, Japan); Nam Duong Tran, Phuong Nam Nguyen, Thanh Hung Nguyen, Phi Le Nguyen (Hanoi University of Science and Technology, Vietnam); Truong Thao Nguyen (National Institute of Advanced Industrial Science and Technology, Japan); Yusheng Ji (National Institute of Informatics, Japan)
UniFaaS: Programming across Distributed Cyberinfrastructure with Federated Function Serving
Yifei Li (Southern University of Science and Technology); Ryan Chard (Argonne National Laboratory); Yadu Babuji, Kyle Chard (University of Chicago); Ian Foster (Argonne National Laboratory); Zhuozhao Li (Southern University of Science and Technology)

Parallel Technical
Sessions 3A & 3B

2:30 PM – 4:10 PM

Session 3A: Applications I

Session Chair: Edgar Solomonik

Scalable and Differentiable Simulator for Quantum Computational Chemistry
Zhiqian Xu (Institute of Computing Technology, Chinese Academy of Sciences); Honghui Shang, Yi Fan, Xiongzhi Zeng (University of Science and Technology of China); Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences); Chu Guo (Hunan Normal University)
Picasso: Memory-Efficient Graph Coloring Using Palettes With Applications in Quantum Computing
S.M. Ferdous (Pacific Northwest National Laboratory); Reece Neff (North Carolina State University); Bo Peng, Salman Shuvo, Marco Minutoli, Sayak Mukherjee, Karol Kowalski (Pacific Northwest National Laboratory); Michela Becchi (North Carolina State University); Mahantesh Halappanavar (Pacific Northwest National Laboratory)
Optimizing and Scaling the 3D Reconstruction of Single-Particle Imaging
Niteya Shah (Virginia Tech); Christine Sweeney (Los Alamos National Laboratory); Vinay Ramakrishnaiah (Los Alamos National Laboratory); Jeffrey Donatelli (Lawrence Berkeley National Laboratory); Wu-chun Feng (Virginia Tech)
Parallel Approximations for High-Dimensional Multivariate Normal Probability Computation in Confidence Region Detection Applications
Xiran Zhang, Sameh Abdulah (King Abdullah University of Science and Technology); Jian Cao (University of Houston); Hatem Ltaief, Ying Sun, Marc G. Genton, David E. Keyes (King Abdullah University of Science and Technology)
Enabling High-Performance Physical Based Rendering on New Sunway Supercomputer
Zeyu Song, Lin Gan, Shengye Xiang, Yinuo Wang (Tsinghua University); Xiaohui Duan (Shandong University); Guangwen Yang (Tsinghua University)

Session 3B: Scheduling I

Session Chair: Oguz Selvitopi

CoCG: Fine-grained Cloud Game Co-location on Heterogeneous Platform
Taolei Wang, Chao Li, Jing Wang, Cheng Xu, Xiaofeng Hou, Minyi Guo (Shanghai Jiao Tong University)
Adaptive Task-Oriented Resource Allocation for Large Dynamic Workflows on Opportunistic Resources
Thanh Son Phung, Douglas Thain (University of Notre Dame)
nOS-V: Co-Executing HPC Applications Using System-Wide Task Scheduling
David Álvarez, Kevin Sala, Vicenç Beltran (Barcelona Supercomputing Center)
SWEEP: Adaptive Task Scheduling for Exploring Energy Performance Trade-offs
Jing Chen, Madhavan Manivannan, Bhavishya Goel, Miquel Pericàs (Chalmers University of Technology)
Interpretable Analysis of Production GPU Clusters Monitoring Data via Association Rule Mining
Baolin Li (Northeastern University); Siddharth Samsi (MIT); Vijay Gadepally (MIT Lincoln Laboratory); Devesh Tiwari (Northeastern University)

Late Afternoon Break 4:10 PM – 4:40 PM

PLENARY Session:
Best Papers
4:40 PM - 6:40 PM

Best Paper Nominees

Session Chair: Umit Catalyurek

CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion
Jan Laukemann, Thomas Gruber, Georg Hager (University of Erlangen-Nuremberg); Dossay Oryspayev (Brookhaven National Laboratory); Gerhard Wellein (Erlangen National High Performance Computing Center)
ARGO: An Auto-Tuning Runtime System for Scalable GNN Training on Multi-Core Processor
Yi-chien Lin (University of Southern California); Yuyang Chen (Tsinghua University); Sameh Gobriel, Nilesh Jain, Gopi Krishna Jha (Intel); Viktor Prasanna (University of Southern California)
Accelerating Lossy and Lossless Compression on Emerging BlueField DPU Architectures
Yuke Li, Arjun Kashyap, Weicong Chen (University of California, Merced); Yanfei Guo (Argonne National Laboratory); Xiaoyi Lu (University of California, Merced)
Performance-Portable Multiphase Flow Solutions with Discontinuous Galerkin Methods
Tobias Flynn (University of Warwick); Robert Manson-Sawko (IBM-Research Europe); Gihan Mudalige (University of Warwick)

WEDNESDAY - 29 May 2024

All Day

Main Conference Poster-Accept Papers

See listing here. Posters on Display in Ballroom Foyer

Parallel Technical
Sessions 4A & 4B

8:30 AM – 10:30 AM

Session 4A: Applications II

Session Chair: Josh Milthorpe

Optimized GPU Implementation of Grid Refinement in Lattice Boltzmann Method
Ahmed H. Mahmoud (Autodesk Research and University of California, Davis); Hesam Salehipour, Massimiliano Meneghin (Autodesk Research)
Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs
Herbert Owen (Barcelona Supercomputing Center); Dominik Ernst (FAU Erlangen-Nürnberg); Thomas Gruber (FAU Erlangen-Nürnberg); Oriol Lemkuhl, Guillaume Houzeaux, Lucas Gasparino (Barcelona Supercomputing Center); Gerhard Wellein (FAU Erlangen-Nürnberg)
CliZ: Optimizing Lossy Compression for Climate Datasets with Adaptive Fine-tuned Data Prediction
Zizhe Jian (University of California, Riverside); Sheng Di (Argonne National Laboratory); Jinyang Liu (University of California, Riverside); Kai Zhao (Florida State University); Xin Liang (University of Kentucky); Haiying Xu (NCAR); Robert Underwood (Argonne National Laboratory); Shixun Wu, Zizhong Chen (University of California, Riverside); Franck Cappello (Argonne National Laboratory)
Automating GPU Scalability for Complex Scientific Models: Phonon Boltzmann Transport Equation
Eric Heisler (University of Utah); Siddharth Saurav (The Ohio State University); Aadesh Deshmukh (University of Utah); Sandip Mazumder (The Ohio State University); Hari Sundar (University of Utah)
An O(N) distributed-memory parallel direct solver for planar integral equations
Tianyu Liang (The University of California, Berkeley); Chao Chen (North Carolina State University); Per-Gunnar Martinsson, George Biros (The University of Texas at Austin)
Exploiting long vectors with a CFD code: a co-design show case
Marc Blancafort, Roger Ferrer, Guillaume Houzeaux, Marta Garcia-Gasulla, Filippo Mantovani (Barcelona Supercomputing Center)

Session 4B: I/O and Storage Systems

Session Chair: Hari Subramoni

Capturing Periodic I/O Using Frequency Techniques
Ahmad Tarraf (Technical University of Darmstadt); Alexis Bandet, Francieli Boito (Inria, University of Bordeaux); Guillaume Pallez (Inria); Felix Wolf (Technical University of Darmstadt)
To Store or Not to Store: a graph theoretical approach for Dataset Versioning
Anxin Guo (Northwestern University); Jingwei Li (Columbia University); Pattara Sukprasert (Databricks); Samir Khuller (Northwestern University); Amol Deshpande (University of Maryland); Koyel Mukherjee (Adobe Research)
TunIO: An AI-powered Framework for Optimizing HPC I/O
Neeraj Rajesh, Keith Bateman (Illinois Institute of Technology); Jean Luca Bez (Lawrence Berkeley National Laboratory); Suren Byna (The Ohio State University); Anthony Kougkas, Xian-He Sun (Illinois Institute of Technology)
A2FL: Autonomous and Adaptive File Layout in HPC through Real-time Access Pattern Analysis
Dong Kyu Sung (Seoul National University); Yongseok Son (Chung-Ang University); Alex Sim, Kesheng Wu (Lawrence Berkeley National Laboratory); Suren Byna (The Ohio State University); Houjun Tang (Lawrence Berkeley National Laboratory); Hyeonsang Eom (Seoul National University); Changjong Kim, Sunggon Kim (Seoul National University of Science and Technology)
NVMe-oPF: Designing Efficient Priority Schemes for NVMe-over-Fabrics with Multi-Tenancy Support
Darren Ng, Andrew Lin, Arjun Kashyap (University of California, Merced); Guanpeng Li (University of Iowa); Xiaoyi Lu (University of California, Merced)
Drilling Down I/O Bottlenecks with Cross-layer I/O Profile Exploration
Hammad Ather (University of Oregon); Jean Luca Bez (Lawrence Berkeley National Laboratory); Yankun Xia, Suren Byna (The Ohio State University)

Morning Break 10:30 AM - 11:00 AM

Keynote Session
11:00 AM – 12:00 PM

KEYNOTE SPEECH

Session Chair: Saday Sadayappan

PyTorch 2 and its Compiler Technologies

Peng Wu
Meta

12:00 PM – 1:30 PM

Lunch & PhD Program

Parallel Technical
Sessions 5A & 5B

1:30 PM – 2:30 PM

Session 5A: Performance

Session Chair: Ali Butt

CachedArrays: Optimizing Data Movement for Heterogeneous Memory Systems
Mark Hildebrand, Jason Lowe-Power, Venkatesh Akella (UC Davis)
Comparative Study of Large Language Model Architectures on Frontier
Junqi Yin, Avishek Bose, Guojing Cong, Isaac Lyngaas (Oak Ridge National Laboratory); Quentin Anthony (The Ohio State University)
Predicting Cross-Architecture Performance of Parallel Programs
Daniel Nichols, Alexander Movsesyan (University of Maryland); Jae-seung Yeom, Abhik Sarkar, Daniel Milroy, Tapasya Patki (Lawrence Livermore National Laboratory); Abhinav Bhatele (University of Maryland)

Session 5B: Resilience

Session Chair: Jay Lofstead

DRUTO: Upper-Bounding Silent Data Corruption Vulnerability in GPU Applications
Md Hasanur Rahman (University of Iowa); Sheng Di (Argonne National Laboratory); Shengjian Guo (Amazon Web Services); Xiaoyi Lu (University of California, Merced); Guanpeng Li (University of Iowa); Franck Cappello (Argonne National Laboratory)
MPI Errors Detection using GNN Embedding and Vector Embedding over LLVM IR
Jad El Karchi (Inria); Hanze Chen, Ali TehraniJamsaz, Ali Jannesari (Iowa State University); Mihail Popov, Emmanuelle Saillard (Inria)
A Parallel Partial Merge Repair Algorithm for Multi-block Failures for Erasure Storage Systems
Shuaipeng Zhang (Harbin Institute of Technology, Shenzhen); Shiyi Li (Harbin Institute of Technology, Shenzhen); Chentao Wu (Shanghai Jiao Tong University); Ruobin Wu (Harbin Institute of Technology, Shenzhen); Saiqin Long (Jinan University); Wen Xia (Harbin Institute of Technology, Shenzhen)

Parallel Technical
Sessions 6A & 6B

2:30 PM – 4:10 PM

Session 6A: Accelerators

Session Chair: Davide Conficconi

Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators
Payman Behnam, Uday Kamal (Georgia Institute of Technology); Ali Shafiee (Meta); Alexey Tumanov, Saibal Mukhopadhyay (Georgia Institute of Technology)
IPU-EpiDet: Identifying Gene Interactions on Massively Parallel Graph-Based AI Accelerators
Ricardo Nobre, Aleksandar Ilic (INESC-ID); Sergio Santander-Jiménez (University of Extremadura (UNEX)); Leonel Sousa (INESC-ID)
DEFCON: Deformable Convolutions Leveraging Interval Search and GPU Texture Hardware
Malith Jayaweera, Yanyu Li (Northeastern University); Yanzhi Wang (Northeastern University); Bin Ren (William & Mary); David Kaeli (Northeastern University)
Benchmarking and Dissecting the Nvidia Hopper GPU Architecture
Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du (The Hong Kong University of Science and Technology (Guangzhou)); Qiang Wang (Harbin Institute of Technology, Shenzhen); Xiaowen Chu (The Hong Kong University of Science and Technology (Guangzhou))
Exploration of Trade-offs Between General-Purpose and Specialized Processing Elements in HPC-Oriented CGRA
Emanuele Del Sozzo (RIKEN Center for Computational Science); Xinyuan Wang (University of Toronto); Boma Adhi, Carlos Cortes (RIKEN Center for Computational Science); Jason Anderson (University of Toronto); Kentaro Sano (RIKEN Center for Computational Science)

Session 6B: Scheduling II

Session Chair: Suren Byna

Hadar: Heterogeneity-Aware Optimization-Based Online Scheduling for Deep Learning Clusters
Abeda Sultana (University of Louisiana at Lafayette); Fei Xu (East China Normal University); Xu Yuan, Li Chen, Nian-feng Tzeng (University of Louisiana at Lafayette)
Fast Abort-freedom for Deterministic Transactions
Chen Chen (University of Illinois at Chicago); Xingbo Wu (Microsoft Research); Wenshao Zhong, Jakob Eriksson (University of Illinois at Chicago)
SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors
Marta Navarro, Josué Feliu, Salvador Petit, María E. Gómez (Universitat Politècnica de València); Julio Sahuquillo (Universitat Politècnica de València)
Cross-System Analysis of Job Characterization and Scheduling in Large-Scale Computing Clusters
Di Zhang, Monish Soundar Raj (University of North Carolina at Charlotte); Bing Xie (Microsoft); Sheng Di (ANL); Dong Dai (University of North Carolina at Charlotte)
Automatic Task Parallelization of Dataflow Graphs in ML/DL Models
Srinjoy Das, Lawrence Rauchwerger (University of Illinois at Urbana Champaign)

Afternoon Break 4:10 PM - 4:40 PM

4:10 PM - 5:30 PM

Conference Poster Session

- Authors Available at Poster Boards

5:30 PM

PhD Forum - Students at posters

6:30 PM - 7:30 PM

Pre-Banquet Reception

7:30 PM

Banquet (Paper and Poster Awards)

THURSDAY - 30 May 2024

All Day

Main Conference Poster-Accept Papers

See listing here. Posters on Display in Ballroom Foyer

Parallel Technical
Sessions 7A & 7B

8:30 AM – 10:30 AM

Session 7A: Message Passing and Communication

Session Chair: Doru Thom Popovici

Adaptive Prefetching for Fine-grain Communication in PGAS Programs
Thomas B. Rolinger (NVIDIA); Alan Sussman (University of Maryland)
An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression
Jiajun Huang (University of California, Riverside); Sheng Di (Argonne National Laboratory); Xiaodong Yu (Stevens Institute of Technology); Yujia Zhai (University of California, Riverside); Zhaorui Zhang (The Hong Kong Polytechnic University); Jinyang Liu (University of California, Riverside); Xiaoyi Lu (University of California, Merced); Ken Raffenetti, Hui Zhou (Argonne National Laboratory); Kai Zhao (Florida State University); Zizhong Chen (University of California, Riverside); Franck Cappello, Yanfei Guo (Argonne National Laboratory); Rajeev Thakur (Argonne National Laboratory)
MUSE: A Runtime Incrementally Reconfigurable Network Adapting to HPC Real-Time Traffic
Zijian Li, Zixuan Chen, Yiying Tang, Xin Ai, Yuanyi Zhu, Zhigao Zhao, Jiang Shao (Fudan University); Guowei Liu (Tsinghua University); Sen Liu (Fudan University); Bin Liu (Tsinghua University); Yang Xu (Fudan University)
Fast Policy Convergence for Traffic Engineering with Proactive Distributed Message-Passing
Zicheng Wang, Zirui Zhuang, Jingyu Wang, Qi Qi, Haifeng Sun, Jianxin Liao (Beijing University of Posts and Telecommunications)
The Self-adaptive and Topology-aware MPI_Bcast leveraging Collective offload on Tianhe Express Interconnect
Chongshan Liang, Yi Dai (NUDT); Jun Xia (Nanhu Lab); Jinbo Xu, Jintao Peng, Weixia Xu, Ming Xie, Jie Liu, Zhiquan Lai, Sheng Ma, Qi Zhu (NUDT)
HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions
Bharath Ramesh, Nick Contini, Nawras Alnaasan, Kaushik Kandadi Suresh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda (The Ohio State University)

Session 7B: Communication Subsystems

Session Chair: Dip Sankar Banerjee

Flexible NVMe Request Routing for Virtual Machines
Tu Dinh Ngoc, Boris Teabe, Georges Da Costa, Daniel Hagimont (IRIT, Université de Toulouse, CNRS, Toulouse INP, UT3)
HA-CSD: Host and SSD Coordinated Compression for Capacity and Performance
Xiang Chen (Huazhong University of Science and Technology); Tao Lu, Jiapin Wang (DapuStor); Yu Zhong (Huazhong University of Science and Technology); Guangchun Xie (DapuStor); Xueming Cao, Yuanpeng Ma, Bing Si, Feng Ding, Ying Yang, Yunxing Huang (DapuStor); Yafei Yang, You Zhou, Fei Wu (Huazhong University of Science and Technology)
Graph Analytics on Jellyfish Topology
Md Nahid Newaz (Oakland University); Sayan Ghosh, Joshua Suetterlein, Nathan T. Tallent (Pacific Northwest National Laboratory); Md Atiqul Mollah (Cornelis Networks); Hua Ming (Oakland University)
TEEMO: Temperature Aware Energy Efficient Multi-Retention STT-RAM Cache Architecture
Sukarn Agarwal (IIT Mandi); Shounak Chakraborty, Magnus Sjalander (Norwegian University of Science and Technology)
LockillerTM: Enhancing Performance Lower Bounds in Best-Effort Hardware Transactional Memory
Li Wan, Fu Chao, Qiang Li, Jun Han (Fudan University)
Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching
Pengmiao Zhang, Neelesh Gupta (University of Southern California); Rajgopal Kannan (DEVCOM Army Research Lab); Viktor Prasanna (University of Southern California)

Morning Break 10:30 AM - 11:00 AM

Keynote Session
11:00 AM – 12:00 PM

KEYNOTE SPEECH

Session Chair: Rich Vuduc

Computing Systems in the Foundation Model Era

Kunle Olukotun
Stanford University

12:00 PM – 1:30 PM

Lunch & PhD Program

Parallel Technical
Sessions 8A & 8B

1:30 PM – 2:50 PM

Session 8A: Graph and MoE Learning

Session Chair: Ali Jannesari

Aurora: A Versatile and Flexible Accelerator for Generic Graph Neural Networks
Jiaqi Yang (George Washington University); Hao Zheng (University of Central Florida); Ahmed Louri (George Washington University)
cuKE: An Efficient Code Generator for Score Function Computation in Knowledge Graph Embedding
Lihan Hu (The University of Iowa); Jing Li (Nvidia); Peng Jiang (The University of Iowa)
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda (The Ohio State University)
TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning
Gangda Deng, Hongkuan Zhou (University of Southern California); Hanqing Zeng, Yinglong Xia, Christopher Leung, Jianbo Li (Meta); Rajgopal Kannan (DEVCOM US Army Research Lab); Viktor Prasanna (University of Southern California)

Session 8B: Performance Optimization

Session Chair: Sara Neuwirth

OpenFFT-SME: An Efficient Outer Product Pattern FFT Library on ARM SME CPUs
Ruge Zhang, Haipeng Jia, Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences); Baicheng Yan, Penghao Ma, Long Wang (Huawei Technologies Co. Ltd); Wenxuan Zhao (Institute of Computing Technology, Chinese Academy of Sciences)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
Evangelos Georganas, Dhiraj Kalamkar, Kirill Voronin, Abhisek Kundu (Intel Corporation); Antonio Noack (Friedrich Schiller Universität Jena); Hans Pabst (Intel Corporation); Alexander Breuer (Friedrich Schiller Universität Jena); Alexander Heinecke (Intel Corporation)
Optimizing General Matrix Multiplications on Modern Multi-core DSPs
Kainan Yu, Xinxin Qi, Peng Zhang, Jianbin Fang, Dezun Dong, Ruibo Wang, Tao Tang, Chun Huang, Yonggang Che (National University of Defense Technology); Zheng Wang (Northwest University)
Machine-Learning-Driven Runtime Optimization of BLAS Level 3 on Modern Multi-Core Systems
Yufan Xia (The Chinese University of Hong Kong); Giuseppe Maria Junior Barca (The University of Melbourne)

Afternoon Break 2:50 PM - 3:30 PM

Parallel Technical
Sessions 9A & 9B

3:30 PM – 4:50 PM

Session 9A: Distributed Algorithms

Session Chair: Khaled Ibrahim

Time-Color Tradeoff on Uniform Circle Formation by Asynchronous Robots
Debasish Pattanayak (Carleton University); Gokarna Sharma (Kent State University)
LightDAG: A Low-latency DAG-based BFT Consensus through Lightweight Broadcast
Xiaohai Dai, Guanxiong Wang, Jiang Xiao, Zhengxuan Guo (Huazhong University of Science and Technology); Rui Hao (Nanjing University); Xia Xie (Hainan University); Hai Jin (Huazhong University of Science and Technology)
MAAD: A Distributed Anomaly Detection Architecture for Microservices Systems
Rongyuan Tan, Zhuozhao Li (Southern University of Science and Technology)
OneShot: View-Adapting Streamlined BFT Protocols with Trusted Execution Environments
Jeremie Decouchant (Delft University of Technology); David Kozhaya (ABB Research); Vincent Rahli (University of Birmingham); Jiangshan Yu (Monash University)

Session 9B: Graph Algorithms

Session Chair: Kishore Kothapalli

Practically Tackling Memory Bottlenecks of Graph-Processing Workloads
Alexandre Valentin Jamet (Universitat Politecnica de Catalunya); Georgios Vavouliotis (Huawei Zurich Research Center); Daniel A. Jiménez (Texas A&M University); Lluc Alvarez (Barcelona Supercomputing Center); Marc Casas (Barcelona Supercomputing Center (BSC))
GCSM: GPU-Accelerated Continuous Subgraph Matching for Large Graphs
Yihua Wei, Peng Jiang (The University of Iowa)
Parallel Derandomization for Coloring
Sam Coy, Artur Czumaj (University of Warwick); Peter Davies-Peck (Durham University); Gopinath Mishra (National University of Singapore)
A Comparative Study of Intersection-Based Triangle Counting Algorithms on GPUs
Jiangbo Li, Zichen Xu (The Nanchang University); Minh Pham, Yicheng Tu (University of South Florida); Qihe Zhou (City University of Macau)

Main Conference Closing Session

Details to be announced

FRIDAY - 31 May 2024

FRIDAY
Workshops

ALL DAY

See each individual
workshop program
for schedule details

10	CGRA4HPC	Coarse-Grained Reconfigurable Architectures for High-Performance Computing
11	HIPS	High-level Parallel Programming Models and Supportive Environments
12	iWAPT	International Workshop on Automatic Performance Tuning
13	JSSPP	Job Scheduling Strategies for Parallel Processing
14	ParSocial	Parallel and Distributed Processing for Computational Social Systems
15	PDCO	Parallel / Distributed Combinatorics and Optimization
16	PDSEC	Parallel and Distributed Scientific and Engineering Computing
17	Q-CASA	Quantum Computing Algorithms, Systems, and Applications

IPDPS 2024: Keynote Speakers

IPDPS 2024 Tuesday KEYNOTE SPEAKER

Franck Cappello
Argonne National Laboratory

AuroraGPT: Exploring AI Assistant for Science

Abstract:
Innovative methods, new instruments, disruptive techniques, and groundbreaking technologies have led to significant leaps in scientific progress. The increasingly powerful Large Language Models (LLMs) released each month already speed up research activities such as concept explanation, literature search, and summarization. The transformative potential of AI in research activities, in particular foundation models, raises important questions about their performance in science activities, their potential application in different contexts, and their ethics. In this talk, I will introduce AuroraGPT, Argonne National Laboratory's effort to explore the notion of AI research assistants. To illustrate the gap between existing LLMs and an ideal AI research assistant, I will first share observations from using existing LLMs as early research assistants in three parallel and distributed computing experiments with experts in scheduling, distributed protocols, and PDE solvers. AuroraGPT is developed as an open foundation model trained specifically with scientific data to explore solutions toward the realization of effective AI research assistants. I will describe the activity, challenges, and progress of the different groups developing the key aspects of AuroraGPT. I will particularly focus on the task of conversational research assistant and discuss the evaluation of LLMs' scientific skills, their safety and trustworthiness, and the co-design of a scientific benchmark with domain experts.

Bio:
Franck Cappello received his Ph.D. in Computer Architecture from the University of Paris XI in 1994. He joined the French National Center for Scientific Research (CNRS), where he contributed to cluster and Grid computing, including desktop Grid and later hybrid parallel programming (MPI+OpenMP). In 2003, he moved to INRIA and led the R&D phase of Grid’5000 until 2008. Grid’5000 is a large-scale experimental platform for parallel and distributed computing research, which remains active and has produced over 2,500 scientific publications and supported hundreds of researchers and Ph.D. students. In 2009, as a visiting research professor at the University of Illinois, Cappello, alongside Prof. Marc Snir, established the Joint Laboratory on Petascale Computing (now the Joint Laboratory on Extreme Scale Computing). This collaboration is one of the largest and longest-lasting in high-performance computing, supporting numerous researchers and students in scientific computing, high-performance computing, and artificial intelligence. From 2009 to 2013, Cappello led an extensive research effort in parallel computing resilience, covering many aspects: failure characterization, checkpointing, fault tolerance protocols, silent data corruption detection, and failure prediction. As a member of the International Exascale Software Project, he led the roadmap efforts related to resilience at extreme scales. In 2016, Cappello became the director of two Exascale Computing Project (ECP) software projects: VeloC, for high-performance checkpointing of exascale applications, and SZ, for lossy compression of scientific data. Both software packages are now deployed on exascale systems. He has become a leading figure in lossy compression for scientific data by leading the SZ project and developing key methodologies with the Z-checker compression error assessment tool and the SDRBench repository of reference scientific datasets.
Throughout his career, Cappello has made significant contributions to parallel and distributed computing, high-performance computing resilience, and scientific data compression. He is an IEEE Fellow and the recipient of numerous awards, including the 2024 IEEE Charles Babbage Award, the 2024 Euro-Par Achievement Award, the 2022 ACM HPDC Achievement Award, two R&D 100 awards (2019 and 2021), the 2018 IEEE TCPP Outstanding Service Award, and the 2021 IEEE Transactions on Computers Award for Editorial Service and Excellence.

IPDPS 2024 Wednesday KEYNOTE SPEAKER

Peng Wu
Meta

PyTorch 2 and its Compiler Technologies

Abstract:
PyTorch 2.0 was unveiled in March 2023, bringing substantial performance enhancements across a diverse array of models, often with just a simple one-liner change. Do not mistake it for the end of the story. The first release of PyTorch 2 marks the beginning of a long technical roadmap to improving PyTorch execution efficiency via compiled mode. This talk will delve into the design and development of the PyTorch Compiler, examining key aspects through the lens of a three-year timeframe and highlighting our unique approach to creating a top-performing ML framework compiler in a highly competitive and rapidly evolving setting.

Bio:
Dr. Peng Wu is the engineering manager for the PyTorch Compiler team at Meta, bringing with her more than ten years of research expertise from IBM Research, where her work encompassed a diverse array of topics within programming systems. Following IBM, she founded the Programming Languages and Compiler Lab at Huawei and led its growth for six years. Since joining Meta, she has supported the team's pursuit of effective compiler solutions for PyTorch over the last three years, culminating in the groundbreaking release of PyTorch 2.0 in March 2023. She holds a PhD in Computer Science from the University of Illinois, Urbana-Champaign.

IPDPS 2024 Thursday KEYNOTE SPEAKER

Kunle Olukotun
Stanford University

Computing Systems in the Foundation Model Era

Abstract:
Generative AI applications, with their ability to produce natural language, computer code, and images, are transforming all aspects of society. These applications are powered by huge foundation models such as GPT-4, which have tens of billions of parameters, are trained on trillions of tokens, and have obtained state-of-the-art quality in natural language processing, vision, and speech applications. These models are computationally challenging because they require hundreds of petaFLOPS of computing capacity for training and inference. Future foundation models will have even greater capabilities provided by more complex model architectures with longer sequence lengths, irregular data access (sparsity), and irregular control flow. In this talk I will describe how the evolving characteristics of foundation models will impact the design of the optimized computing systems required for training and serving these models. A key element of improving the performance and lowering the cost of deploying future foundation models will be optimizing the data movement (Dataflow) within the model using specialized hardware. In contrast to human-in-the-loop applications such as conversational AI, an emerging application of foundation models is in real-time processing applications that operate without human supervision. I will describe how continuous real-time machine learning can be used to create an intelligent network data plane.

Bio:
Kunle Olukotun is the Cadence Design Professor of Electrical Engineering and Computer Science at Stanford University. Olukotun is a pioneer in multicore processor design and the leader of the Stanford Hydra chip multiprocessor (CMP) research project. He founded Afara Websystems to develop high-throughput, low-power multicore processors for server systems. The Afara multi-core multi-thread processor, called Niagara, was acquired by Sun Microsystems and now powers Oracle's SPARC-based servers. Olukotun co-founded SambaNova Systems, a Machine Learning and Artificial Intelligence company, and continues to lead as their Chief Technologist. Olukotun is a member of the National Academy of Engineering, an ACM Fellow, and an IEEE Fellow for contributions to multiprocessors on a chip design and the commercialization of this technology. He received the 2023 ACM-IEEE CS Eckert-Mauchly Award.