Following two days of workshops and tutorials, which conclude on Wednesday, all attendees are invited to the following conference events:
THURSDAY – June 5, 2025
See full paper information with authors & affiliations and links to abstracts.
08:30-09:00 – Plenary
Welcome to IPDPS'25: Introductions and Highlights
09:00-10:00 – Plenary
Keynote Address
Hava Siegelmann
Distributed Collaborative AI with Applications to Drones
10:00-10:30 – Coffee Break I – Hall
10:30-11:50
1. Serverless Computing
Room A
- TOSS: Tiering of Serverless Snapshots for Memory-Efficient Serverless Computing
- Ekko: Fully Decentralized Scheduling for Serverless Edge Computing
- It Takes Two to Tango: Serverless Workflow Serving via Bilaterally Engaged Resource Adaptation
- Tide: A Runtime Management Framework for the Things-Edge-Cloud Computing Continuum
2. Multithreading and Scheduling
Room B
- PISA: An Adversarial Approach To Comparing Task Graph Scheduling Algorithms
- Enhancing OmpSs-2 Suspendable Tasks by Combining Operating System and User-Level Threads with C++ Coroutines
- Optimizing Fine-Grained Parallelism Through Dynamic Load Balancing on Multi-Socket Many-Core Systems
- CALock: Multi-granularity locking in dynamic hierarchies
3. High-Performance AI
Room C
- AdAPT-S: Effective DNN Pruning via Unified Accuracy and Performance Tuning
- An Efficient Adaptive Dual-Threshold SVM Based on Heterogeneous Collaboration
- Accelerating Tensor-train Decomposition on Graph Neural Networks
11:50-13:30 – Lunch I – Hall
13:30-14:30
4. Emerging Systems and Architectures
Room A
- Energy-Optimal and Low-Depth Algorithmic Primitives for Spatial Dataflow Architectures
- AQUA: Hardware-Agnostic Qubit Allocation for Quantum Multi-Programming
- Distributed Construction of Demand-Aware Datacenter Networks
5. Graph Algorithms I
Room B
- ComSpark: A Holistic Approach for Scalable Clique Counting
- Less is More: Faster Maximum Clique Search by Work-Avoidance
- ALGAS: A Low-latency GPU-Accelerated Approximate Nearest Neighbor Search System
6. AI and Applications
Room C
- AI and HPC Applications on Leadership Computing Platforms: Performance and Scalability Studies
- Accelerate Coastal Ocean Circulation Model with AI Surrogate
- FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs
14:40-15:40
7. Design Space Exploration
Room A
- Compiler, Runtime, and Hardware Parameters Design Space Exploration
- Using performance projection for design-space exploration on future HPC CPU architectures
- Pallas: a generic trace format for large HPC trace analysis
8. Graph Algorithms II
Room B
- Tera-Scale Multilevel Graph Partitioning
- A Bidirectional GPU Algorithm for Computing Maximum Matchings in Bipartite Graphs
- Edge-Disjoint Spanning Trees on Star Products
9. AI for Systems
Room C
- IOAgent: Democratizing Trustworthy HPC I/O Performance Diagnosis Capability via LLMs
- Graph Neural Network-based Latency Prediction for Stream Processing Task
- P3Forecast: Personalized Privacy-Preserving Cloud Workload Prediction based on Federated Generative Adversarial Networks
15:40-16:10 – Coffee Break II – Hall
16:10-18:15 – Plenary
Best Paper Nominees
- FlexRLHF: A Flexible Placement and Parallelism Framework for Efficient RLHF Training
- Enabling Efficient Error-controlled Lossy Compression for Unstructured Scientific Data
- PolyMorphous: An MLIR-Based Polyhedral Compiler with Loop Transformation Primitives
- Parallel scheduling of task graphs with minimal memory requirements
- The Artificial Scientist: in-transit Machine Learning of Plasma Simulations
20:00-23:00 – Conference Banquet at Leonardo da Vinci Museum
FRIDAY – June 6, 2025
08:30-08:50 – Plenary
IEEE CS Charles Babbage Award Announcement
08:50-09:50 – Plenary
Babbage Keynote Address
Srinivas Aluru
The Power of Parallelism: Accelerating Discovery in the Biosciences
09:50-10:20 – Coffee Break III – Hall
10:20-11:20
10. Data Flow and Scheduling
Room A
- Automatically Inferring Detailed & Interpretable Workflow Scaling Models for Better Scheduling
- CELLO: Co-designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse
- Locality Aware Process Remapping for Distributed-Memory Graph Workloads
11. HPC for Biology
Room B
- A Work-Optimal Parallel Algorithm for Aligning Sequences to Genome Graphs
- An Asynchronous Distributed-Memory Parallel Algorithm for k-mer Counting
- Pandemics in silico: Scaling Agent-based Simulations on Realistic Social Contact Networks
12. Federated Learning
Room C
- Air-FedGA: A Grouping Asynchronous Federated Learning Mechanism Exploiting Over-the-air Computation
- IP-FL: Incentive-driven Personalization in Federated Learning
- SEAFL: Enhancing Efficiency in Semi-Asynchronous Federated Learning through Adaptive Aggregation and Selective Training
11:30-12:30
13. Compilation and Code Generation
Room A
- Leveraging Compilation Statistics for Compiler Phase Ordering
- PCEBench: A Multi-dimensional Benchmark for Evaluating Large Language Models in Parallel Code Generation
- Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning
14. Data and Signal Processing
Room B
- SPRT²: Scalable, Parallel, and Real-Time fMRI Data Analysis on Heterogeneous Architectures
- The Tensor-Core Beamformer: A High-Speed Signal-Processing Library for Multidisciplinary Use
- Parallel-in-Time Kalman Smoothing Using Orthogonal Transformations
15. Security and Privacy
Room C
- FLAME: Federated Learning for Attack Mitigation and Evasion
- Hybrid-Granularity Parallelism Support for Fast Transaction Processing in Blockchain-based Federated Learning
- Pair-then-Aggregate: Simplified and Efficient Parallel Programming Paradigm for Secure Multi-party Computation
12:30-14:00 – Lunch II – Hall
14:00-15:00
16. File System Performance
Room A
- Be Aware of Metadata Corruption in Parallel File System: It Can Be Silent and Catastrophic
- LaOvl: Lifecycle-Aware Overlay File System for Efficient Container I/O in Cloud Computing
- KVAccel: A Novel Write Accelerator for LSM-Tree-Based KV Stores with Host-SSD Collaboration
17. HPC Applications I
Room B
- Accelerating the Dutch Atmospheric Large-Eddy Simulation (DALES) model with OpenACC
- Automated MPI-X code generation for scalable finite-difference solvers
- A GPU-Accelerated Distributed Algorithm for Optimal Power Flow in Distribution Systems
18. Latency and Performance for ML
Room C
- PredTOP: Latency Predictor for Distributed Deep Learning Training with Operator Parallelism
- Reducing the End-to-End Latency of DNN-based Recommendation Systems Deployed in GPU Pools
- Improving Accuracy and Efficiency of Graph Embedding Training with Fine-Grained Parameter Management
15:10-16:10
19. Storage and I/O
Room A
- A Deep Look into The Temporal I/O Behavior of HPC Applications
- VerifyIO: Verifying Adherence to Parallel I/O Consistency Semantics
- AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage
20. HPC Applications II
Room B
- A New Spin on the Fast Multipole Method for GPUs: Rethinking the Far-Field Operators
- Large Scale Finite-Temperature Real-time Time Dependent Density Functional Theory Calculation with Hybrid Functional on ARM and GPU Systems
- Improving Parallel Scalability for Molecular Dynamics Simulations in the Exascale Era
21. Performance Profiling and Characterization
Room C
- Phase-based Frequency Scaling for Energy-efficient Heterogeneous Computing
- GNNPerf: Towards Effective Performance Profiling and Analysis across GNN Frameworks
- DeepBAT: Performance and Cost Optimization of Serverless Inference Using Transformers
16:10-17:30
Conference Poster Reception
Hall
17:30-18:30
PhD Forum Posters
Hall
SATURDAY – June 7, 2025
08:30-08:50 – Plenary
Awards and IPDPS'26 Announcement
08:50-09:50 – Plenary
Keynote Address
David Keyes
For What the Bell Tolls
09:50-10:20 – Coffee Break IV – Hall
10:20-11:20
22. Compression and Data Reduction I
Room A
- To Compress or Not To Compress: Energy and Runtime Trade-Offs in Lossy Compressed I/O
- Fast and Effective Lossy Compression on GPUs and CPUs with Guaranteed Error Bounds
- A Memory-efficient and Computation-balanced Lossy Compressor on Wafer-Scale Engine
23. Matrix Multiplication
Room B
- BRP-SpMM: Block-Row Partition Based Sparse Matrix Multiplication with Tensor and CUDA Cores
- Graph Input-Aware Matrix Multiplication for Pruned Graph Neural Network Acceleration
- NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU
24. Communication
Room C
- Unified Designs of Multi-rail-aware MPI Allreduce and Alltoall Operations Across Diverse GPU and Interconnect Systems
- HiCCL: A Hierarchical Collective Communication Library
- NBLFQ: a lock-free MPMC queue optimized for low contention
11:30-12:30
25. Compression and Data Reduction II
Room A
- Improving the Efficiency of Interpolation-Based Scientific Data Compressors with Adaptive Quantization Index Prediction
- An Adaptive Two-Stage Algorithm for Error-Bounded Scientific Data Compression
- Achieving Better Benefits via Flexible Feature Matching in Post-Deduplication Delta Compression
26. Linear Solvers
Room B
- Scalable and portable LU factorization with partial pivoting on top of runtime systems
- Accelerating Sparse Linear Solvers on Intelligence Processing Units
- Adaptive s-step GMRES with randomized and truncated low-synchronization orthogonalization
27. Memory and Networking
Room C
- Performance Characterization of CXL Memory and Its Use Cases
- RXT: Reflexive Address Translation for Pointer-Chasing Workloads
- CoRD: Converged RDMA Dataplane
12:30-14:00 – Lunch III – Hall
14:00-15:00
28. Compression and Data Reduction III
Room A
- Accelerating Graph Neural Networks Using a Novel Computation-Friendly Matrix Compression Format
- HPDR: High-Performance Portable Scientific Data Reduction Framework
- Sensitivity and Impacts on Parallel Compression of Prediction of Lossy Compression Ratios for Scientific Data
29. Graph Processing
Room B
- TaijiGraph: An Out-of-core Graph Processing System Enhanced with Computational Storage
- CORD: Parallelizing Query Processing across Multiple Computational Storage Devices
- Matcha: A Language and Compiler for Backtracking-based Subgraph Matching
30. Transformers
Room C
- FATHOM: Fast Attention Through Optimizing Memory
- Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques
- Characterizing the Behavior and Impact of KV Caching on Transformer Inferences under Concurrency
15:10-16:10
31. Data and Image Processing
Room A
- SymProp: Scaling Sparse Symmetric Tucker Decomposition via Symmetry Propagation
- Accelerating Homotopy Continuation with GPUs: Application to Trifocal Pose Estimation
- Enhanced JPEG Decoding using PIM Architectures with Parallel MCU Processing
32. Error Prediction and Fault Tolerance
Room B
- An Effective Uncorrectable Memory Error Prediction Framework by Exploiting UPH Indicator in Production Environment
- Fine-Grained Global Search for Inputs Triggering Floating-Point Exceptions in GPU Programs
- GuardianOMP: A framework for highly productive fault tolerance via OpenMP task-level replication
33. High-Performance Inference
Room C
- InkStream: Instantaneous GNN Inference on Dynamic Graphs via Incremental Update
- GIFTS: Efficient GCN Inference Framework on PyTorch-CPU via Exploring the Sparsity
- MeanCache: User-Centric Semantic Caching for LLM Web Services