General IPDPS Info

Sponsor

IN COOPERATION WITH

ACM SIGARCH • ACM SIGHPC

and

TCCA • TCDP

HOST

Politecnico di Milano

IPDPS 2025 Advance Program

The conference will be held at Politecnico di Milano, located at Piazza Leonardo da Vinci 32, 20133 Milan, where we will be meeting in state-of-the-art facilities and enjoying all the resources of a modern university. The conference starts on Tuesday, June 3rd and continues through Saturday, June 7th. Workshops and Tutorials will be held on Tuesday and Wednesday in Building 3, and main conference events will be held on the last three days in the "Trifoglio" building (250m away).

Tuesday, June 3 & Wednesday, June 4

The IPDPS 2025 Workshops listed here will be held on Tuesday and Wednesday, the first two days of the conference. See each workshop's website for its program schedule. To enrich the offerings for those two days, the conference will also present six tutorials, open to all attendees. The IPDPS 2025 Tutorials are described here; details will be posted closer to the conference.

On Wednesday, following two days of workshops and tutorials, all attendees are invited to the following conference events:

16:30-18:00 - Forward-Looking Panel
Revolutionizing Parallel & Distributed Computing: The Quantum, LLM Systems, and AI-Driven Future of HPC

18:00-19:30 - TCPP Welcome Reception

The detailed Main Conference Advance Program is available here. It includes the full program schedule for Thursday, Friday, and Saturday, maps each paper to the technical session in which it will be presented, and links to each paper's abstract.

The Conference Posters are listed here. The full schedule for Main Conference events follows.

Details for the IPDPS 2025 PhD Forum, starting Wednesday, are on this page.

IPDPS 2025 Main Conference Program

THURSDAY - June 5, 2025


See full paper information here with authors & affiliations and link to abstract.

08:30-09:00 – Plenary

Welcome to IPDPS'25: Introductions and Highlights

09:00-10:00 – Plenary

Keynote Address

Hava Siegelmann
Distributed Collaborative AI with Applications to Drones

10:00-10:30 - Coffee Break I - Hall

10:30-11:50

1. Serverless Computing

Room A

  • TOSS: Tiering of Serverless Snapshots for Memory-Efficient Serverless Computing 
  • Ekko: Fully Decentralized Scheduling for Serverless Edge Computing 
  • It Takes Two to Tango: Serverless Workflow Serving via Bilaterally Engaged Resource Adaptation 
  • Tide: A Runtime Management Framework for the Things-Edge-Cloud Computing Continuum 

2. Multithreading and Scheduling
Room B

  • PISA: An Adversarial Approach To Comparing Task Graph Scheduling Algorithms 
  • Enhancing OmpSs-2 Suspendable Tasks by Combining Operating System and User-Level Threads with C++ Coroutines 
  • Optimizing Fine-Grained Parallelism Through Dynamic Load Balancing on Multi-Socket Many-Core Systems 
  • CALock: Multi-granularity locking in dynamic hierarchies 

3. High-Performance AI
Room C

  • AdAPT-S: Effective DNN Pruning via Unified Accuracy and Performance Tuning 
  • An Efficient Adaptive Dual-Threshold SVM Based on Heterogeneous Collaboration 
  • Accelerating Tensor-train Decomposition on Graph Neural Networks 

11:50-13:30 – Lunch I – Hall

13:30-14:30

4. Emerging Systems and Architectures
Room A

  • Energy-Optimal and Low-Depth Algorithmic Primitives for Spatial Dataflow Architectures 
  • AQUA: Hardware-Agnostic Qubit Allocation for Quantum Multi-Programming 
  • Distributed Construction of Demand-Aware Datacenter Networks 

5. Graph Algorithms I
Room B

  • ComSpark: A Holistic Approach for Scalable Clique Counting 
  • Less is More: Faster Maximum Clique Search by Work-Avoidance 
  • ALGAS: A Low-latency GPU-Accelerated Approximate Nearest Neighbor Search System 

6. AI and Applications
Room C

  • AI and HPC Applications on Leadership Computing Platforms: Performance and Scalability Studies 
  • Accelerate Coastal Ocean Circulation Model with AI Surrogate 
  • FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs 

14:40-15:40

7. Design Space Exploration
Room A

  • Compiler, Runtime, and Hardware Parameters Design Space Exploration 
  • Using performance projection for design-space exploration on future HPC CPU architectures 
  • Pallas: a generic trace format for large HPC trace analysis 

8. Graph Algorithms II
Room B

  • Tera-Scale Multilevel Graph Partitioning 
  • A Bidirectional GPU Algorithm for Computing Maximum Matchings in Bipartite Graphs 
  • Edge-Disjoint Spanning Trees on Star Products 

9. AI for Systems
Room C

  • IOAgent: Democratizing Trustworthy HPC I/O Performance Diagnosis Capability via LLMs 
  • Graph Neural Network-based Latency Prediction for Stream Processing Task 
  • P3Forecast: Personalized Privacy-Preserving Cloud Workload Prediction based on Federated Generative Adversarial Networks 

15:40-16:10 – Coffee Break II – Hall

16:10-18:15 - Plenary

Best Paper Nominees

  • FlexRLHF: A Flexible Placement and Parallelism Framework for Efficient RLHF Training 
  • Enabling Efficient Error-controlled Lossy Compression for Unstructured Scientific Data 
  • PolyMorphous: An MLIR-Based Polyhedral Compiler with Loop Transformation Primitives 
  • Parallel scheduling of task graphs with minimal memory requirements 
  • The Artificial Scientist: in-transit Machine Learning of Plasma Simulations 

20:00-23:00 – Conference Banquet at the Leonardo da Vinci Museum


FRIDAY - JUNE 6, 2025


08:30-08:50 – Plenary

IEEE CS Charles Babbage Award Announcement

08:50-09:50 – Plenary

Babbage Keynote Address

Srinivas Aluru
The Power of Parallelism: Accelerating Discovery in the Biosciences 

09:50-10:20 – Coffee Break III – Hall

10:20-11:20

10. Data Flow and Scheduling
Room A

  • Automatically Inferring Detailed & Interpretable Workflow Scaling Models for Better Scheduling 
  • CELLO: Co-designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse 
  • Locality Aware Process Remapping for Distributed-Memory Graph Workloads 

11. HPC for Biology
Room B

  • A Work-Optimal Parallel Algorithm for Aligning Sequences to Genome Graphs 
  • An Asynchronous Distributed-Memory Parallel Algorithm for k-mer Counting 
  • Pandemics in silico: Scaling Agent-based Simulations on Realistic Social Contact Networks 

12. Federated Learning
Room C

  • Air-FedGA: A Grouping Asynchronous Federated Learning Mechanism Exploiting Over-the-air Computation 
  • IP-FL: Incentive-driven Personalization in Federated Learning 
  • SEAFL: Enhancing Efficiency in Semi-Asynchronous Federated Learning through Adaptive Aggregation and Selective Training 

11:30-12:30

13. Compilation and Code Generation
Room A

  • Leveraging Compilation Statistics for Compiler Phase Ordering 
  • PCEBench: A Multi-dimensional Benchmark for Evaluating Large Language Models in Parallel Code Generation 
  • Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning 

14. Data and Signal Processing
Room B

  • SPRT²: Scalable, Parallel, and Real-Time fMRI Data Analysis on Heterogeneous Architectures 
  • The Tensor-Core Beamformer: A High-Speed Signal-Processing Library for Multidisciplinary Use 
  • Parallel-in-Time Kalman Smoothing Using Orthogonal Transformations 

15. Security and Privacy
Room C

  • FLAME: Federated Learning for Attack Mitigation and Evasion 
  • Hybrid-Granularity Parallelism Support for Fast Transaction Processing in Blockchain-based Federated Learning 
  • Pair-then-Aggregate: Simplified and Efficient Parallel Programming Paradigm for Secure Multi-party Computation 

12:30-14:00 – Lunch II – Hall

14:00-15:00

16. File System Performance
Room A

  • Be Aware of Metadata Corruption in Parallel File System: It Can Be Silent and Catastrophic 
  • LaOvl: Lifecycle-Aware Overlay File System for Efficient Container I/O in Cloud Computing 
  • KVAccel: A Novel Write Accelerator for LSM-Tree-Based KV Stores with Host-SSD Collaboration 

17. HPC Applications I
Room B

  • Accelerating the Dutch Atmospheric Large-Eddy Simulation (DALES) model with OpenACC 
  • Automated MPI-X code generation for scalable finite-difference solvers 
  • A GPU-Accelerated Distributed Algorithm for Optimal Power Flow in Distribution Systems 

18. Latency and Performance for ML
Room C

  • PredTOP: Latency Predictor for Distributed Deep Learning Training with Operator Parallelism 
  • Reducing the End-to-End Latency of DNN-based Recommendation Systems Deployed in GPU Pools 
  • Improving Accuracy and Efficiency of Graph Embedding Training with Fine-Grained Parameter Management 

15:10-16:10

19. Storage and I/O
Room A

  • A Deep Look into The Temporal I/O Behavior of HPC Applications 
  • VerifyIO: Verifying Adherence to Parallel I/O Consistency Semantics 
  • AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage 

20. HPC Applications II
Room B

  • A New Spin on the Fast Multipole Method for GPUs: Rethinking the Far-Field Operators 
  • Large Scale Finite-Temperature Real-time Time Dependent Density Functional Theory Calculation with Hybrid Functional on ARM and GPU Systems 
  • Improving Parallel Scalability for Molecular Dynamics Simulations in the Exascale Era 

21. Performance Profiling and Characterization
Room C

  • Phase-based Frequency Scaling for Energy-efficient Heterogeneous Computing 
  • GNNPerf: Towards Effective Performance Profiling and Analysis across GNN Frameworks 
  • DeepBAT: Performance and Cost Optimization of Serverless Inference Using Transformers 

16:10-17:30

Conference Poster Reception
Hall

17:30-18:30

PhD Forum Posters
Hall


SATURDAY - JUNE 7, 2025


08:30-08:50 – Plenary

Awards and IPDPS'26 Announcement

08:50-09:50 – Plenary

Keynote Address

David Keyes
For What the Bell Tolls 

09:50-10:20 – Coffee Break V – Hall

10:20-11:20

22. Compression and Data Reduction I
Room A

  • To Compress or Not To Compress: Energy and Runtime Trade-Offs in Lossy Compressed I/O 
  • Fast and Effective Lossy Compression on GPUs and CPUs with Guaranteed Error Bounds 
  • A Memory-efficient and Computation-balanced Lossy Compressor on Wafer-Scale Engine 

23. Matrix Multiplication
Room B

  • BRP-SpMM: Block-Row Partition Based Sparse Matrix Multiplication with Tensor and CUDA Cores 
  • Graph Input-Aware Matrix Multiplication for Pruned Graph Neural Network Acceleration 
  • NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU 

24. Communication
Room C

  • Unified Designs of Multi-rail-aware MPI Allreduce and Alltoall Operations Across Diverse GPU and Interconnect Systems 
  • HiCCL: A Hierarchical Collective Communication Library 
  • NBLFQ: a lock-free MPMC queue optimized for low contention 

11:30-12:30

25. Compression and Data Reduction II
Room A

  • Improving the Efficiency of Interpolation-Based Scientific Data Compressors with Adaptive Quantization Index Prediction 
  • An Adaptive Two-Stage Algorithm for Error-Bounded Scientific Data Compression 
  • Achieving Better Benefits via Flexible Feature Matching in Post-Deduplication Delta Compression 

26. Linear Solvers
Room B

  • Scalable and portable LU factorization with partial pivoting on top of runtime systems 
  • Accelerating Sparse Linear Solvers on Intelligence Processing Units 
  • Adaptive s-step GMRES with randomized and truncated low-synchronization orthogonalization 

27. Memory and Networking
Room C

  • Performance Characterization of CXL Memory and Its Use Cases 
  • RXT: Reflexive Address Translation for Pointer-Chasing Workloads 
  • CoRD: Converged RDMA Dataplane 

12:30-14:00 – Lunch III – Hall

14:00-15:00

28. Compression and Data Reduction III
Room A

  • Accelerating Graph Neural Networks Using a Novel Computation-Friendly Matrix Compression Format 
  • HPDR: High-Performance Portable Scientific Data Reduction Framework 
  • Sensitivity and Impacts on Parallel Compression of Prediction of Lossy Compression Ratios for Scientific Data 

29. Graph Processing
Room B

  • TaijiGraph: An Out-of-core Graph Processing System Enhanced with Computational Storage 
  • CORD: Parallelizing Query Processing across Multiple Computational Storage Devices 
  • Matcha: A Language and Compiler for Backtracking-based Subgraph Matching 

30. Transformers
Room C

  • FATHOM: Fast Attention Through Optimizing Memory 
  • Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques 
  • Characterizing the Behavior and Impact of KV Caching on Transformer Inferences under Concurrency 

15:10-16:10

31. Data and Image Processing
Room A

  • SymProp: Scaling Sparse Symmetric Tucker Decomposition via Symmetry Propagation 
  • Accelerating Homotopy Continuation with GPUs: Application to Trifocal Pose Estimation 
  • Enhanced JPEG Decoding using PIM Architectures with Parallel MCU Processing 

32. Error Prediction and Fault Tolerance
Room B

  • An Effective Uncorrectable Memory Error Prediction Framework by Exploiting UPH Indicator in Production Environment 
  • Fine-Grained Global Search for Inputs Triggering Floating-Point Exceptions in GPU Programs 
  • GuardianOMP: A framework for highly productive fault tolerance via OpenMP task-level replication 

33. High-Performance Inference
Room C

  • InkStream: Instantaneous GNN Inference on Dynamic Graphs via Incremental Update 
  • GIFTS: Efficient GCN Inference Framework on PyTorch-CPU via Exploring the Sparsity 
  • MeanCache: User-Centric Semantic Caching for LLM Web Services 

Register Today

Early Deadline March 31st
Extended to April 8th

Registration Details
