Following two days of workshops and tutorials, which conclude on Wednesday, all attendees are invited to the following conference events:
THURSDAY – June 5, 2025
See full paper information with authors & affiliations and links to abstracts.
08:30-09:00 – Plenary
Welcome to IPDPS'25: Introductions and Highlights
09:00-10:00 – Plenary
Keynote Address
Hava Siegelmann
Distributed Collaborative AI with Applications to Drones
10:00-10:30 – Coffee Break I – Hall
10:30-11:50
1. Serverless Computing
Room A
- TOSS: Tiering of Serverless Snapshots for Memory-Efficient Serverless Computing
- Ekko: Fully Decentralized Scheduling for Serverless Edge Computing
- It Takes Two to Tango: Serverless Workflow Serving via Bilaterally Engaged Resource Adaptation
- Tide: A Runtime Management Framework for the Things-Edge-Cloud Computing Continuum
2. Multithreading and Scheduling
Room B
- PISA: An Adversarial Approach To Comparing Task Graph Scheduling Algorithms
- Enhancing OmpSs-2 Suspendable Tasks by Combining Operating System and User-Level Threads with C++ Coroutines
- Optimizing Fine-Grained Parallelism Through Dynamic Load Balancing on Multi-Socket Many-Core Systems
- CALock: Multi-granularity locking in dynamic hierarchies
3. High-Performance AI
Room C
- AdAPT-S: Effective DNN Pruning via Unified Accuracy and Performance Tuning
- An Efficient Adaptive Dual-Threshold SVM Based on Heterogeneous Collaboration
- Accelerating Tensor-train Decomposition on Graph Neural Networks
11:50-13:30 – Lunch I – Hall
13:30-14:30
4. Emerging Systems and Architectures
Room A
- Energy-Optimal and Low-Depth Algorithmic Primitives for Spatial Dataflow Architectures
- AQUA: Hardware-Agnostic Qubit Allocation for Quantum Multi-Programming
- Distributed Construction of Demand-Aware Datacenter Networks
5. Graph Algorithms I
Room B
- ComSpark: A Holistic Approach for Scalable Clique Counting
- Less is More: Faster Maximum Clique Search by Work-Avoidance
- ALGAS: A Low-latency GPU-Accelerated Approximate Nearest Neighbor Search System
6. AI and Applications
Room C
- AI and HPC Applications on Leadership Computing Platforms: Performance and Scalability Studies
- Accelerate Coastal Ocean Circulation Model with AI Surrogate
- FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs
14:40-15:40
7. Design Space Exploration
Room A
- Compiler, Runtime, and Hardware Parameters Design Space Exploration
- Using performance projection for design-space exploration on future HPC CPU architectures
- Pallas: a generic trace format for large HPC trace analysis
8. Graph Algorithms II
Room B
- Tera-Scale Multilevel Graph Partitioning
- A Bidirectional GPU Algorithm for Computing Maximum Matchings in Bipartite Graphs
- Edge-Disjoint Spanning Trees on Star Products
9. AI for Systems
Room C
- IOAgent: Democratizing Trustworthy HPC I/O Performance Diagnosis Capability via LLMs
- Graph Neural Network-based Latency Prediction for Stream Processing Task
- P3Forecast: Personalized Privacy-Preserving Cloud Workload Prediction based on Federated Generative Adversarial Networks
15:40-16:10 – Coffee Break II – Hall
16:10-18:15 – Plenary
Best Paper Nominees
- FlexRLHF: A Flexible Placement and Parallelism Framework for Efficient RLHF Training
- Enabling Efficient Error-controlled Lossy Compression for Unstructured Scientific Data
- PolyMorphous: An MLIR-Based Polyhedral Compiler with Loop Transformation Primitives
- Parallel scheduling of task graphs with minimal memory requirements
- The Artificial Scientist: in-transit Machine Learning of Plasma Simulations
20:00-23:00 – Conference Banquet at Leonardo da Vinci Museum
FRIDAY – June 6, 2025
08:30-08:50 – Plenary
IEEE CS Charles Babbage Award Announcement
08:50-09:50 – Plenary
Babbage Keynote Address
Srinivas Aluru
The Power of Parallelism: Accelerating Discovery in the Biosciences
09:50-10:20 – Coffee Break III – Hall
10:20-11:20
10. Data Flow and Scheduling
Room A
- Automatically Inferring Detailed & Interpretable Workflow Scaling Models for Better Scheduling
- CELLO: Co-designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse
- Locality Aware Process Remapping for Distributed-Memory Graph Workloads
11. HPC for Biology
Room B
- A Work-Optimal Parallel Algorithm for Aligning Sequences to Genome Graphs
- An Asynchronous Distributed-Memory Parallel Algorithm for k-mer Counting
- Pandemics in silico: Scaling Agent-based Simulations on Realistic Social Contact Networks
12. Federated Learning
Room C
- Air-FedGA: A Grouping Asynchronous Federated Learning Mechanism Exploiting Over-the-air Computation
- IP-FL: Incentive-driven Personalization in Federated Learning
- SEAFL: Enhancing Efficiency in Semi-Asynchronous Federated Learning through Adaptive Aggregation and Selective Training
11:30-12:30
13. Compilation and Code Generation
Room A
- Leveraging Compilation Statistics for Compiler Phase Ordering
- PCEBench: A Multi-dimensional Benchmark for Evaluating Large Language Models in Parallel Code Generation
- Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning
14. Data and Signal Processing
Room B
- SPRT²: Scalable, Parallel, and Real-Time fMRI Data Analysis on Heterogeneous Architectures
- The Tensor-Core Beamformer: A High-Speed Signal-Processing Library for Multidisciplinary Use
- Parallel-in-Time Kalman Smoothing Using Orthogonal Transformations
15. Security and Privacy
Room C
- FLAME: Federated Learning for Attack Mitigation and Evasion
- Hybrid-Granularity Parallelism Support for Fast Transaction Processing in Blockchain-based Federated Learning
- Pair-then-Aggregate: Simplified and Efficient Parallel Programming Paradigm for Secure Multi-party Computation
12:30-14:00 – Lunch II – Hall
14:00-15:00
16. File System Performance
Room A
- Be Aware of Metadata Corruption in Parallel File System: It Can Be Silent and Catastrophic
- LaOvl: Lifecycle-Aware Overlay File System for Efficient Container I/O in Cloud Computing
- KVAccel: A Novel Write Accelerator for LSM-Tree-Based KV Stores with Host-SSD Collaboration
17. HPC Applications I
Room B
- Accelerating the Dutch Atmospheric Large-Eddy Simulation (DALES) model with OpenACC
- Automated MPI-X code generation for scalable finite-difference solvers
- A GPU-Accelerated Distributed Algorithm for Optimal Power Flow in Distribution Systems
18. Latency and Performance for ML
Room C
- PredTOP: Latency Predictor for Distributed Deep Learning Training with Operator Parallelism
- Reducing the End-to-End Latency of DNN-based Recommendation Systems Deployed in GPU Pools
- Improving Accuracy and Efficiency of Graph Embedding Training with Fine-Grained Parameter Management
15:10-16:10
19. Storage and I/O
Room A
- A Deep Look into The Temporal I/O Behavior of HPC Applications
- VerifyIO: Verifying Adherence to Parallel I/O Consistency Semantics
- AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage
20. HPC Applications II
Room B
- A New Spin on the Fast Multipole Method for GPUs: Rethinking the Far-Field Operators
- Large Scale Finite-Temperature Real-time Time Dependent Density Functional Theory Calculation with Hybrid Functional on ARM and GPU Systems
- Improving Parallel Scalability for Molecular Dynamics Simulations in the Exascale Era
21. Performance Profiling and Characterization
Room C
- Phase-based Frequency Scaling for Energy-efficient Heterogeneous Computing
- GNNPerf: Towards Effective Performance Profiling and Analysis across GNN Frameworks
- DeepBAT: Performance and Cost Optimization of Serverless Inference Using Transformers
16:10-17:30
Conference Poster Reception
Hall
17:30-18:30
PhD Forum Posters
Hall
SATURDAY – June 7, 2025
08:30-08:50 – Plenary
Awards and IPDPS'26 Announcement
08:50-09:50 – Plenary
Keynote Address
David Keyes
For What the Bell Tolls
09:50-10:20 – Coffee Break IV – Hall
10:20-11:20
22. Compression and Data Reduction I
Room A
- To Compress or Not To Compress: Energy and Runtime Trade-Offs in Lossy Compressed I/O
- Fast and Effective Lossy Compression on GPUs and CPUs with Guaranteed Error Bounds
- A Memory-efficient and Computation-balanced Lossy Compressor on Wafer-Scale Engine
23. Matrix Multiplication
Room B
- BRP-SpMM: Block-Row Partition Based Sparse Matrix Multiplication with Tensor and CUDA Cores
- Graph Input-Aware Matrix Multiplication for Pruned Graph Neural Network Acceleration
- NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU
24. Communication
Room C
- Unified Designs of Multi-rail-aware MPI Allreduce and Alltoall Operations Across Diverse GPU and Interconnect Systems
- HiCCL: A Hierarchical Collective Communication Library
- NBLFQ: a lock-free MPMC queue optimized for low contention
11:30-12:30
25. Compression and Data Reduction II
Room A
- Improving the Efficiency of Interpolation-Based Scientific Data Compressors with Adaptive Quantization Index Prediction
- An Adaptive Two-Stage Algorithm for Error-Bounded Scientific Data Compression
- Achieving Better Benefits via Flexible Feature Matching in Post-Deduplication Delta Compression
26. Linear Solvers
Room B
- Scalable and portable LU factorization with partial pivoting on top of runtime systems
- Accelerating Sparse Linear Solvers on Intelligence Processing Units
- Adaptive s-step GMRES with randomized and truncated low-synchronization orthogonalization
27. Memory and Networking
Room C
- Performance Characterization of CXL Memory and Its Use Cases
- RXT: Reflexive Address Translation for Pointer-Chasing Workloads
- CoRD: Converged RDMA Dataplane
12:30-14:00 – Lunch III – Hall
14:00-15:00
28. Compression and Data Reduction III
Room A
- Accelerating Graph Neural Networks Using a Novel Computation-Friendly Matrix Compression Format
- HPDR: High-Performance Portable Scientific Data Reduction Framework
- Sensitivity and Impacts on Parallel Compression of Prediction of Lossy Compression Ratios for Scientific Data
29. Graph Processing
Room B
- TaijiGraph: An Out-of-core Graph Processing System Enhanced with Computational Storage
- CORD: Parallelizing Query Processing across Multiple Computational Storage Devices
- Matcha: A Language and Compiler for Backtracking-based Subgraph Matching
30. Transformers
Room C
- FATHOM: Fast Attention Through Optimizing Memory
- Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques
- Characterizing the Behavior and Impact of KV Caching on Transformer Inferences under Concurrency
15:10-16:10
31. Data and Image Processing
Room A
- SymProp: Scaling Sparse Symmetric Tucker Decomposition via Symmetry Propagation
- Accelerating Homotopy Continuation with GPUs: Application to Trifocal Pose Estimation
- Enhanced JPEG Decoding using PIM Architectures with Parallel MCU Processing
32. Error Prediction and Fault Tolerance
Room B
- An Effective Uncorrectable Memory Error Prediction Framework by Exploiting UPH Indicator in Production Environment
- Fine-Grained Global Search for Inputs Triggering Floating-Point Exceptions in GPU Programs
- GuardianOMP: A framework for highly productive fault tolerance via OpenMP task-level replication
33. High-Performance Inference
Room C
- InkStream: Instantaneous GNN Inference on Dynamic Graphs via Incremental Update
- GIFTS: Efficient GCN Inference Framework on PyTorch-CPU via Exploring the Sparsity
- MeanCache: User-Centric Semantic Caching for LLM Web Services