



IPDPS 2016 Advance Program

Please visit the IPDPS website regularly for updates, since there may be schedule revisions. Authors who have corrections should send email to giving full details. Note that paper numbers are listed for easy reference.

MONDAY - 23 May 2016


* See each individual workshop program for schedule details





Heterogeneity in Computing Workshop



Reconfigurable Architectures Workshop



Workshop on High-Level Parallel Programming Models & Supportive Environments



Workshop on High Performance Computational Biology



Advances in Parallel and Distributed Computational Models



Accelerators and Hybrid Exascale Systems



Parallel Computing and Optimization



Graph Algorithms Building Blocks



NSF/TCPP Workshop on Parallel and Distributed Computing Education



High Performance Data Analysis and Visualization



Variability in Parallel and Distributed Systems


5:00 PM

Round-table Workshop II: Heterogeneous Tasking

See Workshops page for details
Light Reception
6:00 PM – 7:30 PM
IPDPS 2016 Welcome Reception & TCPP Annual Meeting

TUESDAY - 24 May 2016


Opening Session
8:00 AM - 8:30 AM

Opening Session: TBA

Keynote Session
8:30 AM - 9:30 AM

Keynote Speech

Session Chair: Xian-He Sun


Kai Li
Princeton University

Disruptive Research and Innovation


Abstract: Ever since Clayton Christensen coined the terms “disruptive technologies” and “disruptive innovations” in the 1990s, researchers and entrepreneurs have loved the word “disruptive”, because disrupting current knowledge or products helps us accelerate knowledge discovery and move society into a new era. What is disruptive research? What is disruptive innovation? How do they happen? ...

Morning Break 9:30 AM - 10:00 AM

PhD Forum
Starts on Tuesday

PhD Forum Posters
On Display All Day Tuesday and Wednesday


More details to be announced

Parallel Technical Sessions 1, 2, 3, & 4

10:00 AM - 12:00 PM

Session 1
Graph Algorithms

Session Chair: Umit V. Catalyurek


Subgraph Counting: Color Coding Beyond Trees
Venkatesan T Chakaravarthy (IBM Research, India); Mikhail Kapralov (IBM Research, USA); Prakash Murali (IBM Research, India); Fabrizio Petrini and Xinyu Que (IBM T.J. Watson Research Center, USA); Yogish Sabharwal (IBM Research, India); Baruch Schieber (IBM T.J. Watson Research Center, USA)


A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs
Matteo Ceccarello, Andrea Pietracaprina and Geppino Pucci (University of Padova, Italy); Eli Upfal (Brown University, USA)


Rabbit Order: Just-in-time Parallel Reordering for Fast Graph Analysis
Junya Arai (Nippon Telegraph and Telephone Corporation, Japan); Hiroaki Shiokawa (University of Tsukuba, Japan); Takeshi Yamamuro (Nippon Telegraph and Telephone Corporation, Japan); Makoto Onizuka (Osaka University, Japan); Sotetsu Iwamura (Nippon Telegraph and Telephone Corporation, Japan)


Distributed-Memory Algorithms for Maximum Cardinality Matching in Bipartite Graphs
Ariful Azad and Aydin Buluc (Lawrence Berkeley National Laboratory, USA)



Session 2
Software Environments and Tools

Session Chair: Karen L Karavanic


Automatic Parallel Pattern Detection in the Algorithm Structure Design Space
Zia Ul Huda (TU Darmstadt and Laboratory for Parallel Programming, Germany); Ali Jannesari (German Research School for Simulation Sciences and RWTH Aachen University, Germany); Felix Wolf (TU Darmstadt, Germany)


ARCHER: Effectively Spotting Data Races in Large OpenMP Applications
Simone Atzeni, Ganesh Gopalakrishnan and Zvonimir Rakamaric (University of Utah, USA); Dong Ahn, Gregory L Lee, Ignacio Laguna and Martin Schulz (Lawrence Livermore National Laboratory, USA); Joachim Protze (RWTH Aachen University, Germany); Matthias Mueller (RWTH Aachen University, Germany)


SEAK: Future-Proof Mission-Centric Benchmarking
Nathan Tallent, Joseph B Manzano, Nitin A. Gawande, Seunghwa Kang, Darren Kerbyson and Adolfy Hoisie (Pacific Northwest National Laboratory, USA); Joseph Cross (DARPA, USA)


Design and Implementation of a Parallel Research Kernel for Assessing Dynamic Load-Balancing Capabilities
Evangelos Georganas (University of California, Berkeley, USA); Rob F Van der Wijngaart and Tim Mattson (Intel Corporation, USA)


Session 3
Network Architecture

Session Chair: Ron Brightwell


VNRE: Flexible and Efficient Acceleration for Network Redundancy Elimination
Xiongzi Ge (University of Minnesota, Twin Cities, USA); Yi Liu (Huawei Corporation, P.R. China); Chengtao Lu (Xi'an Technological University, P.R. China); Jim Diehl (University of Minnesota, Twin Cities, USA); David Du (University of Minnesota, USA); Liang Zhang and Jian Chen (Huawei Corporation, P.R. China)


Analyzing Network Health and Congestion in Dragonfly-based Systems
Abhinav Bhatele (Lawrence Livermore National Laboratory, USA); Nikhil Jain (University of Illinois at Urbana-Champaign, USA); Yarden Livnat and Valerio Pascucci (University of Utah, USA); Peer-Timo Bremer (Lawrence Livermore National Laboratory, USA)


Random Regular Graph and Generalized De Bruijn Graph with K-Shortest Path Routing
Peyman Faizian, Md Atiqul Mollah and Xin Yuan (Florida State University, USA); Scott Pakin and Michael Lang (Los Alamos National Laboratory, USA)


Deflection Containment for Bufferless Network-on-Chips
Xiyue Xiang and Nian-Feng Tzeng (University of Louisiana at Lafayette, USA)



Session 4
Application Optimization

Session Chair: Shirley V Moore


RUPS: Fixing Relative Distances Among Urban Vehicles with Context-Aware Trajectories
Hongzi Zhu (Shanghai Jiao Tong University, P.R. China); Shan Chang (Donghua University, P.R. China); Li Lu (University of Electronic Science and Technology of China, P.R. China); Wei Zhang (Shanghai Jiao Tong University, P.R. China)


HDT: A Hybrid Structure for Extreme-Resolution 3D Sparse Data Modeling
Mohammad M Hossain (Georgia Institute of Technology, USA); Thomas Tucker (Tucker Innovations, USA); Thomas Kurfess and Richard W Vuduc (Georgia Institute of Technology, USA)


Optimization of an Electromagnetics Code with Multicore Wavefront Diamond Blocking and Multi-Dimensional Intra-Tile Parallelization
Tareq Malas (KAUST, Saudi Arabia); Julian Hornich and Georg Hager (Friedrich-Alexander University of Erlangen-Nuremberg, Germany); Hatem Ltaief (KAUST and Extreme Computing Research Center, Saudi Arabia); Christoph Pflaum (Friedrich-Alexander University of Erlangen-Nuremberg, Germany); David Keyes (KAUST, Saudi Arabia)


Order-Invariant Real Number Summation: Circumventing Accuracy Loss for Multimillion Summands on Multiple Parallel Architectures
Patrick E Small, Rajiv Kalia, Aiichiro Nakano and Priya Vashishta (University of Southern California, USA)


12:00 PM – 1:30 PM

IPDPS 2016 Round-Table Workshop I:
PDC in Core Undergraduate Education


Dick Brown of St. Olaf College and Suzanne Matthews of West Point will lead a discussion on this topic of interest to the IPDPS community.

Parallel Technical Sessions 5, 6, 7, & 8
1:30 PM - 3:30 PM

Session 5
Linear Algebra & Solvers

Session Chair: Cevdet Aykanat


INV-ASKIT: A Parallel Fast Direct Solver for Kernel Matrices
Chenhan Yu and William March (The University of Texas at Austin, USA); Bo Xiao (Georgia Institute of Technology, USA); George Biros (The University of Texas at Austin, USA)


A Fast Tridiagonal Solver for Intel MIC Architecture
Xinliang Wang, Wei Xue, Yangtong Xu and Weimin Zheng (Tsinghua University, P.R. China)


A Relaxed Synchronization Approach for Solving Parallel Quadratic Programming Problems with Guaranteed Convergence
Kooktae Lee, Raktim Bhattacharya, Jyotikrishna Dass, V N S Prithvi Sakuru and Rabi Mahapatra (Texas A&M University, USA)


Enhancing Scalability and Load Balancing of Parallel Selected Inversion Via Tree-Based Asynchronous Communication
Mathias Jacquelin (Lawrence Berkeley National Lab, USA); Lin Lin (University of California Berkeley, USA); Nathan Wichmann (Cray Inc., USA); Chao Yang (Lawrence Berkeley National Lab, USA)



Session 6
Fault Tolerance & Resilience

Session Chair: Kathryn Mohror


Optimal Resilience Patterns to Cope with Fail-Stop and Silent Errors
Anne Benoit, Aurelien Cavelan and Yves Robert (ENS Lyon, France); Hongyang Sun (ENS Lyon and INRIA, France)


Reducing Waste in Large Scale Systems Through Introspective Analysis
Leonardo Bautista-Gomez (Argonne National Laboratory, USA); Ana Gainaru (University of Illinois at Urbana-Champaign and National Center for Supercomputing Applications, USA); Swann Perarnau (Argonne National Laboratory, USA); Devesh Tiwari and Saurabh Gupta (Oak Ridge National Laboratory, USA); Franck Cappello (Argonne National Laboratory, University of Illinois at Urbana-Champaign and Inria, France); Christian Engelmann (Oak Ridge National Laboratory, USA); Marc Snir (Argonne National Laboratory, USA)


Fault Modeling of Extreme Scale Applications Using Machine Learning
Abhinav Vishnu (Pacific Northwest National Laboratory, USA); Hubertus J. J. Van Dam (Brookhaven National Laboratory, USA); Nathan Tallent, Darren Kerbyson and Adolfy Hoisie (Pacific Northwest National Laboratory, USA)


Efficient Checkpointing of Multi-Threaded Applications as a Tool for Debugging, Performance Tuning, and Resiliency
Max Grossman and Vivek Sarkar (Rice University, USA)



Session 7
Modeling and Evaluation

Session Chair: David Lowenthal


X: A Comprehensive Analytic Model for Parallel Machines
Ang Li (Eindhoven University of Technology, The Netherlands); Shuaiwen Song (Pacific Northwest National Laboratory, USA); Eric Brugel (The State University of New Jersey, USA); Akash Kumar (Technische Universität Dresden, Germany); Daniel Gerardo Chavarria (Pacific Northwest National Laboratory, USA); Henk Corporaal (Technical University Eindhoven, The Netherlands)


NiMC: Characterizing and Eliminating Network-Induced Memory Contention
Taylor L Groves (Sandia National Laboratories and University of New Mexico, USA); Ryan E Grant (Sandia National Laboratories and Center for Computing Research, USA); Dorian C Arnold (University of New Mexico, USA)


An Early Performance Study of Large-scale POWER8 SMP Systems
Xing Liu, Daniele Buono, Fabio Checconi, Jee W Choi, Xinyu Que, Fabrizio Petrini, John Gunnels and Jeff Stuecheli (IBM T. J. Watson Research Center, USA)


A Methodology for Modeling Dynamic and Static Power Consumption
Bhavishya Goel and Sally A. McKee (Chalmers University of Technology, Sweden)



Session 8
Graph Applications

Session Chair: Aydin Buluc


Algorithmic Techniques for Solving Graph Problems on the Automata Processor
Indranil Roy (Micron Technology, Inc., USA); Nagakishore Jammula (Georgia Institute of Technology, USA); Srinivas Aluru (Georgia Institute of Technology and Indian Institute of Technology Bombay, USA)


A Case Study of Complex Graph Analysis in Distributed Memory: Implementation and Optimization
George M Slota (The Pennsylvania State University, USA); Sivasankaran Rajamanickam (Sandia National Laboratories, USA); Kamesh Madduri (The Pennsylvania State University, USA)


FastBFS: Fast Breadth-First Graph Search on a Single Server
Shuhan Cheng, Guangyan Zhang, Jiwu Shu and Qingda Hu (Tsinghua University, P.R. China)


GraphPad: Optimized Graph Primitives for Parallel and Distributed Platforms
Michael Anderson (Intel Corporation, USA); Narayanan Sundaram (Intel Labs, USA); Nadathur Satish (Intel Corporation, USA); Md. Mostofa Ali Patwary (Intel Labs, USA); Theodore L. Willke, II and Pradeep Dubey (Intel Corporation, USA)

Afternoon Break 3:30 PM - 4:00 PM

Parallel Technical Sessions 9, 10, 11, & 12

4:00 PM - 6:00 PM

Session 9
Cloud Resource Allocation

Session Chair: Xinghui Zhao


On First Fit Bin Packing for Online Cloud Server Allocation
Xueyan Tang, Yusen Li, Runtian Ren and Wentong Cai (Nanyang Technological University, Singapore)


Smoothed Online Resource Allocation in Multi-Tier Distributed Cloud Networks
Lei Jiao (Bell Labs, Ireland); Antonia Tulino (Bell Labs and Università Federico II, Napoli, USA); Jaime Llorca (Bell Labs, Alcatel-Lucent, USA); Yue Jin (Alcatel-Lucent, Ireland); Alessandra Sala (Bell Labs, Alcatel-Lucent, Ireland)


Dynamic Acceleration of Parallel Applications in Cloud Platforms by Adaptive Time-Slice Control
Song Wu, Zhenjiang Xie and Haibao Chen (Huazhong University of Science and Technology, P.R. China); Sheng Di (Argonne National Laboratory, USA); Xinyu Zhao and Hai Jin (Huazhong University of Science and Technology, P.R. China)


Mystic: Predictive Scheduling for GPU Based Cloud Servers Using Machine Learning
Yash Ukidave, Xiangyu Li and David Kaeli (Northeastern University, USA)



Session 10
Memory Management

Session Chair: Nikos Hardavellas


TintMalloc: Reducing Memory Access Divergence Via Controller-Aware Coloring
Xing Pan, Yasaswini Jyothi Gownivaripalli and Frank Mueller (NCSU, USA)


Markov Chain-based Adaptive Scheduling in Software Transactional Memory
Pierangelo Di Sanzo, Marco Sannicandro, Bruno Ciciani and Francesco Quaglia (Sapienza – Università di Roma, Italy)


MEMTUNE: Dynamic Memory Management for In-memory Data Analytic Platforms
Luna Xu (Virginia Tech, USA); Min Li and Li Zhang (IBM T. J. Watson Research Center, USA); Ali R. Butt (Virginia Tech, USA); Yandong Wang (IBM T.J. Watson Research Center, USA); Zane Zhenhua Hu (IBM Platform Computing, Canada)


High-Performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits
Dipti Shankar, Xiaoyi Lu, Nusrat Islam, Md. Wasi-ur-Rahman and Dhabaleswar Panda (The Ohio State University, USA)



Session 11
Scheduling and Resource Management

Session Chair: Hank Hoffmann


GreenMatch: Renewable-Aware Workload Scheduling for Massive Storage Systems
Xiaoyang Qu and Jiguang Wan (Huazhong University of Science and Technology, P.R. China); Jun Wang (University of Central Florida, USA); Liqiong Liu, Dan Luo and Changsheng Xie (Huazhong University of Science and Technology, P.R. China)


CATA: Criticality Aware Task Acceleration for Multicore Processors
Emilio Castillo (Barcelona Supercomputing Center, Spain); Miquel Moreto (Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, Spain); Marc Casas and Lluc Alvarez (Barcelona Supercomputing Center, Spain); Enrique Vallejo (University of Cantabria, Spain); Kallia Chronaki and Rosa M. Badia (Barcelona Supercomputing Center, Spain); Jose L Bosque and Ramon Beivide (University of Cantabria, Spain); Eduard Ayguade (Universitat Politècnica de Catalunya and Barcelona Supercomputing Center, Spain); Jesús Labarta (Barcelona Supercomputing Center, Spain); Mateo Valero (Universidad Politécnica de Cataluña, Spain)


TECfan: Coordinating Thermoelectric Cooler, Fan, and DVFS for CMP Energy Optimization
Wenli Zheng, Kai Ma and Xiaorui Wang (The Ohio State University, USA)


Utility Maximizing Thread Assignment and Resource Allocation
Pan Lai, Rui Fan, Wei Zhang and Fang Liu (Nanyang Technological University, Singapore)



Session 12
Scientific Applications (1)

Session Chair: Kamesh Madduri


A Hybrid Decomposition Parallel Algorithm for Multi-Scale Simulation of Viscoelastic Fluids
Xiao-Wei Guo, Xin-hai Xu, Qian Wang, Hao Li, Xiao-Guang Ren, Liyang Xu and Xuejun Yang (National University of Defense Technology, P.R. China)


A Hartree-Fock Application Using UPC++ and the New DArray Library
David Ozog (University of Oregon, USA); Amir Kamil, Yili Zheng and Paul H. Hargrove (Lawrence Berkeley National Laboratory, USA); Jeff Hammond (Intel Labs, USA); Allen D. Malony (University of Oregon, USA); Wibe De Jong and Katherine Yelick (Lawrence Berkeley National Laboratory, USA)


A Fast Selected Inversion Algorithm for Green's Function Calculation in Many-body Quantum Monte Carlo Simulations
Chengming Jiang, Zhaojun Bai and Richard Scalettar (University of California, Davis, USA)


6:00 PM – 8:00 PM

NVIDIA Tutorial for University Educators: Teach GPU-Accelerated Computing with the New NVIDIA Teaching Kit


Dr. Wen-Mei Hwu from the University of Illinois (UIUC) will lead a hands-on tutorial that introduces the GPU Teaching Kit for Accelerated Computing for use in university courses...

WEDNESDAY - 25 May 2016


Keynote Session
8:30 AM – 9:30 AM

Keynote Speech

Session Chair: Jeffrey K Hollingsworth


Thomas Pawlowski

Memory, Storage and Processing in Future Parallel and Distributed Processing Systems


Abstract: This is perhaps the most exciting time in the short yet eventful 71-year history of Turing-complete computing. We are in the early but visible stage of an exponential explosion of data and analyses thereof. We simultaneously have witnessed the cessation of several exponential scaling-related trends and a slowdown of technology scaling itself. Technology scaling...

Morning Break 9:30 AM - 10:00 AM

Parallel Technical Sessions 13, 14, 15, & 16
10:00 AM - 12:00 PM

Session 13
Clustering & Partitioning

Session Chair: Ananth Kalyanaraman


A New Approximation Algorithm for Matrix Partitioning in Presence of Strongly Heterogeneous Processors
Olivier Beaumont (Inria, France); Lionel Eyraud-Dubois (INRIA Bordeaux Sud-Ouest and University of Bordeaux, France); Thomas Lambert (Inria, France)


Structural Clustering: A New Approach to Support Performance Analysis At Scale
Matthias Weber, Ronny Brendel and Tobias Hilbrich (Technische Universität Dresden, Germany); Kathryn Mohror and Martin Schulz (Lawrence Livermore National Laboratory, USA); Holger Brunst (Technische Universitaet Dresden, Germany)


PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures
Md. Mostofa Ali Patwary (Intel Labs, USA); Nadathur Satish (Intel Corporation, USA); Narayanan Sundaram (Intel Labs, USA); Jialin Liu (Lawrence Berkeley National Laboratory, USA); Peter Sadowski (UC Irvine, USA); Evan Racah, Surendra Byna, Wahid Bhimji, Craig Tull and Mr Prabhat (Lawrence Berkeley National Laboratory, USA); Pradeep Dubey (Intel Corporation, USA)


DataNet: A Data Distribution-aware Method for Sub-dataset Analysis on Distributed File Systems
Jun Wang, Jiangling Yin, Jian Zhou and Xuhong Zhang (University of Central Florida, USA)



Session 14
Accelerated Computing

Session Chair: Erik Saule


Synchronization Trade-offs in GPU Implementations of Graph Algorithms
Rashid Kaleem (University of Texas at Austin, USA); Anand Venkat (University of Utah, USA); Sreepathi Pai (ICES, UT Austin, USA); Mary Hall (University of Utah, USA); Keshav Pingali (University of Texas at Austin, USA)


Eliminating Intra-warp Load Imbalance in Irregular Nested Patterns Via Collaborative Task Engagement
Farzad Khorasani, Bryan Rowe, Rajiv Gupta and Laxmi Bhuyan (University of California Riverside, USA)


Compiler-Assisted Workload Consolidation for Efficient Dynamic Parallelism on GPU
Hancheng Wu, Da Li and Michela Becchi (University of Missouri - Columbia, USA)


OpenACC to FPGA: A Framework for Directive-based High-Performance Reconfigurable Computing
Seyong Lee and Jungwon Kim (Oak Ridge National Laboratory, USA); Jeffrey S Vetter (Oak Ridge National Laboratory and Georgia Institute of Technology, USA)



Session 15
Memory Hierarchy

Session Chair: Nathan Tallent


Architecting and Programming a Hardware-Incoherent Multiprocessor Cache Hierarchy
Wooil Kim (University of Illinois, USA); Sanket Tavarageri (The Ohio State University, USA); Ponnuswamy Sadayappan (Ohio State University, USA); Josep Torrellas (University of Illinois at Urbana-Champaign, USA)


Refree: A Refresh-Free Hybrid DRAM/PCM Main Memory System
Bahareh Pourshirazi and Zhichun Zhu (University of Illinois at Chicago, USA)


Re-NUCA: A Practical NUCA Architecture for ReRAM Based Last-Level Caches
Jagadish Kotra, Mohammad Arjomand, Diana Guttman, Mahmut Taylan Kandemir and Chita R. Das (The Pennsylvania State University, USA)


Evaluating and Improving Thread-Level Speculation in Hardware Transactional Memories
Juan Salamanca (University of Campinas, Brazil); J. Nelson Amaral (University of Alberta, Canada); Guido Araujo (University of Campinas, Brazil)



Session 16
Optimization Techniques

Session Chair: Martin Schulz


Enabling Application Scalability and Reproducibility by Reducing System Noise with SMT
Edgar A. Leon, Ian Karlin and Adam Moody (Lawrence Livermore National Laboratory, USA)


Key/Value-enabled Flash Memory for Complex Scientific Workflows with On-line Analysis and Visualization
Stefan Eilemann, Fabien Delalondre, Jon Bernard, Judit Planas and Felix Schürmann (Ecole Polytechnique Fédérale de Lausanne, Switzerland); John Biddiscombe (CSCS, Swiss National Supercomputing Centre, Switzerland); Costas Bekas and Alessandro Curioni (IBM Zurich Research Laboratory, Switzerland); Bernard Metzler (IBM Research GmbH, Switzerland); Peter Kaltstein, Peter Morjan and Joachim Fenkes (IBM Deutschland Research and Development GmbH, Germany); Ralph Bellofatto and Lars Schneidenbach (IBM T. J. Watson Research Center Yorktown Heights, USA); Chris Ward (IBM, United Kingdom); Blake Fitch (IBM, USA)


Fast Classification of MPI Applications Using Lamport's Logical Clocks
Zhou Tong (Florida State University, USA); Scott Pakin and Michael Lang (Los Alamos National Laboratory, USA); Xin Yuan (Florida State University, USA)


Online-Autotuning of Parallel SAH kD-Trees
Martin Tillmann, Philip Pfaffe, Christopher Kaag and Walter F. Tichy (Karlsruhe Institute of Technology, Germany)

Parallel Technical Sessions 17, 18, 19, & 20
1:30 PM - 3:30 PM

Session 17
Communication Efficiency & Avoidance Algorithms

Session Chair: Sivasankaran Rajamanickam


Polynomial-time Construction of Optimal MPI Derived Datatype Trees
Robert Ganian, Martin Kalany and Stefan Szeider (Vienna University of Technology, Austria); Jesper Larsson Träff (Vienna University of Technology and Faculty of Informatics, Institute of Information Systems, Austria)


Write-Avoiding Algorithms
Erin Carson (New York University, USA); James Demmel (University of California at Berkeley, USA); Laura Grigori (INRIA, France); Nicholas Knight and Penporn Koanantakool (University of California at Berkeley, USA); Oded Schwartz (Hebrew University, Israel); Harsha Vardhan Simhadri (Lawrence Berkeley National Lab, USA)


Communication Efficient Algorithms for Top-k Selection Problems
Lorenz Hübschle-Schneider and Peter Sanders (Karlsruhe Institute of Technology, Germany)

Minimal Aggregated Shared Memory Messaging on Distributed Memory Supercomputers
Benjamin Jamroz and John M Dennis (National Center for Atmospheric Research, USA)



Session 18
Distributed Algorithms

Session Chair: Shuaiwen Song


Never Say Never: Probabilistic & Temporal Failure Detectors
Dacfey Dzung (ABB Ltd. Corporate Research, Switzerland); Rachid Guerraoui (Swiss Federal Institute of Technology, Switzerland); David Kozhaya (EPFL, Switzerland); Yvonne-Anne Pignolet (ABB Ltd. Corporate Research, Switzerland)


Gathering a Closed Chain of Robots on a Grid
Daniel Jung, Matthias Fischer, Friedhelm Meyer auf der Heide, Sebastian Abshoff and Andreas Cord-Landwehr (University of Paderborn, Germany)


On Competitive Algorithms for Approximations of Top-k-Position Monitoring of Distributed Streams
Manuel Malatyali, Alexander Mäcker and Friedhelm Meyer auf der Heide (Heinz Nixdorf Institute, University of Paderborn, Germany)


Towards a Restrained Use of Non-equivocation for Achieving Iterative Approximate Byzantine Consensus
Li Chuanyou (Southeast University, P.R. China); Michel Hurfin (INRIA, France); Yun Wang (Southeast University, P.R. China); Lei Yu (Wuhan University, P.R. China)



Session 19
I/O and Storage

Session Chair: Fabrizio Petrini


Storage-Optimized Data-Atomic Algorithms for Handling Erasures and Errors in Distributed Storage Systems
Erez Kantor (Northeastern, USA); Kishori Konwar and Nancy Lynch (CSAIL, MIT, USA); Muriel Médard and N. Prakash (MIT, USA); Alexander Shvartsman (University of Connecticut, USA)


Fast Error-bounded Lossy HPC Data Compression with SZ
Sheng Di (Argonne National Laboratory, USA); Franck Cappello (Argonne National Laboratory, University of Illinois at Urbana Champaign, USA and Inria, France)


I/O Aware Power Shifting
Lee Savoie and David Lowenthal (University of Arizona, USA); Bronis R. de Supinski, Tanzima Islam, Kathryn Mohror, Barry L Rountree and Martin Schulz (Lawrence Livermore National Laboratory, USA)


On the Root Causes of Cross-application I/O Interference in HPC Storage Systems
Orcun Yildiz (INRIA Rennes, France); Matthieu Dorier (Argonne National Laboratory, USA); Shadi Ibrahim (INRIA Rennes, France); Robert Ross (Argonne National Laboratory, USA); Gabriel Antoniu (INRIA Rennes - Bretagne Atlantique, France)



Session 20
Scientific Applications (2)

Session Chair: Darren Kerbyson


Exploiting Variant-based Parallelism for Data Mining of Space Weather Phenomena
Michael Gowanlock, David Blair and Victor Pankratius (Massachusetts Institute of Technology, USA)


Solving Open MIP Instances with ParaSCIP on Supercomputers Using Up to 80,000 Cores
Yuji Shinano (Zuse Institute Berlin, Germany); Tobias Achterberg (Gurobi GmbH, Germany); Timo Berthold and Stefan Heinz (Fair Isaac Germany GmbH, Germany); Thorsten Koch (Zuse Institute Berlin, Germany); Michael Winkler (Gurobi GmbH, Germany)


AAlign: A SIMD Framework for Pairwise Sequence Alignment on X86-Based Multi- And Many-core Processors
Kaixi Hou, Hao Wang and Wu-chun Feng (Virginia Tech, USA)


Mendel: A Distributed Storage Framework for Similarity Searching Over Sequencing Data
Cameron Tolooee, Sangmi Pallickara and Asa Ben-Hur (Colorado State University, USA)

Afternoon Break 3:30 PM - 4:00 PM

Community Summit
4:00 PM – 5:30 PM

COMMUNITY SUMMIT: The Road Ahead for the IPDPS Community 


This Community Summit, hosted by the IPDPS Steering Committee, will be an opportunity to discuss ideas for keeping pace with the times and continuing to build on the strengths of IPDPS. We will launch a program for gathering proposals and comments from the community with the plan to "summit" again in 2017 to see where things stand and what we learned. Return here closer to the conference for details.


PhD Forum Special Session

5:30 PM – 7:00 PM

Posters on Display


IPDPS Attendees Invited to View Posters and Talk with Student Presenters

JPDC Reception

6:00 PM – 7:00 PM

Hosted by Elsevier:


Introducing the new edition of the Journal of Parallel and Distributed Computing

Symposium Banquet

After 7:00 PM

The banquet will open with a short concert by a Chinese string band


Hosted by IPDPS 2016 General Chair Xian-He Sun

THURSDAY - 26 May 2016


Keynote Session
8:30 AM - 9:30 AM

Keynote Speech

Session Chair: Michela Taufer


Katrin Heitmann
Argonne National Laboratory   

Unlocking the Mysteries of the Universe with Supercomputers


Abstract: Cosmology is in a scientifically very exciting phase. Two decades of surveying the sky have culminated in the celebrated "Cosmological Standard Model". Yet, two of its key pillars, dark matter and dark energy, together accounting for 95% of the mass-energy of the Universe, remain mysterious. Deep fundamental questions...

Morning Break 9:30 AM - 10:00 AM

Best Papers

10:00 AM - 12:00 PM

Session Best Papers

Session Chair: Jeff K Hollingsworth


ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines
Aleksandar Zlateski and Kisuk Lee (Massachusetts Institute of Technology, USA); H. Sebastian Seung (Princeton University, USA)


Stochastic Matrix-Function Estimators: Scalable Big-Data Kernels with High Performance
Peter Staar and Panagiotis Barkoutsos (IBM Zurich Research Laboratory, Switzerland); Roxana Istrate (IBM ZRL, Switzerland); A. Cristiano I. Malossi (IBM ZRL, Switzerland); Ivano Tavernelli, Nikolaj Moll and Heiner Giefers (IBM ZRL, Switzerland); Christoph Hagleitner (IBM ZRL, Switzerland); Costas Bekas and Alessandro Curioni (IBM ZRL, Switzerland)


Discrete Cache Insertion Policies for Shared Last Level Cache Management on Large Multicores
Aswinkumar Sridharan (INRIA, France); André Seznec (Irisa/Inria, France)


Massively Parallel First-Principles Simulation of Electron Dynamics in Materials
Erik Draeger and Xavier Andrade (Lawrence Livermore National Laboratory, USA); John Gunnels (IBM T. J. Watson Research Center, USA); Abhinav Bhatele (Lawrence Livermore National Laboratory, USA); Andre Schleife (University of Illinois, Urbana-Champaign, USA); Alfredo Correa (Lawrence Livermore National Laboratory, USA)

Parallel Technical Sessions 21, 22, 23, & 24
1:30 PM - 3:30 PM

Session 21
Numerical Algorithms

Session Chair: Yves Robert


Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication
Penporn Koanantakool (University of California at Berkeley, USA); Ariful Azad, Aydin Buluc and Dmitriy Morozov (Lawrence Berkeley National Laboratory, USA); Sang-Yun Oh (University of California, Santa Barbara, USA); Leonid Oliker (Lawrence Berkeley National Laboratory, USA); Katherine Yelick (University of California at Berkeley, USA)


Petascale Local Time Stepping for the ADER-DG Finite Element Method
Alexander Breuer (Technische Universität München, Germany); Alexander Heinecke (Intel Corporation, USA); Michael Bader (Technische Universität München, Germany)


Asymptotic Optimality of Parallel Short Division
Niall Emmart and Charles Weems (University of Massachusetts, USA)


High Performance Parallel Stochastic Gradient Descent in Shared Memory
Scott Sallinen (University of British Columbia, Canada); Nadathur Satish (Intel Corporation, USA); Mikhail Smelyanskiy and Samantika Sury (Intel Corporation, USA); Christopher Ré (Stanford University, USA)



Session 22
Graphs and Tensors

Session Chair: Bora Uçar


Optimal Algorithms for Graphs and Images on a Shared Memory Mesh
Yujie An and Quentin Stout (University of Michigan, USA)


Parallel Graph Coloring for Manycore Architectures
Mehmet Deveci, Erik G. Boman, Karen D Devine and Sivasankaran Rajamanickam (Sandia National Laboratories, USA)


A Medium-Grained Algorithm for Distributed Sparse Tensor Factorization
Shaden Smith and George Karypis (University of Minnesota, USA)


Parallel Tensor Compression for Large-Scale Scientific Data
Woody Austin (University of Texas, USA); Grey Ballard and Tamara Kolda (Sandia National Laboratories, USA)



Session 23
Runtime Systems

Session Chair: Karen L Karavanic


GinFlow: A Decentralised Adaptive Workflow Execution Manager
Javier Rojas Balderrama (University of Rennes 1 / INSERM, France); Matthieu Simonin (INRIA, France); Cedric Tedeschi (University of Rennes I / INRIA, France)


Hierarchical Parallel Dynamic Dependence Analysis for Recursively Task-Parallel Programs
Nikolaos Papakonstantinou (FORTH-ICS, Greece); Foivos S. Zakkak (University of Crete and FORTH-ICS, Greece); Polyvios Pratikakis (FORTH-ICS, Greece)


MPMD Framework for Offloading Load Balance Computation
Olga Pearce, Todd Gamblin, Bronis R. de Supinski and Martin Schulz (Lawrence Livermore National Laboratory, USA); Nancy Amato (Texas A&M University, USA)


Integrating Abstractions to Enhance the Execution of Distributed Applications
Matteo Turilli (Rutgers University, USA); Feng Liu (University of Minnesota, USA); Zhao Zhang (University of California, Berkeley, USA); Andre Merzky (LSU, USA); Michael Wilde (University of Chicago, Argonne National Laboratory, USA); Jon Weissman (University of Minnesota, Twin Cities, USA); Daniel S. Katz (University of Chicago, USA); Shantenu Jha (Rutgers University, USA)



Session 24

Session Chair: Michael Lam


cusFFT: A High-Performance Sparse Fast Fourier Transform Algorithm on GPUs
Cheng Wang (University of Houston, USA); Sunita Chandrasekaran (University of Delaware, USA); Barbara Chapman (University of Houston, USA)


Balancing Scalar and Vector Execution on GPU Architectures
Zhongliang Chen and David Kaeli (Northeastern University, USA)


Exploiting Maximal Overlap for Non-Contiguous Data Movement Processing on Modern GPU-enabled System
Ching-Hsiang Chu, Khaled Hamidouche, Akshay Venkatesh, Dip Sankar Banerjee, Hari Subramoni and Dhabaleswar Panda (The Ohio State University, USA)


Online Algorithm-Based Fault Tolerance for Cholesky Decomposition on Heterogeneous Systems with GPUs
Jieyang Chen, Xin Liang and Zizhong Chen (University of California, Riverside, USA)

Afternoon Break 3:30 PM - 4:00 PM

Parallel Technical Sessions 25, 26, 27 & 28
4:00 PM - 6:00 PM

Session 25

Session Chair: Sanjay Chatterjee


Reusable Resource Scheduling Via Colored Interval Covering
Venkat Chakravarthy and Sreyash D Kenkre (IBM Research, India); Sakib A. Mondal (Flipkart Internet Pvt Ltd, India); Vinayaka D Pandit and Yogish Sabharwal (IBM Research, India)


Partitioned Feasibility Tests for Sporadic Tasks on Heterogeneous Machines
Shaurya Ahuja, Kefu Lu and Benjamin Moseley (Washington University in St. Louis, USA)


Are Static Schedules So Bad? A Case Study on Cholesky Factorization
Emmanuel Agullo (INRIA / LaBRI, France); Olivier Beaumont (Inria, France); Lionel Eyraud-Dubois (INRIA Bordeaux Sud-Ouest and University of Bordeaux, France); Suraj Kumar (University of Bordeaux and INRIA Bordeaux, France)



Session 26
System Software

Session Chair: Andrew Lumsdaine


Optimization of MPI Collective Communication on Fat-tree Networks
Sameer Kumar (IBM Research, India); Sameh Sharkawi (IBM Systems and Technology Group, USA); Nysal K. A. Jan (IBM Systems and Technology Group, India)


On the Scalability, Performance Isolation and Device Driver Transparency of the IHK/McKernel Hybrid Lightweight Kernel
Balazs Gerofi, Masamichi Takagi and Atsushi Hori (RIKEN Advanced Institute for Computational Science, Japan); Gou Nakamura and Tomoki Shirasawa (Hitachi Solutions, Ltd., Japan); Yutaka Ishikawa (University of Tokyo, Japan)


ZCCloud: Exploring Wasted Green Power for High-Performance Computing
Fan Yang (University of Chicago, USA); Andrew A Chien (University of Chicago and Argonne National Laboratory, USA)


Agile Live Migration of Virtual Machines
Umesh Deshpande (IBM Research, USA); Danny Chan, Ten-Young Guh, James Edouard and Kartik Gopalan (State University of New York at Binghamton, USA); Nilton Bila (IBM Research, USA)



Session 27
Security & Fault Tolerance

Session Chair: Frederic Vivien


Lazy Repair for Addition of Fault-tolerance
Yiyan Lin, Mohammad Roohitavaf and Sandeep Kulkarni (Michigan State University, USA)


Security RBSG: Protecting Phase Change Memory with Security-Level Adjustable Dynamic Mapping
Fangting Huang, Dan Feng and Wen Xia (Huazhong University of Science and Technology, P.R. China); Wen Zhou (Wuhan National Lab for Optoelectronics, School of Computer Science and Technology and Huazhong University of Science and Technology, P.R. China); Yucheng Zhang, Min Fu, Chuntao Jiang and Yukun Zhou (Huazhong University of Science and Technology, P.R. China)


Mitigation of Denial of Service Attack with Hardware Trojans in NoC Architectures
Travis Boraten and Avinash Kodi (Ohio University, USA)


CRC-based Memory Reliability for Task-parallel HPC Applications
Omer Subasi, Osman Unsal and Jesús Labarta (Barcelona Supercomputing Center, Spain); Gulay Yalcin (Abdullah Gul University and Barcelona Supercomputing Center, Turkey); Adrian Cristal (Barcelona Supercomputing Center, Spain)



Session 28
Data Streaming

Session Chair: Cynthia A Phillips


Differentiated Scheduling of Response-Critical and Best-Effort Wide-Area Data Transfers
Rajkumar Kettimuthu (Argonne National Lab, USA); Gagan Agrawal and Ponnuswamy Sadayappan (The Ohio State University, USA); Ian Foster (University of Chicago, USA)


High Performance Pattern Matching Using the Automata Processor
Indranil Roy (Micron Technology, Inc., USA); Ankit Srivastava (Georgia Institute of Technology, USA); Marziyeh Nourian and Michela Becchi (University of Missouri-Columbia, USA); Srinivas Aluru (Georgia Institute of Technology and Indian Institute of Technology Bombay, USA)


GPU-accelerated Outlier Detection for Continuous Data Streams
Chandima Hewanadungodage, Yuni Xia and John Lee (IUPUI, Indiana University – Purdue University Indianapolis, USA)


Neptune: Real Time Stream Processing for Internet of Things and Sensing Environments
Thilina Buddhika and Shrideep Pallickara (Colorado State University, USA)

FRIDAY - 27 May 2016


* See each individual workshop program for schedule details






High-Performance, Power-Aware Computing



Workshop on Parallel and Distributed Scientific and Engineering Computing



Dependable Parallel, Distributed and Network-Centric Systems



Large-Scale Parallel Processing



Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics



Workshop on Job Scheduling Strategies for Parallel Processing



International Workshop on Automatic Performance Tuning



Chapel Implementers and Users Workshop



High-Performance Big Data Computing



Monitoring and Analysis for High Performance Computing Systems Plus Applications



Emerging Parallel and Distributed Runtime Systems and Middleware



Parallel and Distributed Processing for Computational Social Systems



IPDPS 2016 Information on Keynote Speakers

IPDPS 2016 Tuesday

Kai Li
Princeton University
Disruptive Research and Innovation

Abstract: Ever since Clayton Christensen coined the terms "disruptive technologies" and "disruptive innovations" in the 1990s, researchers and entrepreneurs have loved the word "disruptive", because disrupting current knowledge or products helps us accelerate knowledge discovery and move society into a new era. What is disruptive research? What is disruptive innovation? How do they happen? To answer such questions, in this talk I will share my experience from co-leading the ImageNet project, which built a knowledge base for the computer vision and machine learning community, and from co-founding Data Domain, Inc., which built deduplication storage ecosystems to replace tape library infrastructure in data centers.

Bio: Kai Li is the Paul M. Wythes '55, P'86 and Marcia R. Wythes P'86 Professor at Princeton University, where he joined the faculty in 1986. He received his Ph.D. from Yale University, his M.S. from the Chinese Academy of Sciences, and his B.S. from Jilin University. His research areas include operating systems, parallel and distributed systems, storage systems, and analysis of large data. He pioneered Distributed Shared Memory (DSM), which allows shared-memory programming on a cluster of computers. His group proposed a user-level DMA mechanism for efficient cluster communication, which evolved into the RDMA standard of InfiniBand. He co-led the ImageNet project, which enabled the computer vision and machine learning community to accelerate its advances. He co-founded Data Domain, Inc. and led the innovation of deduplication storage system products to replace tape libraries in data centers. At Data Domain, he served as chief executive officer, chief technology officer and chief scientist. He is an ACM Fellow, an IEEE Fellow and a member of the National Academy of Engineering.


IPDPS 2016 Wednesday

Thomas Pawlowski
Memory, Storage and Processing in Future Parallel and Distributed Processing Systems

Abstract: This is perhaps the most exciting time in the short yet eventful 71-year history of Turing-complete computing. We are in the early but visible stage of an exponential explosion of data and of the analyses thereof. At the same time, we have witnessed the cessation of several exponential scaling-related trends and a slowdown of technology scaling itself. This talk will discuss technology scaling and zero in on the salient features of a new epoch in the operation of processing systems. We will discuss the new balance in algorithms, architectures, technology selection, components and their usage. New technologies will be presented, showing the potential of some new concepts. Considerations for memory and storage scale-up and scale-out will be examined. Finally, we will conclude with a view of our challenges and opportunities for research and collaboration.

Bio: J. Thomas Pawlowski is a Fellow and Chief Technologist with Micron's Architecture Development Group. His responsibilities include advising on new technologies, investments and system/memory/storage architectures. For the past twenty-five years at Micron, Mr. Pawlowski has had the pleasure of making key technical contributions to many new memory and system architectures, such as synchronous burst pipelined SRAM; hierarchical cache systems; Zero Bus Turnaround SRAM; abstracted memory; double data rate memory; Pseudo-Static RAM; high-speed NAND; double address rate memory; quad data rate SRAM; multi-channel memory; memories on SERDES buses; Reduced Latency DRAM; new refresh schemes; 3D memory; the Non-Deterministic Finite Automata Processor; abstraction protocols; new ECC concepts; processing near memory concepts; 3D XPoint system architecture; and others yet to be announced. Mr. Pawlowski earned a bachelor of applied science degree in electrical engineering, summa cum laude, from the University of Waterloo in Canada. He has well over 100 U.S. and in-flight patents and serves on several advisory boards and conference program committees.


IPDPS 2016 Thursday

Katrin Heitmann
Argonne National Laboratory
Unlocking the Mysteries of the Universe with Supercomputers

Abstract: Cosmology is in a scientifically very exciting phase. Two decades of surveying the sky have culminated in the celebrated "Cosmological Standard Model". Yet two of its key pillars, dark matter and dark energy, which together account for 95% of the mass-energy of the Universe, remain mysterious. Deep fundamental questions demand answers; to address them, survey capabilities are being improved exponentially. The new observations will pose tremendous challenges on many fronts, from the sheer size of the data that will be collected to its modeling and interpretation. Interpreting the data requires sophisticated simulations on the world's largest supercomputers.

In this talk I will introduce HACC, the Hardware/Hybrid Accelerated Cosmology Code, which is being developed to meet the tremendous computational challenge of simulating our Universe. HACC is a new and evolving cosmology N-body code framework, designed to run very efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models. HACC's design allows for ease of portability and, at the same time, high levels of sustained performance on the fastest supercomputers available today. I will present the design philosophy of HACC and its underlying code structure, and outline some implementation details. I will also briefly describe the analysis challenges posed by the large data sets that the HACC simulations generate. Finally, I will discuss some results from our recent work on confronting the simulated Universe with the real one.

Bio: Katrin Heitmann is a member of the scientific staff at Argonne National Laboratory in the High Energy Physics and the Mathematics and Computational Science Divisions. She is also a Senior Fellow at the Computation Institute and the Kavli Institute for Cosmological Physics at the University of Chicago. Her research focuses on physical cosmology, advanced statistical methods, and large-scale computing. Heitmann received her PhD in 2000 from the University of Dortmund (Germany) and held a postdoctoral position and later a staff position at Los Alamos National Laboratory before joining Argonne in 2011. She is a member of the American Physical Society.



2016 Registration

March 27th Deadline for
Advance Registration

Registration Details


IPDPS 2014 Report

May 25-29, 2015
Hyderabad International Convention Centre
Hyderabad, INDIA