General IPDPS Info

Sponsors

IN COOPERATION WITH

DIAMOND INDUSTRY PARTNER

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

GOLD INDUSTRY PARTNER

NVIDIA

IPDPS 2017 Advance Program

The following program is slightly modified from the version posted on March 8, 2017. We apologize for any inconvenience due to changes in the assignment of papers to sessions. Please visit the IPDPS website regularly for updates. Authors who have corrections should send email to contact@ipdps.org giving full details.

MONDAY - 29 May 2017


MONDAY WORKSHOPS
ALL DAY*
* See each individual workshop program for schedule details

 

IPDPS 2017 WORKSHOPS – MONDAY 29 MAY
1   HCW          Heterogeneity in Computing Workshop
2   RAW          Reconfigurable Architectures Workshop
3   HiComb       High Performance Computational Biology
4   EduPar       NSF/TCPP Workshop on Parallel and Distributed Computing Education
5   ParLearning  Parallel and Distributed Computing for Machine Learning and Big Data Analytics
6   PDCO         Parallel / Distributed Computing and Optimization
7   GABB         Graph Algorithms Building Blocks
8   AsHES        Accelerators and Hybrid Exascale Systems
9   HIPS         High Level Programming Models and Supporting Environments
10  APDCM        Advances in Parallel and Distributed Computational Models
11  HPPAC        High-Performance, Power-Aware Computing
12  HPBDC        High-Performance Big Data Computing

 

Reception

6:00 PM - 7:30 PM
IPDPS - TCPP Welcome Reception

TUESDAY - 30 May 2017


Opening Session
8:00 AM - 8:30 AM

Opening Session

Keynote Session
8:30 AM - 9:30 AM

Keynote Speech

 

Session Chair: Marc Snir

 

Tandy Warnow
University of Illinois       

Computational Challenges in Constructing the Tree of Life

 

Abstract: Estimating the Tree of Life is one of the grand computational challenges in Science, and has applications to many areas of science and biomedical research.

Morning Break 9:30 AM - 10:00 AM

PhD Forum
Starts on Tuesday

PhD Forum Posters
On Display All Day Tuesday and Wednesday

More details to be announced

All Day Industry Exhibits

Parallel Technical
Sessions 1, 2, 3, & 4

10:00 AM - 12:00 PM

Session 1
Graph Algorithms

 

Session Chair: Joseph Jaja

 

Monitoring Properties of Large, Distributed, Dynamic Graphs
Gal Yehuda, Daniel Keren and Islam Akaria

 

Parallel Construction of Suffix Trees and the All-Nearest-Smaller-Values Problem
Patrick Flick and Srinivas Aluru

 

The Reverse Cuthill-McKee Algorithm in Distributed-Memory
Ariful Azad, Mathias Jacquelin, Aydin Buluc and Esmond Ng

 

SlimSell: A Vectorizable Graph Representation for Breadth-First Search
Maciej Besta, Florian Marending, Edgar Solomonik and Torsten Hoefler

 

 

Session 2
Computational Biology

 

Session Chair: Cynthia A Phillips

 

SWhybrid: A Hybrid-Parallel Framework for Large-Scale Protein Sequence
Haidong Lan, Weiguo Liu, Yongchao Liu and Bertil Schmidt

 

PUNAS: A Parallel Ungapped-Alignment-Featured Seed Verification Algorithm for Next-Generation Sequencing Read Alignment
Yuandong Chan, Kai Xu, Haidong Lan, Weiguo Liu, Yongchao Liu and Bertil Schmidt

 

Eliminating Irregularities of Protein Sequence Search on Multicore Architectures
Jing Zhang, Sanchit Misra, Hao Wang and Wu-Chun Feng

 

Communication Optimization on GPU: A Case Study of Sequence Alignment Algorithms
Jie Wang, Xinfeng Xie and Jason Cong


 

Session 3
Caches

 

Session Chair: Sanjeev Baskiyar

 

Elastic-Cache: GPU Cache Architecture for Efficient Fine- and Coarse-Grained Cache-Line Management
Bingchao Li, Jizhou Sun, Murali Annavaram and Nam Sung Kim

 

Content-Aware Non-Volatile Cache Replacement
Qi Zeng and Jih-Kwon Peir

 

DEFT-Cache: A Cost-effective and Highly Reliable SSD Cache for RAID Storage
Jiguang Wan, Wei Wu, Ling Zhan, Qing Yang, Xiaoyang Qu and Changsheng Xie

 

Adaptive Software Caching for Efficient NVRAM Data Persistence
Pengcheng Li, Dhruva R. Chakrabarti, Chen Ding and Liang Yuan

 

 

Session 4
Cloud & OS

 

Session Chair: Ümit Çatalyürek

 

Container-Based Cloud Platform for Mobile Computation Offloading
Song Wu, Chao Niu, Jia Rao, Hai Jin and Xiaohai Dai

 

Enhancing Datacenter Resource Management through Temporal Logic Constraints
Hao He, Jiang Hu and Dilma Da Silva

 

High-Performance Virtual Machine Migration Framework for MPI Applications on SR-IOV enabled InfiniBand Clusters
Jie Zhang, Xiaoyi Lu and Dhabaleswar K. (DK) Panda

 

Argo NodeOS: Toward Unified Resource Management for Exascale
Swann Perarnau, Judicael A. Zounmevo, Matthieu Dreher, Brian C. Van Essen, Roberto Gioiosa, Kamil Iskra, Maya B. Gokhale, Kazutomo Yoshii and Pete Beckman

Parallel Technical Sessions 5, 6, 7, & 8
1:30 PM - 3:30 PM

Session 5
Distributed Algorithms

 

Session Chair: Pierre Fraigniaud

 

Rational Fair Consensus in the GOSSIP Model
Andrea Clementi, Luciano Gualà, Guido Proietti and Giacomo Scornavacca

 

Leader Election in a Smartphone Peer-to-Peer Network
Calvin Newport

 

Leader Election in Asymmetric Labeled Unidirectional Rings
Karine Altisen, Ajoy K. Datta, Stéphane Devismes, Anaïs Durand and Lawrence L. Larmore

 

Tight Load Balancing via Randomized Local Search
Petra Berenbrink, Peter Kling, Christopher Liaw and Abbas Mehrabian

 

 

Session 6
Numerical Simulation

 

Session Chair: Alex Pothen

 

Large Scale Manycore-Aware PIC Simulation with Efficient Particle Binning
Hiroshi Nakashima, Yoshiki Summura, Keisuke Kikura and Yohei Miyake

 

Optimization and parallelization of B-spline based orbital evaluations in QMC on multi/many-core shared memory processors
Amrita Mathuriya, Ye Luo, Anouar Benali, Luke Shulenburger and Jeongnim Kim

 

One-Way Wave Equation Migration at Scale on GPUs using Directive-Based Programming
Kshitij Mehta, Maxime Hugues, Oscar Hernandez, Henri Calandra and David E. Bernholdt

 

Towards highly scalable Ab Initio Molecular Dynamics (AIMD) simulations on the Intel Knights Landing manycore processor
Eric Bylaska, Wibe De Jong and Mathias Jacquelin

 

 

Session 7
Novel Architectures

 

Session Chair: CJ Newburn

 

General Purpose Task-Dependence Management Hardware for Task-based Dataflow Programming Models
Xubin Tan, Jaume Bosch, Miquel Vidal, Carlos Álvarez, Daniel Jiménez-González, Eduard Ayguadé and Mateo Valero

 

Accelerating Graph and Machine Learning Workloads Using a Shared Memory Multicore Architecture with Auxiliary Support for in-Hardware Explicit Messaging
Halit Dogan, Masab Ahmad, Farrukh Hijaz, Brian Kahne, Peter Wilson and Omer Khan

 

Respin: Rethinking Near-Threshold Multiprocessor Design with Non-Volatile Memory
Xiang Pan, Anys Bacha and Radu Teodorescu

 

MOCHA: Morphable locality and compression aware architecture for convolutional neural networks
Syed M. A. H. Jafri, Ahmed Hemani, Kolin Paul and Naeem Abbas



 

Session 8
Performance Modeling and Tuning

 

Session Chair: Jeffrey Vetter

 

Autotuning Stencil Computations with Structural Ordinal Regression Learning
Biagio Cosenza, Juan J. Durillo, Stefano Ermon and Ben Juurlink

 

Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL
Sabela Ramos and Torsten Hoefler

 

Apollo: Reusable Models for Fast, Dynamic Tuning of Input-Dependent Code
David Beckingsale, Olga Pearce, Ignacio Laguna and Todd Gamblin

 

Generating Performance Models for Irregular Applications
Ryan Friese, Nathan R. Tallent, Abhinav Vishnu, Darren J. Kerbyson and Adolfy Hoisie

Afternoon Break 3:30 PM - 4:00 PM

Parallel Technical
Sessions 9, 10, 11, & 12

4:00 PM - 6:00 PM

Session 9
Communication & Coordination

 

Session Chair: Sergio Rajsbaum

 

Bounded Reordering Allows Efficient Reliable Message Transmission
Keishla D Ortiz-Lopez and Jennifer L. Welch

 

Dynamic Adaptation in Wireless Networks Under Comprehensive Interference Via Carrier Sense
Magnus M. Halldórsson, Tigran Tonoyan, Yuexuan Wang and Dongxiao Yu

 

Fault-Tolerant Online Packet Scheduling on Parallel Channels
Paweł Garncarek, Tomasz Jurdziński and Krzysztof Loryś

 

Corrected Gossip Algorithms for Fast Reliable Broadcast on Unreliable Systems
Torsten Hoefler, Amnon Barak, Amnon Shiloh and Zvi Drezner

 

 

Session 10
Tools 1

 

Session Chair: Murali Annavaram

 

Dr-BW: Identifying Bandwidth Contention in NUMA Architectures with Supervised Learning
Hao Xu, Shasha Wen, Alfredo Gimenez, Todd Gamblin and Xu Liu

 

Data Centric Performance Measurement Techniques for Chapel Programs
Hui Zhang and Jeffrey K. Hollingsworth

 

A Parallel FastTrack Data Race Detector on Multi-core Systems
Young Wn Song and Yann-Hang Lee

 

Localized Fault Recovery for Nested Fork-Join Programs
Gokcen Kestor, Sriram Krishnamoorthy and Wenjing Ma

 

 

Session 11
Networks

 

Session Chair: Matthias Blumrich

 

Exploring DataVortex Systems for Irregular Applications
Roberto Gioiosa, Antonino Tumeo, Jian Yin, Thomas Warfel, David Haglin and Santiago Betelu

 

DC2-MTCP: Light-weight Coding for Efficient Multi-path Transmission in Data Center Network
Jiyan Sun, Yan Zhang, Xin Wang, Shihan Xiao, Zhen Xu, Hongjing Wu, Xin Chen and Yanni Han

 

A Scalable and Resilient Microarchitecture Based on Multiport Binding for High-radix Router Design
Yi Dai, Kefei Wang, Gang Qu, Liquan Xiao, Dezun Dong and Xingyun Qi

 

Partitioning Low-diameter Networks to Eliminate Inter-job Interference
Nikhil Jain, Abhinav Bhatele, Xiang Ni, Todd Gamblin and Laxmikant V. Kale

 

 

Session 12
Libraries & Frameworks

 

Session Chair: Costin Iancu

 

Accelerating Spark Datasets by inlining deserialization
Jan Wróblewski, Kazuaki Ishizaki, Hiroshi Inoue and Moriyoshi Ohara

 

MRapid: An Efficient Short Job Optimizer on Hadoop
Hong Zhang, Hai Huang and Liqiang Wang

 

Accommodating Thread-Level Heterogeneity in Coupled Parallel Applications
Samuel K. Gutiérrez, Kei Davis, Dorian C. Arnold, Randal S. Baker, Robert W. Robey, Patrick McCormick, Daniel Holladay, Jon A. Dahl, R. Joe Zerr, Florian Weik and Christoph Junghans

 

Multi-GPU Graph Analytics
Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang and John D. Owens

Industry Tutorial

7:00 PM - 9:00 PM

NVIDIA Tutorial/Workshop: Deep Learning

 

Presenter: Julie Bernauer
Senior Solutions Architect for Machine Learning and Deep Learning at NVIDIA Corporation

 


WEDNESDAY - 31 May 2017


Keynote Session
8:30 AM – 9:30 AM

Keynote Speech

 

Session Chair: Michela Taufer

 

Mark Seager
Intel

A scalable system architecture to addressing the next generation of predictive simulation workflows with coupled compute and data intensive applications

 

Abstract: Trends in the emerging digital economy are pushing the virtual representation of products and services.

Morning Break 9:30 AM - 10:00 AM

PhD Forum
Starts on Tuesday

PhD Forum Posters

On Display All Day Tuesday and Wednesday


More details to be announced

All Day Industry Exhibits

Parallel Technical Sessions 13, 14, 15, & 16
10:00 AM - 12:00 PM

Session 13
Motion Planning & Similarity Search

 

Session Chair: Alan Sussman

 

Fault-Tolerant Robot Gathering Problems on Graphs With Arbitrary Appearing Times
Manuel Alcántara, Armando Castañeda, David Flores-Peñaloza and Sergio Rajsbaum

Distributed Vehicle Routing Approximation
Akhil Krishnan, Mikhail Markov and Borzoo Bonakdarpour

 

O(log N)-Time Complete Visibility for Asynchronous Robots with Lights
Gokarna Sharma, Ramachandran Vaidyanathan, Jerry L. Trahan, Costas Busch and Suresh Rai

 

Similarity Search on Automata Processors
Vincent T. Lee, Justin Kotalik, Carlo C. del Mundo, Armin Alaghi, Luis Ceze and Mark Oskin

 

 

Session 14
Applications

 

Session Chair: Timothy G Mattson

 

26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight
Yulong Ao, Chao Yang, Xinliang Wang, Wei Xue, Haohuan Fu, Fangfang Liu, Lin Gan, Ping Xu and Wenjing Ma

 

Image-Domain Gridding on Graphics Processors
Bram Veenboer, Matthias Petschow and John W. Romein

 

Aces4: A Platform for Computational Chemistry Calculations with Extremely Large Block-Sparse Arrays
Beverly A. Sanders, Jason N. Byrd, Nakul Jindal, Victor F. Lotrich, Dmitry Lyakh, Ajith Perera and Rodney J. Bartlett

 

PhiOpenSSL: Using Xeon Phi Coprocessor for Efficient Cryptography Calculation
Shun Yao and Dantong Yu

 

 

Session 15
Tools 2

Session Chair: Jeff Hollingsworth

 

Directive-Based Partitioning and Pipelining for Graphics Processing Units
Xuewen Cui, Thomas R. W. Scogland, Bronis R. de Supinski and Wu-chun Feng

 

ScalaIOExtrap: Elastic I/O Tracing and Extrapolation
Xiaoqing Luo, Frank Mueller, Philip Carns, Jonathan Jenkins, Robert Latham, Robert Ross and Shane Snyder

 

SimProf: A Sampling Framework for Data Analytic Workloads
Jen-Cheng Huang, Lifeng Nai, Pranith Kumar, Hyojong Kim and Hyesoon Kim

 

PaPar: A Parallel Data Partitioning Framework for Big Data Applications
Hao Wang, Jing Zhang, Da Zhang, Sarunya Pumma and Wu-Chun Feng



Session 16
Data and Graph Analytics

 

Session Chair: Aydin Buluc

 

swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight
Jiarui Fang, Haohuan Fu, Wenlai Zhao, Bingwei Chen, Weijie Zheng and Guangwen Yang

 

Community Detection on the GPU
Md. Naim, Fredrik Manne, Mahantesh Halappanavar and Antonino Tumeo

 

Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores
Heng Lin, Xiongchao Tang, Bowen Yu, Youwei Zhuo, Wenguang Chen, Jidong Zhai, Wanwang Yin and Weimin Zheng

 

Partitioning Trillion-edge Graphs in Minutes
George M. Slota, Sivasankaran Rajamanickam, Kamesh Madduri and Karen Devine


Parallel Technical Sessions 17, 18, 19, & 20
1:30 PM – 3:30 PM

Session 17
Linear Algebra

 

Session Chair: Olivier Beaumont

 

Generating Families of Practical Fast Matrix Multiplication Algorithms
Jianyu Huang, Leslie Rice, Devin A. Matthews and Robert A. van de Geijn

 

Bidiagonalization with Parallel Tiled Algorithms
Mathieu Faverge, Julien Langou, Yves Robert and Jack Dongarra

 

Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations
Tobias Wicky, Edgar Solomonik and Torsten Hoefler

 

A work-efficient parallel sparse matrix-sparse vector multiplication algorithm
Ariful Azad and Aydin Buluç

 

 

Session 18
Power Management

 

Session Chair: Frank Mueller

 

Power Efficient Sharing-Aware GPU Data Management
Abdulaziz Tabbakh, Murali Annavaram and Xuehai Qian

 

Fly-Over: A Light-Weight Distributed Power-Gating Mechanism for Energy-Efficient Networks-on-Chip
Rahul Boyapati, Jiayi Huang, Ningyuan Wang, Kyung Hoon Kim, Ki Hwan Yum and Eun Jung Kim

 

RCube: A Power Efficient and Highly Available Network for Data Centers
Zhenhua Li and Yuanyuan Yang

 

Cooling-Aware Job Scheduling and Node Allocation for Overprovisioned HPC Systems
Thang Cao, Wei Huang, Yuan He and Masaaki Kondo

 

 

Session 19
Scheduling

 

Session Chair: Emmanuel Jeannot

 

Algorithms for hierarchical and semi-partitioned parallel scheduling
Vincenzo Bonifaci, Gianlorenzo D'Angelo and Alberto Marchetti-Spaccamela

 

Efficient and Deterministic Scheduling for Parallel State Machine Replication
Odorico M. Mendizabal, Rudá S. T. De Moura, Fernando Luís Dotti and Fernando Pedone

 

Dynamic memory-aware task-tree scheduling
Guillaume Aupy, Clément Brasseur and Loris Marchal

 

Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs
Olivier Beaumont, Lionel Eyraud-Dubois and Suraj Kumar

 

 

Session 20
Code Optimization

 

Session Chair: Stephen Olivier

 

Automatic Collapsing of Non-Rectangular Loops
Philippe Clauss, Ervin Altıntaş and Matthieu Kuhn

 

HOMP: Automated Distribution of Parallel Loops and Data in Highly Parallel Accelerator-Based Systems
Yonghong Yan, Jiawen Liu, Kirk W. Cameron and Mariam Umar

 

Multigrain Parallelism: Bridging Coarse-Grain Parallel Programs and Fine-Grain Event-Driven Multithreading
Jaime Arteaga, Stéphane Zuckerman and Guang R. Gao

 

Improving the integration of task nesting and dependencies in OpenMP
Josep M. Perez, Vicenç Beltran, Jesus Labarta and Eduard Ayguade

Afternoon Break 3:30 PM - 4:00 PM

All Conference
Plenary Event

4:00 PM – 5:00 PM

Report to IPDPS Community

Updates on the road ahead for IPDPS

PhD Forum
Special Session

5:30 PM – 7:00 PM

Posters on Display

IPDPS Attendees Invited to View Posters and Talk with Student Presenters

Reception

6:00 PM – 7:00 PM

Details to be announced

Symposium Banquet

After 7:00 PM

Details to be announced

THURSDAY - 1 June 2017


Keynote Session
8:30 AM - 9:30 AM

Keynote Speech

 

Session Chair: Viktor Prasanna

 

Mateo Valero
Barcelona Supercomputing Center

Runtime Aware Architectures

 

Abstract: In recent years, the traditional ways of keeping hardware performance increasing at the rate predicted by Moore's Law have vanished.

Morning Break 9:30 AM - 10:00 AM

PhD Forum

PhD Forum Student Program

 

 Details to be announced

All Day Industry Exhibits

PLENARY SESSION:
Best Papers

10:00 AM - 12:00 PM

Best Papers

 

Session Chair: Marc Snir

 

Reducing Pagerank Communication via Propagation Blocking
Scott Beamer, Krste Asanović and David Patterson

 

Clustering Throughput Optimization on the GPU
Michael Gowanlock, Cody M. Rude, David M. Blair, Justin D. Li and Victor Pankratius

 

FlexVC: Flexible Virtual Channel Management in Low-Diameter Networks
Pablo Fuentes, Enrique Vallejo, Ramón Beivide, Cyriel Minkenberg and Mateo Valero

 

Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors
Benjamin Klenk, Holger Fröning, Hans Eberle and Larry Dennison

Parallel Technical Sessions 21, 22, 23 & 24
1:30 PM - 3:30 PM

Session 21
Algorithms

 

Session Chair: Mathias Jacquelin

 

The SEPO Model of Computation to Enable Larger-than-Memory Hash Tables for GPU-accelerated Big Data Analytics
Reza Mokhtari and Michael Stumm

 

Elastic Consistent Hashing for Distributed Storage Systems
Wei Xie and Yong Chen

 

An NlogN Parallel Fast Direct Solver for Kernel Matrices
Chenhan D. Yu, William March and George Biros

 

A robust parallel preconditioner for indefinite systems using hierarchical matrices and randomized sampling
Pieter Ghysels, Xiaoye Sherry Li, Christopher Gorman and Francois-Henry Rouet

 

 

Session 22
Coordination

 

Session Chair: Phil Carns

 

FFQ: A Fast Single-Producer/Multiple-Consumer Concurrent FIFO Queue
Sergei Arnautov, Pascal Felber, Christof Fetzer and Bohdan Trach

 

Scalable Lock-Free Vector with Combining
Ivan Walulya and Philippas Tsigas

 

Automatic-Signal Monitors with Multi-Object Synchronization
Wei-Lun Hung and Vijay K. Garg

 

Optimal Algorithms for a Mesh-Connected Computer with Limited Additional Global Bandwidth
Yujie An and Quentin Stout

 

 

Session 23
Power Management 2

 

Session Chair: Bronis de Supinski

 

An Adaptive Core-specific Runtime for Energy Efficiency
Sridutt Bhalachandra, Allan Porterfield, Stephen Olivier and Jan Prins

 

Production Hardware Overprovisioning: Real-world Performance Optimization using an Extensible Power-aware Resource Management Framework
Ryuichi Sakamoto, Thang Cao, Masaaki Kondo, Koji Inoue, Masatsugu Ueda, Tapasya Patki, Daniel Ellsworth, Barry Rountree and Martin Schulz

 

Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems
Qi Zhu, Bo Wu, Xipeng Shen, Li Shen and Zhiying Wang

 

Characterizing and Modeling Power and Energy for Extreme-Scale In-situ Visualization
Vignesh Adhinarayanan, Wu-Chun Feng, David Rogers, James Ahrens and Scott Pakin


 

Session 24
MPI

 

Session Chair: Beverly Sanders

 

Application Level Reordering of Remote Direct Memory Access Operations
Costin Iancu and Wim Lavrijsen

 

Toucan - A Translator for Communication Tolerant MPI Applications
Sergio Martin, Marsha J. Berger and Scott B. Baden

 

Memory Compression Techniques for Network Address Management in MPI
Yanfei Guo, Charles Archer, Michael Blocksome, Scott Parker, Wesley Bland, Ken Raffenetti and Pavan Balaji

 

Transparent Caching for RMA Systems
Salvatore Di Girolamo, Flavio Vella and Torsten Hoefler

 



Afternoon Break 3:30 PM - 4:00 PM

Parallel Technical Sessions 25, 26, 27 & 28
4:00 PM - 6:00 PM

Session 25
ML & Tensors

 

Session Chair: Torsten Hoefler

 

When Neurons Fail
El Mahdi El Mhamdi and Rachid Guerraoui

 

On Optimizing Distributed Tucker Decomposition for Dense Tensors
Venkatesan T. Chakaravarthy, Jee W. Choi, Douglas J. Joseph, Xing Liu, Prakash Murali, Yogish Sabharwal and Dheeraj Sreedhar

 

Model-Driven Sparse CP Decomposition for High-Order Tensors
Jiajia Li, Jee Choi, Ioakeim Perros, Jimeng Sun and Richard Vuduc

 

Sparse Tensor Factorization on Many-Core Processors with High-Bandwidth Memory
Shaden Smith, Jongsoo Park and George Karypis



 

Session 26
Resource Management

 

Session Chair: Pavan Balaji

 

Proximity-Aware Balanced Allocations in Cache Networks
Ali Pourmiri, Mahdi Jafari Siavoshani and Seyed Pooya Shariatpanahi

 

Addressing Performance Heterogeneity in MapReduce Clusters with Elastic Tasks
Wei Chen, Jia Rao and Xiaobo Zhou

 

Autonomic Resource Management for Program Orchestration in Large-scale Data Analysis
Masahiro Tanaka, Kenjiro Taura and Kentaro Torisawa

 

Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems
Tao Gao, Yanfei Guo, Boyu Zhang, Pietro Cicotti, Yutong Lu, Pavan Balaji and Michela Taufer

 

 

Session 27
Compression & Memoization

 

Session Chair: Yves Robert

 

Elastic Data Compression with Improved Performance and Space Efficiency for Flash-based Storage Systems
Bo Mao, Hong Jiang, Suzhen Wu, Yaodong Yang and Zaifa Xi

 

E²MC: Entropy Encoding based Memory Compression for GPUs
Sohan Lal, Jan Lucas and Ben Juurlink

 

Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization
Dingwen Tao, Sheng Di, Zizhong Chen and Franck Cappello

 

ATM: Approximate Task Memoization in the Runtime System
Iulian Brumar, Marc Casas, Miquel Moretó Planas, Gurindar S. Sohi and Mateo Valero Cortés


 

Session 28
Persistent Memory

 

Session Chair: Swann Perarnau

 

Design and Implementation of Papyrus: Parallel Aggregate Persistent Storage
Jungwon Kim, Kittisak Sajjapongse, Seyong Lee and Jeffrey Vetter

 

Language-Based Optimizations for Persistence on Nonvolatile Main Memory Systems
Joel Denny, Seyong Lee and Jeffrey Vetter

 

MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffers
Teng Wang, Adam Moody, Yue Zhu, Kathryn Mohror, Kento Sato, Tanzima Islam and Weikuan Yu

 

Parallelism and Garbage Collection aware I/O Scheduler with Improved SSD Performance
Jiayang Guo, Yiming Hu and Bo Mao


FRIDAY - 2 June 2017


FRIDAY WORKSHOPS
ALL DAY*
* See each individual workshop program for schedule details

 

IPDPS 2017 WORKSHOPS – FRIDAY 2 JUNE
13  CHIUW       Chapel Implementers and Users Workshop
    LSPP        Large-Scale Parallel Processing: Practices and Experiences (cancelled)
14  PDSEC       Parallel and Distributed Scientific and Engineering Computing
15  JSSPP       Job Scheduling Strategies for Parallel Processing
16  DPDNS       Dependable Parallel, Distributed and Network-centric Systems
17  IPDRM       Emerging Parallel and Distributed Runtime Systems and Middleware
18  iWAPT       International Workshop on Automatic Performance Tuning
19  ParSocial   Parallel and Distributed Processing for Computational Social Systems
20  BigDataEco  Big Data Regional Innovation Hubs and Spokes: Accelerating the Big Data Innovation Ecosystem
21  GraML       Graph Algorithms and Machine Learning
22  EMBRACE     Evolvable Methods for Benchmarking Realism and Community Engagement
23  REPPAR      Reproducibility in Parallel Computing

IPDPS 2017 Tuesday
KEYNOTE SPEAKER

Tandy Warnow
University of Illinois       
Computational Challenges in Constructing the Tree of Life           

Abstract: Estimating the Tree of Life is one of the grand computational challenges in Science, and has applications to many areas of science and biomedical research.  Despite intensive research over the last several decades, many problems remain inadequately solved.  Relatively small datasets can take hundreds of CPU years (e.g., the Avian Phylogenomics Project analysis of just 48 bird genomes used more than 200 CPU years to construct its tree), and larger datasets will require much more time. Thus, the estimation of the Tree of Life, which contains millions of species each with a genome containing millions of nucleotides, will depend on both novel algorithmic designs and effective use of high performance and distributed computing platforms.             

Bio: Tandy Warnow is the Founder Professor of Engineering at the University of Illinois at Urbana-Champaign, where she is a faculty member in Computer Science and in Bioengineering.  Tandy received her PhD in Mathematics at UC Berkeley under the direction of Gene Lawler, and did postdoctoral training with Simon Tavaré and Michael Waterman at USC.  She received the National Science Foundation Young Investigator Award in 1994, the David and Lucile Packard Foundation Award in Science and Engineering in 1996, a Radcliffe Institute Fellowship in 2006, and a Guggenheim Foundation Fellowship for 2011. She was elected a fellow of the Association for Computing Machinery (ACM) in 2015, and of the International Society for Computational Biology (ISCB) in 2017. Her current research focuses on phylogeny and alignment estimation for very large datasets (10,000 to 1,000,000 sequences), estimating species trees and phylogenetic networks from collections of gene trees, and metagenomics.    


IPDPS 2017 Wednesday
KEYNOTE SPEAKER

Mark Seager
Intel
A scalable system architecture to addressing the next generation of predictive simulation workflows with coupled compute and data intensive applications             

Abstract: Trends in the emerging digital economy are pushing the virtual representation of products and services.  Creating these digital twins requires a combination of real time data ingestion, simulation of physical products under real world conditions, service delivery optimization and data analytics as well as ML/DL anomaly detection and decision making.  Quantification of Uncertainty in the simulations will also be a compute and data intensive workflow that will drive the simulation improvement cycle.  Future high-end computing systems designs need to comprehend these types of complex workflows and provide a flexible framework for optimizing the design and operations under dynamic load conditions for them. 

Bio: Mark leads the Intel® Scalable System Framework (SSF) strategy for Intel’s Scalable Datacenter Solutions Group and directs Intel’s contributions to the Open Compute Platforms (opencompute.org). At Intel he is working on an ecosystem approach to develop and build HPC systems with Exascale capabilities. Before moving to Intel, he was assistant department head for Advanced Computing Technology within the Integrated Computing and Communications department at Lawrence Livermore National Laboratory. He joined LLNL in 1983 and has been working in parallel processing ever since. Mark was instrumental in the development and deployment of the BlueGene/L and Purple systems at LLNL for the Advanced Simulation and Computing program within the U.S. National Nuclear Security Administration. He was also instrumental in the development and deployment of the ASC White and Blue-Pacific systems and numerous world-class Linux clusters. He has won numerous awards, including the prestigious Edward Teller Award for "major contributions to the state-of-the-art in high performance computing." In 2015 he was promoted to Intel Fellow. Mark and Jan Seager breed and raise Arabian horses, with one national championship win in 2013. He received a BS degree in mathematics and astrophysics from the University of New Mexico at Albuquerque and a PhD in numerical analysis from the University of Texas at Austin.

IPDPS 2017 Thursday
KEYNOTE SPEAKER

Mateo Valero
Barcelona Supercomputing Center
Runtime Aware Architectures

Abstract: In recent years, the traditional ways of keeping hardware performance increasing at the rate predicted by Moore's Law have vanished. When uni-cores were the norm, hardware design was decoupled from the software stack thanks to a well-defined Instruction Set Architecture (ISA). This simple interface allowed developing applications without worrying too much about the underlying hardware, while computer architects proposed techniques to aggressively exploit Instruction-Level Parallelism (ILP) in superscalar processors. Current multi-cores are designed as simple symmetric multiprocessors on a chip. While these designs are able to compensate for the clock frequency stagnation, they face multiple problems in terms of power consumption, programmability, resilience or memory. The solution is to give more responsibility to the runtime system and to let it tightly collaborate with the hardware. The runtime has to drive the design of future multi-core architectures.

In this talk, we introduce an approach towards a Runtime-Aware Architecture (RAA), a massively parallel architecture designed from the runtime's perspective. RAA aims at supporting the activity of the parallel runtime system in three ways: first, to enable fine-grain tasking and support the opportunities it offers; second, to improve the performance of the memory subsystem by exposing hybrid hierarchies to the runtime system; and third, to improve performance by using vector units. During the talk, we will give a general overview of the problems RAA aims to solve and provide some examples of hardware components supporting the activity of the runtime system in the context of multi-core chips.

Bio: Mateo Valero is a professor in the Computer Architecture Department at UPC, in Barcelona. His research interests are focused on high performance architectures. He has published approximately 700 papers, has served in the organization of more than 300 International Conferences and has given more than 500 invited talks. He is the director of the Barcelona Supercomputing Centre, the National Centre of Supercomputing in Spain. Prof. Valero has been honored with several awards, including the Eckert-Mauchly Award 2007 by the IEEE and ACM; Seymour Cray Award 2015 by IEEE; Harry Goode Award 2009 by IEEE; ACM Distinguished Service Award 2012; Euro-Par Achievement Award 2015; the Spanish National Julio Rey Pastor award; the Spanish National Award “Leonardo Torres Quevedo”; the “King Jaime I” given by Generalitat Valenciana; the Research Award by the Catalan Foundation for Research and Innovation; and the “Aragón Award” 2008 given by the Government of Aragón. He has been named Honorary Doctor by the Universities of Chalmers, Belgrade, Las Palmas de Gran Canaria, Zaragoza, Complutense de Madrid, Cantabria, Granada and by the University of Veracruz and CINVESTAV (Mexico). He is a "Hall of Fame" member of the ICT European Program (selected as one of the 25 most influential European researchers in IT during the period 1983-2008, Lyon, November 2008). He was honoured with the Creu de Sant Jordi 2016 by Generalitat de Catalunya, the highest recognition granted by the Government. In December 1994, Professor Valero became a founding member of the Royal Spanish Academy of Engineering. In 2005 he was elected Corresponding Academic of the Spanish Royal Academy of Science, in 2006 a member of the Royal Spanish Academy of Doctors, in 2008 a member of the Academia Europaea and in 2012 Corresponding Academic of the Mexican Academy of Sciences. He is a Fellow of the IEEE, Fellow of the ACM and an Intel Distinguished Research Fellow. In 1998 he won a “Favourite Son” Award of his home town, Alfamén (Zaragoza), and in 2006 Alfamén named its Public College after him.

 


 

2017 Registration

March 27th Deadline for Advance Registration
