This page lists all 21 workshops that are part of the IPDPS 2020 program. Click on a workshop of interest – Monday workshops at the top of the page and Friday workshops at the bottom – and the link will take you to that workshop's home page, which provides detailed information on the workshop's papers, other program material, and any planned events.
The Main Conference program that follows lists the papers accepted for the conference, organized in Technical Sessions originally scheduled for Tuesday, Wednesday, and Thursday. These papers, along with all workshop papers, are published in the proceedings and accompanied by presentation slides from the authors.
The proceedings will be released by May 15 and will be available to all registrants.
IPDPS will hold virtual events to coincide with the conference dates of 18-22 May. Participation details are available here and in the links in the program that follows.
- Tuesday, May 19: Best paper presentations and Q&A session.
- Wednesday, May 20: Best paper announcement and TCPP public meeting.
- Thursday, May 21: IPDPS Town Hall meeting.
Events on these three days will take place from 9:00 AM to 10:00 AM US Central Daylight Time / 2:00 PM UTC. Check individual workshops for any scheduled events.
MONDAY - 18 May 2020
MONDAY WORKSHOPS
Visit individual workshop websites at the links shown.
TUESDAY - 19 May 2020
Virtual Session
9:00 AM to 10:00 AM US Central Daylight Time / 2:00 PM UTC
Best Paper Presentations and Q&A Session
See this page for details and a link to join the session.
Parallel Technical Sessions 1, 2, 3, & 4
SESSION 1: Communication & NoCs
DozzNoC: Reducing Static and Dynamic Energy in NoCs with Low-latency Voltage Regulators using Machine Learning
Mark Clark, Yingping Chen, Avinash Karanth, Brian Ma, and Ahmed Louri
Neksus: An Interconnect for Heterogeneous System-In-Package Architectures
Vidushi Goyal, Xiaowei Wang, Valeria Bertacco, and Reetuparna Das
Accelerated Reply Injection for Removing NoC Bottleneck in GPGPUs
Yunfan Li and Lizhong Chen
Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures
Jahanzeb Maqbool Hashmi, Shulei Xu, Bharath Ramesh, Hari Subramoni, Mohammadreza Bayatpour, and Dhabaleswar K. (DK) Panda
SESSION 2: Storage & IO
ClusterSR: Cluster-Aware Scattered Repair in Erasure-Coded Storage
Zhirong Shen, Jiwu Shu, Zhijie Huang, and Yingxun Fu
Stitch It Up: Using Progressive Data Storage to Scale Science
Jay Lofstead, John Mitchel, and Enze Chen
HFetch: Hierarchical Data Prefetching for Scientific Workflows in Multi-Tiered Storage Environments
Hariharan Devarajan, Anthony Kougkas, and Xian-He Sun
CanarIO: Sounding the Alarm on IO-Related Performance Degradation
Michael Wyatt, Stephen Herbein, Kathleen Shoga, Todd Gamblin, and Michela Taufer
SESSION 3: Applications
A Study of Graph Analytics for Massive Datasets on Large-Scale Distributed GPUs
Vishwesh Jatala, Roshan Dathathri, Gurbinder Gill, Loc Hoang, V. Krishna Nandivada, and Keshav Pingali
A Highly Efficient Dynamical Core of Atmospheric General Circulation Model based on Leap-Format
Hang Cao, Liang Yuan, He Zhang, Baodong Wu, Shigang Li, Pengqi Lu, Yunquan Zhang, Yongjun Xu, and Minghua Zhang
Understanding GPU-Based Lossy Compression for Extreme-Scale Cosmological Simulations
Sian Jin, Pascal Grosset, Christopher M. Biwer, Jesus Pulido, Jiannan Tian, Dingwen Tao, and James P. Ahrens
Optimizing High Performance Markov Clustering for Pre-Exascale Architectures
Oguz Selvitopi, Md Taufique Hussain, Ariful Azad, and Aydin Buluç
SESSION 4: Distributed Algorithms
Tightening Up the Incentive Ratio for Resource Sharing Over the Rings
Yukun Cheng, Xiaotie Deng, and Yuhao Li
Communication-Efficient String Sorting
Timo Bingmann, Peter Sanders, and Matthias Schimek
SCSL: Optimizing Matching Algorithms to Improve Real-time for Content-based Pub/Sub Systems
Tianchen Ding, Shiyou Qian, Jian Cao, Guangtao Xue, and Minglu Li
Distributed Graph Realizations
John Augustine, Keerti Choudhary, Avi Cohen, David Peleg, Sumathi Sivasubramaniam, and Suman Sourav
Parallel Technical Sessions 5, 6, 7, & 8
SESSION 5: Reliability and QoS
Transaction-Based Core Reliability
Sang Wook Stephen Do and Michel Dubois
Understanding the Interplay between Hardware Errors and User Job Characteristics on the Titan Supercomputer
Seung-Hwan Lim, Ross Miller, and Sudharshan Vazhkudai
EC-Fusion: An Efficient Hybrid Erasure Coding Framework to Improve Both Application and Recovery Performance in Cloud Storage Systems
Han Qiu, Chentao Wu, Jie Li, Minyi Guo, Tong Liu, Xubin He, Yuanyuan Dong, and Yafei Zhao
SESSION 6: Learning Algorithms
Learning an Effective Charging Scheme for Mobile Devices
Tang Liu, Baijun Wu, Wenzheng Xu, Xiaobo Cao, Jian Peng, and Hongyi Wu
Optimize Scheduling of Federated Learning on Battery-powered Mobile Devices
Cong Wang, Xin Wei, and Pengzhan Zhou
Harnessing Deep Learning via a Single Building Block
Kunal Banerjee, Michael J. Anderson, Sasikanth Avancha, Anand Venkat, Gregory M. Henry, Evangelos Georganas, Hans Pabst, Alexander Heinecke, and Dhiraj D. Kalamkar
Experience-Driven Computational Resource Allocation of Federated Learning by Deep Reinforcement Learning
Yufeng Zhan, Peng Li, and Song Guo
SESSION 7: Data Analysis and Management
An Active Learning Method for Empirical Modeling in Performance Tuning
Jiepeng Zhang, Jingwei Sun, Wenju Zhou, and Guangzhong Sun
DASSA: Parallel DAS Data Storage and Analysis for Subsurface Event Detection
Bin Dong, Veronica Rodriguez, Xin Xing, Suren Byna, Jonathan Ajo-Franklin, and Kesheng Wu
Scaling of Union of Intersections for Inference of Granger Causal Networks from Observational Data
Mahesh Balasubramanian, Trevor Ruiz, Brandon Cook, Mr Prabhat, Sharmodeep Bhattacharyya, Aviral Shrivastava, and Kristofer Bouchard
GPU-Based Static Data-Flow Analysis for Fast and Scalable Android App Vetting
Xiaodong Yu, Fengguo Wei, Xinming Ou, Michela Becchi, Tekin Bicer, and Danfeng (Daphne) Yao
SESSION 8: Edge Computing
Robust Server Placement for Edge Computing
Dongyu Lu, Yuben Qu, Fan Wu, Haipeng Dai, Chao Dong, and Guihai Chen
EdgeIso: Effective Performance Isolation for Edge Devices
Yoonsung Nam, Yongjun Choi, Byeonghun Yoo, Yongseok Son, and Hyeonsang Eom
Busy-Time Scheduling on Heterogeneous Machines
Runtian Ren and Xueyan Tang
Scheduling Malleable Jobs Under Topological Constraints
Evripidis Bampis, Konstantinos Dogeas, Alexander Kononov, Giorgio Lucarelli, and Fanny Pascual
PLENARY SESSION: Best Papers
XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu
Abstract—There has been a rapid proliferation of machine learning/deep learning (ML) models and wide... (full abstract below)
Exploring the Binary Precision Capabilities of Tensor Cores for Epistasis Detection
Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez, and Leonel Sousa
Abstract—Genome-wide association studies are performed to correlate a number of diseases and other... (full abstract below)
Understanding and Improving Persistent Transactions on Optane DC Memory
Pantea Zardoshti, Michael Spear, Aida Vosoughi, and Garret Swart
Abstract—Storing data structures in high-capacity byte-addressable persistent memory instead... (full abstract below)
CycLedger: A Scalable and Secure Parallel Protocol for Distributed Ledger via Sharding
Mengqian Zhang, JiChen Li, Zhaohua Chen, Hongyin Chen, and Xiaotie Deng
Abstract—Traditional public distributed ledgers have not been able to scale out well and work... (full abstract below)
WEDNESDAY - 20 May 2020
Virtual Session
9:00 AM to 10:00 AM US Central Daylight Time / 2:00 PM UTC
Best Paper Announcement and TCPP Public Meeting
See this page for details and a link to join the session.
Parallel Technical Sessions 9, 10, 11, & 12
SESSION 9: Cloud Technology
Mitigating Large Response Time Fluctuations through Fast Concurrency Adapting in the Cloud
Jianshu Liu, Shungeng Zhang, Qingyang Wang, and Jinpeng Wei
DAG-Aware Joint Task Scheduling and Cache Management in Spark Clusters
Yinggen Xu, Liu Liu, and Zhijun Ding
Solving the Container Explosion Problem for Distributed High Throughput Computing
Tim Shaffer, Nicholas Hazekamp, Jakob Blomer, and Douglas Thain
Amoeba: QoS-Awareness and Reduced Resource Usage of Microservices with Serverless Computing
Zijun Li, Quan Chen, Shuai Xue, Tao Ma, Yong Yang, Zhuo Song, and Minyi Guo
SESSION 10: Machine Learning
Efficient I/O for Neural Network Training with Compressed Data
Zhao Zhang, Lei Huang, J. Gregory Pauloski, and Ian T. Foster
Not All Explorations Are Equal: Harnessing Heterogeneous Profiling Cost for Efficient MLaaS Training
Jun Yi, Chengliang Zhang, Wei Wang, Cheng Li, and Feng Yan
ASYNC: A Cloud Engine with Asynchrony and History for Distributed Machine Learning
Saeed Soori, Bugra Can, Mert Gurbuzbalaban, and Maryam Dehnavi
Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs
Cheng Li, Abdul Dakkak, Jinjun Xiong, and Wen-mei Hwu
SESSION 11: GPUs
Adaptive Page Migration for Irregular Data-intensive Applications under GPU Memory Oversubscription
Debashis Ganguly, Ziyu Zhang, Jun Yang, and Rami Melhem
LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment
Alberto Zeni, Giulia Guidi, Marquita Ellis, Nan Ding, Marco D. Santambrogio, Steven Hofmeyr, Aydin Buluç, Leonid Oliker, and Katherine Yelick
Coordinated Page Prefetch and Eviction for Memory Oversubscription Management in GPUs
Qi Yu, Bruce R. Childers, Libo Huang, Cheng Qian, Hui Guo, and Zhiying Wang
A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs
Lingqi Zhang, Mohamed Wahib, Haoyu Zhang, and Satoshi Matsuoka
SESSION 12: Applications
DPF-ECC: Accelerating Elliptic Curve Cryptography with Floating-point Computing Power of GPUs
Lili Gao, Fangyu Zheng, Niall Emmart, Jiankuo Dong, Jingqiang Lin, and Charles Weems
Scalability Challenges of an Industrial Implicit Finite Element Code
Francois-Henry Rouet, Cleve Ashcraft, Jef Dawson, Roger Grimes, Erman Guleryuz, Seid Koric, Robert F. Lucas, James S. Ong, Todd Simons, and Ting-Ting Zhu
ETH: An Architecture for Exploring the Design Space of In-Situ Scientific Visualization
Greg Abram, Vignesh Adhinarayanan, Wu-chun Feng, David H. Rogers, and James P. Ahrens
Scaling Betweenness Approximation to Billions of Edges by MPI-based Adaptive Sampling
Alexander van der Grinten and Henning Meyerhenke
Parallel Technical Sessions 13, 14, 15, & 16
SESSION 13: Data Management
Improved Intermediate Data Management for MapReduce Frameworks
Haoyu Wang, Haiying Shen, Charles Reiss, Arnim Jain, and Yunqiao Zhang
Bandwidth-Aware Page Placement in NUMA
David Gureya, João Neto, Reza Karimi, João Barreto, Pramod Bhatotia, Vivien Quéma, Rodrigo Rodrigues, Paolo Romano, and Vladimir Vlassov
HCompress: Hierarchical Data Compression for Multi-Tiered Storage Environments
Hariharan Devarajan, Anthony Kougkas, Luke Logan, and Xian-He Sun
FRaZ: A Generic High-Fidelity Fixed-Ratio Lossy Compression Framework for Scientific Floating-point Data
Robert R. Underwood, Sheng Di, Jon Calhoun, and Franck Cappello
SESSION 14: Storage & Caching
DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors
Nadja Holtryd, Madhavan Manivannan, Per Stenström, and Miquel Pericas
Coordinated Management of Processor Configuration and Cache Partitioning to Optimize Energy under QoS Constraints
Mehrzad Nejat, Madhavan Manivannan, Miquel Pericas, and Per Stenström
StragglerHelper: Alleviating Straggling in Computing Clusters via Sharing Memory Access Patterns
Wenjie Liu, Ping Huang, and Xubin He
SESSION 15: Numerics
Evaluating the Numerical Stability of Posit Floating Point Arithmetic
Nicholas Buoncristiani, Sanjana Shah, David Donofrio, and John Shalf
Varity: Quantifying Floating-Point Variations in HPC Systems Through Randomized Testing
Ignacio Laguna
Demystifying Tensor Cores to Optimize Half-Precision Matrix Multiply
Da Yan, Wei Wang, and Xiaowen Chu
SESSION 16: IoT and Consensus
Data Collection of IoT Devices Using an Energy-Constrained UAV
Yuchen Li, Weifa Liang, Wenzheng Xu, and Xiaohua Jia
Argus: Multi-Level Service Visibility Scoping for Internet-of-Things in Enterprise Environments
Qian Zhou, Omkant Pandey, and Fan Ye
G-PBFT: A Location-based and Scalable Consensus Protocol for IoT-Blockchain Applications
LapHou Lao, Xiaohai Dai, Bin Xiao, and Songtao Guo
Byzantine Generalized Lattice Agreement
Giuseppe Antonio Di Luna, Emmanuelle Anceaume, and Leonardo Querzoni
THURSDAY - 21 May 2020
Virtual Session
9:00 AM to 10:00 AM US Central Daylight Time / 2:00 PM UTC
IPDPS Town Hall Meeting
See this page for details and a link to join the session.
Parallel Technical Sessions 17, 18, 19, & 20
SESSION 17: Graph Processing & Coding
A Heterogeneous PIM Hardware-Software Co-Design for Energy-Efficient Graph Processing
Yu Huang, Long Zheng, Pengcheng Yao, Jieshan Zhao, Xiaofei Liao, Hai Jin, and Jingling Xue
Spara: An Energy-Efficient ReRAM-based Accelerator for Sparse Graph Analytics Applications
Long Zheng, Jieshan Zhao, Yu Huang, Qinggang Wang, Zhen Zeng, Jingling Xue, Xiaofei Liao, and Hai Jin
Optimal Encoding and Decoding Algorithms for the RAID-6 Liberation Codes
Zhijie Huang, Hong Jiang, Zhirong Shen, Hao Che, Nong Xiao, and Ning Li
Sturgeon: Preference-aware Co-location for Improving Utilization of Power Constrained Computers
Pu Pang, Quan Chen, Deze Zeng, Chao Li, Jingwen Leng, Wenli Zheng, and Minyi Guo
SESSION 18: Parallel Algorithms
A High-Throughput Solver for Marginalized Graph Kernels on GPU
Yu-Hang Tang, Oguz Selvitopi, Doru Thom Popovici, and Aydin Buluç
Dynamic Graphs on the GPU
Muhammad A. Awad, Saman Ashkiani, Serban D. Porumbescu, and John D. Owens
Accelerating Parallel Hierarchical Matrix-Vector Products via Data Driven Sampling
Lucas Erlandson, Difeng Cai, Yuanzhe Xi, and Edmond Chow
NC Algorithms for Popular Matchings in One-Sided Preference Systems and Related Problems
Changyong Hu and Vijay Garg
SESSION 19: Performance, Power, and Energy
Smartly Handling Renewable Energy Instability in Supporting A Cloud Datacenter
Jiechao Gao, Haoyu Wang, and Haiying Shen
A Self-Optimized Generic Workload Prediction Framework for Cloud Computing
Vinodh Kumaran Jayakumar, Jaewoo Lee, In Kee Kim, and Wei Wang
SeeSAw: Optimizing Performance of In-Situ Analytics Applications under Power Constraints
Ivana Marincic, Venkatram Vishwanath, and Henry Hoffmann
SESSION 20: Resource Management
What does Power Consumption Behavior of HPC Jobs Reveal?
Tirthak Patel, Adam Wagenhäuser, Christopher Eibel, Timo Hönig, Thomas Zeiser, and Devesh Tiwari
Efficient Parallel Adaptive Partitioning for Load-balancing in Spatial Join
Jie Yang and Satish Puri
Union: An Automatic Workload Manager for Accelerating Network Simulation
Xin Wang, Misbah Mubarak, Yao Kang, Robert B. Ross, and Zhiling Lan
Auto-Tuning Parameter Choices using Bayesian Optimization
Harshitha Menon, Abhinav Bhatele, and Todd Gamblin
Parallel Technical Sessions 21, 22, 23, & 24
SESSION 21: Runtime Systems
Inter-Job Scheduling of High-Throughput Material Screening Applications
Zhihui Du, Xining Hui, Yurui Wang, Jun Jiang, Jason Liu, Baokun Lu, and Chongyu Wang
Reservation and Checkpointing Strategies for Stochastic Jobs
Ana Gainaru, Brice Goglin, Valentin Honore, Guillaume Pallez, Padma Raghavan, Yves Robert, and Hongyang Sun
A Scheduling Approach to Incremental Maintenance of Datalog Programs
Shikha Singh, Sergey Madaminov, Michael Bender, Michael Ferdman, Ryan Johnson, Benjamin Moseley, Hung Ngo, Dung Nguyen, Soeren Olesen, Kurt Stirewalt, and Geoffrey Washburn
Dynamic Scheduling in Distributed Transactional Memory
Costas Busch, Maurice Herlihy, Miroslav Popovic, and Gokarna Sharma
SESSION 22: Performance Analysis
Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling
Marcus Ritter, Alexandru Calotoiu, Sebastian Rinke, Thorsten Reimann, Torsten Hoefler, and Felix Wolf
The Case of Performance Variability on Dragonfly-based Systems
Abhinav Bhatele, Jayaraman J. Thiagarajan, Taylor Groves, Rushil Anirudh, Staci A. Smith, Brandon Cook, and David Lowenthal
Predicting and Comparing the Performance of Array Management Libraries
Donghe Kang, Oliver Ruebel, Suren Byna, and Spyros Blanas
Demystifying the Performance of HPC Scientific Applications on NVM-based Memory
Ivy B. Peng, Kai Wu, Jie Ren, Dong Li, and Maya Gokhale
SESSION 23: Communication
Packet-in Request Redirection for Minimizing Control Plane Response Time
Rui Xia, Haipeng Dai, Jiaqi Zheng, Hong Xu, Meng Li, and Guihai Chen
PCGCN: Partition-Centric Processing for Accelerating Graph Convolutional Network
Chao Tian, Lingxiao Ma, Zhi Yang, and Yafei Dai
ConMidbox: Consolidated Middleboxes Selection and Routing in SDN/NFV-Enabled Networks
Guiyan Liu, Songtao Guo, Pan Li, and Liang Liu
Scalable and Memory-Efficient Kernel Ridge Regression
Gustavo Chávez, Yang Liu, Pieter Ghysels, Xiaoye Sherry Li, and Elizaveta Rebrova
SESSION 24: Storage
SSDKeeper: Self-Adapting Channel Allocation to Improve the Performance of SSD Devices
Renping Liu, Xianzhang Chen, Yujuan Tan, Runyu Zhang, Liang Liang, and Duo Liu
FlashKey: A High-Performance Flash Friendly Key-Value Store
Madhurima Ray, Krishna Kant, Peng Li, and Sanjeev Trika
Pacon: Improving Scalability and Efficiency of Metadata Service through Partial Consistency
Yubo Liu, Yutong Lu, Zhiguang Chen, and Ming Zhao
Parallel Technical Sessions 25, 26, 27, & 28
SESSION 25: Program Analysis and Runtime Library
XPlacer: Automatic Analysis of Data Access Patterns on Heterogeneous CPU/GPU Systems
Peter Pirkelbauer, Pei-Hung Lin, Tristan Vanderbruggen, and Chunhua Liao
Improving Transactional Code Generation via Variable Annotation and Barrier Elision
João P.L. de Carvalho, Bruno C. Honorio, Alexandro Baldassin, and Guido Araujo
Evaluating Thread Coarsening and Low-cost Synchronization on Intel Xeon Phi
Hancheng Wu and Michela Becchi
AnySeq: A High Performance Sequence Alignment Library based on Partial Evaluation
André Müller, Bertil Schmidt, Andreas Hildebrandt, Richard Membarth, Roland Leißa, Matthis Kruse, and Sebastian Hack
SESSION 26: Scheduling
Analysis of a List Scheduling Algorithm for Task Graphs on Two Types of Resources
Lionel Eyraud-Dubois and Suraj Kumar
Optimal Convex Hull Formation on a Grid by Asynchronous Robots with Lights
Rory Hector, Ramachandran Vaidyanathan, Gokarna Sharma, and Jerry L. Trahan
On the Complexity of Conditional DAG Scheduling in Multiprocessor Systems
Alberto Marchetti-Spaccamela, Nicole Megow, Jens Schlöter, Martin Skutella, and Leen Stougie
Weaver: Efficient Coflow Scheduling in Heterogeneous Parallel Networks
Xin Sunny Huang, Yiting Xia, and T. S. Eugene Ng
SESSION 27: Fault Tolerance
Fault-Tolerant Containers Using NiLiCon
Diyu Zhou and Yuval Tamir
Aarohi: Making Real-Time Node Failure Prediction Feasible
Anwesha Das, Frank Mueller, and Barry Rountree
FP4S: Fragment-based Parallel State Recovery for Stateful Stream Applications
Pinchao Liu, Hailu Xu, Dilma Da Silva, Qingyang Wang, Sarker Tanzir Ahmed, and Liting Hu
SESSION 28: Multidisciplinary
Implementation and Evaluation of a Hardware Decentralized Synchronization Lock for MPSoCs
Maxime France-Pillois, Jérôme Martin, and Frederic Rousseau
Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons
Maciej Besta, Raghavendra Kanakagiri, Harun Mustafa, Mikhail Karasikov, Gunnar Rätsch, Torsten Hoefler, and Edgar Solomonik
Engineering Worst-Case Inputs for Pairwise Merge Sort on GPUs
Kyle Berney and Nodari Sitchinava
The Impossibility of Fast Transactions
Karolos Antoniadis, Diego Didona, Rachid Guerraoui, and Willy Zwaenepoel
FRIDAY - 22 May 2020
FRIDAY WORKSHOPS
Visit individual workshop websites at the links shown.
IPDPS 2020 BEST PAPERS
XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu
Abstract—There has been a rapid proliferation of machine learning/deep learning (ML) models and wide adoption of them in many application domains. This has made profiling and characterization of ML model performance an increasingly pressing task for both hardware designers and system providers, as they would like to offer the best possible system to serve ML models with the target latency, throughput, cost, and energy requirements while maximizing resource utilization. Such an endeavor is challenging as the characteristics of an ML model depend on the interplay between the model, framework, system libraries, and the hardware (or the HW/SW stack). Existing profiling tools are disjoint, however, and only focus on profiling within a particular level of the stack, which limits the thoroughness and usefulness of the profiling results.
This paper proposes XSP — an across-stack profiling design that gives a holistic and hierarchical view of ML model execution. XSP leverages distributed tracing to aggregate and correlate profile data from different sources. XSP introduces a leveled and iterative measurement approach that accurately captures the latencies at all levels of the HW/SW stack in spite of the profiling overhead. We couple the profiling design with an automated analysis pipeline to systematically analyze 65 state-of-the-art ML models. We demonstrate that XSP provides insights which would be difficult to discern otherwise.
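The leveled measurement idea is easy to picture with nested, level-tagged trace spans. The following minimal sketch only illustrates the concept, not XSP's implementation; the span helper, level numbering, and op and kernel names are all invented for this example:

```python
import time
from contextlib import contextmanager

# Hypothetical illustration of leveled, across-stack tracing in the spirit of
# XSP: every span records which level of the HW/SW stack it came from (model,
# framework, library, ...) so spans from different sources can be correlated
# on one timeline.
TRACE = []

@contextmanager
def span(name, level):
    start = time.perf_counter()
    try:
        yield
    finally:
        end = time.perf_counter()
        TRACE.append({"name": name, "level": level, "start": start, "end": end})

# Nested spans mimic one inference: a framework op containing library kernels.
with span("model/predict", level=1):
    with span("framework/conv2d", level=2):
        with span("cudnn/implicit_gemm", level=3):  # invented kernel name
            time.sleep(0.002)  # stand-in for kernel execution time
    with span("framework/relu", level=2):
        time.sleep(0.001)

# Report per-level latencies; a real tool would also subtract tracing overhead.
for s in sorted(TRACE, key=lambda s: (s["level"], s["start"])):
    print(f'L{s["level"]} {s["name"]}: {(s["end"] - s["start"]) * 1e3:.2f} ms')
```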
Exploring the Binary Precision Capabilities of Tensor Cores for Epistasis Detection
Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez, and Leonel Sousa
Abstract—Genome-wide association studies are performed to correlate a number of diseases and other physical or even psychological conditions (phenotype) with substitutions of nucleotides at specific positions in the human genome, mainly single-nucleotide polymorphisms (SNPs). Some conditions, possibly because of the complexity of the mechanisms that give rise to them, have been identified to be more statistically correlated with genotype when multiple SNPs are jointly taken into account. However, the discovery of new associations between genotype and phenotype is exponentially slowed down by the increase of computational power required when epistasis, i.e., interactions between SNPs, is considered. This paper proposes a novel graphics processing unit (GPU)-based approach for epistasis detection that combines the use of modern tensor cores with native support for processing binarized inputs with algorithmic and target-focused optimizations. Using only a single mid-range Turing-based GPU, the proposed approach is able to evaluate 64.8 × 10¹² and 25.4 × 10¹² sets of SNPs per second, normalized to the number of patients, when considering 2-way and 3-way epistasis detection, respectively. This proposal is able to surpass the state-of-the-art approach by 6× and 8.2× in terms of the number of pairs and triplets of SNP allelic patient data evaluated per unit of time per GPU.
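At its core, binarized epistasis detection reduces to populating contingency tables by ANDing per-SNP bit vectors over patients and counting set bits; the paper's contribution is mapping such counts onto binary tensor-core operations. Here is a minimal pure-Python sketch of the counting idea, with made-up data:

```python
# Sketch of the binarized-counting idea: encode, per SNP and genotype, one bit
# per patient, then count patients matching a joint genotype with AND+popcount.
# The data here is invented; the paper's contribution is mapping such counts
# onto binary tensor-core matrix operations, which this sketch does not do.
snp_a_genotype0 = 0b10110101  # bit i = 1 if patient i has genotype 0 at SNP A
snp_b_genotype2 = 0b10011100  # bit i = 1 if patient i has genotype 2 at SNP B

joint = snp_a_genotype0 & snp_b_genotype2  # patients with both genotypes
count = bin(joint).count("1")              # popcount: one contingency cell
print(count)  # such counts feed the association score for the SNP pair
```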
Understanding and Improving Persistent Transactions on Optane DC Memory
Pantea Zardoshti, Michael Spear, Aida Vosoughi, and Garret Swart
Abstract—Storing data structures in high-capacity byte-addressable persistent memory instead of DRAM or a storage device offers the opportunity to (1) reduce cost and power consumption compared with DRAM, (2) decrease the latency and CPU resources needed for an I/O operation compared with storage, and (3) allow for fast recovery as the data structure remains in memory after a machine failure. The first commercial offering in this space is Intel® Optane™ Direct Connect (Optane™ DC) Persistent Memory. Optane™ DC promises access time within a constant factor of DRAM, with larger capacity, lower energy consumption, and persistence. We present an experimental evaluation of persistent transactional memory performance, and explore how Optane™ DC durability domains affect the overall results. Given that neither of the two available durability domains can deliver performance competitive with DRAM, we introduce and emulate a new durability domain, called PDRAM, in which the memory controller tracks enough information (and has enough reserve power) to make DRAM behave like a persistent cache of Optane™ DC memory.
In this paper we compare the performance of these durability domains on several configurations of five persistent transactional memory applications. We find a large throughput difference, which emphasizes the importance of choosing the best durability domain for each application and system. At the same time, our results confirm that recently published persistent transactional memory algorithms are able to scale, and that recent optimizations for these algorithms lead to strong performance, with speedups as high as 6× at 16 threads.
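For readers unfamiliar with durability domains: persistence requires explicitly forcing data past the volatile part of the hierarchy before an update may be considered committed. The sketch below imitates that discipline in ordinary Python, using a redo log on disk and os.fsync as a stand-in for a persist barrier; it is an analogy only, not the paper's PDRAM mechanism:

```python
import json
import os

# A persistent "transaction" on an in-memory dict, with a redo log on disk
# standing in for a durability domain. os.fsync plays the role of the persist
# barrier: nothing is considered durable until it returns.
STORE = {}

def persistent_txn(updates, log_path="redo.log"):
    # 1. Make the redo record durable before touching the live data.
    with open(log_path, "a") as log:
        log.write(json.dumps(updates) + "\n")
        log.flush()
        os.fsync(log.fileno())  # the "persist barrier" of this analogy
    # 2. Apply the updates; after a crash, replaying the log redoes them.
    STORE.update(updates)

persistent_txn({"balance:alice": 90, "balance:bob": 110})
print(STORE)
```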
CycLedger: A Scalable and Secure Parallel Protocol for Distributed Ledger via Sharding
Mengqian Zhang, JiChen Li, Zhaohua Chen, Hongyin Chen, and Xiaotie Deng
Abstract—Traditional public distributed ledgers have not been able to scale out well and work efficiently. Sharding is deemed a promising way to solve this problem. By partitioning all nodes into small committees and letting them work in parallel, we can significantly lower the amount of communication and computation, reduce the overhead on each node's storage, and enhance the throughput of the distributed ledger. Existing sharding-based protocols still suffer from several serious drawbacks. First, all non-faulty nodes must connect well with each other, which demands a huge number of communication channels in the network. Moreover, previous protocols lose much of their efficiency when the honesty of each committee's leader is in question. Finally, no explicit incentive is provided for nodes to actively participate in the protocol.
We present CycLedger, a scalable and secure parallel protocol for distributed ledger via sharding. Our protocol selects a leader and a partial set for each committee, who are in charge of maintaining intra-shard consensus and communicating with other committees, to reduce the amortized complexity of communication, computation, and storage on all nodes. We introduce a novel semi-commitment scheme between committees and a recovery procedure to prevent the system from crashing even when committee leaders are malicious. To add incentive to the network, we use the concept of reputation, which measures each node's trustworthy computing power. As nodes with a higher reputation receive more rewards, nodes with strong computing ability are encouraged to work honestly to gain reputation. In this way, we strike out a new path to establish scalability, security, and incentive for the sharding-based distributed ledger.
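The reputation-based incentive can be illustrated with a toy reward split in which a committee's reward is divided in proportion to member reputations. The function below is a hypothetical sketch, not CycLedger's actual reward formula:

```python
# Toy reputation-weighted reward split: nodes with a higher reputation (their
# demonstrated honest computing power) receive proportionally more reward.
# This formula is invented for illustration; it is not CycLedger's scheme.
def split_reward(total_reward, reputations):
    total_rep = sum(reputations.values())
    return {node: total_reward * rep / total_rep
            for node, rep in reputations.items()}

committee = {"node-a": 5.0, "node-b": 3.0, "node-c": 2.0}
print(split_reward(100.0, committee))
# -> {'node-a': 50.0, 'node-b': 30.0, 'node-c': 20.0}
```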