A Fine-grained Prefetching Scheme for DGEMM Kernels on GPU with Auto-tuning Compatibility |
Jialin Li, Computer Network Information Center, Chinese Academy of Sciences
Huang Ye, Computer Network Information Center, Chinese Academy of Sciences
Shaobo Tian, Computer Network Information Center, Chinese Academy of Sciences
Xinyuan Li, Computer Network Information Center, Chinese Academy of Sciences
Jian Zhang, Computer Network Information Center, Chinese Academy of Sciences |
A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization |
Qinglei Cao, University of Tennessee
Rabab Alomairy, King Abdullah University of Science and Technology
Yu Pei, University of Tennessee
George Bosilca, University of Tennessee
Hatem Ltaief, King Abdullah University of Science and Technology
David Keyes, King Abdullah University of Science and Technology
Jack Dongarra, University of Tennessee |
A General Offloading Approach for Processing-In-Memory Architectures |
Dan Chen, Huazhong University of Science and Technology
Hai Jin, Huazhong University of Science and Technology
Long Zheng, Huazhong University of Science and Technology
Yu Huang, Huazhong University of Science and Technology
Pengcheng Yao, Huazhong University of Science and Technology
Chuangyi Gui, Huazhong University of Science and Technology
Qinggang Wang, Huazhong University of Science and Technology
Haifeng Liu, Huazhong University of Science and Technology
Haiheng He, Huazhong University of Science and Technology
Xiaofei Liao, Huazhong University of Science and Technology
Ran Zheng, Huazhong University of Science and Technology |
A Model-Architecture Co-Design for High Performance Temporal GNN Inference on FPGA |
Hongkuan Zhou, University of Southern California
Bingyi Zhang, University of Southern California
Rajgopal Kannan, US Army Research Lab
Viktor Prasanna, University of Southern California
Carl Busart, US Army Research Lab |
A Quantitative Study of the Spatiotemporal I/O Burstiness of HPC Application |
Wenxiang Yang, College of Computer, National University of Defense Technology
Xiangke Liao, College of Computer, National University of Defense Technology
Dezun Dong, College of Computer, National University of Defense Technology
Jie Yu, Computational Aerodynamics Institute, China Aerodynamics Research and Development Center |
A Scalable Adaptive-Matrix Solver for Heterogeneous Architectures |
Han Tran, University of Utah
Milinda Fernando, University of Texas at Austin
Kumar Saurabh, Iowa State University
Baskar Ganapathysubramanian, Iowa State University
Robert Kirby, University of Utah
Hari Sundar, University of Utah |
A self-stabilizing 1-minimal dominating set algorithm based on loop composition in networks of girth at least 7 |
Syohei Maruyama, Hiroshima University
Yuichi Sudo, Hosei University
Sayaka Kamei, Hiroshima University
Hirotsugu Kakugawa, Ryukoku University |
A Swap Dominated Tensor Re-Generation Strategy for Training Deep Learning Models |
Zan Zong, Tsinghua University
Lijie Wen, Tsinghua University
Li Lin, Tsinghua University
Leilei Lin, Capital Normal University |
Accelerating Encrypted Computing on Intel GPUs |
Yujia Zhai, University of California, Riverside
Mohannad Ibrahim, North Carolina State University
Yiqin Qiu, Intel Corporation
Fabian Boemer, Intel Corporation
Zizhong Chen, University of California, Riverside
Alexey Titov, Intel Corporation
Alexander Lyashevsky, Intel Corporation |
Accuracy vs. Cost in Parallel Fixed-Precision Low-Rank Approximations of Sparse Matrices |
Robert Ernstbrunner, University of Vienna
Viktoria Mayer, University of Vienna
Wilfried Gansterer, University of Vienna |
Adaptive Verifiable Coded Computing: Towards Fast, Secure and Private Distributed Machine Learning |
Tingting Tang, University of Southern California
Ramy E. Ali, University of Southern California
Hanieh Hashemi, University of Southern California
Tynan Gangwani, University of Southern California
Salman Avestimehr, University of Southern California
Murali Annavaram, University of Southern California |
Alias-Chain: Improving Blockchain Scalability via Exploring Content Locality among Transactions |
Jintong Liu, Huazhong University of Science and Technology
Shenggang Wan, Huazhong University of Science and Technology
Xubin He, Temple University |
An Efficient Block Validation Mechanism for UTXO-based Blockchains |
Xiaohai Dai, Huazhong University of Science and Technology
Bin Xiao, The Hong Kong Polytechnic University
Jiang Xiao, Huazhong University of Science and Technology
Hai Jin, Huazhong University of Science and Technology |
An Efficient Vectorization Scheme for Stencil Computation |
Kun Li, Institute of Computing Technology of Chinese Academy of Sciences
Liang Yuan, Institute of Computing Technology of Chinese Academy of Sciences
Yunquan Zhang, Institute of Computing Technology of Chinese Academy of Sciences
Yue Yue, Institute of Computing Technology of Chinese Academy of Sciences
Hang Cao, Institute of Computing Technology of Chinese Academy of Sciences |
An End-to-end and Adaptive I/O Optimization Tool for Modern HPC Storage Systems |
Bin Yang, Shandong University
Yanliang Zou, ShanghaiTech University
Weiguo Liu, Shandong University
Wei Xue, Tsinghua University |
An Integral-equation-oriented Vectorized SpMV Algorithm and its Application on CT Imaging Reconstructions |
Weicai Ye, Sun Yat-sen University
Chenghuan Huang, Sun Yat-sen University
Jiasheng Huang, Sun Yat-sen University
Jiajun Li, Sun Yat-sen University
Yao Lu, Sun Yat-sen University
Ying Jiang, Sun Yat-sen University |
Archpipe: Fast and Flexible Pipelined Erasure-coded Archival Scheme for Heterogeneous Networks |
Bin Xu, Huazhong University of Science and Technology
Jianzhong Huang, Huazhong University of Science and Technology
Qiang Cao, Huazhong University of Science and Technology
Xiao Qin, Auburn University |
As easy as ABC: Optimal (A)ccountable (B)yzantine (C)onsensus is easy! |
Pierre Civit, Sorbonne University
Seth Gilbert, NUS Singapore
Vincent Gramoli, University of Sydney and EPFL
Rachid Guerraoui, EPFL
Jovan Komatovic, EPFL |
Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching |
András Strausz, ETH Zurich
Flavio Vella, University of Trento, Italy
Salvatore Di Girolamo, ETH Zurich
Maciej Besta, ETH Zurich
Torsten Hoefler, ETH Zurich |
AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning |
Siddharth Singh, University of Maryland, College Park
Abhinav Bhatele, University of Maryland, College Park |
Batched sparse iterative solvers on GPU for the collision operator for fusion plasma simulations |
Aditya Kashi, Karlsruhe Institute of Technology
Pratik Nayak, Karlsruhe Institute of Technology
Dhruva Kulkarni, Lawrence Berkeley National Laboratory
Aaron Scheinberg, Jubilee Development
Paul Lin, Lawrence Berkeley National Laboratory
Hartwig Anzt, Karlsruhe Institute of Technology |
Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU |
Jou-An Chen, North Carolina State University
Ang Li, Pacific Northwest National Lab
Nathan Tallent, Pacific Northwest National Lab
Kevin Barker, Pacific Northwest National Lab
Xipeng Shen, North Carolina State University
Hsin-Hsuan Sung, North Carolina State University |
Booster: An Accelerator for Gradient Boosting Decision Trees Training and Inference |
Mingxuan He, Purdue University
Mithuna Thottethodi, Purdue University
T. N. Vijaykumar, Purdue University |
Bounding the Flow Time in Online Scheduling with Structured Processing Sets |
Louis-Claude Canon, FEMTO-ST Institute
Anthony Dugois, Inria
Loris Marchal, CNRS |
Co-Designing an OpenMP GPU Runtime and Optimizations for Near-Zero Overhead Execution |
Johannes Doerfert, Argonne National Laboratory
Atmn Patel, University of Waterloo
Joseph Huber, Oak Ridge National Laboratory
Shilei Tian, Stony Brook University
Jose M. Monsalve Diaz, Argonne National Laboratory
Barbara Chapman, Stony Brook University
Giorgis Georgakoudis, Lawrence Livermore National Laboratory |
Coloring the Vertices of 9-pt and 27-pt Stencils with Intervals |
Dante Durrman, UNC Charlotte
Erik Saule, UNC Charlotte |
Colza: Enabling Elastic In Situ Visualization for High-performance Computing Simulations |
Matthieu Dorier, Argonne National Laboratory (ANL)
Zhe Wang, Rutgers University
Utkarsh Ayachit, Kitware, Inc
Shane Snyder, Argonne National Laboratory
Rob Ross, Argonne National Laboratory
Manish Parashar, University of Utah |
Communication-efficient Massively Distributed Connected Components |
Sebastian Lamm, Karlsruhe Institute of Technology
Peter Sanders, Karlsruhe Institute of Technology |
Compiler-Directed Incremental Checkpointing for Low Latency GPU Preemption |
Zhuoran Ji, The University of Hong Kong
Cho-Li Wang, The University of Hong Kong |
Coupling streaming AI and HPC ensembles to achieve 100-1000$\times$ faster bio-molecular simulations |
Alexander Brace, University of Chicago
Shantenu Jha, Brookhaven National Lab
Igor Yakushin, Argonne National Laboratory
Hyungro Lee, Rutgers University
Heng Ma, Argonne National Laboratory
Anda Trifan, University of Illinois Urbana Champaign
Li Tan, Brookhaven National Laboratory
Todd Munson, Argonne National Laboratory
Matteo Turilli, Rutgers University
Ian Foster, Argonne National Laboratory
Arvind Ramanathan, Argonne National Lab |
CSC: Collaborative System Configuration for I/O-Intensive Applications in Multi-Tenant Clouds |
Haowei Huang, Shanghai Jiao Tong University
Pu Pang, Shanghai Jiao Tong University
Quan Chen, Shanghai Jiao Tong University
Jieru Zhao, Shanghai Jiao Tong University
Wenli Zheng, Shanghai Jiao Tong University
Minyi Guo, Shanghai Jiao Tong University |
CSMV: A Highly Scalable Multi-Versioned Software Transactional Memory for GPUs |
Diogo Nunes, IST/INESC-ID
Daniel Castro, IST/INESC-ID
Paolo Romano, IST/INESC-ID |
DEAN: A Lightweight and Resource-efficient Blockchain Protocol for Reliable Edge Computing |
Abdullah Al Mamun, University of Nevada, Reno
Haoting Shen, University of Nevada, Reno
Dongfang Zhao, University of Nevada, Reno |
Degree-Aware Kernels for Computing Jaccard Weights on GPUs |
Amro Alabsi Aljundi, Sabancı University
Taha Atahan Akyıldız, Sabancı University
Kamer Kaya, Sabancı University |
DeNOVA: Deduplication Extended NOVA File System |
Hyungjoon Kwon, Sogang University
Yonghyeon Cho, Sogang University
Awais Khan, Oak Ridge National Laboratory
Yeohyeon Park, Sogang University
Youngjae Kim, Sogang University |
DFMan: A Graph-based Optimization of Dataflow Scheduling on High-Performance Computing Systems |
Fahim Tahmid Chowdhury, Florida State University
Francesco Di Natale, Lawrence Livermore National Laboratory
Adam Moody, Lawrence Livermore National Laboratory
Kathryn Mohror, Lawrence Livermore National Laboratory
Weikuan Yu, Florida State University |
DGSF: Disaggregated GPUs for Serverless Functions |
Henrique Fingler, The University of Texas at Austin
Zhiting Zhu, The University of Texas at Austin
Esther Yoon, The University of Texas at Austin
Zhipeng Jia, The University of Texas at Austin
Emmett Witchel, The University of Texas at Austin
Christopher J. Rossbach, The University of Texas at Austin |
Direct solution of larger coupled sparse/dense linear systems using low-rank compression on single-node multi-core machines in an industrial context |
Emmanuel Agullo, Inria
Marek Felšöci, Inria
Guillaume Sylvand, Airbus Central R & T |
DistrEdge: Speeding up Convolutional Neural Network Inference on Distributed Edge Devices |
Xueyu Hou, New Jersey Institute of Technology
Yongjie Guan, New Jersey Institute of Technology
Tao Han, New Jersey Institute of Technology
Ning Zhang, University of Windsor |
Distributed Memory Sparse Kernels for Machine Learning |
Vivek Bharadwaj, University of California, Berkeley
Aydın Buluç, Lawrence Berkeley National Laboratory
James Demmel, University of California, Berkeley |
Dynamic Computation Offloading for Green Things-Edge-Cloud Computing with Local Caching |
Xianzhong Tian, Zhejiang University of Technology
Huixiao Meng, Zhejiang University of Technology
Yanjun Li, Zhejiang University of Technology
Pingting Miao, Zhejiang University of Technology
Pengcheng Xu, Zhejiang University of Technology |
Dynamic Task Shaping for High Throughput Data Analysis Applications in High Energy Physics |
Benjamin Tovar, University of Notre Dame
Benjamin Lyons, University of Notre Dame
Kelci Mohrman, University of Notre Dame
Barry Sly-Delgado, University of Notre Dame
Kevin Lannon, University of Notre Dame
Douglas Thain, University of Notre Dame |
Enabling Efficient Request Management through Microservice Level Parallelism |
Xinkai Wang, Shanghai Jiao Tong University
Chao Li, Shanghai Jiao Tong University
Lu Zhang, Shanghai Jiao Tong University
Xiaofeng Hou, Hong Kong University of Science and Technology
Quan Chen, Shanghai Jiao Tong University
Minyi Guo, Shanghai Jiao Tong University |
Excavating the Potential of Graph Workload on RDMA-based Far Memory Architecture |
Jing Wang, Shanghai Jiao Tong University
Chao Li, Shanghai Jiao Tong University
Taolei Wang, Shanghai Jiao Tong University
Lu Zhang, Shanghai Jiao Tong University
Pengyu Wang, Shanghai Jiao Tong University
Junyi Mei, Shanghai Jiao Tong University
Minyi Guo, Shanghai Jiao Tong University |
Exploiting Reduced Precision for GPU-based Time Series Mining |
Yi Ju, Max Planck Computing and Data Facility
Amir Raoofy, Technical University of Munich
Dai Yang, NVIDIA GmbH
Erwin Laure, Max Planck Computing and Data Facility
Martin Schulz, Technical University of Munich |
Falcon: A Timestamp-based Protocol to Maximize the Cache Efficiency in the Distributed Shared Memory |
Jin Zhang, Shanghai Jiao Tong University
Xiangyao Yu, University of Wisconsin–Madison
Zhengwei Qi, Shanghai Jiao Tong University
Haibing Guan, Shanghai Jiao Tong University |
FAM-Graph: Graph Analytics on Disaggregated Memory |
Daniel Zahka, Georgia Institute of Technology
Ada Gavrilovska, Georgia Institute of Technology |
Fast and High-Quality Influence Maximization on Multiple GPUs |
Gökhan Göktürk, Sabancı University
Kamer Kaya, Sabancı University |
Fast Convergence to Fairness for Reduced Long Flow Tail Latency in Datacenter Networks |
John Snyder, Duke University
Alvin R. Lebeck, Duke University |
Fast Parallel Bayesian Network Structure Learning |
Jiantong Jiang, The University of Western Australia
Zeyi Wen, The University of Western Australia
Ajmal Mian, The University of Western Australia |
Fault-tolerant Snapshot Objects in Message Passing Systems |
Vijay Garg, UT Austin
Saptaparni Kumar, Unaffiliated
Lewis Tseng, Boston College
Xiong Zheng, Google |
Finding Small Vertex Covers in Parallel using GPUs |
Peter Yamout, American University of Beirut
Karim Barada, American University of Beirut
Adnan Jaljuli, American University of Beirut
Amer Mouawad, American University of Beirut
Izzat El Hajj, American University of Beirut |
FlashWalker: An In-Storage Accelerator for Graph Random Walks |
Fuping Niu, Huazhong University of Science and Technology
Jianhui Yue, Michigan Tech. University
Jiangqiu Shen, Michigan Tech. University
Xiaofei Liao, Huazhong University of Science and Technology
Haikun Liu, Huazhong University of Science and Technology
Hai Jin, Huazhong University of Science and Technology |
Generalized Flow-Graph Programming Using Template Task-Graphs: Initial Implementation and Assessment |
Joseph Schuchart, University of Tennessee, Innovative Computing Laboratory
Poornima Nookala, IACS, Stony Brook University
Mohammad Mahdi Javanmard, Facebook Inc.
Thomas Herault, University of Tennessee, Innovative Computing Laboratory
Edward F. Valeev, Department of Chemistry, Virginia Tech
George Bosilca, University of Tennessee, Innovative Computing Laboratory
Robert J. Harrison, IACS, Stony Brook University |
GSpecPal: Speculation-Centric Finite State Machine Parallelization on GPUs |
Yuguang Wang, Michigan Technological University
Robbie Watling, Michigan Technological University
Junqiao Qiu, Michigan Technological University
Zhenlin Wang, Michigan Technological University |
HACCS: Heterogeneity-Aware Clustered Client Selection for Accelerated Federated Learning |
Joel Wolfrath, University of Minnesota
Nikhil Sreekumar, University of Minnesota
Dhruv Kumar, University of Minnesota
Yuanli Wang, University of Minnesota
Abhishek Chandra, University of Minnesota |
HDagg: Hybrid Aggregation of Loop-carried Dependence Iterations in Sparse Matrix Computations |
Behrooz Zarebavani, University of Toronto
Kazem Cheshmi, University of Toronto
Bangtian Liu, University of Toronto
Michelle Mills Strout, University of Arizona
Maryam Mehri Dehnavi, University of Toronto |
High-order Line Graphs of Non-uniform Hypergraphs: Algorithms, Applications, and Experimental Analysis |
Xu Tony Liu, University of Washington
Jesun Firoz, Pacific Northwest National Laboratory
Andrew Lumsdaine, University of Washington
Cliff Joslyn, Pacific Northwest National Lab
Sinan Aksoy, Pacific Northwest National Lab
Ilya Amburg, Pacific Northwest National Lab
Brenda Praggastis, Pacific Northwest National Lab
Assefaw Gebremedhin, Washington State University |
HRaft: Adaptive Erasure Coded Data Maintenance for Consensus in Distributed Networks |
Yulei Jia, Tianjin University of Technology
Guangping Xu, Tianjin University of Technology
Chi Wan Sung, City University of Hong Kong
Salwa Mostafa, City University of Hong Kong
Yulei Wu, University of Exeter |
HTS: A Threaded Multilevel Sparse Hybrid Solver |
Joshua D. Booth, University of Alabama, Huntsville |
Hybrid Workload Scheduling on HPC Systems |
Yuping Fan, Illinois Institute of Technology
Zhiling Lan, Illinois Institute of Technology
Paul Rich, Argonne National Laboratory
William Allcock, Argonne National Laboratory
Michael Papka, Argonne National Laboratory |
I/O-optimal Cache-oblivious Sparse Matrix-Sparse Matrix Multiplication |
Niels Gleinig, ETH Zurich
Maciej Besta, ETH Zurich
Torsten Hoefler, ETH Zurich |
In-Memory Indexed Caching for Distributed Data Processing |
Alexandru Uta, Leiden University
Bogdan Ghit, Databricks
Ankur Dave, UC Berkeley
Jan Rellermeyer, TU Delft
Peter Boncz, CWI |
Landau collision operator in the CUDA programming model applied to thermal quench plasmas |
Mark Adams, Lawrence Berkeley National Laboratory
Dylan Brennan, Princeton University
Matthew Knepley, University of Buffalo
Peng Wang, NVIDIA |
Learning Intermediate Representations using Graph Neural Networks for NUMA and Prefetchers Optimization |
Ali TehraniJamsaz, Iowa State University
Mihail Popov, Inria
Akash Dutta, Iowa State University
Emmanuelle Saillard, Inria
Ali Jannesari, Iowa State University |
Lightning: Scaling the GPU Programming Model Beyond a Single GPU |
Stijn Heldens, Netherlands eScience Center
Pieter Hijma, VU University Amsterdam
Ben van Werkhoven, Netherlands eScience Center
Jason Maassen, Netherlands eScience Center
Rob V. van Nieuwpoort, Netherlands eScience Center |
Memory Access Granularity Aware Lossless Compression for GPUs |
Sohan Lal, Technical University of Hamburg
Manuel Renz, Technical University of Berlin
Julian Hartmer, Technical University of Berlin
Ben Juurlink, Technical University of Berlin |
Memory-Aware Scheduling of Tasks Sharing Data on Multiple GPUs with Dynamic Runtime Systems |
Maxime Gonthier, ENS Lyon
Loris Marchal, French National Center for Scientific Research
Samuel Thibault, Univ. Bordeaux |
MICCO: An Enhanced Multi-GPU Scheduling Framework for Many-Body Correlation Functions |
Qihan Wang, College of William and Mary
Bin Ren, College of William and Mary
Jie Chen, Jefferson Lab
Robert Edwards, Jefferson Lab |
Minerva: Rethinking Secure Architectures for the Era of Fabric-Attached Memory Architectures |
Mazen Alwadi, University of Central Florida
Rujia Wang, Illinois Institute of Technology
David Mohaisen, University of Central Florida
Clayton Hughes, Sandia National Laboratories
Simon Hammond, Sandia National Laboratories
Amro Awad, North Carolina State University |
Mixed precision $s$-step Conjugate Gradient with Residual Replacement on GPUs |
Ichitaro Yamazaki, Sandia National Laboratories
Erin Carson, Charles University
Brian Kelley, Sandia National Laboratories |
MLCNN: Cross-Layer Cooperative Optimization and Accelerator Architecture for Speeding Up Deep Learning Applications |
Beilei Jiang, University of North Texas
Xianwei Cheng, University of North Texas
Sihai Tang, University of North Texas
Xu Ma, University of North Texas
Zhaochen Gu, University of North Texas
Song Fu, University of North Texas
Qing Yang, University of North Texas
Mingxiong Liu, Los Alamos National Laboratory |
Mnemonic: A Parallel Subgraph Matching System for Streaming Graphs |
Bibek Bhattarai, George Washington University
Howie Huang, George Washington University |
Modeling Matrix Engines for Portability and Performance |
Nicholai Tukanov, Carnegie Mellon University
Tze Meng Low, Carnegie Mellon University
Jose Moreira, IBM
Rajalakshmi Srinivasaraghavan, IBM |
Multi-Phase Task-Based HPC Applications: Quickly Learning how to Run Fast |
Lucas Leandro Nesi, Institute of Informatics, Federal University of Rio Grande do Sul
Lucas Mello Schnorr, Institute of Informatics, Federal University of Rio Grande do Sul
Arnaud Legrand, University Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG |
Neon: A Multi-GPU Programming Model for Grid-based Computations |
Massimiliano Meneghin, Autodesk Research
Ahmed Mahmoud, Autodesk Research
Pradeep Kumar Jayaraman, Autodesk Research
Nigel J. W. Morris, Autodesk Research |
Next-Generation Local Time Stepping for the ADER-DG Finite Element Method |
Alexander Breuer, Friedrich Schiller University Jena
Alexander Heinecke, Intel |
OmpSs@cloudFPGA: An FPGA Task-Based Programming Model with Message Passing |
Juan Miguel de Haro, Barcelona Supercomputing Center
Rubén Cano, Barcelona Supercomputing Center
Carlos Álvarez, Universitat Politècnica de Catalunya
Daniel Jiménez-González, Universitat Politècnica de Catalunya
Xavier Martorell, Universitat Politècnica de Catalunya
Eduard Ayguadé, Barcelona Supercomputing Center
Jesús Labarta, Barcelona Supercomputing Center
Burkhard Ringlein, IBM Research Europe
Francois Abel, IBM Research Europe
Beat Weiss, IBM Research Europe |
On the Parallel Reconstruction from Pooled Data |
Oliver Gebhard, Goethe University, Frankfurt
Max Hahn-Klimroth, Goethe University, Frankfurt
Dominik Kaaser, University of Hamburg
Philipp Loick, Goethe University, Frankfurt |
Optimal Arbitrary Pattern Formation on a Grid by Asynchronous Autonomous Robots |
Rory Hector, Louisiana State University
Gokarna Sharma, Kent State University
Ramachandran Vaidyanathan, Louisiana State University
Jerry L. Trahan, Louisiana State University |
Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs |
Cody Rivera, University of Alabama
Sheng Di, Argonne National Laboratory
Xiaodong Yu, Argonne National Laboratory
Jiannan Tian, Washington State University
Dingwen Tao, Washington State University
Franck Cappello, Argonne National Laboratory |
Parallel Approximations of the Tukey g-and-h Likelihoods and Predictions for Non-Gaussian Geostatistics |
Sagnik Mondal, King Abdullah University of Science and Technology
Sameh Abdulah, King Abdullah University of Science and Technology
Marc Genton, King Abdullah University of Science and Technology
Ying Sun, King Abdullah University of Science and Technology
Hatem Ltaief, King Abdullah University of Science and Technology
David Keyes, King Abdullah University of Science and Technology |
Parallel Fully Dynamic Maintenance of 2-Connected Components |
Chirayu Haryan, Indian Institute of Technology Tirupati
Ramakrishna G, Indian Institute of Technology Tirupati
Kishore Kothapalli, International Institute of Information Technology Hyderabad
Dip Sankar Banerjee, Indian Institute of Technology Jodhpur |
Parallel Global Edge Switching for the Uniform Sampling of Simple Graphs with Prescribed Degrees |
Daniel Allendorf, Goethe University Frankfurt
Ulrich Meyer, Goethe University Frankfurt
Manuel Penschuck, Goethe University Frankfurt
Hung Tran, Goethe University Frankfurt |
Parallel Tensor Train Rounding using Gram SVD |
Hussam Al Daas, Rutherford Appleton Laboratory
Grey Ballard, Wake Forest University
Lawton Manning, Wake Forest University |
Parallel, Portable Algorithms for Distance-2 Maximal Independent Set and Graph Coarsening |
Brian Kelley, Sandia National Laboratories
Sivasankaran Rajamanickam, Sandia National Laboratories |
Parallelizing and Balancing Large-scale Particle Simulations based on Coupled DSMC/PIC |
Haozhong Qiu, College of Computer, National University of Defense Technology
Chuanfu Xu, College of Computer, National University of Defense Technology
Dali Li, College of Aerospace Science and Engineering, National University of Defense Technology
Haoyu Wang, College of Aerospace Science and Engineering, National University of Defense Technology
Jie Li, College of Aerospace Science and Engineering, National University of Defense Technology
Zheng Wang, University of Leeds |
ParaTreeT: A Fast, General Framework for Spatial Tree Traversal |
Joseph Hutter, University of Illinois at Urbana-Champaign
Justin Szaday, University of Illinois at Urbana-Champaign
Jaemin Choi, University of Illinois at Urbana-Champaign
Spencer Wallace, University of Washington
Simeng Liu, University of Illinois at Urbana-Champaign
Laxmikant Kale, University of Illinois at Urbana-Champaign
Thomas Quinn, University of Washington |
PARSEC: PARallel Subgraph Enumeration in CUDA |
Vibhor Dodeja, University of Illinois at Urbana-Champaign
Mohammad Almasri, University of Illinois at Urbana-Champaign
Rakesh Nagi, University of Illinois at Urbana-Champaign
Jinjun Xiong, IBM Thomas J. Watson Research Center
Wen-Mei Hwu, University of Illinois at Urbana-Champaign |
P-ckpt: Coordinated Prioritized Checkpointing |
Subhendu Behera, North Carolina State University
Lipeng Wan, Oak Ridge National Laboratory
Frank Mueller, North Carolina State University
Matthew Wolf, Oak Ridge National Laboratory
Scott Klasky, Oak Ridge National Laboratory |
pFedGF: Enabling Personalized Federated Learning via Gradient Fusion |
Xinghao Wu, State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science, Beihang University
Jianwei Niu, State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science, Beihang University
Xuefeng Liu, State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science, Beihang University
Tao Ren, Hangzhou Innovation Institute, Beihang University, Hangzhou 310051, China
Zhangmin Huang, Hangzhou Innovation Institute of Beihang University
Zhetao Li, Hunan International Scientific and Technological Cooperation Base of Intelligent Network, Xiangtan University, Xiangtan, Hunan 411105, China |
PINT: Parallel INTerval-Based Race Detector |
Yifan Xu, Washington University in St. Louis
Anchengcheng Zhou, Washington University in St. Louis
Kunal Agrawal, Washington University in St. Louis
I-Ting Angelina Lee, Washington University in St. Louis |
PokéMem: Taming Wild Memory Consumers in Apache Spark |
Minhyeok Kweun, Samsung Research
Goeun Kim, Samsung Research
Byungsoo Oh, Samsung Research
Seongho Jung, Samsung Research
Taegeon Um, Samsung Research
Woo-Yeon Lee, Samsung Research |
PowerSpector: Towards Energy Efficiency with Calling-Context-Aware Profiling |
Xin You, Beihang University
Hailong Yang, Beihang University
Zhibo Xuan, Beihang University
Zhongzhi Luan, Beihang University
Depei Qian, Beihang University |
Preprocessing Pipeline Optimization for Scientific Deep-Learning Workloads |
Khaled Ibrahim, Lawrence Berkeley National Laboratory
Leonid Oliker, Lawrence Berkeley National Laboratory |
QoS-awareness of Microservices with Excessive Loads via Inter-Datacenter Scheduling |
Jiuchen Shi, Shanghai Jiao Tong University
Jiawen Wang, Shanghai Jiao Tong University
Kaihua Fu, Shanghai Jiao Tong University
Quan Chen, Shanghai Jiao Tong University
Deze Zeng, China University of Geosciences
Minyi Guo, Shanghai Jiao Tong University |
Resource Utilization Aware Job Scheduling to Mitigate Performance Variability |
Daniel Nichols, University of Maryland, College Park
Aniruddha Marathe, Lawrence Livermore National Laboratory
Kathleen Shoga, Lawrence Livermore National Laboratory
Todd Gamblin, Lawrence Livermore National Laboratory
Abhinav Bhatele, University of Maryland, College Park |
RLRP: High-Efficient Data Placement with Reinforcement Learning for Modern Distributed Storage Systems |
Kai Lu, Huazhong University of Science and Technology
Nannan Zhao, Northwestern Polytechnical University
Jiguang Wan, Huazhong University of Science and Technology
Changhong Fei, Huazhong University of Science and Technology
Wei Zhao, SenseTime Research
Tongliang Deng, SenseTime Research |
SALoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs |
Seongyeon Park, CS, Yonsei University
Hajin Kim, CS, Yonsei University
Tanveer Ahmad, TU Delft
Nauman Ahmed, TU Delft
Zaid Al-Ars, TU Delft
Peter Hofstee, TU Delft
Youngsok Kim, CS/AI, Yonsei University
Jinho Lee, CS/AI, Yonsei University |
Scalable Low-Latency Inter-FPGA Networks |
Kien Trung Pham, Graduate University for Advanced Studies, SOKENDAI
Thao Nguyen Truong, National Institute of Advanced Industrial Science and Technology (AIST)
Hiroshi Yamaguchi, Photonics Electronics Technology Research Association (PETRA)
Yutaka Urino, Photonics Electronics Technology Research Association (PETRA)
Michihiro Koibuchi, National Institute of Informatics (NII) |
Scalable Multi-Versioning Metadata Dictionaries with Persistent Memory Support |
Bogdan Nicolae, Argonne National Laboratory (ANL) |
Scaling and Selecting GPU Methods for All Pairs Shortest Paths (APSP) Computations |
Yang Xia, Ohio State
Peng Jiang, University of Iowa
Rajiv Ramnath, Ohio State
Gagan Agrawal, Augusta University |
Scheduling on Uniform and Unrelated Machines with Bipartite Incompatibility Graphs |
Tytus Pikies, Gdańsk University of Technology
Hanna Furmańczyk, University of Gdańsk |
SecFortress: Securing Hypervisor using Cross-layer Isolation |
Qihang Zhou, Institute of Information Engineering, Chinese Academy of Sciences
Xiaoqi Jia, Institute of Information Engineering, Chinese Academy of Sciences
Shengzhi Zhang, Department of Computer Science, Metropolitan College, Boston University, USA
Nan Jiang, Institute of Information Engineering, Chinese Academy of Sciences
Jiayun Chen, Institute of Information Engineering, Chinese Academy of Sciences
Weijuan Zhang, Institute of Information Engineering, Chinese Academy of Sciences |
SFP: Service Function Chain Provision on Programmable Switches for Cloud Tenants |
Hongyi Huang, Tsinghua University
Wenfei Wu, Peking University
Zehua Guo, Beijing Institute of Technology
Yongchao He, Tsinghua University |
"Smarter" NICs for Faster Molecular Dynamics: a Case Study |
Sara Karamati, Georgia Institute of Technology
Jeffrey Young, Georgia Institute of Technology
Richard Vuduc, Georgia Institute of Technology |
Sparsity-Aware Tensor Factorization |
Sureyya Emre Kurt, University of Utah
Saurabh Raje, University of Utah
Aravind Sukumaran-Rajam, Washington State University
P. Sadayappan, University of Utah |
SpectralFly: Ramanujan Graphs as Flexible and Efficient Interconnection Networks |
Sinan Aksoy, Pacific Northwest National Lab
Stephen Young, Pacific Northwest National Lab
Jesun Firoz, Pacific Northwest National Laboratory
Roberto Gioiosa, Pacific Northwest National Lab
Mark Raugas, Pacific Northwest National Lab
Mark Kempton, Brigham Young University
Tobias Hagge, Pacific Northwest National Lab
Juan Andres Escobedo Contreras, Pacific Northwest National Lab |
SPIDER: An Effective, Efficient and Robust Load Scheduler for Real-time Split Frame Rendering |
Bingzheng Ma, Nankai University
Ziqiang Zhang, Nankai University
Yusen Li, Nankai University
Wentong Cai, Nanyang Technological University
Gang Wang, Nankai University
Xiaoguang Liu, Nankai University |
SSB-Tree: Making Persistent Memory B+-Trees Crash-Consistent and Concurrent by Lazy-Box |
Tongliang Li, Tsinghua University
Haixia Wang, Tsinghua University
Airan Shao, Tsinghua University
Dongsheng Wang, Tsinghua University |
StencilMART: Predicting Optimization Selection for Stencil Computations across GPUs |
Qingxiao Sun, Beihang University
Yi Liu, Beihang University
Hailong Yang, Beihang University
Zhonghui Jiang, Beihang University
Zhongzhi Luan, Beihang University
Depei Qian, Beihang University |
TagTree: Global Tagging Index with Efficient Querying for Time Series Databases |
Jin Xue, The Chinese University of Hong Kong
Zhiqi Wang, The Chinese University of Hong Kong
Tianyu Wang, The Chinese University of Hong Kong
Zili Shao, The Chinese University of Hong Kong |
Task-based Acceleration of Bidirectional Recurrent Neural Networks on Multi-core Architectures |
Robin Kumar Sharma, Barcelona Supercomputing Center
Marc Casas, Barcelona Supercomputing Center |
TEE-based decentralized recommender systems: The raw data sharing redemption |
Akash Dhasade, EPFL
Nevena Dresevic, EPFL
Anne-Marie Kermarrec, EPFL
Rafael Pires, EPFL |
The Fast and Scalable MPI Application Launch of the Tianhe HPC system |
Yiqin Dai, National University of Defense Technology
Yong Dong, National University of Defense Technology
Min Xie, National University of Defense Technology
Kai Lu, National University of Defense Technology
Ruibo Wang, National University of Defense Technology |
The Universal Gossip Fighter |
Anastasiia Gorbunova, École polytechnique fédérale de Lausanne
Anne-Marie Kermarrec, École polytechnique fédérale de Lausanne
Anastasiia Kucherenko, École polytechnique fédérale de Lausanne
Rafaël Pinot, École polytechnique fédérale de Lausanne
Rachid Guerraoui, École polytechnique fédérale de Lausanne |
Top-Down Performance Profiling on NVIDIA’s GPUs |
Alvaro Saiz, University of Cantabria
Pablo Prieto, University of Cantabria
Pablo Abad, University of Cantabria
Jose Angel Gregorio, University of Cantabria
Valentin Puente, University of Cantabria |
Topological Modeling and Parallelization of Multidimensional Data on Microelectrode Arrays |
Olamide Tawose, University of Nevada, Reno
Bin Li, University of Nevada, Reno
Lei Yang, University of Nevada, Reno
Feng Yan, University of Nevada, Reno
Dongfang Zhao, University of Nevada, Reno |
Towards Distributed 2-Approximation Steiner Minimal Trees in Billion-edge Graphs |
Tahsin Reza, Lawrence Livermore National Laboratory
Geoffrey Sanders, Lawrence Livermore National Laboratory
Roger Pearce, Lawrence Livermore National Laboratory |
Traffic-Optimal Virtual Network Function Placement and Migration in Dynamic Cloud Data Centers |
Vincent Tran, University of California, Riverside
Jingsong Sun, California State University Dominguez Hills
Bin Tang, California State University Dominguez Hills
Deng Pan, Florida International University |
Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators |
Raveesh Garg, Georgia Institute of Technology
Eric Qin, Georgia Institute of Technology
Francisco Muñoz-Martínez, Universidad de Murcia
Robert Guirado, Universitat Politecnica de Catalunya
Akshay Jain, Neutroon
Sergi Abadal, Universitat Politecnica de Catalunya
José Abellán, Universidad Católica de Murcia
Manuel Acacio, Universidad de Murcia
Eduard Alarcon, Universitat Politecnica de Catalunya
Sivasankaran Rajamanickam, Sandia National Laboratories
Tushar Krishna, Georgia Institute of Technology |
Unlocking Personalized Healthcare on Modern CPUs/GPUs: Three-way Gene Interaction Study |
Diogo Marques, INESC-ID
Rafael Campos, INESC-ID
Sergio Santander-Jiménez, Polytechnic School, University of Extremadura
Zakhar Matveev, Intel Corporation
Leonel Sousa, INESC-ID
Aleksandar Ilic, INESC-ID |
Why Globally Re-shuffle? Revisiting Data Shuffling in Large Scale Deep Learning |
Thao Nguyen Truong, AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory
François Trahay, Télécom SudParis
Jens Domke, RIKEN Center for Computational Science
Aleksandr Drozd, RIKEN Center for Computational Science
Emil Vatai, RIKEN Center for Computational Science
Jianwei Liao, College of Computer and Information Science, Southwest University of China
Mohamed Wahib, National Institute of Advanced Industrial Science and Technology
Balazs Gerofi, RIKEN Center for Computational Science |