Performance

New submissions

New submissions for Fri, 19 Apr 24

[1]  arXiv:2404.11788 [pdf, other]
Title: NonGEMM Bench: Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG); Performance (cs.PF)

Machine Learning (ML) operators are the building blocks used to design ML models for various target applications. GEneral Matrix Multiplication (GEMM) operators are the backbone of ML models. They are notorious for being computationally expensive, requiring billions of multiply-and-accumulate operations. Therefore, significant effort has gone into studying and optimizing GEMM operators in order to speed up the execution of ML models. GPUs and accelerators are widely deployed to accelerate ML workloads by optimizing the execution of GEMM operators. Nonetheless, the performance of NonGEMM operators has not been studied as thoroughly as that of GEMMs. Therefore, this paper describes NonGEMM Bench, a benchmark to study NonGEMM operators. We first construct NonGEMM Bench using popular ML workloads from different domains, then perform case studies on GPU platforms of various grades to analyze the behavior of NonGEMM operators in GPU-accelerated systems. Finally, we present some key takeaways to bridge the gap between GEMM and NonGEMM operators and to offer the community potential new optimization directions.

Replacements for Fri, 19 Apr 24

[2]  arXiv:2404.09471 (replaced) [pdf, other]
Title: LightningSimV2: Faster and Scalable Simulation for High-Level Synthesis via Graph Compilation and Optimization
Comments: 11 pages, 6 figures. Accepted at FCCM 2024
Subjects: Performance (cs.PF); Hardware Architecture (cs.AR)
[3]  arXiv:2302.06836 (replaced) [pdf, other]
Title: COMET: Neural Cost Model Explanation Framework
Comments: Proceedings of the 5th MLSys Conference, Santa Clara, CA, USA, 2024
Subjects: Performance (cs.PF); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
[4]  arXiv:2308.06409 (replaced) [pdf, other]
Title: Through the Lens of Google CrUX: Dissecting Web Browsing Experience Across Devices and Countries
Comments: 9 pages, 6 figures and 1 table. Accepted for publication at 2024 IFIP Networking Conference
Subjects: Networking and Internet Architecture (cs.NI); Performance (cs.PF)
[5]  arXiv:2312.04876 (replaced) [pdf, other]
Title: GVE-Louvain: Fast Louvain Algorithm for Community Detection in Shared Memory Setting
Authors: Subhajit Sahu
Comments: 11 pages, 8 figures, 2 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)