4th MLSys 2021 [virtual]
- Alex Smola, Alex Dimakis, Ion Stoica:
Proceedings of the Fourth Conference on Machine Learning and Systems, MLSys 2021, virtual, April 5-9, 2021. mlsys.org 2021
- Isak Edo Vivancos, Sayeh Sharify, Daniel Ly-Ma, Ameer Abdelhadi, Ciaran Bannon, Milos Nikolic, Mostafa Mahmoud, Alberto Delmas Lascorz, Gennady Pekhimenko, Andreas Moshovos:
Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick.
- Xiaohu Tang, Shihao Han, Li Lyna Zhang, Ting Cao, Yunxin Liu:
To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks.
- Pratik Fegade, Tianqi Chen, Phillip B. Gibbons, Todd C. Mowry:
Cortex: A Compiler for Recursive Deep Learning Models.
- Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris S. Papailiopoulos:
Adaptive Gradient Communication via Critical Learning Regime Identification.
- Sameer Kumar, Yu Emma Wang, Cliff Young, James Bradbury, Naveen Kumar, Dehao Chen, Andy Swing:
Exploring the Limits of Concurrency in ML Training on Google TPUs.
- Lucas Liebenwein, Cenk Baykal, Brandon Carter, David Gifford, Daniela Rus:
Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy.
- Shantanu Mandal, Todd A. Anderson, Javier Turek, Justin Gottschlich, Shengtian Zhou, Abdullah Muzahid:
Learning Fitness Functions for Machine Programming.
- Shabnam Daghaghi, Nicholas Meisburger, Mengnan Zhao, Anshumali Shrivastava:
Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More.
- Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han:
IOS: Inter-Operator Scheduler for CNN Acceleration.
- Riyadh Baghdadi, Massinissa Merouani, Mohamed-Hicham Leghettas, Kamel Abdous, Taha Arbaoui, Karima Benatchba, Saman P. Amarasinghe:
A Deep Learning Based Cost Model for Automatic Code Optimization.
- Omid Aramoon, Pin-Yu Chen, Gang Qu:
Don't Forget to Sign the Gradients!
- Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang:
Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference.
- Hamzah Abdel-Aziz, Ali Shafiee, Jong Hoon Shin, Ardavan Pedram, Joseph Hassoun:
Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators.
- Brennan Saeta, Denys Shabalin:
Swift for TensorFlow: A portable, flexible platform for deep learning.
- Yichen Yang, Phitchaya Mangpo Phothilimthana, Yisu Remy Wang, Max Willsey, Sudip Roy, Jacques Pienaar:
Equality Saturation for Tensor Graph Superoptimization.
- Bowen Yang, Jian Zhang, Jonathan Li, Christopher Ré, Christopher R. Aberger, Christopher De Sa:
PipeMare: Asynchronous Pipeline Parallel DNN Training.
- Ahmed M. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini:
An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems.
- Benoit Steiner, Chris Cummins, Horace He, Hugh Leather:
Value Learning for Throughput Optimization of Deep Learning Workloads.
- Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum:
Scaling Distributed Training with Adaptive Summation.
- Giulio Zhou, Martin Maas:
Learning on Distributed Traces for Data Center Storage Systems.
- Hongyi Wang, Saurabh Agarwal, Dimitris S. Papailiopoulos:
Pufferfish: Communication-efficient Models At No Extra Cost.
- Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows:
A Learned Performance Model for Tensor Processing Units.
- Shaohuai Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, Xiaowen Chu:
Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters.
- Yunfeng Lin, Guilin Li, Xing Zhang, Weinan Zhang, Bo Chen, Ruiming Tang, Zhenguo Li, Jiashi Feng, Yong Yu:
ModularNAS: Towards Modularized and Reusable Neural Architecture Search.
- Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu:
FLAML: A Fast and Lightweight AutoML Library.
- Chunxing Yin, Bilge Acun, Carole-Jean Wu, Xing Liu:
TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models.
- Yue Zhao, Xiyang Hu, Cheng Cheng, Cong Wang, Changlin Wan, Wen Wang, Jianing Yang, Haoping Bai, Zheng Li, Cao Xiao, Yunlong Wang, Zhi Qiao, Jimeng Sun, Leman Akoglu:
SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection.
- Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Köster:
Pipelined Backpropagation at Scale: Training Large Models without Batches.
- Peifeng Yu, Jiachen Liu, Mosharaf Chowdhury:
Fluid: Resource-aware Hyperparameter Tuning Engine.
- Colby R. Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas Navarro, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, Paul N. Whatmough:
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers.
- Urmish Thakker, Paul N. Whatmough, Zhi Gang Liu, Matthew Mattina, Jesse G. Beu:
Doping: A technique for Extreme Compression of LSTM Models using Sparse Structured Additive Matrices.
- Guixiang Ma, Yao Xiao, Theodore L. Willke, Nesreen K. Ahmed, Shahin Nazarian, Paul Bogdan:
A Distributed Graph-Theoretic Framework for Automatic Parallelization in Multi-core Systems.
- David Stutz, Nandhini Chandramoorthy, Matthias Hein, Bernt Schiele:
Bit Error Robustness for Energy-Efficient DNN Accelerators.
- Shang Wang, Peiming Yang, Yuxuan Zheng, Xin Li, Gennady Pekhimenko:
Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models.
- Eyal Cidon, Evgenya Pergament, Zain Asgar, Asaf Cidon, Sachin Katti:
Characterizing and Taming Model Instability Across Edge Devices.
- Kiwan Maeng, Shivam Bharuka, Isabel Gao, Mark C. Jeffrey, Vikram Saraph, Bor-Yiing Su, Caroline Trippel, Jiyan Yang, Mike Rabbat, Brandon Lucia, Carole-Jean Wu:
Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery.
- Bharathan Balaji, Christopher Kakovitch, Balakrishnan Narayanaswamy:
FirePlace: Placing Firecracker Virtual Machines with Hindsight Imitation.
- Guanhua Wang, Zhuang Liu, Brandon Hsieh, Siyuan Zhuang, Joseph Gonzalez, Trevor Darrell, Ion Stoica:
sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data.
- Tom Bannink, Adam Hillier, Lukas Geiger, Tim de Bruin, Leon Overweel, Jelmer Neeven, Koen Helwegen:
Larq Compute Engine: Design, Benchmark and Deploy State-of-the-Art Binarized Neural Networks.
- Guanhua Wang, Kehan Wang, Kenan Jiang, Xiangjun Li, Ion Stoica:
Wavelet: Efficient DNN Training with Tick-Tock Scheduling.
- Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler:
Data Movement Is All You Need: A Case Study on Optimizing Transformers.
- Christoph Müller, François Serre, Gagandeep Singh, Markus Püschel, Martin T. Vechev:
Scaling Polyhedral Neural Network Verification on GPUs.
- Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Nazanin Mohammadi Sepahvand, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent:
Accounting for Variance in Machine Learning Benchmarks.
- Nathalie Rauschmayr, Vikas Kumar, Rahul Huilgol, Andrea Olgiati, Satadal Bhattacharjee, Nihal Harish, Vandana Kannan, Amol Lele, Anirudh Acharya, Jared Nielsen, Lakshmi Ramakrishnan, Ishan Bhatt, Kohen Chia, Neelesh Dodda, Zhihan Li, Jiacheng Gu, Miyoung Choi, Balajee Nagarajan, Jeffrey Geevarghese, Denis Davydenko, Sifei Li, Lu Huang, Edward Kim, Tyler Hill, Krishnaram Kenthapadi:
Amazon SageMaker Debugger: A System for Real-Time Insights into Machine Learning Model Training.
- James Gleeson, Moshe Gabel, Gennady Pekhimenko, Eyal de Lara, Srivatsan Krishnan, Vijay Janapa Reddi:
RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads.
- Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Tiezhen Wang, Pete Warden, Rocky Rhodes:
TensorFlow Lite Micro: Embedded Machine Learning for TinyML Systems.
- Konstantinos Konstantinidis, Aditya Ramamoorthy:
ByzShield: An Efficient and Robust System for Distributed Training.
- Nadeen Gebara, Manya Ghobadi, Paolo Costa:
In-network Aggregation for Shared Machine Learning Clusters.
- Wenqi Jiang, Zhenhao He, Shuai Zhang, Thomas B. Preußer, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, Gustavo Alonso:
MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions.
- Toshiaki Wakatsuki, Sekitoshi Kanai, Yasuhiro Fujiwara:
Accelerate Inference of CNNs for Video Analysis While Preserving Exactness Exploiting Activation Sparsity.
- Steve Dai, Rangharajan Venkatesan, Mark Ren, Brian Zimmer, William J. Dally, Brucek Khailany:
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference.
- Ettore M. G. Trainiti, Thanapon Noraset, David Demeter, Doug Downey, Simone Campanoni:
CODE: Compiler-based Neuron-aware Ensemble training.