Noam Shazeer

Cited by

	All	Since 2019
Citations	207550	201035
h-index	59	54
i10-index	99	83

Citations per year
Year	2017	2018	2019	2020	2021	2022	2023	2024
Citations	671	2358	6944	13613	23409	36593	56373	63709

Public access (based on funding mandates): 1 article available, 0 articles not available


Affiliation: Character.ai (verified email at character.ai)
Interests: Deep Learning


Title	Cited by	Year
Attention is all you need A Vaswani Advances in Neural Information Processing Systems, 2017	152952	2017
Exploring the limits of transfer learning with a unified text-to-text transformer C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ... Journal of Machine Learning Research 21 (140), 1-67, 2020	20029	2020
PaLM: Scaling language modeling with pathways A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... Journal of Machine Learning Research 24 (240), 1-113, 2023	5189	2023
Outrageously large neural networks: The sparsely-gated mixture-of-experts layer N Shazeer, A Mirhoseini, K Maziarz, A Davis, Q Le, G Hinton, J Dean arXiv preprint arXiv:1701.06538, 2017	2477	2017
Scheduled sampling for sequence prediction with recurrent neural networks S Bengio, O Vinyals, N Jaitly, N Shazeer Advances in Neural Information Processing Systems 28, 2015	2431	2015
Image transformer N Parmar, A Vaswani, J Uszkoreit, L Kaiser, N Shazeer, A Ku, D Tran International Conference on Machine Learning, 4055-4064, 2018	2089	2018
Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... Advances in Neural Information Processing Systems, 2017	2018	2017
Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity W Fedus, B Zoph, N Shazeer Journal of Machine Learning Research 23 (120), 1-39, 2022	1887	2022
LaMDA: Language models for dialog applications R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ... arXiv preprint arXiv:2201.08239, 2022	1568	2022
Exploring the limits of language modeling R Jozefowicz, O Vinyals, M Schuster, N Shazeer, Y Wu arXiv preprint arXiv:1602.02410, 2016	1456	2016
Generating Wikipedia by summarizing long sequences PJ Liu, M Saleh, E Pot, B Goodrich, R Sepassi, L Kaiser, N Shazeer arXiv preprint arXiv:1801.10198, 2018	1025	2018
GShard: Scaling giant models with conditional computation and automatic sharding D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ... arXiv preprint arXiv:2006.16668, 2020	1010	2020
Adafactor: Adaptive learning rates with sublinear memory cost N Shazeer, M Stern International Conference on Machine Learning, 4596-4604, 2018	989	2018
Music transformer CZA Huang, A Vaswani, J Uszkoreit, N Shazeer, I Simon, C Hawthorne, ... arXiv preprint arXiv:1809.04281, 2018	979	2018
Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, Ł Kaiser, I Polosukhin Advances in Neural Information Processing …, 2017	876	2017
How much knowledge can you pack into the parameters of a language model? A Roberts, C Raffel, N Shazeer arXiv preprint arXiv:2002.08910, 2020	873	2020
End-to-end text-dependent speaker verification G Heigold, I Moreno, S Bengio, N Shazeer 2016 IEEE International Conference on Acoustics, Speech and Signal …, 2016	806	2016
GLU variants improve transformer N Shazeer arXiv preprint arXiv:2002.05202, 2020	707	2020
Tensor2Tensor for neural machine translation A Vaswani, S Bengio, E Brevdo, F Chollet, AN Gomez, S Gouws, L Jones, ... arXiv preprint arXiv:1803.07416, 2018	654	2018
Mesh-TensorFlow: Deep learning for supercomputers N Shazeer, Y Cheng, N Parmar, D Tran, A Vaswani, P Koanantakool, ... Advances in Neural Information Processing Systems 31, 2018	426	2018
