About
I'm interested in making massive neural networks more efficient, and getting them…
Activity
-
I am delighted to announce that I have joined Cohere - the frontier AI model company - as the founding Solutions Architect in EMEA. Our mission at…
Liked by Aidan Gomez
-
Cohere's head of public policy, Melika Carroll, attended this morning’s launch of CAISI in support of Canada’s continued leadership in safe and…
Liked by Aidan Gomez
-
I couldn’t be happier to be joining Cohere full-time as a Member of Technical Staff. Special thanks to Carlos Lassance, Nils Reimers & Tom Duncan…
Liked by Aidan Gomez
Experience
Education
Volunteer Experience
-
Volunteer
Good Shepherd Ministries
- 9 months
Poverty Alleviation
The Good Shepherd is a homeless shelter in Toronto where hundreds line up every day to receive breakfast, lunch, and dinner. I had the honour of serving these individuals breakfast and having conversations with my fellow Torontonians. The volunteers and staff I worked alongside are stunning examples of human empathy; I'm endlessly grateful for what The Good Shepherd has given me.
-
Journey of Hope
Kawartha Pine Ridge District School Board
- 6 months
Children
The Journey of Hope is a humanitarian KPRDSB initiative that sends students from three schools to Tanzania. Our roles there ranged from restoring educational infrastructure to teaching students computer skills. In addition, we donated over 1,200 pounds of supplies to institutions across the country.
-
Volunteer
The Companions of the Order of Malta, Oxford
- Present · 6 years 1 month
Poverty Alleviation
I've been incredibly fortunate to spend time having conversations with, and serving food and drink to, fellow Oxonians.
Publications
-
The Reversible Residual Network: Backpropagation Without Storing Activations
Deep residual networks (ResNets) have significantly pushed forward the state-of-the-art on image classification, increasing in performance as networks grow both deeper and wider. However, memory consumption becomes a bottleneck, as one needs to store the activations in order to calculate gradients using backpropagation. We present the Reversible Residual Network (RevNet), a variant of ResNets where each layer's activations can be reconstructed exactly from the next layer's. Therefore, the activations for most layers need not be stored in memory during backpropagation. We demonstrate the effectiveness of RevNets on CIFAR-10, CIFAR-100, and ImageNet, establishing nearly identical classification accuracy to equally-sized ResNets, even though the activation storage requirements are independent of depth.
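To make the reconstruction property concrete, here is a minimal numpy sketch of one reversible block; the residual functions F and G below are arbitrary stand-ins for the paper's residual subnetworks, not its actual architecture.

    import numpy as np

    def F(x):  # stand-in residual function (the paper uses small residual subnetworks)
        return np.tanh(x)

    def G(x):  # second stand-in residual function
        return np.tanh(2 * x)

    def rev_block_forward(x1, x2):
        # Each reversible unit computes y1 = x1 + F(x2), y2 = x2 + G(y1).
        y1 = x1 + F(x2)
        y2 = x2 + G(y1)
        return y1, y2

    def rev_block_inverse(y1, y2):
        # The inputs can be recovered exactly from the outputs,
        # so activations need not be stored for backpropagation.
        x2 = y2 - G(y1)
        x1 = y1 - F(x2)
        return x1, x2

    x1, x2 = np.random.randn(4), np.random.randn(4)
    y1, y2 = rev_block_forward(x1, x2)
    r1, r2 = rev_block_inverse(y1, y2)
    assert np.allclose(x1, r1) and np.allclose(x2, r2)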
-
One Model To Learn Them All
arXiv
Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit largely from joint training with other tasks, while performance on large tasks degrades only slightly if at all.
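As a rough illustration of the "sparsely-gated layers" mentioned above, here is a toy top-k gated mixture in numpy; the expert count, the value of k, and the linear experts are illustrative choices only, not the paper's configuration.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_experts, k = 8, 4, 2

    # Toy linear experts and a gating matrix (illustrative sizes only).
    experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
    W_gate = rng.standard_normal((d, n_experts))

    def sparse_moe(x):
        logits = x @ W_gate                            # one gating score per expert
        top_k = np.argsort(logits)[-k:]                # route to the k highest-scoring experts
        weights = np.exp(logits[top_k] - logits[top_k].max())
        weights /= weights.sum()                       # softmax over the selected experts only
        return sum(w * (x @ experts[i]) for w, i in zip(weights, top_k))

    y = sparse_moe(rng.standard_normal(d))             # only k of the n_experts are evaluated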
-
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
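The core operation of the Transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V. A minimal numpy sketch of that single formula follows; batching, masking, and multi-head details are omitted.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V

    Q = np.random.randn(5, 64)   # 5 query positions, d_k = 64
    K = np.random.randn(7, 64)   # 7 key positions
    V = np.random.randn(7, 64)
    out = scaled_dot_product_attention(Q, K, V)   # shape (5, 64)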
-
Depthwise Separable Convolutions for Neural Machine Translation
Depthwise separable convolutions reduce the number of parameters and computation used in convolutional operations while increasing representational efficiency. They have been shown to be successful in image classification models, both in obtaining better models than previously possible for a given parameter count (the Xception architecture) and considerably reducing the number of parameters required to perform at a given level (the MobileNets family of architectures). Recently, convolutional sequence-to-sequence networks have been applied to machine translation tasks with good results. In this work, we study how depthwise separable convolutions can be applied to neural machine translation. We introduce a new architecture inspired by Xception and ByteNet, called SliceNet, which enables a significant reduction of the parameter count and amount of computation needed to obtain results like ByteNet, and, with a similar parameter count, achieves new state-of-the-art results. In addition to showing that depthwise separable convolutions perform well for machine translation, we investigate the architectural changes that they enable: we observe that thanks to depthwise separability, we can increase the length of convolution windows, removing the need for filter dilation. We also introduce a new "super-separable" convolution operation that further reduces the number of parameters and computational cost for obtaining state-of-the-art results.
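For reference, here is a minimal numpy sketch of the depthwise separable factorization itself: a per-channel spatial convolution followed by a 1x1 pointwise convolution that mixes channels. It is written in 1-D for brevity and is not SliceNet's actual layer.

    import numpy as np

    def depthwise_separable_conv1d(x, depthwise, pointwise):
        # x: (length, channels); depthwise: (k, channels); pointwise: (channels, out_channels)
        length, channels = x.shape
        k = depthwise.shape[0]
        # Depthwise step: each channel is filtered by its own k-tap kernel.
        out = np.zeros((length - k + 1, channels))
        for c in range(channels):
            out[:, c] = np.convolve(x[:, c], depthwise[::-1, c], mode="valid")
        # Pointwise step: a 1x1 convolution mixes channels at each position.
        return out @ pointwise

    x = np.random.randn(20, 8)
    dw = np.random.randn(3, 8)     # one 3-tap filter per input channel
    pw = np.random.randn(8, 16)    # mixes 8 channels into 16
    y = depthwise_separable_conv1d(x, dw, pw)   # shape (18, 16)

The parameter saving is the point of the factorization: a full convolution here would need 3 * 8 * 16 = 384 weights, while the separable version needs 3 * 8 + 8 * 16 = 152.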
-
Blog: The Neural Turing Machine
A brief outline of the Neural Turing Machine's (NTM) design: a backpropagatable architecture that can (among many possibilities) learn to dynamically execute programs.
-
Blog: Backpropagating an LSTM: A Numerical Example
Medium
LSTMs are arguably the most widely used recurrent neural network architecture. This article walks through the mathematics behind these versatile units.
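For reference, the forward step that such a walkthrough covers, as a single numpy function; this is the standard LSTM cell formulation with arbitrary weights, not necessarily the exact variant the article uses.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, b):
        # One weight matrix produces all four gate pre-activations at once.
        z = W @ np.concatenate([x, h_prev]) + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
        g = np.tanh(g)                                  # candidate cell state
        c = f * c_prev + i * g                          # new cell state
        h = o * np.tanh(c)                              # new hidden state
        return h, c

    d_in, d_h = 3, 4
    W = np.random.randn(4 * d_h, d_in + d_h)
    b = np.zeros(4 * d_h)
    h, c = lstm_step(np.random.randn(d_in), np.zeros(d_h), np.zeros(d_h), W, b)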
-
Blog: Facebook on the creation of Machine Intelligence
Medium
An exploration of the technologies and philosophy being used to craft the first generation of artificial intelligence.
Honors & Awards
-
AI Grant Fellow
AI Grant
aigrant.org - A fellowship sponsored by Google, CRV, and others; started by Nat Friedman (Xamarin) and Daniel Gross (Y Combinator).
-
Clarendon Scholar
-
clarendon.ox.ac.uk - Billed as Oxford’s most competitive graduate scholarship, the Clarendon Scholarship is awarded exclusively on the basis of academic performance and contribution.
-
Open Philanthropy AI Fellow
Open Philanthropy
-
University College Alumni Scholar
-
Languages
-
English
Native or bilingual proficiency
More activity by Aidan
-
Command is evolving to support agentic RAG for more precise and efficient retrieval. We’ve just released six tutorials with notebooks for working…
Liked by Aidan Gomez
-
Get your copy of LLM-book.com and stay tuned for something spectacular Andrew Ng, Maarten Grootendorst, and I are cooking! Amazon:…
Liked by Aidan Gomez
-
Get ready for Microsoft Ignite! Visit us at Booth #404-R to engage with our team of industry experts. Don't miss out on the theater sessions, where…
Liked by Aidan Gomez
-
Thanks VentureBeat, for covering my blog about Cohere's new multimodal Embed 3 model! Unlike other models that separate text and image data into…
Liked by Aidan Gomez
-
Stressed by open enrollment? Say hello to Oracle’s Benefits Analyst AI Agent! 🎉 We’re entering a new era of virtual, personalized help—an AI agent…
Liked by Aidan Gomez
-
We're thrilled that Aya Expanse is trending as one of the Hugging Face Spaces of the week! 🔥 Check out the Aya Expanse Hugging Face Space to chat…
Liked by Aidan Gomez
-
AI specialists and researchers in the Kingdom: this is a rare career opportunity to work on language models at Cohere, one of the handful of companies that train…
Liked by Aidan Gomez
-
Let’s be honest: predicting AI risks decades out can feel like a guessing game. That’s why the conversation between Nick Frosst from Cohere and…
Liked by Aidan Gomez
-
We open-sourced a fine-tuning repo and container for easily customizing all our models including Cohere's Command and Cohere For AI's Aya Expanse…
Liked by Aidan Gomez
-
Hugging Face welcomes Cohere's all new Aya Expanse family of multilingual models. Link to the official blog post and a small "get hands on" blog…
Liked by Aidan Gomez