Transformers, the tech behind LLMs | Deep Learning Chapter 5
The initials GPT stand for Generative Pre-trained Transformer. The first word is straightforward enough: these are bots that generate new text. "Pre-trained" refers to how the model went through a process of learning from a massive amount of data, and the prefix insinuates that there's more room to fine-tune it on specific tasks with additional training. But that last word is the real key piece. A Transformer is a specific kind of neural network, a machine learning model, and it's the core invention underlying the current boom in AI. What I want to do with this video and the following chapters is go through a visually driven explanation of what actually happens inside a Transformer. We're going to follow the data that flows through it and go step by step.

There are many different kinds of models that you can build using Transformers. Some models take in audio and produce a transcript; this sentence comes from a model going the other way around, producing synthetic speech just from text. All those tools that took the world by storm in 2022, like DALL-E and Midjourney, which take in a text description and produce an image, are based on Transformers. And even if I can't quite get it to understand what a pi creature is supposed to be, I'm still blown away that this kind of thing is even remotely possible.

The original Transformer, introduced in 2017 by Google, was invented for the specific use case of translating text from one language into another. But the variant that you and I will focus on, which is the type that underlies tools like ChatGPT, will be a model that's trained to take in a piece of text, maybe even with some surrounding images or sound accompanying it, and produce a prediction for what comes next in the passage. That prediction takes the form of a probability distribution over many different chunks of text that might follow.

At first glance, you might think that predicting the next word feels like a very different goal from generating new text. But once you have a prediction model like this, a simple thing you could try to make it generate a longer piece of text is to give it an initial snippet to work with, have it take a random sample from the distribution it just generated, append that sample to the ...
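The sampling loop just described can be sketched in a few lines of code. This is a minimal illustration, not a real Transformer: the `predict_next` function here is a made-up toy that returns hard-coded probabilities, standing in for the model's actual output distribution. All names and values below are invented for the example; only the loop structure (predict a distribution, sample from it, append, repeat) reflects the process described above.

```python
import random

# Toy stand-in for a trained model: given the tokens so far, return a
# probability distribution over candidate next tokens. A real Transformer
# would compute this with attention layers; this bigram table is purely
# illustrative and entirely made up.
BIGRAMS = {
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}

def predict_next(tokens):
    """Return a dict mapping candidate next tokens to probabilities."""
    return BIGRAMS.get(tokens[-1], {"<end>": 1.0})

def generate(prompt, max_new_tokens=10, seed=0):
    """Autoregressive generation: sample a next token from the model's
    distribution, append it, and feed the longer sequence back in."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        dist = predict_next(tokens)
        choices, weights = zip(*dist.items())
        next_token = rng.choices(choices, weights=weights, k=1)[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["the"]))
```

Because each step samples randomly rather than always taking the most likely token, running the loop twice with different seeds can produce different continuations of the same prompt, which is exactly why tools like ChatGPT can give different answers to the same question.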
Watch the full video by Grant Sanderson on YouTube.