13 Things I Learned About LLMs

January 29th, 2024


With the massive impact of LLMs, I was curious about how they actually work. In my research, I stumbled upon Andrej Karpathy's talk, Intro To Large Language Models (LLMs). What follows is the series of notes I took.

You can watch the video here: Andrej Karpathy's Talk

I also recommend checking out the slides.

  • What a neural network does, at heart, is predict the next word. That is why generation takes time: it goes word by word (see the generation sketch after this list).
  • An LLM is defined by its parameters. These parameters are effectively a lossy compression of a large chunk of the Internet, and producing them takes vast amounts of GPU power and time.
  • A model like Llama is basically two files: a huge one with the parameters, and a small one with the code that runs them.
  • The network hallucinates based on what it has been trained on. Some of what it returns may be memorized data, and some may be completely made up. There is no built-in way to tell the difference; the only real check is to know the subject ourselves and verify the information.
  • In the end, we do not understand how these models work on the inside. It is all empirical.
  • There are two stages when making a model (see the data-format sketch after this list):
    1. Pre-training: the model is trained on vast amounts of internet text. Quantity matters here. The result is called the base model.
    2. Fine-tuning: a qualitative stage where the base model is trained on a smaller set of carefully written conversations. This turns it into an assistant-style model.
  • These stages are repeated in cycles: pre-training is expensive and happens rarely, while fine-tuning can be iterated often, so the models keep improving.
  • Recently a third stage was introduced: Reinforcement Learning From Human Feedback (RLHF), where humans compare candidate answers instead of writing them. This stage is used to get even better results (see the reward-model sketch after this list).
  • Proprietary models currently work better than open ones, but we do not have access to their weights. Here you can see a ranking of the most popular models; notice how the top ones all have a proprietary license.
  • LLM performance scales with the number of parameters and the amount of training text, and the trend is smooth and predictable; so far there are no signs of it topping out. With more (and more powerful) GPUs we can simply train better models, which is one reason there is a gold rush to acquire GPUs (see the scaling sketch after this list).
  • LLMs can use tools, much like humans do. For example, when you ask ChatGPT to plot a graph, it can write code, run it in a code interpreter, and present the result to you. These tools are constantly evolving (see the tool-use sketch after this list).
  • So far, LLMs only have System 1 thinking: when we ask a question, the LLM starts generating instantly and provides one response. The hope is that these models will eventually have System 2 thinking, where they take time to think and can produce a more precise answer. There is a lot of research going into this.
  • There are many ways to break LLMs in terms of security, notably jailbreaks and prompt injection (for example, hidden instructions embedded in a web page or image the model is asked to process).
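
To make the "word by word" point concrete, here is a minimal sketch of that generation loop. It uses the Hugging Face transformers library with GPT-2 as a small stand-in model; the greedy decoding and the prompt are my own assumptions for illustration, not something from the talk.

```python
# Minimal sketch: an LLM generates text one token (roughly, a word piece) at a time.
# Assumes `pip install torch transformers`; GPT-2 is a small stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(10):                        # generate 10 tokens, one per step
        logits = model(input_ids).logits       # scores for every token in the vocabulary
        next_id = logits[0, -1].argmax()       # greedily pick the most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```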

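The two training stages differ mostly in their data. Below is a rough, illustrative sketch of that difference; the formats are simplified assumptions of mine, not any lab's actual schema.

```python
# Stage 1 (pre-training): huge amounts of raw internet text; quantity matters.
pretraining_example = (
    "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars "
    "in Paris, France. It is named after the engineer Gustave Eiffel..."
)

# Stage 2 (fine-tuning): far fewer, but carefully written, conversations;
# quality matters. This is what turns a base model into an assistant.
finetuning_example = [
    {"role": "user", "content": "Can you explain what a transformer is?"},
    {"role": "assistant", "content": "A transformer is a neural network architecture..."},
]
```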
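
For RLHF, humans compare two candidate answers instead of writing one from scratch, and a reward model is trained on those comparisons. Here is a minimal sketch of the commonly used pairwise (Bradley-Terry-style) loss; this is a generic formulation, not code from the talk.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # The reward model should score the human-preferred answer higher than the
    # rejected one; this loss pushes that gap to widen.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scalar rewards the model assigned to two answers for the same prompt.
loss = reward_model_loss(torch.tensor([1.2]), torch.tensor([0.3]))
```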
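
The scaling claim is usually expressed as a smooth power-law relationship between loss, parameter count, and training tokens. The sketch below follows a Chinchilla-style functional form, but the constants are made-up placeholders purely for illustration.

```python
def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.7, A: float = 400.0, B: float = 400.0,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    # Loss falls smoothly and predictably as a power law in both model size (N)
    # and training data (D). The constants here are illustrative, not fitted values.
    return E + A / n_params**alpha + B / n_tokens**beta

# More parameters and more tokens -> lower predicted loss (a "better" model).
print(predicted_loss(7e9, 1e12), predicted_loss(70e9, 2e12))
```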
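
Tool use can be pictured as a loop: the model asks for a tool, the surrounding code runs it, and the result is fed back into the context. Everything below is hypothetical; the `llm` stub and the JSON tool-request format are stand-ins, not a real API.

```python
import json

def calculator(expression: str) -> str:
    # A deliberately tiny "tool" the model can call.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def llm(prompt: str) -> str:
    # Stand-in for a real model call. Here it "decides" to use the calculator
    # first, then answers directly once the tool result is in its context.
    if "Tool result:" in prompt:
        return "The answer is " + prompt.split("Tool result:")[-1].strip()
    return json.dumps({"tool": "calculator", "input": "6 * 7"})

def answer(question: str) -> str:
    reply = llm(question)
    try:
        request = json.loads(reply)            # did the model ask for a tool?
    except json.JSONDecodeError:
        return reply                           # no: it answered directly
    result = TOOLS[request["tool"]](request["input"])
    # Feed the tool output back so the model can finish the answer.
    return llm(f"{question}\nTool result: {result}")

print(answer("What is 6 times 7?"))
```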