WTF are LLMs?
LLMs are everywhere. From OpenAI to Meta, every major company is racing to build them. I mean, Microsoft and Meta collaborated to release LLaMA 2. When was the last time you remember giant companies like these coming together?
Some experts believe LLMs are our path to AGI and that they will transform the world just like the internet did. Others say LLMs can’t scale much further and will eventually hit a wall.
What do I think? Hmm, I don’t care, and neither should you. I know that LLMs will be here for at least another decade, and that’s all that matters for now.
Now why is it important for you to understand how an LLM works? Because at some point in the future, you will be forced to. Companies of every size, in every industry, will start building their own LLMs, personalized for their customers and carrying pre-context about them. Understanding how these models work might help you climb the ladder. Hey, who knows? I know many friends who were hired mostly because they talked smartly in interviews.
With that clear, let’s get started.
Note: To all LLM and AI experts, I took some liberty in condensing concepts into simpler terms so everyone can understand them, and in the process may have undersold the importance of a particular technique. My apologies in advance.
What are Large Language Models? Simple. They are language models that are huge in size, with billions of parameters spread across dozens of layers. Language models are like any other machine learning model, except they are used to generate natural language.
They take text input and perform a variety of natural language processing (NLP) tasks like classification, sentiment analysis, named entity recognition, and dialogue generation.
What sets these newer language models apart from the older ones is that they are all built from transformer layers, or transformers.
Until 2017, Recurrent Neural Networks (RNNs) were SoTA for NLP tasks. But they had many issues: they processed input one token at a time, which made training slow, and they struggled to carry information across long sequences, which made them quite inflexible.
Transformers were introduced in 2017 and have changed the NLP world ever since. These models learn from text faster, better, and more efficiently. The key concept that differentiates them from other NLP models is that they can track relationships in sequential data, and they do this with the help of the attention mechanism.
The song “Attention” by Charlie Puth was released in 2017. Though the attention mechanism was introduced in 2015, its true potential was exposed in 2017 when researchers at Google decided to build transformers using only attention modules. Coincidence? Maybe. Or we just live in a simulation.
Attention mechanisms allow the model to highlight the most relevant parts of the input and use those for further processing. Each token in a sequence is given a weight that reflects its relevance to the output task.
Let’s get a bit technical. There are three components to the attention mechanism: query, key, and value.
Query: a representation of the expected output. It may be a word or a sentence.
Key: representations of the words in the input, used to identify which parts are relevant to the query.
Value: the semantic content of the words behind the keys. The model uses the values to extract the information it needs to focus on.
Steps in the attention mechanism:
1. The query is compared against every key (typically via a dot product) to produce a relevance score for each token.
2. The scores are passed through a softmax, which normalizes them into weights that sum to one.
3. The weights are used to take a weighted sum of the values, producing the output.
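These steps can be sketched in a few lines of NumPy. This is the standard scaled dot-product attention from the transformer literature, not any particular model’s code, and the toy shapes (one query over three tokens, embedding size 4) are made up purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Step 1: score the query against every key (dot product),
    # scaled by sqrt(d_k) to keep the scores in a stable range
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Step 2: softmax turns scores into weights that sum to 1
    weights = softmax(scores)
    # Step 3: the output is a weighted sum of the values
    return weights @ V, weights

# Toy example: 1 query over 3 input tokens, embedding size 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = attention(Q, K, V)
```

Notice that the weights form a probability distribution over the input tokens: that is what “highlighting the most relevant part of the input” means in practice.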
If this all doesn’t make sense, please watch this YouTube video by my friend Michael Phi. It has great infographics and Michael's explanation is on point.
When researchers discovered the attention mechanism in 2015, they added it to the encoder-decoder architecture of RNNs, which until then had to squeeze the whole input into a single fixed-length vector. This improved results to some extent, as expected.
However, the transformer was born in 2017, when a group of researchers at Google decided to try only the attention mechanism, without any RNN modules. The resulting paper, “Attention Is All You Need”, is one of the most influential papers in machine learning.
OpenAI’s GPT-2 used these transformer modules to build a language model (LM), and they saw that these had a high potential to generalize from human data. For their next iteration, GPT-3, they introduced LLMs as few-shot learners. Few-shot learning means the model can perform a task it is acquainted with after seeing just a few examples, without retraining: train the model on diverse data and it can generalize from only a few samples per category. The model was the talk of the town, with 175B parameters, trained on a humongous amount of “public” human data. The “foundation” model for NLP was born.
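To make few-shot learning concrete, here is what a few-shot prompt looks like. The reviews and labels below are invented for illustration; the idea is that the model completes the last line by pattern-matching the examples above it, with no retraining involved.

```python
# A few-shot prompt: a handful of labeled examples sit inside the
# prompt itself, and the model infers the task for the final line.
prompt = """Classify the sentiment of each review.

Review: The battery died in two days. Sentiment: negative
Review: Best purchase I've made all year. Sentiment: positive
Review: It arrived on time and works fine. Sentiment: positive
Review: The screen cracked on day one. Sentiment:"""
```

A well-trained LLM given this prompt would continue with “negative”, having never been explicitly trained on this classification task.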
But the model still sounded robotic. It didn’t feel human and couldn’t generate text the way humans speak in a dialogue. The model had the potential, as it was trained on human data, but it couldn’t produce text that made complete sense. It didn’t know how to talk like humans, even though it could understand our language. This led to a new training technique called Reinforcement Learning from Human Feedback (RLHF).
RLHF is a unique training approach where the model learns from human feedback, applied to conversation data. The training data contains conversations between humans: the model is given an input prompt and asked to generate a response that sounds human, and its output is scored on metrics like honesty, helpfulness, etc. I can only speculate on the exact metrics, as they are not generally released to the public.
The feedback from humans is used as the signal in a reward function, which in turn updates the model’s policy. This process repeats for thousands of trials across different topics. RLHF was first used on a smaller LM named InstructGPT, which is based on GPT-3.
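The core of this loop can be sketched as a tiny reward model trained on human preference pairs. To be clear, this is not OpenAI’s implementation: real reward models are full neural networks scoring actual text, and the “labeler” here is a synthetic stand-in. The sketch only illustrates the pairwise-preference training signal.

```python
import numpy as np

# Toy reward model: a linear scorer over response "feature vectors".
rng = np.random.default_rng(1)
dim = 5
w = np.zeros(dim)        # reward-model parameters
lr = 0.1

def reward(features):
    return features @ w  # scalar score: "how human/helpful is this response?"

# Synthetic labels: a hidden "human taste" direction decides which of two
# candidate responses the labeler prefers (stand-in for real human judgments).
true_w = rng.normal(size=dim)
pairs = []
for _ in range(200):
    a, b = rng.normal(size=dim), rng.normal(size=dim)
    pairs.append((a, b) if a @ true_w >= b @ true_w else (b, a))

# Pairwise training: push reward(preferred) above reward(rejected)
for _ in range(5):  # a few passes over the preference data
    for preferred, rejected in pairs:
        margin = reward(preferred) - reward(rejected)
        p = 1.0 / (1.0 + np.exp(-margin))             # P(labeler prefers "preferred")
        w += lr * (1.0 - p) * (preferred - rejected)  # gradient step on -log p
```

Once trained, the reward model supplies the score that the reinforcement-learning step uses to update the language model itself.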
When OpenAI released ChatGPT, a chat service that lets us talk to the GPT-3.5 model, the world went nuts. People couldn’t believe that an AI model, which had always stayed behind the scenes as a recommendation system, was now able to talk, interact, and hold conversations just like humans. All thanks to RLHF.
However, many scientists believe RLHF has side effects: the model can inherit the biases of the humans who give the feedback. They argue that even with rigorous selection criteria and extensive training, a few labelers can never represent our diverse population. And that feels fair. Humans are biased. We live in bubbles, treat whatever we hear and see as truth, and are often unwilling to listen to what other people say. When you want an AI to sound human, these biases come along.
GPT-3.5 still had problems: a lot of hallucinations, no nuance in its replies, weak performance at solving equations or generating code, and it was easily jailbroken. GPT-4 fixed a few of these, but its most remarkable improvement is the ability to generate a nuanced reply, something we rarely hear even from humans nowadays. Check out the example below.

Another feature I found helpful is its ability to acknowledge and correct its mistakes. Though GPT-4 can’t learn many tasks we perform without even thinking, it is the closest thing we have to human intelligence, and that brings me joy.
And we may be decades away from the next breakthrough, but what we have now is already enough to transform our world, in a good way. See you next week.