LLaMA-2: SoTA for open-access LLMs
Breaking down Meta's LLaMA-2
Last week, Meta released LLaMA-2, the latest version of their open-access Large Language Model (LLM). This new model builds on the original LLaMA released earlier this year, with improved performance and capabilities.
In this post, we'll break down what's new in LLaMA-2, how it compares to other open-access LLMs, and what impact this model could have on the future of AI research and applications.
Differences Between LLaMA and LLaMA-2
The original LLaMA was groundbreaking as one of the first very large open-access models, but it did have some limitations. LLaMA-2 introduces several key improvements:
More training steps: While LLaMA-2 uses the same model architecture and size options as LLaMA, it was trained for significantly more steps. Meta increased the training time and budget to improve the model quality, especially for robustness, factuality, and helpfulness.
More training data: Meta increased the training dataset by 40%, including more high-quality textual data to improve factuality. However, they provided fewer specifics on the sources compared to LLaMA's transparent dataset.
Longer context: LLaMA-2 can now take advantage of up to 4,096 tokens of context, up from 2,048 in LLaMA. This allows it to reason across more information.
Preference learning: Meta collected human preference ratings and used them (via reinforcement learning from human feedback) to refine LLaMA-2, particularly its chat variant, toward more helpful and safe behavior. This refinement led to more controlled, useful responses.
Faster inference: Optimizations like grouped-query attention allow LLaMA-2 to run more efficiently, especially for the larger versions.
Commercial usage: The new license permits commercial applications, unlike the original LLaMA's research-only terms (see the loading sketch just after this list).
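To make that concrete, here is a minimal sketch of loading a LLaMA-2 checkpoint through Hugging Face Transformers. The `meta-llama/Llama-2-7b-hf` repo id is the one Meta published on the Hub; the prompt and generation settings are illustrative placeholders, and you still need to accept Meta's license before the weights will download.

```python
# Minimal sketch: load an open-access LLaMA-2 checkpoint and generate text.
# Assumes `transformers` and `torch` are installed and that access to the
# meta-llama/Llama-2-7b-hf repo has already been granted on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # base (non-chat) 7B checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit on consumer GPUs
    device_map="auto",          # spread layers across available devices
)

prompt = "Open-access language models matter because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The 4,096-token context window means far longer prompts fit than in LLaMA-1.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```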
Performance Against Other Open-Access LLMs
According to Meta's benchmarks, LLaMA-2 achieves state-of-the-art results among open-access LLMs. Right now, it is the leading open-access model, outperforming the likes of GPT-NeoX and Jurassic-1 across natural language tasks.
[Benchmark chart comparing LLaMA-2 with other LLMs; credit: Latent Space]
In particular, LLaMA-2 does very well on the robustness, factuality, and helpfulness metrics that were the focus of its preference learning. This means it provides informative, truthful responses to many input prompts.
However, experts note that LLaMA-2 trails closed LLMs like GPT-3.5 in overall capabilities, especially for things like complex reasoning. And its code generation skills lag behind without specialized fine-tuning.
Nonetheless, LLaMA-2 establishes a new high watermark for open-access generative models in terms of size, data volume, and real-world usefulness.
Training Methodology and Data
They used a technique called grouped-query attention (GQA) to keep the attention computation efficient at scale. This helped the training run cover roughly two trillion tokens.
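To make the idea concrete, here is a minimal PyTorch sketch of grouped-query attention, where several query heads share a single key/value head. The head counts and dimensions below are toy values, not LLaMA-2's actual configuration.

```python
# Minimal sketch of grouped-query attention: many query heads share fewer
# key/value heads, shrinking the KV cache and speeding up inference.
# Head counts and dimensions are illustrative, not LLaMA-2's real config.
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 1, 8, 16
n_q_heads, n_kv_heads = 8, 2            # 4 query heads per key/value head
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Repeat each key/value head so it lines up with its group of query heads.
k = k.repeat_interleave(group, dim=1)   # -> (batch, n_q_heads, seq_len, head_dim)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
weights = F.softmax(scores, dim=-1)
out = weights @ v                       # standard attention output per query head
print(out.shape)                        # torch.Size([1, 8, 8, 16])
```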
The training data included more "high factuality" content like Wikipedia to improve accuracy. Deduplication and filtering helped increase data quality.
Human preference ratings were collected iteratively to guide the training. Models were used to generate some of the ratings as well.
A 34 billion parameter version was trained but not released due to safety concerns from Meta's internal red team review.
Balancing Safety and Creativity
While Meta focused on training LLaMA-2 to give helpful, inoffensive responses, some examples indicate this may have hindered its creative capabilities:
When asked to provide all the animal emoji, LLaMA-2 replied that doing so would be "disrespectful". However, simply listing emoji seems harmless.
The transcript mentions LLaMA-2 refusing to explain how to kill a Linux process, stating it does not want to kill anything. But explaining a technical computing concept is benign.
Over-weighting safety preferences may block innocuous creative usages like storytelling or poetry generation.
Meta is understandably cautious about potential dangers from such a powerful model. But taken too far, an over-zealous safety focus could limit innovative applications in art, humor, fiction, and more.
Going forward, developers fine-tuning models like LLaMA-2 will need to carefully balance safety with enabling creativity. Achieving this balance will require thorough testing across diverse use cases.
Impact on the Future of AI
The release of LLaMA-2 is a pivotal moment for AI development in several regards:
Acceleration of research: With an advanced open-access model as a starting point, researchers can investigate techniques like retrieval augmentation and reinforcement learning much faster (a minimal retrieval sketch follows this list). This will lead to rapid advances.
An explosion of applications: Startups and developers can now build useful commercial applications on top of LLaMA-2 without API costs. We'll see creative new uses across many verticals.
More openness: Meta's move pressures closed companies to open-access their models. This shift to an open ecosystem benefits innovation and transparency.
On-device usage: With efficient inference, LLaMA-2 can enable offline AI on phones and laptops. This expands possibilities for mobile apps and edge computing.
Safety improvements: As methods to interpret, analyze and fine-tune LLMs spread, we can better understand how they work and reduce potential risks.
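As a taste of the retrieval-augmentation work mentioned above, here is a minimal sketch: embed a handful of documents, pick the one closest to the question, and prepend it to the prompt. The embedding model is an arbitrary small open model, and the final generation call is left to whatever LLaMA-2 setup you use.

```python
# Minimal retrieval-augmentation sketch: retrieve the most relevant document
# and stuff it into the prompt before asking the model. The embedding model
# and the final generation step are placeholders, not LLaMA-2 specifics.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "LLaMA-2 supports a 4,096-token context window.",
    "The LLaMA-2 license permits commercial use.",
    "Meta trained LLaMA-2 on more data than the original LLaMA.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small open embedding model
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

question = "How long can LLaMA-2 prompts be?"
q_vec = embedder.encode([question], normalize_embeddings=True)[0]

# Cosine similarity reduces to a dot product on normalized vectors.
best_doc = documents[int(np.argmax(doc_vecs @ q_vec))]

prompt = f"Context: {best_doc}\n\nQuestion: {question}\nAnswer:"
print(prompt)  # feed this prompt to a LLaMA-2 generation call of your choice
```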
LLaMA-2 Chat
While LLaMA-2 demonstrates strong performance on standard language tasks, how does it fare for casual conversation?
Along with the base LLaMA-2 models, Meta also released LLaMA-2-Chat, a fine-tuned version optimized for dialogue use cases.
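For reference, the chat variant expects a specific prompt template. The sketch below follows Meta's published single-turn format, but treat the exact spacing as illustrative rather than canonical.

```python
# Rough sketch of the single-turn prompt format LLaMA-2-Chat expects,
# based on Meta's published template; exact whitespace is illustrative.
system = "You are a helpful, concise assistant."
user = "Summarize what changed between LLaMA and LLaMA-2."

prompt = (
    "<s>[INST] <<SYS>>\n"
    f"{system}\n"
    "<</SYS>>\n\n"
    f"{user} [/INST]"
)
print(prompt)  # pass this string to the LLaMA-2-Chat tokenizer and model
```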
Experts who tested LLaMA-2 in a chatbot setting found its abilities somewhat limited:
Without fine-tuning, the multi-turn conversation quickly breaks down into repetitive or generic responses.
The relatively short context window of 4,096 tokens constrains its ability to maintain long, coherent dialog.
There are no clear signs that LLaMA-2 was specifically trained on dialog data for open-domain chitchat.
However, its reasoning capabilities provide a foundation for improving conversation skills:
Techniques like context distillation could compress lengthy dialog history into digestible prompts (a rough sketch follows this list).
Further training on increasing amounts of conversational data can enhance its domain-general language fluency.
Reinforcement learning from human chatters could sharpen its ability to respond meaningfully.
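To picture the context-distillation idea from the list above: periodically summarize the older parts of the dialog and keep only the summary plus the most recent turns in the prompt. The `summarize` helper below is a hypothetical stand-in for a call to the model itself.

```python
# Hypothetical sketch of compressing dialog history so it fits the context
# window: older turns are replaced by a model-written summary.
def summarize(turns: list[str]) -> str:
    """Placeholder: in practice, ask the model to summarize these turns."""
    return "Summary of earlier conversation: " + " / ".join(t[:40] for t in turns)

def build_prompt(history: list[str], new_message: str, keep_last: int = 4) -> str:
    older, recent = history[:-keep_last], history[-keep_last:]
    parts = []
    if older:
        parts.append(summarize(older))   # compressed version of old turns
    parts.extend(recent)                 # keep recent turns verbatim
    parts.append(f"User: {new_message}")
    return "\n".join(parts)

history = [f"User: message {i}\nAssistant: reply {i}" for i in range(10)]
print(build_prompt(history, "What did we decide earlier?"))
```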
LLaMA-2 may not match specialized chatbots like Claude-2 or Character.AI out of the box. But its strong core language proficiency offers a fertile starting point for honing more human-like discussion skills.
Potential for On-Device Applications
With the model available for commercial use, there is significant potential to run LLaMA-2 locally on devices for offline applications:
On smartphones, it could enable AI assistance without an internet connection. Users could do things like get answers, write poetry or brainstorm ideas through the model.
On laptops, developers could build custom tools leveraging LLaMA-2's capabilities in specialized domains like coding.
For edge computing, shrinking the model size through quantization will allow deployment on small devices for low-latency inference (see the 4-bit loading sketch just below).
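To make the quantization point concrete, here is a sketch of loading LLaMA-2 with 4-bit weights via the bitsandbytes integration in Transformers. The model id and settings are examples; real phone or edge deployments typically use dedicated runtimes, but the idea of shrinking the weights is the same.

```python
# Sketch: load LLaMA-2 with 4-bit quantized weights via bitsandbytes so it
# fits on a single consumer GPU or a beefy laptop. Settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits
    bnb_4bit_compute_dtype=torch.float16,  # do the matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("Offline assistants could", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```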
Community Fine-Tuning Efforts
Now that LLaMA-2 is open-access, the AI community is already starting to fine-tune it for different domains:
Organizations like Hugging Face are adding support for easy fine-tuning using frameworks like Transformers (a minimal LoRA sketch follows this list).
Startups will likely create fine-tuned versions tailored to verticals like medicine, law, finance, and more.
Researchers are eager to adapt LLaMA-2 for techniques like retrieval augmentation and reinforcement learning.
Programmers can contribute their knowledge by annotating data to improve performance on specialized tasks.
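As an example of what community fine-tuning looks like in practice, here is a minimal LoRA setup with Hugging Face's `peft` library on top of Transformers. The hyperparameters are placeholders, and the dataset and training loop are deliberately omitted.

```python
# Minimal sketch of attaching LoRA adapters to LLaMA-2 for parameter-efficient
# fine-tuning with the peft library. Dataset and training loop are omitted.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

lora_config = LoraConfig(
    r=8,                                   # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trained
# From here, plug `model` into a standard Trainer / training loop on your
# domain-specific dataset (medicine, law, finance, ...).
```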
With LLaMA-2 as a foundation, we can expect a Cambrian explosion of creative fine-tuning initiatives from individuals, startups, big tech firms, and the academic community.
Of course, risks like misinformation and job displacement (in a few industries) from automation remain with more advanced AI. But LLaMA-2's openness, performance, and license are undeniably positive overall for democratizing access to AI.
The full capabilities of LLaMA-2 have yet to be unlocked, but its release kicks off the next phase of AI's evolution. With models getting more powerful, data expanding, and research accelerating, I am hoping to see dramatic breakthroughs in the coming years.