AI News

Open Source, Free, and Commercially Usable! Meta Releases Llama 2 | AI News 008

Axton

18 Jul 2023 — 4 min read

Open Source, Free, and Commercially Usable! Meta Releases Llama 2

Open source versus closed source: can Meta become the disruptor of the large language model market?

Website: Llama 2 - Meta AI

Paper: Llama 2: Open Foundation and Fine-Tuned Chat Models | Meta AI Research

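Below is the summary of the paper's key points that Claude 2 produced directly after 11 rounds of conversation. Each point is followed by the matching passage from the paper, so you can check whether it got the summary right.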

Model sizes: Meta released Llama 2 base models with 7B, 13B, 34B, and 70B parameters, along with Llama 2 chat models at the corresponding sizes.

Llama 2 comes in a range of parameter sizes—7B, 13B, and 70B—as well as pretrained and fine-tuned variations. (P3,P4)

Training technique: the 34B and 70B models use grouped-query attention (GQA), which improves inference efficiency for large models.

For the 34B and 70B models we chose to use GQA instead of MQA. (P6-P7)
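To make the GQA idea concrete, here is a minimal sketch in plain Python. It is not Meta's implementation: the function name, the head counts, the tiny vectors, and the use of nested lists instead of a tensor library are all assumptions chosen for readability. The point it illustrates is that several query heads share one key/value head, so fewer K/V projections need to be cached at inference time. With as many K/V heads as query heads it reduces to ordinary multi-head attention, and with a single K/V head it reduces to MQA.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def grouped_query_attention(q, k, v):
    """Causal grouped-query attention (illustrative sketch).

    q:    [n_q_heads][seq_len][d]   one set of queries per query head
    k, v: [n_kv_heads][seq_len][d]  fewer key/value heads, shared by groups
    Returns a list shaped like q.
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_q_heads // n_kv_heads
    d = len(q[0][0])

    out = []
    for h in range(n_q_heads):
        kv = h // group_size  # every query head in a group reads the same K/V head
        head_out = []
        for t, q_vec in enumerate(q[h]):
            # Causal mask: position t only attends to positions 0..t.
            scores = [
                sum(a * b for a, b in zip(q_vec, k[kv][s])) / math.sqrt(d)
                for s in range(t + 1)
            ]
            weights = softmax(scores)
            head_out.append(
                [sum(weights[s] * v[kv][s][i] for s in range(t + 1)) for i in range(d)]
            )
        out.append(head_out)
    return out


# Toy example (hypothetical sizes): 4 query heads share 2 K/V heads.
q = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(4)]
k = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]]
v = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
out = grouped_query_attention(q, k, v)
```

In this toy setup, query heads 0 and 1 read K/V head 0, while heads 2 and 3 read K/V head 1. Only two K/V heads are stored instead of four, which is where the inference-time saving comes from.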

Performance: Llama 2 models outperform Llama 1 on a range of English NLP benchmarks, and the 70B Llama 2 comes close to, or even surpasses, the GPT-3 series models.

Llama 2 models outperform Llama 1 models. In particular, Llama 2 70B improves the results on MMLU and BBH by ≈5 and ≈8 points, respectively, compared to Llama 1 65B. (P7-P8)

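Fine-tuning: supervised fine-tuning and RLHF are used, with the latter comprising rejection sampling and PPO. Ghost Attention (GAtt) is introduced to improve consistency across multi-turn chat.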

We explored RLHF fine-tuning with two main algorithms: Rejection Sampling and Proximal Policy Optimization (PPO). We also introduce Ghost Attention (GAtt) to help control dialogue flow over multiple turns. (P14-P16)
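The rejection sampling step can be sketched as "sample several candidates, keep the one the reward model scores highest". The snippet below is a toy illustration under stated assumptions, not the paper's code: `rejection_sample`, `toy_generate`, and `toy_reward` are hypothetical names, the canned responses stand in for sampling from a language model, and the keyword-based score stands in for a learned reward model. In the actual fine-tuning loop, the selected responses would then be used for a gradient update, which is omitted here.

```python
import itertools


def rejection_sample(prompt, generate, reward, k=4):
    """Draw k candidate responses and return (best, candidates).

    generate(prompt) -> str            stand-in for sampling from the policy
    reward(prompt, response) -> float  stand-in for a learned reward model
    """
    candidates = [generate(prompt) for _ in range(k)]
    best = max(candidates, key=lambda response: reward(prompt, response))
    return best, candidates


# Hypothetical stand-ins so the sketch runs without a real model.
_canned = itertools.cycle(
    [
        "I don't know.",
        "Paris.",
        "The capital of France is Paris.",
        "Maybe Lyon?",
    ]
)


def toy_generate(prompt):
    # A real policy would sample a new response here.
    return next(_canned)


def toy_reward(prompt, response):
    # Toy heuristic: favor correct answers, then slightly favor fuller sentences.
    score = 1.0 if "Paris" in response else 0.0
    return score + 0.01 * len(response.split())


best, candidates = rejection_sample(
    "What is the capital of France?", toy_generate, toy_reward, k=4
)
print(best)
```

Raising `k` gives the reward function more candidates to choose from, at the cost of more generation calls per prompt.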

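Safety: safety-specific data annotation, safety context distillation, and related methods substantially reduce the generation of toxic language.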

Toxicity. We measure the degree of generation of toxic language and hate speech across different groups, we use ToxiGen (Hartvigsen et al., 2022) (P23,P27,P31)

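Open-source and commercial license: the Llama 2 pretrained and chat models are released under a custom license, and access must be requested before use. Commercial use carries certain restrictions.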

We are releasing the following models to the general public for research and commercial use ‡: 1. Llama 2, an updated version of Llama 1, trained on a new mix of publicly available data. 2. Llama 2-Chat, a fine-tuned version of Llama 2 that is optimized for dialogue use cases. (P4,P35,P36)

Limitations: Llama 2 mainly targets English, with limited support for other languages. Safety testing is required before deployment.

Llama 2 is a new technology that carries potential risks with use (Bender et al., 2021b; Weidinger et al., 2021; Solaiman et al., 2023). Testing conducted to date has been in English and has not — and could not — cover all scenarios. (P34)

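Key finding: RLHF fine-tuning benefits from a synergy between humans and the model, going beyond the limits of human supervision alone.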

Our findings underscore that the crucial determinant of RLHF’s success lies in the synergy it fosters between humans and LLMs throughout the annotation process. (P32-P33)

Safety evaluation: Llama 2-Chat performs well in human safety evaluations, but there is still room for improvement.

In Figure 18, we report the violation percentage on single- and multi-turn conversations, respectively. A trend across models is that multi-turn conversations are more prone to inducing unsafe responses. (P29-P31)
