Quick start - PyPI (pip install)

This tutorial runs Yi-34B-Chat on a local machine equipped with an A800 (80 GB) GPU and then uses it for inference.

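Step 0: Prerequisites

Make sure Python 3.10 or later is installed.
To run models in the Yi series, see the "Deployment" section for the software and hardware requirements.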
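
To confirm the Python requirement before installing anything, you can compare the running interpreter against the 3.10 minimum. The helper below is a minimal sketch; the name `python_version_ok` is made up for this illustration and is not part of the Yi repository.

```python
import sys

# Minimum version taken from the prerequisite above.
MIN_PYTHON = (3, 10)


def python_version_ok(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) meets `minimum`.

    Hypothetical helper for illustration; not part of the Yi repository.
    """
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); tuple comparison is lexicographic.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if python_version_ok() else "too old, please upgrade"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```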

Step 1: Prepare your environment

To set up the environment and install the required packages, run the following commands.

git clone https://github.com/01-ai/Yi.git
cd Yi
pip install -r requirements.txt

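Step 2: Download the model

You can download the Yi models from the following sources.

Step 3: Run inference

You can run inference with either a Yi Chat model or a Base model.

Run inference with the Yi Chat model

Create a file named quick_start.py and copy the following content into it.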

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = '<your-model-path>'

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Since transformers 4.35.0, the GPT-Q/AWQ model can be loaded using AutoModelForCausalLM.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype='auto'
).eval()

# Prompt content: "hi"
messages = [
    {"role": "user", "content": "hi"}
]

input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
output_ids = model.generate(input_ids.to('cuda'))
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Model response: "Hello! How can I assist you today?"
print(response)

Run quick_start.py.
```
python quick_start.py
```
You should see output similar to the following.
```
Hello! How can I assist you today?
```
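
The script above relies on `tokenizer.apply_chat_template` to turn the `messages` list into the prompt the model actually sees. The sketch below imitates that step in plain Python, assuming a ChatML-style template with `<|im_start|>` and `<|im_end|>` markers. That format is an assumption for illustration: the template shipped with your checkpoint (`tokenizer.chat_template`) is the authoritative source, and `render_chatml` is a hypothetical name.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render chat messages as a ChatML-style prompt string.

    Assumption: the model uses '<|im_start|>role\\ncontent<|im_end|>\\n' per turn.
    This is an illustration, not the tokenizer's own implementation.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = render_chatml([{"role": "user", "content": "hi"}])
    print(repr(prompt))
```

With `add_generation_prompt=True` the prompt ends with an open assistant turn, which is why the script can treat everything generated after the prompt tokens as the reply.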

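Run inference with the Yi Base model

The steps are similar to "Run inference with the Yi Chat model".

You can use the existing file text_generation.py to run inference.

python demo/text_generation.py --model <your-model-path>

You should see the generated text printed as output.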

Quick start - Docker

Tutorial: run Yi-34B-Chat locally with Docker.

Quick start - conda-lock

To create a fully reproducible lock file for a conda environment, you can use the conda-lock tool.

Quick start - llama.cpp

Tutorial: run Yi-chat-6B-2bits locally with llama.cpp.

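Quick start - Web demo

You can build a Web demo with the Yi Chat model (Yi-34B-Chat). Note: the Yi Base model (Yi-34B) does not support this feature.

Step 1: Prepare your environment

Step 2: Download the model

Step 3: To start the Web demo service, run the following command.

python demo/web_demo.py -c <your-model-path>

Once the command is running, open the URL printed in the console in your browser to use the Web demo.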

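Fine-tuning

bash finetune/scripts/run_sft_Yi_6b.sh

Once it finishes, you can compare the fine-tuned model with the Base model using the following command.

bash finetune/scripts/run_eval.sh

You can use the fine-tuning code for the Yi 6B and 34B Base models to fine-tune on your own custom data.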

Quantization

GPT-Q quantization

python quantization/gptq/quant_autogptq.py \
  --model /base_model                      \
  --output_dir /quantized_model            \
  --trust_remote_code

To evaluate the resulting model, you can use the following command.

python quantization/gptq/eval_quantized_model.py \
  --model /quantized_model                       \
  --trust_remote_code

Detailed quantization process.

AWQ quantization

python quantization/awq/quant_autoawq.py \
  --model /base_model                      \
  --output_dir /quantized_model            \
  --trust_remote_code

To evaluate the resulting model, you can use the following command.

python quantization/awq/eval_quantized_model.py \
  --model /quantized_model                       \
  --trust_remote_code

Detailed quantization process.
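
The comment in quick_start.py notes that, since transformers 4.35.0, GPT-Q/AWQ models can be loaded with `AutoModelForCausalLM`. Before loading a quantized checkpoint that way, it can help to check the installed version. The sketch below does this with plain string parsing; `parse_version` and `supports_quantized_auto_loading` are hypothetical helper names, and pre-release suffixes such as `.dev0` or `rc1` are simply ignored, which is a simplification.

```python
import re

# Threshold taken from the comment in quick_start.py.
MIN_TRANSFORMERS = (4, 35, 0)


def parse_version(version):
    """Parse the leading numeric components, e.g. '4.36.0.dev0' -> (4, 36, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def supports_quantized_auto_loading(version):
    """Return True if `version` is at least 4.35.0 (suffixes ignored)."""
    parsed = parse_version(version)
    # Pad short versions like '5' to three components before comparing.
    padded = parsed + (0,) * (3 - len(parsed))
    return padded[:3] >= MIN_TRANSFORMERS
```

In practice you would pass `transformers.__version__` to `supports_quantized_auto_loading` after importing the library.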

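Deployment

If you want to deploy Yi models, make sure the following software and hardware requirements are met.

Software requirements

Before using Yi quantized models, make sure the following software is installed.

| Model | Software |
| --- | --- |
| Yi 4-bit quantized models | AWQ and CUDA |
| Yi 8-bit quantized models | GPTQ and CUDA |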
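
One way to check these software requirements from Python is to test whether the relevant modules can be imported. The table does not name specific packages, so the module names `awq` and `auto_gptq` below are assumptions (they are the import names commonly used by the AutoAWQ and AutoGPTQ projects, which the repository's `quant_autoawq.py` and `quant_autogptq.py` script names suggest). `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names; the source table only says "AWQ" and "GPTQ".
ASSUMED_MODULES = {
    "Yi 4-bit quantized models": "awq",
    "Yi 8-bit quantized models": "auto_gptq",
}


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for model_kind, module in ASSUMED_MODULES.items():
        state = "missing" if missing_modules([module]) else "found"
        print(f"{model_kind}: module '{module}' {state}")
```

CUDA availability is a separate check; if PyTorch is installed, `torch.cuda.is_available()` reports whether a CUDA device can be used.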

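Hardware requirements

Before deploying Yi series models, make sure your hardware meets the following requirements.

Chat models

| Model | Minimum VRAM | Recommended GPU example |
| --- | --- | --- |
| Yi-6B-Chat | 15 GB | RTX 3090, RTX 4090, A10, A30 |
| Yi-6B-Chat-4bits | 4 GB | RTX 3060, RTX 4060 |
| Yi-6B-Chat-8bits | 8 GB | RTX 3070, RTX 4060 |
| Yi-34B-Chat | 72 GB | 4 x RTX 4090, A800 (80 GB) |
| Yi-34B-Chat-4bits | 20 GB | RTX 3090, RTX 4090, A10, A30, A100 (40 GB) |
| Yi-34B-Chat-8bits | 38 GB | 2 x RTX 3090, 2 x RTX 4090, A800 (40 GB) |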
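
To pick a model for a given machine, you can compare your total GPU memory against the minimum VRAM figures in the Chat model table. The sketch below hard-codes those figures; `models_that_fit` is a hypothetical helper, and treating memory summed across several GPUs as one pool is a simplifying assumption.

```python
# Minimum VRAM in GB, copied from the Chat model table above.
CHAT_MIN_VRAM_GB = {
    "Yi-6B-Chat": 15,
    "Yi-6B-Chat-4bits": 4,
    "Yi-6B-Chat-8bits": 8,
    "Yi-34B-Chat": 72,
    "Yi-34B-Chat-4bits": 20,
    "Yi-34B-Chat-8bits": 38,
}


def models_that_fit(total_vram_gb, table=CHAT_MIN_VRAM_GB):
    """Return the models whose minimum VRAM does not exceed `total_vram_gb`."""
    return [name for name, needed in table.items() if needed <= total_vram_gb]


if __name__ == "__main__":
    # Example: a single 24 GB card such as an RTX 3090 or RTX 4090.
    for name in models_that_fit(24):
        print(name)
```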

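The minimum VRAM requirements for different batch sizes are listed below.

| Model | batch=1 | batch=4 | batch=16 | batch=32 |
| --- | --- | --- | --- | --- |
| Yi-6B-Chat | 12 GB | 13 GB | 15 GB | 18 GB |
| Yi-6B-Chat-4bits | 4 GB | 5 GB | 7 GB | 10 GB |
| Yi-6B-Chat-8bits | 7 GB | 8 GB | 10 GB | 14 GB |
| Yi-34B-Chat | 65 GB | 68 GB | 76 GB | > 80 GB |
| Yi-34B-Chat-4bits | 19 GB | 20 GB | 30 GB | 40 GB |
| Yi-34B-Chat-8bits | 35 GB | 37 GB | 46 GB | 58 GB |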
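
The batch=1 figures can be sanity-checked with simple arithmetic: the weights alone need roughly (number of parameters) x (bits per weight) / 8 bytes. The sketch below uses nominal parameter counts of 6e9 and 34e9 taken from the model names, assumes 16-bit weights for the unquantized models, and reports GiB; none of these choices is stated in the source, so treat the result as a lower bound rather than a reproduction of the table. `weight_memory_gib` is a hypothetical helper.

```python
GIB = 2 ** 30  # bytes per GiB


def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / GIB


# batch=1 figures from the table above, keyed by (nominal params, bits).
TABLE_BATCH_1 = {
    (6e9, 16): 12,
    (6e9, 4): 4,
    (6e9, 8): 7,
    (34e9, 16): 65,
    (34e9, 4): 19,
    (34e9, 8): 35,
}

if __name__ == "__main__":
    for (params, bits), table_value in TABLE_BATCH_1.items():
        estimate = weight_memory_gib(params, bits)
        print(f"{params / 1e9:.0f}B @ {bits}-bit: weights ~{estimate:.1f} GiB, table {table_value}")
```

The gap between the estimate and the table is what remains for activations, the KV cache, and runtime overhead, which is also why the requirement grows with batch size.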

Base models

| Model | Minimum VRAM | Recommended GPU example |
| --- | --- | --- |
| Yi-6B | 15 GB | RTX 3090, RTX 4090, A10, A30 |
| Yi-6B-200K | 50 GB | A800 (80 GB) |
| Yi-34B | 72 GB | 4 x RTX 4090, A800 (80 GB) |
| Yi-34B-200K | 200 GB | 4 x A800 (80 GB) |