LoRA Integration

Currently, Xinference can attach one or more LoRA fine-tuned models when launching an LLM or image model, to augment the base model.

Usage

Launch

Unlike built-in models, LoRA models are not currently managed by Xinference. Users must first download the LoRA model themselves and then pass its storage path to Xinference.

xinference launch <options> \
    --lora-modules <lora_name1> <lora_model_path1> \
    --lora-modules <lora_name2> <lora_model_path2> \
    --image-lora-load-kwargs <load_params1> <load_value1> \
    --image-lora-load-kwargs <load_params2> <load_value2> \
    --image-lora-fuse-kwargs <fuse_params1> <fuse_value1> \
    --image-lora-fuse-kwargs <fuse_params2> <fuse_value2>

from xinference.client import Client

client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")

# Each LoRA entry needs a name and the local path of the downloaded weights.
lora_model1 = {'lora_name': '<lora_name1>', 'local_path': '<lora_model_path1>'}
lora_model2 = {'lora_name': '<lora_name2>', 'local_path': '<lora_model_path2>'}
lora_models = [lora_model1, lora_model2]
# Image models only: extra arguments for diffusers' load_lora_weights and fuse_lora.
image_lora_load_kwargs = {'<load_params1>': <load_value1>, '<load_params2>': <load_value2>}
image_lora_fuse_kwargs = {'<fuse_params1>': <fuse_value1>, '<fuse_params2>': <fuse_value2>}

peft_model_config = {
    "image_lora_load_kwargs": image_lora_load_kwargs,
    "image_lora_fuse_kwargs": image_lora_fuse_kwargs,
    "lora_list": lora_models
}

client.launch_model(
    <other_options>,
    peft_model_config=peft_model_config
)
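
As a concrete illustration of the dictionary structure used in the launch example, the following sketch assembles a `peft_model_config` from plain Python values. The helper name `build_peft_model_config` and the LoRA names and paths are hypothetical and not part of Xinference; only the dictionary keys (`lora_list`, `lora_name`, `local_path`, `image_lora_load_kwargs`, `image_lora_fuse_kwargs`) come from the launch example. It also assumes that the two image-only keys may be left out for an LLM launch.

```python
from typing import Any, Dict, Optional


def build_peft_model_config(
    lora_paths: Dict[str, str],
    image_lora_load_kwargs: Optional[Dict[str, Any]] = None,
    image_lora_fuse_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a peft_model_config dict (hypothetical helper, not an Xinference API)."""
    if not lora_paths:
        raise ValueError("at least one LoRA model is required")
    config: Dict[str, Any] = {
        "lora_list": [
            {"lora_name": name, "local_path": path}
            for name, path in lora_paths.items()
        ]
    }
    # The image-only options are added only when provided (assumption:
    # an LLM launch can omit them entirely).
    if image_lora_load_kwargs:
        config["image_lora_load_kwargs"] = dict(image_lora_load_kwargs)
    if image_lora_fuse_kwargs:
        config["image_lora_fuse_kwargs"] = dict(image_lora_fuse_kwargs)
    return config


# Placeholder names and paths, for illustration only.
llm_config = build_peft_model_config(
    {"lora_a": "/path/to/lora_a", "lora_b": "/path/to/lora_b"}
)
```

The resulting dictionary would then be passed as `peft_model_config=llm_config` to `client.launch_model(...)` together with the other launch options.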

Apply

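For large language models, specify one of the LoRA models at inference time by setting the lora_name parameter in generate_config. The value of lora_name must match a name you configured at launch.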

from xinference.client import Client

client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
model = client.get_model("<model_uid>")
model.chat(
    messages=[{"role": "user", "content": "<prompt>"}],
    generate_config={"lora_name": "<your_lora_name>"}
)
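
Because `lora_name` has to match a name configured at launch, it can help to check the name on the client side before sending a request. The sketch below is one possible way to do that; `make_generate_config` is a hypothetical helper, the LoRA names are placeholders, and `max_tokens` is included only as an example of another `generate_config` entry.

```python
from typing import Any, Dict, Iterable, Optional


def make_generate_config(
    lora_name: str,
    launched_lora_names: Iterable[str],
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a generate_config that selects one LoRA (hypothetical helper)."""
    known = set(launched_lora_names)
    if lora_name not in known:
        raise ValueError(
            f"unknown lora_name {lora_name!r}; launched with: {sorted(known)}"
        )
    # Copy the extra options so the caller's dict is not modified.
    config = dict(extra or {})
    config["lora_name"] = lora_name
    return config


# Names as they were given at launch (placeholders).
launched = ["lora_a", "lora_b"]
generate_config = make_generate_config("lora_a", launched, {"max_tokens": 256})
```

The returned dictionary can be passed as `generate_config=generate_config` to `model.chat(...)`.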

Notes

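The image_lora_load_kwargs and image_lora_fuse_kwargs options above apply only to image models. They correspond to the extra parameters of the load_lora_weights and fuse_lora interfaces in the diffusers library. If you are launching an LLM, you do not need to set these options.
You need to add the lora_name parameter during inference to specify the corresponding LoRA model. You can specify it in the Additional Inputs option.
For LLM chat models, only LoRA models that do not change the base model's prompt template after fine-tuning are currently supported.
When using GPUs, a LoRA model is placed on the same device as its base model and does not affect other models.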
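
To make the relationship between the two image options and the diffusers interfaces more tangible, here is a minimal sketch of how such keyword arguments could be forwarded. It is not Xinference's actual implementation: `RecordingPipeline` is a stand-in that only records calls so the example runs without diffusers installed, and `apply_image_lora` is a hypothetical helper. `weight_name` and `lora_scale` are used as examples of arguments that diffusers' `load_lora_weights` and `fuse_lora` accept; the path and file name are placeholders.

```python
from typing import Any, Dict, List, Optional, Tuple


class RecordingPipeline:
    """Stand-in for a diffusers pipeline; it only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, tuple, Dict[str, Any]]] = []

    def load_lora_weights(self, path: str, **kwargs: Any) -> None:
        self.calls.append(("load_lora_weights", (path,), kwargs))

    def fuse_lora(self, **kwargs: Any) -> None:
        self.calls.append(("fuse_lora", (), kwargs))


def apply_image_lora(
    pipe: Any,
    local_path: str,
    load_kwargs: Optional[Dict[str, Any]] = None,
    fuse_kwargs: Optional[Dict[str, Any]] = None,
) -> None:
    """Forward the load/fuse options to the pipeline (illustrative only)."""
    # image_lora_load_kwargs -> extra arguments of load_lora_weights
    pipe.load_lora_weights(local_path, **(load_kwargs or {}))
    # image_lora_fuse_kwargs -> extra arguments of fuse_lora
    pipe.fuse_lora(**(fuse_kwargs or {}))


pipe = RecordingPipeline()
apply_image_lora(
    pipe,
    "/path/to/image_lora",
    load_kwargs={"weight_name": "lora.safetensors"},
    fuse_kwargs={"lora_scale": 0.7},
)
```

With a real diffusers pipeline in place of the stand-in, the same two calls would load the LoRA weights and then fuse them into the base model.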