「手把手教程：部署Dify并集成Ollama与Xinference实现本地AI应用」

‌系统要求‌
- 确保机器满足最低配置（推荐4核CPU/8GB内存/20GB存储）‌3；
- 安装 ‌Docker‌ 和 ‌Docker Compose‌（用于容器化部署）‌34；
- 安装 ‌Git‌（用于拉取代码库）‌3；
‌依赖软件‌
- Python 3.x 和 Node.js（部分功能需要开发环境支持）‌3；
- NVIDIA显卡驱动（若需GPU加速，显存需≥6GB）‌4；

‌克隆代码库‌

bashCopy Code
git clone https://github.com/langgenius/dify.git cd dify/docker

‌启动Dify服务‌

bashCopy Code
docker-compose up -d # 后台启动所有容器‌:ml-citation{ref="1,3" data="citationList"}

‌启动本地模型服务‌

bashCopy Code
ollama run qwen # 运行中文模型如qwen‌:ml-citation{ref="4" data="citationList"}

‌创建虚拟环境‌

bashCopy Code
conda create --name xinference python=3.10
conda activate xinference

‌安装依赖包‌

bashCopy Code
pip install "xinference[all]" -i https://pypi.tuna.tsinghua.edu.cn/simple # 全功能安装‌:ml-citation{ref="6" data="citationList"}

‌配置环境变量‌

bashCopy Code
export XINFERENCE_HOME=/自定义存储目录 export XINFERENCE_MODEL_SRC=modelscope # 指定模型源‌:ml-citation{ref="6" data="citationList"}

‌启动Xinference服务‌

bashCopy Code
xinference-local -H 0.0.0.0 # 暴露服务端口‌:ml-citation{ref="6" data="citationList"}

‌注册模型‌

bashCopy Code
xinference launch --model-name chatglm3 --size-in-billions 6 # 部署中文模型‌:ml-citation{ref="6" data="citationList"}

‌添加Ollama模型‌
- 在Dify控制台的 ‌模型供应商‌ 中选择 ‌Ollama‌，输入服务地址（如 http://localhost:11434）‌14；
- 验证连接性：通过内置测试工具检查接口可用性‌1；
‌集成Xinference模型‌
- 在Dify的 ‌模型供应商‌ 中选择 ‌自定义API‌，输入Xinference的API地址（如 http://localhost:6006）‌67；
- 输入模型名称及API密钥（若需鉴权）‌6；
‌配置知识库与Agent工具‌
- 在Dify中创建RAG Pipeline，上传文档（PDF/PPT等格式），并关联Xinference的嵌入模型（如 bge-large-zh）‌18；
- 定义Agent工具，调用Ollama和Xinference的模型能力（如文本生成、图像生成等）‌8；