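Creating the video-craft-agent Workflow in Dify

Build video-craft-agent V0.1 as a Dify Workflow: the User Input node receives product information, an LLM node generates a short-video script as JSON, and a Code node validates it and returns script_json through the Output node.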

#tech/ai #type/howto #status/growing #resource/dify

[!info] related notes

Prerequisite notes: Dify, Dify AI application types, Dify Workflow start node

Related MOC: Dify MOC

Related resources: AI feature module design, video asset understanding pipeline, FFmpeg

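Goal

Create a video-craft-agent V0.1 Workflow:

User Input
  -> LLM
  -> Code
  -> Output

The goal is not to render video directly, but to produce a stable short-video script JSON that the FastAPI backend then uses for asset matching, subtitle generation, and FFmpeg compositing.

Although the project is named video-craft-agent, V0.1 follows a fixed task chain, so choose the Workflow application type in Dify, not Agent.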

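Prerequisites

A usable Dify workspace.
At least one model is configured and available.
It is settled that V0.1 only outputs the script JSON and does not call FFmpeg directly.
The backend service can call the API of the published Workflow.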

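Steps

1. Create the Workflow application

Create a new application in Dify Studio:

Application type: Workflow
Application name: video-craft-agent
Start node: User Input

Do not choose Trigger. V0.1 runs when the frontend or FastAPI actively submits parameters; it is not woken by a schedule or an external event.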

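2. Configure the User Input variables

Define these input fields on the User Input node:

Variable	Suggested type	Description
`product_name`	Short Text	Product or course name
`target_audience`	Paragraph	Target audience
`selling_points`	Paragraph	Core selling points, separated by commas or line breaks
`style`	Select / Short Text	Video style, e.g. educational, product recommendation, talking head, storyline
`platform`	Select / Short Text	Publishing platform, e.g. Douyin, Kuaishou, Xiaohongshu, TikTok
`duration_seconds`	Number	Total video length in seconds, e.g. 15, 30, 60

Optional fields:

aspect_ratio: defaults to 9:16.
language: defaults to zh-CN.
cta: closing call to action.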
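On the caller's side, the field list above can be mirrored as a small typed structure so that optional fields receive their defaults before the request is sent. This is a minimal sketch: ScriptInputs and to_inputs are hypothetical names, and the empty-string default for cta is an assumption, since the note gives no default for it.

```python
from dataclasses import asdict, dataclass


@dataclass
class ScriptInputs:
    """Hypothetical caller-side mirror of the User Input fields."""

    product_name: str
    target_audience: str
    selling_points: str
    style: str
    platform: str
    duration_seconds: int
    # Optional fields; aspect_ratio and language use the defaults listed above.
    aspect_ratio: str = "9:16"
    language: str = "zh-CN"
    cta: str = ""  # assumption: an empty string means "no call to action"

    def to_inputs(self) -> dict:
        """Return a plain dict keyed by the Workflow variable names."""
        if not self.product_name.strip():
            raise ValueError("product_name is required")
        if self.duration_seconds <= 0:
            raise ValueError("duration_seconds must be positive")
        return asdict(self)


example = ScriptInputs(
    product_name="AI programming course",
    target_audience="frontend developers moving into AI applications",
    selling_points="hands-on projects, Agent workflows",
    style="educational talking head",
    platform="Douyin",
    duration_seconds=15,
)
inputs = example.to_inputs()
```

Keeping the defaults in one place on the caller's side means the Workflow always receives the same set of keys, whether or not the form exposes the optional fields.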

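3. Add an LLM node to generate the draft JSON

The LLM node generates a draft script from the input variables. The key point of the prompt is to make the model output JSON only, with no Markdown or explanatory text.

Sample prompt: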

You are a short-video ad script planner.

Generate a short-video script as JSON from the input below.
Requirements:
1. Output JSON only; do not wrap it in a Markdown code block.
2. The JSON must contain title, aspect_ratio, duration_seconds, and scenes.
3. scenes is an array; each element contains index, duration_seconds, subtitle, voiceover, visual_keywords, and source_hint.
4. Each scene's subtitle must be usable directly as an on-screen subtitle.
5. visual_keywords must be English keywords, to make later asset matching easier.
6. The duration_seconds values of all scenes should add up to approximately the input duration_seconds.

Input:
- product_name: {{product_name}}
- target_audience: {{target_audience}}
- selling_points: {{selling_points}}
- style: {{style}}
- platform: {{platform}}
- duration_seconds: {{duration_seconds}}

Sample JSON output:
{
  "title": "15-second promo video for an AI programming course",
  "aspect_ratio": "9:16",
  "duration_seconds": 15,
  "scenes": [
    {
      "index": 1,
      "duration_seconds": 5,
      "subtitle": "Still only writing CRUD?",
      "voiceover": "The age of AI Agents is already here.",
      "visual_keywords": ["coding", "programmer", "ai"],
      "source_hint": "uploaded_or_library"
    }
  ]
}

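In the actual configuration, prefer inserting variables with Dify's variable selector rather than typing placeholders by hand, which is error-prone.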
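When a prompt is drafted outside Dify, a quick local check can catch misspelled placeholders before it is pasted into the LLM node. The sketch below assumes the simplified {{name}} notation used in the sample prompt, not the syntax the variable selector inserts; find_unknown_placeholders and DECLARED_VARIABLES are hypothetical names.

```python
import re

# Variable names declared on the User Input node (required and optional).
DECLARED_VARIABLES = {
    "product_name", "target_audience", "selling_points",
    "style", "platform", "duration_seconds",
    "aspect_ratio", "language", "cta",
}

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def find_unknown_placeholders(prompt: str) -> list[str]:
    """Return placeholder names in the prompt that are not declared inputs."""
    names = PLACEHOLDER.findall(prompt)
    return sorted({name for name in names if name not in DECLARED_VARIABLES})


draft = "- product_name: {{product_name}}\n- style: {{stlye}}"
unknown = find_unknown_placeholders(draft)  # the misspelled "stlye" is reported
```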

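4. Add a Code node to clean and validate the JSON

The Code node narrows the LLM output down to a structure the backend can consume reliably.

Validation targets:

The output parses as JSON.
title, duration_seconds, and scenes are present.
scenes is a non-empty array.
Every scene has index, duration_seconds, subtitle, voiceover, and visual_keywords.
The final output variable is always named script_json.

Sample validation logic: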

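import json
import re


def main(llm_text: str, duration_seconds: int = 15) -> dict:
    # Strip a leading or trailing Markdown code fence in case the model ignored the prompt.
    raw = llm_text.strip()
    raw = re.sub(r"^```(?:json)?", "", raw).strip()
    raw = re.sub(r"```$", "", raw).strip()

    # json.loads raises a ValueError subclass on invalid JSON, which fails the node.
    data = json.loads(raw)

    if not isinstance(data, dict):
        raise ValueError("script must be a JSON object")

    # title is required; aspect_ratio and duration_seconds fall back to defaults.
    data["title"] = str(data.get("title") or "").strip()
    if not data["title"]:
        raise ValueError("script missing title")
    data.setdefault("aspect_ratio", "9:16")
    data["duration_seconds"] = int(data.get("duration_seconds") or duration_seconds)

    scenes = data.get("scenes")
    if not isinstance(scenes, list) or not scenes:
        raise ValueError("scenes must be a non-empty list")

    for i, scene in enumerate(scenes, start=1):
        if not isinstance(scene, dict):
            raise ValueError(f"scene {i} must be a JSON object")

        # Normalize types and fill in defaults for each scene.
        scene["index"] = int(scene.get("index") or i)
        scene["duration_seconds"] = int(scene.get("duration_seconds") or 3)
        scene["subtitle"] = str(scene.get("subtitle") or "").strip()
        scene["voiceover"] = str(scene.get("voiceover") or "").strip()

        # Accept a comma-separated string as well as a list of keywords.
        keywords = scene.get("visual_keywords") or []
        if isinstance(keywords, str):
            keywords = [item.strip() for item in keywords.split(",") if item.strip()]
        scene["visual_keywords"] = keywords
        scene.setdefault("source_hint", "uploaded_or_library")

        if not scene["subtitle"] or not scene["voiceover"]:
            raise ValueError(f"scene {i} missing subtitle or voiceover")

    # Return a JSON string so the Output node exposes one stable variable.
    return {
        "script_json": json.dumps(data, ensure_ascii=False)
    }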

Adjust the parameter names to match the input variables configured on the Dify Code node.

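5. Add an Output node that returns `script_json`

Add an output variable on the Output node:

Output name: script_json
Source: script_json from the Code node

Without an Output node, the Workflow may run successfully while the backend API caller receives no script.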

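6. Connect FastAPI to the Dify output

The FastAPI backend has only two responsibilities:

Forward the frontend form fields to the Dify Workflow.
Parse the returned script_json and hand it to the local video compositing pipeline.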
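The two responsibilities can be sketched with the standard library alone. The code assumes Dify's Workflow run endpoint (POST to workflows/run under the API base URL, a Bearer API key, blocking response mode) and that outputs arrive under data.outputs with a status of "succeeded"; confirm both against the API page of the published app. The base URL, key, and helper names are placeholders, not part of the project.

```python
import json
import urllib.request


def build_run_request(base_url: str, api_key: str, inputs: dict, user: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP request that runs the Workflow."""
    body = {"inputs": inputs, "response_mode": "blocking", "user": user}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/workflows/run",  # assumed endpoint path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_script(response_body: dict) -> dict:
    """Pull script_json out of the run result and parse it into a dict."""
    data = response_body.get("data") or {}
    if data.get("status") != "succeeded":  # assumed status value
        raise RuntimeError(f"workflow run failed: {data.get('error')}")
    outputs = data.get("outputs") or {}
    if "script_json" not in outputs:
        raise KeyError("script_json missing from workflow outputs")
    # The Code node returns a JSON string, so it is parsed a second time here.
    return json.loads(outputs["script_json"])


def run_workflow(base_url: str, api_key: str, inputs: dict, user: str) -> dict:
    """Send the request and return the parsed script (performs network I/O)."""
    request = build_run_request(base_url, api_key, inputs, user)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_script(json.load(response))
```

Splitting request construction from response parsing keeps both halves testable without a running Dify instance; a FastAPI route would only need to call run_workflow.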

The local pipeline can then be broken down into:

script_json
  -> asset selection
  -> subtitle file generation
  -> voiceover or narration audio
  -> FFmpeg compositing
  -> mp4 output
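As an illustration of the subtitle step, scene durations can be accumulated into SRT cues. This is a minimal sketch assuming each subtitle stays on screen for its full scene and that durations are whole seconds; scenes_to_srt and format_timestamp are hypothetical helpers, not part of the backend described here.

```python
def format_timestamp(total_seconds: int) -> str:
    """Format whole seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},000"


def scenes_to_srt(scenes: list[dict]) -> str:
    """Turn validated scenes into SRT text, one cue per scene."""
    cues = []
    start = 0
    for number, scene in enumerate(scenes, start=1):
        end = start + int(scene["duration_seconds"])
        cues.append(
            f"{number}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{scene['subtitle']}\n"
        )
        start = end  # the next cue begins where this one ends
    return "\n".join(cues)


scenes = [
    {"duration_seconds": 5, "subtitle": "Still only writing CRUD?"},
    {"duration_seconds": 10, "subtitle": "Build a real Agent workflow."},
]
srt_text = scenes_to_srt(scenes)
```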

Dify V0.1 should not take on FFmpeg orchestration; keep backend engineering logic off the canvas.

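Verification

Run the Workflow with a minimal set of test inputs:

product_name: AI programming course
target_audience: programmers with a frontend background who want to move into AI application development
selling_points: hands-on projects, Agent workflows, portfolio-ready work
style: educational talking head
platform: Douyin
duration_seconds: 15

Checks:

The Workflow starts from User Input.
The LLM node outputs something close to JSON rather than explanatory text.
The Code node parses it and returns script_json.
script_json is visible in the Output.
FastAPI can parse script_json as JSON a second time.
Every item in the scenes array has a subtitle, voiceover, and visual keywords.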
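The last two checks can be scripted against the string returned by the Output node. A sketch, assuming the caller already holds the raw script_json string; check_script is a hypothetical helper that collects problems instead of raising, so one run reports everything at once.

```python
import json


def check_script(script_json: str) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    try:
        script = json.loads(script_json)
    except (TypeError, ValueError) as exc:
        return [f"script_json is not valid JSON: {exc}"]
    if not isinstance(script, dict):
        return ["script_json must decode to an object"]

    scenes = script.get("scenes")
    if not isinstance(scenes, list) or not scenes:
        return ["scenes must be a non-empty list"]

    problems = []
    for position, scene in enumerate(scenes, start=1):
        for field in ("subtitle", "voiceover", "visual_keywords"):
            # A non-dict scene or an empty value both count as missing.
            if not isinstance(scene, dict) or not scene.get(field):
                problems.append(f"scene {position}: {field} is missing or empty")
    return problems
```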

Common issues

The LLM outputs a Markdown code block

Even when the prompt stresses "output JSON only", the Code node should still strip any ```json code fence.
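A slightly more tolerant alternative to stripping fences at the start and end of the text is to take the span from the first opening brace to the last closing brace, which also survives a sentence of explanation before or after the JSON. This is a sketch under that assumption about how the model misbehaves; extract_json_object is a hypothetical helper, not something Dify provides.

```python
import json


def extract_json_object(text: str) -> dict:
    """Parse the outermost {...} span found in the LLM output."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in LLM output")
    # Anything outside the braces (fences, explanations) is ignored.
    return json.loads(text[start:end + 1])


fence = "`" * 3  # built programmatically to keep this example readable
wrapped = f"Here is the script:\n{fence}json\n{{\"title\": \"demo\"}}\n{fence}"
script = extract_json_object(wrapped)
```

The trade-off is that text containing stray braces outside the JSON would still fail to parse, so the error from json.loads should be allowed to fail the node.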

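Missing output fields

Do not rely on the prompt alone. The Code node must validate required fields; otherwise the failure surfaces later, in the FFmpeg compositing stage.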

Duration mismatch

V0.1 can tolerate small deviations. If exact timing is needed later, the Code node can redistribute the scene durations to match the total.
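One way to do that redistribution is proportional scaling with largest-remainder rounding, so the whole-second results sum exactly to the target. This is a sketch of that technique, not the project's chosen method; rebalance_durations is a hypothetical helper and assumes all durations are positive integers.

```python
def rebalance_durations(durations: list[int], total: int) -> list[int]:
    """Scale durations proportionally so that they sum exactly to total."""
    if not durations or total <= 0:
        raise ValueError("durations must be non-empty and total must be positive")
    if any(d <= 0 for d in durations):
        raise ValueError("every duration must be positive")

    current = sum(durations)
    # Integer arithmetic: each exact share is (d * total) / current.
    scaled = [d * total for d in durations]
    result = [s // current for s in scaled]
    remainders = [s % current for s in scaled]

    # Hand the seconds lost to flooring to the scenes with the largest remainders.
    leftover = total - sum(result)
    by_remainder = sorted(range(len(durations)), key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        result[i] += 1
    return result


balanced = rebalance_durations([4, 6, 8], 15)
```

A scene that is very short relative to the others can round down to 0 seconds under this scheme, so a caller may want to enforce a minimum per scene before using the result.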

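The API call returns no value

Check that the Workflow has an Output node and that the Output selects script_json from the Code node.

Trigger was chosen by mistake

Trigger suits automated, event-driven runs. In V0.1 the user or the backend actively submits input, so switch back to User Input.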
