AI-oriented programming: Explore visual analysis models

Author: ivring

Large language models (LLMs) draw on vast data sources and can answer user questions in many forms, but those answers are limited to text. Text can be clear, yet it has limitations in scenarios that involve complex logic or that need to be presented visually. As you can imagine, converting text into visual analysis models, or even UI interfaces, would work much better. This article summarizes our exploration and implementation ideas for this scenario.

Effect display

AI visual analysis models combine the capabilities of an LLM to generate interactive, intuitive models for interaction designers, visual designers, and product designers based on user needs, such as user journey maps and user portraits. They also cover common analysis models that product managers need, such as the business canvas and SWOT analysis. The effect is as follows:

[Figure]

Functional analysis and disassembly

Implementation ideas

From the user's point of view, this requirement consists of the following stages:

[Figure]

As the flow above shows, the whole process requires two conversations with the LLM:

  • The first call, for template filtering, is very simple for the LLM; a short prompt is enough.
  • The second call produces the template data, which is the most critical step. So how is this template data produced?
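The two-call flow can be sketched as below. The `callLLM` wrapper and the prompt strings are illustrative assumptions, injected so the control flow can be shown end to end; they are not the article's actual code.

```javascript
// Sketch of the overall two-call flow. callLLM is a hypothetical wrapper
// around the chat API, injected as a parameter for illustration.
const TEMPLATES = ['用户旅程地图', '用户画像', '精益画布', '用户故事', 'SWOT分析', '干系人地图'];

function generateModel(userQuestion, callLLM) {
  // Call 1: template filtering — ask the LLM which templates fit the question
  const recommendText = callLLM('recommend', userQuestion);
  const matched = recommendText.match(new RegExp(TEMPLATES.join('|'), 'g'));
  if (!matched) return { error: '暂无适合的模板' };

  // Call 2: produce the template data with the model-specific prompt
  const template = matched[0]; // take the top-priority recommendation
  const dataText = callLLM(template, userQuestion);
  return { template, dataText };
}
```

In production the second call would use the per-model prompt described later in this article; here the template name stands in for it.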

Ideal AI

[Figure]

The ideal AI workflow is, of course, to write prompts, have the LLM output the design draft data directly, and then create the design draft in the graphics editor through a graphical data parser. Following this idea, it could be done in these steps:

  • Understand the design draft data format
  • Have the LLM learn that data format
  • Write a prompt for each model and have the LLM produce the design draft data
  • Parse the design draft data with a graphical data parser and insert it into the graphics editor

Understand the design draft data format

[Figure]

We chose Figma as the graphics editor, so we need to understand the data structure of a Figma design draft. We can get the source data of a Figma design draft from the Figma API Live site. As you can see, the data of a design draft is very complex, containing hierarchical relationships, coordinates, transform matrices, padding, text, borders, and so on. Even a simple artboard design amounts to 6.8 KB of data. Because an LLM has a maximum output token limit, this idea proved unworkable at the very first step.

AI in reality

[Figure]

The picture above compares the user portrait model's design draft with the final product. The basic outline of the design draft does not change during the whole process; only sticky notes are created and the text in specific blocks changes. Going back to the essence of this requirement, it is really about substituting the LLM's output text into the design draft.

Final idea

[Figure]

We only need the structured text output by the LLM, which replaces the text in the design draft. Following this idea, it can be done in these steps:

  • Prepare the design draft data for a single model
  • Define the data structure of the LLM text output for that model
  • Combine the LLM text data structure with the design draft data to produce the final design draft data
  • Write a parser that parses the design draft data and inserts it into the graphics editor
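The heart of this idea, replacing the text of a master by layer name, can be sketched as follows. The node shape used here is a simplified assumption for illustration, not real Figma data.

```javascript
// Minimal sketch of the text-replacement step: walk a master's node tree and
// replace the text of any node whose layer name appears in the LLM output.
function fillMaster(node, textByName) {
  if (node.type === 'TEXT' && textByName[node.name] !== undefined) {
    node.characters = textByName[node.name]; // swap in the LLM's text
  }
  (node.children || []).forEach((child) => fillMaster(child, textByName));
  return node;
}
```

The rest of the pipeline is built around this substitution: the master supplies layout and styling, and the LLM supplies only the words.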

Prompt development

As mentioned above, the whole requirement involves two conversations with the LLM, and prompt debugging is an essential part of any LLM conversation. In the AIGC era, becoming a prompt engineer seems like an inevitable fate.

Model recommendations

The model recommendation is relatively simple, and the Prompt is as follows:

I want you to do template recommendation. I will tell you all the templates you are proficient in, and you should recommend suitable ones based on the question I enter. You are now proficient in: user journey map (用户旅程地图), user portrait (用户画像), lean canvas (精益画布), user story (用户故事), SWOT analysis (SWOT分析), stakeholder map (干系人地图). Recommendation requirements: 1. If the question contains template-related terms, recommend the corresponding template first. 2. If no template fits, reply: 暂无适合的模板 (no suitable template). 3. Recommend at most 5 and at least 1 template, ordered by recommendation priority. Note: output only the template names; do not answer the question or explain the templates.
           

We only need a regular expression to extract the template names from the LLM's output text:

export const getRecommends = (txt) => {
  // Extract the recommended template names from the LLM's reply
  return txt.match(/(用户旅程地图|用户画像|精益画布|用户故事|SWOT分析|干系人地图)/g);
};
           

The structured text of the model

[Figure]

The above is from a conversation between our product colleagues and the LLM during requirements research. As you can see, the LLM can generate structured data, so how do we parse it?

Try letting LLM output JSON data directly

[Figure]

As the image above shows, we just tell the LLM a fixed JSON output format. The defined JSON data can then be generated in the dialog, and the JSON only needs to be pulled out through regular-expression extraction:

export const parseGptJson = (txt) => {
  // Extract the JSON fragment from the reply text
  const match = txt.match(/\{[\s\S]*\}/);
  if (!match) return null;
  try {
    return JSON.parse(match[0]);
  } catch (e) {
    // The LLM output is not always valid JSON
    return null;
  }
};
           

However, the following problems surfaced during development:

  • Although structured data can be generated, the results are still somewhat generic; the data quality is not high and offers little reference value to users.
  • The LLM's output is not stable enough, and the regex-extraction-plus-JSON.parse approach has a high error rate.
  • The LLM takes a long time to output the complete data, and the long wait causes user anxiety.

Prompt enhancement

During exploration, we tweaked the prompt several times but could never get stable, high-quality content. The prompt for the Lean Canvas:

You are now a startup expert who is proficient with the Lean Canvas. The Lean Canvas is divided into: 1. Problems & user pain points. 2. Customer segments. 3. Unique value proposition. 4. Solution. 5. Channels. 6. Key metrics. 7. Unfair advantage. 8. Cost structure. 9. Revenue analysis. A user is now asking you how to start a business. You must answer point by point (no fewer than 4 points) in Lean Canvas form, and the answer must be very detailed. The fixed answer format is as follows: xxx
           

To improve stability, we took the following measures:

  • Added various qualifiers such as "as detailed as possible" and "no fewer than 4 items per point".
  • Added sample data: by its working principle, an LLM generates answers through inference, so we can show it an ideal example of the data directly and improve its inference quality.

Finally, the entire prompt consists of the following parts:

[Figure]

Design draft disassembly

A minimal master for the design draft

[Figure]

Taking the user journey map as an example, disassembling the design draft and trimming its content yields a minimal design master. This minimal template is combined with the data output by the LLM to produce the final design draft. The whole model consists of the following categories:

  • Column headers (fixed content)
  • Journey modules: Journey 1, Journey 2, Journey 3 (tabular)
  • Name, user goals and expectations

Predefined data for the model

Combining the above, a model needs the following data preparation:

[Figure]
  • The design draft data of the minimal master, obtained through the Figma API
  • The schema.json for the model, customized to each model's characteristics
  • The model's prompt, determined by each model's characteristics
  • Sample data used to enhance the prompt
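One plausible way to bundle these four pieces per model is a plain object like the sketch below. All field names and values are illustrative assumptions, not the article's code.

```javascript
// Hypothetical per-model preparation bundle combining the four items above.
const userJourneyModel = {
  // 1. Minimal master design data, exported once through the Figma API
  master: { type: 'FRAME', name: '用户旅程地图', children: [] },
  // 2. schema.json: the structure the LLM output must follow
  schema: {
    user: { '用户信息': 'x', '用户目标与期望': 'x' },
    ths: ['旅程一'],
    data: { '旅程一': { '用户行为': ['1'] } },
  },
  // 3. Model-specific prompt (placeholder text)
  prompt: '你现在是一个体验设计专家,请用用户旅程地图的方式回答问题。',
  // 4. Sample data appended to the prompt as a few-shot example
  example: { ths: ['旅程一'], data: { '旅程一': { '用户行为': ['打开App'] } } },
};
```

Keeping all four pieces together makes adding a new model a matter of adding one such bundle.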

Design draft data assembly

Output data structure definition

[Figure]

By analyzing each model inductively, we find a large number of commonalities in the model design drafts, which can be divided into the following three kinds of modules:

Fixed output module

Such as titles, column headers, and dividing lines. These define modules that the user will not modify, so no text replacement is needed.

Tabular modules

For example, each stage in the user journey map. This kind of table is described by schema.json as data:

"ths": ["阶段一","阶段二"],
"data": {
  "阶段一": {
    "栏目一": ["内容", "内容"],
    "栏目二": ["内容", "内容"]
  },
  "阶段二": {
    "栏目一": ["内容", "内容"],
    "栏目二": ["内容", "内容"]
  }
}
           

Chunked modules

[Figure]

As in the SWOT analysis, each block of the model is fixed, and only the text content of each block needs processing. Its schema.json is as follows:

"data": {
  "优势": ["xx","xx"],
  "劣势": ["xx","xx"],
  "机会": ["xx","xx"],
  "威胁": ["xx","xx"],
  "通过优势利用机会的策略": ["xx","xx"],
  "利用优势防止威胁的策略": ["xx","xx"],
  "通过机会最小化弱点的策略": ["xx","xx"],
  "将其潜在威胁最小化的策略": ["xx","xx"]
}
           

Design draft data production

[Figure]

The next step is to produce the design draft data for each model. The figure above shows the artboard list of the design draft; the index relationship with the data is established according to this artboard structure.

Take the user journey map as an example, with schema.json as follows:

"schema": {
    "user": {
      "用户信息": "x",
      "用户目标与期望": "x"
    },
    "ths": ["旅程一"],
    "data": {
      "旅程一": {
        "用户行为": ["1"],
        "服务触点": ["1"],
        "用户预期": ["1"],
        "情绪曲线": "中",
        "用户痛点": ["1"],
        "机会点": ["1"]
      }
    }
  },
           

To produce a design draft based on schema.json, we need to:

  • Establish an index between layer names in the design draft and keys in schema.json
  • Output fixed modules directly from the master's design data
  • For the name and the user goals and expectations, find the index and replace the text
  • Create a journey module for each journey: using Journey 1 in the master as a baseline, copy it, offset its position, and recalculate the width of the outermost layer
  • Create one sticky note per returned text in each column (for example, 4 texts under user behavior in Journey 1 yield 4 sticky notes) and handle the position of each note

The assembler needs assembly logic written for each template, but most of the logic is generic, so the cost of adding new templates in the future is very low.
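The journey-module step above, cloning the baseline column, offsetting it, and creating one sticky per text item, can be sketched like this. The node shape, sticky sizing, and offsets are simplified assumptions for illustration.

```javascript
// Sketch of journey-module assembly: clone the baseline (Journey 1) column
// per journey in the schema, offset it, and create stickies per text item.
const STICKY_STEP = 120; // assumed vertical step per table row (sticky + gap)

function buildJourneys(baseline, schema) {
  return schema.ths.map((journeyName, col) => {
    const column = JSON.parse(JSON.stringify(baseline)); // deep-clone Journey 1
    column.name = journeyName;
    column.x = baseline.x + col * baseline.width; // offset each copy horizontally
    // One sticky note per text item, positioned by row (column name) and index
    column.children = Object.entries(schema.data[journeyName]).flatMap(
      ([rowName, texts], row) =>
        (Array.isArray(texts) ? texts : [texts]).map((text, i) => ({
          type: 'STICKY',
          name: rowName,
          characters: text,
          x: column.x + i * 8, // small cascade inside the cell
          y: row * STICKY_STEP + i * 8,
        }))
    );
    return column;
  });
}
```

The outermost frame width would then simply be `baseline.width * schema.ths.length`, which is the "recalculate the width of the outermost layer" step.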

Figma data parsing

[Figure]

The figure above is the flow chart for parsing design draft data into Figma. The core process is:

  • Input the design draft data
  • Traverse the node tree depth-first
  • Determine the node type and create the node
  • Set node properties: position, size, padding, border, etc.

The graphics creation capabilities provided by Figma are documented at https://www.figma.com/plugin-docs/api/api-reference/.
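The depth-first traversal itself can run anywhere if the node factory is injected, which is how the sketch below stays runnable outside Figma. In the real plugin, `createNode` would dispatch by node type to Figma plugin API factories such as `figma.createFrame()` or `figma.createText()`; the factory parameter here is an assumption for illustration.

```javascript
// Sketch of the parser's depth-first traversal with an injected node factory.
function renderTree(node, createNode, parent = null) {
  const created = createNode(node.type, parent); // 1. create by node type
  created.x = node.x || 0;                       // 2. apply common properties
  created.y = node.y || 0;
  if (node.characters !== undefined) created.characters = node.characters;
  // 3. recurse depth-first so children are created inside their parent
  (node.children || []).forEach((child) => renderTree(child, createNode, created));
  return created;
}
```

Property handling in the real parser covers size, padding, borders, and so on; only position and text are shown here.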

Addressing user wait anxiety

Unlike a traditional web API, the response time for the LLM interface's complete data is determined by the amount of data. This application outputs a large amount of text, and a model takes 40-60 seconds to finish responding, which causes user waiting anxiety.

Conversational interaction

[Figure]

The original design displayed the text to the user, which is the mainstream interactive form of LLM conversational applications today. To achieve this, we only need to display the LLM's streamed text output. However, the drawbacks are obvious: nothing is actually rendered in the canvas, and the user cannot interact through the dialog. Text as the primary interaction is therefore incomplete and far from ideal, not an optimal solution.

Progressive rendering

Could we render content while the data is still streaming, like a typewriter effect?

The difficulty is that in the assembly and rendering process we take a standardized data structure and insert it into the canvas in one go, while the data returned during streaming is just a fragment of the final structured data. It looks like this:

// The final JSON data
const data = '你好,以下是头脑风暴\n {"data":{"用户获取":["1","2","3","4","5"],"用户活跃":["1","2","3","4","5"],"用户留存":["1","2","3","4","5"],"获得收益":["1","2","3","4","5"],"推荐传播":["1","2","3","4","5"]}}'

// Example data 1 during streaming
const process1 = '你好,以下是头脑风暴\n {"data":{"用户获取":["1","2","3","4","5"],"用户活跃":["1","2","3","4","5"],"用户留存":["1","2","3","4","5"],"获得收益":["1","2","3","4'

// Example data 2 during streaming
const process2 = '你好,以下是头脑风暴\n {"data":{"用户获取":["1","2","3","4","5"],"用户活跃":["1","2","3","4","5"],"用户留存":["1","2","3","4","5"],"获得收益":[,'
           

As the examples above show, during streaming the data of process1 and process2 needs to be converted into the following standardized JSON:

// Completed data for example 1
const process1Filling = '{"data":{"用户获取":["1","2","3","4","5"],"用户活跃":["1","2","3","4","5"],"用户留存":["1","2","3","4","5"],"获得收益":["1","2","3","4"]}}'

// Completed data for example 2
const process2Filling = '{"data":{"用户获取":["1","2","3","4","5"],"用户活跃":["1","2","3","4","5"],"用户留存":["1","2","3","4","5"]}}'
           

As mentioned earlier, regex extraction plus JSON.parse has a high error rate. To achieve the extraction and completion above, we need to extract and complete the LLM's partial output into standard JSON, making JSON extraction controllable. Then, during the streaming output, a timer periodically runs the design assembly and rendering process.
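One way to wire the timer to the stream is to keep the stream handler dumb (append only) and let a periodic tick re-parse and re-render whatever has arrived. The sketch below is an illustrative assumption; `parsePartialJson` and `render` are injected placeholders, not the article's functions.

```javascript
// Sketch of the throttled progressive-render loop: append chunks to a buffer,
// and every intervalMs re-run extract + complete + render on the buffer.
function createProgressiveRenderer(parsePartialJson, render, intervalMs = 1000) {
  let buffer = '';
  let lastRendered = null;
  const tick = () => {
    const data = parsePartialJson(buffer); // extract + complete partial JSON
    const snapshot = JSON.stringify(data);
    if (data && snapshot !== lastRendered) {
      lastRendered = snapshot; // skip frames where nothing new completed
      render(data);
    }
  };
  const timer = setInterval(tick, intervalMs);
  return {
    onChunk: (chunk) => { buffer += chunk; },        // feed streamed fragments
    finish: () => { clearInterval(timer); tick(); }, // final full render
  };
}
```

Comparing snapshots before rendering avoids re-running the assembly when a tick lands between two completed fragments.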

As shown in the following figure, the entire rendering process is simplified to five frames:

[Figure]

JSON extraction and completion algorithms

[Figure]

The figure above shows the running flow of the whole algorithm. Its core is a finite state automaton that parses the string character by character, assembling and completing it to produce a standardized JSON data format.
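The diagram describes a full character-by-character automaton; a much simpler backtracking strategy conveys the same idea and is sketched below. It tracks bracket and string state, closes whatever is still open, and backs off from the tail until `JSON.parse` succeeds. Note one deliberate simplification: it completes an incomplete array as empty (e.g. `"获得收益":[]`) rather than dropping the key as the examples above do. This is an illustrative assumption, not the article's implementation.

```javascript
// Illustrative JSON extraction + completion: find the first '{', then back
// off from the tail until the fragment, with its open brackets closed, parses.
function completePartialJson(fragment) {
  const start = fragment.indexOf('{');
  if (start === -1) return null;
  const s = fragment.slice(start);
  for (let end = s.length; end > 0; end--) {
    const head = s.slice(0, end);
    // Scan string/bracket state to know which closers are still open
    const closers = [];
    let inString = false;
    let valid = true;
    for (let i = 0; i < head.length; i++) {
      const c = head[i];
      if (inString) {
        if (c === '\\') i++; // skip the escaped character
        else if (c === '"') inString = false;
      } else if (c === '"') inString = true;
      else if (c === '{') closers.push('}');
      else if (c === '[') closers.push(']');
      else if (c === '}' || c === ']') {
        if (closers.pop() !== c) { valid = false; break; }
      }
    }
    if (!valid || inString) continue; // cut landed inside a string: back off
    try {
      // Drop a dangling comma, close what is open, and try to parse
      return JSON.parse(head.replace(/,\s*$/, '') + closers.reverse().join(''));
    } catch (e) { /* incomplete literal at the cut: keep backing off */ }
  }
  return null;
}
```

The backtracking loop is O(n²) in the worst case; the automaton in the figure achieves the same result in a single pass, which matters when the tick runs every second on a growing buffer.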

Example of an analytic model

[Figure]

Summary

In this article we detailed the functional disassembly and implementation ideas behind an AI visual analysis model, and shared the problems we encountered when generating structured data with an LLM along with their solutions. We hope these lessons help peers working in this field better understand and master these techniques, and that they serve as a clear, practical reference for subsequent work.

Author: ivring

Source: WeChat public account: Tencent Technology Engineering

Source: https://mp.weixin.qq.com/s/HrxQtfc8j-zD9kMRGhTn6w