This n8n Function node script is designed to process incoming items (likely AI model outputs) and extract `title`, `title2`, and `body` (also mapped to `content`) fields from them, even if the input structure is complex or inconsistent.


1. **`collectStrings(x, out = [])` Function:**
    * **Purpose:** This is a recursive helper function to find and collect *every* string value present anywhere within a given JavaScript value `x` (which could be an object, array, or string).
    * **How it works:**
        * If `x` is `null`, it does nothing.
        * If `x` is a `string`, it adds it to the `out` array.
        * If `x` is an `array`, it iterates over each element and recursively calls `collectStrings` on that element.
        * If `x` is an `object`, it iterates over all its property values (`Object.values(x)`) and recursively calls `collectStrings` on each value.
    * **Why it’s useful:** It acts as a comprehensive fallback. If the expected fields (`text`, `output_text`, etc.) are not found or are empty, this function ensures that any string content generated by an AI model (which might be deeply nested or in an unexpected location) is captured.
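
The traversal described in this item can be sketched as follows. This is a minimal reconstruction from the description, not the script's actual source: returning `out` from every branch and treating `undefined` the same as `null` are assumptions, and the `sample` object is made up for illustration.

```javascript
// Sketch of the recursive string collector (reconstructed from the description).
function collectStrings(x, out = []) {
  if (x == null) return out; // null (and, by assumption, undefined): nothing to collect
  if (typeof x === 'string') {
    out.push(x);
  } else if (Array.isArray(x)) {
    for (const el of x) collectStrings(el, out);
  } else if (typeof x === 'object') {
    for (const value of Object.values(x)) collectStrings(value, out);
  }
  return out; // numbers, booleans and other primitives are skipped
}

// Hypothetical input with strings at several nesting levels.
const sample = { a: 'one', b: [2, 'two', { c: 'three' }], d: null };
console.log(collectStrings(sample)); // [ 'one', 'two', 'three' ]
```

Strings come back in traversal order, so joining them preserves the order in which they appear in the input object.
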
2. **`extractTitleBody(raw)` Function:**
    * **Purpose:** This function takes a raw string and attempts to parse it into distinct `title`, `title2`, and `body` components.
    * **Preprocessing:**
        * It first ensures the input `raw` is a string and handles `null`/`undefined` gracefully.
        * It removes common prefixes like “part 1”, “parts 2”, or “text : ” from the beginning of the string to clean it up.
    * **Regex-based Extraction:**
        * It uses regular expressions to look for specific patterns:
            * `Title: (…)` to extract the main title.
            * `Title 2: (…)` to extract a secondary title.
            * `Content: (…)` to extract the main body/content.
        * These patterns are case-insensitive and handle optional newlines before the markers.
    * **Fallback Logic (if regex fails):**
        * If none of the specific “Title:”, “Title 2:”, “Content:” markers are found, it resorts to a simpler heuristic:
            * It splits the `raw` string into lines.
            * The first non-empty line becomes the `title`.
            * The second non-empty line becomes the `title2`.
            * All subsequent lines are joined together to form the `body`.
    * **Default Title:** If `title` is still empty after all attempts, it defaults to ‘Untitled’.
    * **Returns:** An object `{ title, title2, body }`.
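
A rough sketch of this extraction logic is shown below. The regular expressions are assumptions chosen to match the behavior described here; the original script's exact patterns are not shown. Dropping blank lines in the fallback branch is also an assumption.

```javascript
// Sketch of the marker-based extraction with a line-based fallback.
function extractTitleBody(raw) {
  // Coerce to string; null and undefined become ''.
  let text = String(raw ?? '').trim();
  // Strip a leading "part 1", "parts 2" or "text :" prefix (assumed pattern).
  text = text.replace(/^(?:parts?\s*\d+\s*:?|text\s*:)\s*/i, '');

  // Case-insensitive markers, allowed at the start of the text or of any line.
  const titleMatch = text.match(/(?:^|\n)[ \t]*Title[ \t]*:[ \t]*(.+)/i);
  const title2Match = text.match(/(?:^|\n)[ \t]*Title[ \t]*2[ \t]*:[ \t]*(.+)/i);
  const contentMatch = text.match(/(?:^|\n)[ \t]*Content[ \t]*:[ \t]*([\s\S]+)/i);

  let title = titleMatch ? titleMatch[1].trim() : '';
  let title2 = title2Match ? title2Match[1].trim() : '';
  let body = contentMatch ? contentMatch[1].trim() : '';

  // Fallback: no marker found, so use line positions instead.
  if (!titleMatch && !title2Match && !contentMatch) {
    const lines = text.split(/\r?\n/).map(line => line.trim()).filter(Boolean);
    title = lines[0] || '';
    title2 = lines[1] || '';
    body = lines.slice(2).join('\n');
  }

  if (!title) title = 'Untitled';
  return { title, title2, body };
}

console.log(extractTitleBody('part 1: Title: Hello\nTitle 2: World\nContent: Body text'));
console.log(extractTitleBody('First line\n\nSecond line\nThird\nFourth'));
```

The first call exercises the marker path and the second the line-based fallback, where blank lines are skipped before the positions are assigned.
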
3. **Main n8n Function Node Logic:**
    * **`return items.map(item => { … });`**: This is the standard structure for an n8n Function node, where it processes each `item` coming into the node.
    * **Collect All Strings:** `const strings = collectStrings(item.json);` It starts by using the `collectStrings` function to gather every string within the current item’s JSON data (`item.json`).
    * **Candidate Selection:** It creates an array `candidates` of potential sources for the `raw` text, prioritizing common locations for AI model output:
        1. `item.json?.text`
        2. `item.json?.output_text`
        3. `item.json?.parts?.[0]?.text` (common in some chat/text generation APIs)
        4. `item.json?.candidates?.[0]?.content?.parts?.map(p => p?.text || '').join('\n')` (specifically designed for the structure of Vertex AI Gemini responses).
        5. `strings.join('\n\n')` (The ultimate fallback: all collected strings from `collectStrings` joined with double newlines).
    * **Determine `raw` text:** `const raw = candidates[0] || '';` It takes the first entry of `candidates` as the `raw` input for extraction. That entry is the *first non-empty* candidate only if empty and `undefined` values have already been filtered out of the array (for example with `.filter(Boolean)`); the quoted line alone does not skip them. If no candidate remains, `raw` will be an empty string.
    * **Extract Fields:** `const { title, title2, body } = extractTitleBody(raw);` It calls the `extractTitleBody` function with the determined `raw` text to get the structured data.
    * **Output Transformation:**
        * It returns a new item with a `json` property.
        * This `json` property contains the extracted `title`, `title2`, and `body`, and also exposes `body` under the name `content` for downstream nodes that expect that field.
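
The candidate selection and output shape can be sketched in isolation as below. The helper names `pickRaw` and `toOutputItem` are hypothetical and introduced only for illustration; the explicit filter for empty strings is an assumption about how the script ends up with the first non-empty candidate. In the real node, `strings` would come from `collectStrings(item.json)`.

```javascript
// Hypothetical helper: choose the first non-empty candidate text for one item.
function pickRaw(json, strings) {
  const candidates = [
    json?.text,
    json?.output_text,
    json?.parts?.[0]?.text,
    // Gemini-style response: join the text of every part of the first candidate.
    json?.candidates?.[0]?.content?.parts?.map(p => p?.text || '').join('\n'),
    // Last resort: every string found anywhere in the item.
    strings.join('\n\n'),
  ].filter(c => typeof c === 'string' && c.trim() !== ''); // assumed filtering step
  return candidates[0] || '';
}

// Hypothetical helper: build the output item, exposing body as content too.
function toOutputItem({ title, title2, body }) {
  return { json: { title, title2, body, content: body } };
}

// Made-up Gemini-shaped payload for illustration.
const geminiLike = {
  candidates: [{ content: { parts: [{ text: 'Title: A' }, { text: 'Content: B' }] } }],
};
console.log(pickRaw(geminiLike, ['unused fallback']));
console.log(toOutputItem({ title: 'A', title2: '', body: 'B' }));
```

Because the fallback string is always last in the array, it is used only when none of the known field locations yields text.
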

**In essence, this script parses AI model outputs (or any unstructured text) into a standardized item with `title`, `title2`, and `body`/`content` fields. It tolerates several input structures and falls back to simpler heuristics when the expected fields or markers are missing.**
