1. **`collectStrings(x, out = [])` Function:**
    * **Purpose:** A recursive helper that traverses any JavaScript value (`x`), whether an object, array, or primitive, and collects *every* string it finds into a single flat array.
    * **Logic:**
        * If `x` is `null` or `undefined`, it returns the current `out` array unchanged.
        * If `x` is a string, it pushes it onto `out`.
        * If `x` is an array, it iterates over the elements and calls `collectStrings` recursively on each one.
        * If `x` is an object (and not an array or `null`), it takes `Object.values(x)` (the values of its properties) and calls `collectStrings` recursively on each value.
        * Other primitives (numbers, booleans) match none of these cases and contribute nothing.
    * **Why it’s needed:** This is the “catch-all” mechanism. If an AI model’s output has an unexpected structure, or does not match the specific fields checked later, this function still gathers *any* string content present so it can be used as a last resort.
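The four cases above can be sketched as follows. This is a minimal reconstruction from the description, not the original source, and the `sample` payload is made up for illustration.

```javascript
// Sketch of the recursive collector described above (reconstructed, not the original code).
function collectStrings(x, out = []) {
  // null / undefined: nothing to collect
  if (x === null || x === undefined) return out;

  if (typeof x === 'string') {
    out.push(x);
  } else if (Array.isArray(x)) {
    // Recurse into every element, sharing the same accumulator
    for (const el of x) collectStrings(el, out);
  } else if (typeof x === 'object') {
    // Recurse into every property value; keys are ignored
    for (const v of Object.values(x)) collectStrings(v, out);
  }
  // Numbers, booleans, etc. fall through untouched
  return out;
}

// Hypothetical payload, for illustration only
const sample = { a: 'hello', b: [1, 'world', { c: null, d: 'nested' }], e: 42 };
console.log(collectStrings(sample));
// → [ 'hello', 'world', 'nested' ]
```

Strings come out in traversal order (property order, then array order), which is why the later fallback can simply join them.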
2. **`extractTitleBody(raw)` Function:**
    * **Purpose:** Takes a raw string (potentially containing the AI’s output) and attempts to parse it into distinct `title`, `title2`, and `body` parts.
    * **Initial Cleaning:**
        * Converts `raw` to a string `s` (so `null`/`undefined` become an empty string).
        * Removes common prefixes such as “part 1”, “parts 2”, or “text:” (matched case-insensitively).
        * Trims leading and trailing whitespace.
        * Returns empty fields if the string is empty after cleaning.
    * **Regex-based Extraction (Preferred):**
        * It uses regular expressions to find the labels `Title:`, `Title 2:`, and `Content:` within the cleaned string `s`.
        * `/(?:^|\n)\s*Title:\s*(.+?)\s*(?:\r?\n|$)/i`: Looks for “Title:” at the start of the string or after a newline and captures the rest of that line (up to the next newline or the end of the string), case-insensitively.
        * `/(?:^|\n)\s*Title\s*2:\s*(.+?)\s*(?:\r?\n|$)/i`: The same for “Title 2:”.
        * `/(?:^|\n)\s*Content:\s*([\s\S]+)/i`: Looks for “Content:” and captures *everything* that follows, newlines included, to the end of the string.
        * If a match is found, the captured group (e.g., `mTitle[1]`) is trimmed and assigned to the respective variable.
    * **Fallback Logic (If no regex matches):**
        * If `title`, `title2`, and `body` are all still empty after the regex attempts, it falls back to line-by-line parsing:
            * Splits the string into non-empty lines.
            * The first line becomes `title`.
            * The second line becomes `title2`.
            * All remaining lines are joined back with newlines to form `body`.
    * **Default Title:** If `title` is still empty after all attempts, it defaults to `'Untitled'`.
    * **Returns:** An object `{ title, title2, body }`.
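Put together, the parsing steps above might look like the sketch below. The three extraction regexes are the ones quoted in the text. The prefix-stripping pattern is an assumption, since the description only names example prefixes and does not show the actual expression.

```javascript
// Sketch of the parser described above (reconstructed, not the original code).
function extractTitleBody(raw) {
  // Coerce to string; null/undefined become ''
  let s = String(raw ?? '');

  // ASSUMPTION: the real prefix pattern is not shown in the text. This one
  // strips a leading "part 1", "parts 2", or "text:" label, case-insensitively.
  s = s.replace(/^\s*(?:parts?\s*\d+\s*:?|text\s*:)\s*/i, '').trim();
  if (!s) return { title: '', title2: '', body: '' };

  let title = '';
  let title2 = '';
  let body = '';

  // Preferred path: explicit labels
  const mTitle = s.match(/(?:^|\n)\s*Title:\s*(.+?)\s*(?:\r?\n|$)/i);
  const mTitle2 = s.match(/(?:^|\n)\s*Title\s*2:\s*(.+?)\s*(?:\r?\n|$)/i);
  const mBody = s.match(/(?:^|\n)\s*Content:\s*([\s\S]+)/i);
  if (mTitle) title = mTitle[1].trim();
  if (mTitle2) title2 = mTitle2[1].trim();
  if (mBody) body = mBody[1].trim();

  // Fallback: no label matched at all, so split by lines
  if (!title && !title2 && !body) {
    const lines = s.split(/\r?\n/).map(l => l.trim()).filter(Boolean);
    title = lines[0] || '';
    title2 = lines[1] || '';
    body = lines.slice(2).join('\n');
  }

  if (!title) title = 'Untitled';
  return { title, title2, body };
}

console.log(extractTitleBody('Title: Hello\nTitle 2: Sub\nContent: Line one\nLine two'));
// → { title: 'Hello', title2: 'Sub', body: 'Line one\nLine two' }
```

Note that the fallback only runs when none of the three labels matched. A string with a `Content:` label but no `Title:` keeps its body and gets the `'Untitled'` default instead of being split by lines.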
3. **Main Execution Block (`return items.map(item => { … });`)**
    * This is the core logic that runs for each item coming into the Function node.
    * **Collect All Strings:** `const strings = collectStrings(item.json);` gathers every string from the entire `item.json` object, preparing the ultimate fallback.
    * **Define Content Candidates:** It creates an array `candidates` listing common paths where the main text content might be found, in order of preference:
        1. `item.json?.text`
        2. `item.json?.output_text`
        3. `item.json?.parts?.[0]?.text` (common in some older AI outputs)
        4. `item.json?.candidates?.[0]?.content?.parts?.map(p => p?.text || '').join('\n')` (specific to the Vertex AI Gemini output format)
        5. `strings.join('\n\n')` (the fallback using all collected strings, separated by blank lines for readability)
    * **Select Raw Content:**
        * `.filter(Boolean)` removes every falsy entry from `candidates`: not only `null` and `undefined` but also empty strings.
        * `const raw = candidates[0] || '';` selects the *first* remaining candidate as the string to be processed by `extractTitleBody`. If `candidates` is empty, `raw` becomes an empty string.
    * **Extract Title/Body:** `const { title, title2, body } = extractTitleBody(raw);` calls the parsing function with the selected `raw` string.
    * **Return Result:** It returns a new object for the n8n workflow, where the `json` property contains the extracted `title`, `title2`, and `body`. It also includes `content: body` for compatibility with other nodes that might expect a `content` field.
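The candidate selection and the output shape can be tried outside n8n with the sketch below. The helper names `pickRawText` and `toOutputItem` are hypothetical, introduced here so the logic can run standalone; in the script described above the same steps happen inline inside `items.map(...)`.

```javascript
// ASSUMPTION: `pickRawText` is a hypothetical wrapper around the inline
// candidate logic. `strings` stands in for collectStrings(item.json).
function pickRawText(json, strings = []) {
  const candidates = [
    json?.text,
    json?.output_text,
    json?.parts?.[0]?.text,
    json?.candidates?.[0]?.content?.parts?.map(p => p?.text || '').join('\n'),
    strings.join('\n\n'), // last resort: every string found in the item
  ].filter(Boolean); // drops undefined, null, and '' alike

  return candidates[0] || '';
}

// ASSUMPTION: `toOutputItem` is a hypothetical helper showing the returned
// shape. `parsed` stands in for the result of extractTitleBody(raw).
function toOutputItem(parsed) {
  const { title, title2, body } = parsed;
  return { json: { title, title2, body, content: body } };
}

// Gemini-style payload (illustrative data)
const geminiLike = {
  candidates: [{ content: { parts: [{ text: 'Title: A' }, { text: 'Content: B' }] } }],
};
console.log(JSON.stringify(pickRawText(geminiLike)));
// → "Title: A\nContent: B"

// An empty `text` field is skipped, so the joined-strings fallback wins
console.log(JSON.stringify(pickRawText({ text: '' }, ['x', 'y'])));
// → "x\n\ny"
```

Because `.filter(Boolean)` also discards empty strings, a present-but-empty `text` field does not block the lower-priority candidates.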
In summary, this script standardizes AI model output by parsing its text into `title`, `title2`, and `body` fields. It copes with varying output formats through layered fallbacks: a prioritized list of known response paths, labelled-field regexes, line-by-line splitting, and finally a join of every string found in the item.