This is a well-structured and robust n8n Function node script. It effectively addresses the common challenge of extracting structured data (title, body) from potentially varied and unstructured AI outputs.

### 1. `collectStrings` Function
* **Purpose**: Recursively collects all string values from a nested JavaScript object or array.
* **Correctness**:
    * Handles `null` and `undefined` inputs gracefully.
    * Correctly identifies and pushes strings.
    * Recurses into arrays using `forEach`.
    * Recurses into objects using `Object.values().forEach()`, which is appropriate for collecting *all* values regardless of key.
* **Edge Cases**: `null`, empty arrays, and empty objects are handled correctly. Non-string primitives (numbers, booleans) are ignored, which is the desired behavior for a function named `collectStrings`.
* **Readability**: Clear, concise, and easy to understand.
* **Efficiency**: For typical JSON structures found in n8n, recursion depth and performance should not be an issue.
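
The behavior described above can be sketched as follows. This is a hypothetical reconstruction based only on the points in this review, not the reviewed script itself; the `out` accumulator parameter and the `sample` object are assumptions added for illustration.

```javascript
// Hypothetical reconstruction of collectStrings from the behavior described
// in this review; the original implementation may differ in detail.
function collectStrings(value, out = []) {
  if (value === null || value === undefined) return out; // nothing to collect
  if (typeof value === 'string') {
    out.push(value);
  } else if (Array.isArray(value)) {
    value.forEach((v) => collectStrings(v, out));
  } else if (typeof value === 'object') {
    // Keys are ignored on purpose: every value is visited.
    Object.values(value).forEach((v) => collectStrings(v, out));
  }
  // Numbers, booleans, and other primitives fall through and are ignored.
  return out;
}

// Illustrative input (assumed shape, not taken from a real AI response).
const sample = {
  id: 7,
  text: 'Title: Hello',
  parts: [{ text: 'Content: World' }, null, 42],
  meta: { ok: true, note: 'n/a' },
};

const collected = collectStrings(sample);
console.log(collected); // → [ 'Title: Hello', 'Content: World', 'n/a' ]
```

Strings come back in depth-first, insertion order, which is what makes a later "join everything" fallback read in a sensible sequence.
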
### 2. `extractTitleBody` Function
* **Purpose**: Parses a raw string to extract `title`, `title2`, and `body` based on specific patterns or fallback heuristics.
* **Robust Input Handling**: `String(raw || '')` is excellent for ensuring `raw` is always a string and defaults to empty if `null`/`undefined`.
* **Initial Cleaning**:
    * `.replace(/^parts?\s*\d+\s*/i, '')`: Clears common prefixes like “part 1”, “parts 2”. Very useful for AI outputs.
    * `.replace(/^\s*text\s*:\s*/i, '')`: Clears a “text:” prefix.
    * `.trim()`: Essential for clean parsing.
* **Early Exit**: `if (!s) return { title: '', title2: '', body: '' };` is good for empty inputs.
* **Regex Extraction**:
    * Uses distinct regexes for `Title:`, `Title 2:`, and `Content:`.
    * `(.+?)` for titles ensures non-greedy matching.
    * `([\s\S]+)` for body is a common pattern to match any character including newlines.
    * `(?:^|\n)` and `(?:\r?\n|$)` anchors ensure these patterns are matched at the beginning/end of lines, treating them as distinct sections.
    * **Strength**: This approach allows the different sections to be present in any order or for some to be missing, as each regex is run against the original cleaned string `s`.
* **Fallback Logic**:
    * `if (!title && !title2 && !body)`: This is a critical and excellent fallback. If no structured `Title:`, `Title 2:`, or `Content:` is found, it falls back to a simple line-by-line parsing.
    * `s.split(/\r?\n/).filter(l => l.trim() !== '')`: Effectively gets non-empty lines.
    * `lines.shift().trim()`: Assigns the first line to `title`, second to `title2`.
    * `lines.join('\n').trim()`: Assigns the rest to `body`.
    * **Strength**: This makes the function very resilient, handling both highly structured and very simple text inputs.
* **Default Title**: `if (!title) title = 'Untitled';` ensures that a title is always provided, which is good for consistency.
* **Readability**: The two-stage parsing (regex first, then fallback) is clear and logical.
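
The two-stage flow described in this section can be sketched as below. This is a reconstruction under stated assumptions, not the reviewed code: the cleaning calls, the early exit, the fallback, and the default title are taken from the snippets quoted above, while the exact regex literals (including the `i` flag and the `\s*` padding) are assumed, since the review only quotes their capture groups and anchors.

```javascript
// Hypothetical reconstruction of extractTitleBody; regex literals are assumed.
function extractTitleBody(raw) {
  // Stage 0: normalize to a string and strip common AI-output prefixes.
  const s = String(raw || '')
    .replace(/^parts?\s*\d+\s*/i, '')
    .replace(/^\s*text\s*:\s*/i, '')
    .trim();
  if (!s) return { title: '', title2: '', body: '' };

  // Stage 1: each labelled section is matched independently against s.
  const titleMatch = s.match(/(?:^|\n)\s*Title\s*:\s*(.+?)\s*(?:\r?\n|$)/i);
  const title2Match = s.match(/(?:^|\n)\s*Title\s*2\s*:\s*(.+?)\s*(?:\r?\n|$)/i);
  const bodyMatch = s.match(/(?:^|\n)\s*Content\s*:\s*([\s\S]+)/i);

  let title = titleMatch ? titleMatch[1].trim() : '';
  let title2 = title2Match ? title2Match[1].trim() : '';
  let body = bodyMatch ? bodyMatch[1].trim() : '';

  // Stage 2: no labels found at all, so fall back to line positions.
  if (!title && !title2 && !body) {
    const lines = s.split(/\r?\n/).filter((l) => l.trim() !== '');
    if (lines.length) title = lines.shift().trim();
    if (lines.length) title2 = lines.shift().trim();
    body = lines.join('\n').trim();
  }

  if (!title) title = 'Untitled';
  return { title, title2, body };
}

// Structured input: labels drive the extraction.
console.log(extractTitleBody('Part 1 Title: Hello\nTitle 2: Sub\nContent: Line A\nLine B'));
// → { title: 'Hello', title2: 'Sub', body: 'Line A\nLine B' }

// Unlabelled input: the line-based fallback takes over.
console.log(extractTitleBody('First line\n\nSecond line\nThird\nFourth'));
// → { title: 'First line', title2: 'Second line', body: 'Third\nFourth' }
```

Note that the early exit returns an empty `title`, while an input with content but no title gets `'Untitled'`; downstream nodes can use that difference to tell "no input" from "input without a heading".
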
### 3. Main n8n Logic (`items.map` block)
* **Comprehensive String Collection**: `collectStrings(item.json)` is a smart way to ensure no text is missed, especially from complex or unknown AI output formats.
* **Prioritized Candidate Fields**:
    * The `candidates` array correctly lists common places to find the primary text (e.g., `text`, `output_text`, `parts[0].text`).
    * It specifically targets Vertex AI/Gemini style output (`item.json?.candidates?.[0]?.content?.parts?.map(p => p?.text || '').join('\n')`), which is a great addition for modern AI integrations.
    * The final fallback to `strings.join('\n\n')` ensures *some* content is always captured if structured fields fail.
    * `.filter(Boolean)` removes any `null`, `undefined`, or empty-string candidates, so `candidates[0]` is the highest-priority *non-empty* string. A whitespace-only string is still truthy and would pass the filter.
* **Output Structure**:
    * Returning `{ json: { title, title2, body, content: body } }` provides both `body` and `content` fields, which is a good practice for flexibility in downstream nodes that might expect either name.
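
The candidate prioritization described above might look roughly like this. It is a sketch under assumptions: `pickRawText` is a hypothetical helper name, the exact property paths for `text`, `output_text`, and `parts[0].text` are guessed from the field names mentioned in the review, the compact `collectStrings` is a stand-in to keep the block self-contained, and the `items` array is stubbed here because n8n normally injects it.

```javascript
// Compact stand-in for collectStrings (hypothetical, for self-containment only).
const collectStrings = (v, out = []) => {
  if (typeof v === 'string') out.push(v);
  else if (v && typeof v === 'object') Object.values(v).forEach((x) => collectStrings(x, out));
  return out;
};

// Hypothetical helper: return the highest-priority non-empty text for one item.
function pickRawText(json) {
  const strings = collectStrings(json);
  const candidates = [
    json?.text,
    json?.output_text,
    json?.parts?.[0]?.text,
    // Vertex AI / Gemini style: join every part of the first candidate.
    json?.candidates?.[0]?.content?.parts?.map((p) => p?.text || '').join('\n'),
    // Last resort: every string found anywhere in the payload.
    strings.join('\n\n'),
  ].filter(Boolean); // drops undefined, null, and '' entries
  return candidates[0] || '';
}

// Stub for the `items` array that n8n provides inside a Function node.
const items = [
  { json: { candidates: [{ content: { parts: [{ text: 'Title: A' }, { text: 'Content: B' }] } }] } },
  { json: { output_text: 'plain output' } },
  { json: { nested: { deep: ['only', 'strings'] } } },
];

const picked = items.map((item) => pickRawText(item.json));
console.log(picked);
// → [ 'Title: A\nContent: B', 'plain output', 'only\n\nstrings' ]
```

In the node itself, each picked string would then be passed through `extractTitleBody` and returned in the `{ json: { title, title2, body, content: body } }` shape noted above.
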
### Overall Strengths
* **Robustness**: Handles a wide variety of input structures and gracefully falls back when specific patterns aren’t found.
* **Flexibility**: Caters to different AI output formats (e.g., general text, Gemini/Vertex specific).
* **Clarity**: Code is well-organized into functions with clear responsibilities. Variable names are descriptive.
* **Maintainability**: Easy to understand and modify if new input formats or parsing rules are needed.
* **n8n Integration**: Well suited for an n8n Function node, transforming raw AI output into a usable structured format.
### Minor Considerations (Not Necessarily Improvements, but Points to Note)
* **Regex Specificity**: `([\s\S]+)` for `Content:` is greedy and runs to the end of the string. If `Content:` could be followed by a `Title:` or `Title 2:` line that should not end up in the body, a bounded pattern for `Content:` (e.g., `([\s\S]+?)(?=\n\s*(?:Title:|Title 2:)|$)`) would be needed. Because the regexes are all run independently against the *original* string `s`, the titles are still extracted in that case, but `body` would also contain those trailing lines. The current approach is simpler and likely sufficient for the intended use case, where `Title`, `Title 2`, and `Content` are distinct top-level sections and `Content` comes last.
* **Performance for Extremely Large Inputs**: `collectStrings` is recursive, so an extremely deeply nested object could exceed the JavaScript call stack limit and throw a `RangeError`. For typical n8n payloads, this is highly unlikely to be a concern. The multiple `match` calls in `extractTitleBody` are also fine for typical text lengths.
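
Both considerations can be illustrated with a short sketch. The first half compares a greedy `Content:` pattern with the bounded variant mentioned above on an assumed sample string; the second half shows one possible stack-based alternative to recursion. The name `collectStringsIterative`, the sample input, and the surrounding regex prefix are all assumptions for illustration, not part of the reviewed script.

```javascript
// Assumed sample where Content: appears before a late Title: line.
const text = 'Content: Body A\nBody B\nTitle: Late title';

// Greedy body capture: runs to the end of the string.
const greedy = text.match(/(?:^|\n)\s*Content\s*:\s*([\s\S]+)/i);
// Bounded capture: lazy match that stops before a following Title: / Title 2: line.
const bounded = text.match(
  /(?:^|\n)\s*Content\s*:\s*([\s\S]+?)(?=\n\s*(?:Title:|Title 2:)|$)/i
);

console.log(JSON.stringify(greedy[1]));  // "Body A\nBody B\nTitle: Late title"
console.log(JSON.stringify(bounded[1])); // "Body A\nBody B"

// Hypothetical iterative variant of collectStrings: an explicit stack replaces
// the call stack, so nesting depth is limited by memory rather than by recursion.
function collectStringsIterative(root) {
  const out = [];
  const stack = [root];
  while (stack.length) {
    const v = stack.pop();
    if (typeof v === 'string') {
      out.push(v);
    } else if (v && typeof v === 'object') {
      // Push children in reverse so they are popped in their original order.
      const children = Object.values(v);
      for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
    }
  }
  return out;
}

// Build a very deep chain of objects around a single string.
let deep = 'leaf';
for (let i = 0; i < 100000; i++) deep = { child: deep };
console.log(collectStringsIterative(deep)); // → [ 'leaf' ]
```

Neither change is required for typical payloads; they are only worth adopting if out-of-order sections or pathologically deep inputs actually show up in practice.
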
### Conclusion
This is an excellent piece of code. It demonstrates strong programming practices, anticipates various input scenarios, and provides a highly effective solution for a common problem in AI automation workflows. No significant changes are recommended.
