What Changed in v2.1
The v2.0 prompt architecture used a single-pass scoring approach where all five dimensions were scored in one LLM call. This created cross-dimension contamination: for example, the reasoning produced while scoring Neuroticism influenced how the model framed Agreeableness.
v2.1 changes:
1. Dimension isolation: each Big Five dimension is scored in a separate LLM call with no prior dimension context (see the sketch after this list).
2. Chain-of-thought anchoring: the model is required to identify specific textual evidence before assigning a score.
3. Confidence calibration: the model outputs a confidence score (0–1) alongside each dimension score; low-confidence outputs are flagged for human review.
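A minimal sketch of how this pipeline could be wired up, assuming an OpenAI-style chat-completion client. The model name, confidence threshold, JSON reply format, and function names are illustrative assumptions, not the project's actual implementation:

```python
import json
from openai import OpenAI  # assumed client; any chat-completion API would work

client = OpenAI()
DIMENSIONS = ["Openness", "Conscientiousness", "Extraversion", "Agreeableness", "Neuroticism"]
CONFIDENCE_THRESHOLD = 0.6  # illustrative cutoff for flagging human review

def score_dimension(dimension: str, participant_text: str) -> dict:
    """Score one Big Five dimension in its own call, with no other dimension in context."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": "You are a psychometric scoring assistant..."},
            {"role": "user", "content": (
                f"[DIMENSION]: {dimension}\n"
                f"Text to score: {participant_text}\n"
                "Step 1: Identify evidence. Step 2: Assign a 1-5 score. "
                "Step 3: Give a 0-1 confidence. "
                'Reply as JSON: {"evidence": ..., "score": ..., "confidence": ...}'
            )},
        ],
    )
    return json.loads(response.choices[0].message.content)

def score_all(participant_text: str) -> list[dict]:
    """Run each dimension as an isolated call and flag low-confidence outputs."""
    results = []
    for dim in DIMENSIONS:
        result = score_dimension(dim, participant_text)  # fresh call per dimension
        result["dimension"] = dim
        result["needs_review"] = result["confidence"] < CONFIDENCE_THRESHOLD
        results.append(result)
    return results
```

Because each dimension gets its own call, the per-dimension prompts stay short and no earlier dimension's chain of thought can leak into a later score.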
Performance Impact
On our internal benchmark (n = 200 scored responses), v2.1 achieved:
Prompt Template
The full prompt template is available in the project repository. Key structure:
```
System: You are a psychometric scoring assistant...
User: [DIMENSION]: {dimension_name}
Anchor low (1): {low_anchor_example}
Anchor high (5): {high_anchor_example}
Text to score: {participant_text}
Step 1: Identify evidence...
Step 2: Assign score...
Step 3: Confidence...
```
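For illustration, here is how the placeholders might be filled for a single dimension. The anchor sentences and participant text below are invented examples; the real calibrated anchors live in the project repository:

```python
# Hypothetical rendering of the template; anchor texts are invented examples.
TEMPLATE = (
    "[DIMENSION]: {dimension_name}\n"
    "Anchor low (1): {low_anchor_example}\n"
    "Anchor high (5): {high_anchor_example}\n"
    "Text to score: {participant_text}\n"
    "Step 1: Identify evidence...\n"
    "Step 2: Assign score...\n"
    "Step 3: Confidence..."
)

prompt = TEMPLATE.format(
    dimension_name="Extraversion",
    low_anchor_example="I usually keep to myself at gatherings.",
    high_anchor_example="I'm always the one starting conversations in a crowd.",
    participant_text="I enjoy meeting new people but need quiet evenings to recharge.",
)
print(prompt)
```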
Next Version
v2.2 will explore multi-turn dialogue for ambiguous responses.