Multi-Model Essay Grading

AI ↔ AI

Three LLMs grade an essay in parallel; the scores are aggregated and checked for bias before a final grade is issued.

7 nodes · 9 edges · education
Essay Intake · system

Receive submitted essay with rubric, anonymize student identity.

parallel → Grader A (GPT-4o)
parallel → Grader B (Claude)
parallel → Grader C (Gemini)

Grader A (GPT-4o) · agent

Score essay on structure, argument quality, evidence use, and writing clarity.

parallel → Score Aggregation

Grader B (Claude) · agent

Score essay independently with the identical rubric, blind to the other graders' scores.

parallel → Score Aggregation

Grader C (Gemini) · agent

Score essay independently as third grader for robust consensus.

parallel → Score Aggregation

Score Aggregation · system

Compute weighted average, flag if any grader deviates more than 15% from mean.

sequential → Bias Detection Agent

Bias Detection Agent · agent

Analyze score patterns for demographic bias, topic bias, or length bias.

conditional → Final Grade & Feedback
fallback → Grader A (GPT-4o)

Final Grade & Feedback · agent

Issue final grade with synthesized feedback from all graders and improvement suggestions.
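
The Score Aggregation node is described only in one sentence, so the following is a minimal Python sketch of one way it might work, not the workflow's actual implementation. It assumes that "deviates more than 15% from mean" means a relative deviation from the unweighted mean of the three scores, and that the weights are equal by default; the names `aggregate_scores` and `DEFAULT_WEIGHTS` are hypothetical.

```python
from statistics import mean

# Hypothetical weights: the workflow asks for a "weighted average"
# but does not state the weights, so equal weights are assumed here.
DEFAULT_WEIGHTS = {"grader_1": 1.0, "grader_2": 1.0, "grader_3": 1.0}


def aggregate_scores(scores, weights=DEFAULT_WEIGHTS, threshold=0.15):
    """Combine per-grader scores and flag outliers.

    Assumption: a grader is flagged when |score - mean| / mean exceeds
    `threshold`, where `mean` is the unweighted mean of all scores.
    """
    if not scores:
        raise ValueError("at least one score is required")

    total_weight = sum(weights[g] for g in scores)
    weighted_avg = sum(s * weights[g] for g, s in scores.items()) / total_weight

    plain_mean = mean(scores.values())
    flagged = sorted(
        g for g, s in scores.items()
        if plain_mean and abs(s - plain_mean) / plain_mean > threshold
    )
    return {"weighted_average": weighted_avg, "mean": plain_mean, "flagged": flagged}


if __name__ == "__main__":
    result = aggregate_scores({"grader_1": 80, "grader_2": 82, "grader_3": 60})
    print(result["weighted_average"], result["flagged"])
```

With scores of 80, 82, and 60, the mean is 74 and only the third grader is more than 15% away from it, so `flagged` contains `grader_3`. An absolute threshold (for example, 15 points on a 100-point scale) would be an equally plausible reading of the description.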

uc-multi-model-grading.osop.yaml
osop_version: "1.0"
id: "multi-model-grading"
name: "Multi-Model Essay Grading"
description: "Three LLMs grade an essay in parallel; the scores are aggregated and checked for bias before a final grade is issued."

nodes:
  - id: "essay_intake"
    type: "system"
    name: "Essay Intake"
    description: "Receive submitted essay with rubric, anonymize student identity."

  - id: "grader_1"
    type: "agent"
    subtype: "llm"
    name: "Grader A (GPT-4o)"
    description: "Score essay on structure, argument quality, evidence use, and writing clarity."

  - id: "grader_2"
    type: "agent"
    subtype: "llm"
    name: "Grader B (Claude)"
    description: "Score essay independently with the identical rubric, blind to the other graders' scores."

  - id: "grader_3"
    type: "agent"
    subtype: "llm"
    name: "Grader C (Gemini)"
    description: "Score essay independently as third grader for robust consensus."

  - id: "aggregate"
    type: "system"
    name: "Score Aggregation"
    description: "Compute weighted average, flag if any grader deviates more than 15% from mean."

  - id: "bias_check"
    type: "agent"
    subtype: "llm"
    name: "Bias Detection Agent"
    description: "Analyze score patterns for demographic bias, topic bias, or length bias."

  - id: "final_grade"
    type: "agent"
    subtype: "llm"
    name: "Final Grade & Feedback"
    description: "Issue final grade with synthesized feedback from all graders and improvement suggestions."

edges:
  - from: "essay_intake"
    to: "grader_1"
    mode: "parallel"
  - from: "essay_intake"
    to: "grader_2"
    mode: "parallel"
  - from: "essay_intake"
    to: "grader_3"
    mode: "parallel"
  - from: "grader_1"
    to: "aggregate"
    mode: "parallel"
  - from: "grader_2"
    to: "aggregate"
    mode: "parallel"
  - from: "grader_3"
    to: "aggregate"
    mode: "parallel"
  - from: "aggregate"
    to: "bias_check"
    mode: "sequential"
  - from: "bias_check"
    to: "final_grade"
    mode: "conditional"
    when: "bias.detected == false"
  - from: "bias_check"
    to: "grader_1"
    mode: "fallback"
    label: "Bias detected, re-grade with adjusted prompts"
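
The two outgoing edges of `bias_check` amount to a routing rule: proceed to `final_grade` when no bias is detected, otherwise fall back to `grader_1`. The Python sketch below illustrates that reading. It is an assumption about how a runner might interpret the `conditional` and `fallback` modes, not behavior taken from the OSOP specification; the helpers `lookup`, `holds`, and `next_nodes` are hypothetical, and the `when` parser handles only the `<path> == <true|false>` form used in this file.

```python
# A subset of the edges from the YAML above, as plain dictionaries.
EDGES = [
    {"from": "aggregate", "to": "bias_check", "mode": "sequential"},
    {"from": "bias_check", "to": "final_grade", "mode": "conditional",
     "when": "bias.detected == false"},
    {"from": "bias_check", "to": "grader_1", "mode": "fallback"},
]

LITERALS = {"true": True, "false": False}


def lookup(context, dotted_path):
    """Resolve a dotted path such as 'bias.detected' in nested dicts."""
    value = context
    for key in dotted_path.split("."):
        value = value[key]
    return value


def holds(when, context):
    """Evaluate '<path> == <true|false>' (the only form handled here)."""
    path, literal = (part.strip() for part in when.split("=="))
    return lookup(context, path) is LITERALS[literal]


def next_nodes(node, context, edges=EDGES):
    """Return the targets to run after `node`.

    Assumption: a fallback edge is taken only when no sequential,
    parallel, or satisfied conditional edge leaves the node.
    """
    outgoing = [e for e in edges if e["from"] == node]
    taken = [
        e["to"] for e in outgoing
        if e["mode"] in ("sequential", "parallel")
        or (e["mode"] == "conditional" and holds(e["when"], context))
    ]
    if taken:
        return taken
    return [e["to"] for e in outgoing if e["mode"] == "fallback"]


if __name__ == "__main__":
    print(next_nodes("bias_check", {"bias": {"detected": False}}))
    print(next_nodes("bias_check", {"bias": {"detected": True}}))
```

Because the fallback edge leads back into the grading path, a real runner would probably also cap the number of re-grade rounds; the file itself does not specify such a limit.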