context medium

correlation-causation-confusion

Detect an unsupported causal claim made from correlational evidence.

Why this matters

This task evaluates whether the agent can distinguish correlation from causation and judge whether a paper's evidence actually supports its scientific claims.

Agent instruction

You are acting as a research paper reviewer.

Please review the provided materials in /app:

  • paper.md
  • analysis_summary.txt

Your task is to evaluate whether the paper's main scientific claim is properly supported by its evidence.

Write your review to /app/review.txt.

The file must contain exactly 4 lines:

  1. A score from 0 to 5 for causal claim validity
  2. A short issue label
  3. Whether the causal claim is well supported: yes or no
  4. A brief explanation
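The four-line layout above can be checked before submission with a small script. What follows is only a sketch under stated assumptions: the helper name `validate_review`, the choice to accept any numeric score between 0 and 5, the case-insensitive yes/no check, and the assumption that each line holds a bare value without the "1." list prefix are all illustrative guesses, since the hidden verifier's exact rules are not published. The sample review text is a placeholder, not the reference answer.

```python
from typing import List

# Hypothetical sample contents for /app/review.txt (placeholder values only).
EXAMPLE_REVIEW = (
    "1\n"
    "correlation-causation confusion\n"
    "no\n"
    "The evidence is correlational, so it cannot by itself establish the causal claim.\n"
)


def validate_review(text: str) -> List[str]:
    """Return a list of format problems; an empty list means the layout looks valid."""
    problems: List[str] = []
    # Ignore leading/trailing newlines, then split into the lines that remain.
    lines = text.strip("\n").split("\n")
    if len(lines) != 4:
        problems.append(f"expected exactly 4 lines, found {len(lines)}")
        return problems

    score, label, supported, explanation = (line.strip() for line in lines)

    # Line 1: assumed to be a bare number in the range 0..5.
    try:
        value = float(score)
    except ValueError:
        problems.append("line 1 must be a number")
    else:
        if not 0 <= value <= 5:
            problems.append("score must be between 0 and 5")

    # Line 2: any non-empty short label.
    if not label:
        problems.append("line 2 (issue label) is empty")

    # Line 3: assumed to be exactly "yes" or "no", ignoring case.
    if supported.lower() not in {"yes", "no"}:
        problems.append("line 3 must be 'yes' or 'no'")

    # Line 4: any non-empty explanation.
    if not explanation:
        problems.append("line 4 (explanation) is empty")

    return problems


if __name__ == "__main__":
    print(validate_review(EXAMPLE_REVIEW))
```

A check like this only confirms the shape of the file. Whether the score, label, and explanation are correct still depends on reading paper.md and analysis_summary.txt.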

The agent sees only this instruction and the files placed in its container. Reference solutions and verifier tests are intentionally hidden.