context, medium
correlation-causation-confusion
Detect an unsupported causal claim made from correlational evidence.
Why this matters
This task evaluates whether the agent can distinguish correlation from causation and judge whether a paper's causal claim is actually supported by the evidence it presents.
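As an illustration of the distinction, the following minimal sketch simulates a hidden common cause. It is not taken from the task materials. The variable names (`z`, `x`, `y`), the noise levels, and the sample size are all assumptions chosen for the example. In the observational data `x` and `y` are strongly correlated even though neither affects the other. When `x` is assigned at random (an intervention), the correlation disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=5000, seed=0):
    """Return (observational r, interventional r) for a confounded pair.

    Hypothetical data-generating process: z is a hidden common cause of
    x and y; x has no effect on y at all.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]

    # Observational setting: both x and y inherit variation from z.
    x_obs = [zi + rng.gauss(0, 0.5) for zi in z]
    y_obs = [zi + rng.gauss(0, 0.5) for zi in z]

    # Interventional setting: x is set at random, independently of z,
    # while y is still generated from z alone.
    x_int = [rng.gauss(0, 1) for _ in range(n)]
    y_int = [zi + rng.gauss(0, 0.5) for zi in z]

    return pearson(x_obs, y_obs), pearson(x_int, y_int)


if __name__ == "__main__":
    r_obs, r_int = simulate()
    print(f"observational r = {r_obs:.2f}")
    print(f"interventional r = {r_int:.2f}")
```

Under these assumed parameters the observational correlation is about 0.8 in expectation, while the interventional one is close to zero. A paper that reports only the first number cannot, from that alone, support the claim that `x` causes `y`.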
Agent instruction
You are acting as a research paper reviewer.
Please review the provided materials in /app:
- paper.md
- analysis_summary.txt
Your task is to evaluate whether the paper's main scientific claim is properly supported by its evidence.
Write your review to /app/review.txt.
The file must contain exactly 4 lines:
- A score from 0 to 5 for causal claim validity
- A short issue label
- Whether the causal claim is well supported: yes or no
- A brief explanation
The agent sees only this instruction and the files placed in its container. Reference solutions and verifier tests are intentionally hidden.
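The four-line format required for /app/review.txt can be sketched as a small formatter and self-check. This is only an illustration of the stated format, not the hidden verifier. The helper names `format_review` and `check_review` are hypothetical, and two details are assumptions the instruction does not confirm: that the score is an integer, and that a trailing newline is acceptable.

```python
def format_review(score, label, supported, explanation):
    """Build the 4-line review text: score, label, yes/no, explanation."""
    # Assumption: the score is an integer in the range 0..5.
    if not isinstance(score, int) or not 0 <= score <= 5:
        raise ValueError("score must be an integer from 0 to 5")
    if supported not in ("yes", "no"):
        raise ValueError("supported must be 'yes' or 'no'")
    fields = [str(score), label.strip(), supported, explanation.strip()]
    # Each field must be non-empty and fit on a single line.
    if any(not f or "\n" in f for f in fields):
        raise ValueError("each field must be a non-empty single line")
    return "\n".join(fields) + "\n"


def check_review(text):
    """Return True if text has exactly 4 lines in the expected shape."""
    lines = text.splitlines()
    if len(lines) != 4:
        return False
    score, label, supported, explanation = (line.strip() for line in lines)
    return (
        score in {str(i) for i in range(6)}
        and bool(label)
        and supported in ("yes", "no")
        and bool(explanation)
    )
```

A review produced this way could then be written with `pathlib.Path("/app/review.txt").write_text(text)`, where `text` is the string returned by `format_review`. The values passed in must come from the reviewer's own reading of the paper and the analysis summary.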