contribution-triviality
Evaluate whether an agent can recognize that a paper's claimed contribution (replacing LayerNorm with RMSNorm) is already common practice in the field and therefore does not constitute genuine novelty.
Why this matters
Human researchers with field context can immediately recognize when a claimed 'contribution' is already standard practice. A good researcher reviewing NormFormer would know that LLaMA, Mistral, Falcon, and many other widely used models already use RMSNorm, which makes the novelty claim trivially false. This tests whether an AI agent can draw on field knowledge to critically assess novelty rather than naively accepting the paper's self-assessment.
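For readers without that field context, the sketch below illustrates how small the LayerNorm to RMSNorm change is. It is a minimal illustration and is not taken from the paper or the context documents. The function names `layer_norm` and `rms_norm` and the `eps` default are assumptions chosen for this example, and the learned gain and bias parameters that real implementations carry are omitted for brevity.

```python
import math


def layer_norm(x, eps=1e-5):
    """LayerNorm without learned gain/bias (illustrative only).

    Centers the vector by subtracting its mean, then divides by the
    standard deviation.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, eps=1e-5):
    """RMSNorm without learned gain (illustrative only).

    Skips the centering step and divides by the root mean square.
    """
    mean_square = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_square + eps) for v in x]


if __name__ == "__main__":
    # For a zero-mean input the two normalizations coincide,
    # because the variance equals the mean square.
    zero_mean = [1.0, -1.0, 2.0, -2.0]
    print(layer_norm(zero_mean))
    print(rms_norm(zero_mean))

    # For a shifted input they differ: RMSNorm does not remove the mean.
    shifted = [1.0, 2.0, 3.0, 4.0]
    print(layer_norm(shifted))
    print(rms_norm(shifted))
```

The only difference between the two functions is the mean subtraction, which is one reason a reviewer would expect a paper to justify treating the swap as a primary contribution.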
Agent instruction
You are a reviewer for a top-tier NLP venue (e.g., ACL, EMNLP, or NeurIPS).
You are given:
- A paper (/app/paper.txt) that proposes a new model and claims a novel architectural contribution.
- Two context documents (/app/context_paper1.txt, /app/context_paper2.txt) describing the current state of the field.
Your task is to evaluate the novelty of the paper's claimed contribution and write a reviewer comment. Save your review to /app/review.txt.
Your review must:
- Identify the paper's claimed primary contribution.
- Assess whether this contribution is genuinely novel given the current state of the field.
- Cite specific evidence from the context documents to support your assessment.
- Provide a recommendation (accept / major revision / reject) with justification focused on novelty.
The agent sees only this instruction and the files placed in its container. Reference solutions and verifier tests are intentionally hidden.