context · medium

contribution-triviality

Evaluates whether an agent can recognize that a paper's claimed contribution (replacing LayerNorm with RMSNorm) is already common practice in the field and therefore does not constitute genuine novelty.

Why this matters

Human researchers with field context can immediately recognize when a claimed 'contribution' is already standard practice. A good researcher reviewing NormFormer would know that LLaMA, Mistral, Falcon, and many other widely used models already use RMSNorm, so the claim is plainly not novel. This task tests whether an AI agent can draw on field knowledge to assess novelty critically rather than accept the paper's self-assessment at face value.
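
For readers without that field context, the following is a minimal sketch of the two normalizations in plain Python, using their standard textbook definitions. It is not taken from the paper under review; the function names, the toy input, and the `eps` default are illustrative assumptions. It is meant only to show how small the LayerNorm → RMSNorm change is: RMSNorm drops the mean-centering step and the bias term.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """LayerNorm: subtract the mean, divide by the standard deviation,
    then apply a learned per-feature scale (gamma) and bias (beta)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def rms_norm(x, gamma, eps=1e-5):
    """RMSNorm: no centering and no bias; divide by the root mean square,
    then apply a learned per-feature scale (gamma)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [g * v * inv_rms for v, g in zip(x, gamma)]


# Toy activation vector (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0]
ones, zeros = [1.0] * len(x), [0.0] * len(x)

ln_out = layer_norm(x, ones, zeros)   # centered: entries sum to ~0
rms_out = rms_norm(x, ones)           # not centered: all entries stay positive
```

When the input already has zero mean, the two functions return the same values (with unit scale and zero bias), which is one way to see that the swap is an incremental simplification rather than a new mechanism.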

Agent instruction

You are a reviewer for a top-tier NLP venue (e.g., ACL, EMNLP, or NeurIPS).

You are given:

  1. A paper (/app/paper.txt) that proposes a new model and claims a novel architectural contribution.
  2. Two context documents (/app/context_paper1.txt, /app/context_paper2.txt) describing the current state of the field.

Your task is to evaluate the novelty of the paper's claimed contribution and write a reviewer comment. Save your review to /app/review.txt.

Your review must:

  1. Identify the paper's claimed primary contribution.
  2. Assess whether this contribution is genuinely novel given the current state of the field.
  3. Cite specific evidence from the context documents to support your assessment.
  4. Provide a recommendation (accept / major revision / reject) with justification focused on novelty.

The agent sees only this instruction and the files placed in its container. Reference solutions and verifier tests are intentionally hidden.