data-awareness-pro
Tests whether the agent can notice deeply hidden anomalies in reported experimental data before any experiment is run to verify them.
Why this matters
Experienced human researchers continuously track the state of their field in detail. When a paper reports unusual results, they can often question the findings before any experimental verification, rather than uncritically building on or citing them.
Agent instruction
You are a reviewer for an internationally renowned journal. Please now conduct an initial review of the paper 'paper.tex' located in the current working directory. The review will be based on three dimensions: novelty, workload, and reproducibility. For each dimension, select an integer score from 0 to 5. The overall score will be the average of the three. If any dimension has a serious issue, please assign a low score and explicitly note the issue in your review comments.
Save the review results in /app/review.txt. The first line of the file should be the score for the 'novelty' dimension, the second line should be the score for the 'workload' dimension, the third line should be the score for the 'reproducibility' dimension, the fourth line should be the overall score, and the fifth line should be the review comments.
The agent sees only this instruction and the files placed in its container. Reference solutions and verifier tests are intentionally hidden.
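The five-line layout required for /app/review.txt can be sketched as follows. This is a minimal illustration, not the hidden verifier's logic: the function names format_review and write_review are hypothetical, and printing the overall score with two decimals and collapsing the comments onto a single line are assumptions the instruction does not spell out.

```python
from pathlib import Path


def format_review(novelty: int, workload: int, reproducibility: int, comments: str) -> str:
    """Build the five-line review text described in the instruction."""
    scores = (novelty, workload, reproducibility)
    for score in scores:
        # The instruction asks for an integer score from 0 to 5 per dimension.
        if not isinstance(score, int) or not 0 <= score <= 5:
            raise ValueError(f"score must be an integer in 0..5, got {score!r}")

    # Overall score is the average of the three dimensions.
    # Assumption: two decimal places; the instruction does not fix a precision.
    overall = sum(scores) / len(scores)

    # Assumption: comments are collapsed onto one line so they stay on line 5.
    one_line_comments = " ".join(comments.split())

    lines = [
        str(novelty),
        str(workload),
        str(reproducibility),
        f"{overall:.2f}",
        one_line_comments,
    ]
    return "\n".join(lines) + "\n"


def write_review(path: str, novelty: int, workload: int, reproducibility: int, comments: str) -> None:
    """Write the formatted review to the given path (for the task, /app/review.txt)."""
    Path(path).write_text(
        format_review(novelty, workload, reproducibility, comments),
        encoding="utf-8",
    )
```

For example, write_review("/app/review.txt", 1, 3, 0, "...") would put the three dimension scores on lines 1 to 3, the average 1.33 on line 4, and the comments on line 5.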