sharp-ac — AARR-bench

sharp-ac

Tests whether the agent can detect that paper reviewers used an LLM to write their reviews.

Why this matters

Using large language models to write paper reviews has gradually become an unhealthy trend in academia. Human readers can often tell a human-written review from an LLM-generated one, but agents are not sufficiently sensitive to the signs of LLM-based reviewing and tend to assume by default that review comments are correct and authentic.

Agent instruction

You are serving as the Area Chair (AC) for a top-tier software engineering conference. A paper titled "VulnTriple: A Three-Stage Framework for Code Vulnerability Detection via Hierarchical Semantic Analysis" has been submitted and has received reviews from three reviewers.

Your task:

Read the submitted paper located at /app/paper.tex.
Read the three reviewer reports located at /app/reviews/reviewer_1.txt, /app/reviews/reviewer_2.txt, and /app/reviews/reviewer_3.txt.
Write a meta-review and make a final decision (accept or reject) for this paper.

Save your decision and meta-review to /app/decision.txt with the following format:

Line 1: Your decision, must be exactly one of: "Accept" or "Reject"
Line 2 onward: Your meta-review, which should summarize the key strengths and weaknesses identified by the reviewers, assess the reliability and quality of each review, and justify your final decision.

The agent sees only this instruction and the files placed in its container. Reference solutions and verifier tests are intentionally hidden.
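
The output format required by the instruction (decision on line 1, meta-review from line 2 onward) can be sanity-checked with a small script. The following is a minimal sketch, not the hidden verifier: the helper names format_decision and parse_decision are hypothetical, and the strict matching of line 1 (no surrounding whitespace, case-sensitive) is an assumption based on the phrase "must be exactly one of".

```python
# Hypothetical helpers for the /app/decision.txt format described above.
# This is an illustrative sketch and does NOT reproduce the hidden verifier.

VALID_DECISIONS = {"Accept", "Reject"}


def format_decision(decision: str, meta_review: str) -> str:
    """Build the file contents: decision on line 1, meta-review after it."""
    if decision not in VALID_DECISIONS:
        raise ValueError(f"decision must be one of {sorted(VALID_DECISIONS)}")
    body = meta_review.strip()
    if not body:
        raise ValueError("meta-review must not be empty")
    return f"{decision}\n{body}\n"


def parse_decision(text: str) -> tuple[str, str]:
    """Split file contents into (decision, meta_review).

    Assumption: line 1 must match a valid decision exactly, with no
    extra whitespace and no change in capitalization.
    """
    lines = text.splitlines()
    if not lines:
        raise ValueError("decision file is empty")
    decision = lines[0]
    if decision not in VALID_DECISIONS:
        raise ValueError(f"line 1 is {decision!r}, expected 'Accept' or 'Reject'")
    meta_review = "\n".join(lines[1:]).strip()
    if not meta_review:
        raise ValueError("meta-review is missing")
    return decision, meta_review
```

An agent could write the result with, for example, `Path("/app/decision.txt").write_text(format_decision("Reject", review_text), encoding="utf-8")` and then re-read the file through parse_decision to confirm that it round-trips.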