Remote · Part-time · Mid-Senior Level
Mercor is seeking experienced researchers and technical experts to contribute to a project supporting a frontier-model evaluation effort focused on agentic workflows. You'll design and validate challenging benchmark tasks in data science, machine learning, finance, and coding to help surface and diagnose reasoning and problem-solving gaps in a target STEM model. The work centers on building robust, real-world tasks with executable tests, then analyzing model and agent behavior against them.
Many top hiring platforms use AI-powered screening as the first step. Here's how to stand out when applying for a PhD Rater role:
Critical Thinking: Demonstrate your ability to assess content for factual accuracy, logical consistency, and source credibility. Describe the systematic evaluation framework you apply.
Attention to Detail: Be prepared with examples of catching subtle errors or inconsistencies that others might miss. Precision is the core skill being evaluated.
Calibration: Discuss how you maintain consistent evaluation standards across many items over time. Self-awareness about your own biases is a strong signal.
Sample interview question: "Walk me through a project where you had to learn a new domain quickly. How did you ramp up, and what would you do differently?"