Remote · Part-time · Mid-Senior Level
Mercor is seeking experienced researchers and technical experts to contribute to a project supporting a frontier-model evaluation effort focused on agentic workflows. You'll design and validate challenging benchmark tasks in data science, machine learning, finance, and coding to help surface and diagnose reasoning and problem-solving gaps in a target STEM model. The work centers on building robust, real-world tasks with executable tests, then analyzing model and agent behavior against them.
Many top hiring platforms use AI-powered screening as the first step. Here's how to stand out when applying for a PhD Rater role:
Critical Thinking: Demonstrate your ability to assess content for factual accuracy, logical consistency, and source credibility. Describe the systematic evaluation framework you apply.
Attention to Detail: Be prepared with examples of catching subtle errors or inconsistencies that others might miss. Precision is the core skill being evaluated.
Calibration: Discuss how you maintain consistent evaluation standards across many items over time. Self-awareness about your own biases is a strong signal.
Sample interview question: "Walk me through a project where you had to learn a new domain quickly. How did you ramp up, and what would you do differently?"