Research Engineer, Agentic AI Evals

📍 Remote; San Francisco, USA; Singapore
HUD | AI Jobs | Sep 6, 2025
Type: Full-Time, Part-Time, Internship
Level: Entry Level

Overview

Join HUD as a Research Engineer to develop agentic evals for Computer Use Agents (CUAs), working primarily in Python, Docker, and Linux.

Job Description

## Content Summary

### About HUD
HUD (YC W25) is developing agentic evals for Computer Use Agents (CUAs) that browse the web. Our CUA Evals framework is the first comprehensive evaluation tool for CUAs. We're backed by Y Combinator and work closely with frontier AI labs to provide agent evaluation infrastructure at scale.

### About the Role
We're looking for a research engineer to help build out task configs and environments for evaluation datasets on HUD's CUA evaluation framework. Responsibilities include:
- Building environments for HUD's CUA evaluation datasets, including evals for safety red-teaming, general business tasks, and long-horizon agentic tasks.
- Creating custom CUA datasets/evaluation pipelines.

### Requirements
- Proficiency in Python, Docker, and Linux environments.
- React experience for frontend development.
- Production-level software development experience preferred.
- Strong technical aptitude and demonstrated problem-solving ability.
- Experience with LLM evaluation frameworks and methodologies.
- Contributions to evaluation harnesses (EleutherAI's lm-evaluation-harness, Inspect, or similar).
- Experience with agentic or multimodal AI evaluation systems.

### Team & Company Details
- Team size: about 15 people, mostly full-time and in-person, with some remote.
- Our team includes international Olympiad medallists, serial AI startup founders, and researchers with publications at ICLR, NeurIPS, etc.
- Company stage: $2 million in seed funding, scaling profitably and fast.

### Logistics
- Employment: Full-time preferred, but part-time/internship arrangements considered.
- Location: Fully remote-friendly, with offices in San Francisco Bay Area and Singapore.
- Visa Sponsorship: Support for relocation and visas for strong full-time candidates.
- Timeline: Applications are rolling, with a process involving an initial call, a take-home assignment, and a paid work trial.

### Contact
For applications, contact recruiting@hud.so.

Required Skills

Python, Docker, Linux, React, LLM evaluation frameworks, problem-solving
