Research Engineer, Agentic AI Evals
📍 Remote, San Francisco, USA, Singapore, Singapore
Overview
Join HUD as a Research Engineer to develop agentic evals for Computer Use Agents (CUAs), with a focus on Python, Docker, and Linux.
Job Description
## Content Summary
### About HUD
HUD (YC W25) is developing agentic evals for Computer Use Agents (CUAs) that browse the web. Our CUA Evals framework is the first comprehensive evaluation tool for CUAs. We're backed by Y Combinator and work closely with frontier AI labs to provide agent evaluation infrastructure at scale.
### About the Role
We're looking for a research engineer to help build out task configs and environments for evaluation datasets on HUD's CUA evaluation framework. Responsibilities include:
- Building environments for HUD's CUA evaluation datasets, including evals for safety redteaming, general business tasks, long-horizon agentic tasks, etc.
- Creating custom CUA datasets/evaluation pipelines.
### Requirements
- Proficiency in Python, Docker, and Linux environments.
- React experience for frontend development.
- Production-level software development experience preferred.
- Strong technical aptitude and demonstrated problem-solving ability.
- Experience with LLM evaluation frameworks and methodologies.
- Contribution to evaluation harnesses (EleutherAI, Inspect, or similar).
- Experience with agentic or multimodal AI evaluation systems.
### Team & Company Details
- Team Size: ~15 people, mostly full-time and in person, with some remote.
- Our team includes international Olympiad medallists, serial AI startup founders, and researchers with publications at ICLR, NeurIPS, etc.
- Company stage: $2 million in seed funding, scaling profitably and fast.
### Logistics
- Employment: Full-time preferred, but part-time/internship arrangements considered.
- Location: Fully remote-friendly, with offices in the San Francisco Bay Area and Singapore.
- Visa Sponsorship: Support for relocation and visas for strong full-time candidates.
- Timeline: Applications are reviewed on a rolling basis. The process consists of an initial call, a take-home assignment, and a paid work trial.
### Contact
For applications, contact recruiting@hud.so.
Job Details
- Job Type: Full-Time, Part-Time, Internship
- Experience Level: Entry Level
- Location: Remote, San Francisco, USA, Singapore, Singapore