Research Scientist / Engineer – Multimodal Capabilities
📍 Remote
Overview
Research role focusing on multimodal AI capabilities, requiring strong Python and PyTorch skills.
Job Description
## Content Summary
### Location
- Remote - US
### Employment Type
- Full-Time
### Department
- Multimodal Capabilities
### Compensation
- Estimated Base Salary: $200K – $300K
- Offers Equity
### About the Role
The Multimodal Capabilities team at Luma focuses on unlocking advanced capabilities in our foundation models through strategic research into multimodal understanding and generation. The team tackles fundamental research questions about how different modalities can be combined to enable new behaviors, working on the open-ended challenge of what makes multimodal AI systems powerful and versatile.
### Responsibilities
- Identify capability gaps and research solutions
- Design datasets, experiments, and methodologies to systematically improve model capabilities across vision, audio, and language
- Develop evaluation frameworks and benchmarking approaches for multimodal AI capabilities
- Create prototypes and demonstrations that showcase new multimodal capabilities
### Experience
- Strong programming skills in Python and PyTorch
- Experience with multimodal data processing pipelines and large-scale dataset curation
- Understanding of computer vision, audio processing, and/or natural language processing techniques
- (Preferred) Expertise working with interleaved multimodal data
- (Preferred) Hands-on experience with Vision Language Models, Audio Language Models, or generative video models
### Additional Information
### Job Details
- Job Type: Full-Time
- Experience Level: Entry Level
- Location: Remote
- Salary: $200,000–$300,000/yr