Machine Learning Engineer, Prompt Safety and Agent Security

Our Client - Information Technology & Services company

Mountain View, CA

$100.00 - $107.97/hour

Exact compensation may vary based on skills, experience, and location.

40 hrs/wk

Contract (W-2)

Remote work: No

Travel not required

Start date

June 8, 2026

End date

June 8, 2027

Superpower

Technology

Capabilities

Data Science and Machine Learning

Technology Product Management

Technology Architecture

IT Security and Governance

Technical Program/Project Management

Preferred skills

Cross-Functional Collaboration

Extended Reality

Augmented Reality

Quantization

AI Research

Python (Programming Language)

Machine Learning

Software Engineering

Red Teaming

PyTorch (Machine Learning Library)

Mobile Agent

Agentic AI

Data Curation

Computer Science

Version Control

Artificial Intelligence

Pipelines

Preferred industry experience

Information Technology & Services

Experience level

0 - 4 years of experience

Job description

Our Customer is a Silicon Valley-based company that researches emerging technologies.

We are seeking a contract Machine Learning Engineer to help support our Customer's business needs. This role is on-site in Mountain View, CA.

This role will lead the development of prompt injection and prompt safety models that protect downstream agentic AI systems across phone, cloud, and XR/AR. You will design, train, and deploy classifier and guardrail models (both cloud-based and hybrid on-device) that screen agent inputs and outputs for injection attacks, unsafe content, and policy violations. A core part of the role is post-training these models with RLHF, DPO, and related optimization techniques to raise detection accuracy and lower false-positive rates beyond what off-the-shelf solutions provide.

Responsibilities:

Design and train prompt injection detection models and prompt safety classifiers for agentic AI systems.
Build safety models that evaluate both inputs and outputs across AI workflows.
Develop hybrid deployment pipelines that split safety inference between on-device and cloud environments.
Optimize safety inference systems for latency, privacy, and detection coverage.
Apply post-training techniques such as RLHF, reward modeling, DPO, RLAIF, and policy optimization to improve guardrail model performance.
Improve model calibration, stability, and robustness against adaptive adversarial attacks.
Curate and generate adversarial training data, including prompt injections, jailbreaks, tool-use exploits, and unsafe-output cases.
Leverage red-teaming outputs and production signals to improve training datasets.
Build evaluation harnesses to measure attack success rate, false positive rate, latency, and on-device footprint.
Evaluate model iterations across threat categories and deployment environments.
Partner with agent, device, and platform teams to integrate safety models into mobile agents, XR/AR assistants, and cloud agentic workflows.
Close the loop between production incidents, model evaluation, and training data improvements.
Collaborate cross-functionally with security researchers, modeling teams, and product engineers.
Document technical methods and contribute to patents, publications, or open-source work where appropriate.

Skills and Qualifications:

M.S. or Ph.D. in Computer Science, Machine Learning, Electrical Engineering, or a related field, or B.S. with equivalent industry experience.
3+ years of industry experience in ML engineering or applied AI research with ownership of production ML systems.
2+ years of industry experience in software engineering.
Strong proficiency in Python and PyTorch, JAX, or TensorFlow.
Strong software engineering fundamentals, including version control, testing, and reproducible experimentation.
Hands-on experience post-training LLMs using RLHF, DPO, RLAIF, or reward modeling.
Experience with reward design, preference data curation, and training stability.
Hands-on experience training and deploying classifier or guardrail models for safety, content moderation, abuse detection, or adversarial robustness.
Familiarity with prompt injection, jailbreak detection, and agentic AI threat models.
Experience with distributed training frameworks such as DeepSpeed, FSDP, or Accelerate.
Strong experience in machine learning engineering, applied AI research, and software engineering.
Strong understanding of safety model deployment, classifier training, and guardrail model training.
Strong analytical, documentation, and cross-functional collaboration skills.

Preferred Qualifications:

Experience building safety or moderation systems for agentic AI.
Experience with tool-use guardrails, indirect prompt injection defenses, or output filtering for autonomous agents.
Experience with red-teaming, adversarial data generation, or automated attack pipelines such as GCG, PAIR, or generator-critic frameworks.
Experience with on-device or edge ML deployment using ExecuTorch, Core ML, TFLite, MLC-LLM, or vendor NPU toolchains.
Experience with model compression techniques such as quantization, distillation, or pruning for safety models.
Experience with telemetry, logging, or user-facing data systems on mobile, XR/AR, or consumer platforms.
Experience with privacy-preserving user data handling, including anonymization, on-device processing, or federated approaches.
Publications at top-tier ML, NLP, or security venues.
Patents or open-source contributions in safety, alignment, or AI security.

We offer a competitive salary range for this position. Most candidates who join our team are hired at the median of this range, ensuring fair and equitable compensation based on experience and qualifications.

Contractor benefits are available through our third-party Employer of Record (available upon completion of the waiting period for eligible engagements).

Benefits include: Medical, Dental, Vision, 401k.

An Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against on the basis of disability.

All applicants applying for U.S. job openings must be legally authorized to work in the United States and are required to have U.S. residency at the time of application.

If you are a person with a disability needing assistance with the application, or at any point in the hiring process, please contact us at support@themomproject.com.
