The systems we are building — models that defend other models against adversarial manipulation, agents that reason autonomously about security threats, platforms that deploy AI safely at enterprise scale — require capabilities that do not exist off the shelf. They require research.
Our research program is not separate from our products. It directly informs what we build. The adversarial taxonomy that will shape how Claeth classifies inputs comes from our own attack research. The vulnerability reasoning that will power Prion comes from our work on contextual security analysis. Every system we ship will be grounded in work we can point to, explain, and defend.
We publish openly because the industry improves when knowledge about vulnerabilities and defenses is shared responsibly. We also publish because accountability requires it — claims about safety that cannot be scrutinized are not claims worth making.
Research areas
Adversarial AI. How do language models fail under adversarial pressure, and how do you prevent it without degrading the model's usefulness? We study attack vectors across seven categories: prompt injection, jailbreaks, data exfiltration, instruction override, tool abuse, context manipulation, and multi-turn coercion. For each, we are developing classification systems that operate at inference time within latency budgets that make production deployment realistic. This work feeds directly into Claeth.
Autonomous security. Can AI reason about vulnerabilities the way a human security researcher does — not through pattern matching against known signatures, but through contextual analysis of configurations, dependencies, and failure modes? We are training systems that understand how infrastructure fails in context, that can trace a misconfiguration through its consequences, and that can prioritize findings by exploitability rather than by the severity rating someone assigned to a CVE three years ago. The results shape Prion.
Safe agent systems. Autonomous agents that take actions in the real world need fundamentally different safety guarantees than models that generate text. A chatbot that hallucinates wastes time. An agent that hallucinates can delete a database, send an email to the wrong person, or deploy code that breaks production. We study task planning under uncertainty, supervision boundaries, human-in-the-loop escalation, sandboxed execution, and the audit mechanisms needed to trust agents with real decisions. This is the foundation of Apps.
Open problems
These are the questions we are actively working on. We do not have answers yet. When we do, we will publish them.
Model-agnostic adversarial robustness. Can a defense layer protect any model (open-weight, closed-API, fine-tuned, or distilled) without requiring access to the model's internals? Many current approaches are tightly coupled to specific architectures. We are investigating classification methods that operate purely on the input stream, which would make them portable across inference pipelines.
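To make the idea of an input-only defense layer concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of how Claeth works: the names `guard` and `keyword_classifier` are hypothetical, and the keyword check is a toy stand-in for a real classifier. The point it shows is structural: the wrapper only ever sees the prompt and treats the model as an opaque callable, so it does not depend on the model's architecture or weights.

```python
from typing import Callable

# Any model is treated as an opaque function from prompt to completion.
Model = Callable[[str], str]
# A classifier sees only the input text; True means "looks adversarial".
Classifier = Callable[[str], bool]


def keyword_classifier(prompt: str) -> bool:
    """Toy stand-in for a real input classifier (hypothetical, for illustration)."""
    suspicious = ("ignore previous instructions", "reveal your system prompt")
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in suspicious)


def guard(model: Model, classifier: Classifier,
          refusal: str = "[blocked by input filter]") -> Model:
    """Wrap any model with an input-stream check; no model internals are used."""
    def guarded(prompt: str) -> str:
        if classifier(prompt):
            return refusal       # the model is never called on flagged input
        return model(prompt)     # benign input passes through unchanged
    return guarded
```

Because `guard` accepts any callable, the same wrapper could sit in front of a local open-weight model or a client for a closed API without modification.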
Contextual vulnerability reasoning. Static analysis tools produce findings. Humans produce judgment. The gap between the two is where most security debt accumulates — findings that are technically correct but practically irrelevant, and findings that are technically minor but contextually catastrophic. We are studying how to train systems that bridge that gap: models that understand not just what a vulnerability is, but whether it matters given the surrounding architecture.
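The gap between a finding and a judgment can be illustrated with a deliberately simple sketch. The hand-written multipliers below are assumptions chosen for illustration; they stand in for the learned, contextual reasoning described above and are not a method this research has produced. The field names (`reachable`, `sensitive_data`) and the function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    name: str
    base_severity: float   # 0-10, as a scanner might assign it
    reachable: bool        # can an attacker reach the affected component?
    sensitive_data: bool   # does the component touch sensitive data?


def contextual_score(f: Finding) -> float:
    """Adjust a static severity by architectural context (illustrative weights)."""
    score = f.base_severity
    score *= 1.0 if f.reachable else 0.2       # unreachable issues matter far less
    score *= 1.5 if f.sensitive_data else 1.0  # data exposure raises the stakes
    return score


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Order findings by contextual score, highest first."""
    return sorted(findings, key=contextual_score, reverse=True)
```

Under these assumed weights, a severity-4.0 issue on a reachable component that handles sensitive data outranks a severity-9.8 issue on a component no attacker can reach, which is the kind of reordering a static severity rating alone cannot express.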
Agent supervision at scale. A single autonomous agent can be supervised by a human. A thousand cannot. As agent deployments scale, the supervision model must change from direct oversight to policy-based governance — rules that constrain what agents can do, monitoring that detects when they deviate, and escalation paths that bring a human in before damage compounds. We are working on frameworks that make this transition safe and auditable.
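The three ingredients named above (rules, monitoring, and escalation) can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not a framework we have built: the `Policy`, `Action`, and `Decision` names are hypothetical, and a real system would need far richer rules than a tool allowlist.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"   # pause and bring a human in
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    agent_id: str
    tool: str
    target: str


@dataclass
class Policy:
    allowed_tools: frozenset
    escalate_tools: frozenset
    audit_log: list = field(default_factory=list)

    def evaluate(self, action: Action) -> Decision:
        """Apply the rules, record the outcome, and return the decision."""
        if action.tool in self.escalate_tools:
            decision = Decision.ESCALATE
        elif action.tool in self.allowed_tools:
            decision = Decision.ALLOW
        else:
            decision = Decision.DENY   # default-deny anything not listed
        # Every decision is logged so deviations can be monitored and audited.
        self.audit_log.append((action, decision))
        return decision
```

The design choice worth noting is default-deny: an agent attempting a tool the policy does not mention is blocked and logged, so one policy can constrain many agents without a human reviewing each action.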
Adversarial robustness under distribution shift. Attacks evolve. A defense trained on today's prompt injection techniques may not catch tomorrow's. We are studying how to build classification systems that maintain robustness as the attack distribution shifts — without requiring retraining on every new variant.
Publications
Our first research publications are in preparation. When they are ready, they will appear here and in our newsroom. We will not publish prematurely: incomplete findings presented as conclusions do more harm than silence.
In the meantime, our thinking on AI safety, responsible scaling, and the principles that guide our research is published in these documents:
Core views on AI safety
Responsible Scaling Policy
Our approach to AI safety
Join the research team
Our research program works on hard problems in adversarial machine learning, AI safety, autonomous systems, and security. If you have experience in these areas, or if you think you should be working on these problems even though your background does not fit a standard job description, we want to hear from you.
View open roles or reach out directly at [email protected].