Publishing our Responsible Scaling Policy

Today we are publicly releasing Neuraphic's Responsible Scaling Policy (RSP), a document that governs how we evaluate the safety of our AI systems before, during, and after deployment. The RSP is not a set of aspirations. It is a binding internal framework — one that requires us to demonstrate, through empirical testing, that a model is safe before we permit it to operate at a given capability level.

We publish this document because we believe the AI industry's approach to safety communication has been insufficient. Voluntary commitments, while well-intentioned, lack the specificity required to hold organizations accountable. Our RSP is designed to be falsifiable: either we meet its criteria, or we do not ship.

The problem with capability thresholds

As language models grow more capable, the surface area for misuse expands in ways that are difficult to predict from benchmarks alone. A model that scores marginally higher on a reasoning evaluation may, in practice, cross a qualitative threshold — enabling new categories of harm that were previously theoretical. We reject the notion that capability improvements are inherently desirable. Each increment in capability must be accompanied by a corresponding increment in our ability to understand and control the system's behavior.

Our RSP defines four capability levels, each associated with a distinct risk profile. At each level, we specify the evaluations that must be passed before deployment is authorized. These evaluations cover autonomous behavior, persuasion and manipulation, weapons-related knowledge, and cybersecurity offense. The evaluations are conducted by an internal safety team that operates independently of our product organization.
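To make the structure concrete, here is a minimal sketch of how a capability level and its required evaluations might be represented in code. The level names, category labels, and threshold values below are illustrative assumptions for exposition, not values from the RSP itself.

```python
from dataclasses import dataclass

# The four evaluation categories named in the policy.
EVAL_CATEGORIES = (
    "autonomous_behavior",
    "persuasion_manipulation",
    "weapons_knowledge",
    "cyber_offense",
)

@dataclass(frozen=True)
class CapabilityLevel:
    """One capability level and its per-category passing thresholds.

    Scores are assumed to lie in [0.0, 1.0], higher meaning safer.
    All numbers here are hypothetical placeholders.
    """
    name: str
    thresholds: dict[str, float]

# Hypothetical level definitions; the real RSP specifies its own values.
CAPABILITY_LEVELS = [
    CapabilityLevel("CL-1", {c: 0.90 for c in EVAL_CATEGORIES}),
    CapabilityLevel("CL-2", {c: 0.93 for c in EVAL_CATEGORIES}),
    CapabilityLevel("CL-3", {c: 0.96 for c in EVAL_CATEGORIES}),
    CapabilityLevel("CL-4", {c: 0.99 for c in EVAL_CATEGORIES}),
]
```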

Evaluation methodology

We evaluate models using a combination of automated red-teaming, structured human evaluations, and adversarial probing by external researchers. No single methodology is sufficient. Automated testing catches systematic vulnerabilities but misses novel attack vectors. Human evaluators bring contextual judgment but cannot operate at scale. External adversaries provide the closest approximation to real-world threat conditions.

For each capability level, the RSP specifies minimum passing criteria — quantitative thresholds that a model must clear across all evaluation categories. These thresholds are set conservatively. We would rather delay a deployment by months than discover, post-launch, that a model enables a category of harm we failed to anticipate.
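Read conservatively, "minimum passing criteria across all evaluation categories" means a conjunction rather than an average: a single failing category blocks deployment. The following sketch encodes that reading, reusing the hypothetical `CapabilityLevel` structure above; it is an illustration of the gating logic as we describe it, not the RSP's actual implementation.

```python
def deployment_authorized(scores: dict[str, float],
                          level: CapabilityLevel) -> bool:
    """Return True only if the model clears every threshold at this level.

    A missing category defaults to 0.0 and therefore fails, which is the
    conservative choice: an unevaluated category is treated as unsafe.
    """
    return all(
        scores.get(category, 0.0) >= minimum
        for category, minimum in level.thresholds.items()
    )
```

The design choice worth noting is that there is no averaging: a model that excels on three categories cannot compensate for a shortfall on the fourth.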

The commitment to pause

The most consequential provision of our RSP is the pause commitment. If a model under development approaches a capability level for which we have not yet developed adequate safety evaluations, we will halt scaling until those evaluations exist and the model passes them. This is not a theoretical provision. We have already exercised it once during the development of our current generation of systems, delaying an internal milestone by eleven weeks while our safety team developed new evaluation protocols for multi-step reasoning chains.
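As a decision rule, the pause commitment can be stated in a few lines. The sketch below, built on the hypothetical helpers above, captures the two conditions the paragraph describes: adequate evaluations must exist for the projected level, and the model must pass them, before scaling continues.

```python
def may_continue_scaling(projected_level: CapabilityLevel,
                         ready_suites: set[str],
                         latest_scores: dict[str, float]) -> bool:
    """Hypothetical pause rule: scaling continues only if an evaluation
    suite exists for the projected capability level AND the current
    model passes it. Otherwise, scaling halts until both hold."""
    if projected_level.name not in ready_suites:
        return False  # no adequate evaluations yet: halt scaling
    return deployment_authorized(latest_scores, projected_level)
```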

We recognize that pausing carries real costs — to our engineering timelines, to our competitive position, and to the expectations of our partners. We accept those costs. The alternative — deploying systems whose risk profiles we do not fully understand — is not a cost we are willing to externalize to the public.

Transparency and iteration

This policy is versioned. We expect to revise it as our understanding of AI risk matures and as the capabilities of our systems evolve. Each revision will be published, and material changes will be accompanied by an explanation of what prompted the update. We invite scrutiny from researchers, policymakers, and the public.

We do not claim that our RSP is a complete solution to the challenge of AI safety governance. It is, however, a concrete and enforceable one. We believe the field will be better served by organizations that commit to specific, verifiable standards than by those that offer broad assurances without mechanisms for accountability.

What comes next

In the coming months, we will publish the full technical specifications of our evaluation suite, including the datasets, scoring rubrics, and threshold calculations that underpin each capability-level assessment. We will also release a detailed account of the internal governance structure that oversees RSP enforcement, including the safety team's authority to block deployments, a decision that cannot be reversed by executive override.

Building powerful AI systems is a privilege that carries obligations. This policy is how we intend to meet them.