AI Safety Guides

Intro

AI safety is the set of practices that prevent harm from AI systems. It covers both accident risk (harm a system causes unintentionally) and misuse risk (harm from deliberate abuse), and it applies across the full product life cycle.

Safety, security, and ethics

  • Safety: prevent harmful model behavior in products
  • Security: protect models, data, users, and infrastructure from adversaries
  • Ethics: ensure uses align with values, rights, and fairness

Risk taxonomy

  • Capability: hallucination, deception, goal misgeneralization, unwanted autonomy
  • Content: self-harm counseling, hate speech, illegal instructions, sexual content involving minors
  • Privacy: PII leakage, training-data exposure, re-identification
  • Integrity: prompt injection, prompt leakage, jailbreaking, data poisoning
  • Operational: unreliable tools, overreliance, downtime, cost spikes
  • Societal: bias, discrimination, misinformation at scale
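A taxonomy like this is most useful when incidents and eval failures are tagged against it consistently. Below is a minimal sketch in Python of one way to do that; the class and field names are illustrative assumptions, not part of any standard framework.

    # Illustrative sketch: the risk taxonomy as an enum, plus a record
    # type for tagging observed failures so trends roll up per category.
    from dataclasses import dataclass, field
    from enum import Enum

    class RiskCategory(Enum):
        CAPABILITY = "capability"    # hallucination, deception, goal misgeneralization
        CONTENT = "content"          # self-harm counseling, hate speech, illegal instructions
        PRIVACY = "privacy"          # PII leakage, training-data exposure
        INTEGRITY = "integrity"      # prompt injection, jailbreaking, data poisoning
        OPERATIONAL = "operational"  # unreliable tools, overreliance, cost spikes
        SOCIETAL = "societal"        # bias, discrimination, misinformation at scale

    @dataclass
    class Incident:
        """One observed failure, tagged so per-category trends can be tracked."""
        description: str
        categories: list[RiskCategory] = field(default_factory=list)
        severity: int = 1  # hypothetical scale: 1 = low, 5 = critical

    # Usage: a single incident can fall under more than one category.
    report = Incident(
        description="System prompt leaked via crafted user input",
        categories=[RiskCategory.INTEGRITY, RiskCategory.PRIVACY],
        severity=4,
    )

Allowing multiple categories per incident matters in practice: a prompt-injection attack that exfiltrates PII is both an integrity and a privacy failure.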

Safety life cycle

  1. Plan: set objectives, constraints, and risk thresholds
  2. Build: apply alignment methods and secure engineering
  3. Evaluate: run capability and safety checks before and after release
  4. Deploy: add guardrails, monitoring, and human oversight
  5. Improve: learn from incidents and update models and prompts
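To make step 4 concrete, here is a minimal sketch of deploy-time guardrails: an input screen, an output router, and an escalation path to human review. It assumes a simple keyword check and a precomputed risk score; real deployments use trained classifiers and policy engines, and all names here are illustrative.

    # Illustrative sketch of deploy-time guardrails (step 4), assuming a
    # keyword-based input screen and a risk score supplied by an upstream
    # classifier. Thresholds come from the risk limits set in step 1.
    BLOCKED_PATTERNS = ["ignore previous instructions", "reveal your system prompt"]

    def screen_input(user_text: str) -> bool:
        """Return True if the request may proceed to the model."""
        lowered = user_text.lower()
        return not any(p in lowered for p in BLOCKED_PATTERNS)

    def route_output(model_text: str, risk_score: float, threshold: float = 0.8) -> str:
        """Release low-risk output; hold high-risk output for a human reviewer."""
        if risk_score >= threshold:
            return "[held for human review]"  # human oversight, per step 4
        return model_text

    # Usage: screen the request, then route the response by risk.
    if screen_input("Summarize this article for me"):
        print(route_output("Here is the summary...", risk_score=0.1))

Held outputs and blocked inputs should feed back into step 5 as incident data, closing the loop between deployment and improvement.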