AI Safety Guides

Intro

AI safety is the set of practices that prevent harm from AI systems. It covers both accident risk (harm a system causes unintentionally) and misuse risk (harm from deliberate abuse), and it applies across the full product life cycle.

Safety, security, and ethics

  • Safety: prevent harmful model behavior in products
  • Security: protect models, data, users, and infrastructure from adversaries
  • Ethics: ensure uses align with values, rights, and fairness

Risk taxonomy

  • Capability: hallucination, deception, goal misgeneralization, unwanted autonomy
  • Content: self-harm counseling, hate speech, illegal instructions, sexual content involving minors
  • Privacy: PII leakage, training-data exposure, re-identification
  • Integrity: prompt injection, prompt leakage, jailbreaking, data poisoning
  • Operational: unreliable tools, overreliance, downtime, cost spikes
  • Societal: bias, discrimination, misinformation at scale
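A taxonomy like this is most useful when incidents and eval failures are tagged against it consistently. Below is a minimal sketch in Python of one way to do that; the class and field names are illustrative assumptions, not part of any standard framework.

    # Illustrative sketch: the risk taxonomy as an enum, plus a record
    # type for tagging observed failures so trends roll up per category.
    from dataclasses import dataclass, field
    from enum import Enum

    class RiskCategory(Enum):
        CAPABILITY = "capability"    # hallucination, deception, goal misgeneralization
        CONTENT = "content"          # self-harm counseling, hate speech, illegal instructions
        PRIVACY = "privacy"          # PII leakage, training-data exposure
        INTEGRITY = "integrity"      # prompt injection, jailbreaking, data poisoning
        OPERATIONAL = "operational"  # unreliable tools, overreliance, cost spikes
        SOCIETAL = "societal"        # bias, discrimination, misinformation at scale

    @dataclass
    class Incident:
        """One observed failure, tagged so per-category trends can be tracked."""
        description: str
        categories: list[RiskCategory] = field(default_factory=list)
        severity: int = 1  # hypothetical scale: 1 = low, 5 = critical

    # Usage: a single incident can fall under more than one category.
    report = Incident(
        description="System prompt leaked via crafted user input",
        categories=[RiskCategory.INTEGRITY, RiskCategory.PRIVACY],
        severity=4,
    )

Allowing multiple categories per incident matters in practice: a prompt-injection attack that exfiltrates PII is both an integrity and a privacy failure.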

Safety life cycle

  1. Plan: set objectives, constraints, and risk thresholds
  2. Build: apply alignment methods and secure engineering
  3. Evaluate: run capability and safety checks before and after release
  4. Deploy: add guardrails, monitoring, and human oversight
  5. Improve: learn from incidents and update models and prompts
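To make step 4 concrete, here is a minimal sketch of deploy-time guardrails: an input screen, an output router, and an escalation path to human review. It assumes a simple keyword check and a precomputed risk score; real deployments use trained classifiers and policy engines, and all names here are illustrative.

    # Illustrative sketch of deploy-time guardrails (step 4), assuming a
    # keyword-based input screen and a risk score supplied by an upstream
    # classifier. Thresholds come from the risk limits set in step 1.
    BLOCKED_PATTERNS = ["ignore previous instructions", "reveal your system prompt"]

    def screen_input(user_text: str) -> bool:
        """Return True if the request may proceed to the model."""
        lowered = user_text.lower()
        return not any(p in lowered for p in BLOCKED_PATTERNS)

    def route_output(model_text: str, risk_score: float, threshold: float = 0.8) -> str:
        """Release low-risk output; hold high-risk output for a human reviewer."""
        if risk_score >= threshold:
            return "[held for human review]"  # human oversight, per step 4
        return model_text

    # Usage: screen the request, then route the response by risk.
    if screen_input("Summarize this article for me"):
        print(route_output("Here is the summary...", risk_score=0.1))

Held outputs and blocked inputs should feed back into step 5 as incident data, closing the loop between deployment and improvement.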