Never Use yaml.load() in Python: The Security Risk LLMs Perpetuate

🔓 yaml.load() Is a Backdoor in Your Code

AI assistants frequently generate yaml.load() because it appears throughout legacy code, and that is a serious security problem. ⚠️

🚨 Why Is It Dangerous?

yaml.load() doesn’t just parse text. With an unsafe loader, it constructs arbitrary Python objects from tags embedded in the YAML, and constructing those objects can call any function:

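import yaml

# DANGEROUS: yaml.Loader can construct arbitrary Python objects
# (PyYAML before 5.1 used it by default; PyYAML 6.0+ requires an explicit Loader)
data = yaml.load(untrusted_input, Loader=yaml.Loader)  # ❌

# SAFE: constructs only standard YAML types
data = yaml.safe_load(untrusted_input)  # ✅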

If someone controls the YAML file (external config, user input), they can run commands on your machine.
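To make the risk concrete, here is a minimal sketch of what such a payload looks like and how yaml.safe_load() reacts to it. The payload string and the helper name parse_untrusted are illustrative assumptions, not taken from a real incident, and the sketch assumes PyYAML is installed.

```python
import yaml

# Hypothetical attacker-controlled document: the Python-specific tag asks
# an unsafe loader to call os.system with the given argument.
malicious = "!!python/object/apply:os.system ['echo pwned']"


def parse_untrusted(text):
    """Parse YAML with safe_load; return None if the document is rejected."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError:
        # safe_load has no constructor for python/* tags, so it raises
        # instead of running anything.
        return None


print(parse_untrusted(malicious))      # rejected document
print(parse_untrusted("key: value"))   # ordinary document
```

The payload never runs here because yaml.safe_load() refuses the tag. Passing the same string to yaml.load() with yaml.Loader or yaml.UnsafeLoader would execute the command.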

✅ The Solution

Always use yaml.safe_load(). It constructs only standard YAML types (such as mappings, lists, strings, numbers, booleans, and null) and raises an error on any tag that would instantiate an arbitrary Python object.
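As a quick illustration of those standard types, the sketch below loads a small config and records the Python type of each value. The document contents and the variable names are made up for the example, and it assumes PyYAML is installed.

```python
import yaml

# Hypothetical config covering the standard YAML types listed above.
doc = """
name: demo
port: 8080
ratio: 0.5
debug: true
tags: [a, b]
owner: null
"""

config = yaml.safe_load(doc)

# Map each key to the name of the Python type safe_load produced.
types = {key: type(value).__name__ for key, value in config.items()}
print(types)
```

Everything comes back as plain dicts, lists, and scalars, which is usually all a configuration file needs.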

🔍 Detect the Problem Automatically

Use Bandit to scan your codebase:

pip install bandit
bandit -r ./src

Bandit statically checks Python code against 60+ security rules mapped to CWE identifiers and flags this anti-pattern as B506 (yaml_load).
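To give a feel for what a check like this does, here is a minimal sketch built on the standard library's ast module. It is not Bandit's implementation: the function name find_yaml_load and the SAFE_LOADERS set are assumptions for illustration, and it only catches calls written literally as yaml.load(...).

```python
import ast

# Loader names treated as safe in this sketch (an assumption for the example).
SAFE_LOADERS = {"SafeLoader", "CSafeLoader"}


def find_yaml_load(source):
    """Return line numbers of yaml.load(...) calls that lack a safe Loader."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_yaml_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "yaml"
        )
        if not is_yaml_load:
            continue
        # The loader may be passed positionally or as the Loader keyword.
        loader = node.args[1] if len(node.args) > 1 else None
        for keyword in node.keywords:
            if keyword.arg == "Loader":
                loader = keyword.value
        # Handles both yaml.SafeLoader (Attribute) and SafeLoader (Name).
        name = getattr(loader, "attr", None) or getattr(loader, "id", None)
        if name not in SAFE_LOADERS:
            hits.append(node.lineno)
    return sorted(hits)


sample = (
    "import yaml\n"
    "a = yaml.load(x)\n"
    "b = yaml.safe_load(x)\n"
    "c = yaml.load(x, Loader=yaml.SafeLoader)\n"
    "d = yaml.load(x, Loader=yaml.Loader)\n"
)
print(find_yaml_load(sample))
```

A real scanner also has to follow import aliases such as `from yaml import load`, which is one reason to rely on Bandit rather than a hand-rolled script.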

💡 Explanation in a Nutshell

yaml.load() is an insecure PyYAML function: with an unsafe loader, Python objects embedded in a YAML file can execute arbitrary code when the file is parsed. AI models frequently generate it because it is abundant in legacy code. The fix is simple: always replace it with yaml.safe_load(). In existing projects, Bandit can automatically audit code, AI-generated or not, for this and 60+ other common insecure patterns.