Why frameworks beat heroics
In our first year doing WordPress emergency work, we learned a brutal lesson: every emergency is different, but the response pattern that works is always the same. The teams that handle disasters well don't have smarter engineers; they have a framework that prevents them from making the obvious mistakes that compound damage.
This article describes the 5-phase framework we use on every WordPress emergency, from "the site is down" to "this won't happen again." Each phase has its own goal, its own success criteria, and its own anti-patterns.
Phase 1: Identify
The first thing to know is what actually broke. This sounds obvious but is the most-skipped phase. Engineers under pressure start fixing before they understand what they're fixing.
Goal: characterize the incident in three sentences.
- What is the user experience? (white screen, redirect to spam, 500 error, slow page)
- What is the technical signature? (PHP fatal, MySQL connection refused, malware payload signature, etc.)
- What is the scope? (single page, single user, site-wide, multiple sites)
Standard identification toolkit
# What does a visitor see?
curl -I https://yoursite.com/
# What does WordPress think is happening? (requires the wp-cli/doctor-command package)
wp doctor check --all
# What does the server log say? (log paths and unit names vary by distribution)
tail -n 200 /var/log/nginx/error.log
tail -n 200 /var/log/php-fpm.log
journalctl -u mysql -n 200
# When did the failure start?
last -F | head -n 20 # recent logins and reboots
# Files modified after a known good time; /tmp/marker must already exist with that time as its mtime
find /var/www/yoursite -type f -newer /tmp/marker
Anti-patterns
- Jumping into a fix before reading any logs
- Assuming the obvious cause is the real cause (slow site = "needs cache plugin" is a 50/50 guess)
- Trying to fix more than one issue simultaneously
- Skipping the scope question and over-fixing
Time budget: 5 to 15 minutes. If you can't characterize the incident in 15 minutes, you're missing something fundamental. Start at the network layer and work up.
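The last toolkit command only works if the marker file carries the modification time of the last known good moment. A minimal sketch of that step, assuming GNU touch and find; the function name and the timestamp format are illustrative, not part of any existing tool:

```shell
# files_changed_since DIR TIMESTAMP
# Lists regular files under DIR modified after TIMESTAMP.
# Assumption: GNU touch, whose -d option parses a date string.
files_changed_since() {
  dir=$1
  since=$2
  marker=$(mktemp) || return 1
  # Stamp a temporary marker with the last known good time.
  touch -d "$since" "$marker" || { rm -f "$marker"; return 1; }
  # -newer compares each file's mtime against the marker's mtime.
  find "$dir" -type f -newer "$marker"
  rm -f "$marker"
}
```

Called as files_changed_since /var/www/yoursite '2024-05-01 09:00' (the timestamp is a placeholder), it prints one path per line, which is usually enough to see whether the change set looks like a deploy or like an intrusion.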
Phase 2: Contain
Once you know what's wrong, stop the damage from spreading. Containment is not the fix; it's the stop-bleeding measure that gives you space to do the fix properly.
Goal: limit blast radius without making things worse.
Standard containment moves
- Take the site offline if it's serving malicious content. A maintenance page is better than serving malware to your customers and getting blacklisted. We use a static maintenance HTML page or a temporary redirect to a status page.
- Snapshot current state immediately. Whatever you're about to change, take a backup of what it looks like right now. This is your fallback if your fix makes things worse.
- Block the attack vector if active. Brute force attack? Cloudflare under attack mode. SQL injection probe? Block the IP at the WAF. Compromised admin user? Disable the user account.
- Stop background processes that could compound the issue. Disable cron, pause backups (they'd back up the broken state), suspend deploy pipelines.
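The snapshot step can be as simple as a timestamped archive of the site directory. The following is a sketch under stated assumptions, not the exact procedure used in this framework: the function name and archive naming are hypothetical, and the backup directory is assumed to live outside the site directory so the archive does not include itself.

```shell
# snapshot_site SITE_DIR BACKUP_DIR
# Writes a timestamped tar.gz of SITE_DIR into BACKUP_DIR and prints its path.
# Assumption: BACKUP_DIR is outside SITE_DIR.
snapshot_site() {
  site_dir=$1
  backup_dir=$2
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  archive="$backup_dir/snapshot-$stamp.tar.gz"
  mkdir -p "$backup_dir" || return 1
  # -C keeps archive paths relative to the site's parent directory.
  tar -czf "$archive" -C "$(dirname "$site_dir")" "$(basename "$site_dir")" || return 1
  echo "$archive"
}
```

This covers files only. The database needs its own dump taken at the same moment, for example with wp db export.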
Containment for common scenarios
| Scenario | Containment action |
|---|---|
| Active malware serving | Static maintenance page, block all writes |
| Brute force on login | Cloudflare under attack mode, rate limit |
| Database overload | Disable cron, kill long queries, isolate slow plugin |
| Hacked admin user | Reset all admin passwords, invalidate sessions, disable suspicious accounts |
| Plugin update broke site | Roll back the plugin, hold further updates |
Anti-patterns
- Skipping the snapshot because "we know what we're doing"
- Hoping the issue resolves on its own
- Communicating to customers before containment is complete (they'll ask questions you can't answer yet)
Time budget: 5 to 10 minutes.
Phase 3: Recover
Now you fix the actual issue. This is where most articles spend all their time, but it's only one phase of five. Recovery work depends on what was broken.
Goal: restore normal operation with confidence the fix is durable.
Recovery principles
- Fix the root cause, not the symptom. If 500 errors started after a plugin update, don't just deactivate the plugin and walk away. Identify why the update broke it, then decide whether to wait for a fix or replace the plugin.
- Test in isolation before deploying. If you're patching a function, run the patched file through php -l first. If you're changing config, check syntax before reloading.
- Apply changes incrementally. One change, verify, next change. Big-bang fixes have higher rollback risk.
- Restore from backup if uncertain. For hacked sites where you can't be sure you've found every backdoor, a clean restore from pre-incident backup is faster and safer than chasing every modified file.
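The "check syntax before reloading" principle can be enforced mechanically by refusing to run the apply step unless the check step succeeds. A minimal sketch; check_then_apply is a hypothetical helper name, and both arguments are command strings run through sh -c:

```shell
# check_then_apply CHECK_CMD APPLY_CMD
# Runs APPLY_CMD only if CHECK_CMD exits successfully.
check_then_apply() {
  check_cmd=$1
  apply_cmd=$2
  if sh -c "$check_cmd"; then
    sh -c "$apply_cmd"
  else
    # Leave the running service untouched when validation fails.
    echo "check failed, not applying: $check_cmd" >&2
    return 1
  fi
}
```

Typical pairings would be check_then_apply 'nginx -t' 'systemctl reload nginx' for a config change, or a php -l run on the patched file followed by the copy into place.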
Recovery checklists for the four most common emergencies
Plugin update broke the site
1. Identify the offending plugin (Health Check Troubleshooting, error log)
2. Roll back via WP-CLI: wp plugin install pluginname --version=X.Y.Z --force
3. Verify front-end and admin both function
4. Decide: stay on old version, or wait for fix?
Malware infection
1. Take full file + DB snapshot of compromised state (for forensics)
2. Restore from clean backup (taken before infection)
3. If no clean backup: forensic cleanup of every modified file (we use file integrity scan output)
4. Reset all credentials: admin passwords, DB password, salts, API keys
5. Scan for backdoors that survived (hidden mu-plugins, modified core files)
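One way to produce file integrity output for the cleanup step is to hash a known-clean copy of the files and diff it against the live tree. This is a sketch, not the scanner referred to above: it assumes sha256sum from GNU coreutils and a clean reference directory (for example, fresh downloads of the same core and plugin versions), and the function names are hypothetical. For core files alone, wp core verify-checksums does a similar comparison against official checksums.

```shell
# make_manifest DIR
# Prints "hash  ./relative/path" for every file under DIR, sorted by path.
make_manifest() {
  (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2)
}

# compare_trees CLEAN_DIR LIVE_DIR
# Prints a diff of the two manifests. Lines starting with ">" are files that
# are new or modified in LIVE_DIR; lines starting with "<" are the clean
# versions of modified files, or files missing from LIVE_DIR.
# Exit status is 0 when the trees match and 1 when they differ.
compare_trees() {
  tmp=$(mktemp -d) || return 2
  make_manifest "$1" > "$tmp/clean"
  make_manifest "$2" > "$tmp/live"
  diff "$tmp/clean" "$tmp/live"
  rc=$?
  rm -rf "$tmp"
  return $rc
}
```

Because new files show up as well as modified ones, this also surfaces dropped backdoors that a checksum-only verification of known files would miss.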
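Database corruption
1. Stop writes immediately
2. Take a binary backup of the data directory
3. Try mysqlcheck --repair first (this works for MyISAM tables; InnoDB does not support repair)
4. If InnoDB: set innodb_force_recovery in my.cnf, starting at 1 and raising it toward 6 only as far as needed to start the server (values of 4 and above can permanently corrupt data), then restart and dump the good data
5. Restore from the dump into a fresh database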
Server resource exhaustion
1. Identify the resource (CPU, RAM, disk, connections)
2. Identify the consumer (htop, iotop, SHOW PROCESSLIST)
3. Kill or throttle the consumer
4. Add capacity if it's a legitimate workload
5. Optimize if it's a misbehaving plugin
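For the disk case, identifying the resource can be scripted so the same check works during the incident and later as a monitor. A minimal sketch using POSIX df output; the function names and the threshold are illustrative assumptions:

```shell
# disk_usage_pct PATH
# Prints the used percentage (integer) of the filesystem holding PATH.
disk_usage_pct() {
  # -P forces the POSIX one-line-per-filesystem format, so column 5 is capacity.
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_over_threshold PATH LIMIT
# Succeeds (exit 0) when usage is at or above LIMIT percent.
disk_over_threshold() {
  used=$(disk_usage_pct "$1") || return 2
  [ "$used" -ge "$2" ]
}
```

For example, disk_over_threshold /var/www 90 && echo "disk nearly full" would flag a volume that is at least 90 percent used. CPU, RAM, and connection checks need their own tools, as listed in step 2.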
Anti-patterns
- Applying multiple fixes without verifying each
- "Fix" by uninstalling everything (loses customer data)
- Skipping the backup-before-restore step
- Declaring victory before testing in a real browser
Time budget: 30 minutes to several hours depending on severity.
Phase 4: Harden
The site is back up. Don't stop here. The same attack vector that caused this incident will be used again, sometimes by the same attacker, sometimes by a different one running the same automated tool.
Goal: make the same incident impossible to repeat.
Hardening based on incident type
| Incident | Hardening |
|---|---|
| Brute force succeeded | Add 2FA, change /wp-login.php path, IP whitelist if possible |
| Plugin vulnerability exploited | Remove vulnerable plugin, audit similar plugins, add WAF rules |
| Stolen credentials | Reset all passwords, audit account creation events, enable activity logging |
| Malware via wp-content/uploads | Disable PHP execution in uploads dir at server level |
| Database leak | Audit query patterns, restrict DB user privileges, encrypt sensitive columns |
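For the uploads row, two small helpers illustrate the idea: one lists executable-looking files that should never be in an uploads directory, the other prints an nginx rule that refuses to serve them. This is a sketch assuming an nginx front end; the function names are hypothetical, and the rule has to appear before the general PHP location block, because nginx uses the first regex location that matches.

```shell
# list_php_in_uploads UPLOADS_DIR
# Prints files under UPLOADS_DIR with extensions PHP setups commonly execute.
list_php_in_uploads() {
  find "$1" -type f \( -iname '*.php' -o -iname '*.phtml' -o -iname '*.phar' \)
}

# print_nginx_uploads_rule
# Emits a location block that denies requests for .php files under uploads.
print_nginx_uploads_rule() {
  cat <<'EOF'
location ~* ^/wp-content/uploads/.*\.php$ {
    deny all;
}
EOF
}
```

Any path printed by list_php_in_uploads deserves inspection before removal, since it is often the backdoor that made the incident possible.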
Generic hardening that helps every incident
- Update WordPress core, themes, all plugins to latest
- Remove all inactive plugins and themes
- Set strong unique passwords for every account
- Enforce 2FA for administrator and editor roles
- Configure file integrity monitoring
- Move backups off-server
- Implement a WAF if not already present
Anti-patterns
- "Quick hardening": making changes without testing them
- Adding a security plugin without configuring it
- Sharing the new credentials in plaintext channels (Slack, email)
Time budget: 1 to 4 hours depending on the changes needed.
Phase 5: Monitor
The final phase is the one that pays for the previous four. Without monitoring, you find out about the next incident the same way you found out about this one: too late, from a customer complaint.
Goal: detect a recurrence within minutes, not days.
The monitoring stack
- Uptime: external monitor pinging every 60 seconds (UptimeRobot, BetterStack)
- File integrity: alert when any file in wp-content changes outside a planned deploy
- Database: alert when query patterns deviate from baseline
- Error rate: alert when 5xx errors spike above baseline
- Reputation: daily check against Google Safe Browsing and major blacklists
- Audit log: every admin action recorded and reviewable
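The error-rate check can be approximated directly from the web server's access log. A minimal sketch, assuming nginx's default "combined" log format, where the status code is the ninth whitespace-separated field; the function names and the percentage threshold are illustrative, and a real baseline would come from the site's own history:

```shell
# count_5xx ACCESS_LOG
# Prints "<5xx count> <total requests>" for the given log file.
count_5xx() {
  awk '$9 ~ /^5[0-9][0-9]$/ { bad++ } END { printf "%d %d\n", bad, NR }' "$1"
}

# error_rate_exceeds ACCESS_LOG LIMIT_PCT
# Succeeds (exit 0) when the share of 5xx responses is above LIMIT_PCT.
error_rate_exceeds() {
  awk -v limit="$2" '
    $9 ~ /^5[0-9][0-9]$/ { bad++ }
    END { exit !(NR > 0 && bad * 100 / NR > limit) }
  ' "$1"
}
```

Run against a recent slice of the log (for example the output of tail -n 1000 saved to a file), this gives a crude spike detector that can feed whatever alerting channel is already in place.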
Where alerts go
- File integrity changes → Slack, instant
- Uptime down → Email + SMS, instant
- Blacklist appearance → Email + SMS, instant
- Error rate spike → Slack
- Audit log review → Weekly email summary
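Keeping this routing in one place makes it easy to review and change. A small sketch of the list above as a lookup function; the alert type names and channel labels are hypothetical identifiers, and the actual delivery (webhook, SMS gateway) is left out:

```shell
# alert_channels ALERT_TYPE
# Prints the space-separated channels for an alert type.
alert_channels() {
  case $1 in
    file_integrity) echo "slack" ;;
    uptime_down)    echo "email sms" ;;
    blacklist)      echo "email sms" ;;
    error_rate)     echo "slack" ;;
    audit_log)      echo "weekly_email" ;;
    *)
      # Unknown types fail loudly instead of being silently dropped.
      echo "unknown alert type: $1" >&2
      return 1
      ;;
  esac
}
```

A dispatcher can then loop over the result, for example for ch in $(alert_channels uptime_down), and call the matching sender for each channel.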
The post-incident debrief
48 to 72 hours after the incident is closed, run a debrief:
- What happened?
- How did we find out?
- What did we do?
- What worked? What didn't?
- What will we change in our process?
Document this. Three incidents into a real debrief practice, your team's response time will halve.
When this framework breaks down
The framework assumes you have access, time, and basic competence. It breaks down when:
- The hosting account itself is compromised (lockout from the hosting panel)
- The attacker is still actively in the system (you need to evict them first)
- The database is destroyed beyond restoration and no backups exist
- Legal/regulatory implications require external counsel before action
These are the cases where you call in specialists. We handle these scenarios routinely; average time to incident closure for a compromised hosting account is 4 to 6 hours.
Emergency response: under 15 minutes to first contact. Malware removal and hacked website repair follow this exact framework with the technical depth required for each scenario.

