Post-crash Troubleshooting with FusionReactor Cloud
Overview
This guide explains how to use FusionReactor Cloud to diagnose and prevent application server crashes. It walks through identifying the root cause with memory metrics, transaction traces, and logs, and then setting up proactive alerts to avoid a recurrence.
Info
While this guide highlights a memory-related example, the same process can help uncover other root causes like CPU saturation, slow database calls, or blocked threads.
Key features for crash investigation
Feature | Description
---|---
Historic Metrics | Retained for 13 months, so they remain accessible even after a server restarts.
Traces & Logs | Stored for 30 days, giving detailed transaction-level visibility.
Anomaly Detection | Uses R.E.D. (Rate, Errors, Duration) metrics to automatically flag unusual application behavior.
Custom Alerts | User-defined thresholds for memory, latency, CPU, and uptime monitoring.
Example scenario
An application server (e.g., Storefront 1) is crashing intermittently. The goal is to:
- Investigate the cause using FusionReactor Cloud.
- Prevent future crashes with proactive monitoring.
This example focuses on memory issues, but the same steps apply when diagnosing other causes such as CPU pressure or database slowdowns.
Step 1: Spot the issue
- Open the affected server in FusionReactor Cloud.
- Use the Live Mode Clock to select a custom time range (e.g., last 6 hours).
- The time filter syncs across:
- Metrics
- Transactions
- JDBC calls
- Logs
What to look for
- Inspect the Used Heap Memory graph; look for sharp spikes followed by sudden drops or gaps (a JVM-level sketch of reading heap and GC figures follows this list).
- Also check for anomalies in:
- CPU usage
- GC activity
- Thread states
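FusionReactor Cloud charts these values for you, but it can help to see where they come from. The following is a minimal sketch, assuming only a standard JVM, that reads used/max heap and garbage-collection counters from the platform MXBeans; it is illustrative and independent of FusionReactor's own agent.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Illustrative only: prints the JVM-level numbers that back heap and GC graphs.
public class HeapSnapshot {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();

        // getMax() can return -1 if no maximum heap size is defined.
        long usedMb = heap.getUsed() / (1024 * 1024);
        long maxMb  = heap.getMax()  / (1024 * 1024);
        System.out.printf("Used heap: %d MB of %d MB%n", usedMb, maxMb);

        // A rising collection count with little memory reclaimed often
        // accompanies the spike-then-crash pattern described above.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("GC %s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```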
Step 2: Isolate the root cause
- Navigate to the Transactions tab.
- Select Saved in Cloud to access stored transaction history.
- Sort by Duration to identify slow or abnormal requests.
Signs of trouble
- A normally fast transaction (e.g., Checkout) starts taking significantly longer.
- Specific outliers (e.g., Store Cache) may be consuming excessive memory or CPU, or holding on to resources.
- Watch for patterns where long-running transactions line up with metric spikes (a hypothetical timing sketch follows this list).
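Sorting by Duration surfaces these outliers in the Transactions tab. To make the idea concrete, here is a hypothetical timing wrapper (not FusionReactor's instrumentation) that flags any unit of work exceeding a threshold, which is roughly what a slow-transaction entry represents.

```java
// Hypothetical helper, not FusionReactor instrumentation: times a unit of work
// and flags it when it runs longer than a threshold, mirroring what sorting
// transactions by Duration surfaces.
public class SlowTransactionLogger {
    private static final long THRESHOLD_MS = 3_000;

    public static void timed(String name, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMs > THRESHOLD_MS) {
                // Correlate this timestamp with the memory and CPU graphs for the same window.
                System.err.printf("SLOW transaction \"%s\" took %d ms%n", name, elapsedMs);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated slow "Checkout" request for demonstration.
        timed("Checkout", () -> {
            try {
                Thread.sleep(3_500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
}
```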
Validate with logs
- Check server logs for crash-related errors like:
  - OutOfMemoryError
  - Thread deadlocks
  - Uncaught exceptions
- Match the timestamps to metrics and transactions to confirm correlation.
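A quick way to line log entries up with the metric spikes is to pull out just the crash-related lines. The sketch below assumes a plain-text log file at a path you supply; the default file name and the error markers are placeholders, not FusionReactor specifics.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrative log scan: prints lines containing common crash signatures so
// their timestamps can be matched against the metric and transaction views.
public class CrashLogScan {
    private static final List<String> MARKERS = List.of(
            "outofmemoryerror", "deadlock", "exception in thread");

    public static void main(String[] args) throws IOException {
        Path logFile = Path.of(args.length > 0 ? args[0] : "server.log"); // placeholder path
        try (var lines = Files.lines(logFile)) {
            lines.filter(line -> {
                    String lower = line.toLowerCase();
                    return MARKERS.stream().anyMatch(lower::contains);
                })
                .forEach(System.out::println);
        }
    }
}
```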
Step 3: Prevent future crashes
Anomaly Detection (AI Plan)
Detects irregularities in:
- Request volume
- Response times
- Error rates
Sends alerts via:
- Webhooks (e.g., Slack, Microsoft Teams)
Sensitivity is adjustable per environment.
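Webhook targets such as Slack or Microsoft Teams accept a simple JSON POST. The sketch below shows the shape of such a delivery using the standard java.net.http client; the URL is a placeholder for an incoming-webhook endpoint you configure yourself, and this is not FusionReactor's internal code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative webhook delivery: posts an alert message as JSON to an
// incoming-webhook URL (placeholder), the pattern Slack and Teams use.
public class WebhookAlert {
    public static void main(String[] args) throws Exception {
        String webhookUrl = "https://hooks.example.com/services/PLACEHOLDER"; // replace with your endpoint
        String payload = "{\"text\": \"Anomaly detected on Storefront 1: response time spike\"}";

        HttpRequest request = HttpRequest.newBuilder(URI.create(webhookUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Webhook responded with HTTP " + response.statusCode());
    }
}
```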
Custom Alerts
Set up rules under Alerting & Thresholds:
- Memory usage > 80%
- CPU usage > 90%
- Response time > 3 seconds
- Server offline
- JDBC pool nearly full
Get real-time alerts to act quickly when thresholds are breached.
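To make the first rule concrete, here is a local analogue that evaluates heap usage against the 80% threshold on the running JVM. It is a sketch for illustration only, not how FusionReactor's alerting is configured.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustrative threshold check: computes heap usage as a percentage of the
// maximum and reports a breach above 80%, mirroring the first alert rule.
public class HeapThresholdCheck {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() can be -1 when undefined; fall back to the committed size.
        long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        double usedPct = 100.0 * heap.getUsed() / max;

        if (usedPct > 80.0) {
            System.out.printf("ALERT: heap usage at %.1f%% exceeds the 80%% threshold%n", usedPct);
        } else {
            System.out.printf("OK: heap usage at %.1f%%%n", usedPct);
        }
    }
}
```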
Outcome
By following this process, you can:
- Identify the specific cause of a crash — whether memory, CPU, thread, or database related.
- Correlate data between metrics, transactions, and logs.
- Configure automated alerts to catch early warning signs and avoid recurrence.