Post-crash Troubleshooting with FusionReactor Cloud
Overview
This guide explains how to use FusionReactor Cloud to diagnose and prevent application server crashes. It walks through identifying the root cause with memory metrics, transaction traces, and logs, and then setting up proactive alerts to avoid a recurrence.
Info
While this guide highlights a memory-related example, the same process can help uncover other root causes like CPU saturation, slow database calls, or blocked threads.
Key features for crash investigation
Feature | Description
---|---
Historic Metrics | Retained for 13 months, so they remain accessible even after a server restarts.
Traces & Logs | Stored for 30 days, giving detailed transaction-level visibility.
Anomaly Detection | Uses R.E.D. (Rate, Errors, Duration) metrics to automatically flag unusual application behavior.
Custom Alerts | User-defined thresholds for memory, latency, CPU, and uptime monitoring.
Example scenario
An application server (e.g., Storefront 1) is crashing intermittently. The goal is to:
- Investigate the cause using FusionReactor Cloud.
- Prevent future crashes with proactive monitoring.
This example focuses on memory issues, but the same steps apply when diagnosing other causes such as CPU pressure or database slowdowns.
Step 1: Spot the issue
- Open the affected server in FusionReactor Cloud.
- Use the Live Mode Clock to select a custom time range (e.g., last 6 hours).
- The time filter syncs across:
- Metrics
- Transactions
- JDBC calls
- Logs
What to look for
- Inspect the Used Heap Memory graph; look for sharp spikes followed by sudden drops or gaps (a JVM-level sketch of reading heap and GC figures follows this list).
- Also check for anomalies in:
- CPU usage
- GC activity
- Thread states
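FusionReactor Cloud charts these values for you, but it can help to see where they come from. The following is a minimal sketch, assuming only a standard JVM, that reads used/max heap and garbage-collection counters from the platform MXBeans; it is illustrative and independent of FusionReactor's own agent.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Illustrative only: prints the JVM-level numbers that back heap and GC graphs.
public class HeapSnapshot {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();

        // getMax() can return -1 if no maximum heap size is defined.
        long usedMb = heap.getUsed() / (1024 * 1024);
        long maxMb  = heap.getMax()  / (1024 * 1024);
        System.out.printf("Used heap: %d MB of %d MB%n", usedMb, maxMb);

        // A rising collection count with little memory reclaimed often
        // accompanies the spike-then-crash pattern described above.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("GC %s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```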
Step 2: Isolate the root cause
- Navigate to the Transactions tab.
- Select Saved in Cloud to access stored transaction history.
- Sort by Duration to identify slow or abnormal requests.
Signs of trouble
- A normally fast transaction (e.g., Checkout) starts taking significantly longer.
- Specific outliers (e.g., Store Cache) may be consuming excessive memory or CPU, or holding on to resources.
- Watch for patterns where long-running transactions line up with metric spikes (a hypothetical timing sketch follows this list).
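Sorting by Duration surfaces these outliers in the Transactions tab. To make the idea concrete, here is a hypothetical timing wrapper (not FusionReactor's instrumentation) that flags any unit of work exceeding a threshold, which is roughly what a slow-transaction entry represents.

```java
// Hypothetical helper, not FusionReactor instrumentation: times a unit of work
// and flags it when it runs longer than a threshold, mirroring what sorting
// transactions by Duration surfaces.
public class SlowTransactionLogger {
    private static final long THRESHOLD_MS = 3_000;

    public static void timed(String name, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMs > THRESHOLD_MS) {
                // Correlate this timestamp with the memory and CPU graphs for the same window.
                System.err.printf("SLOW transaction \"%s\" took %d ms%n", name, elapsedMs);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated slow "Checkout" request for demonstration.
        timed("Checkout", () -> {
            try {
                Thread.sleep(3_500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
}
```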
Validate with logs
- Check server logs for crash-related errors like:
  - OutOfMemoryError
  - Thread deadlocks
  - Uncaught exceptions
- Match the timestamps to metrics and transactions to confirm correlation.
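A quick way to line log entries up with the metric spikes is to pull out just the crash-related lines. The sketch below assumes a plain-text log file at a path you supply; the default file name and the error markers are placeholders, not FusionReactor specifics.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrative log scan: prints lines containing common crash signatures so
// their timestamps can be matched against the metric and transaction views.
public class CrashLogScan {
    private static final List<String> MARKERS = List.of(
            "outofmemoryerror", "deadlock", "exception in thread");

    public static void main(String[] args) throws IOException {
        Path logFile = Path.of(args.length > 0 ? args[0] : "server.log"); // placeholder path
        try (var lines = Files.lines(logFile)) {
            lines.filter(line -> {
                    String lower = line.toLowerCase();
                    return MARKERS.stream().anyMatch(lower::contains);
                })
                .forEach(System.out::println);
        }
    }
}
```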
Step 3: Prevent future crashes
Anomaly Detection (AI Plan)
Detects irregularities in:
- Request volume
- Response times
- Error rates
Sends alerts via:
- Webhooks (e.g., Slack, Microsoft Teams)
Sensitivity is adjustable per environment.
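Webhook targets such as Slack or Microsoft Teams accept a simple JSON POST. The sketch below shows the shape of such a delivery using the standard java.net.http client; the URL is a placeholder for an incoming-webhook endpoint you configure yourself, and this is not FusionReactor's internal code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative webhook delivery: posts an alert message as JSON to an
// incoming-webhook URL (placeholder), the pattern Slack and Teams use.
public class WebhookAlert {
    public static void main(String[] args) throws Exception {
        String webhookUrl = "https://hooks.example.com/services/PLACEHOLDER"; // replace with your endpoint
        String payload = "{\"text\": \"Anomaly detected on Storefront 1: response time spike\"}";

        HttpRequest request = HttpRequest.newBuilder(URI.create(webhookUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Webhook responded with HTTP " + response.statusCode());
    }
}
```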
Custom Alerts
Set up rules under Alerting & Thresholds:
- Memory usage > 80%
- CPU usage > 90%
- Response time > 3 seconds
- Server offline
- JDBC pool nearly full
Get real-time alerts to act quickly when thresholds are breached.
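To make the first rule concrete, here is a local analogue that evaluates heap usage against the 80% threshold on the running JVM. It is a sketch for illustration only, not how FusionReactor's alerting is configured.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustrative threshold check: computes heap usage as a percentage of the
// maximum and reports a breach above 80%, mirroring the first alert rule.
public class HeapThresholdCheck {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() can be -1 when undefined; fall back to the committed size.
        long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        double usedPct = 100.0 * heap.getUsed() / max;

        if (usedPct > 80.0) {
            System.out.printf("ALERT: heap usage at %.1f%% exceeds the 80%% threshold%n", usedPct);
        } else {
            System.out.printf("OK: heap usage at %.1f%%%n", usedPct);
        }
    }
}
```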
Outcome
By following this process, you can:
- Identify the specific cause of a crash — whether memory, CPU, thread, or database related.
- Correlate data between metrics, transactions, and logs.
- Configure automated alerts to catch early warning signs and avoid recurrence.