Troubleshooting and System Notifications Guide: IBM Security QRadar 7.3.3
Note
Before you use this information and the product that it supports, read the information in “Notices” on
page 57.
Product information
This document applies to IBM® QRadar® Security Intelligence Platform V7.3.3 and subsequent releases unless
superseded by an updated version of this document.
© Copyright International Business Machines Corporation 2012, 2019.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM Corp.
Contents
  Disk full for the asset change queue
  Disk replication falling behind
  Disk storage available
  Disk storage unavailable
  Disk usage exceeded max threshold
  Disk usage exceeded warning threshold
  Disk usage returned to normal
  Insufficient disk space to export data
  Predictive disk failure
  Process monitor must lower disk usage
Event and flow notifications for QRadar appliances
  Event or flow data not indexed
  Event pipeline dropped connections
  Event pipeline dropped events
  Events routed directly to storage
  Expensive custom properties found
  Flow collector cannot establish initial time synchronization
  Maximum events or flows reached
Failure notifications for QRadar appliances
  Accumulator cannot read the view definition for aggregate data
  Accumulator is falling behind
  Filter initialization failed
  Infrastructure component is corrupted or did not start
  Process monitor application failed to start multiple times
  Store and forward schedule did not forward all events
  Time synchronization failed
  User authentication failed for automatic updates
  User does not exist or is undefined
  Certificate expires soon
  Certificate is expired
High Availability notifications for QRadar appliances
  Active high-availability (HA) system failure
  Failed to uninstall a high-availability (HA) appliance
  Failed to install high availability
  Standby high-availability (HA) system failure
License notifications for QRadar appliances
  License expired
  License near expiration
  Process monitor license expired or invalid
Limit notifications for QRadar appliances
  Aggregated data limit was reached
  Found an unmanaged process that is causing long transaction
  Long running reports stopped
  Long transactions for a managed process
  Maximum sensor devices monitored
  Process exceeds allowed run time
  SAR sentinel operation restore
  SAR sentinel threshold crossed
  Threshold reached for response actions
Log and log source notifications for QRadar appliances
  An error occurred when the log files were collected
  Expensive DSM extensions were found
  Log files were successfully collected
  Log source created in a disabled state
  Unable to determine associated log source
Memory and backup notifications for QRadar appliances
  Backup unable to complete a request
  Backup unable to run a request
  Device backup failure
  Last backup exceeded the allowed time limit
  Backup unable to find storage directory error
  Out of memory error
  Out of memory error and erroneous application restarted
Offense notifications for QRadar appliances
  Magistrate is unable to persist offense updates
  Maximum active offenses reached
  Maximum total offenses reached
Repair notifications for QRadar appliances
  Accumulation is disabled for the anomaly detection engine
  An infrastructure component was repaired
  Custom property disabled
  Data replication difficulty
  Replication cleanup skipped for host
  MPC: Process not shutdown cleanly
  Protocol source configuration incorrect
  Raid controller misconfiguration
  Restored system health by canceling hung transactions
Vulnerability scan notifications for QRadar appliances
  External scan gateway failure
  Scan failure error
  Scan tool failure
  Scanner initialization error
Notices
  Trademarks
  Terms and conditions for product documentation
  IBM Online Privacy Statement
  General Data Protection Regulation
About This Guide
This information is intended for use with IBM QRadar and provides diagnostic and resolution information
for common system notifications and errors that can be displayed when using QRadar SIEM.
IBM QRadar Troubleshooting and System Notifications Guide provides information on how to troubleshoot
and resolve system notifications that display on the QRadar console. System notifications that display on
the console can apply to any appliance or QRadar product in your deployment.
Unless otherwise noted, all references to QRadar can refer to the following products:
• IBM QRadar SIEM
• IBM QRadar Log Manager
Intended audience
System administrators responsible for troubleshooting must have administrative access to IBM QRadar
and your network devices and firewalls. The system administrator must have knowledge of your corporate
network and networking technologies.
Network administrators who are responsible for installing and configuring QRadar systems must be
familiar with network security concepts and the Linux operating system.
Technical documentation
To find IBM QRadar product documentation on the web, including all translated documentation, access
the IBM Knowledge Center (http://www.ibm.com/support/knowledgecenter/SS42VS/welcome).
For information about how to access more technical documentation in the QRadar products library, see
QRadar Support – Assistance 101 (https://ibm.biz/qradarsupport).
Procedure
To run health checks, type the following command.
drq
This command runs all available checks in /opt/ibm/si/diagnostiq with the checkup mode, and with
the summary output mode.
The following table shows the general parameters for DrQ.
The following table shows the output parameters for DrQ. These parameters are mutually exclusive.
drq -j | jq
Parameter  Description
-s         Runs in summary mode. Outputs the number of successes and failures. This is the default output mode for DrQ.
-v         Runs in verbose mode. Outputs success and failure messages for each check.
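Because the output parameters are mutually exclusive, a wrapper script can choose exactly one of them. The following is a minimal sketch, not part of DrQ itself: the function name drq_output_flag and the mode names are hypothetical, the -s and -v flags come from the table above, and -j is taken from the drq -j | jq example and is assumed here to select JSON output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a mode name to exactly one DrQ output flag.
# -s and -v are documented above; -j is assumed from the "drq -j | jq" example.
drq_output_flag() {
  case "$1" in
    summary) printf '%s\n' "-s" ;;
    verbose) printf '%s\n' "-v" ;;
    json)    printf '%s\n' "-j" ;;
    *)       echo "unknown output mode: $1" >&2; return 1 ;;
  esac
}

# Example (run on a QRadar host where drq is installed):
#   drq "$(drq_output_flag verbose)"
```

Passing a single mode name keeps a script from combining two output flags by accident.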
Troubleshooting DSMs
Device Support Modules (DSMs) parse the events in IBM QRadar. You can think of DSMs as software plug-ins
that are responsible for understanding and parsing events that are provided by an event source. An
event source can be a security appliance, server, operating system, firewall, database, or any other
type of system that generates an event when an action occurs.
How can you find these unknown or stored events in the Log Activity tab?
To find events specific to your device, you search in QRadar for the source IP address of your device. You
can also select a unique value from the event payload and search for Payload Contains. One of these
searches might locate your event, and it is likely either categorized as unknown or stored.
You can also add a search filter for Event is Unparsed. This search locates all events that either
cannot be parsed (stored) or events that might not be associated with a log source or auto discovered
(unknown).
What do you do if the product version you have is not listed in the DSM Configuration Guide?
The DSM Configuration Guide contains a list of product manufacturers and the DSMs that are officially
tested and validated against specific products. If the DSM is for a product that is officially supported by
QRadar, but the version is out-of-date, you might need a DSM update to resolve any parsing issues. The
product versions in the DSM guide were officially tested in-house, but software updates by vendors might
add or change the event format for a specific DSM. In these cases, open a support ticket with IBM Support
(https://www.ibm.com/support/home/) for a review of the log source.
What do you do if the product device you have is not listed in the DSM Configuration Guide?
If your product device is not listed in the DSM Configuration Guide, it is not officially supported. For
example, DSMs that appear on the IBM Security App Exchange are supplied by vendors and aren't
officially supported by IBM. Not having an official DSM doesn't mean that the events are not collected. It
indicates that the event that is received by QRadar might be identified as unknown on the Log Activity
tab. You have these options:
If the message is displayed repeatedly, then verify the problem. For more information, see “Verifying
partition storage problem” on page 6.
Procedure
1. Use SSH to log in to the QRadar Console.
2. Create a test by typing the following commands:
touch /store/backup/testfile
ls -la /store/backup/testfile
3. If one of the following two messages is displayed, increase the partition test timeout period.
• touch: cannot touch `/store/backup/testfile': Read-only file system
• nfs server time out
a) Click the Admin tab.
b) On the System Configuration menu, click System Settings > Advanced.
c) In the Partition Tester Timeout (seconds) list box, select or type 20.
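The write test in step 2 can be sketched as a shell function that also bounds how long the test may take. This is an illustration only and not the QRadar partition tester: the function name check_partition_writable and the test file name are hypothetical, the 20-second default simply mirrors the value in step 3c, and the coreutils timeout command is assumed to be available.

```shell
#!/usr/bin/env bash
# Sketch: test whether a directory accepts writes within a time limit.
# A hung NFS mount makes touch block; timeout then exits with status 124.
check_partition_writable() {
  local dir="$1" limit="${2:-20}" testfile status
  testfile="$dir/testfile.$$"
  if timeout "$limit" touch "$testfile" 2>/dev/null; then
    rm -f "$testfile"
    echo "OK: $dir is writable"
    return 0
  else
    status=$?
    if [ "$status" -eq 124 ]; then
      echo "TIMEOUT: no response from $dir within ${limit}s"
    else
      echo "FAIL: cannot write to $dir"
    fi
    return 1
  fi
}

# Example: check_partition_writable /store/backup 20
```

A TIMEOUT result corresponds to the NFS time-out case, and a FAIL result covers errors such as a read-only file system.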
A file might be cached by the QRadar web service or your desktop browser. You must restart the QRadar web
service and remove the cached files on your desktop.
Procedure
1. Use SSH to log in to QRadar.
2. Stop the QRadar web service by typing the following command:
service tomcat stop
3. Keep one web browser window open.
4. To clear your browser cache, go to your web browser's preference settings.
5. Restart the browser.
6. Restart the QRadar web service by typing the following command:
service tomcat start
Procedure
1. Use SSH to log in to QRadar or a managed host.
2. To review the disk partition usage, type the following command:
df -h
3. Review the partitions to check their disk usage levels.
What to do next
If any of the monitored partitions reach 95%, see “Resolving disk usage issues” on page 8.
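The review in step 3 can be automated with a small filter over df output. The sketch below is an assumption about how you might script the check, not a QRadar tool: the function name partitions_over_threshold is hypothetical, and it reads df -P (POSIX format) output on standard input so that each partition stays on one line. Mount points that contain spaces are not handled.

```shell
#!/usr/bin/env bash
# Sketch: print mount points whose usage is at or above a threshold (default 95).
# Expects "df -P" output on stdin: the fifth column is the capacity percentage
# and the sixth column is the mount point.
partitions_over_threshold() {
  local threshold="${1:-95}"
  awk -v limit="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 >= limit + 0) print $6, use "%"
  }'
}

# Example: df -P | partitions_over_threshold 95
```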
Procedure
1. Identify and remove older debug or patch files in the / file system.
2. Reduce disk usage on the /store file system.
3. Choose one of the following options:
• Remove the oldest data from the /store/ariel/events file system.
• Reduce your data retention period by adjusting the default retention bucket storage settings. For
more information, see the IBM QRadar Administration Guide.
• If the /store file system is full, identify which log sources you can retain for shorter periods. Use the
retention buckets to manage the log sources. For more information, see the IBM QRadar
Administration Guide.
• Consider an offboard storage solution such as iSCSI or Fibre Channel. For more information, see the
Offboard Storage Guide.
• If the /var/log file system reaches 100% capacity, QRadar does not shut down. Other issues
might cause your log files to grow faster than expected.
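Before you remove anything, it can help to see which directories use the most space. The following sketch only reports sizes and deletes nothing. The function name largest_subdirs is hypothetical, and GNU du (for the --max-depth option) is assumed.

```shell
#!/usr/bin/env bash
# Sketch: list the largest immediate subdirectories of a directory, sizes in KB.
# -x keeps the scan on one file system; the directory itself is filtered out.
largest_subdirs() {
  local dir="$1" count="${2:-10}"
  du -xk --max-depth=1 "$dir" 2>/dev/null |
    awk -F'\t' -v root="$dir" '$2 != root' |
    sort -rn |
    head -n "$count"
}

# Example: largest_subdirs /store 5
```

Review the output against your retention requirements before you decide what to remove.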
Events FAQ
Use these frequently asked questions and answers about events to help you understand how QRadar
correlates user activities in log files to generate offenses.
What is an Event?
In QRadar, an event is a message that is received and processed from a device on your network, and is a
log of a particular action on that device. For example, an SSH login on a UNIX server, a VPN connection to
a VPN device, or a firewall deny logged by your perimeter firewall are all events. These actions occur at a
specific point in time and are recorded in log files.
What is coalescing?
Coalescing is used to reduce data that is processed by the event pipeline. As data comes in and is
coalesced, a large burst of hundreds of thousands of events can be reduced to only a few dozen
records. QRadar still maintains a count of the actual number of events. Coalescing
gives QRadar the ability to detect, enumerate, and track an attack on a huge scale. It also protects the
performance of the pipeline by reducing the workload of the system, including storage requirements for
those events.
One limitation of coalescing occurs when data is being normalized. The first event in the coalesced record,
which is used as the base record, is the only one that is kept in its entirety, including the payload. You can
disable coalescing for devices and log sources that are used to track audit and compliance requirements
in your environment. Examples of these kinds of devices might be custom applications, any customer-
facing services, critical assets, or other important devices.
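The idea of coalescing can be illustrated with a short sketch. This is a conceptual model only and not the QRadar implementation: the function name coalesce_events, the tab-separated input format, and the choice of grouping key are all assumptions. Each input line holds a key and a payload. The output keeps one record per key with the event count and only the first payload, which mirrors the limitation described above.

```shell
#!/usr/bin/env bash
# Conceptual sketch of coalescing: input lines are "key<TAB>payload".
# Output is "key<TAB>count<TAB>first payload", in order of first appearance.
coalesce_events() {
  awk -F'\t' '
    !($1 in count) { order[++n] = $1; first[$1] = $2 }
    { count[$1]++ }
    END {
      for (i = 1; i <= n; i++)
        printf "%s\t%d\t%s\n", order[i], count[order[i]], first[order[i]]
    }
  '
}

# Example:
#   printf 'fw1:deny\tpayload A\nfw1:deny\tpayload B\n' | coalesce_events
```

In the example, the two events collapse into one record with a count of 2, and the second payload is not retained.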
If a load balancer is used, do events get parsed by any event collector? Do multiple log sources get created?
Any Syslog-based source that is sending data to a load balancer in front of QRadar can be parsed on any
of the event collectors. All auto-detected log sources in QRadar can be processed by any event collector
in the deployment. When auto-detection is triggered and a request to create a log source is sent to the
QRadar Console, the log source is created. Within a minute, all event processors and collectors are aware
of this new log source, and any data that is sent to any event processor is automatically associated with
that log source. Therefore, you can enable a load balancer in front of multiple event collectors and
processors.
One log source is created in this scenario. Multiple create commands can be sent from multiple
processors during the first few minutes that a log source is being detected, but it is only created once.
When the log source manager on the QRadar Console receives the create command, it creates the log
source if the log source does not exist. The log source manager ignores the create request if the log
source exists.
How does QRadar assign IP addresses from central Syslog servers and NAT devices?
If you have an existing central Syslog service infrastructure, you might want to add a forwarding rule to
this device that copies a stream of all events to the QRadar system. The IP address that QRadar uses is the
packet IP address. If you use a central Syslog server, you might see the server's IP address in many
events, and in the log source names.
To avoid this situation, configure the central Syslog server to add a prefix to a new Syslog header. This new
header includes the original source IP address of the packet that was received. This practice is common
when forwarding events, and QRadar provides this option as part of the forwarding destinations configuration.
When you add the prefix, the IP address of the original event source device is always in the Syslog header
Hostname field, and QRadar uses that IP address in the events. With NAT devices, you might need to go
back to the log source devices and reconfigure them to use the IP address of the host in the Syslog header
Hostname field, rather than a string-based host name. For example, syslog-ng services refer to this
option as chain_hostnames.
Global views
A saved search that is grouped by multiple fields generates a global view that has many unique entries. As
the volume of data increases, disk usage, processing times, and search performance can be impacted.
To prevent increasing the volume of data, only aggregate searches on necessary fields. You can reduce
the impact on the accumulator by adding a filter to your search criteria.
Procedure
1. Disable any DSM extension or custom property that is recently installed or enabled.
2. Choose one of the following options:
• If QRadar stops dropping events and you receive a system notification, then review your DSM
extensions or custom properties to identify and improve the inefficient regex patterns.
• If QRadar continues dropping events, then multiple DSM extensions or custom properties might be
causing a problem with the event pipeline.
3. Use SSH to log in to the QRadar Event Processor that is dropping events and type the following
command:
/opt/qradar/support/threadTop.sh -p 7777
The command displays the data processing engine activity. The following table describes the columns
in the output:
What to do next
If the Java thread stack contains java.util.regex.Pattern$Curly.match, then the performance
degradation might be caused by your expensive DSM extensions or custom properties. For more
information, see “Expensive DSM extensions were found” on page 45 or “Expensive custom properties
found” on page 33.
If the Java thread stack doesn't have expensive regular expressions, then your DSM extensions or custom
properties might have parsing issues. For more information, see the parsing issues topic in the IBM
QRadar Log Sources User Guide.
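If you save the thread output to a file, you can search it for the regular expression frame instead of reading it by eye. The following is a minimal sketch: the function name count_regex_frames is hypothetical, and it assumes that the output was already captured in a text file.

```shell
#!/usr/bin/env bash
# Sketch: count the lines in a saved thread dump that contain the regex frame.
# -F treats the pattern as a fixed string, so "$" and "." are matched literally.
count_regex_frames() {
  grep -c -F 'java.util.regex.Pattern$Curly.match' "$1"
}

# Example: count_regex_frames /tmp/threadtop-output.txt
```

A count greater than zero suggests that expensive regular expressions are worth investigating first.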
System notifications about limited disk space occur when free space in the /store/backup/ partition is
less than double the last backup file size. Limited disk space results from the volume of data and your
backup retention period settings. For more information, see the IBM QRadar Administration Guide.
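The condition for this notification is simple arithmetic and can be sketched as follows. The function name backup_space_is_limited is hypothetical, and both arguments are assumed to be sizes in the same unit.

```shell
#!/usr/bin/env bash
# Sketch of the rule above: space is limited when free space is less than
# double the size of the last backup file. Both sizes use the same unit.
backup_space_is_limited() {
  local free="$1" last_backup="$2"
  [ "$free" -lt $(( last_backup * 2 )) ]
}

# Example with hypothetical sizes in GB:
#   backup_space_is_limited 150 100 && echo "limited"
```

For example, with a 100 GB backup file, 150 GB of free space is limited because it is less than 200 GB, while exactly 200 GB of free space is not.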
When you configure the retention bucket storage settings, a global impact occurs on the storage across
your QRadar deployment.
Disk usage warnings can occur on the QRadar Console or any managed host in your QRadar deployment.
To check disk usage levels, review the monitored partitions on your QRadar Console or managed hosts.
Procedure
1. Log in to QRadar Console.
2. Click the Admin tab.
3. On the navigation menu, click System Configuration.
4. Click the System and License Management icon.
5. From the Display list box, select Licenses.
6. Select the expired license.
License Information Messages lists the expired licenses.
7. Click Actions > Delete License.
8. Click Confirm.
What to do next
Update your expired license. For more information, see Uploading a license key
(http://www.ibm.com/support/knowledgecenter/SS42VS_7.2.7/com.ibm.qradar.doc/t_qradar_adm_upload_license_key.html).
You can manually synchronize data between the QRadar server and the LDAP authentication server.
If you use authorization that is based on user attributes or groups, user information is automatically
imported from the LDAP server to the QRadar console.
Each group that is configured on the LDAP server must have a matching user role or security profile that is
configured on the QRadar console. For each group that matches, the users are imported and assigned
permissions that are based on that user role or security profile.
By default, synchronization happens every 24 hours. The timing for synchronization is based on the last
run time. For example, if you manually run the synchronization at 11:45 pm, and set the synchronization
interval to 8 hours, the next synchronization will happen at 7:45 am. If the access permissions change for
a user that is logged in when the synchronization occurs, the session becomes invalid. The user is
redirected back to the login screen with the next request.
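The timing rule in the example can be checked with a short calculation. The sketch below is an illustration of the arithmetic only: the function name next_sync_time is hypothetical, times are given in 24-hour HH:MM form, and the interval is in whole hours.

```shell
#!/usr/bin/env bash
# Sketch: compute the next synchronization time from the last run time.
# Arguments: last run as HH:MM (24-hour clock) and the interval in hours.
next_sync_time() {
  local last="$1" interval_hours="$2" h m total
  h="${last%%:*}"
  m="${last##*:}"
  # Strip one leading zero so that values such as "08" are not read as octal.
  h="${h#0}"
  m="${m#0}"
  total=$(( (${h:-0} * 60 + ${m:-0} + interval_hours * 60) % 1440 ))
  printf '%02d:%02d\n' "$(( total / 60 ))" "$(( total % 60 ))"
}

# Example from the text: a manual run at 11:45 pm with an 8-hour interval.
#   next_sync_time 23:45 8
```

For the example in the text, the function prints 07:45, which matches the 7:45 am result.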
do these steps.
Procedure
1. If your Active Directory was not recently configured, use SSH to log in to QRadar as the root user.
Procedure
1. Log in to the QRadar Console as an administrative user.
2. On the Admin tab, click the Auto Update icon.
3. Click Get New Updates. Wait for the connection and updates to complete. A dashboard system
notification is generated when updates are successfully downloaded or when errors occur.
4. Click View Log to view a detailed summary.
• If the update fails, a connection error message is displayed.
• If the update is successful, the log provides a success message and displays the most current
updates as "already installed."
5. If the test fails, try the test again or verify that any corporate firewall and proxy settings are enabled to
allow external connections.
Related information
QRadar: Important auto update server changes for administrators
Procedure
1. Use SSH to log in to QRadar as the root user.
2. If the syslog destination is on another appliance, such as an event collector, use SSH to log in to the
event collector.
3. Choose one of the following options.
• For a TCP syslog, type the following command:
tcpdump -s 0 -A host Device_Address and port 514
• For a UDP syslog, type the following command:
tcpdump -s 0 -A host Device_Address and udp port 514
The Device_Address must be an IPv4 address or a host name. The tcpdump command must run on the
QRadar appliance that receives the events from your device. By default, QRadar appliances are
configured to receive syslog events by using TCP or UDP and port 514. Do not configure the QRadar
firewall.
4. If the tcpdump command does not display events, then the syslog events are not sent to the QRadar
Console.
a) Ask your firewall administrator or operations group to check for firewalls that block communication
between the QRadar appliance and the device.
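To avoid typing errors when you switch between TCP and UDP, the two commands from step 3 can be produced by one helper. This is a minimal sketch: the function name syslog_capture_cmd is hypothetical, and it only prints the command text from step 3 without running tcpdump.

```shell
#!/usr/bin/env bash
# Sketch: print the tcpdump command from step 3 for a protocol and device address.
# The function only builds the command text; it does not start a capture.
syslog_capture_cmd() {
  local proto="$1" device="$2"
  case "$proto" in
    tcp) printf 'tcpdump -s 0 -A host %s and port 514\n' "$device" ;;
    udp) printf 'tcpdump -s 0 -A host %s and udp port 514\n' "$device" ;;
    *)   echo "protocol must be tcp or udp" >&2; return 1 ;;
  esac
}

# Example (192.0.2.10 is a documentation address, replace it with your device):
#   syslog_capture_cmd udp 192.0.2.10
```

Review the printed command, and then run it on the QRadar appliance that receives the events.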
Procedure
1. Review your system notifications.
2. If the system notifications display the incorrect source address for the log source, choose one of the
following options:
• Manually re-create the log source.
• Update the Log Source Identifier field with the correct host name or IP address.
3. Verify that the device supports QRadar automatic discovery.
The IBM QRadar DSM Configuration Guide appendix lists which Device Support Modules (DSMs)
support automatic log source creation.
4. Verify that the log sources in QRadar match the tcpdump results.
a) Search for the log source host name or packet IP address in the tcpdump results.
b) Click the Admin tab.
c) On the navigation menu, click Data Sources.
d) In the Events pane, click Log Sources.
e) Search for the log source host name or packet IP address.
If the QRadar host name or packet IP address does not match the tcpdump results, then the log
source might be created with an incorrect address. For some devices, unexpected values occur in the
syslog header when the event source handles events from multiple devices. Your device might be able
to preserve the original event IP address before the syslog event is sent.
5. Search for a unique payload value in QRadar.
a) Review the tcpdump raw payloads.
b) Select an identifier that is unique to your event source.
c) Click the Log Activity tab.
d) On the toolbar, click Add Filter.
e) From the Parameter menu, select Payload Contains.
f) In the Value field, type your unique identifier.
g) Review the search results.
What to do next
If the results return a different log source, then an auto-detection false positive occurred. Delete the
wrongly detected log source.
If the log source is discovered incorrectly, verify that your QRadar Console is installed with the latest DSM
version. Rediscover the log source.
Procedure
1. Download the Certificate Authority (CA) content from the QRadar server:
a) Download the root CA from http://<host_ip>:9381/vault-qrd_ca.pem
b) Download the intermediate CA from http://<host_ip>:9381/vault-qrd_ca_int.pem
Tip: If you need the CA bundle, you can concatenate the intermediate CA with the root CA.
2. Copy the CA files to your local computer, and then log out of QRadar.
3. Import the CA into your browser by using the appropriate method for your browser:
• Mozilla Firefox [https://www.cyberciti.biz/faq/firefoxadding-trusted-ca/]
• Google Chrome [https://wiki.wmtransfer.com/projects/webmoney/wiki/Installing_root_certificate_in_Google_Chrome]
• Microsoft Internet Explorer [https://msdn.microsoft.com/en-us/library/cc750534.aspx]
• Apple Safari [https://portal.threatpulse.com/docs/sol/Solutions/ManagePolicy/SSL/ssl_safari_cert_ta.htm]
• Opera [http://wiki.wmtransfer.com/projects/webmoney/wiki/Installing_root_certificate_in_Opera]
4. Restart the browser to ensure that the certificate is loaded.
5. Log in to QRadar and verify that the browser no longer displays the security warning.
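The tip in step 1 about concatenating the intermediate CA with the root CA can be sketched in Python as follows. This is only an illustration, not a QRadar tool: the function name build_ca_bundle is an assumption, and the sketch operates on the PEM text of the two files that were downloaded in step 1.

```python
def build_ca_bundle(intermediate_pem: str, root_pem: str) -> str:
    """Concatenate PEM text: intermediate CA first, then root CA."""
    parts = []
    for pem in (intermediate_pem, root_pem):
        text = pem.strip()
        # Basic sanity check only; this does not validate the certificate.
        if "-----BEGIN CERTIFICATE-----" not in text:
            raise ValueError("input does not look like a PEM certificate")
        parts.append(text)
    return "\n".join(parts) + "\n"
```

To use it, read the contents of vault-qrd_ca_int.pem and vault-qrd_ca.pem, pass them in that order, and write the returned text to a bundle file of your choice.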
Procedure
1. On the QRadar Console, type the following command: /opt/qradar/vault/bin/install-qradar-ca.sh
2. Restart Tomcat on the Console by typing the following command: service tomcat restart
3. Log in to the Console, navigate to the Admin tab, and then click Deploy.
Explanation
An asset change exceeded the change threshold and the asset profile manager ignores the asset change
request.
The asset profile manager includes an asset persistence process that updates the profile information for
assets. The process collects new asset data and then queues the information before the asset model is
updated. When a user attempts to add or edit an asset, the data is stored in temporary storage and added
to the end of the change queue. If the change queue is large, the asset change can time out and the
temporary storage is deleted.
Explanation
The system detected one or more asset profiles in the asset database that show deviating or abnormal
growth. Deviating growth occurs when a single asset accumulates more IP addresses, DNS host names,
NetBIOS names, or MAC addresses than the system thresholds allow. When growth deviations are
detected, the system suspends all subsequent incoming updates to these asset profiles.
User response
Determine the cause of the asset growth deviations:
• Hover your mouse over the notification description to review the notification payload. The payload
shows a list of the top five most frequently deviating assets. It also provides information about why the
system marked each asset as a growth deviation and the number of times that the asset attempted to
grow beyond the asset size threshold.
• In the notification description, click Review a report of these assets to see a complete report of asset
growth deviations over the last 24 hours.
• Review Updates to asset data (http://www.ibm.com/support/knowledgecenter/SS42VS_7.3.1/com.ibm.qradar.doc/c_qradar_ug_asset_reconciliation.html).
Blacklist notification
38750136 - The Asset Reconciliation Exclusion rules added new asset data to the
asset blacklists.
Explanation
A piece of asset data, such as an IP address, host name, or MAC address, shows behavior that is
consistent with asset growth deviations.
An asset blacklist is a collection of asset data that is considered untrustworthy by the asset reconciliation
exclusion custom engine rules. The rules monitor asset data for consistency and integrity. If a piece of
asset data shows suspicious behavior twice or more within 2 hours, that piece of data is added to the
asset blacklists. Subsequent updates that contain blacklisted asset data are not applied to the asset
database.
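The "twice or more within 2 hours" condition can be illustrated with a small Python sketch. It only models the timing condition that is described above; the function name should_blacklist and its parameters are assumptions, and the actual exclusion rules in QRadar might evaluate additional criteria.

```python
from datetime import datetime, timedelta


def should_blacklist(suspicious_times, window=timedelta(hours=2), min_hits=2):
    """Return True if at least min_hits suspicious events fall within window."""
    times = sorted(suspicious_times)
    # Slide over the sorted times and compare the first and last event
    # of every run of min_hits consecutive events.
    for i in range(len(times) - min_hits + 1):
        if times[i + min_hits - 1] - times[i] <= window:
            return True
    return False
```

For example, two suspicious observations that are 90 minutes apart satisfy the condition, while two observations that are 3 hours apart do not.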
User response
• In the notification description, click Asset Reconciliation Exclusion rules to see the rules that are used
to monitor asset data.
• In the notification description, click Asset deviations by log source to view the asset deviation reports
that occurred in the last 24 hours.
• If your blacklists are populating too aggressively, you can tune the asset reconciliation exclusion rules
that populate them.
Explanation
When a scan profile includes a CIDR range or IP address outside of the defined asset list, the scan
continues. However, any CIDR ranges or IP addresses for assets that are not within your external scanner
list are ignored.
User response
Update the list of authorized CIDR ranges or IP addresses for assets that are scanned by your external
scanner. Review your scan profiles to ensure that the scan is configured for assets that are included in the
external network list.
Explanation
The most common reason for automatic update errors is a missing software dependency for a DSM,
protocol, or scanner update.
User response
Select one of the following options:
• In the Admin tab, click the Auto Update icon and select View Update History to determine the cause
of the installation error. You can view, select, and then reinstall a failed RPM.
• If an auto update is unable to reinstall through the user interface, manually download and install the
missing dependency on your console. The console replicates the installed file to all managed hosts.
Explanation
The update process encountered an error or cannot connect to an update server. The system is not
updated.
User response
Select one of the following options:
Explanation
Automatic software updates were successfully downloaded and installed.
User response
No action is required.
Explanation
Software updates were automatically downloaded.
User response
Click the link in the notification to determine whether any downloaded updates require installation.
Explanation
An automatic update, such as an RPM update, was downloaded and requires that you deploy the change
to finish the installation process.
User response
In the Admin tab, click Deploy Changes.
Explanation
The custom rules engine (CRE) on an event processor is unable to read a rule to correlate an incoming
event. The notification might contain one of the following messages:
• If the CRE was unable to read a single rule, in most cases, a recent rule change is the cause. The
payload of the notification message displays the rule, or the rule in the rule chain, that is responsible.
• In rare circumstances, data corruption can cause a complete failure of the rule set. An application error
is displayed and the rule editor interface might become unresponsive or generate more errors.
User response
For a single rule read error, review the following options:
• To locate the rule that is causing the notification, temporarily disable the rule.
• Edit the rule to revert any recent changes.
• Delete and re-create the rule that is causing the error.
For application errors where the CRE failed to read rules, contact Customer Support.
Explanation
A single rule referred to itself directly or to itself through a series of other rules or building blocks. The
error occurs when you deploy a full configuration. The rule set is not loaded.
User response
Edit the rules that created the cyclic dependency. The rule chain must be broken to prevent a recurring
system notification. After the rule chain is corrected, a save automatically reloads the rules and resolves
the issue.
Explanation
The custom rules engine (CRE) is a process that checks whether an event matches a rule set and then triggers
alerts, offenses, or notifications.
A user can create a custom rule that has a large scope, uses a regex pattern that is not efficient, includes
Payload contains tests, or combines the rule with regular expressions. When this custom rule is used, it
negatively impacts performance, which can cause events to be incorrectly routed directly to storage.
Events are indexed and normalized but they don't trigger alerts or offenses.
When multiple, expensive, or inefficient rule tests are used, the maximum event throughput rate can be
reduced, causing backlogs of events to go through the rules engine. Events might be routed directly to
storage, and this warning is displayed.
User response
• On the Offenses tab, click Rules and use the search window to find and either edit or disable the
expensive rule. By editing the rule, you can reduce the amount of data that goes through the rule, by
applying a log source or IP address range filter. Expensive tests, such as payload contains, can also be
reduced or removed if they are not required. Reference set tests are to be reviewed to ensure that they
are not querying a large reference set.
• Use SSH to log in to the Event Processor and check whether parser threads are running for longer than
1500 milliseconds under EPS load by using the following command:
/opt/qradar/support/threadTop.sh
Search the Java thread stack for regex.Pattern.Curly, referenceSet, assets, host profile,
and port profile by using the following command:
/opt/qradar/support/threadTop.sh -p 7799 -s -e ".*CRE Processor.*"
– If the output contains regex.Pattern.Curly, issues with Payload contains tests are possible.
– If the output contains referenceSet, issues might occur with tests against large reference sets.
– If the output contains assets, host profile, and port profile, issues might occur with Host
with port open tests or asset tests.
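The interpretation of the threadTop.sh output can be expressed as a simple lookup. The following Python sketch assumes that you saved the command output as text; the names HINTS and classify_thread_output are hypothetical, and the marker strings are the ones listed above.

```python
# Marker strings and their likely causes, taken from the list above.
HINTS = [
    (("regex.Pattern.Curly",),
     "possible issue with Payload contains tests"),
    (("referenceSet",),
     "possible issue with tests against large reference sets"),
    (("assets", "host profile", "port profile"),
     "possible issue with Host with port open tests or asset tests"),
]


def classify_thread_output(output: str):
    """Return the hints whose markers all appear in the saved output."""
    findings = []
    for markers, hint in HINTS:
        # The text lists the last three markers together, so this sketch
        # requires all of them to be present before it reports that hint.
        if all(marker in output for marker in markers):
            findings.append(hint)
    return findings
```

An empty result means that none of the listed markers were found; it does not prove that the rules are inexpensive.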
Explanation
The system detected the spillover disk space that is assigned to the asset persistence queue is full. Asset
persistence updates are blocked until disk space is available. Information is not dropped.
User response
Reduce the size of your scan. A reduction in the size of your scan can prevent the asset persistence
queues from overflowing.
Explanation
The system detected that the spillover disk space that is assigned to the asset resolver queue is full.
The system continually writes the data to disk to prevent any data loss. However, if the system has no disk
space, it drops scan data. The system cannot handle incoming asset scan data until disk space is
available.
User response
Review the following options:
• Ensure that your system has free disk space. The notification can accompany SAR Sentinel notifications
to notify you of potential disk space issues.
• Reduce the size of your scans.
• Decrease the scan frequency.
Disk failure
38750110 - Disk Failure: Hardware Monitoring has determined that a disk is in
failed state.
Explanation
On-board system tools detected that a disk failed. The notification message provides information about
the failed disk and the slot or bay location of the failure.
User response
If the notification persists, contact Customer Support or replace the parts.
Explanation
The asset profile manager includes a process, change listener, that calculates statistics to update the
CVSS score of an asset. The system writes the data to disk, which prevents data loss of pending asset
statistics. However, if the disk space is full, the system drops scan data.
User response
Select one of the following options:
• Ensure that your system has sufficient free disk space.
• Reduce the size of your scans.
• Decrease the scan frequency.
Explanation
If the replication queue fills on the primary appliance, system load on the primary might increase.
Replication issues are commonly caused by performance issues on the primary system, storage issues
on the secondary system, or bandwidth problems between the appliances.
User response
Select one of the following options:
• Review bandwidth activity by loading a saved search MGMT: Bandwidth Manager from the Log Activity
tab. This search displays bandwidth usage between the console and hosts.
• If SAR sentinel notifications are recurring on the primary appliance, Distributed Replicated Block Device
queues might be full on the primary system.
• Use SSH and the cat /proc/drbd command to monitor the Distributed Replicated Block Device
status of the primary or secondary hosts.
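If you capture the output of cat /proc/drbd, the status fields can be extracted with a short script. The following Python sketch assumes the DRBD 8.x style status line (cs: for connection state, ro: for roles, ds: for disk states); the format on your appliance might differ, and the function names are hypothetical.

```python
import re

# Assumed line format, for example:
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
DRBD_LINE = re.compile(
    r"cs:(?P<cs>\S+)\s+"
    r"ro:(?P<ro_local>[^/\s]+)/(?P<ro_peer>[^/\s]+)\s+"
    r"ds:(?P<ds_local>[^/\s]+)/(?P<ds_peer>[^/\s]+)"
)


def parse_drbd_status(text):
    """Return one dict of status fields per resource line that is found."""
    return [match.groupdict() for match in DRBD_LINE.finditer(text)]


def is_healthy(resource):
    """Connected, with both the local and the peer disk up to date."""
    return (
        resource["cs"] == "Connected"
        and resource["ds_local"] == "UpToDate"
        and resource["ds_peer"] == "UpToDate"
    )
```

A resource that reports a connection state other than Connected, or a disk state other than UpToDate on either side, is worth a closer look when replication falls behind.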
Explanation
The disk sentry detected that the storage partition is available after the notification from “Disk storage
unavailable” on page 29 appeared. Disk unavailability was resolved.
User response
No action is required.
Related concepts
Disk storage unavailable
Explanation
The disk sentry did not receive a response within 30 seconds. A storage partition issue might exist, or the
system might be under heavy load and not able to respond within the 30-second threshold.
User response
Select one of the following options:
• Verify the status of your /store partition by using the touch command.
If the system responds to the touch command, the unavailability of the disk storage is likely due to
system load.
• Determine whether the notification corresponds to dropped events.
The system drops events when it cannot write events to disk. Investigate the status of storage partitions.
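The touch check can also be scripted. The following Python sketch creates and removes a small file in a directory and reports whether the operation finished within a time limit. The function name partition_responds is an assumption, and the 30-second default simply mirrors the threshold that is described above; this is not how the disk sentry itself is implemented.

```python
import concurrent.futures
import os
import tempfile


def partition_responds(directory, timeout_seconds=30.0):
    """Create and remove a small file; True if that completes in time."""

    def touch():
        fd, path = tempfile.mkstemp(prefix="touch_check_", dir=directory)
        os.close(fd)
        os.remove(path)
        return True

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(touch)
    try:
        return future.result(timeout=timeout_seconds)
    except (concurrent.futures.TimeoutError, OSError):
        # Timed out, or the directory is missing or not writable.
        return False
    finally:
        # Do not block here if the worker is still stuck on slow storage.
        pool.shutdown(wait=False)
```

For example, partition_responds("/store") returns True when the partition accepts a write within the time limit.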
Related concepts
Disk storage available
38750093 - One or more storage partitions that were previously inaccessible are
now accessible.
Explanation
At least one disk on your system is 95% full.
To prevent data corruption, some processes shut down. Event collection is suspended until the disk usage
falls below 92%.
User response
Identify which partition is full, such as the / and /store file systems. Free disk space by deleting files
that are not needed. For example, remove debug output and patch files from the / file system. If the
/store file system is at 95% capacity, look to the subdirectories to determine whether you can move the
files to a temporary location or delete any files.
Note: If the files are deleted, you cannot search these events.
You can also manually delete older data in the /store/ariel/ directories. The system automatically
restarts processes after you free enough disk space to fall below a threshold of 92% capacity.
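The two thresholds form a hysteresis band: processes stop at 95% usage and restart only after usage falls below 92%. The following Python sketch illustrates that behavior and shows one way to read the current usage of a partition. The function names are assumptions; the sketch does not describe the internal implementation of the disk sentry.

```python
import shutil

STOP_AT = 95.0       # percent used at which processes shut down (from the text)
RESUME_BELOW = 92.0  # percent used below which processes restart (from the text)


def used_percent(path):
    """Percentage of the file system that holds path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def collection_running(current_percent, currently_running):
    """Apply the stop and resume thresholds to the current state."""
    if currently_running and current_percent >= STOP_AT:
        return False
    if not currently_running and current_percent < RESUME_BELOW:
        return True
    # Between the thresholds, the state does not change.
    return currently_running
```

Because of the gap between the two values, freeing space until usage is at 93% is not enough to restart processes that were stopped at 95%.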
Explanation
The disk sentry detected that the disk usage on your system is greater than 90%.
User response
You must free some disk space by deleting files or by changing your data retention policies. The system
can automatically restart processes after the disk space usage falls below a threshold of 92% capacity.
Explanation
The disk sentry detected that the disk usage is below 90% of the overall capacity.
User response
No action is required.
Explanation
If the export directory does not contain enough space, the export of event, flow, and offense data is
canceled.
User response
Select one of the following options:
• Free some disk space in the /store/exports directory.
• Configure the Export Directory property in the System Settings window to use a partition that has
sufficient disk space.
• Configure an offboard storage device.
Explanation
The system monitors the status of the hardware on an hourly basis to determine when hardware support
is required on the appliance.
The on-board system tools detected that a disk is approaching failure or end of life. The slot or bay
location of the failure is identified.
User response
Schedule maintenance for the disk that is in a predictive failed state.
Explanation
The process monitor is unable to start processes because of a lack of system resources. The storage
partition on the system is likely 95% full or greater.
User response
Free some disk space by manually deleting files or by changing your event or flow data retention policies.
The system automatically restarts system processes when the used disk space falls below a threshold of
92% capacity.
Explanation
If too many indexes are enabled or the system is overburdened, the system might drop the event or flow
from the index portion.
User response
Select one of the following options:
• If the dropped index interval occurs with SAR sentinel notifications, the issue is likely due to system
load or low disk space.
• To temporarily disable some indexes to reduce the system load, on the Admin tab, click the Index
Management icon.
Explanation
A TCP-based protocol dropped an established connection to the system.
The number of connections that can be established by TCP-based protocols is limited to ensure that
connections are established and events are forwarded. The event collection service (ECS) allows a
maximum of 15,000 file handles and each TCP connection uses three file handles.
TCP protocols that provide drop connection notifications include the following protocols:
• TCP syslog protocol
• TLS syslog protocol
• TCP multi-line protocol
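The file handle figures above imply an upper bound on concurrent TCP connections. The following Python sketch shows the arithmetic; it assumes that every file handle is available to TCP connections, which might not hold in practice, and the function name is hypothetical.

```python
MAX_FILE_HANDLES = 15_000       # ECS limit stated above
HANDLES_PER_TCP_CONNECTION = 3  # per-connection cost stated above


def max_tcp_connections(max_handles=MAX_FILE_HANDLES,
                        per_connection=HANDLES_PER_TCP_CONNECTION):
    """Upper bound on concurrent TCP connections for a handle budget."""
    return max_handles // per_connection
```

With the stated values, the bound is 15,000 / 3 = 5,000 connections.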
User response
Review the following options:
Explanation
If there is an issue with the event pipeline or you exceed your license limits, an event or flow might be
dropped.
Dropped events and flows cannot be recovered.
User response
Review the following options:
• Verify the incoming event and flow rates on your system. If the license is exceeded and the event
pipeline is dropping events, expand your license to handle more data.
• Review the recent changes to rules or custom properties. Rule or custom property changes can cause
changes to your event or flow rates and might affect system performance.
• Determine whether the issue is related to SAR notifications. SAR notifications might indicate that
queued events and flows are in the event pipeline. The system usually routes events to storage, instead
of dropping the events.
• Tune the system to reduce the volume of events and flows that enter the event pipeline.
Explanation
To prevent queues from filling, and to prevent the system from dropping events, the event collection
system (ECS) routes data to storage. Incoming events and flows are not categorized. However, raw event
and flow data is collected and searchable.
User response
Review the following options:
• Verify the incoming event and flow rates. If the event pipeline is queuing events, expand your license to
hold more data. To determine how close you are to your EPS/FPM license limit, monitor the Event Rate
(Events Per Second Raw) graph on the System Monitoring dashboard. The graph shows you the
current data rate. Compare the data rate to the per-appliance license configuration in your deployment.
For more information about EPS/FPM license limits, see QRadar: About EPS & FPM Limits (https://
www.ibm.com/support/pages/qradar-about-eps-fpm-limits).
• Review recent changes to rules or custom properties. Rule or custom property changes might cause
sudden changes to your event or flow rates. Changes might affect performance or cause the system to
route events to storage.
• DSM parsing issues can cause the event data to route to storage. To verify whether the log source is
officially supported, see the DSM Configuration Guide.
• SAR notifications might indicate that queued events and flows are in the event pipeline.
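Comparing the observed data rate to the per-appliance license allocation is simple arithmetic. The following Python sketch assumes that you read both numbers manually from the dashboard and the license configuration; the function names are hypothetical.

```python
def license_utilization(observed_rate, licensed_rate):
    """Fraction of the licensed EPS or FPM rate that is currently in use."""
    if licensed_rate <= 0:
        raise ValueError("licensed_rate must be positive")
    return observed_rate / licensed_rate


def exceeds_license(observed_rate, licensed_rate):
    """True when the observed rate is above the licensed rate."""
    return license_utilization(observed_rate, licensed_rate) > 1.0
```

For example, an appliance that receives 4,500 EPS against a 5,000 EPS allocation is at 90% of its limit.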
Explanation
During normal processing, custom event and custom flow properties that are marked as optimized are
extracted in the pipeline during processing. The values are used in the custom rules engine (CRE) and
search indexes.
Improperly formed regular expressions (regex statements) in custom properties can cause events to be
incorrectly routed directly to storage.
User response
Select one of the following options:
• Disable any custom property that was recently installed.
• Review the payload of the notification. If possible, improve the regex statements that are associated
with the custom property.
For example, the following payload reports the regex pattern:
• Modify the custom property definition to narrow the scope of categories that the property tries to
match.
• Specify a single event name in the custom property definition to prevent unnecessary attempts to parse
the event.
• Order your log source parsers from the log sources with the most sent events to the least and disable
unused parsers.
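To illustrate what improving a regex statement can mean, the following Python sketch compares a broad pattern with a narrower one. QRadar evaluates custom properties with its own regex engine, so Python's re module is used here only for illustration; the payload, the field name user, and both patterns are hypothetical.

```python
import re

# Broad: leading and trailing ".*" force extra scanning and backtracking,
# and the greedy capture group takes everything to the end of the payload.
BROAD = re.compile(r".*user=(.*).*")

# Narrow: match the literal key and capture only non-whitespace characters.
NARROW = re.compile(r"user=(\S+)")

payload = "<13>Jan 01 00:00:00 host app: action=login user=bob src=10.0.0.5"


def extract(pattern, text):
    """Return the first capture group, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None
```

In this sample, the broad pattern captures "bob src=10.0.0.5", while the narrow pattern captures only "bob" and does less work per event.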
Explanation
The QFlow Collector process contains an advanced function for configuring a server IP address for time
synchronization. In most cases, do not configure a value. If a value is configured, the QFlow process
attempts to synchronize the time every hour with the time server at that IP address.
Explanation
Each appliance is allocated a specific volume of event and flow data from the license pool. In the last
hour, the appliance exceeded the allocated EPS or FPM.
If the appliance continues to exceed the allocated capacity, the system might queue events and flows, or
possibly drop the data when the backup queue fills.
User response
• Adjust the license pool allocations to increase the EPS and FPM capacity for the appliance.
• Tune the system to reduce the volume of events and flows that enter the event pipeline.
Explanation
A synchronization issue occurred. The aggregate data view configuration that is in memory wrote
erroneous data to the database.
To prevent data corruption, the system disables aggregate data views. When aggregate data views are
disabled, time series graphs, saved searches, and scheduled reports display empty graphs.
User response
Contact Customer Support.
Explanation
This message appears when the system is unable to accumulate data aggregations within a 60-second
interval.
Every minute, the system creates data aggregations for each aggregated search. The data aggregations
are used in time-series graphs and reports and must be completed within a 60-second interval. If the
count of searches and unique values in the searches are too large, the time that is required to process the
aggregations might exceed 60 seconds. Time-series graphs and reports might be missing columns for the
time period when the problem occurred.
User response
The following factors might contribute to the increased workload that is causing the accumulator to fall
behind:
Frequency of the incomplete accumulations
If the accumulation fails only once or twice a day, the drops might be caused by increased system
load due to large searches, data compression cycles, or data backup.
Infrequent failures can be ignored. If the failures occur multiple times per day, during all hours, you
might want to investigate further.
High system load
If other processes use many system resources, the increased system load can cause the aggregations
to be slow. Review the cause of the increased system load and address the cause, if possible.
For example, if the failed accumulations occur during a large data search that takes a long time to
complete, you might prevent the accumulator drops by reducing the size of the saved search.
Large accumulator demands
If the accumulator intervals are dropped regularly, you might need to reduce the workload.
The workload of the accumulator is driven by the number of aggregations and the number of unique
objects in those aggregations. The number of unique objects in an aggregation depends on the group-
by parameters and the filters that are applied to the search.
For example, a search that aggregates for services filters the data by using a local network hierarchy
item, such as DMZ area. Grouping by IP address might result in a search that contains up to 200
unique objects. If you add destination ports to the search, and each server hosts 5 - 10 services on
different ports, the new aggregate of destination.ip + destination.port can increase the
number of unique objects to 2000. If you add the source IP address to the aggregate and you have
thousands of remote IP addresses that hit each service, the aggregated view might have hundreds of
thousands of unique values. This search creates a heavy demand on the accumulator.
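The growth in unique objects follows from multiplying the number of distinct values in each group-by field. The following Python sketch computes that upper bound; the function name is an assumption, and real searches usually produce fewer objects because not every combination of values occurs in the data.

```python
from math import prod


def max_unique_objects(*distinct_counts):
    """Upper bound on unique objects: product of distinct values per field."""
    return prod(distinct_counts)
```

With 200 destination IP addresses and up to 10 ports for each address, max_unique_objects(200, 10) gives the 2000 objects that are mentioned in the example.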
To review the aggregated views that put the highest demand on the accumulator:
1. On the Admin tab, click Aggregated Data Management.
2. Click the Data Written column to sort in descending order and show the largest views.
3. Review the business case for each of the largest aggregations to see whether they are still
required.
Related concepts
Incomplete report results
After you configure and run IBM QRadar reports, you might see unexpected results. A report might seem
like it does not display all the data that you require.
Explanation
If a configuration is not saved correctly, or if a configuration file is corrupted, the event collection service
(ECS) might fail to initialize. If the traffic analysis process is not started, new log sources are not
automatically discovered.
User response
Select one of the following options:
Explanation
When the message service (IMQ) or PostgreSQL database cannot start or rebuild, the managed host
cannot operate properly or communicate with the console.
User response
Contact Customer Support.
Explanation
The system is unable to start an application or process on your system.
User response
Review which components are failing. For example, QFlow Collector fails to start when no flow sources
are assigned. Use the deployment actions to remove that QFlow component.
Explanation
If the schedule contains a short start and end time or many events to forward, the event collector might
not have sufficient time to transfer the queued events. Events are stored until the next opportunity to
forward events. When the next store and forward interval occurs, the events are forwarded to the event
processor.
User response
Increase the event forwarding rate from your event collector or increase the time interval that is
configured for forwarding events.
Explanation
The managed host cannot synchronize with the console or the secondary HA appliance cannot
synchronize with the primary appliance.
User response
Contact Customer Support.
Explanation
Valid credentials are required to authorize automatic downloads from the update server.
User response
Select one of the following options:
• Administrators must register for an account on the IBM support website (http://www.ibm.com/support/).
• To view the automatic update settings, on the Admin tab, click the Auto Update icon and select Change
Settings > Advanced. Administrators can confirm that the user name and password in the Settings
window are correct.
Explanation
The system attempted to update a user account with more permissions, but the user account or user role
does not exist.
User response
On the Admin tab, click Deploy Changes. Updates to user accounts or roles require that you deploy the
change.
Explanation
Servers and clients use certificates to establish communication that uses Secure Sockets Layer (SSL) or
Transport Layer Security (TLS). Certificates are issued with an expiration date that indicates how long the
certificate remains valid. This message is first shown when QRadar determines that the certificate that is
used for SAML authentication is set to expire within the next 14 days. The message is shown again at
specific intervals that lead up to the expiration date.
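The 14-day condition can be checked independently for any certificate whose expiration date you know. The following Python sketch is a minimal illustration; the function name expires_within is an assumption, and the sketch does not read the certificate itself.

```python
from datetime import datetime, timedelta, timezone


def expires_within(not_after, now, days=14):
    """True if the certificate is still valid but expires within days."""
    return now <= not_after <= now + timedelta(days=days)
```

Pass the expiration date of the certificate as not_after and the current time as now, both as timezone-aware datetime values.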
User response
Select one of the following options:
Certificate is expired
38750162 - The certificate named <certificate_name> has expired. Please update
the certificate as soon as possible.
Explanation
Servers and clients use certificates to establish communication that uses Secure Sockets Layer (SSL) or
Transport Layer Security (TLS). Certificates are issued with an expiration date that indicates how long the
certificate remains valid.
This message appears when the certificate that is used for SAML authentication is expired. The message
appears once a day and QRadar users cannot log in until the expired certificate is replaced or renewed.
User response
Select one of the following options:
• If you are using the QRadar_SAML certificate that is provided with QRadar, renew the certificate.
• If you are using a third-party certificate, add a certificate.
For more information, see SAML single sign-on authentication in the IBM QRadar Administration Guide.
Explanation
The active system cannot communicate with the standby system because the active system is
unresponsive or failed. The standby system takes over operations from the failed active system.
User response
Review the following resolutions:
• Inspect the active HA appliance to determine whether it is powered down or experienced a hardware
failure.
• If the active system is the primary HA, restore the active system.
Click the Admin tab and click System and License Management. From the High Availability menu,
select the Restore System option.
• Review the /var/log/qradar.log file on the standby appliance to determine the cause of the failure.
• Use the ping command to check the communication between the active and standby systems.
• Check the switch that connects the active and standby HA appliances.
Verify the iptables rules on the active and standby appliances.
Explanation
When you remove an HA appliance, the installation process removes connections and data replication
processes between the primary and secondary appliances. If the installation process cannot remove the
HA appliance from the cluster properly, the primary system continues to work normally.
User response
Try to remove the HA appliance a second time.
Explanation
When you install a high availability (HA) appliance, the installation process links the primary and
secondary appliances. The configuration and installation process contains a time interval to determine
when an installation requires attention. The high-availability installation exceeded the six-hour time limit.
No HA protection is available until the issue is resolved.
User response
Contact Customer Support.
Explanation
The status of the secondary appliance switches to failed and the system has no HA protection.
User response
Review the following resolutions:
• Restore the secondary system.
Click the Admin tab, click System and License Management, and then click Restore System.
• Inspect the secondary HA appliance to determine whether it is powered down or experienced a
hardware failure.
• Use the ping command to check the communication between the primary and standby systems.
• Check the switch that connects the primary and secondary HA appliances.
Verify the iptables rules on the primary and secondary appliances.
• Review the /var/log/qradar.log file on the standby appliance to determine the cause of the failure.
Explanation
When a license expires on the console, a new license must be applied. When a license expires on a
managed host, the appliance continues to process events and flows up to the rate that is allocated from
the shared license pool.
When the license contributes EPS and FPM capacity to the shared license pool, the expiry might force the
shared license pool into a deficit where it does not have enough capacity to meet the requirements of the
deployment. In a deficit situation, functionality on the Network Activity and Log Activity tabs is blocked.
User response
1. Determine which appliance has the expired license.
a. On the Admin tab, click System and License Management.
b. In the Display box, select Licenses.
Expired licenses are shown in the License Information Messages section.
2. If the expired license is on the console, replace it.
3. If the expired license is on a managed host, review the shared license pool to ensure that the system
has enough EPS and FPM capacity.
a. If the shared license pool is over-allocated, replace the expired license with a new license that has
enough EPS and FPM to meet the system capacity requirements.
b. If the license pool has enough capacity, delete the expired license. In the License table, select the
row for the expired license (shown nested beneath the managed host summary row), and select
Actions > Delete License.
Explanation
The system detected that a license for an appliance is within 35 days of expiration.
User response
No action is required.
Explanation
The license is expired for a managed host. All data collection processes stop on the appliance.
User response
Contact your sales representative to renew your license.
Explanation
The accumulator process counts and prepares events and flows in data accumulations to assist with
searches, chart display, and report performance. The accumulator process aggregates data in
predefined time spans to create aggregate data views. An aggregate data view is a data set that is used to
draw a time series graph, create scheduled reports, or trigger anomaly detection rules.
The Console is limited to 130 active aggregate data views.
The following user actions can create a new aggregate data view:
• New anomaly detection rules.
• New reports.
• New saved searches that use time series data.
When the aggregate data view limit is reached, the notification is generated. As users attempt to create
new anomaly rules, reports, or saved searches, they are prompted in the user interface that the system is
at the limit.
User response
To resolve this issue, administrators can review the active aggregate data views on the Admin tab in the
Aggregated Data Management window. The aggregated data management feature provides information
on the reports, searches, and anomaly detection rules in use by each aggregate data view. The
administrator can review the list of aggregate data views to determine what data is most important to the
users. Aggregate data views can be disabled to allow users to create a new rule, report, or saved search
that requires an aggregate data view.
If the administrator decides to delete an aggregate data view, a summary provides an outline of the
searches, rules, or reports affected. To re-create a deleted aggregate data view, the administrator needs
only to re-enable or re-create the search, anomaly rule, or report. The system automatically creates the
aggregate data view based on the data required.
Explanation
The transaction sentry determines that an outside process, such as a database replication issue,
maintenance script, auto update, or command line process, or a transaction is causing a database lock.
Most processes cannot run for more than an hour. Repeated occurrences with the same process need to
be investigated.
User response
Select one of the following options:
Explanation
The system cancels the report that exceeded the time limit. Reports that run longer than the following
default time limits are canceled.
User response
Select one of the following options:
• Reduce the time period for your report, but schedule the report to run more frequently.
• Edit manual reports to generate on a schedule.
A manual report might rely on raw data but not have access to accumulated data. Edit your manual
report and change the report to use an hourly, daily, monthly, or weekly schedule.
Explanation
The transaction sentry determines that a managed process, such as Tomcat or the event collection
service (ECS), is the cause of a database lock.
A managed process is forced to restart.
User response
To determine the process that caused the error, review the qradar.log for the word TxSentry.
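As a rough illustration of that review, the following Python sketch keeps only the log lines that mention TxSentry. The function name and the sample lines are hypothetical and exist only for this example; the search term comes from the step above, and real qradar.log entries have their own layout.

```python
from typing import Iterable, List


def find_txsentry_lines(lines: Iterable[str], keyword: str = "TxSentry") -> List[str]:
    """Return the log lines that contain the keyword (case-sensitive match)."""
    return [line.rstrip("\n") for line in lines if keyword in line]


# Hypothetical sample lines; they do not reproduce the real log format.
sample_log = [
    "example line about an unrelated service\n",
    "example line mentioning TxSentry and a process name\n",
]
matches = find_txsentry_lines(sample_log)
```

To check a real system, the same function can be given an open file handle for /var/log/qradar.log (the path that other entries in this guide reference) instead of the sample list.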
Explanation
The system contains a limit to the number of log sources that can be queued for automatic discovery by
traffic analysis. If the maximum number of log sources in the queue is reached, then new log sources
cannot be added.
Events for the log source are categorized as SIM Generic and labeled as Unknown Event Log.
User response
Select one of the following options:
• Review SIM Generic log sources on the Log Activity tab to determine the appliance type from the event
payload.
• Ensure that automatic updates can download the latest DSM updates to properly identify and parse log
source events.
• Verify whether the log source is officially supported.
If your appliance is supported, manually create a log source for the events that were not automatically
discovered.
• If your appliance is not officially supported, create a universal DSM to identify and categorize your
events.
• Wait for the device to provide 1,000 events.
If the system cannot auto discover the log source after 1,000 events, it is removed from the traffic
analysis queue. Space becomes available for another log source to be automatically discovered.
Explanation
The default time limit of 1 hour for an individual process to complete a task is exceeded.
User response
Review the running process to determine whether the task is a process that can continue to run or must
be stopped.
Explanation
The system activity reporter (SAR) utility detected that your system load returned to acceptable levels.
User response
No action is required.
Explanation
The system activity reporter (SAR) utility detected that your system load is above the threshold. Your
system can experience reduced performance.
User response
Review the following options:
• In most cases, no resolution is required.
For example, when the CPU usage is over 90%, the system automatically attempts to return to normal
operation.
• If this notification is recurring, increase the default value of the SAR sentinel.
Click the Admin tab, then click Global System Notifications. Increase the notification threshold.
• For system load notifications, reduce the number of processes that run simultaneously.
Stagger the start time for reports, vulnerability scans, or data imports for your log sources. Schedule
backups and system processes to start at different times to lessen the system load.
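The staggering advice above can be planned with a small helper. This is a minimal sketch, assuming a fixed gap between jobs; the job names, the first start time, and the 45-minute gap are made-up values, and the function is not part of the product.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stagger_start_times(jobs: List[str], first_start: datetime,
                        gap_minutes: int) -> Dict[str, datetime]:
    """Assign each job a start time, spaced gap_minutes apart, in list order."""
    return {
        job: first_start + timedelta(minutes=index * gap_minutes)
        for index, job in enumerate(jobs)
    }


# Hypothetical jobs and times, for illustration only.
schedule = stagger_start_times(
    ["daily report", "vulnerability scan", "configuration backup"],
    datetime(2020, 1, 1, 1, 0),
    gap_minutes=45,
)
```

The resulting times are only a planning aid; each job still has to be scheduled in its own configuration window.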
Explanation
The custom rules engine (CRE) cannot respond to a rule because the response threshold is full.
Generic rules or a system that is tuned can generate many response actions, especially systems with the
IF-MAP option enabled. Response actions are queued. Response actions might be dropped if the queue
exceeds 2000 in the event collection system (ECS) or 1000 response actions in Tomcat.
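The queue behavior described above can be pictured with a toy model. The two limits are taken from the explanation; the class itself is hypothetical, and it assumes that the newest action is the one dropped when the queue is full, which this guide does not state.

```python
from collections import deque
from typing import Deque

ECS_QUEUE_LIMIT = 2000     # limit cited for the event collection system (ECS)
TOMCAT_QUEUE_LIMIT = 1000  # limit cited for Tomcat


class ResponseQueue:
    """Toy bounded queue: new actions are rejected once the limit is reached."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.pending: Deque[str] = deque()
        self.dropped = 0

    def offer(self, action: str) -> bool:
        """Queue the action, or count it as dropped when the queue is full."""
        if len(self.pending) >= self.limit:
            self.dropped += 1  # assumption: the newest action is the one lost
            return False
        self.pending.append(action)
        return True
```

The model shows why tuning rules to trigger less often helps: fewer calls to offer() mean that the queue is less likely to sit at its limit.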
User response
• If the IF-MAP option is enabled, verify that the connection to the IF-MAP server exists and that a
bandwidth problem is not causing rule responses to queue in Tomcat.
• Tune your system to reduce the number of rules that are triggering.
Explanation
Errors were encountered while the log files were being collected. The log file collection failed.
User response
To view information about why the collection failed, follow these steps:
1. Click System and License Manager in the notification message.
Explanation
A log source extension is an XML file that includes all of the regular expression patterns that are required
to identify and categorize events from the event payload. Log source extensions might be referred to as
device extensions in error logs and some system notifications.
During normal processing, log source extensions run in the event pipeline. The values are immediately
available to the custom rules engine (CRE) and are stored on disk.
Improperly formed regular expressions (regex) can cause events to be routed directly to storage.
User response
Select one of the following options:
• Disable any DSM extension that was recently installed.
• Review the payload of the notification to determine which expensive DSM extension in the pipeline
affects performance. If possible, improve the regex statements that are associated with the device
extension.
For example, the following payload reports that the pipeline is blocked by the Checkpoint DSM:
• Ensure that the log source extension is applied only to the correct log sources.
On the Admin tab, click System Configuration > Data Sources > Log Sources. Select each log source
and click Edit to verify the log source details.
• If you are working with protocol-based log sources, reduce the event throttle to ensure that the events
do not buffer to disk. The event throttle settings are part of the protocol configuration for the log source.
• Order your log source parsers from the log sources with the most sent events to the least and disable
unused parsers.
• Verify that your Console is installed with the latest DSM versions.
• If log sources are created for devices that aren’t in your environment, remove the log sources by using
the following command:
/opt/qradar/bin/tatoggle.pl
If you have multiple event processors, copy the /opt/qradar/conf/TrafficAnalysisConfig.xml file
to the /store/configservices/staging/globalconfig/ directory. On the Admin tab, click Deploy Full
Configuration for all managed hosts to obtain the configuration file.
Explanation
The log files were successfully collected.
User response
To download the log file collection, follow these steps:
1. Click System and License Manager in the notification message.
2. Expand System Support Activities Messages.
3. Click Click here to download file.
Explanation
Traffic analysis is a process that automatically discovers and creates log sources from events. If you are at
your current log source license limit, the traffic analysis process might create the log source in the
disabled state. Disabled log sources do not collect events and do not count in your log source limit.
User response
Review the following options:
• On the Admin tab, click the Log Sources icon and disable or delete low priority log sources. Disabled log
sources do not count towards your log source license.
• Ensure that deleted log sources do not automatically rediscover. You can disable the log source to
prevent automatic discovery.
• Ensure that you do not exceed your license limit when you add log sources in bulk.
• If you require an expanded license to include more log sources, contact your sales representative.
Explanation
When events are sent from an undetected or unrecognized device, the traffic analysis component needs a
minimum of 25 events to identify a log source.
If the log source is not identified after 1,000 events, the system abandons the automatic discovery
process and generates the system notification. The system then categorizes the log source as SIM
Generic and labels the events as Unknown Event Log.
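The two thresholds in this explanation can be summarized as a small decision function. This is a sketch only: the state names are invented for the example, and the exact boundary handling (for instance, whether the 1,000th event itself triggers abandonment) is an assumption.

```python
MIN_EVENTS_TO_IDENTIFY = 25       # minimum events needed to identify a log source
MAX_EVENTS_BEFORE_ABANDON = 1000  # automatic discovery is abandoned at this count


def discovery_state(events_seen: int, identified: bool) -> str:
    """Classify where a device stands in automatic discovery (hypothetical states)."""
    if identified and events_seen >= MIN_EVENTS_TO_IDENTIFY:
        return "identified"
    if events_seen >= MAX_EVENTS_BEFORE_ABANDON:
        return "abandoned: SIM Generic / Unknown Event Log"
    if events_seen < MIN_EVENTS_TO_IDENTIFY:
        return "collecting minimum sample"
    return "analyzing"
```

For example, discovery_state(500, False) describes a device that has sent enough events to be analyzed but has not yet been matched.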
User response
Review the following options:
• Review the IP address in the system notification to identify the log source.
Explanation
Disk Sentry is responsible for monitoring system disk and storage issues. Before a backup begins, Disk
Sentry checks the available disk space to determine whether the backup can complete successfully. If the
free disk space is less than two times the size of the last backup, the backup is canceled. By default,
backups are stored in /store/backup.
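The free-space rule above reduces to a single comparison. The following sketch is not the Disk Sentry implementation; the function names are hypothetical, and the size of the last backup has to be supplied by the caller.

```python
import shutil


def backup_can_start(free_bytes: int, last_backup_bytes: int) -> bool:
    """Documented rule: the backup is canceled when free space < 2 x last backup size."""
    return free_bytes >= 2 * last_backup_bytes


def free_space(path: str) -> int:
    """Return the free bytes on the file system that holds the given path."""
    return shutil.disk_usage(path).free
```

On an appliance, an administrator might call backup_can_start(free_space("/store/backup"), last_backup_bytes) with the byte size of the most recent backup to estimate whether the next backup is likely to be canceled.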
User response
To resolve this issue, select one of the following options:
• Free up disk space on your appliance to allow enough space for a backup to complete in
/store/backup.
• Configure your existing backups to use a partition with free disk space.
• Configure more storage for your appliance. For more information, see the Offboard Storage Guide.
Explanation
A backup cannot start or cannot complete for one of the following reasons:
• The system is unable to clean the backup replication synchronization table.
• The system is unable to run a delete request.
• The system is unable to synchronize backup with the files that are on the disk.
• The NFS-mounted backup directory is not available or has incorrect NFS export options
(no_root_squash).
• The system cannot initialize on-demand backup.
• The system cannot retrieve configuration for the type of backup that is selected.
User response
Manually start a backup to determine whether the failure recurs. If multiple backups fail to start,
contact Customer Support.
Explanation
The error is commonly caused by configuration errors in the configuration source management (CSM) or if
a backup is canceled by a user.
User response
Select one of the following options:
• Review the credentials and address sets in CSM to ensure that the appliance can log in.
• Verify that the protocol that is configured to connect to your network device is valid.
• Ensure that your network device and version are supported.
• Verify that your network device connects to the appliance.
• Verify that the most current adapters are installed.
Explanation
The time limit is determined by the backup priority that you assign during configuration.
User response
Select one of the following options:
• Edit the backup configuration to extend the time limit that is configured to complete the backup. Do not
extend over 24 hours.
• Edit the failed backup and change the priority level to a higher priority. Higher priority levels allocate
more system resources to completing the backup.
Explanation
The Backup Repository Path determines where backups are stored. By default, backups are stored in
/store/backup. Administrators can configure the Backup Repository Path parameter on the Backup
Recovery Configuration page on the Admin tab. If the system can't detect the Backup Repository Path,
the backup can't complete successfully. For example, the path might not be found if there is an external
storage mount failure.
Explanation
When the system attempts to use more than the allocated amount of memory, the application or service
can stop working. Out of memory issues are often caused by software issues, or by user-defined queries
and operations that exhaust the available memory.
User response
Review the following resolutions:
• Review the error message that is written to the /var/log/qradar.log file to determine which
component failed.
• If the Ariel proxy server is searching through large amounts of data or is using a grouping option that
generates unique values in the search results, reduce the number of unique values or reduce the time
frame of the search.
• If the accumulator is generating a time series graph with many aggregated unique values, reduce the
size of the query.
• If a protocol-based log source is recently enabled, decrease the polling period to reduce the data
queried. If multiple protocol-based log sources are running at the same time, stagger the start times.
• If a rule recently changed to track unique properties over long periods of time, reduce the time frame by
half or reduce the number of matching events by adding another filter.
Explanation
An application or service ran out of memory and was restarted. Out of memory issues are commonly
caused by software issues or user-defined queries.
User response
Review the following resolutions:
• Review the error message that is written to the /var/log/qradar.log file to determine which
component failed.
• If the Ariel proxy server is searching through large amounts of data or is using a grouping option that
generates unique values in the search results, reduce the number of unique values or reduce the time
frame of the search.
• If the accumulator is generating a time series graph with many aggregated unique values, reduce the
size of the query.
Explanation
The system detected an exception when it wrote offense updates to the database.
Events are processed and stored, but they might not contribute to offenses.
User response
Conduct a soft clean of the SIM data model with Deactivate offenses unchecked.
1. Click the Admin tab.
2. On the toolbar, click Advanced > Clean SIM Model.
3. Click Soft Clean to set the offenses to inactive.
4. Ensure that Deactivate offenses is not checked.
5. Click the Are you sure you want to reset the data model? check box and click Proceed.
When you clean the SIM model, all existing offenses are closed. Cleaning the SIM model does not affect
existing events and flows.
Explanation
The system is unable to create offenses or change a dormant offense to an active offense. The default
number of active offenses that can be open on your system is limited to 2500. An active offense is any
offense that received updated event counts within the past five days.
User response
Select one of the following options:
• Change low security offenses from open or active to closed, or to closed and protected.
• Tune your system to reduce the number of events that generate offenses.
To prevent a closed offense from being removed by your data retention policy, protect the closed
offense.
Explanation
By default, the process limit is 2500 active offenses and 100,000 overall offenses.
If an active offense does not receive an event update within 30 minutes, the offense status changes to
dormant. If an event update occurs, a dormant offense can change to active. After five days, dormant
offenses that do not have event updates change to inactive.
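The status changes described above can be written as a function of the time since the last event update. This sketch assumes that both intervals are measured from the last event update and that the boundaries are inclusive on the later state; neither detail is stated in this guide.

```python
from datetime import timedelta

DORMANT_AFTER = timedelta(minutes=30)  # no event update within 30 minutes
INACTIVE_AFTER = timedelta(days=5)     # no event update for five days


def offense_status(time_since_last_event: timedelta) -> str:
    """Map the time since the last event update to an offense status."""
    if time_since_last_event < DORMANT_AFTER:
        return "active"
    if time_since_last_event < INACTIVE_AFTER:
        return "dormant"
    return "inactive"
```

A dormant offense that receives a new event update has its elapsed time reset, so the same function returns it to the active status.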
User response
Select one of the following options:
• Tune your system to reduce the number of events that generate offenses.
• Adjust the offense retention policy to an interval at which data retention can remove inactive offenses.
To prevent a closed offense from being removed by your data retention policy, protect the closed
offense.
• To free disk space for important active offenses, change offenses from active to dormant.
Explanation
An aggregate data view is disabled or unavailable, or a new rule requires data that is unavailable.
A dropped accumulation does not indicate lost anomaly data. The original anomaly data is maintained
because accumulations are data sets generated from stored data. The notification provides more details
about the dropped accumulation interval.
The anomaly detection engine cannot review that interval of the anomaly data for the accumulation.
User response
Update anomaly rules to use a smaller data set.
If the notification is a recurring SAR sentinel error, system performance might be the cause of the issue.
Explanation
A corrupted component that is responsible for host services on a managed host was repaired.
User response
No action is required.
Explanation
A custom property expression is disabled because the custom property expression has performance
problems. Rules, reports, or searches that use this property, and which rely on the disabled expression to
populate it, stop working properly.
The RegexMonitor feature monitors custom properties and disables any expressions that take longer than
two seconds to parse. If inefficient custom property expressions are not disabled, the parsing queue
overflows, and some events bypass parsing and do not normalize. Any searches, rules, or reports that rely
on the non-normalized events do not function properly. When inefficient custom property expressions are
disabled, parsing functions properly, and all events normalize. Only those searches, rules, and reports
that rely on the custom property that is populated by the disabled expression do not function properly.
User response
Select one of the following options:
• Review the disabled custom property to correct your regex patterns. Do not re-enable disabled custom
properties without first reviewing and optimizing the regex pattern or calculation.
• If the custom property is used for custom rules or reports, ensure that the Optimize parsing for rules,
reports, and searches check box is selected.
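Before re-enabling an expression, it can help to time the pattern against sample payloads. The sketch below is a rough offline check, not the RegexMonitor feature: the payload and pattern are hypothetical, and Python's re module may behave differently from the regex engine that the product uses, so the timings are only indicative.

```python
import re
import time
from typing import Iterable

PARSE_TIME_LIMIT_SECONDS = 2.0  # threshold cited for RegexMonitor above


def slowest_match_seconds(pattern: str, payloads: Iterable[str]) -> float:
    """Return the longest time that a single search took across the payloads."""
    compiled = re.compile(pattern)
    slowest = 0.0
    for payload in payloads:
        started = time.perf_counter()
        compiled.search(payload)
        slowest = max(slowest, time.perf_counter() - started)
    return slowest


# Hypothetical payload and pattern, for illustration only.
sample_payloads = ["user=alice src=10.0.0.1 action=login"]
elapsed = slowest_match_seconds(r"user=(\w+)", sample_payloads)
```

A pattern whose slowest match approaches PARSE_TIME_LIMIT_SECONDS on representative payloads is a candidate for rewriting, for example by anchoring it or by replacing broad wildcards with more specific character classes.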
Explanation
Data replication ensures that managed hosts can continue to collect data if the console is unavailable.
A managed host had difficulty downloading data. If a managed host repeatedly fails to download data, the
system might experience performance or communication issues.
User response
If a managed host does not resolve the replication issue on its own, contact customer support.
Explanation
Data replication ensures that managed hosts can continue to collect data when the console is not
available.
A managed host was skipped during cleanup because too much time passed since it last received an
update. If a managed host fails to receive replication updates from the console, it isn't connecting
properly to the console.
User response
To resolve this issue, select one of the following options:
Explanation
The magistrate process encountered an error. Active offenses close, services restart, and the database
tables are verified and rebuilt if necessary.
The system synchronizes to prevent data corruption. If the magistrate component detects a corrupted
state, then the database tables and files are rebuilt.
User response
The magistrate component self-repairs. If the error continues, contact Customer Support.
Explanation
The system detected an incorrect protocol configuration for a log source. Log sources that use protocols
to retrieve events from remote sources can generate an initialization error when a configuration problem
in the protocol is detected.
User response
Resolve the protocol configuration issues by following these steps:
• Review the log source to ensure that the protocol configuration is correct.
Verify authentication fields, file paths, database names for JDBC, and ensure that the system can
communicate with remote servers. Hover your mouse pointer over a log source to view more error
information.
• Review the /var/log/qradar.log file for more information about the protocol configuration error.
Explanation
For maximum performance, the RAID controller's cache and battery backup unit (BBU) must be configured to
use write-back cache policy. When write-through cache policy is used, storage performance degrades and
might cause system instability.
User response
Review the health of the battery backup unit. If the battery backup unit is working correctly, change the
cache policy to write-back.
Explanation
The transaction sentry restored the system to normal system health by canceling suspended database
transactions or removing database locks. To determine the process that caused the error, review the
qradar.log file for the word TxSentry.
User response
No action is required.
Explanation
When an external scanner is added, a gateway IP address is required. If the address that is configured for
the scanner is incorrect, the scanner cannot access your external network.
User response
Select one of the following options:
• Review the configuration for any external scanners to ensure that the gateway IP address is correct.
• Ensure that the external scanner can communicate through the configured IP address.
• Ensure that the firewall rules for your DMZ are not blocking communication between your appliance and
the assets you want to scan.
Explanation
A scheduled vulnerability scan failed to import vulnerability data. Scan failures are typically caused by
configuration or performance issues that result from a large volume of data to import. Scan failures can
also occur when a scan report that is downloaded by the system is in an unreadable format.
User response
Follow these steps:
1. Click the Admin tab.
2. On the navigation menu, click Data Sources.
3. Click Schedule VA Scanners.
4. From the scanner list, hover the cursor in the Status column of any scanner to display a detailed
success or failure message.
Explanation
The system cannot initialize a vulnerability scan and asset scan results cannot be imported from external
scanners. If the scan tools stop unexpectedly, the system cannot communicate with an external scanner.
The system tries the connection to the external scanner five times in 30-second intervals.
In rare cases, the discovery tools encounter an untested host or network configuration.
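The retry behavior described above (five attempts at 30-second intervals) follows a common pattern, sketched here. The connect callable stands in for whatever check reaches the scanner and is hypothetical, as is the injectable sleep parameter, which exists only so that the example can run without waiting.

```python
import time
from typing import Callable

MAX_ATTEMPTS = 5             # number of connection attempts cited above
RETRY_INTERVAL_SECONDS = 30  # interval between attempts cited above


def connect_with_retries(connect: Callable[[], bool],
                         attempts: int = MAX_ATTEMPTS,
                         interval: float = RETRY_INTERVAL_SECONDS,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call connect() until it succeeds or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        if connect():
            return True
        if attempt < attempts:
            sleep(interval)  # wait before the next attempt, not after the last
    return False
```

With the default values, a scanner that never answers is reported as unreachable after roughly two minutes of waiting (four intervals of 30 seconds between five attempts).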
User response
Select one of the following options:
• Use the System and License Management window to review the configuration for external scanners to
ensure that the gateway IP address is correct.
• Ensure that the external scanner can communicate through the configured IP address.
• Ensure that the firewall rules for your DMZ are not blocking communication between your appliance and
the assets you want to scan.
Explanation
A scheduled vulnerability scan is unable to connect to an external scanner to begin the scan import
process.
Scan initialization issues are typically caused by credential problems or connectivity issues to the remote
scanner. Scanners that fail to initialize display detailed error messages in the hover text of a scheduled
scan with a status of failed.
User response
Follow these steps:
1. Click the Admin tab.
2. On the navigation menu, click Data Sources.
3. Click the Schedule VA Scanners icon.
4. From the scanner list, hover the cursor in the Status column of any scanner to display a detailed
success or failure message.
For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual
Property Department in your country or send inquiries, in writing, to:
Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment of a fee.
Trademarks
IBM, the IBM logo, and ibm.com® are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at
"Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
Linux® is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Applicability
These terms and conditions are in addition to any terms of use for the IBM website.
Personal use
You may reproduce these publications for your personal, noncommercial use provided that all proprietary
notices are preserved. You may not distribute, display or make derivative work of these publications, or
any portion thereof, without the express consent of IBM.
Commercial use
You may reproduce, distribute and display these publications solely within your enterprise provided that
all proprietary notices are preserved. You may not make derivative works of these publications, or
reproduce, distribute or display these publications or any portion thereof outside your enterprise, without
the express consent of IBM.
Rights
Except as expressly granted in this permission, no other permissions, licenses or rights are granted, either
express or implied, to the publications or any information, data, software or other intellectual property
contained therein.
IBM reserves the right to withdraw the permissions granted herein whenever, in its discretion, the use of
the publications is detrimental to its interest or, as determined by IBM, the above instructions are not
being properly followed.
You may not download, export or re-export this information except in full compliance with all applicable
laws and regulations, including all United States export laws and regulations.
IBM MAKES NO GUARANTEE ABOUT THE CONTENT OF THESE PUBLICATIONS. THE PUBLICATIONS ARE
PROVIDED "AS-IS" AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT,
AND FITNESS FOR A PARTICULAR PURPOSE.