Monitor Deep Dive - Database IO monitored (AWS) in Datadog


This article explains how to resolve failing resources on the test:
Database IO monitored (AWS) in Datadog

It also covers the technical details of the test and what can cause a database to fail.

 

How to Fix:

If you use Datadog for monitoring:

    1. Log in to Datadog, set up I/O monitoring filtered to your RDS databases, and define monitors for one or more of the following metrics:
        • aws.rds.disk_queue_depth
        • aws.rds.volume_write_iops
        • aws.rds.volume_read_iops
        • aws.rds.write_iops
        • aws.rds.read_iops
        • aws.rds.diskio.writeIOsPS
        • aws.rds.diskio.readIOsPS
        • aws.rds.read_throughput
        • aws.rds.write_throughput
        • aws.rds.select_throughput
        • aws.rds.update_throughput

2. Create and configure monitors on the appropriate metric for your RDS databases.
     If some of these metrics do not appear in Datadog, create them before setting up monitors.

3. Add a Datadog < > AWS Supported Tag to the alarm's { filter }, and add the same tag to the database itself.
    Vanta uses this shared tag to associate your Datadog alarm with the AWS database on the test.
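As a rough illustration of steps 1 to 3, the sketch below composes a metric monitor query in the general form Datadog uses for metric monitors, `time_aggr(time_window):space_aggr:metric{tag_filter} operator threshold`. The helper name, the example instance identifier, and the threshold are hypothetical; whichever tags you put in the filter also need to be applied to the RDS database.

```python
def build_monitor_query(
    metric: str,
    filter_tags: dict[str, str],
    threshold: float,
    time_window: str = "last_5m",
) -> str:
    """Compose a Datadog-style metric monitor query string (hypothetical helper).

    The tag filter is rendered as {key:value,key:value}. The same key:value
    pairs must also be present as tags on the database in AWS.
    """
    tag_filter = ",".join(f"{key}:{value}" for key, value in filter_tags.items())
    # Doubled braces in the f-string produce literal { and } around the filter.
    return f"avg({time_window}):avg:{metric}{{{tag_filter}}} > {threshold}"


query = build_monitor_query(
    "aws.rds.disk_queue_depth",
    {"dbinstanceidentifier": "test-database", "region": "us-west-2"},
    threshold=10,
)
print(query)
# avg(last_5m):avg:aws.rds.disk_queue_depth{dbinstanceidentifier:test-database,region:us-west-2} > 10
```

Any metric from the list above can be substituted; choose the threshold and time window to suit your workload.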
  

If you do not use Datadog for monitoring:

    1. Click Deactivate monitoring.
    2. In the pop-up, write a short description identifying the tool used for monitoring and alerting.
    3. Upload a screenshot of the alert configuration if this requirement is not already satisfied by other tests in Vanta.


Common Reasons For Failure:

Your Datadog alarm has a filter and a metric and appears to monitor the resources correctly, but your databases are still failing the test?
Follow the steps below to troubleshoot.

1. The alarm may not meet the test's metric requirements:

Troubleshooting:

Navigate to the Database IO monitored (AWS) in Datadog test, click 'More' in the top right, and select 'View Source Data'.

A panel containing your resource data appears on the right. This panel contains the same data the test uses to determine your compliance.

In most cases, a test evaluates a single type of resource. This test, however, looks at two: alarms and databases.
You can paste this data into a text editor such as VS Code or Notepad, or search the text directly in the web browser.


Search the text (Ctrl + F on Windows, Cmd + F on macOS) for your alarm name and press Enter until you locate the Database IO Datadog alarm data used in the test.
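If you save the source data to a file instead of searching it in the browser, a short script can do the same lookup and return the whole matching record. This is a sketch that assumes the panel contents are valid JSON; the function name is hypothetical, and the sample layout and alarm name below are invented for illustration, since the real source data may be structured differently.

```python
import json
from typing import Any, Iterator


def find_objects_mentioning(data: Any, needle: str) -> Iterator[dict]:
    """Yield every JSON object that has a string field containing `needle`.

    Like a Ctrl+F / Cmd+F search, but returns the enclosing object
    (for example, the whole alarm record) rather than a single line.
    """
    if isinstance(data, dict):
        if any(isinstance(value, str) and needle in value for value in data.values()):
            yield data
        for value in data.values():
            yield from find_objects_mentioning(value, needle)
    elif isinstance(data, list):
        for item in data:
            yield from find_objects_mentioning(item, needle)


# Hypothetical sample; in practice, load the text you copied from the panel.
sample = json.loads("""
{"alarms": [{"name": "rds-io-alarm",
             "query": "avg(last_5m):avg:aws.rds.read_iops{region:us-west-2} > 100"}],
 "databases": [{"name": "test-database", "tags": []}]}
""")

for obj in find_objects_mentioning(sample, "rds-io-alarm"):
    print(json.dumps(obj, indent=2))
```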

You'll see the alarm query containing the aws.rds metric and the { filter }.
The test checks the extracted metrics to determine whether they meet the requirements. Compare your alarm's metric with the list of required metrics:

      • aws.rds.disk_queue_depth
      • aws.rds.volume_write_iops
      • aws.rds.volume_read_iops
      • aws.rds.write_iops
      • aws.rds.read_iops
      • aws.rds.diskio.writeIOsPS
      • aws.rds.diskio.readIOsPS
      • aws.rds.read_throughput
      • aws.rds.write_throughput
      • aws.rds.select_throughput
      • aws.rds.update_throughput
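The metric comparison can be sketched as a set-membership check. This assumes the alarm's source data exposes an "extracted_metrics" list, as described in this article, and that one required metric is enough for the alarm to qualify; the article does not say how an alarm mixing required and other metrics is treated. The function name is hypothetical.

```python
# Required metrics, copied from the list above.
REQUIRED_METRICS = frozenset({
    "aws.rds.disk_queue_depth",
    "aws.rds.volume_write_iops",
    "aws.rds.volume_read_iops",
    "aws.rds.write_iops",
    "aws.rds.read_iops",
    "aws.rds.diskio.writeIOsPS",
    "aws.rds.diskio.readIOsPS",
    "aws.rds.read_throughput",
    "aws.rds.write_throughput",
    "aws.rds.select_throughput",
    "aws.rds.update_throughput",
})


def uses_required_metric(extracted_metrics: list[str]) -> bool:
    """Return True if at least one extracted metric is on the required list (assumption)."""
    return any(metric in REQUIRED_METRICS for metric in extracted_metrics)


print(uses_required_metric(["aws.rds.read_iops"]))       # True
print(uses_required_metric(["aws.rds.cpuutilization"]))  # False
```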

If your alarm's extracted_metrics value isn't one of the required metrics, edit the monitor or create a new one in Datadog that uses a required metric.

 

If your metric matches one of those, the alarm's metric passes the test.

 

2. The filter may contain tags that Datadog does not support for your cloud environment:

Troubleshooting:

To let Datadog collect the metrics, tags, EventBridge events, and other data needed to monitor your AWS environment, you may need to connect Datadog to AWS.
Datadog's AWS integration documentation describes the integration and which tags are supported.

To check whether your Datadog alarm filters on a compatible tag, examine the filter in your alarm query.


Compare the filtered tags with the Datadog < > AWS Supported Tags for the 'All' integration type and, in this case, the 'RDS' integration type.
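A minimal sketch of pulling the tags out of an alarm query and flagging unsupported ones follows. The supported-key set is a hypothetical subset for illustration only; use the keys Datadog documents for the 'All' and 'RDS' integration types. The sketch also lowercases keys, on the assumption that Datadog normalizes tags to lowercase, and treats the first {...} in the query as the filter.

```python
import re

# Hypothetical subset of supported tag keys, for illustration only.
SUPPORTED_TAG_KEYS = frozenset({"region", "dbinstanceidentifier"})


def extract_filter_tags(query: str) -> dict[str, str]:
    """Pull key:value pairs out of the first {...} filter in a monitor query."""
    match = re.search(r"\{([^}]*)\}", query)
    if match is None:
        return {}
    tags: dict[str, str] = {}
    for part in match.group(1).split(","):
        key, sep, value = part.partition(":")
        if sep:  # skip entries without a colon, such as the * wildcard
            tags[key.strip().lower()] = value.strip()
    return tags


def unsupported_tag_keys(filter_tags: dict[str, str]) -> set[str]:
    """Return the filter keys that are not in SUPPORTED_TAG_KEYS."""
    return {key for key in filter_tags if key not in SUPPORTED_TAG_KEYS}


query = ("avg(last_5m):avg:aws.rds.read_iops"
         "{ dbInstanceIdentifier : test-database-dbcluster-instance, team : data } > 100")
tags = extract_filter_tags(query)
print(tags)
print(unsupported_tag_keys(tags))  # {'team'}
```

Here the made-up `team` tag would be flagged, while `dbinstanceidentifier` passes.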

 

If your tag is one of those, your alarm's tag filter is not what is causing the test to fail.

 

3. The resource (database) you're monitoring doesn't have a supported tag:

Troubleshooting:

If both your metric and filter are configured correctly, the next troubleshooting step is to view your database in the test's JSON 'Source Data', the same way you viewed the alarm.

Find a failing database on the test and copy its name.


Search the 'Source Data' for the database's name. Once you find it, you will see the database's properties. Its other properties are irrelevant to the test; look at the database's tags instead.


Compare the alarm's Datadog < > AWS Supported Tags filter with the tags currently in the database's 'tags' array.
If the supported tag that you included in your filter is not present, the database fails the test.
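The comparison above amounts to checking that every key:value pair in the alarm's filter is also present on the database. The sketch below assumes the database's tags have already been read into a plain key-to-value dictionary (the real source data may use a different shape). It compares keys case-insensitively, which is an assumption, and requires values to match exactly. The function name is hypothetical.

```python
def missing_filter_tags(
    filter_tags: dict[str, str], database_tags: dict[str, str]
) -> dict[str, str]:
    """Return filter tags that the database lacks or carries with a different value.

    An empty result means every tag in the alarm's filter is on the database.
    """
    db = {key.lower(): value for key, value in database_tags.items()}
    return {
        key: value
        for key, value in filter_tags.items()
        if db.get(key.lower()) != value
    }


alarm_filter = {"dbInstanceIdentifier": "test-database-dbcluster-instance", "region": "us-west-2"}
missing = missing_filter_tags(alarm_filter, {"region": "us-west-2"})
print(missing)  # {'dbInstanceIdentifier': 'test-database-dbcluster-instance'}
```

Each key in the result is a tag you would add to the database in the AWS Console.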


Log in to the AWS Console, go to RDS, and select the database you're troubleshooting. Open the 'Tags' tab.


Then add the supported tag you used in the Datadog alarm query's { filter }.


The next time Vanta fetches your resources, or a test that requires this data is refreshed,
your database should show the same supported 'Key:Value' tag that you filtered for on the alarm.

 

What Vanta is checking:

In this section, we'll look at what data we fetch from the Datadog and AWS integrations and how we read that data to determine a 'Passing' or 'Failing' state on the test.

Datadog:

We pull each 'Alarm' resource you've created in Datadog through the connected integration. From each alarm's 'query' property, the test extracts the metrics used along with the tags used in the { filter }.

Alarm Resource:

Resource properties used in test:

  • "query": "query with a (metric{ and a filter } )"
    • We extract the { filter } to obtain the tags used to associate the alarm with a database.
  • "extracted_metrics": [  "metric used in query" ]
    • We extract the ( metric ) to check whether it is a required metric.
  1. Check whether the metrics used are those required by the test.
      1. Pass - Alarm used in test | Fail - Alarm triggers a failure for any database associated via Filter < > Tag
  2. Check whether the tags used in the filter are Datadog < > AWS Supported Tags.
      1. Pass - Alarm used in test | Fail - Alarm not considered in test

If the alarm passes both checks, it is used in the next part of the test.
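The two alarm checks can be sketched as a small classifier. This is an illustration, not Vanta's code. The check order is an assumption: tags are checked first because an alarm whose filter has unsupported tags cannot be associated with any database. The sketch also treats an alarm with no filter tags at all as not considered, which is likewise an assumption. The metric and tag sets in the usage lines are abbreviated or hypothetical.

```python
from enum import Enum


class AlarmOutcome(Enum):
    USED = "alarm used in test"
    FAILS_ASSOCIATED_DATABASES = "alarm fails every database it is associated with"
    IGNORED = "alarm not considered in test"


def classify_alarm(
    extracted_metrics: list[str],
    filter_tag_keys: set[str],
    required_metrics: frozenset[str],
    supported_tag_keys: frozenset[str],
) -> AlarmOutcome:
    """Apply the supported-tag check, then the required-metric check."""
    if not filter_tag_keys or not filter_tag_keys <= supported_tag_keys:
        return AlarmOutcome.IGNORED
    if not any(metric in required_metrics for metric in extracted_metrics):
        return AlarmOutcome.FAILS_ASSOCIATED_DATABASES
    return AlarmOutcome.USED


REQUIRED = frozenset({"aws.rds.read_iops", "aws.rds.write_iops"})  # abbreviated list
SUPPORTED = frozenset({"region", "dbinstanceidentifier"})         # hypothetical subset

print(classify_alarm(["aws.rds.read_iops"], {"region"}, REQUIRED, SUPPORTED))
# AlarmOutcome.USED
```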

 

AWS:

We pull in your AWS cloud infrastructure inventory data and use it in various tests. This test uses your AWS database resources: we examine the tags on each database to see whether it can be associated with an alarm from Datadog.

 

AWS Database Resource:

Resource properties used in test:

  • "tags": [ "key":"value", "key":"value", "key":"value" ]
  1. Check whether the database has the same tags used in the Datadog alarm { filter }.
    1. Pass - Passes the test | Fail - Shows as a failing entity on the test.

In this example, the alarm uses the correct filter:

{ dbInstanceIdentifier : test-database-dbcluster-instance, region : us-west-2 }


However, the database had no tags matching the alarm { filter },
so the test could not confirm that the alarm monitors this specific database.

 

Example passing state:

If we apply the two tags, region and dbInstanceIdentifier, to the database,
it passes the test.
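The failing and passing states in this example can be reproduced with a short end-to-end sketch. The Alarm and Database classes are hypothetical, simplified views of the source data. The rule that a database passes when at least one associated alarm uses a required metric is an assumption, since this article does not describe how a mix of passing and failing alarms is weighed. Tag matching here is exact, including key case.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    """Hypothetical, simplified view of a Datadog alarm as the test sees it."""
    name: str
    filter_tags: dict[str, str]
    uses_required_metric: bool


@dataclass
class Database:
    """Hypothetical, simplified view of an RDS database and its tags."""
    name: str
    tags: dict[str, str] = field(default_factory=dict)


def is_associated(alarm: Alarm, database: Database) -> bool:
    """An alarm is associated with a database when every filter tag is on the database."""
    return bool(alarm.filter_tags) and all(
        database.tags.get(key) == value for key, value in alarm.filter_tags.items()
    )


def database_passes(database: Database, alarms: list[Alarm]) -> bool:
    """Pass if at least one associated alarm uses a required metric (assumption)."""
    return any(
        alarm.uses_required_metric for alarm in alarms if is_associated(alarm, database)
    )


example_tags = {"dbInstanceIdentifier": "test-database-dbcluster-instance", "region": "us-west-2"}
alarm = Alarm("rds-io-alarm", filter_tags=example_tags, uses_required_metric=True)
untagged = Database("test-database-dbcluster-instance")
tagged = Database("test-database-dbcluster-instance", tags=dict(example_tags))

print(database_passes(untagged, [alarm]))  # False
print(database_passes(tagged, [alarm]))    # True
```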

