The use case I want to address in this post is understanding which specific Amazon EC2 instances incur the highest data transfer charges, along with a high-level view of the traffic flow (e.g., to/from the Internet, inter-region, intra-region). For detailed analysis of traffic flow, I recommend using Amazon VPC Flow Logs; see Querying Amazon VPC Flow Logs in the Amazon Athena (serverless query service) documentation.
First, I’d like to mention that for most cost management needs, you should start with AWS Cost Explorer. It has an easy-to-use interface that lets you visualize, understand, and manage your AWS costs and usage over time. Get started quickly by creating custom reports (including charts and tabular data) that analyze cost and usage data, both at a high level (e.g., total costs and usage across all accounts) and for highly specific requests (e.g., m2.2xlarge costs within account Y that are tagged “project: secretProject”). Using AWS Cost Explorer, you can dive deeper into your cost and usage data to identify trends, pinpoint cost drivers, and detect anomalies.
Currently, AWS Cost Explorer doesn’t provide details for a specific resource ID (an EC2 instance in our case). The AWS Cost and Usage report, on the other hand, tracks your AWS usage and provides estimated charges associated with your AWS account. The report contains line items for each unique combination of AWS product, usage type, and operation that your AWS account uses. You can customize the AWS Cost and Usage report to aggregate the information either by the hour or by the day.
Below I’ll show you how to use Amazon Athena to analyze the data from your AWS Cost and Usage report in Amazon S3 using standard SQL. This enables you to avoid creating your own data warehouse solutions to query AWS Cost and Usage report data.
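First, follow the instructions in Uploading an AWS Cost and Usage Report to Amazon Athena to create a report in Amazon S3 and configure an Amazon Athena database on top of it. When you create the report, enable the option to include resource IDs; otherwise the line_item_resource_id column used in the queries below will be empty.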
The query below will produce a list of Amazon EC2 instance IDs sorted by total data transfer usage.
cost_and_usage_report is the name I chose for the Amazon Athena table.
SELECT line_item_resource_id,
       round(SUM(line_item_usage_amount), 2) AS sum_line_item_usage_amount
FROM cost_and_usage_report
WHERE line_item_product_code = 'AmazonEC2'
  AND product_product_family = 'Data Transfer'
  -- '^i-' keeps only EC2 instance IDs; an unanchored 'i-' would also match IDs such as 'eni-...'
  AND regexp_like(line_item_resource_id, '^i-')
GROUP BY line_item_resource_id
ORDER BY sum_line_item_usage_amount DESC
LIMIT 100
The result will look something like this:
Now we can dive deeper into the details of the specific Amazon EC2 instance using the next query:
SELECT line_item_resource_id,
       product_transfer_type,
       product_from_location,
       product_to_location,
       line_item_operation,
       line_item_line_item_description,
       round(SUM(line_item_usage_amount), 2) AS sum_line_item_usage_amount
FROM cost_and_usage_report
WHERE line_item_resource_id = 'i-0526081c...'
  AND product_product_family = 'Data Transfer'
GROUP BY line_item_resource_id,
         product_transfer_type,
         product_from_location,
         product_to_location,
         line_item_operation,
         line_item_line_item_description
ORDER BY sum_line_item_usage_amount DESC
LIMIT 10
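Either query can also be submitted from a script instead of the Athena console. The following is a minimal Python sketch, assuming boto3 is installed and AWS credentials are configured. The database name and the S3 output location are placeholders that you pass in, and `run_athena_query` and `rows_to_dicts` are illustrative helper names, not part of any AWS library.

```python
import time


def rows_to_dicts(result):
    """Convert a get_query_results response into a list of dicts.

    For SELECT statements, Athena returns the column names as the first row.
    Cells without a value (NULL) are mapped to None.
    """
    rows = result["ResultSet"]["Rows"]
    if not rows:
        return []
    header = [col.get("VarCharValue") for col in rows[0]["Data"]]
    return [
        dict(zip(header, (col.get("VarCharValue") for col in row["Data"])))
        for row in rows[1:]
    ]


def run_athena_query(sql, database, output_location, poll_seconds=2):
    """Run a query in Athena and return the first page of results as dicts."""
    import boto3  # imported here so rows_to_dicts can be used without boto3

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    return rows_to_dicts(athena.get_query_results(QueryExecutionId=query_id))
```

Because both queries in this post use a small LIMIT, reading only the first page of results is enough here; larger result sets would need pagination.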
The result below shows, for example, that most of the data transfer is attributed to intra-region traffic coming through the instance’s public IP address:
| line_item_resource_id | product_transfer_type | product_from_location | product_to_location | line_item_operation | line_item_line_item_description | sum_line_item_usage_amount |
|---|---|---|---|---|---|---|
| i-0526081c... | IntraRegion | EU (Ireland) | EU (Ireland) | PublicIP-In | $0.010 per GB – regional data transfer – in/out/between EC2 AZs or using elastic IPs or ELB | 9.18 |
| i-0526081c... | IntraRegion | EU (Ireland) | EU (Ireland) | InterZone-In | $0.010 per GB – regional data transfer – in/out/between EC2 AZs or using elastic IPs or ELB | 1.64 |
| i-0526081c... | IntraRegion | EU (Ireland) | EU (Ireland) | PublicIP-Out | $0.010 per GB – regional data transfer – in/out/between EC2 AZs or using elastic IPs or ELB | 0.78 |
| i-0526081c... | IntraRegion | EU (Ireland) | EU (Ireland) | InterZone-Out | $0.010 per GB – regional data transfer – in/out/between EC2 AZs or using elastic IPs or ELB | 0.2 |
| i-0526081c... | AWS Outbound | EU (Ireland) | External | RunInstances | $0.000 per GB – first 1 GB of data transferred out per month | 0 |
| i-0526081c... | AWS Inbound | External | EU (Ireland) | RunInstances | $0.000 per GB – data transfer in per month | 0 |
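To put numbers on that observation, here is a small Python sketch that sums the intra-region usage amounts from the result rows above and computes each operation's share. The values are copied from those rows. It assumes the usage amounts are in GB, as the "per GB" line item descriptions suggest.

```python
# Usage amounts copied from the result rows above, keyed by line_item_operation.
# Assumption: amounts are in GB.
intra_region_usage = {
    "PublicIP-In": 9.18,
    "InterZone-In": 1.64,
    "PublicIP-Out": 0.78,
    "InterZone-Out": 0.2,
}

total = sum(intra_region_usage.values())

# Print each operation with its usage and its percentage of the total.
for operation, amount in intra_region_usage.items():
    share = 100 * amount / total
    print(f"{operation:<14} {amount:>5.2f}  {share:5.1f}%")

print(f"{'Total':<14} {total:>5.2f}")
```

PublicIP-In accounts for roughly 78% of the 11.8 total, which is why the public IP path is the first place to look.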
To pinpoint the exact source of this traffic, we would need to get the ENI ID of the instance and query Amazon VPC Flow Logs, as mentioned at the beginning of this post.
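As a rough illustration of that last step, the Python sketch below pulls the ENI IDs out of an EC2 `describe_instances` response and builds a flow log query for one of them. The table name `vpc_flow_logs` and the column names `interface_id`, `srcaddr`, `dstaddr`, and `bytes` are assumptions that mirror the flow log record fields; adjust them to match the table you actually created. The helper names are illustrative only.

```python
import re


def eni_ids_from_response(response):
    """Collect network interface IDs from an EC2 describe_instances response."""
    return [
        eni["NetworkInterfaceId"]
        for reservation in response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
        for eni in instance.get("NetworkInterfaces", [])
    ]


def build_flow_log_query(eni_id, table="vpc_flow_logs", limit=20):
    """Build a query listing the heaviest source/destination pairs for one ENI.

    Table and column names are assumptions, not a fixed Athena schema.
    """
    # Validate the ID before interpolating it into the SQL text.
    if not re.fullmatch(r"eni-[0-9a-f]+", eni_id):
        raise ValueError(f"not an ENI ID: {eni_id!r}")
    return (
        "SELECT srcaddr, dstaddr, SUM(bytes) AS total_bytes\n"
        f"FROM {table}\n"
        f"WHERE interface_id = '{eni_id}'\n"
        "GROUP BY srcaddr, dstaddr\n"
        "ORDER BY total_bytes DESC\n"
        f"LIMIT {int(limit)}"
    )
```

The response dictionary can come from `boto3.client("ec2").describe_instances(InstanceIds=[...])`, and the resulting query text can then be run in Athena against your flow log table.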