Understanding AWS VPC Flow Logs

VPC Flow Logs are a useful tool for monitoring the security of your AWS Virtual Private Cloud. But understanding and getting the most from these logs can be a bit tricky.

Why Use VPC Flow Logs?

VPC Flow Logs capture information about the IP traffic going to and from the network interfaces in your Amazon Web Services Virtual Private Cloud.  They record both traffic that is accepted by Security Groups and Network Access Control Lists and traffic that is rejected.

They are critical for investigating a security incident after the fact, but can also be used to trigger an alert of suspicious activity as it happens.

In this article, we will show you how to set up VPC Flow Logs and then leverage them to enhance your network monitoring and security.

How to Enable VPC Flow Logs

First, go to the VPC section of the AWS Console.  Select your VPC, click the Flow Logs tab, and then click Create Flow Log.

[Figure: Creating a flow log from the VPC console]

The next screen is a wizard to help you set up flow logs.  You can choose to collect accepted traffic, rejected traffic, or both.  Some people prefer one log for accepted traffic and another for rejected traffic; I prefer both types of traffic in the same log.  The next step is to select an IAM role that allows flow logs to be published.  The easiest way to create the role is to click the “Set Up Permissions” link.  Finally, you need to select a destination log group in CloudWatch.  I recommend a name of “FlowLogs.”

[Figure: Create Flow Log wizard]

If you clicked “Set Up Permissions,” you will see an IAM wizard as shown below.  Let it create a new IAM role for you.  Give the role a name that will help you remember its purpose such as “FlowLogsRole.”

[Figure: IAM wizard creating the flow logs role]
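If you prefer to script this setup rather than click through the wizard, the sketch below shows roughly how it could look with the boto3 EC2 client's `create_flow_logs` call. It assumes the IAM role already exists and that you pass in a client such as `boto3.client("ec2")`; the VPC ID and role ARN are placeholders, not values from this article.

```python
from typing import Any, List


def enable_vpc_flow_logs(ec2: Any, vpc_id: str, role_arn: str,
                         log_group: str = "FlowLogs") -> List[str]:
    """Create one flow log for a VPC that captures accepted and rejected traffic.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    response = ec2.create_flow_logs(
        ResourceIds=[vpc_id],
        ResourceType="VPC",                 # "Subnet" and "NetworkInterface" also work
        TrafficType="ALL",                  # "ACCEPT", "REJECT", or "ALL" in one log
        LogGroupName=log_group,             # CloudWatch Logs group, e.g. "FlowLogs"
        DeliverLogsPermissionArn=role_arn,  # ARN of a role such as "FlowLogsRole"
    )
    # Per-resource failures are reported in the response rather than raised.
    if response.get("Unsuccessful"):
        raise RuntimeError(f"flow log creation failed: {response['Unsuccessful']}")
    return response["FlowLogIds"]
```

Using `TrafficType="ALL"` matches the single combined log recommended above.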

Viewing VPC Flow Logs

To view your flow logs, go to AWS CloudWatch, and then select “Logs” on the left-hand side of the screen.  This will give you a list of your log groups.  Select your FlowLogs group (or whatever group name you provided when you set up VPC Flow Logs).

[Figure: FlowLogs log group in CloudWatch]

Within the log group, the logs are split into one log stream per Elastic Network Interface (ENI), such as the one attached to your EC2 instance or Elastic Load Balancer (ELB).  To find your EC2 instance’s ENI, go to EC2 and select your instance.  Then, on the Description tab, find the network interfaces and click on the link (probably eth0) as shown below.  The interface ID is what you need to find the correct log within your Flow Logs.

[Figure: Finding an EC2 instance’s elastic network interface]

Back in your VPC Flow Logs you can search for the logs related to this network interface to see all accepted and rejected traffic.
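The same ENI lookup can be done programmatically. This is a minimal sketch, assuming a boto3 EC2 client is passed in; it reads the `NetworkInterfaces` list that `describe_instances` returns for one instance.

```python
from typing import Any, List


def instance_eni_ids(ec2: Any, instance_id: str) -> List[str]:
    """Return the IDs of the network interfaces attached to one EC2 instance.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    # Each entry corresponds to an interface such as eth0 in the console.
    return [eni["NetworkInterfaceId"] for eni in instance.get("NetworkInterfaces", [])]
```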

Filtering and Understanding VPC Flow Logs

Usually, you are not interested in wading through all of the accepted and rejected traffic for your EC2 instance.  You are likely interested in a particular subset of that traffic: all rejected traffic, for example, or all traffic to or from a specific address or using a specific port.  To find that traffic, you can use filtering.

To filter traffic, start by pasting the text below into the filter field.  The text below does not filter anything, but we will see how to filter next.

[version, accountid, interfaceid, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, logstatus]

To begin filtering, simply add =value to one or more of the fields to limit your results to records in which those fields match that value.  For example, perhaps I want to see all rejected traffic.  I can use

[version, accountid, interfaceid, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action=REJECT, logstatus]
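The same filter pattern can also be run outside the console. The sketch below is one way to do it with the CloudWatch Logs `filter_log_events` call, assuming a boto3 logs client and the “FlowLogs” group name; the optional ENI prefix narrows the search to that interface’s log stream.

```python
from typing import Any, Iterator, Optional

# Same space-delimited pattern as the console filter above.
REJECT_PATTERN = ("[version, accountid, interfaceid, srcaddr, dstaddr, srcport, "
                  "dstport, protocol, packets, bytes, start, end, action=REJECT, logstatus]")


def rejected_flow_records(logs: Any, log_group: str = "FlowLogs",
                          eni_id: Optional[str] = None) -> Iterator[str]:
    """Yield raw flow log lines whose action is REJECT.

    `logs` is assumed to be a boto3 CloudWatch Logs client, e.g. boto3.client("logs").
    """
    kwargs = {"logGroupName": log_group, "filterPattern": REJECT_PATTERN}
    if eni_id:
        kwargs["logStreamNamePrefix"] = eni_id   # only streams for this interface
    while True:
        page = logs.filter_log_events(**kwargs)
        for event in page.get("events", []):
            yield event["message"]
        token = page.get("nextToken")
        if not token:          # no more pages
            break
        kwargs["nextToken"] = token
```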

The format of the filter also represents the content of the fields in each flow log record.  For example, the fourth field in each record is the source IP address of the traffic.  That is followed by the destination IP address and then the source and destination ports.
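Because each record is just these fields separated by spaces, it is easy to split one into named values for your own analysis. This sketch assumes the default record format shown in the filter above; the sample lines in the usage note use made-up account, ENI, and destination values.

```python
from collections import Counter
from typing import Dict, Iterable

# Field order of the default flow log record, matching the filter pattern.
FIELDS = ["version", "accountid", "interfaceid", "srcaddr", "dstaddr", "srcport",
          "dstport", "protocol", "packets", "bytes", "start", "end", "action", "logstatus"]


def parse_record(line: str) -> Dict[str, str]:
    """Split one space-separated flow log line into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def rejected_ports(lines: Iterable[str]) -> Counter:
    """Count rejected connection attempts per destination port."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_record(line)
        if record["action"] == "REJECT":
            counts[record["dstport"]] += 1
    return counts
```

For example, a line such as `2 123456789012 eni-0a1b2c3d 78.10.107.73 10.0.1.15 51234 23 6 1 40 1463000000 1463000060 REJECT OK` parses to a record whose `dstport` is `"23"`, so a batch like the one described below would show up as counts for ports 23 and 80.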

When I run the filter above, I see several external systems trying to connect on port 23 (telnet) and port 80 (HTTP).  I’m not using telnet (of course!), and I’m not running a web server, so port 80 is closed at the security group layer.  This traffic is most likely a set of malicious attempts to break into my EC2 instance.  But I don’t have to worry about it, because it is all being rejected.

[Figure: Security groups blocking malicious traffic (filtered flow log entries)]

In the first row that I expanded, we can see that an attempt was made to connect to port 23, and this attempt originated from IP address 78.10.107.73.  In the second expanded row, we see an attempt to connect to port 80 from IP address 72.21.217.71.  Both attempts were rejected.

For a few more details on the fields in VPC

Exporting VPC Flow Logs

Filtering flow logs is convenient for a quick look at your network traffic.  For example, if you are trying to allow two instances in different security groups to communicate and it is not working, you might be able to quickly see what traffic is allowed and rejected between them by filtering on source and/or destination addresses.

But if you want to do a more detailed analysis of your network traffic, Flow Log filtering is not the way to go.  For this, you need to export your logs and then import them into another tool such as a relational database or other analytical system.

You can export flow logs to S3, stream them to Lambda, or stream them to Amazon Elasticsearch Service.  To do so, go to CloudWatch, click “Logs,” select your log group, and click the “Actions” button as shown below.
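An S3 export can also be started through the API. This sketch only builds the arguments for the CloudWatch Logs `create_export_task` call; the bucket name is hypothetical, and the bucket needs a policy that lets CloudWatch Logs write to it.

```python
from datetime import datetime
from typing import Any, Dict


def to_millis(dt: datetime) -> int:
    """Export tasks take times as milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


def export_task_params(log_group: str, bucket: str, start: datetime, end: datetime,
                       prefix: str = "flowlogs") -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch Logs create_export_task call."""
    return {
        "taskName": f"{log_group}-{start:%Y%m%d}-{end:%Y%m%d}",
        "logGroupName": log_group,
        "fromTime": to_millis(start),
        "to": to_millis(end),          # boto3 names the end of the range "to"
        "destination": bucket,         # S3 bucket name (hypothetical here)
        "destinationPrefix": prefix,
    }
```

With a client from `boto3.client("logs")`, the task would be started with `create_export_task(**export_task_params(...))`. Use timezone-aware datetimes so the millisecond values do not depend on the machine’s local time zone.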

[Figure: Exporting flow logs from the CloudWatch Logs Actions menu]

Triggering Alerts from VPC Flow Logs

The ability to stream CloudWatch logs to Lambda functions means it is possible to write custom logic such as alerts to notify you of security issues.  One example might be that you want to be alerted of any rejected traffic originating from within your VPC.  Rejected traffic might indicate something such as a compromised web server that is being used to probe the rest of your network.  I would not fire alerts based on rejected traffic from external sources.  Any public IP address will constantly be probed for weaknesses.  Good Security Group settings and a good Web Application Firewall will protect you from those attacks.  Rejected traffic originating from within your network, on the other hand, can be a real cause for alarm.

Lambda offers a built-in template for building a function that processes CloudWatch Logs such as VPC Flow Logs.  Filters can be applied to avoid triggering the Lambda function too often, which may go a long way toward reducing your costs.  Writing and configuring this Lambda function is a subject for a future post.
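As a rough illustration of the alerting idea above, here is a minimal sketch of the decoding and filtering part of such a function. CloudWatch Logs delivers subscription data to Lambda as base64-encoded, gzip-compressed JSON under `event["awslogs"]["data"]`. The VPC CIDR is a hypothetical value, and the alert is only printed; a real function might publish to SNS instead.

```python
import base64
import gzip
import ipaddress
import json
from typing import Any, Dict, List

# Hypothetical VPC address range; replace with your VPC's CIDR.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")


def internal_rejects(event: Dict[str, Any]) -> List[Dict[str, str]]:
    """Decode a CloudWatch Logs subscription event and keep REJECTs from inside the VPC."""
    payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
    hits = []
    for log_event in payload.get("logEvents", []):
        fields = log_event["message"].split()
        # Default record format: action is field 13, srcaddr field 4 (1-based).
        if len(fields) != 14 or fields[12] != "REJECT":
            continue
        srcaddr, dstaddr, dstport = fields[3], fields[4], fields[6]
        try:
            if ipaddress.ip_address(srcaddr) in VPC_CIDR:
                hits.append({"srcaddr": srcaddr, "dstaddr": dstaddr, "dstport": dstport})
        except ValueError:   # defensive: skip anything that is not an IP address
            continue
    return hits


def lambda_handler(event, context):
    hits = internal_rejects(event)
    for hit in hits:
        # Printing sends the alert to the function's own CloudWatch log.
        print(f"ALERT internal reject: {hit}")
    return {"alerts": len(hits)}
```

Rejected traffic from outside the VPC is ignored on purpose, following the reasoning above that public addresses are probed constantly.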


Should I Buy Reserved EC2 Instances?

AWS Reserved EC2 instances offer a compelling cost savings. But if you are not careful, they may lock you in for higher costs than you really need.

Savings Through Reserved Instances

Your effective monthly cost for EC2 instances can be significantly reduced by selecting a reserved instance.  The longer your reservation period and the more you choose to pay up front, the more you save.

[Figure: Reserved EC2 instance pricing]

From the chart above, you can see the US East pricing for an m4.large instance as of May 2016.  By committing to a one-year reserved instance, you can save 31% of your monthly cost.  By paying up front, you can save almost 43% with a one-year commitment.  If you are willing to commit to a three-year term, you can save as much as 63%.  These savings can be compelling!
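The percentages above come from comparing a reservation’s effective monthly cost, with any upfront payment spread over the term, against running the same instance On-Demand all month. The sketch below shows that arithmetic. The 730 hours per month and the prices in the usage note are assumptions for illustration, not the m4.large prices from the chart.

```python
def effective_monthly_cost(upfront: float, monthly: float, term_months: int) -> float:
    """Spread any upfront payment evenly across the reservation term."""
    return upfront / term_months + monthly


def savings_vs_on_demand(on_demand_hourly: float, upfront: float, monthly: float,
                         term_months: int, hours_per_month: float = 730) -> float:
    """Fractional savings of a reservation versus running On-Demand every hour."""
    on_demand_monthly = on_demand_hourly * hours_per_month
    return 1 - effective_monthly_cost(upfront, monthly, term_months) / on_demand_monthly
```

For a hypothetical instance at $0.10/hour On-Demand ($73/month) and a $438 all-upfront one-year reservation, `savings_vs_on_demand(0.10, 438, 0, 12)` comes to 0.5, or 50%. Those savings only materialize if the instance actually runs, and stays the right size, for the whole term.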

Reserving the Wrong Size Instance Can Cost You More

Despite these savings, there are reasons not to buy a reserved instance, especially when you first start using Amazon Web Services.  If you are coming from the non-cloud world, you may be used to over-provisioning your servers.  Servers are a big purchase, and they are expected to last at least three years and maybe five.  As a result, physical servers are often bought in a larger size than needed, to allow for growth over time and to guard against errors in estimating what size server is necessary.  If you carry that habit into AWS and reserve an oversized instance, you lock in paying for capacity you do not use for the entire term of the reservation.

Immediate Visibility of Performance Trends

If you are coming from an on-premises model, you may not be accustomed to having immediate visibility of performance trends, or you may have had to wait for complicated monitoring systems to be installed and configured before you had visibility into this data.

In the cloud, AWS CloudWatch provides immediate visibility into performance trends on the CPU, disk, and network usage of your EC2 instances.  You can also set alarms to notify you if usage rises above a threshold of your choice.

The CloudWatch graphs below show an EC2 instance that is almost certainly oversized.  In the last two weeks, the CPU usage has never exceeded 15% and has only rarely exceeded 5%.

[Figure: CloudWatch metrics for an oversized EC2 instance]
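The same two-week check can be scripted against CloudWatch. This is a minimal sketch, assuming a boto3 CloudWatch client is passed in. The 20% peak threshold is an arbitrary assumption for illustration, not an AWS recommendation.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def cpu_datapoints(cloudwatch: Any, instance_id: str, days: int = 14) -> List[Dict[str, Any]]:
    """Fetch hourly CPUUtilization datapoints; `cloudwatch` is e.g. boto3.client("cloudwatch")."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,                        # one datapoint per hour
        Statistics=["Average", "Maximum"],
    )
    return response["Datapoints"]


def looks_oversized(datapoints: List[Dict[str, Any]], peak_limit: float = 20.0) -> bool:
    """Heuristic: flag the instance if hourly CPU never peaked above `peak_limit` percent."""
    if not datapoints:
        return False                        # no data, no conclusion
    return max(dp["Maximum"] for dp in datapoints) < peak_limit
```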

The owner of this EC2 instance could be using a smaller, less expensive instance, except that they made the exact mistake I am warning against: they purchased a reserved instance before fully understanding their resource needs.  Because of this, they are locked into a more expensive instance than they need for a year.  Luckily, they chose a one-year term rather than a three-year term.

So What Should I Do?

If you are evaluating whether to buy a reserved instance ask yourself these questions:

  1. How certain am I of the load my application will put on this instance?
  2. Have I observed the performance and behavior of my application on a variety of instance sizes to know which one is best?
  3. Am I confident that load will remain consistent for the next one to three years?
  4. If I outgrow my EC2 instance, can I take advantage of elastic load balancing and auto-scaling to distribute the load to additional instances?
  5. Do I have time to perform this analysis, or do I need to simply make a choice and move on?

How certain am I of the load my application will put on this instance?

Are you deploying a new application to the cloud, or migrating an existing application to the cloud?  Do you have metrics of what load the application will put on the EC2 instance?  If not, consider observing the behavior of your application and confirming the right instance size before you reserve an instance.  Though you will spend more to run an On-Demand instance during the initial testing and monitoring period, you might save significant money over the life of the reservation by ensuring you do not reserve an instance larger than you need.  Use the Monitoring tab on the EC2 console to observe the CPU, network, and disk usage of the instance.

Have I observed the performance and behavior of my application on a variety of instance sizes to know which one is best?

Before you lock in a one- or three-year contract for a given instance size, try a few different sizes to see how your application performs.  To resize an EC2 instance, follow these steps:

  • Stop the EC2 instance
    • On the EC2 console, select your instance
    • Choose Actions -> Instance State -> Stop
  • Resize the instance
    • On the EC2 console, select your instance
    • Choose Actions -> Instance Settings -> Change Instance Type
    • Select a new instance type
  • Start the instance
    • On the EC2 console, select your instance
    • Choose Actions -> Instance State -> Start
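The stop, resize, and start steps above map onto three EC2 API calls. Here is a minimal sketch, assuming an EBS-backed instance and a boto3 EC2 client; it waits for each state change before moving on.

```python
from typing import Any


def resize_instance(ec2: Any, instance_id: str, new_type: str) -> None:
    """Stop an EBS-backed instance, change its type, and start it again.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(InstanceId=instance_id,
                                  InstanceType={"Value": new_type})
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

As with the console steps, stopping the instance discards any instance-store data, and an automatically assigned public IPv4 address changes unless you use an Elastic IP.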

The whole process takes only a minute or two.  If you cannot tolerate a minute or two of downtime, consider putting an Elastic Load Balancer in front of your instance.  This will allow you to add and remove instances from behind the load balancer without downtime.  The load balancer also offers the ability to monitor request latency.  This allows you to see whether a new instance size has caused a slower response time for your application.

If you choose to use the Elastic Load Balancer, you will need to create an image (AMI) of your EC2 instance and then launch a new instance of a different size from it.  Add the new instance to the load balancer and remove the old one to shift traffic seamlessly between instances.

Am I confident that load will remain consistent for the next one to three years?

Reserving an instance locks you in to paying to use that instance for the entire term of the reservation.  If you expect that your traffic may increase or your application may change such that you outgrow that instance type before the reservation term is complete, then reserved instances may not be the best option for you.

If I outgrow my EC2 instance, can I take advantage of elastic load balancing and auto-scaling to distribute the load to additional instances?

Leveraging Elastic Load Balancing and Auto Scaling can significantly reduce both the risk that you outgrow your instance and your cloud costs.  Elastic Load Balancing distributes the traffic for your application across a group of instances.  If your traffic grows, you can add more instances.  If your traffic drops, you can remove the unneeded instances.

Auto Scaling automates the process of adding and removing instances in response to traffic changes.  Combining auto scaling and load balancing can optimize your cloud costs by running extra instances only when your traffic demands them.  It will automatically terminate unneeded instances when the traffic drops.

If you go the route of load balancing and auto scaling, you should reserve only the minimum number of instances you plan to run all the time.  Let the other instances be On-Demand.
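One way to apply this rule is to look at how many instances were running each hour over a representative period and reserve only the floor. This is a minimal sketch; the hourly counts in the usage note are made-up sample data.

```python
from typing import List, Tuple


def split_reserved_on_demand(hourly_counts: List[int]) -> Tuple[int, int]:
    """Return (instances to reserve, On-Demand instance-hours still needed).

    Reserve only the floor that runs every hour; everything above it stays On-Demand.
    """
    if not hourly_counts:
        return 0, 0
    baseline = min(hourly_counts)
    on_demand_hours = sum(count - baseline for count in hourly_counts)
    return baseline, on_demand_hours
```

For sample counts `[2, 2, 3, 5, 4, 2]`, this suggests reserving 2 instances and covering the remaining 6 instance-hours with On-Demand capacity.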

Do I have time to perform this analysis?

An important question to ask yourself is whether you have the time and interest to optimize your costs.  If you are already overtasked and think you may never get around to evaluating the right size for your instance, perhaps you want to just pick a safe, oversized instance and move on.  The cost savings from selecting a smaller instance may not be worth the time you would invest.  If this is the case, you may be best off reserving a one-year term rather than letting the project linger indefinitely while paying On-Demand prices.

Conclusion

The correct use of reserved EC2 instances can save you significantly on your cloud computing costs.  But reserving an instance before determining the size you really need can lock you in to an oversized instance and end up costing you more in the long run.  You should take the time to test your application on a variety of instance sizes to ensure you select the best size for your needs.  You should also try to use Elastic Load Balancing and Auto Scaling to optimize your cloud computing costs by responding dynamically to traffic load.


Not Getting the EBS IOPS You Paid For?

In many cases, you may be surprised to see you don’t seem to be getting the IOPS you paid for on your Amazon Web Services (AWS) Elastic Block Store (EBS) volumes. Here are the reasons and what you can do to fix it.

EBS and IOPS Background

First, a bit of background.  IOPS means Input/Output Operations Per Second and is generally used as a measure of performance of computer storage such as disks or EBS volumes.  Higher IOPS values mean faster storage which can translate to better performance for your website or database.

AWS documentation tends to focus on IOPS for an EBS volume.  For General Purpose (SSD) volumes, your IOPS are directly tied to the size of the volume: each 1 GB gives you a baseline performance of 3 IOPS.  Volumes smaller than 1000 GB can burst above their baseline to as much as 3000 IOPS for limited periods.  Volumes of 1000 GB or more already have a baseline of at least 3000 IOPS, so they have no need to burst.
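The sizing rule for General Purpose (gp2) volumes can be written as a small function. This is a sketch. The 100 IOPS floor is part of the gp2 rules. The per-volume ceiling is a parameter because AWS has raised it over time, so the default here is an assumption to check against current documentation.

```python
def gp2_baseline_iops(size_gb: int, max_iops: int = 16000) -> int:
    """Baseline IOPS of a General Purpose (gp2) volume: 3 per GB, with a floor of 100.

    `max_iops` is the per-volume ceiling, which has changed over time.
    """
    return max(100, min(3 * size_gb, max_iops))


def gp2_can_burst(size_gb: int) -> bool:
    """Volumes whose baseline is below 3,000 IOPS can burst up to 3,000."""
    return gp2_baseline_iops(size_gb) < 3000
```

For the 3 TB volume discussed below, `gp2_baseline_iops(3000)` gives 9000, and `gp2_can_burst(3000)` is `False`.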

In many cases, people will buy a larger EBS volume to get better performance.  And, in general, this approach works.  But sometimes, it does not.

My Large EBS Volume is not Getting the Expected IOPS

Sometimes, you may create a large EBS volume with the expectation of getting a certain number of IOPS but find you are not getting that number.  The graph below shows a 3TB drive which is expected to get 9000 IOPS.  But as you can see, we are struggling to get 2500.

[Figure: IOPS on the 3TB volume, 2016-03-09]

Initially, we thought perhaps our application was not driving enough load to use all the IOPS, but we found our disk queues were approaching 15, which is longer than we would like.

[Figure: Disk queue length, 2016-03-09]

Our first clue came when we noticed the slight increase in IOPS occurred in parallel with a slight decrease in average read size.  The use case is a MongoDB database.  The reduced read size is due to MongoDB finding more data in the file cache over time and thus needing less from the file system.

[Figure: Average read size, 2016-03-09]

We noticed that the IOPS multiplied by the average read size remained constant over time, staying right around 130MB/s.  That constant product, combined with the longer-than-desired disk queues and the fact that more load on our application did not produce more throughput, led us to conclude we had an I/O bottleneck on this EBS volume.  But why is that?  We paid for a 3 TB volume, which means 9000 IOPS.  Why are we not getting what we paid for?
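The check described above, IOPS times average I/O size staying flat while each factor moves, can be expressed as a quick test over monitoring samples. This is a minimal sketch. The 5% spread threshold is an arbitrary assumption, and the sample values in the usage note are made up to resemble the roughly 130MB/s plateau.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def throughput_mb_s(samples: List[Tuple[float, float]]) -> List[float]:
    """Convert (IOPS, average I/O size in KB) samples to MB/s, using 1 MB = 1024 KB."""
    return [iops * size_kb / 1024 for iops, size_kb in samples]


def looks_throughput_bound(samples: List[Tuple[float, float]],
                           max_relative_spread: float = 0.05) -> bool:
    """True if throughput barely changes across samples (std dev / mean below the threshold)."""
    throughput = throughput_mb_s(samples)
    return pstdev(throughput) / mean(throughput) < max_relative_spread
```

For samples `[(2000, 66.56), (2200, 60.5), (2500, 53.25)]`, IOPS rise only as the read size falls, every throughput value is about 130MB/s, and `looks_throughput_bound` returns `True`.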

IOPS are not the Only Limitation for EBS

As mentioned before, AWS documentation focuses heavily on IOPS when discussing EBS performance.  But there is more to it than that.  In addition to the IOPS limit that varies according to the size of your EBS volume, there is a hard limit on throughput per EBS volume.

If your I/O chunks are very large, you may experience a smaller number of IOPS than you provisioned because you are hitting the throughput limit of the volume.  — EBS IO Characteristics

The throughput limit of General Purpose EBS drives ranges from 128 to 160 MB/s depending on drive size.  Although our drive should have been allowed 160 MB/s rather than the roughly 130 MB/s we were seeing, we started to suspect we were hitting a throughput limit, perhaps made worse by the fact that we were not using an EBS-optimized EC2 instance type.
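The interaction between the two limits comes down to a minimum. A volume delivers whichever is lower: its IOPS limit, or its throughput limit divided by the I/O size. The sketch below assumes sizes in KB and 1 MB = 1024 KB. The 64 KB I/O size in the usage note is a hypothetical value, not one measured in this article.

```python
def effective_iops(iops_limit: float, throughput_limit_mb_s: float,
                   io_size_kb: float) -> float:
    """IOPS actually achievable: the lower of the IOPS limit and throughput / I/O size."""
    throughput_bound_iops = throughput_limit_mb_s * 1024 / io_size_kb
    return min(iops_limit, throughput_bound_iops)
```

With 64 KB reads, a 9000 IOPS volume capped at 160 MB/s manages only `effective_iops(9000, 160, 64)` = 2560 IOPS, while 16 KB reads would let it reach the full 9000.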

The Read Bandwidth graph below shows we are not able to exceed a throughput of 130MB/s regardless of the load on the system.

[Figure: Read bandwidth, 2016-03-09]

What Can I Do?

Our options were to either spend more to get the better performance we desired, or accept the current performance level and save money by selecting a less expensive drive than the 3TB volume we were currently using.

Increase Throughput Limit with a Provisioned IOPS drive

According to the AWS Price Calculator, we were paying $300/month for the large EBS volume that was not delivering the IOPS we had hoped for.  Because we were throughput bound, we needed to focus on improving throughput.  A simple option is to move to a Provisioned IOPS EBS volume, which offers double the throughput of a General Purpose EBS volume: instead of being limited to 160 MB/s, we could get 320 MB/s.  Based on our average read size, we are getting a bit under 2500 IOPS with the 160MB/s limit, and we wanted to double that rate.  Provisioning 5000 IOPS should be enough to fully utilize the 320 MB/s limit.  We really only needed 500GB of drive space, and a 500GB Provisioned IOPS drive with 5000 IOPS would cost $387.50/month.  That would double our performance for only about a 30% increase in monthly cost.

If Performance is Acceptable, Rightsize the EBS Volume

Perhaps we would have liked better performance, but our budget would not allow for additional spending, and the performance we were getting was tolerable.  In this case, we need to stop spending extra money for performance we are not getting.  We provisioned a 3TB drive expecting better performance, but our large average read size and the EBS throughput limit meant we were only getting 2500 IOPS.  A smaller 835GB drive will still provide 2500 IOPS and will cost only $83.50/month.  This offers us a 72% savings at the current performance level.
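The cost figures in the two options above can be checked with a little arithmetic. The sketch below assumes US East (N. Virginia) EBS prices of $0.10 per GB-month for General Purpose and $0.125 per GB-month plus $0.065 per provisioned IOPS-month for Provisioned IOPS. These prices reproduce the numbers in the text, but check current pricing before relying on them.

```python
import math

# Assumed US East (N. Virginia) EBS prices that reproduce the figures above.
GP2_PER_GB_MONTH = 0.10
IO1_PER_GB_MONTH = 0.125
IO1_PER_IOPS_MONTH = 0.065


def gp2_monthly(size_gb: int) -> float:
    return size_gb * GP2_PER_GB_MONTH


def io1_monthly(size_gb: int, iops: int) -> float:
    return size_gb * IO1_PER_GB_MONTH + iops * IO1_PER_IOPS_MONTH


def smallest_gp2_for(iops: int) -> int:
    """Smallest gp2 size (GB) whose 3-IOPS-per-GB baseline reaches `iops`."""
    return math.ceil(iops / 3)


current = gp2_monthly(3000)       # 3TB General Purpose volume
faster = io1_monthly(500, 5000)   # 500GB with 5000 provisioned IOPS
cheaper = gp2_monthly(835)        # rightsized General Purpose volume
print(f"current ${current:.2f}, io1 ${faster:.2f} (+{faster / current - 1:.0%}), "
      f"rightsized ${cheaper:.2f} (-{1 - cheaper / current:.0%})")
```

This prints `current $300.00, io1 $387.50 (+29%), rightsized $83.50 (-72%)`. The increase rounds to the roughly 30% quoted above. `smallest_gp2_for(2500)` returns 834, so the 835GB drive leaves a margin of a few IOPS.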

Steps to Implement Our Solution

Regardless of whether we decide to spend more for a faster volume, or save money and downsize to a more cost effective volume that delivers the same performance, the general approach is the same.  We need to take the following steps:

  1. Provision a new EBS Volume
  2. Attach that volume to our EC2 instance
  3. Mount the volume
  4. Copy the data
  5. Unmount the old volume
  6. Move the mount point of the new volume
  7. Detach and delete the old volume

Provision the New EBS Volume

First, check the availability zone (AZ) of the EC2 instance you will be working with.  You must ensure you create the new EBS volume in the same AZ as the EC2 instance.  To check that, open your list of EC2 instances in your AWS Console and find the availability zone of your instance as shown below.

[Figure: Availability zone of the EC2 instance in the instance list]

Remain in the EC2 area of the AWS Console.  In the left-hand column, click the link “Volumes” under “Elastic Block Store.”  Then click the large “Create Volume” button.

Fill out the form that appears based on whether you are upgrading to Provisioned IOPS or rightsizing to a smaller more cost effective volume.  The example below shows upgrading to Provisioned IOPS.  If your data is sensitive, select the option to encrypt the volume.

Make sure you choose the same availability zone as your EC2 instance!

[Figure: Create Volume form]

Attach the Volume

Select your new volume, and in the Actions menu, select “Attach”

[Figure: Attach volume from the Actions menu]

Then select your EC2 instance from the drop-down list.  You can accept the default “device name,” but you may want to write it down, since it can be helpful when mounting the volume later.

[Figure: Selecting the instance and device name]
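The provisioning and attaching steps can also be scripted. This is a minimal sketch, assuming a boto3 EC2 client. It reads the instance’s availability zone first, so the new volume is guaranteed to land in the same AZ. The `/dev/sdf` device name is a hypothetical choice, and the example creates an encrypted Provisioned IOPS volume. For the rightsizing path you would pass `VolumeType="gp2"` and drop `Iops`.

```python
from typing import Any


def create_and_attach(ec2: Any, instance_id: str, size_gb: int, iops: int,
                      device: str = "/dev/sdf") -> str:
    """Create an encrypted io1 volume in the instance's AZ, attach it, and return its ID.

    `ec2` is assumed to be a boto3 EC2 client; `device` is a hypothetical device name.
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    az = reservations[0]["Instances"][0]["Placement"]["AvailabilityZone"]
    volume = ec2.create_volume(
        AvailabilityZone=az,      # must match the instance's availability zone
        Size=size_gb,
        VolumeType="io1",
        Iops=iops,
        Encrypted=True,
    )
    volume_id = volume["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(Device=device, InstanceId=instance_id, VolumeId=volume_id)
    return volume_id
```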

Mount the Volume

To mount the volume, you have to log into your EC2 instance and issue various commands.  The exact commands will vary according to your operating system.   The links below will help you navigate this step.

Mount your EBS volume in Linux

Mount your EBS Volume in Windows

Copy the Data

The next step is to copy the data from your old drive to your new drive.  Be sure to stop any programs such as databases that might make changes to the files during the copying process.
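Tools such as `rsync` are a common choice for the copy on Linux. To show the idea of copying and then confirming that every file arrived, here is a minimal Python sketch using only the standard library. It compares relative paths and sizes, which catches missing or truncated files but is not a full checksum comparison. The mount point paths are whatever you used for the old and new volumes.

```python
import os
import shutil
from typing import Dict


def snapshot_tree(root: str) -> Dict[str, int]:
    """Map each file's path relative to `root` to its size in bytes (symlinks not followed)."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sizes[os.path.relpath(full, root)] = os.lstat(full).st_size
    return sizes


def copy_and_verify(src: str, dst: str) -> None:
    """Copy `src` into `dst` (which may already exist) and check every file arrived."""
    # copy2 keeps timestamps and permission bits but not ownership, which is one
    # reason `rsync -a` run as root is often preferred for database files.
    shutil.copytree(src, dst, symlinks=True, copy_function=shutil.copy2,
                    dirs_exist_ok=True)
    missing = snapshot_tree(src).items() - snapshot_tree(dst).items()
    if missing:
        raise RuntimeError(f"{len(missing)} files differ or are missing: {sorted(missing)[:5]}")
```

`dirs_exist_ok` requires Python 3.8 or later. Run this only after stopping the programs that write to the old volume.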

Unmount the Old Volume

Using the appropriate tools for your OS, unmount the old volume.

On Linux, this will be the umount command.  On Windows, you will use the same Disk Management tool you used to mount the new volume.

Move the Mount Point of the New Volume to Where the Old Volume Was Mounted

This step involves first unmounting your new volume from its mount point and then remounting it in the same place the old volume was mounted.  This step is necessary because various programs such as databases and web servers are looking for the data in the old location.

If you are using Linux, don’t forget to update /etc/fstab so the new volume is mounted at boot.
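An fstab entry for an EBS volume is commonly written by UUID rather than device name, because device names can change between boots. The sketch below just formats such a line. The UUID (from `sudo blkid`), the mount point, and the xfs file system type are assumptions you would replace with your own values. The `nofail` option lets the instance boot even if the volume is missing.

```python
def fstab_entry(uuid: str, mount_point: str, fs_type: str = "xfs") -> str:
    """Build an /etc/fstab line that mounts a volume by UUID at boot."""
    # Fields: device, mount point, type, options, dump flag, fsck pass number.
    return f"UUID={uuid}  {mount_point}  {fs_type}  defaults,nofail  0  2"
```

For example, `fstab_entry("0f3c2d1e-aaaa-bbbb-cccc-123456789abc", "/data")` returns `UUID=0f3c2d1e-aaaa-bbbb-cccc-123456789abc  /data  xfs  defaults,nofail  0  2`. Running `sudo mount -a` after editing the file is a quick way to catch typos before the next reboot.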

Detach and Delete the Old Volume

Once you have confirmed all your data was correctly transferred to the new volume and everything is working as expected, don’t forget to detach and delete the old volume.  Until you delete it, you will keep paying for it.  Those costs can build up quickly!

Go to the AWS Console, EC2 Section.  Select “Volumes” under “Elastic Block Store.” Select the old volume and on the action menu, choose “Detach Volume.”

[Figure: Detach Volume in the Actions menu]

Then on the action menu, choose “Delete Volume.”
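The same cleanup can be scripted. This is a minimal sketch, assuming a boto3 EC2 client. It waits for the detached volume to become available before deleting it, because a volume cannot be deleted while it is still attached. Deletion is permanent, so run it only after you have verified the new volume.

```python
from typing import Any


def detach_and_delete(ec2: Any, volume_id: str) -> None:
    """Detach the old volume, wait until it is free, then delete it (irreversible)."""
    ec2.detach_volume(VolumeId=volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.delete_volume(VolumeId=volume_id)
```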

Conclusion

IOPS are not the only factor limiting your EBS drive.  Overall throughput is also limited.  This can be a particular challenge if your average read or write size is large.  If you find yourself limited by throughput, you can upgrade to a provisioned IOPS drive to double your throughput.