Category: amazon web services


By nature, the number and IP addresses of connectable resources in the cloud can change at any point. Managing the addresses used to connect to these instances is often a pain point, so I spent a short amount of time doing something about it: I created a WPF program which automatically retrieves a list of instances for an Amazon EC2 account and allows connections over Remote Desktop Protocol (RDP).

CloudTerminal v0.2

Quickly realising that many additional features would also be useful in this area, I open sourced the project at http://cloudterminal.codeplex.com. Special thanks to James Tenniswood for already contributing a beautiful logo.

To avoid over-engineering the tool and never producing a version I can use myself, let alone release, I decided to apply some old skool agile methodology to the project and prioritise the features by how essential they are for each release. This roadmap is published on CodePlex. Development of the 0.2 features is complete and a working copy can be installed via ClickOnce.

0.1
- Retrieve and display list of connections from EC2
- Connect and disconnect via RDP to any instance in list

0.2
- Show instance CPU history
- Store account keys in local configuration
- Optimise UX

0.3
- Allow multiple AWS accounts
- SSH connectivity, including private key storage
- Overlay instance details / commands on list select
- Add grid of instances, shown when no connections are active

0.4
- Allow Azure accounts with more appropriate list view of instances/services

0.5
- Test TCP connectivity before connecting. Offer option to open relevant remote cloud firewall port to client IP address.
- Allow instance / image / service specific credential saving for connections.
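
The TCP connectivity test planned for 0.5 could work roughly as follows. This is a minimal Python sketch, not CloudTerminal's code (which is WPF); the function name `can_connect` and the default timeout are assumptions made for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling `can_connect(address, 3389)` before launching an RDP session would tell the tool whether to offer opening the firewall port instead (3389 is the default RDP port).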

Yes, the BgInfo tool is a little dated, but it can be useful in situations where multiple servers are being used all with very similar configurations (as should be the case for servers running from an Amazon EC2 image).

Any AWS EC2 instance can query information about itself using simple GET requests to the REST metadata service at http://169.254.169.254/

Add Instance Metadata to Environment Variables with PowerShell

We can write a simple PowerShell script which fetches various pieces of this metadata and writes them to the Windows environment variables (at the machine level). We can then have Task Scheduler run this script at machine startup so that each new instance of our image writes its own unique data.

# Fetch this instance's details from the EC2 metadata service.
$wc = New-Object System.Net.WebClient
$instanceIdResult = $wc.DownloadString("http://169.254.169.254/latest/meta-data/instance-id")
$instanceAZResult = $wc.DownloadString("http://169.254.169.254/latest/meta-data/placement/availability-zone")
$instanceAMIResult = $wc.DownloadString("http://169.254.169.254/latest/meta-data/ami-id")

# Store the values as machine-level environment variables (needs an elevated session).
[Environment]::SetEnvironmentVariable("EC2-InstanceId", "$instanceIdResult", "Machine")
[Environment]::SetEnvironmentVariable("EC2-InstanceAZ", "$instanceAZResult", "Machine")
[Environment]::SetEnvironmentVariable("EC2-InstanceAMI", "$instanceAMIResult", "Machine")
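
For comparison, the same three lookups can be sketched in Python with a timeout, so the call fails quickly when run off EC2. The names `collect_metadata`, `http_get` and `FIELDS` are assumptions for illustration, and the sketch assumes the metadata service answers plain GET requests as described above (instances that require session tokens would need an extra step).

```python
from urllib.request import urlopen

METADATA_BASE = "http://169.254.169.254/latest/meta-data/"

# Environment variable name -> metadata path, mirroring the script above.
FIELDS = {
    "EC2-InstanceId": "instance-id",
    "EC2-InstanceAZ": "placement/availability-zone",
    "EC2-InstanceAMI": "ami-id",
}


def http_get(url: str, timeout: float = 2.0) -> str:
    """Plain GET returning the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def collect_metadata(fetch=http_get) -> dict:
    """Return {variable name: value} for every field in FIELDS."""
    return {name: fetch(METADATA_BASE + path) for name, path in FIELDS.items()}
```

Passing a different `fetch` function makes it possible to try the mapping without being on an EC2 instance.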

Now that we have our data, we can configure BgInfo to use it, adding a few custom fields.

Configuring a Machine Image to use BgInfo

In order to successfully configure BgInfo for an Amazon Windows AMI so that each RDP user receives the custom background image, I performed the following steps:

  1. Save BgInfo.exe to a permanent folder on the C: drive that is accessible to all users
  2. Run BgInfo.exe and configure data layout as desired
  3. Under Bitmap > Location… select “User’s temporary files directory”
  4. Apply
  5. File > Save As… and save the configuration to a file in the same location as bginfo.exe (eg: bginfo.bgi)
  6. Add the following Registry value:
    reg add HKU\.DEFAULT\Software\Sysinternals\BGInfo /v EulaAccepted /t REG_DWORD /d 1 /f
  7. Create a batch file in the same location which runs bginfo.exe against the saved configuration, with no GUI:
    bginfo.exe bginfo.bgi /timer:0 /NOLICPROMPT
    
  8. Create a task in Task Scheduler with the following properties:
    • Run only when user is logged on
    • Trigger: At log on (and repeat as desired)
    • Action: Execute the batch file, starting in the relevant directory
And that’s it – every time you log in, under any account, you should see the BgInfo background with the custom EC2 variables.

A while ago I wrote about developing a page to bring in some rudimentary CloudWatch data to measure and compare the realtime traffic of an ELB (Elastic Load Balancer) enabled website and the performance of the servers behind it.

Part of my role at Condé Nast Digital is to be aware of the performance of our public-facing web sites, and to be able to pre-empt or respond quickly to any traffic spikes or performance issues. To that end, I spend some time thinking of new ways to visualize and explore this data for both myself and my team.

In wishful style, I’ve open-sourced a web app containing these visualizations, in the hope that others contribute in the form of ideas or code, or at least get some use out of it so I can more easily justify the late nights.

The AWSMonitor project on CodePlex explains each visualization and offers a roadmap, a forum to discuss new visualizations, and the code to download and run. The app is written in ASP.NET MVC 3 and uses Razor views. The views use the JavaScript Google Visualization API to render graphs and gauges (favouring SVG versions).

There are two main ways I use the visualizations in this app daily:

Infoporn – office displays
In true wired.co.uk style, I love to have screens of realtime data on show so that everyone can easily see what’s going on in terms of editorial content, new features, traffic and server performance.

The /elb/random view really shines here as it displays a new site from our list of load-balancers on AWS after each interval.

In this visualization we can see:

  • A graph comparing the traffic today (blue), yesterday (red), today -7 days (yellow) and today -14 days (green)
  • A gauge showing the average CPU utilization for each server
  • A frame containing the site’s output
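
The four series on the graph are simply the same metric requested for four different days. A minimal Python sketch of picking those days follows; the names `OFFSETS` and `comparison_days` are assumptions for illustration and are not taken from the AWSMonitor code.

```python
from datetime import date, timedelta

# Series label -> how many days back that series starts.
OFFSETS = {"today": 0, "yesterday": 1, "today -7 days": 7, "today -14 days": 14}


def comparison_days(today: date) -> dict:
    """Return the calendar day each series on the graph should cover."""
    return {label: today - timedelta(days=back) for label, back in OFFSETS.items()}
```

Comparing against the same weekday one and two weeks earlier is what makes the yellow and green lines useful, since weekday traffic patterns tend to repeat.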

Problem Investigation
AWS ELB manages server health and will take a server out of service if its health check target responds with an error. When this happens, I like to see exactly what’s happening on each server. The /elb/{load-balancer-name}/preview view shows what the site looks like on each server.

This visualization also accepts a parameter that allows us to see a specific URL for each server.

View more information and download the app at CodePlex.

When we think about version control, the most common purpose we associate with it is source code.

When we think about maintaining a farm of web servers, the initial problems to focus on solving are:

  1. Synchronisation – different nodes in the farm should not (unless explicitly told to) have different configuration or serve different content.
  2. Horizontal Scalability – I want to be able to add as many servers as required, without needing to spend time setting them up or without new builds taking more time to deploy.
  3. Reliability – no single point of failure (although this has varying levels – inability to run a new deployment is less severe than inability to serve content to users).

The scenario begs for a solution involving some kind of repository both of content (ie: runtime code) and configuration (ie: IIS setup), and most sensibly one that asks the member servers in the farm to pull content and configuration from an authoritative source, rather than having to maintain a list of servers to sequentially push to.

Synchronising Content

This sounds very similar to the feature set of many Distributed Version Control Systems. Thanks to James Freiwirth’s investigation and code (and persistence!), we started with a set of commands in a script that would instruct a folder to fetch the latest revision of a set of files under Git version control and update to that revision. So now we could have multiple servers pulling from a central Git repository on another server and maintaining the same version between themselves. What’s more, by using Git the following features are gained:

  • It’s index-based – Git will fetch a revision and store it in its index before applying that revision to the working dir. That means, even on a slow connection, applying changes is very quick – no more half-changed working directory whilst waiting for large files to transfer. FTP, I’m talking to you!
  • It’s optimised – Git will only fetch change deltas, and it’s also very good at detecting repeated content in multiple files.
  • It’s distributed – All the history of your runtime code folder will be maintained on each server. If you lose the remote source of the repositories, not only will you not lose the data because the entire history is maintained on each node, you will still be able to push, pull, commit and roll-back between the remaining repositories.
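
The fetch-then-update step can be sketched in Python as below. This is not James's script: the choice of `git fetch` followed by `git reset --hard`, the remote name `origin`, the branch name `master` and the function names are all assumptions for illustration.

```python
import subprocess


def sync_commands(remote: str = "origin", branch: str = "master") -> list:
    """Git commands that fetch into the local repository, then update the working dir."""
    return [
        # Step 1: download new revisions; the working directory is untouched.
        ["git", "fetch", remote],
        # Step 2: move the working directory to the fetched revision in one local step.
        ["git", "reset", "--hard", f"{remote}/{branch}"],
    ]


def sync(repo_dir: str, remote: str = "origin", branch: str = "master") -> None:
    """Run the sync commands inside repo_dir, stopping at the first failure."""
    for command in sync_commands(remote, branch):
        subprocess.run(command, cwd=repo_dir, check=True)
```

Splitting the slow network step from the fast local step is what avoids the half-changed working directory described in the first bullet.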

So you could commit from anywhere to your deployment Git repository, and have all servers in your web farm pick up these changes. You also instantly gain revision control for all deployments. If you are manually copying files to your deployment environment and, now, into the origin of your repository (some environments can’t help this – shock!), you never have to worry about overwriting files in a way you’ll later regret, and you’re able to see exactly what has changed in your server environment, and when. Or, if you have a build server producing your website’s runtime code like us, you can script the build output to be committed to your Git repository (using NAnt, for example). James is a member of my development team, and he really changed the way we think about DVCS by introducing us to Git quite early on.

Synchronising IIS

Maintaining web server configuration across all servers is just as important as being able to synchronise content. In the past, options were limited. With IIS7 we gain the ability to store a very semantic and realtime representation of IIS configuration on any file path with the Shared Configuration feature. If we can somehow still store this information locally, but have it synchronised across all the servers then we are satisfying all 3 requirements for synchronisation, scalability and reliability.

To accomplish this, we can use the exact same method for synchronising IIS configuration that we use for content. We can set up a Git repository, put in the IIS configuration, pull this down to each server and instruct each server’s IIS to point to the working copy of the repository. We then have history, comments and rollback ability for IIS configuration. Being able to see the difference made by each IIS configuration change is alone an invaluable feature for our multi-site environment.

 

Practical setup (on Amazon EC2)

The final task is to identify what process runs on all the servers to keep them pulling the latest version of both the content and the configuration. The best we’ve used so far is simple PowerShell scripts run by the Windows Server 2008 Task Scheduler, which James gives examples of. However, these scripts themselves can change over time, since they need to know which repositories to synchronise. This calls for yet another revision-controlled repository. The scheduled tasks on the servers run only stub files. Each stub defines a key identifying which farm the server belongs to (and therefore which sites it needs), then runs another PowerShell script retrieved from a central Git repository, which ensures the correct content repositories for that farm are created and up to date.
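
The farm-key idea can be sketched as follows, in Python rather than PowerShell. The farm names, repository names and the `plan_sync` function are hypothetical; in the setup described, the farm definitions would live in the central scripts repository rather than in the stub.

```python
import os

# Hypothetical farm definitions: farm key -> content repositories it needs.
FARMS = {
    "magazine-sites": ["site-a", "site-b"],
    "internal-tools": ["tool-a"],
}


def plan_sync(farm_key: str, root: str, exists=os.path.isdir) -> list:
    """Return (action, repository, path) for every repository the farm needs."""
    plan = []
    for name in FARMS[farm_key]:
        path = os.path.join(root, name)
        # A repository already cloned locally only needs a fetch; otherwise clone it.
        action = "fetch" if exists(os.path.join(path, ".git")) else "clone"
        plan.append((action, name, path))
    return plan
```

The stub on each server then only needs to know its own farm key; everything else can change centrally without touching the machine image.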

The end result is a completely autonomous (for their runtime) set of web servers, which call to central repositories in order to seek updated content and configuration.

If we then create a virtualised image with a Scheduled Task running the stub PowerShell script, we have the ability at any time to increase the capacity of a server farm simply by starting new servers and pointing the traffic at them. These new servers will each pull in the latest configuration and content.

Why not use the MS Web Deploy Tool?

Microsoft’s Road Map for IIS and ASP.NET includes interesting projects concerning deployment and server farm management. The Web Deploy tool is impressive in that it can synchronise IIS configuration and content (and some other server setup) even between IIS versions. However, it is very package based. We’d still need a system to either pull the latest version of the package down to each local server and deploy it, or remotely push to every server we know about. This essentially starts us back at the same step I defined at the beginning of this post – needing a way to maintain a farm of servers and building something to manage the execution of those packages. There are scenarios where I do use this tool, and I’m sure that this and other tools will quite soon evolve to the point where we can get as much control and flexibility as we can achieve with ‘git-deploy’.

In the past, EC2 virtual instances were started only from a machine image stored on S3, Amazon’s distributed object storage platform. In this model, instance storage is allocated to a new instance and the image is copied to the instance’s main instance storage device. In addition, you get a few extra storage devices depending on the instance type. The trouble is that whilst the instance may be given a lot of this instance storage, it is ephemeral – meaning that once you terminate the instance (or it experiences a failure), the data stored is permanently lost. Data that needs persistence is therefore usually stored by a user on S3, or by creating a disk from EBS, Amazon’s block-level persistent (but not distributed) storage platform.

Recently Amazon added the option to start an image from a copy of any EBS volume. Not only does this allow larger images (and therefore the introduction of Windows Server 2008) but, since the storage is persistent, instances can be stopped or started at will and the data will keep its state.

However, whilst Amazon’s command line utilities support it, very few GUI tools (including Amazon’s own web console) expose the ability to start an EBS-backed instance which also has its normal instance-store disks. These disks are actually very useful for scenarios where an application requires a large amount of temporary storage which you don’t want to pay for, or to manage alongside the rest of your EBS volumes. The documentation for the command line also lacks a good explanation of how to do this for Windows servers. After some trial and error, I have found that these commands work nicely:

Start a c1.medium instance from an EBS-backed private image including 1 ephemeral disk:

ec2-run-instances ami-4bebc03f -k myinstancekey -g my-security-group -b "xvdg=ephemeral0" -t c1.medium --availability-zone eu-west-1a

Start an m1.xlarge instance from an EBS-backed amazon image including 3 ephemeral disks:

ec2-run-instances ami-93ebc0e7 -k myinstancekey -g my-security-group -b "xvdg=ephemeral0" -b "xvdh=ephemeral1" -b "xvdi=ephemeral2" -t m1.xlarge --availability-zone eu-west-1a

When constructing these commands, you still need to know how many instance disks are available to each instance type, and which Windows device names you can attach them to (apparently limited to 10: xvdg through xvdp).
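
Building the repeated -b mappings by hand is error-prone, so here is a small Python sketch that generates the argument list. The function name and parameters are assumptions for illustration; the flags are the ones used in the commands above, and the number of ephemeral disks still has to be supplied by the caller for the chosen instance type.

```python
import string


def run_instances_command(ami, key, group, instance_type, zone,
                          ephemeral_disks, first_letter="g"):
    """Build an ec2-run-instances argument list with one -b mapping per disk."""
    letters = string.ascii_lowercase
    start = letters.index(first_letter)
    args = ["ec2-run-instances", ami, "-k", key, "-g", group]
    for n in range(ephemeral_disks):
        # Maps xvdg=ephemeral0, xvdh=ephemeral1, ... as in the commands above.
        args += ["-b", f"xvd{letters[start + n]}=ephemeral{n}"]
    args += ["-t", instance_type, "--availability-zone", zone]
    return args
```

Joining the result with spaces for an m1.xlarge with three disks reproduces the second command above, minus the optional quotes around each mapping.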

It would be great if the configuration section framework built in to .NET was not hard-coded to be FileSystem-based, but could be loaded, via a provider-framework, from any data source.

Whilst the framework was not built like this, this article explains how the encryption mechanism for configuration sections is provider-based and can be used to load data from sources other than files (a SQL database in the article’s case). In our desire to have a farm of apps running without a difficult deployment process for configuration updates (for server services, ASP.NET sites and client tools alike), we wanted to deploy a number of apps but keep the configuration files on shared storage. And since our environment is completely inside Amazon EC2, it made sense to keep the configuration centrally on the redundant and scalable storage system already provided, Amazon S3.

It’s actually a very simple process to have an application load configuration remotely from S3. The main app.config file needs to be on the local filesystem with the executable, but each configuration section (built-in or custom), even the AppSettings section, can be fetched from S3 on startup, by defining a custom encryption provider and putting the settings for the provider as the fake ‘encrypted data’.

You can view the code or download the released version for loading configuration remotely from S3 on CodePlex.

An example of how to use it is:

  1. Add the Natol.S3ToConfig.ProtectedConfiguration.dll file to your project’s bin folder, or the GAC on the machine
  2. Define the custom configuration provider in your app.config or web.config after the configSections element:
    <configProtectedData defaultProvider="s3ConfigSectionProvider">
      <providers>
        <add name="s3ConfigSectionProvider" type="Natol.S3ToConfig.ProtectedConfiguration.ProtectedConfigurationProvider, Natol.S3ToConfig.ProtectedConfiguration" />
      </providers>
    </configProtectedData>

     
  3. Add a file to Amazon S3 at your desired location, giving it a key that represents both the function of your app and the context it runs in (remembering that Amazon S3 bucket names are globally unique). In this file, put the contents of your configuration section, e.g.:
    <sampleconfig><settings sampleconfigsetting="This Setting came from s3"></settings></sampleconfig>

     
  4. Replace the contents of your static configuration section with an EncryptedData element containing the location of your new configuration object in Amazon S3, and let it know our custom provider should handle the ‘decryption’, e.g.:
    <sampleconfig configProtectionProvider="s3ConfigSectionProvider">
      <EncryptedData>
        <s3providerinfo s3accesskey="REPLACE_WITH_YOUR_VALUE" s3secretkey="REPLACE_WITH_YOUR_VALUE" s3bucketname="REPLACE_WITH_YOUR_VALUE" objectkey="test-s3toconfig-sampleconsoleapplication-sampleconfig"></s3providerinfo>
      </EncryptedData>
    </sampleconfig>

     
  5. That’s it. Your configuration section will work exactly as normal, even when you call ConfigurationManager.RefreshSection(), which will then reload from S3… :-) Seriously – try the demo
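
The trick in the steps above is that the ‘encrypted data’ is only a pointer, and ‘decrypting’ it means fetching the real section. The Python sketch below illustrates that idea only; it is not the library's C# implementation. The `resolve_section` function, the injected `fetch` callable and the sample bucket and key values are all assumptions.

```python
import xml.etree.ElementTree as ET

# A section holding only a pointer to the real configuration (sample values).
SECTION = """
<sampleconfig configProtectionProvider="s3ConfigSectionProvider">
  <EncryptedData>
    <s3providerinfo s3bucketname="my-bucket" objectkey="my-app-sampleconfig" />
  </EncryptedData>
</sampleconfig>
"""


def resolve_section(section_xml: str, fetch) -> ET.Element:
    """Replace the pointer section with the real one returned by fetch(bucket, key)."""
    info = ET.fromstring(section_xml).find(".//s3providerinfo")
    body = fetch(info.get("s3bucketname"), info.get("objectkey"))
    # The fetched object contains the plain configuration section XML.
    return ET.fromstring(body)
```

Here `fetch` stands in for the S3 download, so the flow can be tried with an in-memory dictionary before any real bucket is involved.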

Future Possibilities
It would be nice to support notification of when the configuration is updated, in the same way that the .NET Framework monitors the FileSystem-based .config files.
It would also be great to support writing back to the remote data store if the configuration has changed, depending on the per
