FireFall: From the Clouds
We talk to Director of Technical Operations Jeff Berube about powering Red 5's heavyweight MMO with cloud computing.
ZAM: Has it all been smooth sailing, or have there been any problems or restrictions you encountered on the way? Likewise, were there any unexpected perks?
As good as things have been for us, there has definitely been a learning curve.
There have been a few instances of cloud instability that we needed to learn to deal with. We were caught up in the multi-day outage that AWS’s US-East region experienced in April 2011. It is also possible to request an instance type that simply isn’t available because all of them are already in use (as happened after the AWS outage mentioned above). Finally, there is always the possibility of your server being affected in some way by other customers’ workloads, but I can’t think of a case of that in some time. All in all, with proper planning, these are pretty minor annoyances.
We’ve also learned a lot about designing for the cloud. Some “standard” practices aren’t easy to implement. I mentioned the difficulty of monitoring certain things, like network stability outside of your own instances. It isn’t easy to do simple failover, as there is no concept of a VIP (virtual IP) to pass between servers. (There are ways to get similar functionality in AWS Virtual Private Cloud, but it is more complicated to set up and manage.) Also, by design, there is no built-in data sharing between regions. (This is one reason you hear of websites and the like failing when AWS experiences an outage: it is a lot more difficult to build infrastructure that spans separate regions for failover.) As long as you take the nuances of the cloud into account when designing your architecture, it is possible to work around all of these things.
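Because there is no VIP to move between servers, failover in this kind of setup generally has to happen one layer up: something (a client library, a configuration service or DNS) checks which server is healthy and points traffic at it. The Python sketch below illustrates only that decision. The `Endpoint` class, the `choose_active` function, the server names and addresses, and the health-check callable are all hypothetical; they are not taken from Red 5’s systems or from any AWS API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    """A server that could take over the role a VIP would normally front."""
    name: str
    address: str  # hypothetical private IP or DNS name


def choose_active(
    endpoints: Sequence[Endpoint],
    is_healthy: Callable[[Endpoint], bool],
) -> Optional[Endpoint]:
    """Return the first healthy endpoint in priority order, or None if all are down.

    Without a VIP to move, callers re-run this check (e.g. on a timer or after a
    failed request) and reconnect to whatever endpoint it returns.
    """
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Example: the primary fails its health check, so traffic moves to the standby.
primary = Endpoint("db-primary", "10.0.0.10")
standby = Endpoint("db-standby", "10.0.0.11")
health = {"db-primary": False, "db-standby": True}
active = choose_active([primary, standby], lambda ep: health[ep.name])
print(active.name if active else "no healthy endpoint")  # prints "db-standby"
```

The priority order in the list plays the part that ownership of the VIP would otherwise play: whichever healthy server comes first is treated as active.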
On the plus side, Amazon has built a really great set of tools to help developers build their own infrastructure without a lot of the headaches of managing everything. We don’t use everything they offer, by any means. Some services, like RDS (Relational Database Service), provide fully managed MySQL servers, which are fantastic. (We deployed our own servers for our production sites because there were things we needed to accomplish that their service does not currently provide.) Services like S3 (Simple Storage Service – kind of like NFS) and ElastiCache (managed memcached) are excellent replacements for services you would probably need to run anyway, and they work so well that it is better to just let Amazon manage them for you.
ZAM: What’s it been like working with Amazon as a partner? In what way has working with them made FireFall an even better game?
We have been really impressed with everyone we have worked with at Amazon. They are extremely knowledgeable about their areas of responsibility and, when appropriate, we have had the opportunity to work directly with the engineers behind the various services they offer.
In some cases, they have been able to give us guidance on how best to leverage the platform for performance. We have a pretty non-standard use case on AWS, but they definitely do all they can to make sure we are getting the very best from the service.
ZAM: Did the platform choice help you when expanding into Europe? Will it help if you decide to launch in other areas?
Without a doubt, AWS made expanding our service into Europe much easier. There were no new contracts to sign, no hardware purchases to make, and no servers to physically install. We decided we wanted to extend the service to the region and got right to work to make it happen.
Using the tools we developed for our development test environment and our US production environment, we were able to build the required infrastructure quickly. We then made sure that all of our players’ character information was replicating properly and, using Dyn’s Global Traffic Management service, started sending players who were closer to the new European facility to that location, without requiring them to pay to transfer their characters or start again from scratch. (Although some companies see those services as a chance to make more money, Red 5 Studios doesn’t feel it is right to hold your characters hostage.)
Finally, we build every single location in the world to the same standard. This allows us to quickly extend the service to any location where Amazon has services available. Additionally, because we ensure that all of your character data is available everywhere, and any of our sites can provide all the services required to power our products, in the event of a disaster we can fail affected customers over to the next-closest site, and they can pick up immediately from where they were before the problem occurred.
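As a rough illustration of “send players to the closest site, and fail them over to the next-closest one if it goes down”, the following Python sketch ranks sites by great-circle (haversine) distance from a player and skips any site marked as down. It is a stand-in under stated assumptions: Dyn’s Global Traffic Management does its routing at the DNS level rather than with code like this, and the site names, coordinates, `Site` class and `pick_site` function are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import AbstractSet, Sequence

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Site:
    name: str
    lat: float  # degrees
    lon: float  # degrees


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing a slightly above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def pick_site(lat: float, lon: float, sites: Sequence[Site],
              down: AbstractSet[str] = frozenset()) -> Site:
    """Return the closest site that is not down; if the nearest is down, the next-closest wins."""
    ranked = sorted(sites, key=lambda s: distance_km(lat, lon, s.lat, s.lon))
    for site in ranked:
        if site.name not in down:
            return site
    raise RuntimeError("no site available")


# Hypothetical sites; coordinates are approximate and purely illustrative.
SITES = [
    Site("us-east", 38.9, -77.5),   # roughly Northern Virginia
    Site("us-west", 45.6, -121.2),  # roughly Oregon
    Site("eu-west", 53.3, -6.3),    # roughly Dublin
]

print(pick_site(51.5, -0.1, SITES).name)               # London player -> eu-west
print(pick_site(51.5, -0.1, SITES, {"eu-west"}).name)  # eu-west down -> us-east
```

Because every site is built to the same standard and character data is available everywhere, the failover step here is just “take the next entry in the ranked list”; nothing about the player has to move with them.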
ZAM: Do you expect that other MMOs and online games will follow your lead?
To the best of my knowledge, the original architecture that I designed for World of Warcraft had never been used to power online gaming before we built it. I’m not sure the things we were doing were being done at scale anywhere, actually. It was definitely a departure from anything I had done before. Over the last couple of years, I have spoken to operations and development people working on a number of big MMOs, and they are now using a design that is pretty close to what we built in 2004.
I’m sure, as we continue to prove that games as complex as ours are viable in the cloud, other companies are going to start taking a good look at what they are doing now and what they plan to do with their future products. Also, the availability of people with experience working in the cloud, in both operations and development, will help to overcome those companies’ initial fears. I think that last part, the right people with the right knowledge, is what will determine how quickly or widely a similar design is adopted.
Fully leveraging whatever infrastructure you are building requires very close cooperation between teams within the organization. We have made a number of changes to our product to make sure that we can fully utilize the functionality the cloud provides, and manage the limitations it imposes. Some game types may just not be a good fit for the cloud. (I can’t think of any, however.)
On the other hand, if your Operations and Development teams can work together closely and each is willing to do what is best for the product, without egos, it is possible to create a fluid architecture that supports rapid iteration and “limitless” platform expansion capabilities while limiting the cost of service operation.
Also, at a certain scale, it makes sense to evaluate the expertise of your staff and whether it is still cost-effective to operate in the cloud. Depending on the size of the infrastructure, your team’s ability to manage the required hardware, and the requirements of your product, there may come a point where it makes sense to move to an internally managed solution. If we reach that point, however, we have already committed to being able to “burst” into the cloud, using public cloud infrastructure in addition to our own, to make sure our players always have all the hardware they need for the very best experience.
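At its simplest, the “burst” idea comes down to capacity arithmetic: whatever load exceeds what owned hardware can carry spills over to public cloud instances. Here is a minimal Python sketch. The function name, parameters and numbers are hypothetical and illustrative only; Red 5 has not described its actual capacity planning.

```python
def cloud_instances_to_burst(players_online: int,
                             in_house_capacity: int,
                             players_per_cloud_instance: int) -> int:
    """How many public-cloud instances are needed to absorb load beyond owned hardware."""
    if players_per_cloud_instance <= 0:
        raise ValueError("players_per_cloud_instance must be positive")
    overflow = max(0, players_online - in_house_capacity)
    # Integer ceiling division: a partly used instance still has to be launched.
    return -(-overflow // players_per_cloud_instance)


print(cloud_instances_to_burst(9_000, 10_000, 500))   # 0: owned hardware is enough
print(cloud_instances_to_burst(12_000, 10_000, 500))  # 4
print(cloud_instances_to_burst(12_001, 10_000, 500))  # 5
```

With capacity for 10,000 players in-house and 500 players per cloud instance, a peak of 12,000 players would call for four cloud instances. The cloud then only has to cover the peaks rather than the baseline.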
If you’re interested in trying out FireFall and judging the cloud-powered experience for yourself, this weekend is the perfect opportunity. From February 22nd to 25th, Red 5 Studios are opening their beta to all comers, as well as hosting competitions for gaming swag. For more information, head on over to their beta weekend information page.
Gareth “Gazimoff” Harmer, Senior Contributing Editor