Architecting a robust, scalable system
Posted by Jonathan Fleiser - 30 May 2016
Sandfield goes to great lengths to streamline and improve systems, increase flexibility and ultimately profitability for our customers. We are in a privileged situation in that we are often required to provide systems that are highly customised or unusual, and we are given the leeway and resources internally to explore options and provide what is best suited to our customers’ needs.
With the rapid emergence of hosted and cloud options, it has been challenging to determine when customers should switch from existing or traditional providers to emerging or new suppliers promising cost savings and flexibility.
In our experience, some “cloud” providers were riding on the coat-tails of their marketing departments, selling cloud products that were little more than untested ideals or “cloud washing” their traditional offerings to make them appear like cloud products. Others have had more success overall, but their service levels and support did not meet our customers’ expectations.
On one hand, working with these “cloud” providers added a new layer to our infrastructure services, promising attractive new options. On the other hand, we risked losing the visibility and capability we needed to assist our customers.
So, when I read the perspectives of James Hamilton, Vice President and Distinguished Engineer on the Amazon Web Services team, they made a rare and lasting impression on me. James spoke about some fundamentals at AWS; I have included a few below, as reported by Timothy Prickett Morgan in 2014:
"Networking is a red alert situation for us right now," explained Hamilton. "The cost of networking is escalating relative to the cost of all other equipment. It is Anti-Moore. All of our gear is going down in cost, and we are dropping prices, and networking is going the wrong way. That is a super-big problem, and I like to look out a few years, and I am seeing that the size of the networking problem is getting worse constantly. At the same time that networking is going Anti-Moore, the ratio of networking to compute is going up."
This, Hamilton said, is driven partly by the fact that there is more compute in every CPU from generation to generation, thanks to Moore's Law, and also that the cost per unit of compute is also falling. Because of this increased bang for the buck, more people are doing more analytics, and analytics workloads are very network intensive, and it pushes the network even harder. So nearly five years ago, when this problem became apparent, AWS designed its own network routers and went to original design manufacturers to build the hardware, and put together a team to write the networking software stack on top of them. This is a course of action, Hamilton said laughing, where people "would get you a doctor and put you in a nice little room where you were safe and you can't hurt anyone."
The first thing that Amazon learned from its custom network gear is what it learned about servers and storage a long time ago: If you build it yourself with minimalist attitudes and only with the features you need, it is a lot cheaper. "Just the support contract for networking gear was running tens of millions of dollars."
But the surprising thing, even to Hamilton, was that network availability went up, not down. And that is because AWS switches and routers only had features that AWS needed in its network. Commercial network operating systems have to cover all of the possible usage scenarios and protocols with tens of millions of lines of code that is difficult to maintain.
"Future-proofing and versatile design is key"
AWS viewed network performance, cost and scale as an impediment 7 years ago in 2009. Rather than persist with standard industry offerings, they made a remarkable decision to build and manage their own dedicated networking infrastructure to complement their already excellent compute and storage resources. In my opinion, smaller providers and local data centers will never be in this position and will always be reliant on factors beyond their control.
This situation ultimately lost one of our local hosting providers all of our business. They were relatively well provisioned with redundant systems and followed best practice, but their reliance on even bigger niche providers proved problematic nonetheless. We soon tired of being left red-faced and without options to help our customers resume business directly.
AWS’s approach to networking infrastructure is just one example of the technology and processes required to deploy high-performing, resilient, competitive hosting services on a massive scale. This ties in directly with our preferred approach: if there is a better way, it should be considered and preferably used. Future-proofing and versatile design is key.
Now that we are using AWS, we have selected a finite set of trusted tools that allow for quick and secure implementations and let us leverage powerful cost-saving models for our customers. Our passion and preference to customize and create the unusual is limited only by imagination and budget.
Along with our customers, we pride ourselves on having foresight and being innovative. Now that we have found a true cloud services partner in AWS who shares these attributes, I can only imagine that some very exciting times lie ahead.