Cloud Con East 2008
Cloud computing is not well defined; it mostly refers to moving applications, services, and compute resources (machines, storage, queuing services) into hosted data centers and paying for them based on usage. It brings with it the concepts of dynamically provisioning and relinquishing resources.
So, how do you evaluate the current cloud offerings?
Stick With Your Own Data Center If
You have a steady baseline load.
For many organizations, AWS used 24×7×365 is (in 2008) more expensive than owning and hosting physical machines, even accounting for the capex of buying them.
You have hard SLAs
If you need more than 3 or 4 9’s, you are better off with more traditional hosting or your own data center. Currently AWS doesn’t guarantee uptime as well as many medium and large businesses can with their own dedicated IT staffs and data centers.
You can handle your peak loads
If you have constant processing loads, or an SLA that requires you to keep enough spare capacity to handle any given peak, then you’re better served by your own data center. There is no reason to rent capacity you already have.
You have sensitive data
You may require more certainty about where the data is stored and who has access to it. This is more likely to be a legislative or contractual issue than an organizational one, though there are cloud computing platforms and initiatives working toward security and data protection certifications.
On Demand (AWS EC2) Is Right For You If
You have intermittent load
Your needs scale up and can then scale back down, saving you from keeping spare capacity online.
You can plan for the Peaks
If you can anticipate load spikes, then you will have time to provision resources to handle those loads.
You have no capital budget
If you have no capital budget but must do large-scale testing or data analysis, then renting resources will be significantly cheaper than buying hardware.
You want to charge based on utilization
You have a service where you can charge in direct proportion to utilization rather than based on capacity.
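As a rough sketch of the rent-vs-own trade-off underlying the lists above, here is a toy break-even calculation. All the rates and amortization numbers are illustrative assumptions, not actual AWS or hardware prices:

```python
# Hypothetical break-even sketch: rented on-demand capacity vs. owned hardware.
# Every number below is an illustrative assumption, not a real quote.

HOURS_PER_MONTH = 24 * 30

def monthly_cost_on_demand(instances, hourly_rate, utilization):
    """On demand: you pay only for the hours you actually run."""
    return instances * hourly_rate * HOURS_PER_MONTH * utilization

def monthly_cost_owned(instances, capex_per_box, amortize_months, opex_per_box):
    """Owned: amortized purchase price plus hosting/power/admin,
    paid regardless of load."""
    return instances * (capex_per_box / amortize_months + opex_per_box)

# With these assumed numbers, owning wins at 100% utilization,
# while renting wins at low (intermittent) utilization.
rented_full = monthly_cost_on_demand(10, hourly_rate=0.40, utilization=1.0)
rented_low  = monthly_cost_on_demand(10, hourly_rate=0.40, utilization=0.15)
owned       = monthly_cost_owned(10, capex_per_box=2000,
                                 amortize_months=36, opex_per_box=100)
```

The crossover point moves with the hourly rate and your amortization window, which is exactly why the steady-baseline and intermittent-load cases land on opposite sides of it.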
Higher Level Application Stacks, PaaS Google App Engine
Keep in mind these are new and the space is still being explored; more will appear. Models will develop around these PaaS [Platform as a Service] providers, and more languages and frameworks will be supported – though they will mostly be based on those that can be easily hosted, sandboxed, and run-time instrumented, which is why you’re seeing Python first and Java following closely along.
These providers hide the provisioning from you. Google’s offering dynamically scales your application up and down based on utilization. This provides a significant reduction in your design and administrative overhead for web development projects.
Simplicity of application development and scalability are rarely found together in existing technologies; this will be one of the more interesting segments to watch mature.
Organizations and individuals are starting to learn how they need to change the design of their services and applications to take better advantage of the cloud. It takes a change in mindset – someone dropped the phrase “machine instances are the new processes,” and I think that’s an appropriate framing of one of the shifts required to take advantage of the mass of resources becoming available.
Change your software to bootstrap more easily. Eliminate the assumption that you have access to local, disk-based services – pull everything remotely, use URLs and services; don’t assume local interfaces, assume remote ones. Design to come up and boot faster: when you scale up and down quickly you don’t get the same amortization of start-up costs over time. Design with a crash-fast mentality – as robust as these systems are, you should still assume the machine could go away at any moment. Besides letting you recover quickly from unexpected outages, this lets you scale down faster, not just up. Keep your persistent data in the provider’s data stores and use the provided queuing systems to distribute work.
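A minimal sketch of that worker style – stateless, crash-fast, remote everything. The queue and store clients here are hypothetical stand-ins for a provider’s queuing service and data store, not a real API:

```python
# Sketch of a stateless, crash-fast cloud worker. `queue` and `store` are
# invented stand-ins for a provider's queuing service and data store.

import json

def run_worker(queue, store):
    """Pull work from a remote queue, keep no local state, and let the
    process die on any unexpected error; a supervisor just restarts it."""
    while True:
        msg = queue.receive()          # blocks until a task is available
        if msg is None:                # no more work: exit so we can scale down
            break
        task = json.loads(msg.body)
        result = process(task)         # pure computation, no local disk
        store.put(task["result_key"], json.dumps(result))
        queue.delete(msg)              # ack only after the result is durable

def process(task):
    # Placeholder computation for the sketch.
    return {"total": sum(task["values"])}
```

Because the worker acknowledges a message only after the result is stored, a crash mid-task simply means the message is redelivered to another instance – which is what makes scaling down (or losing an instance) a non-event.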
Gaining Wider Acceptance
There are some things these offerings need to do to gain wider acceptance.
Harder SLAs will develop as there is more competition. Higher-level tools will develop on top of the instance-based cloud offerings (EC2) to allow for more automated provisioning – this will make things easier for you, though not as easy as PaaS stacks like GAE will make them.
We’ll also see tools and offerings develop that come down toward traditional data centers, allowing a simpler mix of a traditional service with bleed-over into the cloud as resources need to scale up – and so that you can keep the processing of your (sensitive) data in your own protected environment while pushing generic activity up into the cloud as necessary.
Other notable happenings
Microsoft is creating AMIs for Windows on EC2.
Google just announced that Java will be a supported App Engine development language – previously only Python was supported.
Haskell in the corporate environment
This session seemed out of place for the event – not really cloud oriented. That said, I personally see functional programming as a larger industry trend and something that facilitates concurrency and parallelization. It follows the progression structured -> procedural -> object-oriented -> functional – with respect to the timeline of coming out of academia, at least, not necessarily the idea of one being ‘higher level’ than another – though so far, time has implied that with the other programming paradigms.
The presenter, Jeff Polakow, is using it extensively at his current employer.
Those kinds of firms (Wall St.) allow the technical staff a lot of latitude, so it’s easier to experiment (R&D) with new technologies. It’s much harder for a company like my own to decide to take on something like this – it’s hard to find developers who know how to develop, deploy, monitor and design with these technologies.
The functional programming trend is being pushed into industry by the shift to multi-core, the past difficulties of developing concurrent software, the more widespread need for parallel/distributed applications (concurrency is the new garbage collection – it will become something developers no longer manage manually), the need for infrastructure-level automatic scaling, and the easier path to robustness that languages like Erlang offer.
In languages like Java, you have to consider the referential transparency of every library you use – and the default development practice in many cases is not to consider re-entrancy, so you can’t assume code is thread-safe. Referential transparency is not the default there. In the FP languages it is the default case, so you can, in general, make that assumption – and the underlying stack can make the same assumption about your code, which is why the concurrency/distribution model is less coupled to the implementation than it is in the more imperative languages.
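To illustrate the property (in Python here, rather than an FP language where it is the default): a referentially transparent function can be handed to worker processes without locks, because its result depends only on its arguments:

```python
# Illustration of referential transparency enabling lock-free parallelism.
# `normalize` is pure: no shared state, no I/O, same input -> same output,
# so a pool can run it on any worker with no coordination.

from multiprocessing import Pool

def normalize(price_cents):
    """Pure function: convert an integer cent amount to dollars."""
    return round(price_cents / 100.0, 2)

if __name__ == "__main__":
    with Pool(4) as pool:
        # Safe to distribute precisely because normalize has no side effects.
        dollars = pool.map(normalize, [199, 2550, 99])
```

The point of the session was that in an FP language the runtime can make this assumption about *all* your code, not just the functions you personally audited.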
Horizontal Scaling with HiveDB
CafePress has a large catalog. I was surprised to hear that they have 265 million products across all their customers’ catalogs. Their margins are low relative to the aggregate amount of data they have to store and serve, so commercial solutions like Oracle were simply not an option for them due to cost.
CafePress spent time analyzing their options, didn’t find anything that fit their needs (cost, performance, online resharding), and went down the path of building a more scalable data storage architecture themselves.
The solution they created performs better, scales better, is more robust and has a better SLA than many of the commercial solutions (their words).
Cafe’s DAL is effectively a hibernate extension that uses MySQL to do data partitioning (pseudo-automatically) by using a set of replicated MySQL databases as a catalog to map to where the data is stored for your shard (replicated 3x). The system supports dynamic repartitioning – migration of shards away from a shard-host to get less busy data away from data that is more ‘hot’ – the busiest data sets end up on their own shard-node with everything else having been pushed away from them.
The only lock they need is on a single user while migrating that user’s data off a shard. It is a write lock, not a read lock – it only keeps the user from updating their own catalog of products while the move takes place. Most users never notice when this happens, and the system as a whole doesn’t go down (their words). The MySQL catalogs are replicated (3 machines, master-master, writing to 1) and can be upgraded by taking 1 of the 3 out of the cluster at a time. The same kind of approach applies to the shard servers.
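A toy sketch of this directory-based sharding pattern – a small replicated directory maps each customer to the shard holding their rows, and migration is a copy followed by an atomic pointer flip. The names and in-memory structures are my own illustration, not CafePress’s actual schema:

```python
# Toy directory-based sharding in the HiveDB style: the directory (which in
# the real system lives in replicated MySQL catalogs) maps customer -> shard.

directory = {}                           # customer_id -> shard name
shards = {"shard-1": {}, "shard-2": {}}  # shard name -> rows

def assign(customer_id):
    """Place a new customer on the least-loaded shard."""
    target = min(shards, key=lambda s: len(shards[s]))
    directory[customer_id] = target
    return target

def put_product(customer_id, product_id, row):
    shards[directory[customer_id]][(customer_id, product_id)] = row

def migrate(customer_id, new_shard):
    """Online resharding: write-lock one customer, copy their rows,
    then flip the directory pointer. Reads continue throughout."""
    old = shards[directory[customer_id]]
    rows = {k: v for k, v in old.items() if k[0] == customer_id}
    shards[new_shard].update(rows)       # copy while reads still hit `old`
    directory[customer_id] = new_shard   # atomic pointer flip in the catalog
    for k in rows:
        del old[k]
```

The busiest customers naturally end up alone on a shard by migrating everyone else away – the “hot data” behavior described above.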
The panel discussion was most memorable for how Chris and Toby seemed to dominate the discussion.
Hive and Hadoop
Hive is a data storage system built by Facebook on top of Hadoop, with its own query language (HiveQL). Its goals are a bit different from HiveDB’s – HiveDB is more for OLTP, while Hive is for large-scale analytics. Being built on top of Hadoop, Hive is much more batch oriented. Facebook uses it for analytics, data mining, and machine learning over their user and transactional data sets (logs, user activities, etc.) to mine aggregate and trending intelligence out of the large data set.
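As a rough illustration of the batch, full-scan style of query Hive compiles HiveQL into Hadoop jobs for – sketched here in plain Python, with an invented tab-separated log format:

```python
# Toy batch aggregation in the spirit of a HiveQL GROUP BY over log data.
# The log format (day \t user \t action) is invented for illustration.

from collections import Counter

def count_actions(log_lines):
    """Full scan of the log, counting events per (day, action) -- the batch
    analytics style Hive targets, as opposed to HiveDB's OLTP lookups."""
    counts = Counter()
    for line in log_lines:
        day, user, action = line.split("\t")
        counts[(day, action)] += 1
    return counts
```

In Hive the same shape of query runs as a distributed MapReduce job over the whole data set, which is why it suits the mining workload and not interactive transactions.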
Interesting fact: Facebook’s Hive data set grows by 2 TB per day.
Building Scalable Web Applications with Google App Engine
PaaS stacks like GAE take a more managed environment approach than the more raw or primitive services provided by AWS style on-demand services. The two fit into different use cases though and, IMO, one will not necessarily eliminate the other.
GAE takes away all your concerns about deployment, production architecture, and system management or administration. It gives you a data store with an OO API and a web-app development environment to build your application within. There are things you can’t do – for example, you can’t run arbitrary software or services on GAE as you can on the more machine-image based cloud services (AWS EC2).
What you gain from giving up those capabilities is Google’s infrastructure for scaling – it becomes your infrastructure for scaling. Your app is designed in a pseudo-functional way: the stack encourages you to do all the dynamic work at put/post time and to just render/display at get time. This approach helps the system scale, and storage location transparency helps with spinning up other instances of the app in disparate data centers.
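A hedged sketch of that put-time/get-time split, with a plain dict standing in for the datastore (this is not the actual App Engine API, just the design shape it encourages):

```python
# Sketch of write-time denormalization: do the work at put/post time so the
# get path is a cheap fetch. `datastore` is a dict stand-in, not the GAE API.

datastore = {}

def post_comment(page_id, comment):
    """Write path: update the page's precomputed count and latest-comment
    fields now, so reads never aggregate."""
    page = datastore.setdefault(page_id, {"comments": [], "count": 0})
    page["comments"].append(comment)
    page["count"] += 1
    page["latest"] = comment             # denormalized at put time

def render_page(page_id):
    """Read path: no aggregation, no joins -- just display stored values."""
    page = datastore[page_id]
    return "%d comments, latest: %s" % (page["count"], page["latest"])
```

Since the get path only reads precomputed values, any instance in any data center can serve it cheaply – which is what lets the platform scale your app for you.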
This kind of stack really makes it easy to develop the most common case of web applications – it is both easy to do and it scales. This is a combination that you rarely see in a platform or technology.
I see these kinds of stacks becoming more established and a large part of Internet-based application development – I think more organizations will offer stacks like this across more technologies.
My advice: You should sign up for an account and try GAE out.
Developing and Deploying Java applications on Amazon EC2
Chris Richardson has created cloud-tools, a package of utilities (and a maven plug-in) for provisioning EC2 instances, pushing your application up and executing tasks across your cluster of instances. The tools look like they make it very easy to get your Java app into EC2.
The main theme I took away from it is that on-demand computing is a continuing trend. Services will continue to appear and be developed that will make taking advantage of these resource pools easier and more cost effective.
The trend will be for physical data centers to become more and more outsourced to organizations that can provide those services with greater economy of scale. Currently Amazon’s offerings are slightly more expensive than a hosted system you own – in the cases where you need uptime or have high constant utilization. More guidelines are being developed showing when the trade-off is appropriate. As a trend, cloud computing is still new and not well defined, and these trade-offs are likely to shift, even over the next few years (e.g., it is likely, in my opinion, that the raw cost of 24×7 allocation for SMBs will fall below the cost of ownership thanks to these on-demand providers’ economies of scale).
We’re past the point of asking whether your organization can make use of these on-demand providers; now you should be identifying the areas where you can realize savings by taking advantage of these services.