CloudBase Meetup – relational databases don’t scale

Last night I attended the CloudBase Meetup at Hackney Community College that was arranged by Shawn of Tech Meetups (well done Shawn). It was a good venue if rather hard to find and escape from.

The speakers were:

Steve Caughey, Arjuna
Francoise Dechery, CloudBees
Alvin Richards, MongoDB
Richard Davies, ElasticHosts
Kjetil Olsen, Elance

Steve Caughey discussed the future of cloud service providers. He argued that utility computing will not look like other utilities such as electricity supply because users’ service requirements are “infinitely variable” and a multitude of resellers will be required to meet them.

I am not totally convinced; his slide listed 7 service parameters that could be varied. If each of these has a bronze, silver and gold option, the total service menu is only about 2,000 items (3^7 = 2,187). One could easily see three large players, say IBM, Dell and Amazon, competing to supply these services globally with an economy of scale that crushes any competition.
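As a quick check of that arithmetic, here is a small sketch that enumerates the menu. The parameter names are placeholders of my own (only the count of seven comes from the talk), and the three tiers are my simplification.

```python
from itertools import product

# Placeholder names: only the number of parameters (7) comes from the talk.
parameters = [f"parameter_{i}" for i in range(1, 8)]
tiers = ["bronze", "silver", "gold"]

# Every possible service offering is one tier choice per parameter.
menu = list(product(tiers, repeat=len(parameters)))

print(len(menu))  # 2187, i.e. 3 ** 7
```

A menu of a couple of thousand items is large for a price list but small for a catalogue that a global supplier could maintain.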

However it is possible that his view is correct and a vast variety of services will be provided by a multitude of providers; which brings us on nicely to CloudBees.

CloudBees delivers J2EE development and deployment services on the cloud. They provide Platform as a Service running on Amazon EC2 as Infrastructure as a Service. The headline offering is a Git repository connected to a Jenkins (née Hudson) continuous integration and build server, which then deploys onto a scalable Tomcat cluster (lots of other options exist). It all seemed very sensible, and when my internal development needs a Jenkins system I may migrate from naked Amazon to their services.

That’s the J2EE bit but what about the database? Alvin Richards from MongoDB was here to help us. His talk was excessively partisan and not very useful to anyone who did not already know the design goals of NoSQL databases. However there were some useful take-aways.

He worked on the Oracle kernel for 16 years and, speaking from experience, he says that eventually the only way you can scale up the write performance of a relational database is to remove transactions and foreign key relationships.

Since transactions and foreign keys are the main reason for using a conventional relational database, you are better off giving up and starting again. This is what he and his colleagues have done with MongoDB. You can visualise a MongoDB database as a collection of JSON documents. New writes are eventually replicated to all nodes of the cluster. If you are not familiar with NoSQL systems you may wish to read Nathan Hurst’s excellent guide.
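To make the document model concrete, here is a minimal sketch using only Python's standard json module rather than a MongoDB driver. The order record and its field names are invented for illustration. The point is that data a relational schema would split across tables joined by foreign keys is embedded in a single document.

```python
import json

# Hypothetical order document: the customer and line items are embedded
# in the record instead of being referenced through foreign keys.
order = {
    "_id": 1001,
    "customer": {"name": "Ada", "city": "London"},
    "items": [
        {"sku": "A-1", "qty": 2, "price": 9.50},
        {"sku": "B-7", "qty": 1, "price": 20.00},
    ],
}

# The whole record round-trips as one JSON document.
text = json.dumps(order)
restored = json.loads(text)

# Reading the order needs no join: everything is already in the document.
total = sum(item["qty"] * item["price"] for item in restored["items"])
print(total)  # 39.0
```

The trade-off is the one described above: nothing here stops two documents from disagreeing about the same customer, because there is no foreign key to enforce it.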

MongoDB scales very well

He presented lots of statistics showing that there are some very big MongoDB users and that it works well and easily. As long as you don’t need transactions, or to know that the value returned by the database is correct, or reporting…

This brought us very neatly onto the talk by Richard Davies of ElasticHosts. ElasticHosts is a value-added cloud service provider. The value is added by Richard and his colleagues, who help you build systems that will scale. This contrasts with Amazon, whose only relationship is with your credit card.

Richard discussed the difficulties of optimising databases on a shared infrastructure. To cut a long story short – you can’t.

The bottleneck is disk access. A physical head has to move across a spinning platter. In a shared system this head could be jumping around all over the disk as it fulfils requests from multiple users. It is reasonable to expect that a £100,000 SAN will manage access better than a £5,000 server, but don’t expect too much from vanilla cloud hosting such as Amazon EBS.

To paraphrase Richard’s advice:

Use web caching where you can
Where you cannot use web caching change it so you can
Try even harder to use more caching
[Use memory caching]
Don’t ever write to disk

OK. I lied with the last one but think about that little disk head running about. Every log file, every temporary image, every database write makes that disk head move. Remove every disk write you don’t need for production purposes.
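The memory caching advice can be sketched in a few lines. This is my own illustration, not anything Richard showed: read_from_database is a hypothetical stand-in for a slow, disk-backed query, and the cache is Python's standard functools.lru_cache.

```python
from functools import lru_cache

backend_reads = 0  # counts trips to the slow, disk-backed store


def read_from_database(key):
    """Hypothetical stand-in for a query that would move the disk head."""
    global backend_reads
    backend_reads += 1
    return f"value-for-{key}"


@lru_cache(maxsize=1024)
def cached_read(key):
    # Only the first request for a given key reaches the backend;
    # repeats are answered from memory.
    return read_from_database(key)


for _ in range(1000):
    cached_read("home-page")

print(backend_reads)  # 1
```

A thousand requests cost one trip to the disk. A real system also needs a policy for expiring stale entries, which this sketch leaves out.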

Having done this, over-spec your database servers and load test them. This is essential because any attempt to scale relational database servers while they are running at peak load is likely to fail. This is less true of NoSQL databases because they have been designed with this in mind.

I should say at this point that the power of even a simple PostgreSQL cluster (for example) is astonishing and for many applications will easily satisfy global demand. Scaling databases is a problem of success and when you have it there are plenty of people who will help you solve it.

The last talk was from Kjetil Olsen at Elance. Elance is a low-friction way to outsource jobs, often to smart people in low-wage countries. They have a lot of neat software tools to aid remote working and it looks very good. Just as on eBay, a reputation system ensures that buyers get a good deal and suppliers get paid.

I think that it is important to understand that Elance is much more than a supplier of cheap labour. It is a market and as such provides discovery, pricing, payment and enforcement services. The discovery element could be particularly important to companies looking for sales agents in Latin America, for example.

Overall it was a very good meeting and I would like to thank Shawn and the speakers.
