I was lucky enough to attend StrataEU 2017 on behalf of phData, and one of the sessions was “Deploying and managing Hive, Spark, and Impala in the public cloud,” led by Philip Langdale, Eugene Fratkin, and Jennifer Wu. I assumed this would be a Cloudera Director session, a tool we have lots of experience with, but I decided to pop my head in anyway. I was quite surprised to find that the session was actually about Cloudera Altus, Cloudera’s SaaS offering for transient Hadoop clusters, created to compete with AWS EMR.
I’ll provide a “First Look” at Altus below.
Creating an Altus Cluster
Despite some interesting issues with the conference Wi-Fi, the session went well. I was able to easily create an Altus cluster and run several jobs on it. Below is a screenshot of my cluster being created:
Unlike Cloudera Director, this is a true SaaS offering: Cloudera hosts the control plane, which creates the infrastructure inside the user’s AWS account through secure delegation. As with AWS and EMR, Cloudera has control-plane access to the nodes that Altus creates.
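The session didn’t go into the exact delegation mechanism. On AWS, the common pattern for letting a SaaS vendor manage resources in your account is a cross-account IAM role that the vendor assumes through STS, guarded by an external ID. The sketch below only builds such a trust policy; the vendor account ID and external ID are hypothetical placeholders, and I’m not claiming this is exactly how Altus is wired.

```python
import json


def cross_account_trust_policy(vendor_account_id: str, external_id: str) -> dict:
    """Build a trust policy letting a third-party AWS account assume a role
    in *your* account, but only when it presents the agreed external ID."""
    # AWS account IDs are exactly 12 digits.
    if not (len(vendor_account_id) == 12 and vendor_account_id.isdigit()):
        raise ValueError("vendor_account_id must be a 12-digit AWS account ID")
    if not external_id:
        raise ValueError("external_id must be non-empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Trust the vendor's account (hypothetical ID in the example below).
                "Principal": {"AWS": f"arn:aws:iam::{vendor_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID mitigates the "confused deputy" problem.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration; not Cloudera's real account or ID.
    policy = cross_account_trust_policy("111122223333", "example-external-id")
    print(json.dumps(policy, indent=2))
```

The trust policy only controls *who* may assume the role. What the vendor can do afterwards (launch EC2 instances, create security groups, and so on) comes from a separate permissions policy attached to the same role, which is where you would scope access down.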
Once the cluster is created, you can access Cloudera Manager (CM) however you’d normally reach your VPC, typically over a VPN or AWS Direct Connect, or through a SOCKS proxy that the altus command sets up automatically.
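For readers unfamiliar with the SOCKS approach: it is usually an SSH session with dynamic port forwarding (`ssh -D`) plus a browser configured to send its traffic through that local port. This is a minimal sketch that only builds the two command lines. The key path, SSH user, host names, and local port are hypothetical; 7180 is Cloudera Manager’s default web UI port. I’m not claiming this is what the altus CLI runs internally.

```python
import shlex


def socks_proxy_commands(key_path: str, user: str, ssh_host: str, cm_host: str,
                         local_port: int = 1080,
                         cm_port: int = 7180) -> tuple[list[str], list[str]]:
    """Return (ssh_argv, chrome_argv) for reaching Cloudera Manager through
    an SSH dynamic port forward acting as a local SOCKS proxy."""
    ssh_argv = [
        "ssh",
        "-i", key_path,          # private key for the cluster nodes
        "-N",                    # don't run a remote command; just forward
        "-D", str(local_port),   # open a SOCKS proxy on localhost:local_port
        f"{user}@{ssh_host}",    # an address reachable from your machine
    ]
    chrome_argv = [
        "google-chrome",         # executable name varies by OS
        # With socks5://, Chrome resolves host names through the proxy,
        # so private VPC DNS names such as cm_host work.
        f"--proxy-server=socks5://localhost:{local_port}",
        # A throwaway profile keeps the proxy out of your everyday browser.
        "--user-data-dir=/tmp/cm-socks-profile",
        f"http://{cm_host}:{cm_port}",
    ]
    return ssh_argv, chrome_argv


if __name__ == "__main__":
    # Hypothetical key, user, and hosts; substitute your cluster's values.
    ssh_cmd, chrome_cmd = socks_proxy_commands(
        "/path/to/cluster-key.pem", "centos",
        "203.0.113.10", "cm-host.example.internal")
    print(shlex.join(ssh_cmd))
    print(shlex.join(chrome_cmd))
```

Run the SSH command in one terminal, then launch Chrome with the second; closing the SSH session tears the proxy down.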
The altus command also opens CM in Google Chrome automatically, so you can monitor your jobs.
The Jobs I Ran
Here are two screenshots of my jobs in the Altus UI. The job list:
The details of one specific job:
Additional features are going to be released soon, including spot instances and troubleshooting analytics.
Having managed and worked on AWS EMR clusters, and now having had this early look at Altus, here is how I’d compare Altus to AWS EMR on a scale of Good, Better, and Best:
Feature / Aspect | Cloudera Altus | AWS EMR |
Ease of Use | Best – the UI/CLI is focused solely on the workload rather than being a general-purpose UI/CLI. | Better – functional UI and CLI. |
Cluster Start Performance | Good – both use pre-baked AMIs and the same underlying primitives, so Altus “feels” about as fast as EMR. | Good – same as Altus. |
Troubleshooting | Better – provides CM, which is a fantastic troubleshooting tool, plus log evacuation. Cloudera also gave a sneak peek at the next set of features in this area, and it looked very interesting. | Good – EMR has a basic management console that links to the various UIs and supports log evacuation to S3. |
Scaling | N/A – cluster size is fixed. I am confident more is coming here. | Best – supports many different scaling configurations, both automatic and manual. |
Security | Good – same as EMR today. Security is a core part of Cloudera’s traditional offering, so I’d expect more here, and with CM they already have the automation to do the hard bits. | Good – coarse-grained but strong security. |
Breadth | Better – right now Altus only provides a subset of services, such as Hive, Spark 1 and 2, and MapReduce, but this will expand given the strength and breadth of Cloudera’s Hadoop distribution. | Better – AWS’s distribution of Impala is far out of date, but they ship Spark 2 along with Hive, Hue, HBase, etc., and these are up to date. |
Maturity | N/A – brand new. | Best – has been around a long time. |
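To make the EMR side of the Scaling row concrete: a manual resize of an EMR instance group is a single API call. This is a minimal sketch using boto3’s EMR client `modify_instance_groups`; the cluster ID, instance group ID, and region are hypothetical placeholders, and nothing is sent to AWS unless you call `apply_resize` with real credentials. Automatic scaling is configured separately through scaling policies and isn’t shown.

```python
def emr_resize_request(cluster_id: str, instance_group_id: str,
                       instance_count: int) -> dict:
    """Build keyword arguments for EMR's ModifyInstanceGroups call,
    which manually resizes one instance group of a running cluster."""
    if instance_count < 0:
        raise ValueError("instance_count must be >= 0")
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {"InstanceGroupId": instance_group_id, "InstanceCount": instance_count},
        ],
    }


def apply_resize(request: dict, region: str) -> dict:
    """Send the resize request. Requires boto3 and AWS credentials."""
    import boto3  # imported here so building requests works without boto3

    emr = boto3.client("emr", region_name=region)
    return emr.modify_instance_groups(**request)


if __name__ == "__main__":
    # Hypothetical IDs; look up real ones with `aws emr list-instance-groups`.
    req = emr_resize_request("j-EXAMPLE12345", "ig-EXAMPLE12345", 10)
    print(req)  # pass to apply_resize(req, "us-east-1") to actually resize
```

Separating request construction from the call keeps the sizing logic testable without touching a live cluster.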
Overall, I am bullish on this offering: I feel Cloudera understands big data management better than AWS does, and they seem to have made excellent progress in this first release.