Installing Hue on an Azure HDInsight 3.5 Cluster

Lately I have been working a lot with Azure HDInsight - a semi-managed Hadoop cluster on the Microsoft Azure cloud. I say semi-managed because:

  • It runs on Azure IaaS VMs and you can select the size of your head and worker nodes; however, while you can log on to the nodes, you cannot manage them through the portal in the same way as normal VMs. For starters, the VMs associated with the cluster nodes are not visible in the portal (you can, however, see the NICs and public IPs associated with the cluster - these show up in the resource group of the VNET that you deployed the cluster to).
  • Microsoft periodically patches and re-images nodes in the cluster
This blog post discusses an issue I found with the Hue installation Custom Script Action provided by Microsoft and how to resolve it. This post has been cross-posted to the GI Architects Blog.

Custom Script Actions

Because of the re-imaging, any manual software installation or configuration you perform after the cluster is deployed will be lost. Microsoft supports Custom Script Actions, which can be used to install software or make configuration changes in your cluster using scripts that you develop. These Custom Script Actions can be ad-hoc (i.e. run once) or persisted. Persisted Script Actions are re-run when your cluster is re-imaged.

Hue Custom Script Action

Microsoft provides a Custom Script Action for installing Hue on your HDInsight clusters. Hue is a popular web-based tool for working with the data in your cluster, for example querying it with Hive or Spark and browsing the HDFS file system.


The script is linked to at the bottom of this page on the Microsoft Azure Content repo here.

The script works fine for HDInsight 3.4 clusters. However, when HDI 3.5 clusters became available, I found that the cluster ARM template was failing on the Custom Script Action for installing Hue. After looking through the logs in the Ambari web interface and reviewing the code in the script itself, I realised it was failing because:

  • The script was written assuming the init system in use was Upstart
  • One of the changes in HDI 3.5 is that the operating system has changed from Ubuntu 14.04.5 LTS to Ubuntu 16.04.1 LTS. Ubuntu 16.x no longer uses Upstart and instead uses systemd
  • The hue and webwasb service init scripts only worked with Upstart
I decided to fix this myself by adding logic that detects the operating system release and, if it is 16.x, uses systemd unit files that I wrote for Hue and webwasb instead of the Upstart init scripts. I have published the script to my GitHub Azure HDInsight repo.
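
To illustrate the kind of release check described above, here is a minimal sketch in Python. It is not the code from the published script: the function names are hypothetical, and it assumes the release can be read from the VERSION_ID field of /etc/os-release, with major version 16 or later treated as systemd and anything earlier as Upstart.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def choose_init_system(version_id):
    """Pick the init system for an Ubuntu release string such as '16.04'.

    Assumption for this sketch: major version 16 and above means systemd,
    anything earlier (e.g. 14.04 on HDI 3.4) means Upstart.
    """
    major = int(version_id.split(".")[0])
    return "systemd" if major >= 16 else "upstart"


def detect_init_system(path="/etc/os-release"):
    """Read the os-release file on the node and return the init system name."""
    with open(path) as handle:
        fields = parse_os_release(handle.read())
    return choose_init_system(fields["VERSION_ID"])
```

On a cluster node, calling detect_init_system() would tell an install script whether to drop in the systemd unit files or the Upstart init scripts for the hue and webwasb services.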

