Hello everyone! In this post we will look at implementing storage tiers in our Hadoop HDFS cluster. This feature has been supported since Hadoop v2.3 and has become very popular, especially among companies that don't want to invest heavily in compute and storage just to handle COLD data.
Storage tiering gives us the flexibility to use high-density, low-compute boxes to store the cold data. You could also tier your COLD data to an online archival platform (like an Amazon S3 bucket, for example, with the S3 FSFuse connector). I will go through online archival in my next post.
Before we start, let's ask ourselves a few questions:
- Is our big data spread homogeneously across the storage?
- What system do you follow to classify your big data in HDFS?
- Do you use nodes with the same compute capacity for HOT, WARM and COLD data?
- Do you spend heavily on handling COLD data storage?
- How long does HOT data need to remain HOT? When can it go COLD? Where should it be stored once it is COLD?
Even though tiering is not uncommon in enterprise-grade storage systems, most implementations of HDFS do not decouple compute from storage. That means we have little flexibility to scale compute and storage separately.
However, there are implementations that allow us to decouple compute and storage so that they scale independently, with tiering managed by the enterprise-grade storage. HDFS tiering is not necessary in such implementations. For example, using Dell EMC Isilon in place of HDFS gives us that flexibility (but understand the difference with HDFS). Technologies like HDFS virtualization (e.g., the DataTap feature from BlueData) also help achieve this, but that is outside the scope of this post; I will do a separate post later.
Now let's move on.
Storage tiering is a mechanism to classify your big data and rotate it to the appropriate storage subsystem. It also gives us the flexibility to decouple cold data from compute capacity. This means that, unlike traditional Hadoop implementations, you don't need identical hardware for your data nodes. This opens up node classification: high-compute SSD nodes for HOT data, medium-compute 15K SAS HDD nodes for WARM data, and low-compute, high-density nodes for archival storage.
It is recommended to use identical nodes within a tier – i.e., avoid mixing disk types in a single node – so as to give more flexibility when scaling up a tier.
Also note that storage tiering is not supported in HDFS Federation scenarios (where you have multiple nameservices in your HDFS cluster). HDFS Federation relies on ViewFs to hierarchically interconnect the namespaces, which the storage tiering daemon can't recognize.
Currently HDFS supports the following storage types:
- DISK – This is the default storage type. Useful for WARM data
- ARCHIVE – Archival storage is for very dense storage and is useful for rarely accessed data
- SSD – Useful for HOT data
- RAM_DISK – This is an in-memory storage tier to accelerate single replica writes
HDFS classifies the disks through tags that you apply when adding them as HDFS storage on the data nodes. Make sure you tag them properly to avoid ending up in a low-performance, disastrous situation.
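In hdfs-site.xml, the tags are a prefix on each entry of the "dfs.datanode.data.dir" property (the mount points below are just examples; substitute your own):

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- Prefix each data directory with its storage type tag.
       Untagged directories default to DISK. -->
  <value>[SSD]/data/1/dfs/dn,[DISK]/data/2/dfs/dn,[ARCHIVE]/data/3/dfs/dn</value>
</property>
```

If you use a management tool such as Cloudera Manager, the same tags are added in front of each DataNode data directory entry.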
Adding heterogeneous storage to the Data nodes
The steps vary depending on the Hadoop distribution and management software used. I already have a CDH implementation, so in this example I use Cloudera Manager.
Below is the disk configuration of a data node belonging to the SSD tier. The disks are tagged as [SSD], so HDFS uses these disks for HOT data based on the storage policies set.
Once you have prepared the nodes with [SSD], [DISK] and [ARCHIVE] tags, you are ready to set storage policies on your HDFS data.
Note that you might need to restart your data nodes for HDFS to recognize the heterogeneous storage.
Currently you have 6 predefined policies to apply to your HDFS data. As of now, you don't have the flexibility to create your own policies.
- Hot – All replicas are stored on DISK.
- Cold – All replicas are stored on ARCHIVE.
- Warm – One replica is stored on DISK and the others are stored on ARCHIVE.
- All_SSD – All replicas are stored on SSD.
- One_SSD – One replica is stored on SSD and the others are stored on DISK.
- Lazy_Persist – The replica is written to RAM_DISK and then lazily persisted to DISK.
To view the policies, you can use the below HDFS command:
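The "listPolicies" subcommand prints each predefined policy along with its storage types and fallbacks:

```shell
# List all predefined block storage policies and their fallbacks
hdfs storagepolicies -listPolicies
```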
Storage policy fallback
Each policy has a fallback storage type, as you can see above. The fallback storage acts as a backup whenever a specific storage type is unavailable or full. For example, the All_SSD policy uses DISK as the fallback storage for both block creation and replication.
Setting a Storage Policy
When a file or directory is created, its storage policy is unspecified. You need to set storage policies on the required directories manually using the "storagepolicies -setStoragePolicy" command.
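For example, to pin a directory to the All_SSD policy (the path here is just an illustration):

```shell
# Apply the All_SSD policy to a hypothetical HOT-data directory
hdfs storagepolicies -setStoragePolicy -path /user/hive/hot_tables -policy ALL_SSD
```

The policy names accepted by the command are HOT, COLD, WARM, ALL_SSD, ONE_SSD and LAZY_PERSIST.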
Note: the current implementation of HDFS storage tiering is not intelligent. You need to set the policies manually on the relevant paths to tier your big data.
You can view the policies applied to directories using the "storagepolicies -getStoragePolicy" command.
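For instance, to check which policy is in effect on the same example path used above:

```shell
# Show the policy currently applied to a directory
hdfs storagepolicies -getStoragePolicy -path /user/hive/hot_tables
```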
We have now applied an All_SSD policy to our data. Classify your HOT and COLD data similarly and apply the policies manually. You can either wait for the rebalance to happen or use the "mover" tool to tier your data.
Migration using the “Mover” tool
The tool is similar to the Balancer. It periodically scans the files in HDFS to check whether the block placement satisfies the storage policy. Note that it always tries to move block replicas within the same node whenever possible; if that is not possible (e.g., when a node doesn't have the target storage type), it copies the block replicas to another node over the network. In our case we use dedicated nodes for each storage tier.
Note that "mover" will only move blocks to satisfy the storage policy, while the HDFS "balancer" ensures that block placement satisfies the configured threshold. If rack awareness is enabled, no more than two replicas of a block are placed in the same rack. If sufficient storage of the required type isn't available, "mover" uses the fallback options to satisfy the block placement.
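To migrate the existing blocks under specific paths (again, example paths):

```shell
# Move blocks under these directories until they satisfy their storage policies
hdfs mover -p /user/hive/hot_tables /archive/logs
```

Run without "-p", the tool scans the entire namespace, which can take a long time on a large cluster.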
To confirm that the block placement is successful, you can run the hdfs fsck utility to scan the block locations:
```shell
hdfs fsck /<location> -files -blocks
```
You can also navigate to the NameNode UI and see the capacity per storage type there.
Likewise, set storage policies on your COLD data and run the mover utility to tier it to archival storage.
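Putting the COLD path together (the directory is again just an example):

```shell
# Tag an example directory as COLD, then migrate its blocks to ARCHIVE storage
hdfs storagepolicies -setStoragePolicy -path /archive/logs -policy COLD
hdfs mover -p /archive/logs
```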
In the next post, I will explain how to implement an online archival mechanism (AWS S3 for example).
Thanks for reading