Nutanix Objects Metadata sharding and hashing algorithms

Anirudha | Fri, 11/08/2019 - 03:31

In most distributed object storage systems, metadata is stored on one to many nodes. That raises a few questions: how is this achieved, how does the system know which node holds the metadata for a given bucket or object, and how does the distribution logic work?

 

Distributed metadata

 

The screenshot above shows metadata stored on multiple nodes, with the nodes connected to each other over the network.

While different storage systems use different advanced algorithms to achieve this, most of them work on similar logic. I am going to try to explain how this works, which should give some idea of how metadata is sharded, and I will also give a few references for digging into the details. Then we will take a look at how Objects differs from other solutions.

An object storage system is meant to handle hundreds of thousands of buckets and billions of objects across those buckets. At such a scale, the metadata footprint also grows significantly. Object storage also has tons of cool features, such as prefix-based search, tagging, delimiter objects, lifecycle policies, and ACLs, each of which generates more metadata to manage for a given bucket or object. As users enable more features, the metadata footprint grows. In practice, this metadata can grow to hundreds of GBs or beyond.

Now consider what happens if all this metadata goes to just one place: it creates a hotspot that becomes both a bottleneck and a single point of failure. You are going to see network, storage, and bandwidth bottlenecks, putting your production workload at risk.

One easy way to avoid hotspots is to shard your metadata into smaller logical chunks, or partitions. Instead of storing everything on one node, you divide it into multiple partitions and let multiple metadata services/nodes each host a partition. You also need an algorithm that can distribute the metadata and efficiently find the metadata for a given object. This avoids the single point of failure and the other bottlenecks we saw in the case above.

Dividing metadata into multiple partitions, or logical chunks, is called sharding.
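To make the idea concrete, here is a minimal Python sketch of one simple way to shard: hash the bucket and object name, then take the result modulo the number of shards. This is only an illustration and not how Nutanix Objects (or any particular product) does it. The names `NUM_SHARDS`, `shard_for`, and `distribution`, and the choice of MD5 with a plain modulo, are all assumptions made for the example.

```python
import hashlib
from collections import Counter

# Hypothetical number of metadata partitions (shards); chosen for illustration.
NUM_SHARDS = 4


def shard_for(bucket: str, key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a bucket/object pair to a shard index using hash-modulo placement.

    MD5 is used here only as a stable, well-mixed hash, not for security.
    """
    digest = hashlib.md5(f"{bucket}/{key}".encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(bucket: str, keys, num_shards: int = NUM_SHARDS) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(shard_for(bucket, k, num_shards) for k in keys)


if __name__ == "__main__":
    keys = [f"photos/img-{i}.jpg" for i in range(1000)]
    for shard, count in sorted(distribution("demo-bucket", keys).items()):
        print(f"shard {shard}: {count} objects")
```

Because the hash is deterministic, any node can compute which shard holds the metadata for a given object without asking a central directory. The weakness of plain modulo placement is that changing the number of shards remaps most keys, which is one reason real systems tend to use more advanced schemes.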

In this series we will look into these aspects and explore how metadata is stored on distributed systems, and how Objects is different.