In January I had the chance to attend Storage Field Day 19 in Santa Clara, where we got to meet with a wide variety of startups and large enterprise storage companies. One of the more interesting companies we meet with was MinIO which has a really interesting and compelling object-based storage product.
I’ve talked about object storage here before, but it’s a very different paradigm than the traditional block based storage you may currently be using. With block storage files are split into evenly sized blocks of data (typically somewhere between 64 KB and 1 MB depending on your vendor). Data protection is provided by traditional RAID options.
Object storage on the other hand doesn’t split files into blocks. Files are stored as objects which contain the file data, metadata, and a unique identifier. There is no limit on the size or amount of the metadata associated with the file. If you have ever created a managed disk in Azure, taken a backup to URL, or used an Azure SQL Database you’ve used object based storage. In object based storage, redundancy is generally provided by maintaining three copies of the object (e.g. a write isn’t considered complete until it writes to all three copies).
Object storage is designed to solve problems of scale. One of the things I learned at Comcast was that the cost of SAN storage didn’t scale to some of the massive petabyte scale data problems we had. The management overhead, the cost, and sometimes even the storage itself does not scale. This is a problem largely for companies like Microsoft, Amazon, Google, Facebook, etc, who have massive amounts of data to store. But as data volumes grow there are lots of other firms who have very large volumes that they need to manage.
MinIO is a firm that offers such a solution. MinIO offers open source storage management software that offers extremely fast (183 GB/s reads and 171 GB/s writes). It is fully compatible with Amazon’s S3 API, which has somewhat become the de facto standard for object storage. They were working on Azure Blob Storage support when we visited.
One of the ways MinIO is able to get such good performance out of pretty standard hardware is by taking advantage of SIMD processor instructions, which all more text and number crunching to be performed per CPU instruction which dramatically increases performance. SQL Server uses this through the query processor’s use of batch mode.
MinIO’s storage can also be used as a persistent store for Kubernetes (drink), or used for systems like Spark, TensorFlow, and a replacement for Hadoop HDFS. Where you would probably use this in your environment would be to replace your file servers, or as a target for container storage, or maybe even an analytic store. Or you want to become a cloud storage provider and you need to host 50 PB of data in your data center.