The talk of the storage town today is Computational Storage.
In the previous blog, we traced the evolution of storage architectures and surveyed the emerging ones, and one of the most widely discussed among them is computational storage. If you got a chance to attend SNIA SDC USA 2020, you would have noticed how strongly the sessions and discussions leaned toward this trending storage architecture.
In yet another previous blog, we covered the 5Ws of Computational Storage. We saw that Computational Storage delivers higher-capacity solutions with lower power consumption by distributing processing closer to the data. As a result, Computational Storage provides improved efficiency and performance.
In one of the keynote sessions at SNIA SDC USA 2020, JB Baker from ScaleFlux talked about the advantages of reducing data movement in Computational Storage. He categorized the advantages into two – Saving Time and Saving Money.
- Saving Time: Even as storage media, interfaces, and networks get faster, data volumes are growing faster still, so data movement is becoming the sluggish part of the pipeline. Moving terabytes, or at the data center scale even zettabytes, of data to perform mission-critical tasks such as transaction processing, big data analytics, and machine learning is time-consuming and reduces efficiency.
- Saving Money: The infrastructure needed to support all this massive data movement requires constant investment, creating real challenges for everyone managing the data.
So, reducing data movement can cut both processing time and infrastructure costs, a boon for IT departments and data center architects. Baker then gave an example of using Computational Storage to reduce data movement: a Data Filtering Computational Storage Service (CSS). Let's dig deeper into the example and the results he presented.
Let's take a 12 TB data set representing all worldwide purchase transactions over the past several years. Say a data analyst needs to run a query covering just 4 months of 2016, so only about 100 GB, less than 1% of the entire data set, is actually relevant. With ordinary storage, all 12 TB must be pushed up to the CPU, which becomes the bottleneck as it filters the data and then completes the query.
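The "<1%" figure is easy to verify with back-of-the-envelope arithmetic:

```python
# Fraction of the 12 TB data set the query actually touches.
total_bytes = 12 * 10**12      # 12 TB data set
relevant_bytes = 100 * 10**9   # ~100 GB relevant to the query

fraction = relevant_bytes / total_bytes
print(f"{fraction:.2%} of the data set")  # → 0.83% of the data set
```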
Instead, if we implement a data filtering CSS down at the drives, each drive filters out the relevant data before it ever leaves the drive. Only the 1% of data relevant to the query has to move across the PCIe bus to the CPU. That cuts total data movement by 99% in this case, reduces the processing the CPU must do to finish the query, and yields a faster query completion time. It also lets more queries run in parallel and scale more rapidly. Baker backed the theory with a practical implementation of this example, measuring data movement, CPU utilization, and query completion time for ordinary storage versus the data filtering CSS.
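The idea behind the data filtering CSS is predicate pushdown: run the filter next to the data so only matching records cross the bus. The sketch below models that with a toy in-memory "drive"; all names here are illustrative assumptions, not a real CSS API.

```python
# Illustrative sketch of filter pushdown, the idea behind a Data
# Filtering CSS. "Records moved" stands in for bus traffic.
from dataclasses import dataclass

@dataclass
class Txn:
    year: int
    month: int
    amount: float

# A toy "drive" holding monthly transactions for 2015-2018.
drive = [Txn(2015 + y, m, 10.0) for y in range(4) for m in range(1, 13)]

def host_side_query(storage, pred):
    # Ordinary storage: every record crosses the bus; the host filters.
    moved = list(storage)                      # all records transferred
    return [t for t in moved if pred(t)], len(moved)

def in_drive_query(storage, pred):
    # Data Filtering CSS: the drive applies the predicate first.
    matched = [t for t in storage if pred(t)]
    return matched, len(matched)               # only matches transferred

# "Just 4 months in 2016" from the example above.
pred = lambda t: t.year == 2016 and t.month <= 4

rows_host, moved_host = host_side_query(drive, pred)
rows_css, moved_css = in_drive_query(drive, pred)
assert rows_host == rows_css   # same query answer either way
print(f"host filtering moved {moved_host} records, CSS moved {moved_css}")
# → host filtering moved 48 records, CSS moved 4
```

Either path returns the same result set; the only difference is how many records travel to the host, which is exactly the saving Baker quantified.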
The results were as below:
| | Ordinary Storage | Data Filtering CSS |
|---|---|---|
| Data Movement | High bandwidth for a very long period to move massive data | High bandwidth, but only for a very short period |
| CPU Utilization | Experienced a bottleneck | CPU scaled nicely due to less data movement |
| Query Completion Time | Slower query completion | Rapid query completion (2-4x faster than usual) |
This example clearly shows the potential benefit of Computational Storage with a data filtering service over ordinary storage.