BrandPost: Putting Containers to the Test in a Deep Learning Solution

Containerization simplifies the task of managing and distributing software. It bundles up applications and all their software dependencies in portable packages that can be moved easily from system to system.

This approach to software distribution simplifies IT operations. Software development teams can put all the pieces and parts together into a package that is ready to run on compatible hardware. IT administrators can focus on the infrastructure that will run the containerized application, without considering software issues.

Those are the upsides of containerization. What remains is the question of whether containerized applications pay a performance penalty. In the IT world, there is a perception that abstraction can degrade performance.

We put this perception to the test in the Dell EMC HPC and AI Innovation Lab. And, to cut to the chase, our tests showed that software can be containerized with no significant performance penalties.

The tests

For our tests, we used the Dell EMC Ready Solution for AI – Deep Learning with Intel. This CPU-based scale-out solution provides a flexible platform for training a wide variety of neural network models with different capabilities and performance characteristics. The platform utilizes Nauta, an open source deep learning training platform built on cloud native technologies, such as Docker and Kubernetes. Nauta provides a simplified software environment that can be easily customized to suit whatever requirements the data scientist has.

We measured and analyzed the performance of this solution using three different deep learning training use cases:

  • Image classification using convolutional neural networks
  • Language translation using multi-head attention networks
  • Product recommendation using restricted Boltzmann machines

We chose these use cases for their computational and workload diversity, to highlight the flexibility of the solution for applications across different customer segments and problem types. In all three use cases, our tests demonstrated near-linear scaling in performance up to the full size of the solution. We encountered no significant performance penalties.
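One common way to quantify "near-linear scaling" is scaling efficiency: the measured throughput at N nodes divided by N times the per-node throughput of the smallest configuration. The short Python sketch below illustrates that arithmetic only. The function name and the throughput figures are hypothetical placeholders, not results from our tests.

```python
def scaling_efficiency(throughput_by_nodes):
    """Return {node_count: efficiency}, where 1.0 means perfectly linear scaling.

    Efficiency is measured relative to the smallest node count in the input.
    """
    base_nodes = min(throughput_by_nodes)
    # Per-node throughput of the smallest configuration is the linear baseline.
    base_rate = throughput_by_nodes[base_nodes] / base_nodes
    return {
        nodes: throughput / (nodes * base_rate)
        for nodes, throughput in sorted(throughput_by_nodes.items())
    }


# Hypothetical throughput (for example, images per second) by node count.
# These values are made up for illustration and are NOT measured results.
example = {1: 100.0, 2: 196.0, 4: 384.0, 8: 752.0, 16: 1472.0}

for nodes, eff in scaling_efficiency(example).items():
    print(f"{nodes:>2} nodes: {eff:.0%} of linear")
```

An efficiency that stays close to 1.0 as nodes are added is what "near-linear" means in practice. The same calculation works whether the throughput comes from a containerized run or a bare metal run, so the two can be compared side by side.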

Additional tests we performed on analogous hardware in the Dell EMC Zenith cluster in our lab showed that the solution can scale all tested use cases beyond 16 compute nodes. These tests confirmed that IT organizations can scale the solution as their compute requirements grow, without taking a performance hit.

Key takeaways

In our lab tests, we demonstrated that the use of containers gives the data scientist who is training models greater flexibility and does not adversely affect the performance of the solutions we examined. In fact, we found that, in some cases, organizations can expect better performance from containerized workloads on the solution than from the same hardware deployed in a bare metal configuration.

So, where do we go from here? Our successful tests suggest that we can explore the use of containers for other performance-critical use cases, such as high performance computing and financial transactions. And we can think more broadly about containers to answer questions like these:

  • Can we run parallel applications inside containers?
  • Can we use containers to achieve simplified use and management across the computing spectrum?

These questions are worth pursuing as we go forward into a world where containers are sure to be used more broadly.

To learn more

Lucas Wilson, Ph.D., is an artificial intelligence researcher and lead data scientist in the HPC and AI Innovation Lab at Dell EMC.

