Understanding GPU Evolution and Deployment in Datacenter or Cloud

Chips, Servers and Artificial Intelligence

With the current surge of interest in Artificial Intelligence, we find ourselves working with GPUs a lot more, which is great fun! GPU workloads can be run on physical servers, VMs, or containers. Each approach has its own advantages and disadvantages.

Physical servers offer the highest performance and flexibility, but they are also the most expensive and complex to manage. VMs provide a good balance between performance, flexibility, resource utilization, and cost. Containers offer the best resource utilization and cost savings, but they may sacrifice some performance and isolation.

For cloud deployments we often recommend containers, since there are services that let you spin up, run your jobs, and spin down again. A VM or physical server running 24/7 on cloud services can be expensive. The performance trade-off can often be offset by distributing the load, or the development work, across several containers.
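To make the cost argument concrete, here is a rough sketch comparing an always-on GPU instance with one that only runs while jobs are active. The hourly rate and the four hours of daily job time are made-up figures chosen for illustration, not prices from any provider.

```python
def monthly_cost(hourly_rate: float, hours_per_day: float, days: int = 30) -> float:
    """Cost of a GPU instance billed by the hour over one month."""
    return hourly_rate * hours_per_day * days


# Hypothetical figures: substitute your provider's pricing and your own job profile.
HOURLY_RATE = 3.00        # assumed price per GPU hour
JOB_HOURS_PER_DAY = 4     # assumed time the jobs actually need each day

always_on = monthly_cost(HOURLY_RATE, 24)
on_demand = monthly_cost(HOURLY_RATE, JOB_HOURS_PER_DAY)
saving = always_on - on_demand

print(f"Always on: {always_on:.2f}")
print(f"On demand: {on_demand:.2f}")
print(f"Saving:    {saving:.2f} ({saving / always_on:.0%})")
```

The saving scales directly with idle time, so the fewer hours per day your jobs actually need a GPU, the stronger the case for a spin-up and spin-down model.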

The best deployment option for your GPU workloads will depend on your specific needs and requirements. If you need the highest possible performance and flexibility, physical servers are the way to go. If you are looking to improve resource utilization and cost savings, VMs or containers may be a better option.

Deployment	Description	Benefits	Disadvantages
Physical servers	Each GPU is directly dedicated to a single workload.	Highest performance and flexibility.	Highest cost and complexity.
VMs	GPUs can be shared among multiple workloads.	Improved resource utilization and cost savings.	Reduced performance and flexibility.
Containers	GPUs can be shared among multiple workloads running in the same container host.	Improved resource utilization and cost savings.	Reduced performance and isolation.

Additional considerations:

GPU virtualization: GPU virtualization technologies such as NVIDIA vGPU and AMD MxGPU (which is built on SR-IOV) allow you to share a single physical GPU among multiple VMs or containers. This can improve resource utilization and reduce costs, but it may also reduce the performance available to each workload.
Workload type: Some GPU workloads are more sensitive to performance than others. For example, machine learning and artificial intelligence workloads often require the highest possible performance. Other workloads, such as video encoding and decoding, may be more tolerant of some performance degradation.
Licensing: Some GPU software applications are only licensed for use on physical servers. If you need to use this type of software, you will need to deploy your GPU workloads on physical servers.
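As a rough illustration of the sharing trade-off described under GPU virtualization, the sketch below assumes the GPU's memory is split into equal, fixed-size slices, one per VM or container. The 48 GB card and the slice sizes are hypothetical values, and real products may partition resources differently.

```python
def max_tenants(gpu_memory_gb: int, slice_gb: int) -> int:
    """Number of equal fixed-size memory slices that fit on one GPU."""
    if slice_gb <= 0:
        raise ValueError("slice_gb must be positive")
    return gpu_memory_gb // slice_gb


GPU_MEMORY_GB = 48  # hypothetical card size

# Smaller slices mean more tenants per GPU, but less memory for each one.
for slice_gb in (4, 8, 16):
    tenants = max_tenants(GPU_MEMORY_GB, slice_gb)
    print(f"{slice_gb} GB slices: {tenants} workloads share the GPU")
```

The numbers show the tension directly: more tenants per GPU improves utilization, while each tenant gets a smaller share of the card.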

Overall, the best way to choose the right deployment option for your GPU workloads is to carefully consider your specific needs and requirements.

Here is a table summarizing the key differences and benefits of each deployment option:

Characteristic	Physical server	VM	Container
Performance	Highest	High	Medium
Flexibility	Highest	High	Medium
Resource utilization	Medium	High	Highest
Cost	Highest	Medium	Lowest
Complexity	Highest	Medium	Lowest
Isolation	Highest	High	Medium
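The table above can also be turned into a small decision aid. The sketch below encodes the ratings as data and ranks the three options by the characteristics you care about most. The numeric scores and the equal weighting are our own simplifying assumptions, so treat the result as a starting point rather than a verdict.

```python
# Ratings copied from the summary table; numeric values are an assumed scale.
RATING = {"Lowest": 0, "Medium": 1, "High": 2, "Highest": 3}
LOWER_IS_BETTER = {"Cost", "Complexity"}

TABLE = {
    "Performance":          {"Physical server": "Highest", "VM": "High",   "Container": "Medium"},
    "Flexibility":          {"Physical server": "Highest", "VM": "High",   "Container": "Medium"},
    "Resource utilization": {"Physical server": "Medium",  "VM": "High",   "Container": "Highest"},
    "Cost":                 {"Physical server": "Highest", "VM": "Medium", "Container": "Lowest"},
    "Complexity":           {"Physical server": "Highest", "VM": "Medium", "Container": "Lowest"},
    "Isolation":            {"Physical server": "Highest", "VM": "High",   "Container": "Medium"},
}


def rank_options(priorities: list[str]) -> list[str]:
    """Rank deployment options, best first, for the given characteristics."""
    scores: dict[str, int] = {}
    for characteristic in priorities:
        for option, label in TABLE[characteristic].items():
            value = RATING[label]
            if characteristic in LOWER_IS_BETTER:
                # Invert so that low cost or low complexity scores well.
                value = RATING["Highest"] - value
            scores[option] = scores.get(option, 0) + value
    return sorted(scores, key=lambda option: scores[option], reverse=True)


print(rank_options(["Performance", "Isolation"]))
print(rank_options(["Cost", "Resource utilization"]))
```

Prioritizing performance and isolation puts physical servers first, while prioritizing cost and resource utilization puts containers first, which matches the recommendations earlier in the article.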