GPU Programming
- Authors
- Yan Lu
GPUs and other AI accelerators have enabled the rapid, efficient training and deployment of large models. To deliver a fast and efficient ML model, it is essential to understand how it runs on a GPU and how to use the GPU for optimization. In this post, we explore the fundamentals of GPU programming and investigate how they can be applied to speed up machine learning tasks.
Prerequisites
- C++ Review
- Parallel Computing
- Integration with PyTorch