I am trying to understand the lifecycle of the CUDA graph memory pool, e.g., as interacted with in `_cuda_releasePool`. In particular, the implementation of `_use_cuda_memory_pool_manager` seems to suggest that the pool is released at the end of the context manager. But at the call sites of this function, it doesn't seem that the pool should be released (especially when we are allocating inputs for the CUDA graph region). What is going on?
You're right to notice this apparent discrepancy. Let's break down what's happening with the CUDA graph memory pool lifecycle:
- The `_use_cuda_memory_pool_manager` context manager does indeed call `_cuda_releasePool` at the end of its context. However, this doesn't necessarily mean the pool is being destroyed or fully released.
- The `_cuda_releasePool` function is part of the CUDA caching allocator's API, and its behavior is more nuanced than simply destroying the pool:
  - It decrements the reference count for the pool.
  - If the reference count drops to zero, the pool's blocks become eligible for reclamation; as long as something else (such as a live CUDA graph captured into that pool) still holds a reference, the memory stays alive and can be reused for subsequent allocations in the graph region.
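To make the retain/release pattern concrete, here is a minimal, self-contained Python sketch. This is not the actual PyTorch implementation; `RefCountedPool` and `use_pool` are made-up illustrative names standing in for the allocator's pool and for `_use_cuda_memory_pool_manager`. The point it demonstrates is that "release" at context-manager exit only drops one reference, and the pool survives while another holder (the captured graph) remains:

```python
from contextlib import contextmanager


class RefCountedPool:
    """Toy stand-in for the allocator's memory pool (illustrative only)."""

    def __init__(self):
        self.refcount = 0
        self.freed = False

    def retain(self):
        self.refcount += 1

    def release(self):
        # Analogous in spirit to _cuda_releasePool: decrement the count,
        # and only reclaim the pool's memory when it reaches zero.
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # blocks become eligible for reclamation


@contextmanager
def use_pool(pool):
    # Analogous in spirit to _use_cuda_memory_pool_manager: retain on
    # entry, release on exit -- exit does NOT by itself destroy the pool.
    pool.retain()
    try:
        yield pool
    finally:
        pool.release()


pool = RefCountedPool()
pool.retain()            # e.g., the captured CUDA graph holds a reference

with use_pool(pool):     # e.g., allocating inputs for the graph region
    pass                 # allocations would happen here

assert not pool.freed    # the graph's reference keeps the pool alive

pool.release()           # the graph itself is torn down
assert pool.freed        # only now is the pool's memory reclaimed
```

So the release at the end of the context manager is balanced against the retain at its start; whether the pool's memory is actually reclaimed depends on the remaining reference holders, not on the context manager alone.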