
RuntimeError: CUDA error: out of memory while trying mario_rl_tutorial.py #1620

Closed
@marcmolla

Description


When I ran mario_rl_tutorial.py using CUDA, I received an out-of-memory error:

RuntimeError: CUDA error: out of memory

Monitoring the GPU, I saw its memory usage grow steadily, especially once the Agent starts training. As a workaround, I changed the Agent's cache method, since it appears that all experiences are sent to the GPU:

        if self.use_cuda:
            state = torch.tensor(state).cuda()
            next_state = torch.tensor(next_state).cuda()
            action = torch.tensor([action]).cuda()
            reward = torch.tensor([reward]).cuda()
            done = torch.tensor([done]).cuda()
 (...)

        self.memory.append((state, next_state, action, reward, done,))

Instead, I modified this piece to keep the tensors on the CPU, and changed the recall method to move the sampled batch of experiences to the GPU at that point:

    def recall(self):
        """
        Retrieve a batch of experiences from memory
        """
        batch = random.sample(self.memory, self.batch_size)
        state, next_state, action, reward, done = map(torch.stack, zip(*batch))
        return (
            state.cuda(),
            next_state.cuda(),
            action.squeeze().cuda(),
            reward.squeeze().cuda(),
            done.squeeze().cuda(),
        )

Not sure if this is the right approach, but it works. However, it might impact learning performance.
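Putting the two pieces together, a self-contained sketch of the workaround looks roughly like this (the `ReplayBuffer` class name, `capacity`, and `device` arguments are illustrative, not from the tutorial):

```python
import random
from collections import deque

import torch


class ReplayBuffer:
    """Sketch of the workaround: store experiences on the CPU and move
    only the sampled batch to the GPU inside recall()."""

    def __init__(self, capacity=100_000, batch_size=32, device="cuda"):
        self.memory = deque(maxlen=capacity)
        self.batch_size = batch_size
        self.device = device

    def cache(self, state, next_state, action, reward, done):
        # No .cuda() here: replay memory stays in host RAM, so GPU
        # memory no longer grows with the number of stored experiences.
        self.memory.append((
            torch.tensor(state),
            torch.tensor(next_state),
            torch.tensor([action]),
            torch.tensor([reward]),
            torch.tensor([done]),
        ))

    def recall(self):
        # Transfer only batch_size experiences per training step.
        batch = random.sample(self.memory, self.batch_size)
        state, next_state, action, reward, done = map(torch.stack, zip(*batch))
        return (
            state.to(self.device),
            next_state.to(self.device),
            action.squeeze().to(self.device),
            reward.squeeze().to(self.device),
            done.squeeze().to(self.device),
        )
```

The per-step host-to-device copy is the likely source of any slowdown; if it matters, pinned memory (`pin_memory()`) plus `non_blocking=True` transfers is a common mitigation.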

Could anyone help here?

cc @vmoens @nairbv

Labels: CUDA (Issues relating to CUDA), docathon-h1-2023 (A label for the docathon in H1 2023), easy