About your DDP code #22

@chaolongy

Description

Thank you very much for your excellent work. While reading your training and test code, I ran into the following questions:
In 'train.py', the validation sampler is val_sampler = torch.utils.data.distributed.DistributedSampler(val_data), and the metrics are aggregated with dist.all_reduce(intersection), dist.all_reduce(union), and dist.all_reduce(target). In 'test.py', however, val_sampler = None, and dist.all_reduce(output_3d) is used.
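To show my current understanding of the train.py path, here is a plain-Python emulation (no real torch.distributed; the sharding and the all_reduce sum are simulated, and the helper names are my own simplifications, not the repo's code): DistributedSampler gives each rank a disjoint shard, so each rank's local intersection/union counts cover only part of the validation set, and dist.all_reduce(SUM) turns them into global counts.

```python
# Emulated sketch of the train.py validation path (my assumption, not the
# repo's actual code): DistributedSampler shards the val set across ranks,
# each rank counts local intersection/union, and dist.all_reduce(SUM)
# turns the local counts into global ones. No real torch.distributed here.

def iou_counts(preds, labels, num_classes):
    """Per-class (intersection, union) counts, in the spirit of
    intersectionAndUnionGPU (hypothetical simplification)."""
    inter = [0] * num_classes
    area_p = [0] * num_classes
    area_l = [0] * num_classes
    for p, l in zip(preds, labels):
        area_p[p] += 1
        area_l[l] += 1
        if p == l:
            inter[p] += 1
    union = [area_p[c] + area_l[c] - inter[c] for c in range(num_classes)]
    return inter, union

def miou(inter, union):
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)

# Toy labels/predictions, 2 classes, interleaved across 2 "ranks"
# the way DistributedSampler would shard them.
labels = [0, 0, 1, 1, 0, 1, 1, 0]
preds  = [0, 1, 1, 1, 0, 0, 1, 0]
world_size = 2
shards = [list(range(r, len(labels), world_size)) for r in range(world_size)]

local = [iou_counts([preds[i] for i in s], [labels[i] for i in s], 2)
         for s in shards]

# Emulated dist.all_reduce(op=SUM): every rank ends with the global counts.
reduced_inter = [sum(l[0][c] for l in local) for c in range(2)]
reduced_union = [sum(l[1][c] for l in local) for c in range(2)]

full_inter, full_union = iou_counts(preds, labels, 2)
assert reduced_inter == full_inter and reduced_union == full_union
print("rank-local mIoU:", [round(miou(*l), 3) for l in local])
print("global mIoU after reduce:", round(miou(reduced_inter, reduced_union), 3))
```

In this toy run the two rank-local mIoUs differ from each other and from the global value, which is exactly why I am confused about what happens when the reduce is dropped or the sampler is set to None.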

My question:

  1. Why are the samplers inconsistent between the two files?
  2. I found that performance did not change when val_sampler=None. What is the significance of dist.all_reduce() here?
  3. I found that with val_sampler=torch.utils.data.distributed.DistributedSampler(val_data) and dist.all_reduce() removed, mIoU increased during testing. Why is this?
  4. Finally, 'train.py' uses the intersectionAndUnionGPU function from 'util.py', while 'test.py' uses the evaluate function from 'iou.py'. What is the essential difference between these two evaluation metrics in practice?
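To make question 4 concrete: I am not sure which convention each file actually follows, but one common source of divergence between two IoU implementations is dataset-level accumulation versus per-scene averaging. A toy sketch (the numbers and function names are mine, not from 'util.py' or 'iou.py'):

```python
# Toy illustration (my own, not from the repo): the same per-scene
# intersection/union counts give different numbers depending on whether
# IoU is computed from globally accumulated counts or averaged per scene.

def iou(inter, union):
    return inter / union if union else 0.0

# One class, two "scenes" of very different size.
scenes = [
    {"inter": 10, "union": 10},   # small scene, predicted perfectly -> IoU 1.0
    {"inter": 10, "union": 100},  # large scene, predicted poorly    -> IoU 0.1
]

# Convention A: accumulate counts over the whole set, divide once.
global_iou = iou(sum(s["inter"] for s in scenes),
                 sum(s["union"] for s in scenes))

# Convention B: compute IoU per scene, then average.
mean_scene_iou = sum(iou(s["inter"], s["union"]) for s in scenes) / len(scenes)

print(round(global_iou, 3), mean_scene_iou)  # the two conventions disagree
```

If intersectionAndUnionGPU-style counts are summed over the whole set (as the all_reduce on intersection and union in 'train.py' seems to suggest) while evaluate works per scene, that alone could explain part of the gap I am seeing.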

I look forward to hearing from you and thank you again for your excellent work.
