About your DDP code #22

@chaolongy

Description

Thank you very much for your excellent work. While reading your training and test code, I ran into the following questions:
In 'train.py', the validation sampler is val_sampler = torch.utils.data.distributed.DistributedSampler(val_data), and the metrics are aggregated with dist.all_reduce(intersection), dist.all_reduce(union), and dist.all_reduce(target). In 'test.py', however, val_sampler = None, and dist.all_reduce(output_3d) is used.
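To show my current understanding of the train.py path, here is a plain-Python emulation (no real torch.distributed; the sharding and the all_reduce sum are simulated, and the helper names are my own simplifications, not the repo's code): DistributedSampler gives each rank a disjoint shard, so each rank's local intersection/union counts cover only part of the validation set, and dist.all_reduce(SUM) turns them into global counts.

```python
# Emulated sketch of the train.py validation path (my assumption, not the
# repo's actual code): DistributedSampler shards the val set across ranks,
# each rank counts local intersection/union, and dist.all_reduce(SUM)
# turns the local counts into global ones. No real torch.distributed here.

def iou_counts(preds, labels, num_classes):
    """Per-class (intersection, union) counts, in the spirit of
    intersectionAndUnionGPU (hypothetical simplification)."""
    inter = [0] * num_classes
    area_p = [0] * num_classes
    area_l = [0] * num_classes
    for p, l in zip(preds, labels):
        area_p[p] += 1
        area_l[l] += 1
        if p == l:
            inter[p] += 1
    union = [area_p[c] + area_l[c] - inter[c] for c in range(num_classes)]
    return inter, union

def miou(inter, union):
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)

# Toy labels/predictions, 2 classes, interleaved across 2 "ranks"
# the way DistributedSampler would shard them.
labels = [0, 0, 1, 1, 0, 1, 1, 0]
preds  = [0, 1, 1, 1, 0, 0, 1, 0]
world_size = 2
shards = [list(range(r, len(labels), world_size)) for r in range(world_size)]

local = [iou_counts([preds[i] for i in s], [labels[i] for i in s], 2)
         for s in shards]

# Emulated dist.all_reduce(op=SUM): every rank ends with the global counts.
reduced_inter = [sum(l[0][c] for l in local) for c in range(2)]
reduced_union = [sum(l[1][c] for l in local) for c in range(2)]

full_inter, full_union = iou_counts(preds, labels, 2)
assert reduced_inter == full_inter and reduced_union == full_union
print("rank-local mIoU:", [round(miou(*l), 3) for l in local])
print("global mIoU after reduce:", round(miou(reduced_inter, reduced_union), 3))
```

In this toy run the two rank-local mIoUs differ from each other and from the global value, which is exactly why I am confused about what happens when the reduce is dropped or the sampler is set to None.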

My question:

  1. Why are the samplers inconsistent between the two files?
  2. I found that performance did not change when val_sampler=None. What is the significance of dist.all_reduce() here?
  3. I found that with val_sampler=torch.utils.data.distributed.DistributedSampler(val_data) and dist.all_reduce() removed, mIoU increased during testing. Why is this?
  4. Finally, 'train.py' uses the intersectionAndUnionGPU function from 'util.py', while 'test.py' uses the evaluate function from 'iou.py'. What is the essential difference between these two evaluation metrics in practice?
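To make question 4 concrete: I am not sure which convention each file actually follows, but one common source of divergence between two IoU implementations is dataset-level accumulation versus per-scene averaging. A toy sketch (the numbers and function names are mine, not from 'util.py' or 'iou.py'):

```python
# Toy illustration (my own, not from the repo): the same per-scene
# intersection/union counts give different numbers depending on whether
# IoU is computed from globally accumulated counts or averaged per scene.

def iou(inter, union):
    return inter / union if union else 0.0

# One class, two "scenes" of very different size.
scenes = [
    {"inter": 10, "union": 10},   # small scene, predicted perfectly -> IoU 1.0
    {"inter": 10, "union": 100},  # large scene, predicted poorly    -> IoU 0.1
]

# Convention A: accumulate counts over the whole set, divide once.
global_iou = iou(sum(s["inter"] for s in scenes),
                 sum(s["union"] for s in scenes))

# Convention B: compute IoU per scene, then average.
mean_scene_iou = sum(iou(s["inter"], s["union"]) for s in scenes) / len(scenes)

print(round(global_iou, 3), mean_scene_iou)  # the two conventions disagree
```

If intersectionAndUnionGPU-style counts are summed over the whole set (as the all_reduce on intersection and union in 'train.py' seems to suggest) while evaluate works per scene, that alone could explain part of the gap I am seeing.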

I look forward to hearing from you and thank you again for your excellent work.
