Performance in Windows 10 #6842

Open
mrahmadt opened this issue Jun 27, 2024 · 2 comments
Labels:
bug report: Bug is reported by user, not yet confirmed by the core team
needs discussion: Core developers need to discuss the issue


@mrahmadt

Hello Everyone

I'm not sure whether I'm doing something wrong or this is the default behavior. I have a server with the following specs:

32 x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz (2 Sockets)
Memory 125GB
SSD Disks

I installed Proxmox (proxmox.com) and created a Windows 10 virtual machine (32 processors and 64 GB memory).

Everything works fine in Orange, but "Test & Score" takes hours to process a 600M CSV file with "Random Forest", and, strangely, it is not utilizing the full CPU/memory of the machine!

Is there anything I can do to make Orange use the full CPU/memory resources?

Screenshot 2024-06-27 at 05 15 38

Orange 3.37.0 (Orange3-3.37.0-Miniconda-x86_64.exe)

@mrahmadt mrahmadt added the bug report Bug is reported by user, not yet confirmed by the core team label Jun 27, 2024
@thocevar
Contributor

thocevar commented Jul 5, 2024

My experiments on Windows show that cross-validation and random sampling in Test & Score do not run their threads in parallel, so they do not utilize all CPUs.

Random forest and other trivially parallelizable methods (e.g. XGBoost) could be parallelized with the n_jobs parameter even for a single training/prediction call (such as testing on test data) but are not.
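To illustrate what the n_jobs parameter does for a single training/prediction call, here is a minimal sketch in plain scikit-learn, outside of Orange. The synthetic data set and its size are assumptions made purely for illustration; this is not how Test & Score currently calls the model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; sample and feature counts are arbitrary assumptions.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# n_jobs=-1 asks scikit-learn to use all available cores,
# both when fitting the trees and when predicting.
forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
forest.fit(X_train, y_train)
predictions = forest.predict(X_test)

print(f"test accuracy: {forest.score(X_test, y_test):.3f}")
```

With the default n_jobs=None the same code runs the trees one after another, which would match the low CPU utilization described in the report.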

Combining both could be problematic, because it may spawn more threads than there are processors. The only exception that I found is Logistic Regression, which always utilizes all CPUs, though probably at some lower level.
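The oversubscription concern can be made concrete with a little arithmetic. The helper below, inner_jobs, is hypothetical (it does not exist in Orange or scikit-learn) and only sketches one possible way to split a CPU budget between folds running in parallel and the threads each model uses. The figure of 10 concurrent folds is an assumption; the 32 CPUs come from the report above.

```python
import os


def inner_jobs(n_outer_workers, total_cpus=None):
    """Return a per-model n_jobs so that outer workers x inner threads <= total CPUs.

    Hypothetical helper for illustration only.
    """
    if total_cpus is None:
        total_cpus = os.cpu_count() or 1
    if n_outer_workers < 1:
        raise ValueError("n_outer_workers must be at least 1")
    # Integer division keeps the product within the budget; never go below 1.
    return max(1, total_cpus // n_outer_workers)


# 32 CPUs and 10 folds evaluated concurrently: 3 threads per model, 30 in total.
print(inner_jobs(10, total_cpus=32))

# Naive combination: each of the 10 folds uses n_jobs=-1 on 32 CPUs.
print(10 * 32)  # number of threads requested at once
```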

This needs further discussion.

@thocevar thocevar added the needs discussion Core developers need to discuss the issue label Jul 5, 2024
@thocevar
Contributor

Parallelization in Test & Score was intentionally removed in 1f8d008.

The easiest way of re-introducing parallelization would be on the level of individual models (e.g. random forest), where scikit-learn takes care of it (n_jobs=-1).
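A rough sketch of that idea in plain scikit-learn, assuming synthetic data and 5 folds: the folds are evaluated one after another (n_jobs=1 at the cross-validation level), while each random forest is trained on all cores (n_jobs=-1 at the model level). This only illustrates the approach and is not Orange's Test & Score code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; sizes are arbitrary assumptions.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Parallelism lives inside the model: all cores are used per forest.
forest = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)

# Folds run sequentially, so only one forest is being trained at any time
# and the thread count stays bounded by the number of cores.
scores = cross_val_score(forest, X, y, cv=5, n_jobs=1)

print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```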

@janezd janezd changed the title Performane in Windows 10 Performance in Windows 10 Jul 13, 2024