[Feature Request] Calculating FLOPs for computational graph operations #5013


Closed
MG2033 opened this issue Feb 2, 2018 · 28 comments
Labels
feature A request for a proper, new feature. high priority module: performance Issues related to performance, either of kernel code or framework glue quansight-nack High-prio issues that have been reviewed by Quansight and are judged to be not actionable. triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@MG2033

MG2033 commented Feb 2, 2018

Feature Request: Please consider adding a floating point operations calculator for computational graph operations.

We'd like to use it for the deep learning models.

cc @ezyang @gchanan @zou3519 @bdhirsh @heitorschueroff @VitalyFedyunin @ngimel

@MG2033 MG2033 changed the title Calculating FLOPs of computational graph operations Calculating FLOPs for computational graph operations Feb 2, 2018
@MG2033 MG2033 changed the title Calculating FLOPs for computational graph operations [Feature Request] Calculating FLOPs for computational graph operations Feb 2, 2018
@soumith
Member

soumith commented Feb 2, 2018

@cpuhrsch this is something you were interested in having as well, right? Also @shubho

@cpuhrsch
Contributor

cpuhrsch commented Feb 2, 2018

@soumith, this will be important, but it's low priority for now as we're still experimenting outside of PyTorch. We're setting up something similar to this, but not as part of any framework for now. Maybe we can port that code to PyTorch later on, but I can't promise anything.

@shubho

shubho commented Feb 2, 2018

Would be nice to track which networks do well on which processors - and also roll up datacenter level metrics. @cpuhrsch are you building a tool outside of PyTorch that will take in a model specification and emit these numbers for every op?

@cpuhrsch
Contributor

cpuhrsch commented Feb 2, 2018

@shubho for now we're doing this one op at a time. For example, we're building it out far enough to optimize a function such as torch.sum, but within C++ only. Presumably, as we work our way through this, a tool that will create these (and many more) metrics per op for a model could emerge. For now that'd be over-engineering our tooling.

@ahirner
Contributor

ahirner commented Feb 4, 2018

I've come across this script to count ops and params for basic layers. It's reasonably advanced (most 2D convs and pooling). Maybe it complements your porting efforts, @cpuhrsch.

@warmspringwinds
Contributor

If someone still needs this, we wrote a small script to do that:

https://github.com/warmspringwinds/pytorch-segmentation-detection/blob/master/pytorch_segmentation_detection/utils/flops_benchmark.py

Example of usage:

https://github.com/warmspringwinds/pytorch-segmentation-detection/blob/d5df5e066fe9c6078d38b26527d93436bf869b1c/pytorch_segmentation_detection/recipes/pascal_voc/segmentation/flops_counter.ipynb

@sovrasov

Also, I've improved the above-mentioned scripts to support grouped convolutions and several other layers:
https://github.com/sovrasov/flops-counter.pytorch

@laoreja

laoreja commented Feb 2, 2019

We can easily compute FLOPs of conv layers by hand or with existing repos.
But is there a way to compute FLOPs for non-traditional operations like topk, sparse sum, indexing, etc.?
Thanks!
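To illustrate the "by hand" part for conv layers, here is one common convention (a multiply-add counted as 2 FLOPs, bias and padding effects ignored); the helper name is ours, for illustration only:

```python
# FLOPs of a 2D convolution by hand: each of the Cout*Hout*Wout output
# elements needs Cin*K*K multiply-adds, i.e. 2*Cin*K*K FLOPs if a
# multiply and an add are counted separately.
def conv2d_flops(cin, cout, k, hout, wout):
    return 2 * cin * k * k * cout * hout * wout

# e.g. the first layer of ResNet-18: 3->64 channels, 7x7 kernel,
# 112x112 output map
print(conv2d_flops(3, 64, 7, 112, 112))  # 236027904
```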

@cpuhrsch
Contributor

cpuhrsch commented Feb 4, 2019

@zheng-xq - Is this something that could be of interest to you?

@ttumiel

ttumiel commented Mar 10, 2019

Came across this library, which also counts the number of parameters and FLOPs used by a model given an input.

@matyasfodor

I came across this repo: https://github.com/ceykmc/pytorch_model_summary AFAIK it's not distributed on PyPI, but it gives you a pretty detailed summary.

@VitalyFedyunin VitalyFedyunin added triage review feature A request for a proper, new feature. triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module and removed triage review labels Apr 17, 2019
@matyasfodor

I just found this: the Autograd profiler. I wonder if this can be used for calculating the FLOPs and memory usage. I tried the Chrome trace export, but I'm not yet sure how to get something useful out of it.

@bhack
Contributor

bhack commented Jun 7, 2019

Just to collect another solution: https://github.com/Tramac/torchscope.

But it would be better if there were an off-the-shelf solution directly in PyTorch.

@ezyang
Contributor

ezyang commented Jun 10, 2019

@pietern says, this has essentially already been implemented on the Caffe2 side; there's a lot of overlap here.

@cpuhrsch
Contributor

cc @ilia-cher

@ruiyuanlu

So... any plan to provide such a calculator in the next version of PyTorch? It would be nice to have an official tool to calculate FLOPs. Third-party implementations might not properly support extensions of PyTorch ops.

@ruiyuanlu

Well, you know, something like

tf.profiler.profile(g, run_meta=run_meta, cmd='op', options=opts)

would be very helpful.

@harryherold

@ruiyuanlu Is it also working with Keras?

@matkalinowski

Are there any plans on implementing that idea?

@ezyang ezyang added module: performance Issues related to performance, either of kernel code or framework glue and removed triage review labels Aug 24, 2020
@ezyang
Contributor

ezyang commented Aug 24, 2020

Bumping priority so we don't forget about this.

@mruberry mruberry added the quansight-nack High-prio issues that have been reviewed by Quansight and are judged to be not actionable. label Nov 6, 2020
@mruberry
Collaborator

mruberry commented Nov 6, 2020

cc @ngimel, who's been doing work in this area

@ilia-cher
Contributor

cc @xuzhao9, who is working on adding FLOPs into the profiler output

@ngimel
Collaborator

ngimel commented Nov 6, 2020

#46506 is the profiler PR.
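For readers landing here later: in released PyTorch versions, the feature from that PR surfaces as the `with_flops` flag of the profiler. A minimal sketch, assuming a CPU-only run (matmul/conv events carry a `flops` field; other ops report 0):

```python
# Sketch: per-op FLOP estimates from the profiler (with_flops=True).
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(64, 64)
inp = torch.randn(8, 64)

with profile(activities=[ProfilerActivity.CPU], with_flops=True) as prof:
    model(inp)

# Sum the FLOP estimates over the events that report them.
total_flops = sum(evt.flops for evt in prof.key_averages() if evt.flops)
print(total_flops)
```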

@ppwwyyxx
Collaborator

In fvcore we created the first FLOP counter that combines the following two features:

  • counts FLOPs at the operator level (thus more accurate than existing module-level counters)
  • provides module-level aggregations (thus more user-friendly)

We gave an introduction to it in this documentation: https://github.com/facebookresearch/fvcore/blob/master/docs/flop_count.md

@Jackiexiao

Jackiexiao commented Sep 3, 2022

A simple comparison of different FLOP-counting libraries; in short, I recommend fvcore:

https://gist.github.com/Jackiexiao/2b053cd52d977d86e07d664688c0a7ee

@nkalpakis21

I see this has been open for a while. Is there trouble getting a dev to fix this issue? What has been blocking this for so long?

@zou3519
Contributor

zou3519 commented Sep 24, 2024

Can we close this @Chillee @albanD? afaict this has been implemented (https://gist.github.com/Chillee/07b36672a0ca2d1280e42b8d10f23174)

@Chillee
Collaborator

Chillee commented Sep 24, 2024

Yes, we have implemented a flop counter in core.

import torch
from torch.utils.flop_counter import FlopCounterMode
from torchvision.models import resnet18

model = resnet18().cuda().half()
inp = torch.randn(128, 3, 224, 224, device='cuda', dtype=torch.half)

with FlopCounterMode():
    model(inp).sum().backward()
[screenshot: per-module FLOP table printed by FlopCounterMode]

@Chillee Chillee closed this as completed Sep 24, 2024