Remove DataPtr extractor from CUDAFuture #48840

lw · 2020-12-04T16:11:29Z

Stack from ghstack:

Remove DataPtr extractor from CUDAFuture #48840

The CUDAFuture class needs to inspect the values it contains in order to extract their tensors (more precisely, the DataPtrs backing them). These are needed for two reasons. First, they determine which CUDA devices back those tensors, so that an event can be recorded on each such device. Later, they are used to record these DataPtrs with the CUDA caching allocator if they are used on other streams.

This became complicated once Python was added to the mix. Inspecting a Python object requires acquiring the GIL, but code that must also work in C++-only mode cannot do so. The solution was to let users provide a custom way to extract DataPtrs, so that the PythonFutureWrapper could install its own Python-aware extractor. This was the DataPtr extractor.
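To make the removed design concrete, here is a minimal sketch of a future that delegates DataPtr extraction to a pluggable callback. Everything in it (FakeDataPtr, FakeValue, FakeCudaFuture, DataPtrExtractor) is a hypothetical stand-in, not the real c10/ATen types. A DataPtr is reduced to the index of its device, and only the device-collection step (the one that decides where events get recorded) is modeled.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not the real c10::DataPtr / c10::IValue.
struct FakeDataPtr {
  int device;  // index of the CUDA device holding the memory
};
struct FakeValue {
  std::vector<FakeDataPtr> tensors;  // storages reachable from the value
};

// The pluggable hook: given a value, return the DataPtrs backing it.
using DataPtrExtractor = std::function<std::vector<FakeDataPtr>(const FakeValue&)>;

class FakeCudaFuture {
 public:
  explicit FakeCudaFuture(DataPtrExtractor extractor = defaultExtractor)
      : extractor_(std::move(extractor)) {}

  // Devices on which one CUDA event each would be recorded.
  std::set<int> devicesOf(const FakeValue& value) const {
    std::set<int> devices;
    for (const FakeDataPtr& ptr : extractor_(value)) {
      devices.insert(ptr.device);
    }
    return devices;
  }

  // C++-only default: inspect the value directly (no GIL involved).
  static std::vector<FakeDataPtr> defaultExtractor(const FakeValue& value) {
    return value.tensors;
  }

 private:
  DataPtrExtractor extractor_;
};
```

In the old design, a Python-aware wrapper would have passed its own extractor, one that takes the GIL, in place of the default. That per-caller hook is what this PR removes.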

In #48502 a different approach was proposed. At its root, it consists of adding support for IValues of type PyObject to the visit() and getSubValues() methods. To deal with the GIL, this goes through a virtual method. PyObjectHolder, the base class, is also available in C++-only mode, so it declares this method but leaves it unimplemented. ConcretePyObjectHolder, the subclass, is only included in Python mode, so it can implement that method, acquire the GIL, and do what it's supposed to.
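Here is a minimal sketch of the virtual-method pattern, assuming a toy value model. Value, PyObjectHolder, ConcretePyObjectHolder and getSubValues below are simplified, hypothetical stand-ins, not the real c10::IValue API. The generic traversal only sees the abstract base class, so it compiles without Python. The subclass, which in the real design exists only in Python builds, is where the GIL would be acquired.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified model; not the real c10::IValue.
struct Value;
using ValuePtr = std::shared_ptr<Value>;

// Base holder: compiled in every build, so it cannot depend on Python.
struct PyObjectHolder {
  virtual ~PyObjectHolder() = default;
  // Declared here, implemented only by the Python-aware subclass.
  virtual ValuePtr toValue() = 0;
};

struct Value {
  enum class Tag { Tensor, List, PyObject };
  Tag tag = Tag::Tensor;
  int device = -1;                         // used when tag == Tensor
  std::vector<ValuePtr> elems;             // used when tag == List
  std::shared_ptr<PyObjectHolder> holder;  // used when tag == PyObject
};

// In the real design this subclass exists only in Python builds and would
// acquire the GIL (e.g. with pybind11::gil_scoped_acquire) inside toValue().
struct ConcretePyObjectHolder final : PyObjectHolder {
  explicit ConcretePyObjectHolder(ValuePtr converted) : converted_(std::move(converted)) {}
  ValuePtr toValue() override { return converted_; }  // stand-in for the real conversion

 private:
  ValuePtr converted_;
};

// Generic traversal: only sees the base class, yet descends into Python objects.
void getSubValues(const ValuePtr& v, std::vector<ValuePtr>& out) {
  out.push_back(v);
  switch (v->tag) {
    case Value::Tag::Tensor:
      break;
    case Value::Tag::List:
      for (const ValuePtr& e : v->elems) getSubValues(e, out);
      break;
    case Value::Tag::PyObject:
      getSubValues(v->holder->toValue(), out);
      break;
  }
}
```

Because the traversal goes through a virtual call, code like CUDAFuture can rely on it without ever including Python headers.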

In my opinion, this approach is just brilliant! Thanks to @wanchaol for proposing it! It hides the complexity of dealing with Python inside getSubValues(), where it can be handled properly, and thus greatly simplifies the CUDAFuture and PythonFutureWrapper classes.

Differential Revision: D25334355

dr-ci · 2020-12-04T18:02:09Z

💊 CI failures summary and remediations

As of commit 53c2be9 (more details on the Dr. CI page):

1/1 failures possibly* introduced in this PR
- 1/1 non-CircleCI failure(s)

This comment was automatically generated by Dr. CI.

torch/csrc/jit/python/python_ivalue.h

mrshenli · 2020-12-08T19:49:11Z

aten/src/ATen/cuda/CUDAFuture.h

+    at::IValue::HashAliasedIValues sub_values;
+    // Prefer getSubValues() over visit() as the latter is a silent no-op for
+    // some unsupported types, whereas the former at least fails loudly.
+    value.getSubValues(sub_values);


Am I correct in assuming this will extract all nested ivalues in the given ivalue object? Out of curiosity, would it make sense to let CUDAFuture::markCompleted (or other APIs) also take a vector of tensors? I'm asking because RPC already has access to the tensor vector and hence does not need to recursively visit all nested values again.

Am I correct in assuming this will extract all nested ivalues in the given ivalue object?

Yes, indeed.

Note that markCompleted is also called inside then() with the result of the user-provided callback, which can return an arbitrary value. (Unless we explicitly lock that down and force users to also always return lists of tensors, is this what you're suggesting?)

Unless we explicitly lock that down and force users to also always return lists of tensors, is this what you're suggesting?

No, I'm thinking about an additional fast path that can take a list of tensors, so that users can pass in either an ivalue alone or an ivalue plus a list of tensors, avoiding another traversal of the ivalue. Not sure if this makes sense.

I see, yes, that could be an option. We could either add such a method to ivalue::Future, or we could add it just to CUDAFuture: this second option has the problem that the new method wouldn't be "accessible" anymore once we downcast the CUDAFuture to an ivalue::Future, but that might not matter, because markCompleted is only called by whoever creates the object, and they have access to the specialized subclass pointer.
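Here is a hedged sketch of the fast path discussed above, assuming it were added to CUDAFuture only as an overload. SketchCudaFuture, Tensor and Value are hypothetical, and the real markCompleted signature is not reproduced. The sketch only shows that a caller who already holds the tensor list, such as RPC, can skip the traversal.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical types; not the real at::Tensor / c10::IValue / CUDAFuture.
struct Tensor {
  int device;
};
struct Value {
  std::vector<Tensor> nested;  // stand-in for tensors buried in an arbitrary IValue
};

class SketchCudaFuture {
 public:
  // Slow path: discover the tensors by traversing the value.
  void markCompleted(const Value& value) {
    ++traversals_;
    recordEvents(value.nested);
  }

  // Fast path: the caller already knows the tensors, so no traversal is needed.
  // (A real future would also store the value as its result.)
  void markCompleted(const Value& /*value*/, const std::vector<Tensor>& tensors) {
    recordEvents(tensors);
  }

  int traversals() const { return traversals_; }
  const std::set<int>& devices() const { return devices_; }

 private:
  // In a real future, one CUDA event per device would be recorded here.
  void recordEvents(const std::vector<Tensor>& tensors) {
    for (const Tensor& t : tensors) devices_.insert(t.device);
  }

  int traversals_ = 0;
  std::set<int> devices_;
};
```

The overload would only be reachable through the specialized pointer. That is acceptable if markCompleted is only called by the code that creates the future.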

wanchaol · 2020-12-08T19:56:22Z

torch/csrc/jit/python/python_ivalue.h

@@ -24,6 +35,11 @@ struct C10_EXPORT ConcretePyObjectHolder final : PyObjectHolder {
    return py_obj_.ptr();
  }

+  IValue toIValue() override {


This function doesn't seem valid for all cases? In theory, PyObjectHolder could hold any arbitrary py::object, even one that can't be inferred as an ivalue, but this function seems to assume the underlying py::object is an "ivalue" pyobj.

If you have suggestions on how to make it valid for all cases I'd love to hear it! That's exactly what I'm trying to achieve here! For now, indeed, this only works for Python objects whose type can be inferred.

If you think it helps, I can rename this method to toTypeInferredIValue, to clarify that it behaves like the global function torch::jit::toTypeInferredIValue, which also raises if it fails. WDYT?

Alternatively I guess I could make this return an optional<IValue> (I would have to reimplement some of the logic of torch::jit::toTypeInferredIValue). Would that work?

Can we add two APIs here for PyObjectHolder?

bool isIValue(): returns true only when we can infer a type by calling tryToInferType and match.success() is true.

toIValue(): does what you currently do.

Then the call sites in getSubValues should be guarded by isIValue(), falling back to the old behavior otherwise. What do you think about this solution?

This makes sense to me, and I did something very similar to what you just suggested: instead of isIValue I added a tryToInferType method, which returns the TypePtr (or the reason this failed). I opted to go that way in order to keep the alignment between the global functions of torch::jit and the methods of ConcretePyObjectHolder (so that it's easy to see that they do the same thing).
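Here is a minimal sketch of a non-throwing tryToInferType that returns either a type or the reason it failed, which lets callers guard the conversion. InferredType here is a simplified stand-in in which a type-name string replaces a TypePtr. FakePyObjectHolder and tryConvert are hypothetical helpers, not PyTorch APIs.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Simplified "type or failure reason" result; a string stands in for a TypePtr.
class InferredType {
 public:
  static InferredType ok(std::string type) { return InferredType(std::move(type), ""); }
  static InferredType fail(std::string reason) { return InferredType("", std::move(reason)); }

  bool success() const { return !type_.empty(); }
  const std::string& type() const { return type_; }
  const std::string& reason() const { return reason_; }

 private:
  InferredType(std::string type, std::string reason)
      : type_(std::move(type)), reason_(std::move(reason)) {}
  std::string type_;
  std::string reason_;
};

// Hypothetical holder whose inference never throws.
struct FakePyObjectHolder {
  std::string pythonTypeName;  // e.g. "int", "list", or a custom class name

  InferredType tryToInferType() const {
    if (pythonTypeName == "int") return InferredType::ok("int");
    if (pythonTypeName == "list") return InferredType::ok("List[int]");
    return InferredType::fail("cannot infer type of " + pythonTypeName);
  }
};

// Guarded conversion: the inferred type name, or std::nullopt on failure.
std::optional<std::string> tryConvert(const FakePyObjectHolder& obj) {
  InferredType match = obj.tryToInferType();
  if (!match.success()) return std::nullopt;
  return match.type();
}
```

Returning the reason alongside the failure keeps the error message available for callers that do want to raise, such as getSubValues.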

wanchaol · 2020-12-08T21:00:00Z

aten/src/ATen/core/ivalue.cpp

@@ -156,6 +156,11 @@ void IValue::visit(const std::function<bool (const IValue &)>& visitor) const {
      }
      break;
    }
+    case Tag::PyObject: {
+      IValue contained_value = toPyObjectHolder()->toIValue();


What if this PyObject does not contain an IValue? Will this make the function crash?

Indeed, it will just propagate the error raised by the other method. For getSubValues (the function just below) I don't think this is a problem: until now that function was raising for all PyObjects, now it only raises for some of them. That's an improvement.

However, I agree that for visit such behavior could be considered a regression, since PyObjects were previously skipped silently. If you want, I can keep the current behavior for visit (I only need getSubValues for my use case).

Sounds good, let's keep the current behavior for visit to make sure there is no regression.

I removed that regression from visit. Now, visit will still try to convert a PyObject to an IValue, and if that succeeds it will recurse into that IValue. However, if it fails, it silently skips.

I think that this might be the best of both worlds: it adds an improvement to visit and keeps it aligned with getSubValues, but it also "preserves the spirit" of visit of avoiding hard failures.
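The resulting split in failure policy can be sketched as follows, assuming a toy Node type that is hypothetical and not the real IValue. visit() skips a Python object whose type cannot be inferred, while getSubValues() raises. As with the real visit(), the visitor returns true to stop descending into a node's children.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical toy model, not the real c10::IValue.
struct Node {
  bool isPyObject = false;
  bool inferable = true;                        // only meaningful for Python objects
  std::vector<std::shared_ptr<Node>> children;  // contents (for a Python object: after conversion)
};
using NodePtr = std::shared_ptr<Node>;

// Tolerant policy: an unconvertible Python object is skipped silently.
void visit(const NodePtr& n, const std::function<bool(const Node&)>& visitor) {
  if (n->isPyObject && !n->inferable) {
    return;
  }
  if (visitor(*n)) {
    return;  // the visitor asked not to descend further
  }
  for (const NodePtr& c : n->children) {
    visit(c, visitor);
  }
}

// Strict policy: an unconvertible Python object raises loudly.
void getSubValues(const NodePtr& n, std::vector<NodePtr>& out) {
  if (n->isPyObject && !n->inferable) {
    throw std::runtime_error("cannot infer type of Python object");
  }
  out.push_back(n);
  for (const NodePtr& c : n->children) {
    getSubValues(c, out);
  }
}
```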

lw · 2020-12-09T16:00:44Z

torch/csrc/jit/python/pybind_utils.cpp

+namespace torch {
+namespace jit {
+
+IValue toIValue(py::handle obj, const TypePtr& type, c10::optional<int32_t> N) {


I had to move this function to a .cpp file to break a circular dependency.


wanchaol · 2020-12-09T19:00:49Z

aten/src/ATen/core/ivalue.cpp

+      c10::intrusive_ptr<at::ivalue::PyObjectHolder> py_obj = toPyObjectHolder();
+      auto match = py_obj->tryToInferType();
+      TORCH_INTERNAL_ASSERT(match.success(),
+            "Tracer cannot infer type of ", py_obj->toStr(), "\n:", match.reason());


I guess this error message is copied from toTypeInferredIValue, but it's misleading here, since this code may be used outside the tracer. Can you simply say "cannot infer type of..."?

wanchaol · 2020-12-11T19:35:47Z

aten/src/ATen/core/ivalue_inl.h

@@ -664,7 +637,32 @@ struct C10_EXPORT ivalue::Object final : c10::intrusive_ptr_target {
 // see concrete implementation in python_ivalue.h
 struct ivalue::PyObjectHolder : c10::intrusive_ptr_target {
 public:
+  struct InferredType {


This seems out of place inside PyObjectHolder, since it's a generic type that can be used in any context (sometimes without involving PyObjectHolder at all). Can we maybe put it into jit_type.h instead?


@wanchaol

Pull Request resolved: #48840 ghstack-source-id: 118704935 Differential Revision: [D25334355](https://our.internmc.facebook.com/intern/diff/D25334355/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D25334355/)!

wanchaol

Assuming toIValue is just a move, the other changes look good to me. Thanks!

facebook-github-bot · 2020-12-19T19:09:58Z

This pull request has been merged in 1ac05cf.

Summary: Pull Request resolved: pytorch#48840 The CUDAFuture class needs to inspect the values it contains in order to extract its tensors (in fact, the DataPtrs backing those). These are needed first to determine what CUDA devices back those tensors, so that an event for each such device can be recorded; and later to record these DataPtrs with the CUDA caching allocator if they are used in other streams. This became complicated when Python was added to the mix, because to inspect a Python object we need to acquire the GIL, but we couldn't do so from code that was supposed to also work in C++-only mode. The solution was for users to provide a custom way to extract DataPtrs, so that the PythonFutureWrapper could install such a custom Python-aware one. This was the DataPtr extractor. In pytorch#48502 a different suggestion was proposed. At its root, it consists in adding support for IValues of type PyObject to the visit() and getSubValues() methods. In order to deal with the GIL, we do this through a virtual method: PyObjectHolder, which is the base class, is available also in C++-only mode, and thus defines this method but leaves it unimplemented; ConcretePyObjectHolder, which is the subclass, is only included in Python mode, and thus it can implement that method, acquire the GIL, and do what it's supposed to. In my opinion, this approach is just brilliant! Thank wanchaol for proposing it! It hides the complexity of dealing with Python inside getSubValues(), where it can be done properly, thus simplifying enormously the CUDAFuture and the PythonFutureWrapper classes. ghstack-source-id: 118704935 Test Plan: Unit tests Reviewed By: wanchaol Differential Revision: D25334355 fbshipit-source-id: 3f1d3bf6e6e8505a114c877fb9a6fcc3f68d91d3

lw mentioned this pull request Dec 4, 2020

Avoid using FutureNCCL before it's ready #48561

Closed

facebook-github-bot added the cla signed label Dec 4, 2020

facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Dec 4, 2020

lw commented Dec 4, 2020

torch/csrc/jit/python/python_ivalue.h (outdated, resolved)

lw mentioned this pull request Dec 8, 2020

Drop FutureNCCL in favor of vanilla CUDAFuture #49014

Closed

mrshenli reviewed Dec 8, 2020

wanchaol reviewed Dec 8, 2020

lw commented Dec 9, 2020

wanchaol reviewed Dec 11, 2020

wanchaol approved these changes Dec 17, 2020

facebook-github-bot closed this in 1ac05cf Dec 19, 2020

facebook-github-bot added the Merged label Dec 19, 2020

facebook-github-bot deleted the gh/lw/103/head branch December 23, 2020 15:17

cpuhrsch mentioned this pull request Mar 22, 2021

Export torch::jit::toIValue #54448

Closed
