
Remove DataPtr extractor from CUDAFuture #48840

Closed
wants to merge 4 commits into from

lw
Contributor

lw commented Dec 4, 2020

Stack from ghstack:

The CUDAFuture class needs to inspect the values it contains in order to extract their tensors (more precisely, the DataPtrs backing those tensors). These are needed first to determine which CUDA devices back those tensors, so that an event can be recorded on each such device, and later to record these DataPtrs with the CUDA caching allocator if they are used on other streams.
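As a rough illustration of the first use (working out which devices back the contained tensors), here is a standalone sketch that walks a nested value and collects the device index of every tensor it finds. `Value`, `FakeTensor`, `makeTensor`, `makeList` and `collectDevices` are hypothetical stand-ins invented for this sketch; they are not the real `c10::IValue`, `at::Tensor`, or CUDAFuture API.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tensor: only tracks the CUDA device behind it.
struct FakeTensor {
  int device_index = -1;
};

// Hypothetical stand-in for an IValue: a tensor, a list of values, or nothing.
struct Value {
  enum class Kind { None, Tensor, List };
  Kind kind = Kind::None;
  FakeTensor tensor;            // meaningful when kind == Kind::Tensor
  std::vector<Value> elements;  // meaningful when kind == Kind::List
};

Value makeTensor(int device_index) {
  Value v;
  v.kind = Value::Kind::Tensor;
  v.tensor.device_index = device_index;
  return v;
}

Value makeList(std::vector<Value> elements) {
  Value v;
  v.kind = Value::Kind::List;
  v.elements = std::move(elements);
  return v;
}

// Recursively collect the set of devices backing all tensors in `v`.
void collectDevices(const Value& v, std::set<int>& devices) {
  switch (v.kind) {
    case Value::Kind::Tensor:
      devices.insert(v.tensor.device_index);
      break;
    case Value::Kind::List:
      for (const Value& child : v.elements) {
        collectDevices(child, devices);
      }
      break;
    case Value::Kind::None:
      break;
  }
}
```

In the real class one CUDA event would then be recorded per entry of the resulting set; this sketch stops at collecting the set.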

This became complicated once Python entered the picture: inspecting a Python object requires acquiring the GIL, but we couldn't do so from code that was supposed to also work in C++-only mode. The solution was to let users provide a custom way to extract DataPtrs, so that the PythonFutureWrapper could install a Python-aware one. This was the DataPtr extractor.
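The shape of the DataPtr-extractor approach can be sketched as a future that delegates extraction to a pluggable callback, so that C++-only code never has to look inside Python objects itself. Everything below (`FakeDataPtr`, `Payload`, `DataPtrExtractor`, `defaultExtractor`, `SketchFuture`) is hypothetical and only mirrors the idea described above, not the removed PyTorch code; a real Python-aware extractor would acquire the GIL before inspecting the object.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a c10::DataPtr: just remembers its device.
struct FakeDataPtr {
  int device_index;
};

// Hypothetical payload. The future never looks inside it; only the extractor does.
struct Payload {
  std::vector<FakeDataPtr> cpp_side;     // reachable from C++-only code
  std::vector<FakeDataPtr> python_side;  // pretend these sit behind a PyObject
};

using DataPtrExtractor = std::function<std::vector<FakeDataPtr>(const Payload&)>;

// Default extractor: only understands the C++-side data.
std::vector<FakeDataPtr> defaultExtractor(const Payload& value) {
  return value.cpp_side;
}

class SketchFuture {
 public:
  explicit SketchFuture(DataPtrExtractor extractor = defaultExtractor)
      : extractor_(std::move(extractor)) {}

  void markCompleted(Payload value) {
    value_ = std::move(value);
    // The extracted DataPtrs would drive per-device event recording.
    data_ptrs_ = extractor_(value_);
  }

  const std::vector<FakeDataPtr>& dataPtrs() const { return data_ptrs_; }

 private:
  DataPtrExtractor extractor_;
  Payload value_;
  std::vector<FakeDataPtr> data_ptrs_;
};
```

A Python-aware wrapper would construct the future with its own extractor, for instance a lambda that also returns the DataPtrs hidden behind the Python object.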

In #48502 a different approach was suggested. At its root, it consists of adding support for IValues of type PyObject to the visit() and getSubValues() methods. To deal with the GIL, this is done through a virtual method: PyObjectHolder, the base class, is also available in C++-only mode, so it declares this method but leaves it unimplemented; ConcretePyObjectHolder, the subclass, is only included in Python mode, so it can implement the method, acquire the GIL, and do what it's supposed to.
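The virtual-method split can be sketched in plain C++ as follows. `HolderBase`, `ConcreteHolder`, `SimpleValue`, `subValueDevices` and `fake_gil` are hypothetical names for this sketch; a `std::mutex` merely stands in for the GIL, which real Python-enabled code would take with something like `pybind11::gil_scoped_acquire`.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical simplified value: just the devices of the tensors it holds.
struct SimpleValue {
  std::vector<int> tensor_devices;
};

// Base class: lives in the core library and compiles in C++-only mode.
// It knows nothing about Python, so it only declares the hook.
struct HolderBase {
  virtual ~HolderBase() = default;
  virtual SimpleValue toValue() const = 0;
};

// Stand-in for the GIL (assumption: real code would use pybind11's GIL guard).
std::mutex fake_gil;

// Subclass: would only be compiled when Python support is enabled.
struct ConcreteHolder final : HolderBase {
  explicit ConcreteHolder(std::vector<int> devices) : devices_(std::move(devices)) {}

  SimpleValue toValue() const override {
    std::lock_guard<std::mutex> guard(fake_gil);  // "acquire the GIL"
    return SimpleValue{devices_};
  }

 private:
  std::vector<int> devices_;  // pretend this is the wrapped py::object
};

// Core code: only ever sees HolderBase, yet reaches the Python-aware logic.
std::vector<int> subValueDevices(const HolderBase& holder) {
  return holder.toValue().tensor_devices;
}
```

The core library depends only on the abstract base, so it builds without Python; the GIL handling lives entirely in the subclass that is linked in when Python support is present.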

In my opinion, this approach is just brilliant! Thanks to @wanchaol for proposing it! It hides the complexity of dealing with Python inside getSubValues(), where it can be handled properly, thus greatly simplifying the CUDAFuture and PythonFutureWrapper classes.

Differential Revision: D25334355

@dr-ci

dr-ci bot commented Dec 4, 2020

💊 CI failures summary and remediations

As of commit 53c2be9 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

This comment was automatically generated by Dr. CI.

```cpp
at::IValue::HashAliasedIValues sub_values;
// Prefer getSubValues() over visit() as the latter is a silent no-op for
// some unsupported types, whereas the former at least fails loudly.
value.getSubValues(sub_values);
```
Contributor

Am I correct in assuming this is going to extract all nested IValues in the given IValue object? Out of curiosity, does it make sense to let CUDAFuture::markCompleted (or other APIs) also take a vector of tensors? I'm asking because RPC already has access to the tensor vector and hence does not need to recursively visit all nested values again.

Contributor Author

> Would I be correct if I assume this gonna extract all nested ivalues in the given ivalue object?

Yes, indeed.

Note that markCompleted is also called inside then() with the result of the user-provided callback, which can return an arbitrary value. (Unless we explicitly lock that down and force users to also always return lists of tensors, is this what you're suggesting?)

Contributor

> Unless we explicitly lock that down and force users to also always return lists of tensors, is this what you're suggesting?

No, I am thinking of an additional fast path that can take a list of tensors, so that users can pass in either an IValue alone or an IValue plus a list of tensors, avoiding another traversal of the IValue. Not sure if this makes sense.

Contributor Author

I see, yes, that could be an option. We could either add such a method to ivalue::Future, or we could add it just to CUDAFuture. This second option has the problem that the new method wouldn't be "accessible" anymore once we upcast the CUDAFuture to an ivalue::Future, but that might not matter, because markCompleted is only called by whoever creates the object, and they have access to the specialized subclass pointer.
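A minimal sketch of the fast-path idea discussed in this thread, assuming the overload is added only to the CUDA-specific subclass. `Future`, `CudaFutureSketch`, `Tensor`, `Value` and the traversal counter are hypothetical stand-ins, not the real `ivalue::Future` or `CUDAFuture`. It also shows the caveat mentioned above: through a base-class reference only the generic, traversing overload is reachable.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Tensor {
  int device_index;
};

struct Value {
  std::vector<Tensor> tensors;  // pretend this could be arbitrarily nested
};

class Future {
 public:
  virtual ~Future() = default;

  // Generic path: the future has to traverse the value to find its tensors.
  void markCompleted(Value value) {
    tensors_ = extractTensors(value);
    ++traversals_;
    value_ = std::move(value);
  }

  int traversals() const { return traversals_; }
  std::size_t numTensors() const { return tensors_.size(); }

 protected:
  static std::vector<Tensor> extractTensors(const Value& value) {
    return value.tensors;
  }

  Value value_;
  std::vector<Tensor> tensors_;
  int traversals_ = 0;
};

class CudaFutureSketch : public Future {
 public:
  using Future::markCompleted;  // keep the generic overload visible

  // Hypothetical fast path: the caller (e.g. RPC) already knows the tensors.
  void markCompleted(Value value, std::vector<Tensor> tensors) {
    tensors_ = std::move(tensors);
    value_ = std::move(value);
  }
};
```

The `using` declaration matters: without it, the two-argument overload in the subclass would hide the one-argument overload inherited from the base.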

@@ -24,6 +35,11 @@ struct C10_EXPORT ConcretePyObjectHolder final : PyObjectHolder {
```cpp
return py_obj_.ptr();
}

IValue toIValue() override {
```
Collaborator

This function doesn't seem valid for all cases? In theory, PyObjectHolder could hold any arbitrary py::object, even one that can't be inferred as an IValue, but this function seems to assume that the underlying py::object is an "IValue" PyObject.

Contributor Author

If you have suggestions on how to make it valid for all cases, I'd love to hear them! That's exactly what I'm trying to achieve here! For now, indeed, this only works for Python objects whose type can be inferred.

If you think it helps, I can rename this method to toTypeInferredIValue, to clarify that it behaves like the global function torch::jit::toTypeInferredIValue, which also raises if it fails. WDYT?

Contributor Author

Alternatively, I guess I could make this return an optional<IValue> (I would have to reimplement some of the logic of torch::jit::toTypeInferredIValue). Would that work?

Collaborator

Can we add two APIs here for PyObjectHolder?

  1. bool isIValue(): returns true only when a type can be inferred, i.e. when calling tryToInferType yields a match for which match.success() is true.
  2. toIValue(): which does what you currently do.

Then the call sites in getSubValues should be guarded by isIValue(), falling back to the old behavior otherwise. What do you think about this solution?
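The suggested guard could look roughly like this. `Holder`, `Value` and `getSubValuesSketch` are hypothetical names, and `std::optional` simply models whether type inference would succeed; the fallback branch reproduces the old behavior of raising for PyObjects.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical simplified value: the devices of the tensors it contains.
struct Value {
  std::vector<int> tensor_devices;
};

// Hypothetical holder; an empty optional models "type inference fails".
struct Holder {
  std::optional<Value> inferred;

  // Suggested API 1: would a type be inferred for the held object?
  bool isIValue() const { return inferred.has_value(); }

  // Suggested API 2: the conversion itself; only call when isIValue() is true.
  Value toIValue() const { return *inferred; }
};

// getSubValues-like helper guarded by isIValue().
void getSubValuesSketch(const Holder& holder, std::vector<int>& out) {
  if (!holder.isIValue()) {
    // Old behavior: getSubValues raised for PyObjects it could not handle.
    throw std::runtime_error("getSubValues: unsupported PyObject");
  }
  for (int device : holder.toIValue().tensor_devices) {
    out.push_back(device);
  }
}
```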

Contributor Author

This makes sense to me, and I did something very similar to what you just suggested: instead of isIValue I added a tryToInferType method, which returns the TypePtr (or the reason this failed). I opted to go that way in order to keep the alignment between the global functions of torch::jit and the methods of ConcretePyObjectHolder (so that it's easy to see that they do the same thing).
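A sketch of a `tryToInferType`-style result that carries either a type or the reason inference failed. `Type`, `TypePtr`, `InferredType`, `tryToInferTypeSketch` and `requireType` are hypothetical here, and the inference rule (only the string `"tensor"` succeeds) is a toy; the real `InferredType` in PyTorch may differ in its members. The error message in `requireType` is deliberately generic rather than tracer-specific.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical type descriptor standing in for a TypePtr.
struct Type {
  std::string name;
};
using TypePtr = std::shared_ptr<Type>;

// "Either a type or a reason" result.
class InferredType {
 public:
  explicit InferredType(TypePtr type) : type_(std::move(type)) {}
  explicit InferredType(std::string reason) : reason_(std::move(reason)) {}

  bool success() const { return type_ != nullptr; }
  const TypePtr& type() const { return type_; }
  const std::string& reason() const { return reason_; }

 private:
  TypePtr type_;
  std::string reason_;
};

// Toy inference rule: only objects we "recognize" get a type.
InferredType tryToInferTypeSketch(const std::string& repr) {
  if (repr == "tensor") {
    return InferredType(std::make_shared<Type>(Type{"Tensor"}));
  }
  return InferredType("no rule to infer a type for '" + repr + "'");
}

// Caller that insists on success and raises with a generic message.
TypePtr requireType(const std::string& repr) {
  InferredType match = tryToInferTypeSketch(repr);
  if (!match.success()) {
    throw std::runtime_error("Cannot infer type of " + repr + ": " + match.reason());
  }
  return match.type();
}
```

Returning the reason alongside the failure lets callers choose between raising (as `requireType` does) and silently skipping.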

@@ -156,6 +156,11 @@ void IValue::visit(const std::function<bool (const IValue &)>& visitor) const {
```cpp
}
break;
}
case Tag::PyObject: {
IValue contained_value = toPyObjectHolder()->toIValue();
```
Collaborator

What if this PyObject does not contain an IValue? Will this make the function crash?

Contributor Author

Indeed, it will just propagate the error raised by the other method. For getSubValues (the function just below) I don't think this is a problem: until now that function raised for all PyObjects, whereas now it only raises for some of them. That's an improvement.

However, I agree that for visit such a behavior could be considered a regression, since before PyObjects were silently skipped over. If you want, I can keep the current behavior for visit. (I only need getSubValues for my use case.)

Collaborator

Sounds good, let's keep the current behavior for visit to make sure there is no regression.

Contributor Author

I removed that regression from visit. Now, visit will still try to convert a PyObject to an IValue, and if that succeeds it will recurse into that IValue. However, if it fails, it silently skips.

I think that this might be the best of both worlds: it adds an improvement to visit and keeps it aligned with getSubValues, but it also "preserves the spirit" of visit of avoiding hard failures.
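The resulting difference between the two traversals can be sketched as follows: both recurse into a PyObject whose type can be inferred, but on failure the visit-like function skips silently while the getSubValues-like function raises. `Node`, `visitSketch` and `subValuesSketch` are hypothetical, and the real visitor callback returns a bool, which is omitted here for brevity.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical node: either a tensor or an opaque Python object that may or
// may not convert to a nested list of values.
struct Node {
  enum class Kind { Tensor, PyObject };
  Kind kind = Kind::Tensor;
  int device = -1;             // meaningful when kind == Kind::Tensor
  bool convertible = false;    // PyObject only: would type inference succeed?
  std::vector<Node> children;  // PyObject only: contents after conversion
};

// visit-like: convert PyObjects when possible, silently skip otherwise.
void visitSketch(const Node& n, const std::function<void(int)>& on_tensor) {
  if (n.kind == Node::Kind::Tensor) {
    on_tensor(n.device);
    return;
  }
  if (!n.convertible) {
    return;  // no hard failure, preserving the old spirit of visit
  }
  for (const Node& child : n.children) {
    visitSketch(child, on_tensor);
  }
}

// getSubValues-like: same recursion, but a failed conversion is an error.
void subValuesSketch(const Node& n, std::vector<int>& out) {
  if (n.kind == Node::Kind::Tensor) {
    out.push_back(n.device);
    return;
  }
  if (!n.convertible) {
    throw std::runtime_error("cannot infer type of PyObject");
  }
  for (const Node& child : n.children) {
    subValuesSketch(child, out);
  }
}
```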

```cpp
namespace torch {
namespace jit {

IValue toIValue(py::handle obj, const TypePtr& type, c10::optional<int32_t> N) {
```
Contributor Author

I had to move this function to a .cpp file to break a circular dependency.

```cpp
c10::intrusive_ptr<at::ivalue::PyObjectHolder> py_obj = toPyObjectHolder();
auto match = py_obj->tryToInferType();
TORCH_INTERNAL_ASSERT(match.success(),
"Tracer cannot infer type of ", py_obj->toStr(), "\n:", match.reason());
```
Collaborator

I guess this error message was copied from toTypeInferredIValue, but it's actually inaccurate here, as this might be used outside the tracer too. Can you simply say "cannot infer type of ..."?

@@ -664,7 +637,32 @@ struct C10_EXPORT ivalue::Object final : c10::intrusive_ptr_target {
```cpp
// see concrete implementation in python_ivalue.h
struct ivalue::PyObjectHolder : c10::intrusive_ptr_target {
public:
struct InferredType {
```
Collaborator

This doesn't seem to belong inside PyObjectHolder, as it's a generic type that could be used in any context (possibly ones unrelated to PyObjectHolder). Can we maybe put it into jit_type.h instead?

lw added a commit that referenced this pull request Dec 16, 2020
Pull Request resolved: #48840

ghstack-source-id: 118704935

Differential Revision: [D25334355](https://our.internmc.facebook.com/intern/diff/D25334355/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook-specific changes or comments; please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D25334355/)!
Collaborator

wanchaol left a comment

Assuming toIValue is just a move, the other changes look good to me. Thanks!

@facebook-github-bot
Contributor

This pull request has been merged in 1ac05cf.

facebook-github-bot deleted the gh/lw/103/head branch December 23, 2020 15:17
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 6, 2021
Summary:
Pull Request resolved: pytorch#48840

ghstack-source-id: 118704935

Test Plan: Unit tests

Reviewed By: wanchaol

Differential Revision: D25334355

fbshipit-source-id: 3f1d3bf6e6e8505a114c877fb9a6fcc3f68d91d3
Labels
cla signed Merged oncall: jit Add this issue/PR to JIT oncall triage queue

4 participants