Remove DataPtr extractor from CUDAFuture #48840
at::IValue::HashAliasedIValues sub_values;
// Prefer getSubValues() over visit() as the latter is a silent no-op for
// some unsupported types, whereas the former at least fails loudly.
value.getSubValues(sub_values);
Would I be correct in assuming that this extracts all nested IValues in the given IValue object? Also, would it make sense to let `CUDAFuture::markCompleted` (or other APIs) also take a vector of tensors? I ask because RPC already has access to the tensor vector and hence does not need to recursively visit all nested values again.
> Would I be correct in assuming that this extracts all nested IValues in the given IValue object?
Yes, indeed.
Note that `markCompleted` is also called inside `then()` with the result of the user-provided callback, which can return an arbitrary value. (Unless we explicitly lock that down and force users to always return lists of tensors; is this what you're suggesting?)
> Unless we explicitly lock that down and force users to always return lists of tensors; is this what you're suggesting?
No, I am thinking of an additional fast path that takes a list of tensors. Users could then pass either an IValue alone, or an IValue plus a list of tensors, which avoids traversing the IValue again. Not sure if this makes sense.
I see, yes, that could be an option. We could either add such a method to ivalue::Future, or we could add it just to CUDAFuture: this second option has the problem that the new method wouldn't be "accessible" anymore once we downcast the CUDAFuture to an ivalue::Future, but that might not matter, because markCompleted is only called by whoever creates the object, and they have access to the specialized subclass pointer.
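The fast path discussed in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: `Tensor`, `Value`, `SketchFuture` and `collectTensors` are hypothetical stand-ins written for this example, not the real `at::Tensor`, `c10::IValue` or `CUDAFuture` API. The one-argument overload walks the value recursively; the two-argument overload trusts the caller's tensor list and skips the walk.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only (not at::Tensor / c10::IValue).
struct Tensor {
  int device;
};

struct Value {
  std::vector<Tensor> tensors;  // tensors held directly by this value
  std::vector<Value> children;  // nested values
};

// Slow-path helper: recursively gather every tensor reachable from `v`.
inline void collectTensors(const Value& v, std::vector<Tensor>& out) {
  out.insert(out.end(), v.tensors.begin(), v.tensors.end());
  for (const Value& child : v.children) {
    collectTensors(child, out);
  }
}

class SketchFuture {
 public:
  // General path: the future discovers the tensors by traversing the value.
  void markCompleted(Value value) {
    std::vector<Tensor> found;
    collectTensors(value, found);
    ++traversals_;
    finish(std::move(value), std::move(found));
  }

  // Fast path: the caller (e.g. RPC) already knows the tensors, so no
  // traversal happens. The caller must make sure the list is complete.
  void markCompleted(Value value, std::vector<Tensor> tensors) {
    finish(std::move(value), std::move(tensors));
  }

  std::size_t numTensors() const { return tensors_.size(); }
  int traversals() const { return traversals_; }

 private:
  void finish(Value value, std::vector<Tensor> tensors) {
    value_ = std::move(value);
    tensors_ = std::move(tensors);
  }

  Value value_;
  std::vector<Tensor> tensors_;
  int traversals_ = 0;
};
```

In this sketch both overloads live on the same class; whether the real fast path would belong on the base future or only on the CUDA subclass is exactly the open question in the comment above.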
@@ -24,6 +35,11 @@ struct C10_EXPORT ConcretePyObjectHolder final : PyObjectHolder {
    return py_obj_.ptr();
  }

  IValue toIValue() override {
This function does not seem valid for all cases. In theory, PyObjectHolder could hold any arbitrary py::object, even one whose type can't be inferred as an IValue, but this function seems to assume that the underlying py::object is an "IValue" Python object.
If you have suggestions on how to make it valid for all cases I'd love to hear them! That's exactly what I'm trying to achieve here! For now, indeed, this only works for Python objects whose type can be inferred.
If you think it helps, I can rename this method to `toTypeInferredIValue`, to clarify that it behaves like the global function `torch::jit::toTypeInferredIValue`, which also raises if it fails. WDYT?
Alternatively, I guess I could make this return an `optional<IValue>` (I would have to reimplement some of the logic of `torch::jit::toTypeInferredIValue`). Would that work?
Can we add two APIs here for PyObjectHolder?
- `bool isIValue()`: returns true only when we could infer a type, by calling `tryToInferType` and checking `match.success()`.
- `toIValue()`: does what you currently do.
Then the call sites in `getSubValues` should be guarded by `isIValue()`, and otherwise fall back to the old behavior. What do you think about this solution?
This makes sense to me, and I did something very similar to what you just suggested: instead of `isIValue` I added a `tryToInferType` method, which returns the TypePtr (or the reason this failed). I opted to go that way in order to keep the alignment between the global functions of `torch::jit` and the methods of `ConcretePyObjectHolder` (so that it's easy to see that they do the same thing).
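A rough sketch of the guard pattern agreed on in this thread, assuming simplified stand-in types: `InferResult`, `SketchHolder` and `collectIfInferable` are invented for this example, and types are represented as plain strings rather than `TypePtr`. The point is only the shape: try to infer, check `success()`, convert only on success, and keep the failure reason available otherwise.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: either an inferred type name or a failure reason.
class InferResult {
 public:
  static InferResult ok(std::string type) {
    return InferResult(true, std::move(type));
  }
  static InferResult fail(std::string reason) {
    return InferResult(false, std::move(reason));
  }
  bool success() const { return success_; }
  const std::string& type() const { return text_; }    // meaningful if success()
  const std::string& reason() const { return text_; }  // meaningful if !success()

 private:
  InferResult(bool success, std::string text)
      : success_(success), text_(std::move(text)) {}
  bool success_;
  std::string text_;
};

// Hypothetical holder of an opaque object. An empty type name models an
// object whose type cannot be inferred.
class SketchHolder {
 public:
  SketchHolder(std::string type_name, int payload)
      : type_name_(std::move(type_name)), payload_(payload) {}

  InferResult tryToInferType() const {
    if (type_name_.empty()) {
      return InferResult::fail("unsupported object");
    }
    return InferResult::ok(type_name_);
  }

  // Strict conversion: raises when the type cannot be inferred.
  int toValue() const {
    InferResult match = tryToInferType();
    if (!match.success()) {
      throw std::runtime_error("cannot infer type: " + match.reason());
    }
    return payload_;
  }

 private:
  std::string type_name_;
  int payload_;
};

// Guarded call site: convert only when inference succeeds, otherwise skip.
inline bool collectIfInferable(const SketchHolder& holder,
                               std::vector<int>& out) {
  if (!holder.tryToInferType().success()) {
    return false;
  }
  out.push_back(holder.toValue());
  return true;
}
```

Returning the reason alongside the success flag, rather than a bare `bool`, lets a strict caller build a useful error message while a lenient caller simply ignores it.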
aten/src/ATen/core/ivalue.cpp
Outdated
@@ -156,6 +156,11 @@ void IValue::visit(const std::function<bool (const IValue &)>& visitor) const {
      }
      break;
    }
    case Tag::PyObject: {
      IValue contained_value = toPyObjectHolder()->toIValue();
What if this PyObject does not contain an IValue? Will that make this function crash?
Indeed, it will just propagate the error raised by the other method. For `getSubValues` (the function just below) I don't think this is a problem: until now that function raised for all PyObjects, and now it only raises for some of them. That's an improvement.
However, I agree that for `visit` such behavior could be considered a regression, since previously PyObjects were silently skipped. If you want, I can keep the current behavior for `visit`. (I only need `getSubValues` for my use case.)
Sounds good, let's keep the current behavior for `visit` to make sure there is no regression.
I removed that regression from `visit`. Now, `visit` will still try to convert a PyObject to an IValue, and if that succeeds it will recurse into that IValue. However, if it fails, it silently skips it.
I think that this might be the best of both worlds: it adds an improvement to `visit` and keeps it aligned with `getSubValues`, but it also "preserves the spirit" of `visit` of avoiding hard failures.
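The difference between the two traversals after this change can be sketched as follows. This is a toy model, not the real `IValue` code: `Node`, `lenientCollect` (playing the role of `visit`) and `strictCollect` (playing the role of `getSubValues`) are hypothetical names, and a null `converted` pointer stands for a Python object whose type cannot be inferred.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical toy value: a leaf, a list of values, or an opaque object that
// may or may not be convertible to a regular value.
struct Node {
  enum class Kind { Leaf, List, Opaque };
  Kind kind = Kind::Leaf;
  int leaf = 0;                     // payload when kind == Leaf
  std::vector<Node> items;          // children when kind == List
  std::shared_ptr<Node> converted;  // when kind == Opaque; null if conversion fails
};

inline Node makeLeaf(int value) {
  Node n;
  n.kind = Node::Kind::Leaf;
  n.leaf = value;
  return n;
}

inline Node makeList(std::vector<Node> items) {
  Node n;
  n.kind = Node::Kind::List;
  n.items = std::move(items);
  return n;
}

inline Node makeOpaque(std::shared_ptr<Node> converted) {
  Node n;
  n.kind = Node::Kind::Opaque;
  n.converted = std::move(converted);
  return n;
}

// visit-like: recurse into opaque objects when possible, silently skip otherwise.
inline void lenientCollect(const Node& n, std::vector<int>& out) {
  switch (n.kind) {
    case Node::Kind::Leaf:
      out.push_back(n.leaf);
      break;
    case Node::Kind::List:
      for (const Node& child : n.items) lenientCollect(child, out);
      break;
    case Node::Kind::Opaque:
      if (n.converted) lenientCollect(*n.converted, out);
      break;
  }
}

// getSubValues-like: same recursion, but a failed conversion is a hard error.
inline void strictCollect(const Node& n, std::vector<int>& out) {
  switch (n.kind) {
    case Node::Kind::Leaf:
      out.push_back(n.leaf);
      break;
    case Node::Kind::List:
      for (const Node& child : n.items) strictCollect(child, out);
      break;
    case Node::Kind::Opaque:
      if (!n.converted) {
        throw std::runtime_error("cannot infer type of opaque object");
      }
      strictCollect(*n.converted, out);
      break;
  }
}
```

The two functions differ only in the `Opaque` branch, which mirrors the stated goal of keeping `visit` aligned with `getSubValues` while still avoiding hard failures.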
namespace torch {
namespace jit {

IValue toIValue(py::handle obj, const TypePtr& type, c10::optional<int32_t> N) {
I had to move this function to a .cpp file to break a circular dependency.
The CUDAFuture class needs to inspect the values it contains in order to extract its tensors (in fact, the DataPtrs backing those). These are needed first to determine what CUDA devices back those tensors, so that an event for each such device can be recorded; and later to record these DataPtrs with the CUDA caching allocator if they are used in other streams. This became complicated when Python was added to the mix, because to inspect a Python object we need to acquire the GIL, but we couldn't do so from code that was supposed to also work in C++-only mode. The solution was for users to provide a custom way to extract DataPtrs, so that the PythonFutureWrapper could install such a custom Python-aware one. This was the DataPtr extractor. In #48502 a different suggestion was proposed. At its root, it consists in adding support for IValues of type PyObject to the visit() and getSubValues() methods. In order to deal with the GIL, we do this through a virtual method: PyObjectHolder, which is the base class, is available also in C++-only mode, and thus defines this method but leaves it unimplemented; ConcretePyObjectHolder, which is the subclass, is only included in Python mode, and thus it can implement that method, acquire the GIL, and do what it's supposed to. In my opinion, this approach is just brilliant! Thank @wanchaol for proposing it! It hides the complexity of dealing with Python inside getSubValues(), where it can be done properly, thus simplifying enormously the CUDAFuture and the PythonFutureWrapper classes. Differential Revision: [D25334355](https://our.internmc.facebook.com/intern/diff/D25334355/) [ghstack-poisoned]
aten/src/ATen/core/ivalue.cpp
Outdated
c10::intrusive_ptr<at::ivalue::PyObjectHolder> py_obj = toPyObjectHolder();
auto match = py_obj->tryToInferType();
TORCH_INTERNAL_ASSERT(match.success(),
    "Tracer cannot infer type of ", py_obj->toStr(), "\n:", match.reason());
I guess this error message is copied from `toTypeInferredIValue`, but it's actually wrong here, as this might not only be used in the tracer. Can you simply say "cannot infer type of ..."?
aten/src/ATen/core/ivalue_inl.h
Outdated
@@ -664,7 +637,32 @@ struct C10_EXPORT ivalue::Object final : c10::intrusive_ptr_target {
// see concrete implementation in python_ivalue.h
struct ivalue::PyObjectHolder : c10::intrusive_ptr_target {
 public:
  struct InferredType {
This seems out of place inside PyObjectHolder, as it's a generic type that can be used in any context (sometimes ones that don't involve PyObjectHolder at all). Could we put it in `jit_type.h` instead?
Assuming `toIValue` is just a move, the other changes look good to me. Thanks!
This pull request has been merged in 1ac05cf.
Summary: Pull Request resolved: pytorch#48840 The CUDAFuture class needs to inspect the values it contains in order to extract its tensors (in fact, the DataPtrs backing those). These are needed first to determine what CUDA devices back those tensors, so that an event for each such device can be recorded; and later to record these DataPtrs with the CUDA caching allocator if they are used in other streams. This became complicated when Python was added to the mix, because to inspect a Python object we need to acquire the GIL, but we couldn't do so from code that was supposed to also work in C++-only mode. The solution was for users to provide a custom way to extract DataPtrs, so that the PythonFutureWrapper could install such a custom Python-aware one. This was the DataPtr extractor. In pytorch#48502 a different suggestion was proposed. At its root, it consists in adding support for IValues of type PyObject to the visit() and getSubValues() methods. In order to deal with the GIL, we do this through a virtual method: PyObjectHolder, which is the base class, is available also in C++-only mode, and thus defines this method but leaves it unimplemented; ConcretePyObjectHolder, which is the subclass, is only included in Python mode, and thus it can implement that method, acquire the GIL, and do what it's supposed to. In my opinion, this approach is just brilliant! Thank wanchaol for proposing it! It hides the complexity of dealing with Python inside getSubValues(), where it can be done properly, thus simplifying enormously the CUDAFuture and the PythonFutureWrapper classes. ghstack-source-id: 118704935 Test Plan: Unit tests Reviewed By: wanchaol Differential Revision: D25334355 fbshipit-source-id: 3f1d3bf6e6e8505a114c877fb9a6fcc3f68d91d3
Stack from ghstack:
The CUDAFuture class needs to inspect the values it contains in order to extract its tensors (in fact, the DataPtrs backing those). These are needed first to determine what CUDA devices back those tensors, so that an event for each such device can be recorded; and later to record these DataPtrs with the CUDA caching allocator if they are used in other streams.
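As a rough illustration of the first step (finding which devices need an event), here is a sketch assuming a hypothetical `FakeDataPtr` that only carries a device index. It does not use the real `c10::DataPtr` or any CUDA event API, and `devicesNeedingEvents` is a name made up for this example; it only shows that one event per distinct device is enough, however many tensors live on that device.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a data pointer: only the device index matters here.
struct FakeDataPtr {
  int device_index;
};

// Returns each distinct device index exactly once, in ascending order.
inline std::vector<int> devicesNeedingEvents(
    const std::vector<FakeDataPtr>& ptrs) {
  std::set<int> unique;
  for (const FakeDataPtr& ptr : ptrs) {
    unique.insert(ptr.device_index);
  }
  return std::vector<int>(unique.begin(), unique.end());
}
```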
This became complicated when Python was added to the mix, because to inspect a Python object we need to acquire the GIL, but we couldn't do so from code that was supposed to also work in C++-only mode. The solution was for users to provide a custom way to extract DataPtrs, so that the PythonFutureWrapper could install such a custom Python-aware one. This was the DataPtr extractor.
In #48502 a different suggestion was proposed. At its root, it consists in adding support for IValues of type PyObject to the visit() and getSubValues() methods. In order to deal with the GIL, we do this through a virtual method: PyObjectHolder, which is the base class, is available also in C++-only mode, and thus defines this method but leaves it unimplemented; ConcretePyObjectHolder, which is the subclass, is only included in Python mode, and thus it can implement that method, acquire the GIL, and do what it's supposed to.
In my opinion, this approach is just brilliant! Thank @wanchaol for proposing it! It hides the complexity of dealing with Python inside getSubValues(), where it can be done properly, thus simplifying enormously the CUDAFuture and the PythonFutureWrapper classes.
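The virtual-method arrangement described above can be illustrated with a minimal, self-contained sketch. All names here (`ObjectHolder`, `ConcreteObjectHolder`, `countTensors`, `g_interpreter_lock`) are hypothetical, a `std::mutex` merely stands in for the GIL, and tensors are reduced to integer ids; the real classes are `PyObjectHolder` and `ConcretePyObjectHolder`.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Stand-in for the interpreter lock (the GIL in the real code). Hypothetical.
std::mutex g_interpreter_lock;

// "Core" side: available everywhere, declares the hook but cannot implement
// it because it knows nothing about the interpreter.
struct ObjectHolder {
  virtual ~ObjectHolder() = default;
  virtual std::vector<int> extractTensorIds() = 0;
};

// "Python" side: only built when the interpreter is present, so it is free
// to take the lock before touching the wrapped object.
struct ConcreteObjectHolder final : ObjectHolder {
  explicit ConcreteObjectHolder(std::vector<int> ids) : ids_(std::move(ids)) {}

  std::vector<int> extractTensorIds() override {
    std::lock_guard<std::mutex> guard(g_interpreter_lock);
    return ids_;
  }

 private:
  std::vector<int> ids_;
};

// Core code depends only on the base class; the lock handling stays hidden
// behind the virtual call.
inline std::size_t countTensors(ObjectHolder& holder) {
  return holder.extractTensorIds().size();
}
```

Because `countTensors` sees only the base class, it compiles without any interpreter-specific headers, which is the property that made the custom extractor unnecessary.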
Differential Revision: D25334355