RFC for "Passed Directly" Customization Point #1999

Open. Wants to merge 17 commits into base: main

@danhoeflinger danhoeflinger commented Jan 13, 2025

Provides a proposal for a public customization point that users can define to indicate whether their types are passed directly to SYCL kernels.

This RFC is intended to address and resolve #1939.

Here is a working proof of concept to play with: https://godbolt.org/z/jvo1esoeb (updated)

Signed-off-by: Dan Hoeflinger <[email protected]>
@danhoeflinger
Contributor Author

I have a working proof of concept for this in godbolt, but it needs some cleanup before sharing. Let me know if that would be useful to provide here, but likely any needed details can and should be provided in the text.

Contributor

@masterleinad masterleinad left a comment

Apart from the name, the approach looks pretty reasonable to me.

Comment on lines 140 to 141
Is there a better / more concise name than `is_passed_directly_to_sycl_kernels` we can use which properly conveys the
meaning to the users?
Contributor

I would prefer if the name somewhere includes onedpl. The function is not generally relevant for SYCL. Maybe

  • is_passed_directly_to_onedpl, or
  • is_passed_directly_to_onedpl_kernels

Contributor Author

Yes, it's a fair point: since the customization point in the user's code won't be in the oneapi::dpl namespace, there is little connecting it to oneDPL unless that is in the name. I was trying to be more descriptive about the semantic meaning, as this is only relevant for the SYCL-based dpcpp backend; however, that is probably less important than including oneDPL.

Thanks, I think the first of your suggestions is probably the best option so far.

Contributor

As this property only really makes sense for the SYCL backend of oneDPL, there is a sense that is_passed_directly_to_onedpl doesn't quite tell the full story of what it is for. It won't be used in the TBB or OpenMP backends of oneDPL, which you wouldn't understand from just the name alone.

On the other hand, something like is_passed_directly_to_sycl is too broad and could be confusing because the SYCL implementation won't use the function directly. Perhaps the second suggestion of is_passed_directly_to_onedpl_kernels is closer, but one could take issue with its verbosity.

Contributor Author

Perhaps is_passed_directly_to_onedpl_dpcpp is the right choice (I hate how wordy it is but I'm not sure we have a choice).
A normal user of oneDPL may better recognize "dpcpp" than "kernels", saves a couple characters too.

Contributor

@akukanov akukanov Jan 14, 2025

Actually, I would prefer not to introduce more names with dpcpp, to avoid the impression that our implementation is based on DPC++ (and not on the SYCL specification).

Contributor

I would prefer moving away from the oneDPL implementation detail (of "passing directly") towards the semantic meaning of the trait, something like "this iterator supports implicit data transfer" or (inverting the value) "requires explicit data transfer" or maybe "is ready for use with/suitable for oneDPL device policies".

Contributor Author

I like requires_explicit_data_transfer_onedpl_device_policies with inverted values better than talking about implicit data transfer, because passed directly is more about not needing transfer in the first place. Implicit data transfer is appropriate for buffer accessors or shared USM, but not all "passed directly" types. USM device pointers or counting_iterators are passed directly, but don't have any data transfer, implicit or explicit.

Another option: is_dereferencable_in_onedpl_device_policies?

Contributor

@rarutyun rarutyun Jan 16, 2025

I (honestly) would say is_passed_directly_to_sycl_backend. We use the term 'backend' in our documentation, so it should be clear enough. Please don't repeat _onedpl_ in the name, because it's already in the oneapi::dpl namespace. Duplication doesn't bring clarity. I 100% agree with the intent not to use dpcpp in the name. The more rarely we use it, the better. But this is a comment about the name. I also have a number of questions about the approach itself.

Contributor Author

After some development from the original idea, the only public API is the trait, which will live in the oneapi::dpl namespace. However, users will still be overriding the customization point by defining a function in their own namespace, and I believe that name should be associated with the trait (its name + _v). Without repeating onedpl in the name of the customization point, there will be nothing tying it to oneDPL in the user's code. This is the motivation for including _onedpl_.

We could consider different names for the customization point and the trait, but that may also be confusing.

Contributor

We do not use the word "backend" in the public API (actually, not at all in the specification), and I think we should not.

There are two elements of the API to name: the trait value (also a class?) and the user-defined customization function. Their names should be related, but I am not sure whether an almost exact match is needed. I agree that the name of the function should refer to oneDPL; that can be achieved by adding a prefix to the function name.

Also, as far as I understand, the trait is not so much (if at all) about the iterator itself "passed" to a device, as it is about the data "underneath" the iterator being accessible from that device, so that no data copying/no intermediate buffer is required.

Can it be something like onedpl_is_iterator_device_ready() for the function and oneapi::dpl::is_iterator_device_ready[_v] for the trait?
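
To make the function/trait pairing suggested above concrete, here is a minimal C++17 sketch, not the RFC's implementation. It assumes the customization function returns `std::true_type` or `std::false_type`, so that the trait can be computed with `decltype` without evaluating the call; the thread does not confirm that choice. The namespace `sketch_dpl` (a stand-in for `oneapi::dpl`), the helper `device_ready_result`, and both user iterator types are hypothetical.

```cpp
#include <type_traits>
#include <utility>

namespace sketch_dpl { // hypothetical stand-in for oneapi::dpl
namespace detail {
// Default overload: selected when ADL finds no better match for T.
template <typename T>
constexpr std::false_type onedpl_is_iterator_device_ready(const T&) { return {}; }

// Unqualified call inside decltype: ordinary lookup finds the default above,
// and ADL adds any overload declared in the namespace of T.
template <typename T>
using device_ready_result =
    decltype(onedpl_is_iterator_device_ready(std::declval<const T&>()));
} // namespace detail

// Public trait: inherits from whatever type the selected overload returns.
template <typename T>
struct is_iterator_device_ready : detail::device_ready_result<T> {};

template <typename T>
inline constexpr bool is_iterator_device_ready_v = is_iterator_device_ready<T>::value;
} // namespace sketch_dpl

namespace user { // hypothetical user code
struct usm_backed_iterator { int* ptr; };
struct host_only_iterator { int* ptr; };

// User customization, declared next to the type so that ADL can find it.
// Only the return type matters in this sketch; the function is never called.
constexpr std::true_type onedpl_is_iterator_device_ready(const usm_backed_iterator&) { return {}; }
} // namespace user

static_assert(sketch_dpl::is_iterator_device_ready_v<user::usm_backed_iterator>);
static_assert(!sketch_dpl::is_iterator_device_ready_v<user::host_only_iterator>);
```

In this arrangement the user only writes a function carrying the `onedpl_` prefix in their own namespace, while library code queries the trait in the library namespace, so the two names are related without being identical.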

Comment on lines 143 to 144
Should we be targeting Experimental or fully supported with this proposal?
(Do we think user feedback is required to solidify an interface / experience?)
Contributor

I guess the general design of passed directly has been tested internally pretty well at least.

Contributor Author

Agreed. From my perspective the only reason to keep it in experimental to start with is if we are uncertain of the exact API specifics or to find any unexpected gotchas with the approach of using a customization point generally as opposed to some other option.

Let's see what others have to say, but I'm leaning toward targeting supported and just adding it to the specification directly.

Contributor

I agree that there is not much need for an experimental phase. But a POC with practical usage outside of oneDPL is necessary I think.

rfcs/proposed/passed_directly_API/README.md (six outdated, resolved review threads)
Comment on lines 102 to 105
To make this robust, we will follow a C++17-updated version of what is discussed in
[Eric Niebler's Post](https://ericniebler.com/2014/10/21/customization-point-design-in-c11-and-beyond/), using a
callable, and using an `inline constexpr` to avoid issues with ODR and to avoid issues with resolving customization
points when not separating the call to two steps with a `using` statement first.
Contributor

I do not know why you refer to Niebler's post. He describes there, more or less, the C++20 customization point objects, but that's not what you propose to do. As far as I understand, the proposed user-defined customizations will be ADL-discoverable (if not, then I do not understand how to use those), and then you need the using statement to get the default implementation in oneapi::dpl, which makes it exactly the two-step customization, doesn't it?

Contributor Author

Perhaps I am mistaken, but my understanding is that Niebler is describing a way using function pointers to allow qualified calls to also pick up the default implementation rather than just unqualified calls with the using statement.
We don't need his more complex strategy for ODR because of changes in C++17, but I think the function pointer strategy is still a benefit.

Contributor

> Perhaps I am mistaken, but my understanding is that Niebler is describing a way using function pointers to allow qualified calls to also pick up the default implementation rather than just unqualified calls with the using statement.

See https://godbolt.org/z/TM914E6fv for an example.

Contributor Author

I cleaned up and added my proof of concept to the description as well for another example.

Contributor

@akukanov akukanov Jan 14, 2025

> my understanding is that Niebler is describing a way using function pointers to allow qualified calls to also pick up the default implementation rather than just unqualified calls with the using statement.

I do not see where he uses any function pointers. His std::begin is a reference to an instance of struct std::__detail::__begin_fn, which has a function call operator, so that std::begin(X) is valid code. This operator internally uses an unqualified call to begin, whose "default" implementation is in std::__detail and whose specializations are found by ADL. As I said, it more or less matches the CPO design in C++20.

And it's not what you have proposed so far, as far as I can see. In this proposal, the default implementation is a free function in the oneapi::dpl namespace, and the customization is a free function in the namespace of the user's type (it is not said explicitly that the namespace should be the same, but de facto it should, for ADL to work). So I do not see how a qualified call will pick up customizations, nor how an unqualified call will pick up the default implementation without a using declaration.
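
The behavior described in this comment can be sketched in plain C++17 as follows. This is an illustration only, assuming the default is a free function template; `sketch_dpl` is a hypothetical stand-in for `oneapi::dpl`, and `caller::query` and the two user types are invented for the example.

```cpp
namespace sketch_dpl { // hypothetical stand-in for oneapi::dpl
// Default implementation as a plain free function template.
template <typename T>
constexpr bool is_passed_directly_in_onedpl_device_policies(const T&) { return false; }
} // namespace sketch_dpl

namespace user { // hypothetical user code
struct my_iterator {};
struct plain_iterator {};

// Customization as a free function in the same namespace as the type, so ADL can find it.
constexpr bool is_passed_directly_in_onedpl_device_policies(const my_iterator&) { return true; }
} // namespace user

namespace caller {
template <typename T>
constexpr bool query(const T& t) {
    // Step 1: make the default visible to ordinary unqualified lookup.
    using sketch_dpl::is_passed_directly_in_onedpl_device_policies;
    // Step 2: unqualified call, so ADL can add the user's overload to the candidates.
    return is_passed_directly_in_onedpl_device_policies(t);
}
} // namespace caller

// The two-step call picks the customization when there is one, the default otherwise.
static_assert(caller::query(user::my_iterator{}));
static_assert(!caller::query(user::plain_iterator{}));

// A qualified call to the free function skips ADL and ignores the customization.
static_assert(!sketch_dpl::is_passed_directly_in_onedpl_device_policies(user::my_iterator{}));
```

With free functions alone, every call site has to repeat both steps, and a qualified call silently returns the default.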

Contributor Author

I think I need to improve the language to be more specific and accurate about the proposed implementation details, but I think what you describe in your first paragraph is correct and accurate to my intentions.

You are correct that an unqualified call does require a using declaration, but a qualified call should take the user customizations because a qualified call will use the function object. The function object internally makes an unqualified call from the namespace of the default implementation on behalf of the user, allowing it to either find the more specific user customization if it exists, or end up in the default otherwise. You should be able to see this tested in the proof of concept here on line 121.

Hopefully I'm not missing something, but I may be.
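
A minimal C++17 sketch of the function-object arrangement described in this reply, under the assumption that the default lives in a nested `detail` namespace; it is not taken from the linked proof of concept. The names `sketch_dpl` (a stand-in for `oneapi::dpl`), `detail`, `is_passed_directly_fn`, and the user types are hypothetical.

```cpp
namespace sketch_dpl { // hypothetical stand-in for oneapi::dpl
namespace detail {
// Default implementation, visible to the unqualified call below.
template <typename T>
constexpr bool is_passed_directly_in_onedpl_device_policies(const T&) { return false; }

struct is_passed_directly_fn {
    template <typename T>
    constexpr bool operator()(const T& t) const {
        // Unqualified call made on the caller's behalf: ordinary lookup finds the
        // default in detail, and ADL adds any overload in the namespace of T.
        return is_passed_directly_in_onedpl_device_policies(t);
    }
};
} // namespace detail

// C++17 inline variable: one function object shared across translation units.
inline constexpr detail::is_passed_directly_fn is_passed_directly_in_onedpl_device_policies{};
} // namespace sketch_dpl

namespace user { // hypothetical user code
struct my_iterator {};
struct plain_iterator {};

// Customization found by ADL from inside the function object.
constexpr bool is_passed_directly_in_onedpl_device_policies(const my_iterator&) { return true; }
} // namespace user

// Qualified calls go through the function object, so no using declaration is needed.
static_assert(sketch_dpl::is_passed_directly_in_onedpl_device_policies(user::my_iterator{}));
static_assert(!sketch_dpl::is_passed_directly_in_onedpl_device_policies(user::plain_iterator{}));
```

Because the qualified name refers to an object rather than a function, the call cannot bypass ADL: the unqualified call always happens inside `operator()`.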

Contributor Author

(I was wrong to mention function pointers, I meant function objects)

```cpp
constexpr bool is_passed_directly_in_onedpl_device_policies(const T&);
```

oneDPL will provide a default implementation which will defer to the existing trait:
Contributor

What header should a user include to get the default implementation? Execution? I think it would be good to specify it here.

Contributor Author

It's a good question, and I think I agree that execution is the appropriate place.

Contributor

@rarutyun rarutyun Jan 16, 2025

<oneapi/dpl/traits> or oneapi/dpl/type_traits?

Contributor Author

Now that the RFC has evolved a bit to make the public API a trait rather than directly calling the customization point, I think oneapi/dpl/type_traits does make more sense than oneapi/dpl/execution.

Contributor

oneapi/dpl/type_traits makes sense to me as well.

Contributor

I think it should better go to <oneapi/dpl/iterator>.

type_traits is OK in principle, but, as far as I can tell, this thing is closer semantically to a concept representing an iterator category (like forward_iterator) or a requirement (like sortable) than to a generic type trait.

only offering customization point and trait as public API

6 participants