composing dataloaders #54
Currently we don't have a way to do what you are after. CompletableFutures give us no way to know how deeply they are nested, and hence how many times dispatch must be called. This is currently an unsolved problem.
CompletableFutures might do something else completely, so even if you had a way of inspecting their "nestedness", you still wouldn't know that their completion leads to another load(), or even multiple loads.
Both can be made optional using an extra parameter. And of course dispatchAll() should finish quickly when nothing needs to be done.
The solution to this problem is to modify the DataLoader …
@vojtapol In all honesty, if you modify the DataLoader to eagerly optimize fetching, you can also modify the original resolver function in the exact same way and drop DataLoader completely.
That would solve the immediate problem, but there might be other ways to obtain id2. Then that load would have to do a similar join, not taking advantage of the already loaded data2, so that would decrease the effectiveness of the cache and require more memory.
I noticed #46 which will solve this transparently.
One potential solution is to dispatch all pending data loaders, wait for the futures returned from the dispatched data loaders to complete, and then repeat the process to dispatch any new pending data loaders that come from the completed futures. In every round of dispatch, …
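As a rough illustration of this round-based approach, here is a self-contained sketch with a toy batching loader. The names `ToyLoader`, `Rounds`, and `dispatchAllAndJoin` are illustrative stand-ins, not the real java-dataloader API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Toy stand-in for a DataLoader: load() queues a key, dispatch() resolves
// the whole queue with one batch call.
class ToyLoader<K, V> {
    private final Function<List<K>, List<V>> batchFn;
    private final Map<K, CompletableFuture<V>> queued = new LinkedHashMap<>();

    ToyLoader(Function<List<K>, List<V>> batchFn) {
        this.batchFn = batchFn;
    }

    synchronized CompletableFuture<V> load(K key) {
        return queued.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    synchronized int depth() {
        return queued.size();
    }

    CompletableFuture<Void> dispatch() {
        List<K> keys;
        List<CompletableFuture<V>> futures;
        synchronized (this) {
            if (queued.isEmpty()) {
                return CompletableFuture.completedFuture(null);
            }
            keys = new ArrayList<>(queued.keySet());
            futures = new ArrayList<>(queued.values());
            queued.clear();
        }
        List<V> values = batchFn.apply(keys); // one batched fetch
        for (int i = 0; i < keys.size(); i++) {
            futures.get(i).complete(values.get(i)); // may queue new load() calls
        }
        return CompletableFuture.completedFuture(null);
    }
}

class Rounds {
    // Dispatch in rounds until no loader has pending keys: each round's
    // completions may enqueue fresh load() calls on other loaders.
    static void dispatchAllAndJoin(List<? extends ToyLoader<?, ?>> loaders) {
        while (loaders.stream().anyMatch(l -> l.depth() > 0)) {
            List<CompletableFuture<Void>> round = new ArrayList<>();
            for (ToyLoader<?, ?> loader : loaders) {
                round.add(loader.dispatch());
            }
            CompletableFuture.allOf(round.toArray(new CompletableFuture[0])).join();
        }
    }
}
```

With this loop, a chained call like `loader1.load(id1).thenCompose(id2 -> loader2.load(id2))` eventually completes, at the cost of every round being gated by the slowest loader in the previous one.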
@hahooy I know, I'm actually using a fully asynchronous version of this: #46 (comment)
For the record, just dispatching until the depth is <= 0 will work; however, it will have the opposite effect. It will cause fields that COULD be batched together to be eagerly dispatched, so you have "UNDER BATCHING" in this situation. The real trick is that you need to know WHEN a good time to dispatch is, and unlike, say, JavaScript, there is no "nextTick" time in a JVM. Actually, I have done testing on node.js and it can also UNDER BATCH, based on next tick firing before fields complete.
I don't think that is true, since the code waits for all dispatches to terminate before starting a new round. This guarantees that no new calls will be made on the data loaders. Unless, of course, multiple dispatchAllAndJoin() loops run in parallel... Another inefficiency in this method is that a new round has to wait for the slowest data loader; some dispatches could have started earlier. I imagine some heuristic where each data loader that has received requests waits some time before dispatching: the closer the number of requests is to the maximum batch size, the earlier the dispatch. The optimal mapping from number of waiting keys to wait time could be hand-tuned or "learned" automatically.
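Such a heuristic could be as simple as a linear mapping from batch fill to wait time. This is a hypothetical sketch (nothing the library provides; `waitBeforeDispatch` and its parameters are invented for illustration):

```java
import java.time.Duration;

class DispatchHeuristic {
    // Hypothetical heuristic: the fuller the batch, the shorter the wait.
    // An empty queue waits the full maxWait; a full batch dispatches at once.
    static Duration waitBeforeDispatch(int waitingKeys, int maxBatchSize, Duration maxWait) {
        double fill = Math.min(1.0, (double) waitingKeys / maxBatchSize);
        return Duration.ofMillis(Math.round(maxWait.toMillis() * (1.0 - fill)));
    }
}
```

A scheduler ticking over the loaders could then compare each loader's idle time against this per-loader deadline instead of a fixed threshold.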
Have Loaders use suspend functions instead of CompletableFutures, which integrates with the scope cancellation work in LC 1.14.x. Note that due to the way `java-dataloader` works, the model fetcher functions can't be async/suspend functions, as we need the dataloader.load() invocation to happen synchronously to ensure they precede any dispatch() calls; otherwise queries can hang forever, similar to: graphql-java/java-dataloader#54. Add tests to validate that the suspend-function-implemented loaders and the existing model fetchers work properly.
See … It allows the DataLoader to "tick" in the background and decide whether it is going to dispatch or not. The predicates you provide can dispatch on time or depth, or both, or you can make up your own predicates. This may help people compose dataloaders together such that dispatch is eventually called regardless of the field tracking in graphql, say.
I tried swapping the default … When executing a query that triggered the data fetcher calling the test, the server hung after calling … Has anyone gotten an example of this, or an alternative, to work?
I've run into the same limitation. So my workaround is a scheduled dispatcher (imports added here for completeness):

```java
import java.time.Duration;
import java.util.Collection;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.dataloader.DataLoader;
import org.dataloader.DataLoaderRegistry;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import lombok.extern.slf4j.Slf4j;

@Component
@Slf4j
public class ScheduledDataLoaderDispatcher {

    private final Queue<DataLoaderRegistry> globalRegistries = new ConcurrentLinkedQueue<>();
    private final Duration timeToDispatch;

    public ScheduledDataLoaderDispatcher(@Value("${app.dataLoader.timeToDispatch:500}") Integer timeToDispatch) {
        this.timeToDispatch = Duration.ofMillis(timeToDispatch);
    }

    public void addRegistry(DataLoaderRegistry dataLoaderRegistry) {
        globalRegistries.add(dataLoaderRegistry);
    }

    public void removeRegistry(DataLoaderRegistry dataLoaderRegistry) {
        globalRegistries.remove(dataLoaderRegistry);
    }

    @Scheduled(fixedRateString = "${app.dataLoader.dispatchTickRate:100}")
    public void dispatchAll() {
        globalRegistries.stream()
                .map(DataLoaderRegistry::getDataLoaders)
                .flatMap(Collection::stream)
                .filter(this::isDispatchNeeded)
                .forEach(DataLoader::dispatch);
    }

    private boolean isDispatchNeeded(DataLoader<?, ?> dataLoader) {
        // Dispatch once the loader has been idle longer than timeToDispatch.
        return timeToDispatch.compareTo(dataLoader.getTimeSinceDispatch()) < 0;
    }
}
```
This is pretty much how the JS implementation works. One thing I will say about the above is: since DataLoaders are per request, your scheduler Queue will grow to the size of the number of concurrent requests * the number of dataloaders per request. It's good that you have a way to remove registries.
Thank you for your reply! Correct, I also wrote an instrumentation to clean up the queue when a request completes:

```java
import java.util.Optional;

import graphql.ExecutionInput;
import graphql.ExecutionResult;
import graphql.execution.instrumentation.InstrumentationContext;
import graphql.execution.instrumentation.SimpleInstrumentation;
import graphql.execution.instrumentation.SimpleInstrumentationContext;
import graphql.execution.instrumentation.parameters.InstrumentationExecutionParameters;

import org.springframework.stereotype.Component;

import lombok.RequiredArgsConstructor;

@Component
@RequiredArgsConstructor
public class ScheduledDataLoaderInstrumentation extends SimpleInstrumentation {

    private final ScheduledDataLoaderDispatcher scheduledDataLoaderDispatcher;

    @Override
    public InstrumentationContext<ExecutionResult> beginExecution(InstrumentationExecutionParameters parameters) {
        return new SimpleInstrumentationContext<>() {
            @Override
            public void onCompleted(ExecutionResult result, Throwable t) {
                // Drop this request's registry so the scheduler stops ticking it.
                Optional.ofNullable(parameters.getExecutionInput())
                        .map(ExecutionInput::getDataLoaderRegistry)
                        .ifPresent(scheduledDataLoaderDispatcher::removeRegistry);
            }
        };
    }
}
```
Just checking in. Has anyone come up with a workable solution to this problem yet?
Trying to find one.
This is not composing dataloader calls per se; however, this PR may help others who want to write custom dispatchers.
I have created a variant on … This will allow chained calls to complete; however, the batching windows perhaps will not be as efficient as possible. See PR: #131
Re-opening because it's not natively supported - just worked around.
Hi,
I have a data fetcher where I use two dataloaders in sequence: the first to translate from one ID to another, the second to fetch the data corresponding to the second ID.

```java
loader1.load(id1).thenCompose(id2 -> loader2.load(id2))
```

This hangs because dispatchAll() is not called again after loader1 completes.
I can work around that by adding that call inside the thenCompose() lambda, but then it is called for every id2, which is ugly at the very least.
Is there a better way of doing this?
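The per-lambda workaround described above can be sketched with a toy loader (the `MiniLoader` and `Chained.lookup` names are illustrative stand-ins, not the real java-dataloader API):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Toy stand-in for a DataLoader, just enough to show the chaining problem.
class MiniLoader<K, V> {
    private final Function<K, V> fetchFn;
    private final Map<K, CompletableFuture<V>> queued = new LinkedHashMap<>();

    MiniLoader(Function<K, V> fetchFn) {
        this.fetchFn = fetchFn;
    }

    synchronized CompletableFuture<V> load(K key) {
        return queued.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // Resolves everything queued so far; later load() calls need a new dispatch.
    void dispatch() {
        Map<K, CompletableFuture<V>> batch;
        synchronized (this) {
            batch = new LinkedHashMap<>(queued);
            queued.clear();
        }
        batch.forEach((k, f) -> f.complete(fetchFn.apply(k)));
    }
}

class Chained {
    static CompletableFuture<String> lookup(MiniLoader<String, String> loader1,
                                            MiniLoader<String, String> loader2,
                                            String id1) {
        return loader1.load(id1).thenCompose(id2 -> {
            CompletableFuture<String> f = loader2.load(id2);
            loader2.dispatch(); // the ugly per-id2 dispatch that unblocks the chain
            return f;
        });
    }
}
```

Without the inner `loader2.dispatch()`, a single top-level dispatch would complete loader1's future but leave loader2's freshly queued key waiting forever, which is exactly the hang described. The cost of the workaround is one dispatch (and thus one batch of size 1) per id2.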