
composing dataloaders #54

Open
sheepdreamofandroids opened this issue Oct 10, 2019 · 21 comments
@sheepdreamofandroids

Hi,

I have a datafetcher where I use 2 dataloaders in sequence: the first to translate from 1 ID to another, the second to fetch data corresponding to the second ID.

loader1.load(id1).thenCompose(id2 -> loader2.load(id2))

This hangs because dispatchAll() is not called again after loader1 completes.
I can work around that by adding that call inside the thenCompose() lambda, but then it is called for every id2, which is ugly at the very least.
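To make the failure mode concrete, here is a minimal self-contained sketch. `MiniLoader` is a hypothetical stand-in for `DataLoader` (not the library class): `load()` queues a key and returns a future, `dispatch()` resolves everything queued so far.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical stand-in for org.dataloader.DataLoader: load() queues a key
// and returns a future, dispatch() resolves everything queued so far.
class MiniLoader<K, V> {
    private final Function<K, V> batchFn;
    private final Map<K, CompletableFuture<V>> queue = new LinkedHashMap<>();

    MiniLoader(Function<K, V> batchFn) { this.batchFn = batchFn; }

    CompletableFuture<V> load(K key) {
        return queue.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    void dispatch() {
        Map<K, CompletableFuture<V>> batch = new LinkedHashMap<>(queue);
        queue.clear();
        batch.forEach((k, f) -> f.complete(batchFn.apply(k)));
    }
}

public class ComposedHangDemo {
    public static void main(String[] args) {
        MiniLoader<String, String> loader1 = new MiniLoader<>(id -> id + "-mapped");
        MiniLoader<String, String> loader2 = new MiniLoader<>(id -> "data:" + id);

        CompletableFuture<String> result =
                loader1.load("id1").thenCompose(loader2::load);

        loader1.dispatch();                  // first dispatch round
        System.out.println(result.isDone()); // false: loader2 only now has a pending key
        loader2.dispatch();                  // a second round is needed to unblock the chain
        System.out.println(result.join());   // data:id1-mapped
    }
}
```

The second `load()` is only enqueued when the first future completes, so a single dispatch round can never see it; without someone triggering another dispatch, the composed future stays pending forever.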

Is there a better way of doing this?

@bbakerman
Member

Currently we don't have a way to do what you are after.

CompletableFutures give us no way to know how deeply they are nested and hence how many times dispatch must be called.

This is currently an unsolved problem.

@sheepdreamofandroids
Author

sheepdreamofandroids commented Oct 24, 2019

CompletableFutures might do something else completely, so even if you had a way of inspecting their "nestedness", you still wouldn't know that their completion leads to another load(), or even multiple.
I see two possible solutions:

  1. Whenever load() is called, start a timer to call dispatchAll() after 1 ms. If the timer is already running, delay it a bit.
  2. Whenever a batchload completes, complete all the futures and then call dispatchAll().

Both can be made optional using an extra parameter. And of course dispatchAll() should finish quickly when nothing needs to be done.
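Proposal 1 could be sketched roughly like this. `AutoDispatchLoader` is a hypothetical illustration, not the library's API: every `load()` re-arms a short timer, and the timer firing performs the dispatch, so a `load()` issued inside a `thenCompose()` simply re-arms it again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.*;
import java.util.function.Function;

// Hypothetical sketch of proposal 1, not part of java-dataloader:
// each load() (re)arms a ~1ms timer whose firing performs the dispatch.
class AutoDispatchLoader<K, V> {
    private final Function<K, V> batchFn;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final Map<K, CompletableFuture<V>> queue = new HashMap<>();
    private ScheduledFuture<?> pendingDispatch;

    AutoDispatchLoader(Function<K, V> batchFn) { this.batchFn = batchFn; }

    synchronized CompletableFuture<V> load(K key) {
        CompletableFuture<V> f = queue.computeIfAbsent(key, k -> new CompletableFuture<>());
        // Each new load pushes the dispatch out a little further, so closely
        // spaced loads still land in the same batch.
        if (pendingDispatch != null) pendingDispatch.cancel(false);
        pendingDispatch = timer.schedule(this::dispatch, 1, TimeUnit.MILLISECONDS);
        return f;
    }

    synchronized void dispatch() {
        Map<K, CompletableFuture<V>> batch = new HashMap<>(queue);
        queue.clear();
        batch.forEach((k, f) -> f.complete(batchFn.apply(k)));
    }

    void shutdown() { timer.shutdown(); }
}

public class AutoDispatchDemo {
    public static void main(String[] args) {
        AutoDispatchLoader<String, String> loader1 = new AutoDispatchLoader<>(id -> id + "-mapped");
        AutoDispatchLoader<String, String> loader2 = new AutoDispatchLoader<>(id -> "data:" + id);
        // No manual dispatch anywhere: the chained load arms loader2's own timer.
        System.out.println(loader1.load("id1").thenCompose(loader2::load).join());
        loader1.shutdown();
        loader2.shutdown();
    }
}
```

The 1 ms constant is arbitrary, and under a steady stream of loads the timer could keep being pushed back, which is exactly the tuning problem the proposal mentions.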

@vojtapol

vojtapol commented Nov 9, 2019

The solution to this problem is to modify loader1 so that nothing needs to be loaded in the .then() block. In a traditional relational database this would mean that loader1 would do some extra joins to obtain all the required data.

@kaqqao

kaqqao commented Nov 10, 2019

@vojtapol In all honesty, if you modify the DataLoader to eagerly optimize fetching, you can also modify the original resolver function in the exact same way and drop DataLoader completely.

@sheepdreamofandroids
Author

sheepdreamofandroids commented Nov 11, 2019

@vojtapol

The solution to this problem is to modify loader1 so that nothing needs to be loaded in the .then() block.

That would solve the immediate problem, but there might be other ways to obtain id2. Then that load would have to do a similar join, not taking advantage of the already loaded data2, which would decrease the effectiveness of the cache and require more memory.

@sheepdreamofandroids
Author

I noticed #46 which will solve this transparently.

@hahooy

hahooy commented Sep 7, 2020

One potential solution is to dispatch all pending data loaders, wait for the futures returned from the dispatched data loaders to complete, and then repeat the process to dispatch the new pending data loaders that come from the thenCompose chaining. This process can be repeated until all levels of pending data loaders are dispatched. A code example would be something like:

public class Dispatcher {
    private final List<DataLoader<?, ?>> dataLoaders;

    Dispatcher(List<DataLoader<?, ?>> dataLoaders) {
        this.dataLoaders = dataLoaders;
    }

    private int depth() {
        return dataLoaders.stream()
                .mapToInt(DataLoader::dispatchDepth)
                .sum();
    }

    void dispatchAllAndJoin() {
        while (depth() > 0) {
            // Dispatch all data loaders. This will kick off all batched tasks.
            CompletableFuture<?>[] futures = dataLoaders.stream()
                    .filter(dataLoader -> dataLoader.dispatchDepth() > 0)
                    .map(DataLoader::dispatch)
                    .toArray(CompletableFuture[]::new);
            // Wait for the futures to complete.
            CompletableFuture.allOf(futures).join();
        }
    }
}

In every round of dispatch, Dispatcher#dispatchAllAndJoin will be able to batch all tasks whose dependencies have been resolved by previous dispatches. This logic could potentially live in DataLoaderRegistry but clients can also just implement their own dispatching logic without changing the dataloader library.

@sheepdreamofandroids
Author

@hahooy I know, I'm actually using a fully asynchronous version of this: #46 (comment)

@bbakerman
Member

For the record, just dispatching until the depth is <= 0 will work, however it will have the opposite effect. It will cause fields that COULD be batched together to be eagerly dispatched. So you have "UNDER BATCHING" in this situation.

The real trick is that you need to know WHEN a good time to dispatch is, and unlike, say, JavaScript, there is no "nextTick" time in a JVM.

Actually I have done testing on node.js, and they can also UNDER BATCH based on the next tick firing before fields complete.

@sheepdreamofandroids
Author

For the record, just dispatching until the depth is <= 0 will work, however it will have the opposite effect. It will cause fields that COULD be batched together to be eagerly dispatched. So you have "UNDER BATCHING" in this situation.

I don't think that is true since the code waits for all dispatchers to terminate before starting a new round. This guarantees that no new calls will be done on the dataloaders. Unless of course multiple dispatchAllAndJoin() loops run in parallel...

Another inefficiency in this method is that a new round has to wait for the slowest dataloader. Some dispatches could have started earlier.

I imagine some heuristics where each dataloader that has received requests waits some time before dispatching. The closer the number of requests is to the maximum batchsize, the earlier the dispatch. The optimal mapping from number of waiting keys to wait time could be hand tuned or "learned" automatically.
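The heuristic in the paragraph above might map pending-key count to wait time something like this. The linear mapping and the constants are illustrative, not taken from the library:

```java
// Sketch of a dispatch-delay heuristic: the fuller the pending batch,
// the shorter the wait before dispatching. Linear mapping is illustrative.
public class DispatchDelayHeuristic {

    static long delayMillis(int pendingKeys, int maxBatchSize, long maxWaitMillis) {
        if (pendingKeys >= maxBatchSize) {
            return 0; // batch is full: dispatch immediately
        }
        double fillRatio = (double) pendingKeys / maxBatchSize;
        return Math.round(maxWaitMillis * (1.0 - fillRatio)); // emptier batch: wait longer
    }

    public static void main(String[] args) {
        System.out.println(delayMillis(2, 10, 100));  // 80 - nearly empty, wait long
        System.out.println(delayMillis(9, 10, 100));  // 10 - nearly full, wait briefly
        System.out.println(delayMillis(10, 10, 100)); // 0  - full, go now
    }
}
```

A learned version would replace the linear mapping with one fitted to observed load arrival patterns, as the comment suggests.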

josephlbarnett added a commit to josephlbarnett/quizzy that referenced this issue Sep 16, 2020
Have Loaders use suspend functions instead of
CompletableFutures which integrates with the scope
cancellation work in LC 1.14.x

Note that due to the way `java-dataloader` works, the
model fetcher functions can't be async/suspend
functions, as we need the dataloader.load() invocation
to happen synchronously to ensure they precede any
dispatch() calls; otherwise queries can hang forever,
similar to:
  graphql-java/java-dataloader#54

Add tests to validate the suspend function implemented
loaders and existing model fetchers work properly.
@bbakerman
Member

See org.dataloader.registries.ScheduledDataLoaderRegistry in the latest 3.x versions for some of the answers to the above.

It allows the DataLoader to "tick" in the background and decide if it is going to dispatch or not.

The predicates you provide can dispatch on time or depth or both. Or make up your own predicates.

This may help people compose dataloaders together such that dispatch is eventually called, regardless of, say, the field tracking in graphql.

@softprops

softprops commented Oct 17, 2021

I tried swapping the default DataLoaderRegistry with ScheduledDataLoaderRegistry in my graphql application and ran a simple loader1.load(id1).thenCompose(id2 -> loader2.load(id2)) test with the following configuration:

DispatchPredicate depthOrTimePredicate =
        DispatchPredicate.dispatchIfDepthGreaterThan(10)
            .or(DispatchPredicate.dispatchIfLongerThan(Duration.ofMillis(200)));

ScheduledDataLoaderRegistry.newScheduledRegistry()
              .dispatchPredicate(depthOrTimePredicate)
              .schedule(Duration.ofMillis(10))
              .register(...)
              .build();

When executing a query that triggered the data fetcher running this test, the server hung after calling loader1.load(id1), never calling loader2.load(id2).

Has anyone gotten an example like this, or an alternative, to work?

@MartinDevillers

MartinDevillers commented Nov 8, 2021

I've run into the same limitation and ScheduledDataLoaderRegistry didn't work for me. I think ScheduledDataLoaderRegistry serves a different use case: to make the overall dispatching strategy less eager by pushing dispatch attempts into the future. This still relies on dispatchAll to be called first, which doesn't happen in the scenario with nested loaders.

So my current approach (an ugly hack) is to have a separate scheduled task periodically check all in-flight data loaders and forcefully dispatch them if they haven't been dispatched within a preset time window (e.g. 500ms). This works to unstick nested data loaders, at the cost of naively triggering dispatches too early for long-running data loading tasks. I am not sure what the implications of that are, but my API has been working fine so far, so I'm happy 😎

@Component
@Slf4j
public class ScheduledDataLoaderDispatcher {

    Queue<DataLoaderRegistry> globalRegistries = new ConcurrentLinkedQueue<>();
    Duration timeToDispatch;

    public ScheduledDataLoaderDispatcher(@Value("${app.dataLoader.timeToDispatch:500}") Integer timeToDispatch) {
        this.timeToDispatch = Duration.ofMillis(timeToDispatch);
    }

    public void addRegistry(DataLoaderRegistry dataLoaderRegistry) {
        globalRegistries.add(dataLoaderRegistry);
    }

    public void removeRegistry(DataLoaderRegistry dataLoaderRegistry) {
        globalRegistries.remove(dataLoaderRegistry);
    }

    @Scheduled(fixedRateString = "${app.dataLoader.dispatchTickRate:100}")
    public void dispatchAll() {
        globalRegistries.stream()
                .map(DataLoaderRegistry::getDataLoaders)
                .flatMap(Collection::stream)
                .filter(this::isDispatchNeeded)
                .forEach(DataLoader::dispatch);
    }

    private boolean isDispatchNeeded(DataLoader<?, ?> dataLoader) {
        return timeToDispatch.compareTo(dataLoader.getTimeSinceDispatch()) < 0;
    }
}

@bbakerman
Member

This works to unstick nested data loaders, at the cost of naively triggering dispatches too early for long-running data loading tasks.

This is pretty much how the JS tick works for JavaScript data loaders. They can dispatch too early as well, but they never miss composed loaders, because eventually control is passed back and the tick will happen.

One thing I will say about the above is: since DataLoaders are per request, your scheduler Queue will grow to the size of the number of concurrent requests * the number of dataloaders per request.

It's good that you have a removeRegistry, because otherwise this would get unwieldy quickly with enough load.

@MartinDevillers

Thank you for your reply! Correct, I also wrote an instrumentation to clean up the globalRegistries after each request. And even with that mechanism in place, the queue can grow rapidly under heavy load. 500 milliseconds is a long time when your API is processing tens or hundreds of calls simultaneously. I still have to performance test my API to make sure this setup functions under load.

@Component
@RequiredArgsConstructor
public class ScheduledDataLoaderInstrumentation extends SimpleInstrumentation {

    private final ScheduledDataLoaderDispatcher scheduledDataLoaderDispatcher;

    @Override
    public InstrumentationContext<ExecutionResult> beginExecution(InstrumentationExecutionParameters parameters) {
        return new SimpleInstrumentationContext<>() {
            @Override
            public void onCompleted(ExecutionResult result, Throwable t) {
                Optional.ofNullable(parameters.getExecutionInput())
                        .map(ExecutionInput::getDataLoaderRegistry)
                        .ifPresent(scheduledDataLoaderDispatcher::removeRegistry);

            }
        };
    }
}

@softprops

Just checking in. Has anyone come up with a workable solution to this problem yet?

@Alex079

Alex079 commented Apr 8, 2022

Trying to find one.

@bbakerman
Member

This is not composing dataloader calls per se; however, this PR may help others who want to write custom dispatchers:

#128

@bbakerman
Member

bbakerman commented Sep 25, 2023

I have created a variant on ScheduledDataLoaderRegistry called "ticker mode" that will reschedule the dispatch() calls continuously in the background.

This will allow chained calls to complete, though the batching windows will perhaps not be as efficient as possible.

See PR: #131
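As a self-contained illustration of the rescheduling idea (not the actual ScheduledDataLoaderRegistry code), the ticker boils down to a loop that keeps rescheduling itself. The `Runnable` stands in for whatever flushes the registry, e.g. `registry::dispatchAll`:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of "ticker mode": dispatch runs on a fixed tick and reschedules
// itself, so loads queued inside a thenCompose() are picked up by a later
// tick instead of hanging. The real code lives in ScheduledDataLoaderRegistry.
class TickerDispatcher {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private volatile boolean running = true;

    void start(Runnable dispatchAll, long tickMillis) {
        scheduler.schedule(() -> tick(dispatchAll, tickMillis), tickMillis, TimeUnit.MILLISECONDS);
    }

    private void tick(Runnable dispatchAll, long tickMillis) {
        if (!running) return;
        dispatchAll.run();
        // Reschedule unconditionally: this is what guarantees chained loads
        // are eventually dispatched, at the cost of possibly small batches.
        scheduler.schedule(() -> tick(dispatchAll, tickMillis), tickMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        running = false;
        scheduler.shutdown();
    }

    public static void main(String[] args) throws Exception {
        TickerDispatcher ticker = new TickerDispatcher();
        ticker.start(() -> System.out.println("tick: dispatch all loaders"), 10);
        Thread.sleep(50);
        ticker.stop();
    }
}
```

Since the ticker runs for the lifetime of the request, it must be stopped when the request completes, much like the removeRegistry cleanup discussed above.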

@dondonz
Member

dondonz commented Jan 3, 2024

Closing this thread after pull requests #128 and #131

@dondonz dondonz closed this as completed Jan 3, 2024
@bbakerman
Member

Reopening because it's not natively supported, just worked around.

@bbakerman bbakerman reopened this Oct 29, 2024
9 participants