Implement streaming operator and API #858
Open
This first draft only works properly for the simplest examples, but I wanted to get feedback on the design sooner rather than later. Briefly, I'm introducing a new MyriaL statement `stream(relationVar)`, analogous to `store()` and `sink()`. Whenever a `stream()` statement appears, RACO inserts a `Stream` pseudo-operator, which is expanded into a chain of MyriaX operators `CollectProducer->CollectConsumer->StreamingSink` (the last is a pseudo-operator defined in MyriaX). On the MyriaX side, `StreamingSink` is expanded from its encoding to a `TupleSink` with a `PipeSink` instance as its `DataSink` member, so we can get an `InputStream` with the query results. In `StreamingSink.construct()`, the `PipeSink`'s `InputStream` is registered with the `QueryManager` under its query ID so we can retrieve it later and connect it to the HTTP `Response` object. In `QueryResource.postNewStreamingQuery()`, which is mapped to the new `/query/stream` endpoint, we retrieve all registered `InputStream`s from `PipeSink`s instantiated as part of a `StreamingSink` in the query plan, and form a `SequenceInputStream` which we pass to `ResponseBuilder.entity()`, so the HTTP client receives all outputs in the order in which their respective `stream()` statements appeared in the MyriaL query.

This does seem to work for simple queries like this:
But it fails with the connected components sample query from myria-web, and also with sequenced `stream()` statements like this:

I haven't diagnosed the CC query failure yet, but I think the sequence query failure is due to a simple deadlock. I think the Myria `Sequence` operator has to wait for each subquery to finish before running the next one, but that requires the client to consume all tuples, and somehow the `SequenceInputStream` that combines subquery results isn't flushed until the entire query has finished.
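
To make the suspected deadlock easier to reason about, here is a minimal standalone sketch, not Myria code. It assumes that `PipeSink` behaves like a `java.io.PipedOutputStream`/`PipedInputStream` pair, which I haven't confirmed against this PR; the class name `SequencedPipesSketch` and the helpers `startProducer` and `readAllInOrder` are hypothetical. Each "subquery" writes more bytes than its pipe buffer holds, so its writer blocks until a reader drains the pipe. The sketch shows the working case, where the combined `SequenceInputStream` is read while the producers are still running.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the PipeSink + SequenceInputStream arrangement.
final class SequencedPipesSketch {

  /** Starts a background writer (a stand-in for one subquery) and returns the pipe's read end. */
  static InputStream startProducer(String payload) throws IOException {
    // A 16-byte buffer, so any longer payload blocks the writer until a reader drains the pipe.
    PipedInputStream in = new PipedInputStream(16);
    PipedOutputStream out = new PipedOutputStream(in);
    Thread writer = new Thread(() -> {
      try (OutputStream o = out) {
        o.write(payload.getBytes(StandardCharsets.UTF_8));
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    });
    writer.setDaemon(true);
    writer.start();
    return in;
  }

  /** Concatenates the streams in list order and drains them on the calling thread. */
  static String readAllInOrder(List<InputStream> streams) throws IOException {
    try (InputStream combined = new SequenceInputStream(Collections.enumeration(streams))) {
      return new String(combined.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    List<InputStream> streams = new ArrayList<>();
    streams.add(startProducer("first subquery output, longer than the pipe buffer\n"));
    streams.add(startProducer("second subquery output, also longer than the buffer\n"));
    System.out.print(readAllInOrder(streams));
  }
}
```

If the reader were only started after every writer had finished (for example by joining the writer threads before calling `readAllInOrder`), the first writer would block forever on its full pipe. That is the same shape as the hypothesis above: `Sequence` waits for a subquery that cannot finish until the client consumes its tuples.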