Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-Table Support #398

Open
domoritz opened this issue May 23, 2024 · 6 comments
Open

Multi-Table Support #398

domoritz opened this issue May 23, 2024 · 6 comments
Labels
projects Project ideas for Mosaic

Comments

@domoritz
Copy link
Member

Mosaic assumes a single table right now but often data is spread across tables. Take for example a dataset with movies and actors. The tables could be joined but then the counts for histograms over movies or actors would become confusing. This project adds support for multiple tables by defining what table a selection is over and resolving the selection to the correct joins over other tables.

@domoritz domoritz added the projects Project ideas for Mosaic label May 23, 2024
@domoritz
Copy link
Member Author

In https://github.com/cmudig/Texture @willeppy has shown the value of being able to do queries over joined tables.

@willeppy
Copy link

More info about how I did it for that project...

I extended the selection in mosaic to track the table names for the table the selection comes from and is applied to (PR: willeppy#5)

  • If these are the same, then it is a normal selection like in current mosaic.
  • If they differ, then I wrap the selection in SQL exists clause before applying the filter to join the tables. For a more general implementation, I think there would need to be a way to specify how to join. For my use case, I only ever want to filter the current table to the values that exist in the selection from the other table (ignoring duplicates in the other table).

As an example:

  • I have a main_table_reviews and word_table. main_table_reviews has a primary key "id" and column "sentiment". word_table has foreign key "id" and column "word"
  • I have applied two filters. One to a chart pulling from word_table toword == "hello" and another filter through a different chart directly to the main_table_reviews.sentiment column.
  • This generates the following query to plot the data for main_table_reviews:
SELECT
    *
FROM
    "main_table_reviews"
WHERE
    EXISTS (
        SELECT
            1
        FROM
            "word_table"
        WHERE
            "main_table_reviews"."id" = "word_table"."id"
            AND "word" = 'hello'
    )
    AND "sentiment" = 'negative';

So I think it would be possible to extend this to a more generic implementation potentially where you specify how to join charts and what key to use for joins

One note: anecdotally this makes the client interactions slower right now because I think it messes up the indexing used by mosaic, I haven't looked into this very deeply yet though

@derekperkins
Copy link

This would save a lot of data transfer for us using a standard star schema design where we currently have to pre-join the data.

@declann
Copy link

declann commented Oct 29, 2024

There are usecases for customization about how selections are resolved within single tables too.

e.g. I have different versions of numbers in a single table (version | number), and I want to be able to brush big or small values for some version currently selected and then look at corresponding numbers across other version as I change version selection ($version set by a menu selection).

So I want my $size selection to be resolved to matching ids in the data; ids are already consistent across versions.

Currently I implement this using an interval interactor with as: $size but I don't use this in any filter, rather I listen for value events and run a query to remap the interval to a set of matched ids, which I set to an $ids selection; $ids drives filters in plots. This might be a workaround for other usecases? It is messy however, it would be much nicer if declarative.

An alternative for me might have been to transform my data to version_1_number | version_2_number... but then I'd need dynamic behaviour in interactors and more: I don't think this works.

@jheer
Copy link
Member

jheer commented Oct 29, 2024

Hi @declann, I'm preparing a PR that includes a new "region" interactor that allows you to make 2D selections that map to individual fields of selected elements (like ids) rather than intervals. This might fit your use case. If so, look for that soon!

@declann
Copy link

declann commented Oct 29, 2024

Hi @declann, I'm preparing a PR that includes a new "region" interactor that allows you to make 2D selections that map to individual fields of selected elements (like ids) rather than intervals. This might fit your use case. If so, look for that soon!

That's exactly what I want! 😃

Just to add then, already I use multiple tables fine in Mosaic: a summary and a detail table. As long as the selections as I apply them to tables can find corresponding fields, everything works (selections are table-agnostic and this is good here).

Selection resolution is the more helpless bit atm (without workaround/"region" interactor), but I might be missing something.

Looking forward to trying that interactor whenever it's available to test!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
projects Project ideas for Mosaic
Projects
None yet
Development

No branches or pull requests

5 participants