80 recommend movies #81

Open
wants to merge 53 commits into main



jhanley634
Collaborator

This PR puts the Netflix Prize data into a format where it can be conveniently queried and subsetted. Call me crazy, but I tend to think in JOINs, and I appreciate having an RDBMS sweat the memory management details for cases where not everything will conveniently fit in core. So SQLAlchemy is managing some SQLite tables. They are safe to DROP, or alternatively just rm out/movies.sqlite and they're all gone; either way they will be rebuilt on the next run.
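To illustrate the "think in JOINs" style of query, here is a minimal sketch. It uses the standard-library sqlite3 module rather than SQLAlchemy, purely to stay dependency-free, and an in-memory database rather than out/movies.sqlite. The table names, column names, and rows are hypothetical and are not taken from this PR's schema.

```python
import sqlite3

# In-memory stand-in for the on-disk database; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rating (movie_id INTEGER, customer_id INTEGER, rating INTEGER);
    """
)
conn.executemany(
    "INSERT INTO movie VALUES (?, ?)",
    [(1, "Movie A"), (2, "Movie B")],
)
conn.executemany(
    "INSERT INTO rating VALUES (?, ?, ?)",
    [(1, 100, 4), (1, 101, 5), (2, 100, 2)],
)

# Join ratings to titles and summarize per movie, best-rated first.
rows = conn.execute(
    """
    SELECT m.title, COUNT(*) AS n_ratings, AVG(r.rating) AS mean_rating
    FROM movie AS m
    JOIN rating AS r ON r.movie_id = m.id
    GROUP BY m.id, m.title
    ORDER BY mean_rating DESC
    """
).fetchall()

for title, n_ratings, mean_rating in rows:
    print(f"{title}: {n_ratings} ratings, mean {mean_rating:.2f}")
```

The database does the aggregation, so only the summary rows need to fit in memory.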

This PR also trains a LightFM model on (some of) the prize data, and shows how to make predictions. The predictions are, frankly, unimpressive, but I figured we need something in the code base that demonstrates how to do it, so we can all build upon it. The tests run quickly, to support an interactive edit-run cycle.


To ingest everything, change max_rows in etl("mv_00*.txt", max_rows=1_000_000) from 1 M rows to 101 M rows. The ingested data is then preserved: we skip ETL if a table is already populated.
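The "skip ETL if the table is already populated" idea can be sketched as follows. This is an illustration, not the etl() function in this PR: etl_sketch, parse_ratings, and the rating table layout are hypothetical names, the sketch takes an iterable of lines instead of a file glob, and it uses the standard-library sqlite3 module instead of SQLAlchemy. It assumes the prize file layout of a "movie_id:" header line followed by "customer_id,rating,date" lines.

```python
import itertools
import sqlite3
from collections.abc import Iterable, Iterator


def parse_ratings(lines: Iterable[str]) -> Iterator[tuple[int, int, int, str]]:
    """Yield (movie_id, customer_id, rating, date) from prize-format lines."""
    movie_id = -1  # overwritten by the first "movie_id:" header line
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            movie_id = int(line[:-1])
        else:
            customer_id, rating, date = line.split(",")
            yield movie_id, int(customer_id), int(rating), date


def etl_sketch(
    conn: sqlite3.Connection, lines: Iterable[str], max_rows: int = 1_000_000
) -> int:
    """Load at most max_rows ratings; return the number of rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rating ("
        "movie_id INTEGER, customer_id INTEGER, rating INTEGER, rating_date TEXT)"
    )
    # An already-populated table means an earlier run did the work: skip.
    if conn.execute("SELECT 1 FROM rating LIMIT 1").fetchone():
        return 0
    rows = itertools.islice(parse_ratings(lines), max_rows)
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT INTO rating VALUES (?, ?, ?, ?)", rows)
    (count,) = conn.execute("SELECT COUNT(*) FROM rating").fetchone()
    return count


# Made-up sample lines in the prize layout.
sample = ["1:", "100,3,2005-09-06", "101,5,2005-05-13", "2:", "102,4,2005-09-05"]
conn = sqlite3.connect(":memory:")
first_run = etl_sketch(conn, sample, max_rows=2)  # inserts the first 2 rows
second_run = etl_sketch(conn, sample)  # table is populated, so nothing is loaded
print(first_run, second_run)
```

Note one consequence of this check: after a run with a small max_rows, raising the limit has no effect until the table is dropped or the database file is removed.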

The "give me two movies you like and I will recommend some more" feature is frankly still aspirational at this point, but we're not far from it. That's where two_movies_rec_test.py got its name.

There are some design notes in make_recommendation.py, which may serve as the basis for future PRs. Once this merges down to main, it's possible we will view the lightfm_recommendation.py module as obsolete and slated for deletion. It did offer me some inspiration for the current work.

I'm happy to accept any and all comments, but please be sensitive to what is in-scope for the current PR as opposed to a subsequent PR. It would be useful to get this merged down, with light edits, this week.

…ot entirely clear what its signature ought to be, but this seems like a decent guess
…hing". Also, DRY up the docstring so it doesn't repeat what the signature already told us.
@jhanley634 linked an issue on Feb 6, 2025 that may be closed by this pull request
Successfully merging this pull request may close these issues.

recommend movies