80 recommend movies #81

Open: jhanley634 wants to merge 53 commits into main from 80-recommend-movies
This PR puts the Netflix Prize data in a format where it can be conveniently queried and subsetted. Call me crazy, but I tend to think in JOINs, and I appreciate having an RDBMS sweat the memory-management details for cases where not everything will conveniently fit in core. So SQLAlchemy is managing some SQLite tables. They are safe to DROP, or alternatively just

rm out/movies.sqlite

and they're all gone; they will be rebuilt on the next run.

This PR also trains a LightFM model on (some of) the prize data, and shows how to make predictions. The predictions are, frankly, unimpressive, but I figured we need something in the code base that demonstrates how to do it, so we can all build upon it. The tests run quickly, to support an interactive edit-run cycle.
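For readers new to the raw files, here is a minimal sketch of the kind of ETL described above. It assumes the Netflix Prize training-set layout, where each mv_*.txt file starts with a "MovieID:" line followed by "CustomerID,Rating,Date" rows. It uses the standard-library sqlite3 module rather than SQLAlchemy so that it stays self-contained, and the names parse_ratings, load_ratings, and the rating table with its columns are hypothetical, not the PR's actual code or schema. Like the behavior described for this PR, it stops after max_rows and skips loading when the table is already populated.

```python
import glob
import sqlite3
from collections.abc import Iterator
from pathlib import Path


def parse_ratings(path: Path) -> Iterator[tuple[int, int, int, str]]:
    """Yield (movie_id, user_id, rating, date) tuples from one Netflix Prize file."""
    movie_id = None
    with open(path) as fin:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            if line.endswith(":"):
                # A header line such as "1:" gives the movie ID for the rows that follow.
                movie_id = int(line[:-1])
                continue
            user_id, rating, date = line.split(",")
            yield movie_id, int(user_id), int(rating), date


def load_ratings(conn: sqlite3.Connection, pattern: str, max_rows: int) -> int:
    """Hypothetical ETL: load at most max_rows ratings, skipping if already populated."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rating ("
        " movie_id INTEGER, user_id INTEGER, rating INTEGER, date TEXT)"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM rating").fetchone()
    if count > 0:
        return count  # Table is already populated, so skip the ETL entirely.

    def rows() -> Iterator[tuple[int, int, int, str]]:
        # Stream rows from every matching file, stopping once max_rows is reached.
        n = 0
        for path in sorted(glob.glob(pattern)):
            for row in parse_ratings(Path(path)):
                if n >= max_rows:
                    return
                yield row
                n += 1

    with conn:  # Commit the whole load as a single transaction.
        conn.executemany("INSERT INTO rating VALUES (?, ?, ?, ?)", rows())
    (count,) = conn.execute("SELECT COUNT(*) FROM rating").fetchone()
    return count
```

A call such as load_ratings(sqlite3.connect("out/movies.sqlite"), "mv_00*.txt", max_rows=1_000_000) would load the first million ratings once; later calls return the existing row count without re-reading the files. Streaming rows through a generator into executemany keeps memory use flat regardless of max_rows.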
To ingest everything, change the max_rows argument in

etl("mv_00*.txt", max_rows=1_000_000)

from 1 M rows to 101 M rows. The ingested data is then preserved: we skip ETL if a table is already populated.

The "give me two movies you like and I will recommend some more" feature is frankly still aspirational at this point, but we're not far from it. That's where two_movies_rec_test.py got its name.

There are some design notes in make_recommendation.py, which may serve as the basis for future PRs. Once this merges down to main, it's possible we will view the lightfm_recommendation.py module as obsolete and slated for deletion. It did offer me some inspiration for the current work.

I'm happy to accept any and all comments, but please be sensitive to what is in scope for the current PR as opposed to a subsequent PR. It would be useful to get this merged down, with light edits, this week.