Fix path of pickle files (broken so far) #273
Codecov Report
@@           Coverage Diff           @@
##           master     #273   +/-   ##
=======================================
  Coverage   48.43%   48.43%
=======================================
  Files          17       17
  Lines        1053     1053
=======================================
  Hits          510      510
  Misses        543      543
@@ -20,7 +20,7 @@ def fit_algo(algo_name, queryset, backup_filename):
     anonymized = dataset.make_anonymous_data(queryset)
     algo.set_parameters(anonymized.nb_users, anonymized.nb_works)
     algo.fit(anonymized.X, anonymized.y)
-    if algo_name in {'svd', 'als', 'wals'}:  # KNN is constantly refreshed
+    if algo_name in {'svd', 'als'}:  # KNN is constantly refreshed
Why is WALS removed?
The answer is in the commit title: “Since TensorFlow 1.0, MangakiWALS models cannot be saved anymore” ^^'
That's definitely right. My bad for not reading your commit title! But you should have summarized it in your PR message, maybe?
It is now an issue :) #274
mangaki/mangaki/utils/common.py
@@ -13,12 +13,20 @@ def __init__(self):
         self.nb_users = None
         self.nb_works = None

+    def get_backup_path(self, filename):
+        return os.path.join(PICKLE_DIR, self.get_backup_filename())
filename is an unused parameter. Consider using @property, or actually use the parameter.
You scared me. Actually, yeah, we need to be careful not to overwrite it.
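One way to address the unused `filename` parameter noted above is to let `get_backup_path` honour the argument and fall back to the default filename only when none is given. This is a minimal sketch, not the code that was eventually merged: the `PICKLE_DIR` value is a stand-in, and giving `filename` a default of `None` is an assumption.

```python
import os

# Assumption: stand-in value; the real PICKLE_DIR is defined in mangaki.utils.common.
PICKLE_DIR = 'pickles'


class RecommendationAlgorithm:
    def get_shortname(self):
        return 'algo'

    def get_backup_filename(self):
        return '%s.pickle' % self.get_shortname()

    def get_backup_path(self, filename=None):
        # Use the caller's filename when given; otherwise use the default one.
        if filename is None:
            filename = self.get_backup_filename()
        return os.path.join(PICKLE_DIR, filename)
```

With this shape, a caller that passes an explicit filename can no longer end up reading or overwriting the default backup by accident.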
+    def has_backup(self, filename=None):
+        if filename is None:
+            filename = self.get_backup_filename()
+        return os.path.isfile(self.get_backup_path(filename))
What about: filename or self.get_backup_filename()?
Too long, I think.
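For reference, the two spellings discussed above behave identically except for falsy values other than `None`, such as an empty string. A small sketch, where `DEFAULT_FILENAME` and both helper functions are hypothetical stand-ins for `self.get_backup_filename()` and the body of `has_backup`:

```python
# Assumption: stand-in for the value returned by self.get_backup_filename().
DEFAULT_FILENAME = 'algo.pickle'


def resolve_with_none_check(filename=None):
    # Explicit form used in the diff: only None triggers the default.
    if filename is None:
        filename = DEFAULT_FILENAME
    return filename


def resolve_with_or(filename=None):
    # Shorter form suggested in review: any falsy value triggers the default.
    return filename or DEFAULT_FILENAME
```

Either form is fine when callers only ever pass `None` or a real filename; the explicit check is the stricter of the two.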
@@ -29,5 +37,8 @@ def set_parameters(self, nb_users, nb_works):
     def get_shortname(self):
         return 'algo'

+    def get_backup_filename(self):
+        return '%s.pickle' % self.get_shortname()
Consider using @property.
If it's a property, I should call it backup_filename, right? (without the get)
Yes. I think that either a function or a property is fine here.
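The property variant discussed in this thread could look roughly like the sketch below. The name `backup_filename` comes from the comment above; the `ExampleSVD` subclass is hypothetical and only shows that subclasses overriding `get_shortname` get the right filename for free.

```python
class RecommendationAlgorithm:
    def get_shortname(self):
        return 'algo'

    @property
    def backup_filename(self):
        # Read as an attribute (algo.backup_filename), hence no "get_" prefix.
        return '%s.pickle' % self.get_shortname()


class ExampleSVD(RecommendationAlgorithm):
    # Hypothetical subclass, for illustration only.
    def get_shortname(self):
        return 'svd'
```

Callers would then write `algo.backup_filename` instead of `algo.get_backup_filename()`; as noted in the review, keeping a plain method is equally acceptable.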
@@ -3,6 +3,7 @@
 import random
 from collections import Counter, namedtuple
 from mangaki.utils.values import rating_values
+from mangaki.utils.common import PICKLE_DIR
IMHO, that should go in Django Settings.
OK, but please, in a future commit, once we have added the tests.
    dataset.load('ratings-' + backup_filename)
    algo = ALGOS[algo_name]()
    if algo.has_backup():
        algo.load(algo.get_backup_filename())
If the default filename is used frequently, consider making it the default argument of the load method.
I just think I can't write algo.load() because it reads oddly, don't you think?
I think a bare algo.load() is a bit clunky, but why not; if we need a function that does this, algo.try_load_backup() or algo.load_backup_if_exists() might be appropriate. I also think keeping the if has_backup check is OK for now.
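A helper along the lines of the suggested `load_backup_if_exists()` might be sketched as follows. Everything about `save` and `load` here is an assumption made for the sake of a runnable example (pickling the instance `__dict__`, the optional `filename` argument), and `PICKLE_DIR` points at a temporary directory rather than the project's real pickle folder.

```python
import os
import pickle
import tempfile

# Assumption: a temporary directory stands in for the project's PICKLE_DIR.
PICKLE_DIR = tempfile.mkdtemp()


class RecommendationAlgorithm:
    def __init__(self):
        self.nb_users = None
        self.nb_works = None

    def get_shortname(self):
        return 'algo'

    def get_backup_filename(self):
        return '%s.pickle' % self.get_shortname()

    def get_backup_path(self, filename):
        return os.path.join(PICKLE_DIR, filename)

    def has_backup(self, filename=None):
        if filename is None:
            filename = self.get_backup_filename()
        return os.path.isfile(self.get_backup_path(filename))

    def save(self, filename=None):
        # Assumption: the model state is simply the instance attributes.
        if filename is None:
            filename = self.get_backup_filename()
        with open(self.get_backup_path(filename), 'wb') as f:
            pickle.dump(self.__dict__, f)

    def load(self, filename=None):
        if filename is None:
            filename = self.get_backup_filename()
        with open(self.get_backup_path(filename), 'rb') as f:
            self.__dict__.update(pickle.load(f))

    def load_backup_if_exists(self):
        # Returns True when a backup was found and loaded, False otherwise.
        if not self.has_backup():
            return False
        self.load()
        return True
```

The call site would then shrink to a single `algo.load_backup_if_exists()`, at the cost of one more method on the base class; the explicit `if algo.has_backup():` form kept in the PR stays just as valid.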
I think this PR is a perfect example of what we talked about at the last meeting: the requested changes are mostly cosmetic and somewhat subjective, and thus I don't think they warrant a second round of review for the sake of moving forward. @jilljenn @RaitoBezarius Would you agree that in cases such as this one (and especially when the changes come from an active community member - as opposed to a new contributor), it would be more productive to potentially make such comments when they provide value overall, but still accept the changes and let the author address some or all of them at their discretion? I think it would help prevent some PRs from becoming long pointless discussions about style, but happy to hear your thoughts on this.
Yes, that's great. Possibly the green check will motivate the PR's author to do as much extra work as they can. (Or they will just click the big green button and completely ignore the useful messages.)
@Elarnon I definitely agree with what you said, it's just a difficult habit of mine to comment on everything I can…
Maybe I explained myself poorly - the idea is not to prevent people from making comments they think improve the PR as a whole, but rather to try and make the PR process less resource-intensive in terms of developer time (which we have really little of). If we look at the comments in this PR we have:
In my opinion, none of these are "blockers" for this code: they are valid comments, they are small and easy changes, and I think it would make sense to say "this is good for me as long as you make those changes (except if you find them unwarranted), I trust you from there", i.e. not request another round of review. If we find out that people end up being lazy and not addressing those comments, we can address the issue at that point. Btw, I will reiterate that I am proposing those process changes only for regular contributors - but in any case this GitHub issue is probably not the right place to discuss this, let's move the discussion to Slack (yes I know I'm the one who started the discussion here, and it was a mistake :-( ).
Alright, I didn't understand it this way, let's move the discussion on Slack.
* requirements: move pandas, sklearn to production due to DPP recent introduction (#262)
* Link to meta.mangaki.fr deleted (#266)
* Add all recommendation algorithms to production (#265)
* Unify KNN
* Add __init__.py in test folder
* Allow ANY algorithm in production
* Take reviews into account
* Last changes
* Create Dataset class and fit_algo management command (#268)
* Factorize algorithms, create Dataset class and fit_algo management command
* Fix bug
* Remove useless code
* Add migration (#269)
* Add new class RecommendationAlgorithm and refactor algorithms (#270)
* Add new class RecommendationAlgorithm and refactor algorithms
* Fix typo, BTW this PR fixes #267
* Add comment
* Cleanup
* nb_users and nb_works are instance attributes
* Add a template for settings.ini (#272)
* Remove first_name and last_name from Artist (#271)
* Remove first_name and last_name from Artist
* Remove from model as well
* Enable anonymous user to rate works (#277)
  This is a first step towards the resolution of #59. It allows anonymous users to rate works, and keeps the state of the rated works across consecutive page loads in Django's session. For now, the usage of this information is limited: although anonymous user can rate works, they cannot get recommendations from those ratings, transfer them to a new or existing account, etc. Support for these will come in subsequent patches.
* Fix path of pickle files (broken so far) (#273)
* Since TensorFlow 1.0, MangakiWALS models cannot be saved anymore
* Fix pickle path everywhere
* Add minimal test for recommendations
* Fix unused parameter
* Enable anonymous users to get recommendations (#278)
  This is a second step towards the resolution of #59. It allows anonymous users to get recommendations (using the KNN algorithm only, since we do not have trained models for them). This is implemented by making the various algorithms take directly a list or (user_id, work_id, choice) triplets instead of building those from a queryset, as well as some glue to add the relevant triplets from anonymous users at the interface between the database and the algorithms themselves.
* Update seed data (unreviewed)
  The seed_data.json was not updated after first_name and last_name were removed from Artist. This fixes that.
* Simplify circle-ci integration (#281)
* Simplify circle config
* Fix matplotlib.pyplot
* Speculative improvements
* relative paths...
* Move coverarc to mangaki
* Try enabling test reports
* Test reports, take 2
* Parallel run is slower
* Remove the Profile.score field (#283)
  This provides little information (especially considering how cryptic what it does is), and the way it is computed gets in the way whenever we want to change things relative to ratings. Note that it would be easy to add this again if the needs arises as even though the value is dynamically updated in various places, it is easy to re-compute it from the sets of Suggestions and Recommendations the user submitted.
* Completely remove discourse from Mangaki (#282)
  We are not using it anymore. Fixes #264.
* Allow import of anonymous ratings on account creation (#279)
  Allow users without an existing account to import anonymous ratings upon account creation. The backend logic behind this is relatively simple and simply requires overriding django-allauth's signup view to move the anonymous ratings from the session to the database, associating them to the freshly created user. On the frontend side, this adds:
  - On the signup page, if the user has made any anonymous ratings, we add a checkbox allowing them to import their ratings (active by default), as well as a condensed view of their existing ratings. This leverages the works_no_poster.html view and allows the user to change their anonymous ratings directly on the signup page if they wish to do so.
  - On the login page, if the user has made any anonymous ratings, we add a warning message telling them that they will lose their anonymous ratings upon login (even though technically, they lose it upon logging out) and that they should create a new account if they want to keep those ratings. In the future, this should be replaced with a mechanism to merge existing anonymous ratings with the user's existing ratings.
  As a byproduct, this patch includes a slight re-design of the login and signup pages, and changes the library we use for displaying bootstrap forms from django-bootstrap-form to django-bootstrap3. django-bootstrap3 provides better hooks to customize forms and made said re-design much easier than it would have been with django-bootstrap-form (in addition, it is [faster](zostera/django-bootstrap3#160))
* Enable AniDB testing through mocking using responses library (#285)
* tests: Enable AniDB testing through mocking using responses library
  → Minor fixes in `anidb.py` file (default arguments)
  → Add fixture directory and test fixture directory in settings.
  → Use test fixture directory to feed mocking (fresh fixture from AniDB!)
  → Test `AniDB.search`.
  → Test `AniDB.get_dict`.
  → Add AniDB constants inside the class for convenience purpose.
  → Reformatting and imports optimization.
* Some April cleanup! (#289)
* WorkList: use a property for `is_dpp`
* management command: fix analytics
* knn: fix constructor and reformat according to PEP8.
* chrono: remove unused keyword argument.
* imports: clean unused imports.
* knn: remove the ugly [\n and put a comment on its own line
* knn: moar formatting implementation
* management command: remove analytics
* anonymous-ratings: Review page, clear all ratings, remainder banner (#287)
* anonymous-ratings: Review page, clear all ratings, remainder banner
  - Add a remainder banner of anonymous ratings.
  - Add a review page by reusing `get_profile` view.
  - Add a way to clear all ratings (ugly link)
* chore: reformat for better readability
* views: enforce POST to get CSRF token check and return proper JSON (#293)
* Reformat and remove useless classes in admin (#294)
* admin: reformat code and remove useless classes
* admin: improve code style
* perf/views: select category to prevent performance issue (#292)
* work-list: fetch int_poster also to prevent duplicated queries (#295)
* Update about and events (#286)
* Update about and events
* Add videos
* Fix Soubi and Elarnon
* Fix OpenGraph and Twittercard
* Okay, it was a bad idea
* Remove TensorFlow logging at once
* Let's forget about removing logging
* Factor recommendation algorithms to handle backups (#276)
* Since TensorFlow 1.0, MangakiWALS models cannot be saved anymore
* Fix pickle path everywhere
* Add minimal test for recommendations
* Fix unused parameter
* Add notebook for benchmarks for cold-start
* Fix KNN
* Add LinearRegression for cold-start
* Add MangakiProxyDPP
* Cleanup compare management command
* Edit experiment
* Add computation of delta and update for ALS
* Add BGS from Anava et al.'s WWW 2015 paper
* Fix BGS, withdraw paper from RecSys
* Fix NMF
* Add Autoencoder with TensorFlow
* Now users receive SVD if it exists, otherwise KNN
* Order whole rating list by any algorithm
* Finally manage those damn logs
* Add logging and fix paths
* Fix DPP and clean a lot of code
* Remove useless print
* Oops, forgot one file
* Fix problems linked to merge
* Add an admin interface for merging works (#299)
* Add an admin interface for merging works
* Factor code
* Add minor changes
* Fix bug, and Language to admin
* Take Raito changes into account
* Fix bugs and display number of ratings of each work
* [Hotfix] MAL imports (#301)
* mal-import: refactor the whole code into something more maintainable
  - Fix doctor's fields and use MALEntry.
  - MAL:
    - Refactor the whole module using a requests.Session and instantiating a MALClient which handles the API operations.
    - Rewrite the DB interaction to be more efficient and fix it in some edge cases.
* add a default user-agent
* mangaki: fix when MAL import is not available
* mal: add some tests and refactor duplicated / unused code
* mal: rewrite the doctests with None
* mal: add missing fixtures
* Hotfix, KNN did not work
* Improve MAL usage in Mangaki (#302)
* mal-import: refactor the whole code into something more maintainable
  - Fix doctor's fields and use MALEntry.
  - MAL:
    - Refactor the whole module using a requests.Session and instantiating a MALClient which handles the API operations.
    - Rewrite the DB interaction to be more efficient and fix it in some edge cases.
* add a default user-agent
* mangaki: fix when MAL import is not available
* mal: add some tests and refactor duplicated / unused code
* mal: rewrite the doctests with None
* mal: add missing fixtures
* MAL: add synonyms and prepare for genres / types
  → Make language a nullable field for WorkTitle.
  → Added the related migration.
* Hotfix profile page with user recommendations
* Hotfix MAL
* [Hotfix] Toggling ratings (+test!) (#306)
* Fix toggle rating and add test
* reducing current_user_set_toggle_rating does too many queries
* Upgrade to Django 1.11
* Revert "Upgrade to Django 1.11"
  This reverts commit 4f58bc4. Meant to ask for review ;-)
* Upgrade to Django 1.11 (#315)
* Add attribute research_ok to Profile (#316)
* Add attribute research_ok to Profile
* Handling tokens
* ADD TESTS §§§
* Add HASH_NACL to Circle settings
* ADD TESTS FOR REAL
* Remove print
* Use built-in Django functions for hashing
* Add token utils
* Salt is not a secret anymore
* Put salt at the correct place
* Fix settings
* Use None instead of 0 for anonymous user identifier.
  Previously, we were using 0 for the identifier of an anonymous user, except that it made us use a `current_user_id` variable for tracking that and it was easy to just use (erroneously) `request.user.id` instead. Since there actually are no requirements on `current_user_id` being a numeric value, this patch makes it so that we use `None` instead, since that is the value of `user.id` for an anonymous user, and removes `current_user_id` altogether. Fixes #317.
* [Hotfix] Handle recommended ghost works (#323)
* Non-anonymous users don't have anonymous ratings (#324)
  While we keep them until the session ends for technical reasons, they shouldn't be shown.
* Ajout du meta description (#325)
* Add link in profile for sorting any wishlist (#326)
* Fix #322, one could not see their own profile if private
* Add link in profile for sorting any wishlist
* Send templated mail with tokens (#327)
* Fix CSS for footer navbar
* Fix error handling in the research view
* Send mails with tokens
* I mean everyone
* Add jinja2 to production requirements
* Add os.access
* List users in tokens mgmt command
* Only require tensorflow when needed (#329)
  TensorFlow is only required by the WALS algorithm, and loading the library has a prohibitive overhead on lower-powered devices. Let's only load it when the WALS algorithm is needed, which for now is only when running the algorithm, which shouldn't be done on the webserver anyways.
* Clarify questions for usage of ratings in Kyoto challenge (#330)
* Kill Doctor management command (#333)
* Better Ansible setup (#321)
  This gives a relift to the Ansible setup, allowing several things that were not included in the previous setup:
  - Possibility to provision a development machine (e.g. Vagrant box)
  - Possibility to easily dump & restore a database
  - Multi-site deployment (deployments are identified by name; except for some global nginx and postgresql configuration that shouldn't change anyways, care has been taken to ensure all other settings are properly isolated so that several deployments with different `mangaki_name` values can co-exist on the same machine)
  - git-free deployment; Mangaki can now be built as a PIP package that is copied to the remote machine for installation
  This deployment is implemented as a single playbook that was made as declarative as possible; tags and (Ansible's) environment variables should be used to run the few actions that should be run (collectstatic, dumping or loading a database, etc.)
* Remove unused test.css file. (#335)
  Also, why did we have executable CSS files? WTF.
* Make French great again (space before '!' become ) (#332)
* add at good places and make french great again
* typography: add a missing
* Improve drastically MAL import performance (#311)
* models(work,fts): add full text search fields for Work / WorkTitle
* mal: optimise the hell out of the MAL import procedure
  - Use full text search to match all synonyms / titles to prevent duplicates.
  - Get all existing ratings and then, deduce which ratings to add (not fully working, some ratings still appears even though they're already in DB!).
  - Import your MAL at the speed of the light without any guarantee on future violations of theoretical physics.
  - Possible improvements: batch full text search queries and deduce a subset of works to get from MAL (also figure out a good batch_size).
* migrations: add GIN indexes / triggers / ts vectors fields on Work / WorkTitle
* Address PR comments and correct a test using search
* mal: add some tests using hypothesis
  Fix also a bug with empty synonyms fields.
* mal(tests): Use subtests for status code testing
* mal(utils): Explicit the class behind the namedtuple
* migrations: add merge migrations between research OK and MAL data
* Address PR comments
* House-keeping of gitignore and dependencies (#341)
* gitignore: ignore build files when packaging Mangaki
* djdt: bump to 1.8
* Add a data migration to fix MAL external posters (#347)
  Move to new MAL's CDN.
* Reduce the amount of queries for ratings selection (#348)
* Load all ratings only at fitting time (recommendations) (#340)
* recommendations: load all ratings if only and only we will fit the algorithm (SVD, ALS, WALS, …)
* Proper handling of algo_name
* Add WorkCluster class for merging works (dedupe, suggestions, or merge from admin) (#307)
* Add WorkCluster to admin
* Add migration
* Allow call to merge from WorkCluster admin
* Add test for merge action
* Add all_objects manager and format_html
* Add migration file for manager
* Butchering migrations
* Make a better test for merging, in order to reduce the number of queries
* Cover all merge functions with tests, reduce the number of queries which was tremendous
* Resolve multiplication of migration leaves
* Cut line
* Use an array rather than string for source_domains on mangaki_dev group vars (#352)
* Kill /users route (#354)
* Kill /users route
* Remove user_list template
* Use timezone-aware now rather than naive datetime (#355)
* [Hotfix] Make posters refreshable once again (fix get_potential_poster) (#357)
* Add get_entry_from_work and fix get_potential_posters for admin
* Address PR comments
* Thanks Django ; I thought I didn't know how to compute in Boole's algebra.
* Remove moar useless comments
* Re-design the work cards (#331)
* Add ribbons displaying card category (#362)
  Fixes #141
* Fill Language model and use it with AniDB and MAL (#338)
  We add a data migration to support many AniDB and MAL languages, fixing the previous inability of administrators to edit languages of a WorkTitle. Then, we make MAL import and AniDB API aware of these models so that they can attach more data to our works when operating (e.g. importing, adding new works in the DB). Finally, we add a new administrator action to refresh WorkTitle of a work linked to AniDB, also its `ext_synopsis` which will be useful for l10n of Mangaki I suppose.
* Add migration for ext_lang / type of WorkTitle (#366)
  Missing migration from last PR.
* Rewrite profile view (#364)
  This new rewrite gives a huge boost of performance for profiles with a lot of works (1k+). Also, it improves the privacy of profiles by not displaying information about the count of animes rated, and so on. Finally, it is paginated and a bit more mobile-friendly that the previous version.
* Integrate Sentry (#350)
  Now Mangaki supports Sentry integration, especially for beta.mangaki.fr. It'll make it easier to track unexpected exceptions on the back-end.
* Add raven>=6.1.0 in setup.py (unreviewed)
* Move to INFO and add console to the root handler (#368)
  Fixes Sentry reporting of views exception.
* Hotfix: do not show recommendations tab on anonymous profiles (#369)
* Add i18n in English and Japanese (#356)
  Initial i18n of "about us" pages.
* Do not show alternative titles on work detail page (#371)
* Add AniDB to settings template (#372)
* Release the 0.2 (#361)
* s/jp/ja is the good way (#380)
* Forgotten s/jp/ja in base.html template (unreviewed)
While I was at it, I really thought I could write a test for the recommendation route, but it's a pain because it requires:
On top of that, it would slow down the tests. Note that we could just test MangakiZero, but that still needs the dummy works.