8000 Fix path of pickle files (broken so far) by jilljenn · Pull Request #273 · mangaki/mangaki · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fix path of pickle files (broken so far) #273

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 28, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions mangaki/mangaki/tests/test_reco.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
from django.test import TestCase
from django.core.urlresolvers import reverse
from django.contrib.auth.models import User


class RecoTest(TestCase):

def setUp(self):
User.objects.create_user(username='test', password='test')

def test_reco(self, **kwargs):
self.client.login(username='test', password='test')
reco_url = reverse('get-reco-algo-list', args=['svd', 'all'])
self.assertEqual(reco_url, '/data/reco/svd/all.json')
1 change: 1 addition & 0 deletions mangaki/mangaki/tests/test_user.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@


class UserTest(TestCase):

def setUp(self):
self.User = get_user_model()

Expand Down
2 changes: 1 addition & 1 deletion mangaki/mangaki/utils/algo.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ def fit_algo(algo_name, queryset, backup_filename):
anonymized = dataset.make_anonymous_data(queryset)
algo.set_parameters(anonymized.nb_users, anonymized.nb_works)
algo.fit(anonymized.X, anonymized.y)
if algo_name in {'svd', 'als', 'wals'}: # KNN is constantly refreshed
if algo_name in {'svd', 'als'}: # KNN is constantly refreshed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why WALS is removed ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The answer is in the commit title: “Since TensorFlow 1.0, MangakiWALS models cannot be saved anymore” ^^'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely right. My bad for not reading your commit title! But you should have summarized it in your PR message, maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now an issue :) #274

algo.save(backup_filename)
dataset.save('ratings-' + backup_filename)
return dataset, algo
17 changes: 15 additions & 2 deletions mangaki/mangaki/utils/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,12 +13,22 @@ def __init__(self):
self.nb_users = None
self.nb_works = None

def get_backup_path(self, filename):
if filename is None:
filename = self.get_backup_filename()
return os.path.join(PICKLE_DIR, filename)

def has_backup(self, filename=None):
if filename is None:
filename = self.get_backup_filename()
return os.path.isfile(self.get_backup_path(filename))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about : filename or self.get_backup_filename() ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trop long je pense.


def save(self, filename):
with open(os.path.join(PICKLE_DIR, filename), 'wb') as f:
with open(self.get_backup_path(filename), 'wb') as f:
pickle.dump(self, f, pickle.HIGHEST_PROTOCOL)

def load(self, filename):
with open(os.path.join(PICKLE_DIR, filename), 'rb') as f:
with open(self.get_backup_path(filename), 'rb') as f:
backup = pickle.load(f)
return backup

Expand All @@ -29,5 +39,8 @@ def set_parameters(self, nb_users, nb_works):
def get_shortname(self):
return 'algo'

def get_backup_filename(self):
return '%s.pickle' % self.get_shortname()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using @Property.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a property, I should call it backup_filename, right? (without the get)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think that either a function or a property is fine here.


def __str__(self):
return '[%s]' % self.get_shortname().upper()
5 changes: 3 additions & 2 deletions mangaki/mangaki/utils/data.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
import random
from collections import Counter, namedtuple
from mangaki.utils.values import rating_values
from mangaki.utils.common import PICKLE_DIR
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, that should go in Django Settings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK mais please, futur commit lorsqu'on aura mis les tests.

import numpy as np
from datetime import datetime

Expand All @@ -24,11 +25,11 @@ def __init__(self):
self.datetime = datetime.now()

def save(self, filename):
with open(os.path.join('pickles', filename), 'wb') as f:
with open(os.path.join(PICKLE_DIR, filename), 'wb') as f:
pickle.dump(self, f, pickle.HIGHEST_PROTOCOL)

def load(self, filename):
with open(os.path.join('pickles', filename), 'rb') as f:
with open(os.path.join(PICKLE_DIR, filename), 'rb') as f:
backup = pickle.load(f)
self.anonymized = backup.anonymized
self.encode_user = backup.encode_user
Expand Down
13 changes: 6 additions & 7 deletions mangaki/mangaki/utils/recommendations.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@


def get_reco_algo(user, algo_name='knn', category='all'):
chrono = Chrono(is_enabled=CHRONO_ENABLED, connection=connection)
chrono = Chrono(is_enabled=CHRONO_ENABLED)

already_rated_works = Rating.objects.filter(user=user).values_list('work_id', flat=True)

Expand Down Expand Up @@ -48,13 +48,12 @@ def get_reco_algo(user, algo_name='knn', category='all'):
chrono.save('get all %d interesting ratings' % queryset.count())

dataset = Dataset()
backup_filename = '%s.pickle' % algo_name
if os.path.isfile(os.path.join('pickles', backup_filename)): # When Algo class will be there: 'if algo.has_backup():'
algo = ALGOS[algo_name]()
algo.load(backup_filename)
dataset.load('ratings-' + backup_filename)
algo = ALGOS[algo_name]()
if algo.has_backup():
algo.load(algo.get_backup_filename())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's frequent to use the default filename, consider use it as a default argument for the load method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Je pense juste que je ne peux pas mettre algo.load() parce que ça fait bizarre à lire, tu ne trouves pas ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Je pense que juste algo.load() est un peu clunky mais pourquoi pas; si il faut avoir une fonction qui fasse ça algo.try_load_backup() ou algo.load_backup_if_exists() est peut-être approprié. Je pense aussi que garder le if has_backup est OK for now.

dataset.load('ratings-' + algo.get_backup_filename())
else:
dataset, algo = fit_algo(algo_name, queryset, backup_filename)
dataset, algo = fit_algo(algo_name, queryset, algo.get_backup_filename())

chrono.save('fit %s' % algo.get_shortname())

Expand Down
3 changes: 1 addition & 2 deletions mangaki/mangaki/utils/wals.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,8 +34,7 @@ def __init__(self, NB_COMPONENTS=20):
self.sess = tf.InteractiveSession()

def load(self, filename):
with open(os.path.join('pickles', filename), 'rb') as f:
backup = pickle.load(f)
backup = super().load(filename)
self.M = backup.M
self.U = backup.U
self.VT = backup.VT
Expand Down
0