Add WorkCluster class for merging works (dedupe, suggestions, or merge from admin) by jilljenn · Pull Request #307 · mangaki/mangaki · GitHub

Add WorkCluster class for merging works (dedupe, suggestions, or merge from admin) #307


Merged
jilljenn merged 13 commits into master from jj/add-workcluster on Jun 22, 2017


jilljenn
Member
jilljenn commented Apr 30, 2017

Ongoing work.

@codecov
codecov bot commented Apr 30, 2017

Codecov Report

Merging #307 into master will increase coverage by 2.72%.
The diff coverage is 85.03%.


@@            Coverage Diff             @@
##           master     #307      +/-   ##
==========================================
+ Coverage   50.28%   53.01%   +2.72%     
==========================================
  Files          68       69       +1     
  Lines        3975     4101     +126     
==========================================
+ Hits         1999     2174     +175     
+ Misses       1976     1927      -49
Impacted Files Coverage Δ
mangaki/mangaki/choices.py 100% <100%> (ø) ⬆️
mangaki/mangaki/models.py 82.62% <100%> (+0.83%) ⬆️
...i/templates/admin/merge_selected_confirmation.html 100% <100%> (ø)
mangaki/mangaki/admin.py 62.15% <83.07%> (+23.13%) ⬆️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27aaee8...d000a3d.

@jilljenn
Member Author
jilljenn commented May 9, 2017

SO TEDIOUS.

  • I have the impression that if a WorkCluster is deleted, its Works are deleted too.
  • Added a redirect field to Work that is not displayed in the admin. Problem: when works 1 and 2 are merged and 2 redirects to 1, instead of showing « La fusion 1-2 s'est déroulée avec succès » ("Merge 1-2 completed successfully"), it now only shows « La fusion 1 s'est déroulée avec succès » ("Merge 1 completed successfully"). Well, of course: 2 was filtered out of the queryset! -__-
  • Maybe the reviewer won't like how I handle responses in the merge and trigger_merge actions?

@RaitoBezarius
Member

→ 2: why are you filtering the queryset on redirect = null, i.e. hiding every work that has a redirect?
You should use the redirect in the work view instead: check whether the field is non-null and, if so, issue a permanent redirect.
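A minimal, framework-free sketch of that suggestion, assuming each work's `redirect` is available as a dictionary from work id to target id (or `None`): the view follows the chain to the canonical work and answers with a permanent (HTTP 301) redirect when the requested work is a phantom. Following the whole chain also covers a work merged into one that was itself merged later. The function names and the `/work/<id>` URL scheme are hypothetical; in a Django view the final step would be `redirect(canonical_work, permanent=True)`.

```python
from typing import Dict, Optional, Tuple


def resolve_canonical(work_id: int, redirects: Dict[int, Optional[int]]) -> int:
    """Follow Work.redirect-style pointers until reaching a work with no redirect."""
    seen = set()
    current = work_id
    while redirects.get(current) is not None:
        if current in seen:
            # Two works pointing at each other would otherwise loop forever.
            raise ValueError(f"redirect cycle involving work {current}")
        seen.add(current)
        current = redirects[current]
    return current


def work_view_response(work_id: int,
                       redirects: Dict[int, Optional[int]]) -> Tuple[int, str]:
    """Return (status, location): 301 for phantom works, 200 for canonical ones."""
    canonical = resolve_canonical(work_id, redirects)
    if canonical != work_id:
        return 301, f"/work/{canonical}"  # hypothetical URL scheme
    return 200, f"/work/{work_id}"
```

Raising on a cycle is a design choice: a merge bug that makes two works redirect to each other fails loudly instead of hanging the view.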



@transaction.atomic # In case trouble happens
def merge_works(request, works_to_merge, admin_queryset, from_cluster):
Member

Could we break down the logic of this function into smaller chunks?

Member Author

Yes, but if so, should I add a merge.py file to utils?

Contributor

I think it's fine to just have helper functions here in admin.py.

Contributor

Could we have some documentation on the parameters, though (in particular, what is admin_queryset)?

@jilljenn
Member Author

In the admin, I'm filtering out entries whose redirect is not null because:

  • I should not redirect an admin entry to another admin entry;
  • it is annoying to see duplicates in the admin list you're browsing right after you merged them.

return None

do_not_compare = ['sum_ratings', 'nb_ratings', 'nb_likes', 'nb_dislikes', 'controversy']
priority = {
Contributor

It would probably make sense to store those directly on the MergeType enum.

from enum import Enum

class MergeType(Enum):
    INFO_ONLY = (0, 'black')
    JUST_CONFIRM = (1, 'green')
    CHOICE_REQUIRED = (2, 'red')

    def __init__(self, priority, row_color):
        self.priority = priority
        self.row_color = row_color
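A sketch of how such an enum can drive the merge confirmation page. The rule for picking a member (identical values give `INFO_ONLY`, a single distinct non-empty value gives `JUST_CONFIRM`, several give `CHOICE_REQUIRED`) is a hypothetical illustration, not necessarily the rule implemented in this PR; only the enum itself comes from the review. Because each member carries its own `priority`, the overall status of a form is simply the member with the highest priority, with no separate lookup table.

```python
from enum import Enum
from typing import Iterable, List


class MergeType(Enum):
    INFO_ONLY = (0, 'black')
    JUST_CONFIRM = (1, 'green')
    CHOICE_REQUIRED = (2, 'red')

    def __init__(self, priority, row_color):
        self.priority = priority
        self.row_color = row_color


def classify_field(values: List[object]) -> MergeType:
    """Hypothetical rule: how much admin input does one field need?"""
    if len(set(values)) <= 1:
        return MergeType.INFO_ONLY        # identical in every work
    non_empty = {v for v in values if v not in (None, '')}
    if len(non_empty) <= 1:
        return MergeType.JUST_CONFIRM     # only blanks differ
    return MergeType.CHOICE_REQUIRED      # genuine conflict


def overall_status(types: Iterable[MergeType]) -> MergeType:
    """The most demanding field decides the status of the whole merge."""
    return max(types, key=lambda t: t.priority, default=MergeType.INFO_ONLY)
```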

Member Author

Done.

@@ -76,6 +191,10 @@ class WorkAdmin(admin.ModelAdmin):
        'controversy',
    )

    def get_queryset(self, request):
        qs = super().get_queryset(request)
        return qs.filter(redirect__isnull=True)  # Exclude phantom works
Contributor

Doesn't it make sense for the default Work queryset to exclude those so-called phantom works, with explicit overrides (via a Work.all_objects or some such) when we really want to see them appear -- which would probably be only when viewing a Work and redirecting to its canonical version?

Member Author

Yes, but I can't get it to work. Can you? I mean, how do you add extra elements back when the whole source is filtered?

Contributor

Just define two managers on Work; see https://docs.djangoproject.com/en/1.11/topics/db/managers/#modifying-a-manager-s-initial-queryset

NB: Since we use a custom QuerySet for Works, you should use from_queryset; something like the following:

class FilteredWorkManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().filter(redirect__isnull=True)

class Work(models.Model):
    ...
    objects = FilteredWorkManager.from_queryset(WorkQuerySet)()
    all_objects = WorkQuerySet.as_manager()

Member Author

Done.

Member Author
jilljenn, May 30, 2017

I ran into problems with the ManyToMany fields.

If you override the get_queryset() method and filter out any rows, Django will return incorrect results. Don’t do that. A manager that filters results in get_queryset() is not appropriate for use as a base manager.

Source: Django documentation

Are we still doing this?

Contributor

Ah yes, darn, I had forgotten about that, sorry :-/ That's the right fix.

Contributor

I think this get_queryset function in the admin can be removed now?

Contributor

Actually the behavior is inconsistent with the doc since Django seems to be using the default manager for related object accesses instead of the base manager... I'm investigating.

Contributor

Looks like this is indeed only used for one-to-one fields, for historical reasons, which is unfortunate. I think I'd prefer using WorkCluster.objects.get(id=10).works(manager='all_objects').all() instead, since WorkCluster is "special" in the sense that it is essentially the only class that should see the removed Works; we want them to appear as non-existent to other objects. Not 100% sure though. I think the manual solution works for now, but I'd appreciate a comment there explaining that we use manual filtering because ManyToManyField doesn't honor the default_manager.

Member Author

Done: # Equivalent to default_manager_name = 'all_objects', because first in the list.

return TemplateResponse(request, 'admin/merge_selected_confirmation.html', context)
response = merge_works(request, queryset, queryset, from_cluster=False)
if response is None:
self.message_user(request, "La fusion %s a bien été effectuée." % '-'.join(str(work.id) for work in queryset))
Contributor

I'm not sure it makes that much sense to display the IDs here. Here is how I would (mostly) write this function:

from django.utils.html import format_html  # At top of file :-)

num_to_merge = len(queryset)  # Could possibly be returned by merge_works
response = merge_works(request, queryset, queryset, from_cluster=False)
if response is None:
    final_work = queryset.get()  # Should definitely be returned by merge_works
    # format_html escapes each argument and passes it on as a string, so use
    # plain {} placeholders ({:d} would raise ValueError) instead of %-formatting.
    self.message_user(request, format_html(
        'La fusion de {} œuvres vers <a href="{}">{}</a> a été effectuée.',
        num_to_merge, final_work.get_absolute_url(), final_work.title))
return response

Member Author

Done.


def get_work_titles(self, obj):
    if obj.works.exists():
        return '<ul>' + ''.join('<li>%s (<a href="/admin/mangaki/work/%d/change/">%d</a>)</li>' % (work.title, work.id, work.id) for work in obj.works.all()) + '</ul>'
Contributor

Please use https://docs.djangoproject.com/en/dev/ref/utils/#django.utils.html.format_html_join and https://docs.djangoproject.com/en/dev/ref/contrib/admin/#reversing-admin-urls here.

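from django.urls import reverse  # At top of file
from django.utils.html import format_html, format_html_join  # At top of file

if obj.works.exists():
    def get_admin_url(work):
        return reverse('admin:mangaki_work_change', args=(work.id,))
    # Wrap with format_html: '<ul>' + ... + '</ul>' would yield a plain str,
    # which the admin escapes unless it is marked safe.
    return format_html(
        '<ul>{}</ul>',
        format_html_join('', '<li>{} (<a href="{}">{}</a>)</li>',
            ((work.title, get_admin_url(work), work.id) for work in obj.works.all())))
else:
    return '(all deleted)'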

Member Author

Done.

@jilljenn
Member Author

Fixes #98.

try:
    kept_rating = Rating.objects.get(work_id=final_work.id, user_id=rating_to_redirect.user_id)
    if kept_rating.date and rating_to_redirect.date and kept_rating.date < rating_to_redirect.date:  # Kept rating is not the latest given
        kept_rating.choice = rating_to_redirect.choice  # Update the kept rating
Member

Is kept_rating saved?

Member Author

No :'(

continue
for rating_to_redirect in work_to_redirect.rating_set.all():
try:
kept_rating = Rating.objects.get(work_id=final_work.id, user_id=rating_to_redirect.user_id)
Member

We should optimise the queries, maybe kept_ratings = Rating.objects.filter(work_id=final_work.id, user_id__in=all_user_ids) and iterate over it.
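To make the difference concrete, here is a framework-free sketch using Python's built-in `sqlite3` module instead of the Django ORM: the per-user version issues one query per rating to redirect, while the batched version fetches every kept rating in a single `IN (...)` query and then works from a dictionary. The `rating` table layout and the function names are hypothetical; `Connection.set_trace_callback` is used to count statements, similar in spirit to Django's `assertNumQueries`.

```python
import sqlite3
from contextlib import contextmanager
from typing import Dict, Iterator, List, Sequence


@contextmanager
def count_queries(conn: sqlite3.Connection) -> Iterator[List[str]]:
    """Record every SQL statement executed on `conn` inside the with-block."""
    statements: List[str] = []
    conn.set_trace_callback(statements.append)
    try:
        yield statements
    finally:
        conn.set_trace_callback(None)


def kept_ratings_per_user(conn: sqlite3.Connection, final_work_id: int,
                          user_ids: Sequence[int]) -> Dict[int, str]:
    """One query per user: the pattern flagged in the review."""
    kept: Dict[int, str] = {}
    for user_id in user_ids:
        row = conn.execute(
            "SELECT user_id, choice FROM rating WHERE work_id = ? AND user_id = ?",
            (final_work_id, user_id)).fetchone()
        if row is not None:
            kept[row[0]] = row[1]
    return kept


def kept_ratings_batched(conn: sqlite3.Connection, final_work_id: int,
                         user_ids: Sequence[int]) -> Dict[int, str]:
    """A single query for all users, then a dict lookup per rating to redirect."""
    if not user_ids:
        return {}
    placeholders = ", ".join("?" for _ in user_ids)
    rows = conn.execute(
        "SELECT user_id, choice FROM rating "
        f"WHERE work_id = ? AND user_id IN ({placeholders})",
        (final_work_id, *user_ids))
    return {user_id: choice for user_id, choice in rows}
```

With ten users who rated a duplicate, the first version runs ten `SELECT`s and the second runs one. In the ORM, the batched version corresponds to the `filter(work_id=..., user_id__in=...)` call suggested above, turned into a dictionary keyed by `user_id`.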

Member Author

You're right! I should, sorry.

staff_to_redirect.save()
final_work.genre.add(*work_to_redirect.genre.all())
Trope.objects.filter(origin_id=work_to_redirect.id).update(origin_id=final_work.id)
for model in [WorkTitle, TaggedWork, Suggestion, Recommendation, Pairing, Reference, ColdStartRating]:
Member

Nit: a tuple would look better here.

Member Author
jilljenn, Jun 11, 2017

Hmmm okay.

from mangaki.models import Work, Editor, Category, Studio, WorkCluster


class RecoTest(TestCase):
Member

Did you mean MergeTest?

Member Author
jilljenn, Jun 11, 2017

Yes.

@jilljenn
Member Author

It's taking a while, but I'm going to write a test checking that the number of queries stays below a certain threshold xD

tomorrow = date.today() + timedelta(1)

anime = Category.objects.get(slug='anime')
Work.objects.bulk_create([Work(title='Sangatsu no Lion', category=anime) for _ in range(10)])
Member

Unlimited Duplicate Works. Shouldn't we create different anime and assume that duplicates may raise an IntegrityError?

Member Author

These are different. They just contain the same title because I was bored. It's okay, different anime can unfortunately have the same title :)

@jilljenn
Member Author
jilljenn commented Jun 22, 2017

Let me explain what is happening here.

  • Ryan noticed that I was doing too many queries.
  • I made a test where an admin wants to merge 10 works with 1 rating.
  • The number of queries was 142.
  • The number of queries is now 36 (checked with self.assertNumQueries(36)).
  • This holds even with extra ratings covering the following edge cases:
    • Rating of the canonical work should be replaced with more recent ratings on works to be merged.
    • Staff pairs (artist, role) should be deleted because already present in the canonical work.
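The first edge case boils down to a small rule. This pure-Python sketch follows the condition shown earlier in the review (`kept_rating.date and rating_to_redirect.date and kept_rating.date < rating_to_redirect.date`): the final work's rating is replaced only when both dates are known and the duplicate's rating is strictly newer, while a user who rated only a duplicate simply has that rating moved to the final work. The `Rating` dataclass and `merge_ratings` are hypothetical stand-ins for the Django model and the PR's merge code.

```python
import datetime
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Rating:
    user_id: int
    choice: str
    date: Optional[datetime.date] = None  # may be missing on old ratings


def merge_ratings(kept: Dict[int, Rating],
                  to_redirect: Iterable[Rating]) -> Dict[int, Rating]:
    """Ratings of the final work (keyed by user_id) after absorbing duplicates."""
    merged = dict(kept)
    for rating in to_redirect:
        current = merged.get(rating.user_id)
        if current is None:
            # The user only rated a duplicate: the rating simply moves over.
            merged[rating.user_id] = rating
        elif current.date and rating.date and current.date < rating.date:
            # Both dates are known and the duplicate's rating is newer: it wins.
            merged[rating.user_id] = rating
        # Otherwise the final work's rating is kept and the duplicate is dropped.
    return merged
```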

Note that:

  • bulk_create returns objects only with PostgreSQL. If we change to SQLite for faster tests, errors will happen.
  • I am not sure what models should/are deleted when a model is deleted (link with CASCADE)?
  • Maybe ratings should have a DateTime, not a Date.
  • Ideally, related objects should be fetched using Work._meta.get_fields() (in recent versions of Django). But in these objects, the Work field does not always have the same attribute (it can be origin in Trope, album in Track), so I don't know how to do this elegantly. Should we prefetch every related object in order to prepare the renaming? Maybe this question is general enough to be asked on Stack Overflow.

This PR was really boring because:

  • We overrode the default manager because we don't want duplicates to appear in the admin. This requires understanding the difference between _default_manager and _base_manager (I can't recall it). Anyway, please do not swap the all_objects and objects lines in models.py.
  • It is hard to fake the date of a rating for testing. save() triggers auto_now, update_or_create does the same, so the only way to do this is update. So we should create the objects, then update them. It cost me a lot of time to understand this.

Member
RaitoBezarius left a comment

Answers:

bulk_create returns objects only with PostgreSQL. If we change to SQLite for faster tests, errors will happen.

We won't use SQLite for faster tests, we should always use PostgreSQL even during tests (we use special features of PGSQL anyway.)

Maybe ratings should have a DateTime, not a Date.

I agree we should have a timestamp.

It is hard to fake the date of a rating for testing. save() triggers auto_now, update_or_create does the same, so the only way to do this is update. So we should create the objects, then update them. It cost me a lot of time to understand this.

I think you should make a fixture instead and load it in your tests: you just write the data, Django loads it into the database, and there is no need for setUp or update.
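A sketch of such a fixture, built in Python so the dates can be computed. Django's `loaddata` saves fixture objects in raw mode, which skips each field's `pre_save()` and hence `auto_now`, so the dates are stored exactly as written. The model label `mangaki.rating`, the field names `user`, `work`, `choice` and `date`, and the choice values are assumptions to check against `models.py`; the referenced user and works must also exist (in the same fixture or another one).

```python
import json
from datetime import date, timedelta
from typing import Dict, List


def rating_fixture(final_work_id: int, duplicate_work_id: int,
                   user_id: int, today: date) -> List[Dict]:
    """Same user: an old rating on the final work, a newer one on the duplicate."""
    return [
        {"model": "mangaki.rating", "pk": 1,  # assumed app label and model name
         "fields": {"user": user_id, "work": final_work_id, "choice": "like",
                    "date": (today - timedelta(days=30)).isoformat()}},
        {"model": "mangaki.rating", "pk": 2,
         "fields": {"user": user_id, "work": duplicate_work_id, "choice": "dislike",
                    "date": today.isoformat()}},
    ]


if __name__ == "__main__":
    # Redirect the output to a file in an app's fixtures/ directory.
    print(json.dumps(rating_fixture(1, 2, 1, date(2017, 6, 22)), indent=2))
```

A test case then declares `fixtures = ['merge_ratings.json']` (hypothetical file name) on its `TestCase` subclass, and Django loads the data before each test, with no `update()` step.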

final_work_staff = set()
kept_staff_ids = []
# Only one query: put final_work's Staff objects first in the list
for staff_id, work_id, artist_id, role_id in Staff.objects.filter(work__in=works_to_merge).annotate(belongs_to_final_work=Case(When(work_id=final_work.id, then=Value(1)), default=Value(0), output_field=IntegerField())).order_by('-belongs_to_final_work').values_list('id', 'work_id', 'artist_id', 'role_id'):
Member

We really have to cut this line, it's too long.
It's okay to create variables.
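Besides splitting the line, the algorithm it implements can be written as a few named steps. This framework-free sketch takes `(id, work_id, artist_id, role_id)` rows, as returned by the `values_list` call above. It puts the final work's rows first (the job of the `Case`/`When` annotation), keeps the first row seen for each `(artist_id, role_id)` pair, and reports the others as duplicates to delete. The function name and return shape are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

StaffRow = Tuple[int, int, int, int]  # (staff_id, work_id, artist_id, role_id)


def split_staff(rows: Iterable[StaffRow],
                final_work_id: int) -> Tuple[List[int], List[int]]:
    """Return (kept_staff_ids, duplicate_staff_ids) for a merge into final_work_id."""
    # Same ordering as annotate(...).order_by('-belongs_to_final_work'):
    # the final work's rows come first; sorted() keeps the rest in input order.
    ordered = sorted(rows, key=lambda row: row[1] != final_work_id)
    seen_pairs: Set[Tuple[int, int]] = set()
    kept: List[int] = []
    duplicates: List[int] = []
    for staff_id, _work_id, artist_id, role_id in ordered:
        pair = (artist_id, role_id)
        if pair in seen_pairs:
            duplicates.append(staff_id)  # already credited on the final work
        else:
            seen_pairs.add(pair)
            kept.append(staff_id)        # re-pointed to the final work if needed
    return kept, duplicates
```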

Member Author

Done!

@jilljenn jilljenn merged commit ebf090f into master Jun 22, 2017
@jilljenn jilljenn deleted the jj/add-workcluster branch June 22, 2017 13:53
@Elarnon
Contributor
Elarnon commented Jun 22, 2017

Ideally, related objects should be fetched using Work._meta.get_fields() (in recent versions of Django). But in these objects, the Work field does not always have the same attribute (it can be origin in Trope, album in Track), so I don't know how to do this elegantly. Should we prefetch every related object in order to prepare the renaming? Maybe this question is general enough to be asked on Stack Overflow.

You can query based on the type of the field, the name is not relevant. I have some prototype work for this from September lying around somewhere, I can dig it up if needed.
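A framework-free illustration of that idea, using dataclasses as stand-ins for models: fields are selected by the type they point to, not by their name, so `Trope.origin` and `Track.album` are both found. In Django, the analogous loop would go over `Work._meta.get_fields()`, keep the reverse relations (`one_to_many`), and read the foreign key's name from `rel.field.name`. The classes and helper names here are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, get_type_hints


@dataclass
class Work:
    id: int
    title: str


@dataclass
class Trope:
    content: str
    origin: Work  # the link to Work is called "origin" here...


@dataclass
class Track:
    title: str
    album: Work   # ...and "album" here


def work_field_names(model: type) -> List[str]:
    """Names of the fields of `model` whose declared type is Work."""
    hints = get_type_hints(model)
    return [f.name for f in fields(model) if hints[f.name] is Work]


def repoint(obj, old: Work, new: Work) -> None:
    """Replace every reference to `old` with `new`, whatever the field is called."""
    for name in work_field_names(type(obj)):
        if getattr(obj, name) is old:
            setattr(obj, name, new)
```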

We overrode the default manager because we don't want duplicates to appear in the admin. So it requires understanding the difference between _default_manager and _base_manager. (I can't recall.) Anyway, please do not switch the lines all_objects and objects in models.py.

Then I'm not sure why you didn't want the explicit version (assigning _default_manager by hand) instead of relying on the implicit rule that "the default manager is the first one defined".

It is hard to fake the date of a rating for testing. save() triggers auto_now, update_or_create does the same, so the only way to do this is update. So we should create the objects, then update them. It cost me a lot of time to understand this.

I believe fixtures are the right solution for this problem.

I am not sure what models should/are deleted when a model is deleted (link with CASCADE)?

Yes, this is something we should think about, and yes it has a link with CASCADE. Some of the ForeignKeys should be on_delete=PROTECT at the very least -- for instance, Ratings, because all the ratings for a Work should have been either re-pointed to the de-duplicated Work or carefully deleted before deleting a Work.

jilljenn pushed a commit that referenced this pull request Jul 2, 2017
* requirements: move pandas, sklearn to production due to DPP recent introduction (#262)

* Link to meta.mangaki.fr deleted (#266)

* Add all recommendation algorithms to production (#265)

* Unify KNN

* Add __init__.py in test folder

* Allow ANY algorithm in production

* Take reviews into account

* Last changes

* Create Dataset class and fit_algo management command (#268)

* Factorize algorithms, create Dataset class and fit_algo management command

* Fix bug

* Remove useless code

* Add migration (#269)

* Add new class RecommendationAlgorithm and refactor algorithms (#270)

* Add new class RecommendationAlgorithm and refactor algorithms

* Fix typo, BTW this PR fixes #267

* Add comment

* Cleanup

* nb_users and nb_works are instance attributes

* Add a template for settings.ini (#272)

* Remove first_name and last_name from Artist (#271)

* Remove first_name and last_name from Artist

* Remove from model as well

* Enable anonymous user to rate works (#277)

This is a first step towards the resolution of #59. It allows
anonymous users to rate works, and keeps the state of the rated works
across consecutive page loads in Django's session.

For now, the usage of this information is limited: although anonymous
user can rate works, they cannot get recommendations from those
ratings, transfer them to a new or existing account, etc. Support for
these will come in subsequent patches.

* Fix path of pickle files (broken so far) (#273)

* Since TensorFlow 1.0, MangakiWALS models cannot be saved anymore

* Fix pickle path everywhere

* Add minimal test for recommendations

* Fix unused parameter

* Enable anonymous users to get recommendations (#278)

This is a second step towards the resolution of #59. It allows
anonymous users to get recommendations (using the KNN algorithm only,
since we do not have trained models for them).

This is implemented by making the various algorithms take directly a
list of (user_id, work_id, choice) triplets instead of building those
from a queryset, as well as some glue to add the relevant triplets
from anonymous users at the interface between the database and the
algorithms themselves.
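A sketch of that glue, assuming stored ratings arrive as `(user_id, work_id, choice)` tuples and the session keeps the anonymous visitor's ratings as a `{work_id: choice}` dict; both shapes and the placeholder id for the anonymous user are hypothetical. The algorithms then see one flat list and do not need to know where each triplet came from.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triplet = Tuple[Optional[int], int, str]  # (user_id, work_id, choice)


def build_triplets(stored: Iterable[Tuple[int, int, str]],
                   session_ratings: Dict[int, str],
                   anonymous_id: Optional[int] = None) -> List[Triplet]:
    """Concatenate database ratings with the anonymous visitor's session ratings."""
    triplets: List[Triplet] = list(stored)
    # Sorted by work_id so the output order does not depend on the session dict.
    triplets.extend((anonymous_id, work_id, choice)
                    for work_id, choice in sorted(session_ratings.items()))
    return triplets
```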

* Update seed data (unreviewed)

The seed_data.json was not updated after first_name and last_name were
removed from Artist. This fixes that.

* Simplify circle-ci integration (#281)

* Simplify circle config

* Fix matplotlib.pyplot

* Speculative improvements

* relative paths...

* Move coverarc to mangaki

* Try enabling test reports

* Test reports, take 2

* Parallel run is slower

* Remove the Profile.score field (#283)

This provides little information (especially considering how cryptic its
purpose is), and the way it is computed gets in the way whenever we want to change
things relative to ratings.

Note that it would be easy to add this again if the need arises, as even though
the value is dynamically updated in various places, it is easy to re-compute it
from the sets of Suggestions and Recommendations the user submitted.

* Completely remove discourse from Mangaki (#282)

We are not using it anymore. Fixes #264.

* Allow import of anonymous ratings on account creation (#279)

Allow users without an existing account to import anonymous ratings upon
account creation.

The backend logic behind this is relatively simple and simply requires
overriding django-allauth's signup view to move the anonymous ratings from the
session to the database, associating them to the freshly created user.

On the frontend side, this adds:
 - On the signup page, if the user has made any anonymous ratings, we add a
   checkbox allowing them to import their ratings (active by default), as well
   as a condensed view of their existing ratings. This leverages the
   works_no_poster.html view and allows the user to change their anonymous
   ratings directly on the signup page if they wish to do so.

 - On the login page, if the user has made any anonymous ratings, we add a
   warning message telling them that they will lose their anonymous ratings
   upon login (even though technically, they lose it upon logging out) and that
   they should create a new account if they want to keep those ratings. In the
   future, this should be replaced with a mechanism to merge existing anonymous
   ratings with the user's existing ratings.

As a byproduct, this patch includes a slight re-design of the login and signup
pages, and changes the library we use for displaying bootstrap forms from
django-bootstrap-form to django-bootstrap3. django-bootstrap3 provides better
hooks to customize forms and made said re-design much easier than it would have
been with django-bootstrap-form (in addition, it is
[faster](zostera/django-bootstrap3#160))

* Enable AniDB testing through mocking using responses library (#285)

* tests: Enable AniDB testing through mocking using responses library

→ Minor fixes in `anidb.py` file (default arguments)
→ Add fixture directory and test fixture directory in settings.
→ Use test fixture directory to feed mocking (fresh fixture from AniDB!)
→ Test `AniDB.search`.
→ Test `AniDB.get_dict`.
→ Add AniDB constants inside the class for convenience purpose.
→ Reformatting and imports optimization.

* Some April cleanup! (#289)

* WorkList: use a property for `is_dpp`

* management command: fix analytics

* knn: fix constructor and reformat according to PEP8.

* chrono: remove unused keyword argument.

* imports: clean unused imports.

* knn: remove the ugly [\n and put a comment on its own line

* knn: moar formatting implementation

* management command: remove analytics

* anonymous-ratings: Review page, clear all ratings, reminder banner (#287)

* anonymous-ratings: Review page, clear all ratings, reminder banner

- Add a reminder banner for anonymous ratings.
- Add a review page by reusing `get_profile` view.
- Add a way to clear all ratings (ugly link)

* chore: reformat for better readability

* views: enforce POST to get CSRF token check and return proper JSON (#293)

* Reformat and remove useless classes in admin (#294)

* admin: reformat code and remove useless classes

* admin: improve code style

* perf/views: select category to prevent performance issue (#292)

* work-list: fetch int_poster also to prevent duplicated queries (#295)

* Update about and events (#286)

* Update about and events

* Add videos

* Fix Soubi and Elarnon

* Fix OpenGraph and Twittercard

* Okay, it was a bad idea

* Remove TensorFlow logging at once

* Let's forget about removing logging

* Factor recommendation algorithms to handle backups (#276)

* Since TensorFlow 1.0, MangakiWALS models cannot be saved anymore

* Fix pickle path everywhere

* Add minimal test for recommendations

* Fix unused parameter

* Add notebook for benchmarks for cold-start

* Fix KNN

* Add LinearRegression for cold-start

* Add MangakiProxyDPP

* Cleanup compare management command

* Edit experiment

* Add computation of delta and update for ALS

* Add BGS from Anava et al.'s WWW 2015 paper

* Fix BGS, withdraw paper from RecSys

* Fix NMF

* Add Autoencoder with TensorFlow

* Now users receive SVD if it exists, otherwise KNN

* Order whole rating list by any algorithm

* Finally manage those damn logs

* Add logging and fix paths

* Fix DPP and clean a lot of code

* Remove useless print

* Oops, forgot one file

* Fix problems linked to merge

* Add an admin interface for merging works (#299)

* Add an admin interface for merging works

* Factor code

* Add minor changes

* Fix bug, and Language to admin

* Take Raito changes into account

* Fix bugs and display number of ratings of each work

* [Hotfix] MAL imports (#301)

* mal-import: refactor the whole code into something more maintainable

- Fix doctor's fields and use MALEntry.
- MAL:
        - Refactor the whole module using a requests.Session and
          instantiating a MALClient which handles the API operations.
        - Rewrite the DB interaction to be more efficient and fix it in
          some edge cases.

* add a default user-agent

* mangaki: fix when MAL import is not available

* mal: add some tests and refactor duplicated / unused code

* mal: rewrite the doctests with None

* mal: add missing fixtures

* Hotfix, KNN did not work

* Improve MAL usage in Mangaki (#302)

* mal-import: refactor the whole code into something more maintainable

- Fix doctor's fields and use MALEntry.
- MAL:
        - Refactor the whole module using a requests.Session and
          instantiating a MALClient which handles the API operations.
        - Rewrite the DB interaction to be more efficient and fix it in
          some edge cases.

* add a default user-agent

* mangaki: fix when MAL import is not available

* mal: add some tests and refactor duplicated / unused code

* mal: rewrite the doctests with None

* mal: add missing fixtures

* MAL: add synonyms and prepare for genres / types

→ Make language a nullable field for WorkTitle.
→ Added the related migration.

* Hotfix profile page with user recommendations

* Hotfix MAL

* [Hotfix] Toggling ratings (+test!) (#306)

* Fix toggle rating and add test

* reducing current_user_set_toggle_rating does too many queries

* Upgrade to Django 1.11

* Revert "Upgrade to Django 1.11"

This reverts commit 4f58bc4.

Meant to ask for review ;-)

* Upgrade to Django 1.11 (#315)

* Add attribute research_ok to Profile (#316)

* Add attribute research_ok to Profile

* Handling tokens

* ADD TESTS §§§

* Add HASH_NACL to Circle settings

* ADD TESTS FOR REAL

* Remove print

* Use built-in Django functions for hashing

* Add token utils

* Salt is not a secret anymore

* Put salt at the correct place

* Fix settings

* Use None instead of 0 for anonymous user identifier.

Previously, we were using 0 for the identifier of an anonymous user,
except that it made us use a `current_user_id` variable for tracking
that and it was easy to just use (erroneously) `request.user.id`
instead.

Since there actually are no requirements on `current_user_id` being a
numeric value, this patch makes it so that we use `None` instead, since
that is the value of `user.id` for an anonymous user, and removes
`current_user_id` altogether.

Fixes #317.

* [Hotfix] Handle recommended ghost works (#323)

* Non-anonymous users don't have anonymous ratings (#324)

While we keep them until the session ends for technical reasons, they
shouldn't be shown.

* Add meta description (#325)

* Add link in profile for sorting any wishlist (#326)

* Fix #322, one could not see their own profile if private

* Add link in profile for sorting any wishlist

* Send templated mail with tokens (#327)

* Fix CSS for footer navbar

* Fix error handling in the research view

* Send mails with tokens

* I mean everyone

* Add jinja2 to production requirements

* Add os.access

* List users in tokens mgmt command

* Only require tensorflow when needed (#329)

TensorFlow is only required by the WALS algorithm, and loading the
library has a prohibitive overhead on lower-powered devices. Let's only
load it when the WALS algorithm is needed, which for now is only when
running the algorithm, which shouldn't be done on the webserver anyways.

* Clarify questions for usage of ratings in Kyoto challenge (#330)

* Kill Doctor management command (#333)

* Better Ansible setup (#321)

This gives a facelift to the Ansible setup, allowing several things that
were not included in the previous setup:

 - Possibility to provision a development machine (e.g. Vagrant box)
 - Possibility to easily dump & restore a database
 - Multi-site deployment (deployments are identified by name; except for
   some global nginx and postgresql configuration that shouldn't change
   anyways, care has been taken to ensure all other settings are
   properly isolated so that several deployments with different
   `mangaki_name` values can co-exist on the same machine)
 - git-free deployment; Mangaki can now be built as a PIP package that
   is copied to the remote machine for installation

This deployment is implemented as a single playbook that was made as
declarative as possible; tags and (Ansible's) environment variables
should be used to run the few actions that should be run (collectstatic,
dumping or loading a database, etc.)

* Remove unused test.css file. (#335)

Also, why did we have executable CSS files? WTF.

* Make French great again (space before '!' becomes &nbsp;) (#332)

* Add &nbsp; in the right places and make French great again

* typography: add a missing &nbsp;
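The commit edits templates by hand; as an illustration only, the same rule can be expressed as a hypothetical `french_nbsp` helper that turns the plain space before `!`, `?`, `;` or `:` into `&nbsp;`:

```python
import re

# French typography puts a non-breaking space before "high" punctuation
# (! ? ; :), so the mark never wraps onto a line of its own.
_SPACE_BEFORE_HIGH_PUNCT = re.compile(r" ([!?;:])")


def french_nbsp(text: str) -> str:
    """Replace a regular space before ! ? ; : with the &nbsp; HTML entity."""
    return _SPACE_BEFORE_HIGH_PUNCT.sub(r"&nbsp;\1", text)
```

Passing its output through an auto-escaping template would escape the `&`, so it would have to be marked safe; writing the entity directly in the template, as the commit does, avoids that.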

* Drastically improve MAL import performance (#311)

* models(work,fts): add full text search fields for Work / WorkTitle

* mal: optimise the hell out of the MAL import procedure

- Use full text search to match all synonyms / titles to prevent
  duplicates.
- Get all existing ratings and then deduce which ratings to add (not
  fully working: some ratings still appear even though they're already
  in the DB!).
- Import your MAL at the speed of light with no guarantee against
  future violations of theoretical physics.

- Possible improvements: batch full text search queries and deduce a
  subset of works to get from MAL (also figure out a good batch_size).
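The "get all existing ratings, then deduce which ones to add" step boils down to a set difference. A minimal sketch, assuming MAL entries have already been matched to work IDs and that `ratings_to_add` is a hypothetical helper:

```python
from typing import Dict, Iterable, List, Tuple


def ratings_to_add(
    existing_work_ids: Iterable[int],
    mal_entries: Iterable[Tuple[int, str]],
) -> List[Tuple[int, str]]:
    """Return (work_id, choice) pairs that are not rated yet.

    ``existing_work_ids`` comes from a single query for the user's current
    ratings; set membership tests then cost O(1) per MAL entry.
    """
    already_rated = set(existing_work_ids)
    new_rows: Dict[int, str] = {}
    for work_id, choice in mal_entries:
        # Skip works already rated in the DB, and keep only the first entry
        # when several MAL entries were matched to the same work.
        if work_id not in already_rated and work_id not in new_rows:
            new_rows[work_id] = choice
    return list(new_rows.items())
```

The resulting rows could then be inserted in one batch, for instance with Django's `bulk_create`, instead of one query per entry.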

* migrations: add GIN indexes / triggers / tsvector fields on Work / WorkTitle

* Address PR comments and correct a test using search

* mal: add some tests using hypothesis

Also fix a bug with empty synonym fields.
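The empty-synonyms bug looks like a classic `str.split` pitfall: splitting an empty string yields one empty element. A hypothetical `parse_synonyms` helper, assuming the field is a `;`-separated string (the actual MAL format is not stated here):

```python
from typing import List, Optional


def parse_synonyms(raw: Optional[str], sep: str = ";") -> List[str]:
    """Split a synonyms field into clean titles.

    Pitfall: "".split(";") returns [""], not [], so a naive split turns an
    empty field into a single empty "synonym".
    """
    if not raw:
        return []
    # Strip whitespace around each entry and drop empty fragments, such as
    # the one produced by a trailing separator.
    return [part.strip() for part in raw.split(sep) if part.strip()]
```

With hypothesis, a property such as "no returned synonym is empty" can be checked over arbitrary input strings with `@given(st.text())`.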

* mal(tests): Use subtests for status code testing
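`unittest.TestCase.subTest` reports each iteration of a loop separately, so one failing status code does not mask the others. A sketch with a hypothetical `is_retryable` function standing in for the code under test:

```python
import unittest


def is_retryable(status_code: int) -> bool:
    """Hypothetical code under test: should a MAL request be retried?"""
    return status_code in {429, 500, 502, 503, 504}


class StatusCodeTests(unittest.TestCase):
    def test_retryable_status_codes(self) -> None:
        cases = {200: False, 404: False, 429: True, 503: True}
        for status, expected in cases.items():
            # Each iteration is reported as its own sub-test, labelled with
            # the status code, instead of stopping at the first failure.
            with self.subTest(status=status):
                self.assertEqual(is_retryable(status), expected)
```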

* mal(utils): Make the class behind the namedtuple explicit
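One way to make the class behind a namedtuple explicit is `typing.NamedTuple`, which declares fields and their types in a class body. The `MALEntry` name and its fields below are hypothetical, not taken from Mangaki's MAL utilities:

```python
from collections import namedtuple
from typing import NamedTuple, Optional, Tuple

# Before: the class is created by a factory call; fields are untyped strings.
MALEntryOld = namedtuple("MALEntryOld", ["title", "synonyms", "mal_id", "score"])


# After: an explicit class body listing each field, its type and defaults.
class MALEntry(NamedTuple):
    """One work read from MAL (hypothetical fields, for illustration)."""

    title: str
    synonyms: Tuple[str, ...]
    mal_id: int
    score: Optional[int] = None  # unrated entries have no score
```

Both versions produce ordinary tuples with the same field names; the explicit class simply documents them and allows defaults and docstrings.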

* migrations: add merge migrations between research OK and MAL data

* Address PR comments

* Housekeeping of .gitignore and dependencies (#341)

* gitignore: ignore build files when packaging Mangaki

* djdt: bump to 1.8

* Add a data migration to fix MAL external posters (#347)

Move to MAL's new CDN.

* Reduce the number of queries for ratings selection (#348)

* Load all ratings only at fitting time (recommendations) (#340)

* recommendations: load all ratings if and only if we fit the algorithm (SVD, ALS, WALS, …)
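A minimal sketch of the "load only if we fit" idea, assuming a hypothetical `RecommenderSketch` that receives a ratings loader as a callable and may be handed an already fitted model:

```python
from typing import Callable, List, Optional, Tuple

RatingRow = Tuple[int, int, float]  # (user_id, work_id, value): hypothetical


class RecommenderSketch:
    """Hypothetical recommender that reads all ratings only when it must fit."""

    def __init__(self, load_ratings: Callable[[], List[RatingRow]]) -> None:
        self._load_ratings = load_ratings  # expensive: reads every rating
        self.model: Optional[dict] = None

    def ensure_fitted(self, pretrained: Optional[dict] = None) -> dict:
        if pretrained is not None:
            # An already fitted model is available: skip loading ratings.
            self.model = pretrained
        elif self.model is None:
            ratings = self._load_ratings()  # only paid when we actually fit
            # Stand-in for fitting SVD, ALS, WALS, ...
            self.model = {"n_ratings": len(ratings)}
        return self.model
```

Passing the loader as a callable, rather than the ratings themselves, is what lets the caller skip the expensive query entirely.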

* Proper handling of algo_name

* Add WorkCluster class for merging works (dedupe, suggestions, or merge from admin) (#307)

* Add WorkCluster to admin

* Add migration

* Allow call to merge from WorkCluster admin

* Add test for merge action

* Add all_objects manager and format_html

* Add migration file for manager

* Butchering migrations

* Make a better test for merging, in order to reduce the number of queries

* Cover all merge functions with tests and reduce the number of queries, which was tremendous

* Resolve multiple migration leaves

* Cut line
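The bookkeeping behind merging a cluster of duplicate works can be sketched in pure Python. This is not Mangaki's `WorkCluster` implementation; `merge_cluster`, the dict-of-ratings representation and the conflict policy (the rating on the canonical work wins) are assumptions:

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model: ratings keyed by (user_id, work_id) -> choice.
Ratings = Dict[Tuple[int, int], str]


def merge_cluster(ratings: Ratings, work_ids: List[int], canonical: int) -> Ratings:
    """Redirect every rating on a duplicate work to ``canonical``.

    If a user rated both the canonical work and a duplicate, the rating on
    the canonical work is kept; between two duplicates, the first one seen wins.
    """
    duplicates = set(work_ids) - {canonical}
    merged: Ratings = {}
    # First pass: keep all ratings that are not on a duplicate work.
    for (user_id, work_id), choice in ratings.items():
        if work_id not in duplicates:
            merged[(user_id, work_id)] = choice
    # Second pass: move duplicate ratings to the canonical work, never
    # overwriting a rating that is already there.
    for (user_id, work_id), choice in ratings.items():
        if work_id in duplicates:
            merged.setdefault((user_id, canonical), choice)
    return merged
```

A database-backed version would presumably perform the same moves with a few bulk `QuerySet.update()` / `delete()` calls rather than one query per rating.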

* Use an array rather than a string for source_domains on mangaki_dev group vars (#352)

* Kill /users route (#354)

* Kill /users route

* Remove user_list template

* Use timezone-aware now rather than naive datetime (#355)
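The difference between naive and aware datetimes, shown with the standard library only; Django's `django.utils.timezone.now()` returns an aware UTC datetime when `USE_TZ = True`. The `is_aware` and `can_compare` helpers are for illustration:

```python
from datetime import datetime, timezone

naive = datetime.now()              # no tzinfo: meaning depends on the server's clock settings
aware = datetime.now(timezone.utc)  # explicit UTC offset, safe to store and compare


def is_aware(dt: datetime) -> bool:
    """Same check as Django's timezone.is_aware: the UTC offset is known."""
    return dt.utcoffset() is not None


def can_compare(a: datetime, b: datetime) -> bool:
    """Ordering a naive and an aware datetime raises TypeError."""
    try:
        a < b
    except TypeError:
        return False
    return True
```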

* [Hotfix] Make posters refreshable once again (fix get_potential_poster) (#357)

* Add get_entry_from_work and fix get_potential_posters for admin

* Address PR comments

* Thanks, Django; I thought I didn't know how to compute in Boolean algebra.

* Remove moar useless comments

* Re-design the work cards (#331)

* Add ribbons displaying card category (#362)

Fixes #141

* Fill Language model and use it with AniDB and MAL (#338)

We add a data migration to support many AniDB and MAL languages, which fixes administrators' previous inability to edit the language of a WorkTitle.

Then, we make the MAL import and the AniDB API aware of these models so that they can attach more data to our works when operating (e.g. when importing or adding new works to the DB).

Finally, we add a new administrator action to refresh the WorkTitles of a work linked to AniDB, as well as its `ext_synopsis`, which will presumably be useful for the l10n of Mangaki.

* Add migration for ext_lang / type of WorkTitle (#366)

Missing migration from last PR.

* Rewrite profile view (#364)

This rewrite greatly improves performance for profiles with many works (1k+).

It also improves the privacy of profiles by no longer displaying information such as the number of anime rated.

Finally, it is paginated and a bit more mobile-friendly than the previous version.
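Django provides `django.core.paginator.Paginator` for this; the sketch below only shows the underlying arithmetic with a hypothetical `paginate` function, assuming 1-indexed pages and clamping of out-of-range page numbers:

```python
from math import ceil
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], page: int, per_page: int = 50) -> Tuple[List[T], int]:
    """Return the items of 1-indexed ``page`` and the total number of pages."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    num_pages = max(1, ceil(len(items) / per_page))
    page = min(max(page, 1), num_pages)  # clamp out-of-range page numbers
    start = (page - 1) * per_page
    # Only one page of works is rendered per request.
    return list(items[start:start + per_page]), num_pages
```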

* Integrate Sentry (#350)

Mangaki now supports Sentry integration, notably for beta.mangaki.fr.

It'll make it easier to track unexpected exceptions on the back-end.

* Add raven>=6.1.0 to setup.py (unreviewed)

* Move to INFO and add the console handler to the root logger (#368)

Fixes Sentry reporting of view exceptions.
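A minimal `LOGGING`-style configuration, of the kind Django passes to `logging.config.dictConfig`, with the root logger at INFO and a console handler. The Sentry handler is deliberately left out so the sketch runs without raven; where it would go is marked in comments:

```python
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler", "level": "INFO"},
        # A Sentry handler (from raven) would be declared here as well.
    },
    "root": {
        "level": "INFO",
        # ... and listed next to "console" in this list.
        "handlers": ["console"],
    },
}

logging.config.dictConfig(LOGGING)
```

Records from Django loggers such as `django.request` propagate up to the root logger, which is presumably why attaching handlers there helps exceptions raised in views reach Sentry.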

* Hotfix: do not show recommendations tab on anonymous profiles (#369)

* Add English and Japanese i18n (#356)

Initial i18n of "about us" pages.

* Do not show alternative titles on work detail page (#371)

* Add AniDB to settings template (#372)

* Release the 0.2 (#361)

* s/jp/ja is the right way (#380)

* Forgotten s/jp/ja in base.html template (unreviewed)