Description
System design
- Can add arbitrary number of predicates to a search
- Predicates can be one of several predefined system predicates (filesize, rating etc.) or tag predicates
- Tag searches can be exact or wildcards
- An exact search contains both a namespace and a subtag (or an empty namespace for bare subtags)
- namespace:* means 'get all hashes which have at least one tag with this namespace'
- A wildcard is indicated by an *
- A wildcard can appear in either the namespace or subtag (cha*:samu* -> character: samus)
- A tag can have multiple wildcards: character:smu -> character: samus
- A tag query can be a NOT, e.g. '!character:samus aran'. these can also be wildcarded
- A tag query can contain subnested OR queries
- Hydrus only lets you perform queries that the autocomplete has already done for you. To ensure they're wellformed?
Open questions
- How do we represent predicates as types?
- Are there common properties shared by all predicates, incl. tags and system predicates?
- How do we expand out wildcards?
- How do we merge predicates?
Plan of attack
- Wait for user to stop typing (500ms delay or whatever).
- Get the raw text in whatever textbox.
- Use that to generate the autocomplete dropdown options.
- Decide if the user is allowed to execute their bare query or if they have to pick one of the autocompletes.
- Either way, take the result and add it as a predicate to whatever predicates are currently in the search context.
- Execute the search with its predicates against the database.
Getting the autocomplete options
- Get the raw string.
- Does the text include a colon? If yes, it's a namespace query. Otherwise it's just a subtag query.
Subtag queries
- Append an implicit wildcard to the start and end of the tag ('footw' -> 'footw').
- Get all tags which match the query text, wildcards included, in their subtag or their namespace.
- If the raw query includes an explicit wildcard, add a suggestion for that raw query as a wildcard search.
- Display all the suggestions.
Namespace queries
It's a namespace query if it contains a colon.
If it ends in a colon, or a colon with one or more explicit wildcards, provide a suggestion for a namespace wildcard search ('character:anything').
Turning raw search strings into tag IDs
def GetAutocompleteTagIds( self, tag_display_type: int, leaf: ClientDBServices.FileSearchContextLeaf, search_text, exact_match, job_status = None ):
if search_text == '':
return set()
( namespace, half_complete_searchable_subtag ) = HydrusTags.SplitTag( search_text )
if half_complete_searchable_subtag == '':
return set()
if exact_match:
if '*' in namespace or '*' in half_complete_searchable_subtag:
return []
if '*' in namespace:
namespace_ids = self.GetNamespaceIdsFromWildcard( namespace )
else:
if not self.modules_tags.NamespaceExists( namespace ):
return set()
namespace_ids = ( self.modules_tags.GetNamespaceId( namespace ), )
if half_complete_searchable_subtag == '*':
if namespace == '':
# hellmode 'get all tags' search
tag_ids = self.GetAllTagIds( leaf, job_status = job_status )
else:
tag_ids = self.GetTagIdsFromNamespaceIds( leaf, namespace_ids, job_status = job_status )
else:
tag_ids = set()
with self._MakeTemporaryIntegerTable( [], 'subtag_id' ) as temp_subtag_ids_table_name:
self.GetSubtagIdsFromWildcardIntoTable( leaf.file_service_id, leaf.tag_service_id, half_complete_searchable_subtag, temp_subtag_ids_table_name, job_status = job_status )
if namespace == '':
loop_of_tag_ids = self.GetTagIdsFromSubtagIdsTable( leaf.file_service_id, leaf.tag_service_id, temp_subtag_ids_table_name, job_status = job_status )
else:
with self._MakeTemporaryIntegerTable( namespace_ids, 'namespace_id' ) as temp_namespace_ids_table_name:
loop_of_tag_ids = self.GetTagIdsFromNamespaceIdsSubtagIdsTables( leaf.file_service_id, leaf.tag_service_id, temp_namespace_ids_table_name, temp_subtag_ids_table_name, job_status = job_status )
tag_ids.update( loop_of_tag_ids )
return tag_ids
Initialising tag predicates
predicate = ClientSearch.Predicate( ClientSearch.PREDICATE_TYPE_TAG, value = tag, inclusive = inclusive)
Converting predicates
ClientSearch.py/_InitialiseTemporaryVariables:
def _InitialiseTemporaryVariables( self ):
system_predicates = [ predicate for predicate in self._predicates if predicate.GetType() in SYSTEM_PREDICATE_TYPES ]
self._system_predicates = FileSystemPredicates( system_predicates )
tag_predicates = [ predicate for predicate in self._predicates if predicate.GetType() == PREDICATE_TYPE_TAG ]
self._tags_to_include = []
self._tags_to_exclude = []
for predicate in tag_predicates:
tag = predicate.GetValue()
if predicate.GetInclusive(): self._tags_to_include.append( tag )
else: self._tags_to_exclude.append( tag )
namespace_predicates = [ predicate for predicate in self._predicates if predicate.GetType() == PREDICATE_TYPE_NAMESPACE ]
self._namespaces_to_include = []
self._namespaces_to_exclude = []
for predicate in namespace_predicates:
namespace = predicate.GetValue()
if predicate.GetInclusive(): self._namespaces_to_include.append( namespace )
else: self._namespaces_to_exclude.append( namespace )
wildcard_predicates = [ predicate for predicate in self._predicates if predicate.GetType() == PREDICATE_TYPE_WILDCARD ]
self._wildcards_to_include = []
self._wildcards_to_exclude = []
for predicate in wildcard_predicates:
# this is an important convert. preds store nice looking text, but convert for the actual search
wildcard = ConvertTagToSearchable( predicate.GetValue() )
if predicate.GetInclusive(): self._wildcards_to_include.append( wildcard )
else: self._wildcards_to_exclude.append( wildcard )
self._or_predicates = [ predicate for predicate in self._predicates if predicate.GetType() == PREDICATE_TYPE_OR_CONTAINER ]
Converting wildcard predicates to searchables:
def ConvertSubtagToSearchable( subtag ):
if subtag == '':
return ''
subtag = CollapseWildcardCharacters( subtag )
subtag = subtag.translate( IGNORED_TAG_SEARCH_CHARACTERS_UNICODE_TRANSLATE )
subtag = HydrusText.re_one_or_more_whitespace.sub( ' ', subtag )
subtag = subtag.strip()
return subtag
def CollapseWildcardCharacters( text ):
while '**' in text:
text = text.replace( '**', '*' )
return text
The actual database stuff
- Assuming we already have the tag IDs we care about (ClientDBFilesSearch.py/GetHashIdsFromTagIds)
- If there's only one tag we're searching, just do a ''SELECT hash_id FROM {} WHERE tag_id = ?;'
- Otherwise it does standard ''SELECT hash_id FROM {} WHERE EXISTS ( SELECT 1 FROM {} WHERE {}.hash_id = {}.hash_id AND EXISTS ( SELECT 1 FROM {} WHERE {}.tag_id = {}.tag_id ) );'' stuff?
ClientDBFilesSearch.py/GetHashIdsFromQuery() seems like a real important method.
Line 1594 for where it starts dealing with tags.
It... just does a query for every tag in a foreach?!
Need to analyse this in detail.
Highly illuminating, can do a somewhat brutish translation to C# as a first attempt.