Querying

System design

Can add arbitrary number of predicates to a search
Predicates can be one of several predefined system predicates (filesize, rating etc.) or tag predicates
Tag searches can be exact or wildcards
An exact search contains both a namespace and a subtag (or an empty namespace for bare subtags)
namespace:* means 'get all hashes which have at least one tag with this namespace'
A wildcard is indicated by an *
A wildcard can appear in either the namespace or subtag (cha*:samu* -> character: samus)
A tag can have multiple wildcards: character:smu -> character: samus
A tag query can be a NOT, e.g. '!character:samus aran'. these can also be wildcarded
A tag query can contain subnested OR queries
Hydrus only lets you perform queries that the autocomplete has already done for you. To ensure they're wellformed?

Open questions

How do we represent predicates as types?
Are there common properties shared by all predicates, incl. tags and system predicates?
How do we expand out wildcards?
How do we merge predicates?

Plan of attack

Wait for user to stop typing (500ms delay or whatever).
Get the raw text in whatever textbox.
Use that to generate the autocomplete dropdown options.
Decide if the user is allowed to execute their bare query or if they have to pick one of the autocompletes.
Either way, take the result and add it as a predicate to whatever predicates are currently in the search context.
Execute the search with its predicates against the database.

Getting the autocomplete options

Get the raw string.
Does the text include a colon? If yes, it's a namespace query. Otherwise it's just a subtag query.

Subtag queries

Append an implicit wildcard to the start and end of the tag ('footw' -> 'footw').
Get all tags which match the query text, wildcards included, in their subtag or their namespace.
If the raw query includes an explicit wildcard, add a suggestion for that raw query as a wildcard search.
Display all the suggestions.

Namespace queries

It's a namespace query if it contains a colon.

If it ends in a colon, or a colon with one or more explicit wildcards, provide a suggestion for a namespace wildcard search ('character:anything').

Turning raw search strings into tag IDs

    def GetAutocompleteTagIds( self, tag_display_type: int, leaf: ClientDBServices.FileSearchContextLeaf, search_text, exact_match, job_status = None ):
        
        if search_text == '':
            
            return set()
            
        
        ( namespace, half_complete_searchable_subtag ) = HydrusTags.SplitTag( search_text )
        
        if half_complete_searchable_subtag == '':
            
            return set()
            
        
        if exact_match:
            
            if '*' in namespace or '*' in half_complete_searchable_subtag:
                
                return []
                
            
        
        if '*' in namespace:
            
            namespace_ids = self.GetNamespaceIdsFromWildcard( namespace )
            
        else:
            
            if not self.modules_tags.NamespaceExists( namespace ):
                
                return set()
                
            
            namespace_ids = ( self.modules_tags.GetNamespaceId( namespace ), )
            
        
        if half_complete_searchable_subtag == '*':
            
            if namespace == '':
                
                # hellmode 'get all tags' search
                
                tag_ids = self.GetAllTagIds( leaf, job_status = job_status )
                
            else:
                
                tag_ids = self.GetTagIdsFromNamespaceIds( leaf, namespace_ids, job_status = job_status )
                
            
        else:
            
            tag_ids = set()
            
            with self._MakeTemporaryIntegerTable( [], 'subtag_id' ) as temp_subtag_ids_table_name:
                
                self.GetSubtagIdsFromWildcardIntoTable( leaf.file_service_id, leaf.tag_service_id, half_complete_searchable_subtag, temp_subtag_ids_table_name, job_status = job_status )
                
                if namespace == '':
                    
                    loop_of_tag_ids = self.GetTagIdsFromSubtagIdsTable( leaf.file_service_id, leaf.tag_service_id, temp_subtag_ids_table_name, job_status = job_status )
                    
                else:
                    
                    with self._MakeTemporaryIntegerTable( namespace_ids, 'namespace_id' ) as temp_namespace_ids_table_name:
                        
                        loop_of_tag_ids = self.GetTagIdsFromNamespaceIdsSubtagIdsTables( leaf.file_service_id, leaf.tag_service_id, temp_namespace_ids_table_name, temp_subtag_ids_table_name, job_status = job_status )
                        
                    
                
                tag_ids.update( loop_of_tag_ids )
                        
        return tag_ids

Initialising tag predicates

predicate = ClientSearch.Predicate( ClientSearch.PREDICATE_TYPE_TAG, value = tag, inclusive = inclusive)

Converting predicates

ClientSearch.py/_InitialiseTemporaryVariables:

def _InitialiseTemporaryVariables( self ):
        
        system_predicates = [ predicate for predicate in self._predicates if predicate.GetType() in SYSTEM_PREDICATE_TYPES ]
        
        self._system_predicates = FileSystemPredicates( system_predicates )
        
        tag_predicates = [ predicate for predicate in self._predicates if predicate.GetType() == PREDICATE_TYPE_TAG ]
        
        self._tags_to_include = []
        self._tags_to_exclude = []
        
        for predicate in tag_predicates:
            
            tag = predicate.GetValue()
            
            if predicate.GetInclusive(): self._tags_to_include.append( tag )
            else: self._tags_to_exclude.append( tag )
            
        
        namespace_predicates = [ predicate for predicate in self._predicates if predicate.GetType() == PREDICATE_TYPE_NAMESPACE ]
        
        self._namespaces_to_include = []
        self._namespaces_to_exclude = []
        
        for predicate in namespace_predicates:
            
            namespace = predicate.GetValue()
            
            if predicate.GetInclusive(): self._namespaces_to_include.append( namespace )
            else: self._namespaces_to_exclude.append( namespace )
            
        
        wildcard_predicates = [ predicate for predicate in self._predicates if predicate.GetType() == PREDICATE_TYPE_WILDCARD ]
        
        self._wildcards_to_include = []
        self._wildcards_to_exclude = []
        
        for predicate in wildcard_predicates:
            
            # this is an important convert. preds store nice looking text, but convert for the actual search
            wildcard = ConvertTagToSearchable( predicate.GetValue() )
            
            if predicate.GetInclusive(): self._wildcards_to_include.append( wildcard )
            else: self._wildcards_to_exclude.append( wildcard )
            
        
        self._or_predicates = [ predicate for predicate in self._predicates if predicate.GetType() == PREDICATE_TYPE_OR_CONTAINER ]

Converting wildcard predicates to searchables:

def ConvertSubtagToSearchable( subtag ):

if subtag == '':
    
    return ''
    

subtag = CollapseWildcardCharacters( subtag )

subtag = subtag.translate( IGNORED_TAG_SEARCH_CHARACTERS_UNICODE_TRANSLATE )

subtag = HydrusText.re_one_or_more_whitespace.sub( ' ', subtag )

subtag = subtag.strip()

return subtag

def CollapseWildcardCharacters( text ):

while '**' in text:
    
    text = text.replace( '**', '*' )
    
return text

The actual database stuff

Assuming we already have the tag IDs we care about (ClientDBFilesSearch.py/GetHashIdsFromTagIds)
If there's only one tag we're searching, just do a ''SELECT hash_id FROM {} WHERE tag_id = ?;'
Otherwise it does standard ''SELECT hash_id FROM {} WHERE EXISTS ( SELECT 1 FROM {} WHERE {}.hash_id = {}.hash_id AND EXISTS ( SELECT 1 FROM {} WHERE {}.tag_id = {}.tag_id ) );'' stuff?

ClientDBFilesSearch.py/GetHashIdsFromQuery() seems like a real important method.
Line 1594 for where it starts dealing with tags.
It... just does a query for every tag in a foreach?!
Need to analyse this in detail.
Highly illuminating, can do a somewhat brutish translation to C# as a first attempt.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

System design

Open questions

Plan of attack

Getting the autocomplete options

Subtag queries

Namespace queries

Turning raw search strings into tag IDs

Initialising tag predicates

Converting predicates

The actual database stuff

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Description

System design

Open questions

Plan of attack

Getting the autocomplete options

Subtag queries

Namespace queries

Turning raw search strings into tag IDs

Initialising tag predicates

Converting predicates

The actual database stuff

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions