8000 Querying · Issue #10 · jkendall327/octans · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
Querying #10
Open
Open
@jkendall327

Description

@jkendall327

System design

image

  • Can add arbitrary number of predicates to a search
  • Predicates can be one of several predefined system predicates (filesize, rating etc.) or tag predicates
  • Tag searches can be exact or wildcards
  • An exact search contains both a namespace and a subtag (or an empty namespace for bare subtags)
  • namespace:* means 'get all hashes which have at least one tag with this namespace'
  • A wildcard is indicated by an *
  • A wildcard can appear in either the namespace or subtag (cha*:samu* -> character: samus)
  • A tag can have multiple wildcards: character:smu -> character: samus
  • A tag query can be a NOT, e.g. '!character:samus aran'. these can also be wildcarded
  • A tag query can contain subnested OR queries
  • Hydrus only lets you perform queries that the autocomplete has already done for you. To ensure they're wellformed?

Open questions

  • How do we represent predicates as types?
  • Are there common properties shared by all predicates, incl. tags and system predicates?
  • How do we expand out wildcards?
  • How do we merge predicates?

Plan of attack

  1. Wait for user to stop typing (500ms delay or whatever).
  2. Get the raw text in whatever textbox.
  3. Use that to generate the autocomplete dropdown options.
  4. Decide if the user is allowed to execute their bare query or if they have to pick one of the autocompletes.
  5. Either way, take the result and add it as a predicate to whatever predicates are currently in the search context.
  6. Execute the search with its predicates against the database.

Getting the autocomplete options

  1. Get the raw string.
  2. Does the text include a colon? If yes, it's a namespace query. Otherwise it's just a subtag query.

Subtag queries

  1. Append an implicit wildcard to the start and end of the tag ('footw' -> 'footw').
  2. Get all tags which match the query text, wildcards included, in their subtag or their namespace.
  3. If the raw query includes an explicit wildcard, add a suggestion for that raw query as a wildcard search.
  4. Display all the suggestions.

Namespace queries

It's a namespace query if it contains a colon.

If it ends in a colon, or a colon with one or more explicit wildcards, provide a suggestion for a namespace wildcard search ('character:anything').

Turning raw search strings into tag IDs

    def GetAutocompleteTagIds( self, tag_display_type: int, leaf: ClientDBServices.FileSearchContextLeaf, search_text, exact_match, job_status = None ):
        
        if search_text == '':
            
            return set()
            
        
        ( namespace, half_complete_searchable_subtag ) = HydrusTags.SplitTag( search_text )
        
        if half_complete_searchable_subtag == '':
            
            return set()
            
        
        if exact_match:
            
            if '*' in namespace or '*' in half_complete_searchable_subtag:
                
                return []
                
            
        
        if '*' in namespace:
            
            namespace_ids = self.GetNamespaceIdsFromWildcard( namespace )
            
        else:
            
            if not self.modules_tags.NamespaceExists( namespace ):
                
                return set()
                
            
            namespace_ids = ( self.modules_tags.GetNamespaceId( namespace ), )
            
        
        if half_complete_searchable_subtag == '*':
            
            if namespace == '':
                
                # hellmode 'get all tags' search
                
                tag_ids = self.GetAllTagIds( leaf, job_status = job_status )
                
            else:
                
                tag_ids = self.GetTagIdsFromNamespaceIds( leaf, namespace_ids, job_status = job_status )
                
            
        else:
            
            tag_ids = set()
            
            with self._MakeTemporaryIntegerTable( [], 'subtag_id' ) as temp_subtag_ids_table_name:
                
                self.GetSubtagIdsFromWildcardIntoTable( leaf.file_service_id, leaf.tag_service_id, half_complete_searchable_subtag, temp_subtag_ids_table_name, job_status = job_status )
                
                if namespace == '':
                    
                    loop_of_tag_ids = self.GetTagIdsFromSubtagIdsTable( leaf.file_service_id, leaf.tag_service_id, temp_subtag_ids_table_name, job_status = job_status )
                    
                else:
                    
                    with self._MakeTemporaryIntegerTable( namespace_ids, 'namespace_id' ) as temp_namespace_ids_table_name:
                        
                        loop_of_tag_ids = self.GetTagIdsFromNamespaceIdsSubtagIdsTables( leaf.file_service_id, leaf.tag_service_id, temp_namespace_ids_table_name, temp_subtag_ids_table_name, job_status = job_status )
                        
                    
                
                tag_ids.update( loop_of_tag_ids )
                        
        return tag_ids

Initialising tag predicates

predicate = ClientSearch.Predicate( ClientSearch.PREDICATE_TYPE_TAG, value = tag, inclusive = inclusive)

Converting predicates

ClientSearch.py/_InitialiseTemporaryVariables:

def _InitialiseTemporaryVariables( self ):
        
        system_predicates = [ predicate for predicate in self._predicates if predicate.GetType() in SYSTEM_PREDICATE_TYPES ]
        
        self._system_predicates = FileSystemPredicates( system_predicates )
        
        tag_predicates = [ predicate for predicate in self._predicates if predicate.GetType() == PREDICATE_TYPE_TAG ]
        
        self._tags_to_include = []
        self._tags_to_exclude = []
        
        for predicate in tag_predicates:
            
            tag = predicate.GetValue()
            
            if predicate.GetInclusive(): self._tags_to_include.append( tag )
            else: self._tags_to_exclude.append( tag )
            
        
        namespace_predicates = [ predicate for predicate in self._predicates if predicate.GetType() == PREDICATE_TYPE_NAMESPACE ]
        
        self._namespaces_to_include = []
        self._namespaces_to_exclude = []
        
        for predicate in namespace_predicates:
            
            namespace = predicate.GetValue()
            
            if predicate.GetInclusive(): self._namespaces_to_include.append( namespace )
            else: self._namespaces_to_exclude.append( namespace )
            
        
        wildcard_predicates = [ predicate for predicate in self._predicates if predicate.GetType() == PREDICATE_TYPE_WILDCARD ]
        
        self._wildcards_to_include = []
        self._wildcards_to_exclude = []
        
        for predicate in wildcard_predicates:
            
            # this is an important convert. preds store nice looking text, but convert for the actual search
            wildcard = ConvertTagToSearchable( predicate.GetValue() )
            
            if predicate.GetInclusive(): self._wildcards_to_include.append( wildcard )
            else: self._wildcards_to_exclude.append( wildcard )
            
        
        self._or_predicates = [ predicate for predicate in self._predicates if predicate.GetType() == PREDICATE_TYPE_OR_CONTAINER ]

Converting wildcard predicates to searchables:

def ConvertSubtagToSearchable( subtag ):

if subtag == '':
    
    return ''
    

subtag = CollapseWildcardCharacters( subtag )

subtag = subtag.translate( IGNORED_TAG_SEARCH_CHARACTERS_UNICODE_TRANSLATE )

subtag = HydrusText.re_one_or_more_whitespace.sub( ' ', subtag )

subtag = subtag.strip()

return subtag

def CollapseWildcardCharacters( text ):

while '**' in text:
    
    text = text.replace( '**', '*' )
    
return text

The actual database stuff

  • Assuming we already have the tag IDs we care about (ClientDBFilesSearch.py/GetHashIdsFromTagIds)
  • If there's only one tag we're searching, just do a ''SELECT hash_id FROM {} WHERE tag_id = ?;'
  • Otherwise it does standard ''SELECT hash_id FROM {} WHERE EXISTS ( SELECT 1 FROM {} WHERE {}.hash_id = {}.hash_id AND EXISTS ( SELECT 1 FROM {} WHERE {}.tag_id = {}.tag_id ) );'' stuff?

ClientDBFilesSearch.py/GetHashIdsFromQuery() seems like a real important method.
Line 1594 for where it starts dealing with tags.
It... just does a query for every tag in a foreach?!
Need to analyse this in detail.
Highly illuminating, can do a somewhat brutish translation to C# as a first attempt.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0