8000 New indexes for states and recording_runs tables by m00dawg · Pull Request #6688 · home-assistant/core · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

New indexes for states and recording_runs tables #6688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 24, 2017

Conversation

m00dawg
Copy link
Contributor
@m00dawg m00dawg commented Mar 19, 2017

Description:

Related issue (if applicable): fixes #6460

Checklist:

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • Local tests with tox run successfully. Your PR cannot be merged unless tests pass

If the code does not interact with devices:

  • Local tests with tox run successfully. Your PR cannot be merged unless tests pass

@mention-bot
Copy link

@m00dawg, thanks for your PR! By analyzing the history of the files in this pull request, we identified @kellerza, @balloob and @rhooper to be potential reviewers.

@m00dawg
Copy link
Contributor Author
m00dawg commented Mar 19, 2017

Includes the indexes for states but does not include the recorder_runs. Having trouble trying to make a compound index as a migration for that.

index.create(engine)
# Create indexes for states
create_index("states", "last_updated")
create_index("states", "created")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to get a compound index in here for recorder_runs. It's not a huge table (at least in my own DB) relative to states, but was causing a non-indexed table-scan. I really wanted to convert the create_index function to handle single or multiple indexes, but given how the Index constructor works, I was having a heck of a time. In either case, not sure how pruned indexes (e.g. ALTER TABLE ... ADD INDEX ix_string (string(8))) works but I believe it's possible.

I mention that because some of the types within tables ideally should be an ENUM for performance, but from the standpoint of flexibility are varchars. A reasonable compromise for that is to use a small pruned index which saves space but still gives decent cardinality. That's outside the scope of this PR, but sort of explaining my thought process here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing create_index does that doesn't apply to a multi-column index is it generates the name of the index based on the table and column name. If you separate the logic out below the name = line, you should be able to use it for a multi-column index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a while to figure it out, but you're right that does work. I'll be submitting an update to the PR shortly.

A new function was created because it makes it a little cleaner when creating
a single-field index since one doesn't have to create a list. This is mostly
when creating the name of the index so with a bit more logic it's possible
to combine it into one function. Given how often migration changes are run,
I thought that code bloat was probably a worthy trade-off for now.
@m00dawg m00dawg changed the title New indexes for states table New indexes for states and recording_runs tables Mar 19, 2017
def create_index(table_name, column_name):
"""Create an index for the specified table and column."""
table = Table(table_name, models.Base.metadata)
name = "_".join(("ix", table_name, column_name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like this to combine these into one function? The pattern is usually called *args if you need to google it. You can replace the debug messages with the one you wrote for compound index so the individual column names don't need to be used.

def create_index(table_name, *column_names):
   name = "_".join("_".join("ix", table_name), column_names)
    ...

Then it could be used for both index types like this

create_index("recorder_runs", "start", "end")
create_index("states", "last_updated")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index = next(...) bits are a bit new to me. Does that effectively mean that any arguments passed from the function end up getting passed to an Index constructor? Asking because if we can only specify columns, we may still have an issue if we wanted to, for instance, create a unique index where we would have to pass "Unique=true". If it's just blindly passing all the arguments though, that shouldn't be a problem? If it's only column names, that would be a problem (granted that I haven't solved in my code updates to this point yet either way).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what happens when we load an old database with new models objects, is that the actual Index sqlalchemy metadata objects get initialized with all the parameters we used in models.py, but the index isn't created in the underlying database engine. These indexes are present in the table.indexes list.

next is a python built-in that returns the next item from an iterator. Since we create a new iterator by searching through table.indexes for idx.name == name, there's only going to be one index in that iterator. What that means is that we're getting the metadata object for the index that was created by models.py. Once we have the reference all we have to do is call index.create.

index.create(engine)
_LOGGER.debug("Index creation done for table %s column %s",
table_name, column_name)
def create_index(table_name, column_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move these methods out of the _apply_update method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd defer to @armills on that one since my PR uses mostly what was already there. I would imagine these functions would only be useful during a migration unless things were getting very complex (say by creating a temporary table and creating an index on that for later use) since, otherwise, the real places index definitions live is in the models?

The HA approach to migrations is pretty new to me so I'm still wrapping my head around it, to be fair.

def create_compound_index(table_name, column_names):
"""Create an index for the specified table and columns."""
table = Table(table_name, models.Base.metadata)
index_name = "ix_" + table_name + "_" + "_".join(column_names)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer string formatting over concatenation.

index_name = "ix_{}_{}".format(table_name, "_".join(…))

Although, actually I prefer @armills approach better. Extract a function:

def _generate_name(*args):
    """Generate a name based on arguments."""
    return "_".join(args)

Copy link
Contributor Author
@m00dawg m00dawg Mar 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my comment about migration.py I think in order to do that we'd have to split out column names from extra arguments (e.g. unique=True). I generally prefer using something like field1_field2_idx for non-unique, field1_field2_uq for unique and field1_fk for indexes required for foreign keys (I think that may be MySQL dependent though).

Otherwise, the column names could get kinda wonky. The other weird bit I ran into I'm not sure how to solve is the column name had to match what was in models.py. if it didn't, HA would get confused and fail (I'm not sure exactly why though). So I wonder if there was a way to either grab the index name from the models.py or have a function that generated the name in both places?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I described the procedure in the comment above, but basically the only thing this function is doing is looking up the index object by name. It isn't creating it or specifying any arguments such as unique that will be used by the index.

The reason we are generating the name here is so that it will match the name generated by sqlalchemy for index=True columns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡

Aha ok that helps explain things quite a bit! If it's ok, I'll probably wrap the above explanation around some in-line comments since that might be helpful to future folks trying to figure out how that works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think expanding the documentation never hurts. Obviously we should leave out next since that's a built-in that people can search, but describing how the objects are pre-created would be helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm in terms of the index naming, would it make more sense to use literals here? If we have to manually create the index name on models.py, it seems like it might be easier to do that here? So something like create_index('recorder_runs', 'ix_recorder_runs_start_end') given what is in models.py is __table_args__ = (Index('ix_recorder_runs_start_end', 'start', 'end'),)

I'm normally not a fan of literals like that, but unless there is a way to auto-generate an index name in models.py, it might keep things a bit less error-prone for folks that don't know about the auto-naming bits going on in migration.py?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with that myself. We should have one function that takes the table and index name as it's inputs. We can then have a separate function like balloob described that can generate index names for single column indexes, which gets passed into the first function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point, we could. Actually I wonder if we can reuse whatever function SQLAlchemy uses for that (I pulled up the docs to figure out how it names its indexes when using Index=True but it just provided the naming convention).

For more clarity, here's what I've done based on the above:

    def create_index(table_name, index_name):
        """Create an index for the specified table.

        The index name should match the name given for the index
        within the table definition described in the models"""
        table = Table(table_name, models.Base.metadata)
        _LOGGER.debug("Looking up index for table %s", table_name)
        # Look up the index object by name that was created in the models
        index = next(idx for idx in table.indexes if idx.name == index_name)
        _LOGGER.debug("Creating %s index", index_name)
        index.create(engine)
        _LOGGER.debug("Finished creating %s", index_name)

    if new_version == 1:
        #create_index("events", "time_fired")
        create_index("events", "ix_events_time_fired")
    elif new_version == 2:
        # Create compound start/end index for recorder_runs
        create_index("recorder_runs", "ix_recorder_runs_start_end")
        # Create indexes for states
        create_index("states", "ix_states_last_updated")
        create_index("states", "ix_states_created")

'domain', 'last_updated', 'entity_id'), )
'domain', 'last_updated', 'entity_id'),
Index('ix_states_entity_id_created',
'entity_id','created'),)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing whitespace after ','

@@ -40,21 +40,28 @@ def _apply_update(engine, new_version):
"""Perform operations to bring schema up to date."""
from sqlalchemy import Table
from . import models

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line contains whitespace

@m00dawg
Copy link
Contributor Author
m00dawg commented Mar 21, 2017

Made an indexing change as the way I had it before wasn't working with the latest updates (not sure what changed there). Instead of just an index on created, it seems better for the index to be on entity_id,created due to the sub-select involved in the query that generates the pretty graphs. I wanted to prune the size of entity_id (which is a string currently) but that appears to be more of a MySQL-specific optimization and I didn't want to worry about DB-specific settings here.

@balloob I recall you wanted the create_index function in migration.py moved outside of the _apply_update function, but I could use some guidance on where it should be moved to?

@emlove
Copy link
Contributor
emlove commented Mar 21, 2017

It probably makes sense to move it up to the module level. You can define it just before _apply_update. You'll want to prefix it with an underscore, and you'll have to add an argument for engine.

Copy link
Contributor
@emlove emlove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small wrapping change, but I think this looks good now.

_create_index(engine, "events", "ix_events_time_fired")
elif new_version == 2:
# Create compound start/end index for recorder_runs
_create_index(engine, "recorder_runs",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this one fits on one line now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it just barely fits, good eagle eyes! made the change and push the update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it probably needed to be wrapped in a previous rev.

I don't know whether I should be happy or sad that I'm so used to the linter rules that it popped out at me. 😆

@emlove
Copy link
Contributor
emlove commented Mar 23, 2017

I also think we should wait until after the 0.41 branch is cut. We should give this one some time to sit in dev since the database will be altered.

@balloob balloob merged commit 5dfdb9e into home-assistant:dev Mar 24, 2017
@balloob
Copy link
Member
balloob commented Mar 24, 2017

Added label breaking change so that I am sure to call it out in the release notes as the migration might take some time.

@m00dawg
Copy link
Contributor Author
m00dawg commented Mar 24, 2017

Ah good call. In my test, my rather huge DB (having months worth of history) did take quite some time to do. Might be good to suggest folks look at the log file when they upgrade and restart (or consider purging the data if they don't want it for history).

@fabaff fabaff mentioned this pull request Apr 6, 2017
@home-assistant home-assistant locked and limited conversation to collaborators Jun 24, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing index for states table for last_updated
7 participants
0