New indexes for states and recording_runs tables #6688

m00dawg · 2017-03-19T02:39:39Z

Description:

Related issue (if applicable): fixes #6460

Checklist:

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated in home-assistant.github.io

If the code communicates with devices, web services, or third-party tools:

Local tests with tox run successfully. Your PR cannot be merged unless tests pass

If the code does not interact with devices:

Local tests with tox run successfully. Your PR cannot be merged unless tests pass

mention-bot · 2017-03-19T02:39:40Z

@m00dawg, thanks for your PR! By analyzing the history of the files in this pull request, we identified @kellerza, @balloob and @rhooper to be potential reviewers.

m00dawg · 2017-03-19T02:41:23Z

Includes the indexes for states but does not include the recorder_runs. Having trouble trying to make a compound index as a migration for that.

m00dawg · 2017-03-19T04:11:06Z

homeassistant/components/recorder/migration.py

+        index.create(engine)
+        # Create indexes for states
+        create_index("states", "last_updated")
+        create_index("states", "created")


I managed to get a compound index in here for recorder_runs. It's not a huge table (at least in my own DB) relative to states, but was causing a non-indexed table-scan. I really wanted to convert the create_index function to handle single or multiple indexes, but given how the Index constructor works, I was having a heck of a time. In either case, I'm not sure how pruned indexes (e.g. ALTER TABLE ... ADD INDEX ix_string (string(8))) work, but I believe it's possible.

I mention that because some of the types within tables ideally should be an ENUM for performance, but from the standpoint of flexibility are varchars. A reasonable compromise for that is to use a small pruned index which saves space but still gives decent cardinality. That's outside the scope of this PR, but sort of explaining my thought process here.
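To make the table-scan point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the query are simplified assumptions for illustration, not the recorder's real schema or the query it actually runs; only the index name is taken from this PR. It asks the planner how it would run the query before and after the compound index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; "end" is quoted because END is an SQL keyword.
conn.execute(
    'CREATE TABLE recorder_runs '
    '(run_id INTEGER PRIMARY KEY, start TEXT, "end" TEXT)'
)

# Illustrative query only, not necessarily the one the recorder issues.
QUERY = 'SELECT run_id FROM recorder_runs WHERE start < ? AND "end" > ?'
PARAMS = ("2017-03-19", "2017-03-18")


def query_plan(connection):
    """Return the planner's description of how QUERY would be executed."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)
conn.execute(
    'CREATE INDEX ix_recorder_runs_start_end ON recorder_runs (start, "end")'
)
plan_after = query_plan(conn)

print(plan_before)
print(plan_after)
```

The exact wording of the plan text differs between SQLite versions, and other database engines report plans in their own way, so treat the printed output as indicative only.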

The only thing create_index does that doesn't apply to a multi-column index is it generates the name of the index based on the table and column name. If you separate the logic out below the name = line, you should be able to use it for a multi-column index.

Took me a while to figure it out, but you're right that does work. I'll be submitting an update to the PR shortly.

A new function was created because it makes it a little cleaner when creating a single-field index since one doesn't have to create a list. This is mostly when creating the name of the index so with a bit more logic it's possible to combine it into one function. Given how often migration changes are run, I thought that code bloat was probably a worthy trade-off for now.


emlove · 2017-03-19T18:44:00Z

homeassistant/components/recorder/migration.py

+    def create_index(table_name, column_name):
+        """Create an index for the specified table and column."""
+        table = Table(table_name, models.Base.metadata)
+        name = "_".join(("ix", table_name, column_name))


What about something like this to combine these into one function? The pattern is usually called *args if you need to google it. You can replace the debug messages with the one you wrote for compound index so the individual column names don't need to be used.

    def create_index(table_name, *column_names):
        name = "_".join(("ix", table_name) + column_names)
        ...

Then it could be used for both index types like this

    create_index("recorder_runs", "start", "end")
    create_index("states", "last_updated")
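As a self-contained illustration of the variadic naming idea, the sketch below isolates just the name generation. `index_name` is a hypothetical helper introduced for this example, not a function from the PR; it only shows how `*column_names` lets one function cover both the single-column and the compound case.

```python
def index_name(table_name, *column_names):
    """Build a name of the form ix_<table>_<col1>[_<col2>...]."""
    # column_names arrives as a tuple, so it can be concatenated directly.
    return "_".join(("ix", table_name) + column_names)


single = index_name("states", "last_updated")
compound = index_name("recorder_runs", "start", "end")

print(single)    # ix_states_last_updated
print(compound)  # ix_recorder_runs_start_end
```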

The index = next(...) bits are a bit new to me. Does that effectively mean that any arguments passed from the function end up getting passed to an Index constructor? Asking because if we can only specify columns, we may still have an issue if we wanted to, for instance, create a unique index where we would have to pass "Unique=true". If it's just blindly passing all the arguments though, that shouldn't be a problem? If it's only column names, that would be a problem (granted that I haven't solved in my code updates to this point yet either way).

So what happens when we load an old database with new models objects, is that the actual Index sqlalchemy metadata objects get initialized with all the parameters we used in models.py, but the index isn't created in the underlying database engine. These indexes are present in the table.indexes list.

next is a python built-in that returns the next item from an iterator. Since we create a new iterator by searching through table.indexes for idx.name == name, there's only going to be one index in that iterator. What that means is that we're getting the metadata object for the index that was created by models.py. Once we have the reference all we have to do is call index.create.
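The lookup described above can be sketched without a database. `FakeIndex` is a hypothetical stand-in for SQLAlchemy's index metadata objects (only the `name` attribute matters here), and `find_index` is an illustrative helper. The second half shows what happens when no index with the requested name has been declared: `next` without a default raises `StopIteration`.

```python
from collections import namedtuple

# Hypothetical stand-in for the index metadata objects held in table.indexes.
FakeIndex = namedtuple("FakeIndex", ["name", "columns"])

table_indexes = {
    FakeIndex("ix_states_last_updated", ("last_updated",)),
    FakeIndex("ix_states_entity_id_created", ("entity_id", "created")),
}


def find_index(indexes, name):
    """Return the first index object whose name matches."""
    return next(idx for idx in indexes if idx.name == name)


found = find_index(table_indexes, "ix_states_last_updated")

# With no default argument, next() raises StopIteration when nothing matches,
# for example when the requested name differs from the declared one.
try:
    find_index(table_indexes, "ix_states_created")
    missing_raises = False
except StopIteration:
    missing_raises = True
```

Passing a default, as in `next(iterator, None)`, would turn the missing case into a value that can be checked and logged instead of an exception.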

balloob · 2017-03-19T19:41:35Z

homeassistant/components/recorder/migration.py

-            index.create(engine)
-            _LOGGER.debug("Index creation done for table %s column %s",
-                          table_name, column_name)
+    def create_index(table_name, column_name):


Can you move these methods out of the _apply_update method?

I'd defer to @armills on that one since my PR uses mostly what was already there. I would imagine these functions would only be useful during a migration unless things were getting very complex (say by creating a temporary table and creating an index on that for later use) since, otherwise, the real places index definitions live is in the models?

The HA approach to migrations is pretty new to me so I'm still wrapping my head around it, to be fair.

balloob · 2017-03-19T19:43:45Z

homeassistant/components/recorder/migration.py

+    def create_compound_index(table_name, column_names):
+        """Create an index for the specified table and columns."""
+        table = Table(table_name, models.Base.metadata)
+        index_name = "ix_" + table_name + "_" + "_".join(column_names)


Prefer string formatting over concatenation.

index_name = "ix_{}_{}".format(table_name, "_".join(…))

Although, actually I prefer @armills approach better. Extract a function:

    def _generate_name(*args):
        """Generate a name based on arguments."""
        return "_".join(args)

Per my comment about migration.py I think in order to do that we'd have to split out column names from extra arguments (e.g. unique=True). I generally prefer using something like field1_field2_idx for non-unique, field1_field2_uq for unique and field1_fk for indexes required for foreign keys (I think that may be MySQL dependent though).

Otherwise, the column names could get kinda wonky. The other weird bit I ran into, which I'm not sure how to solve, is that the column name had to match what was in models.py. If it didn't, HA would get confused and fail (I'm not sure exactly why though). So I wonder if there was a way to either grab the index name from the models.py or have a function that generated the name in both places?

I described the procedure in the comment above, but basically the only thing this function is doing is looking up the index object by name. It isn't creating it or specifying any arguments such as unique that will be used by the index.

The reason we are generating the name here is so that it will match the name generated by sqlalchemy for index=True columns.

💡

Aha ok that helps explain things quite a bit! If it's ok, I'll probably wrap the above explanation around some in-line comments since that might be helpful to future folks trying to figure out how that works.

I think expanding the documentation never hurts. Obviously we should leave out next since that's a built-in that people can search, but describing how the objects are pre-created would be helpful.

Hmm in terms of the index naming, would it make more sense to use literals here? If we have to manually create the index name on models.py, it seems like it might be easier to do that here? So something like create_index('recorder_runs', 'ix_recorder_runs_start_end') given what is in models.py is __table_args__ = (Index('ix_recorder_runs_start_end', 'start', 'end'),)

I'm normally not a fan of literals like that, but unless there is a way to auto-generate an index name in models.py, it might keep things a bit less error-prone for folks that don't know about the auto-naming bits going on in migration.py?

I'm OK with that myself. We should have one function that takes the table and index name as its inputs. We can then have a separate function like balloob described that can generate index names for single column indexes, which gets passed into the first function.

Ah good point, we could. Actually I wonder if we can reuse whatever function SQLAlchemy uses for that (I pulled up the docs to figure out how it names its indexes when using Index=True but it just provided the naming convention).

For more clarity, here's what I've done based on the above:

    def create_index(table_name, index_name):
        """Create an index for the specified table.

        The index name should match the name given for the index
        within the table definition described in the models
        """
        table = Table(table_name, models.Base.metadata)
        _LOGGER.debug("Looking up index for table %s", table_name)
        # Look up the index object by name that was created in the models
        index = next(idx for idx in table.indexes if idx.name == index_name)
        _LOGGER.debug("Creating %s index", index_name)
        index.create(engine)
        _LOGGER.debug("Finished creating %s", index_name)

    if new_version == 1:
        #create_index("events", "time_fired")
        create_index("events", "ix_events_time_fired")
    elif new_version == 2:
        # Create compound start/end index for recorder_runs
        create_index("recorder_runs", "ix_recorder_runs_start_end")
        # Create indexes for states
        create_index("states", "ix_states_last_updated")
        create_index("states", "ix_states_created")


houndci-bot · 2017-03-21T13:29:48Z

homeassistant/components/recorder/models.py

-                            'domain', 'last_updated', 'entity_id'), )
+                            'domain', 'last_updated', 'entity_id'),
+                      Index('ix_states_entity_id_created',
+                            'entity_id','created'),)


missing whitespace after ','

houndci-bot · 2017-03-21T13:29:49Z

homeassistant/components/recorder/migration.py

@@ -40,21 +40,28 @@ def _apply_update(engine, new_version):
    """Perform operations to bring schema up to date."""
    from sqlalchemy import Table
    from . import models
+


blank line contains whitespace

m00dawg · 2017-03-21T13:53:42Z

Made an indexing change as the way I had it before wasn't working with the latest updates (not sure what changed there). Instead of just an index on created, it seems better for the index to be on entity_id,created due to the sub-select involved in the query that generates the pretty graphs. I wanted to prune the size of entity_id (which is a string currently) but that appears to be more of a MySQL-specific optimization and I didn't want to worry about DB-specific settings here.

@balloob I recall you wanted the create_index function in migration.py moved outside of the _apply_update function, but I could use some guidance on where it should be moved to?

emlove · 2017-03-21T15:13:56Z

It probably makes sense to move it up to the module level. You can define it just before _apply_update. You'll want to prefix it with an underscore, and you'll have to add an argument for engine.
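A rough sketch of what that suggestion could look like, runnable against an in-memory SQLite database. The standalone `metadata` object and the two-column `states` table are assumptions standing in for `models.Base.metadata` and the real model; the function body follows the lookup-by-name approach discussed in this thread and is not the merged code.

```python
import logging

from sqlalchemy import (
    Column, DateTime, Index, Integer, MetaData, Table, create_engine, inspect)

_LOGGER = logging.getLogger(__name__)

# Stand-in for models.Base.metadata; the table layout is illustrative only.
metadata = MetaData()
states = Table(
    "states", metadata,
    Column("state_id", Integer, primary_key=True),
    Column("last_updated", DateTime),
)


def _create_index(engine, table_name, index_name):
    """Create the named index, which must already be declared on the table."""
    # Table() with a known name returns the table already held in metadata.
    table = Table(table_name, metadata)
    index = next(idx for idx in table.indexes if idx.name == index_name)
    _LOGGER.debug("Creating %s index", index_name)
    index.create(engine)
    _LOGGER.debug("Finished creating %s", index_name)


engine = create_engine("sqlite://")
# Simulate an old database: the table exists but the index does not.
states.create(engine)
# Simulate newer models declaring the index, so far in metadata only.
Index("ix_states_last_updated", states.c.last_updated)

_create_index(engine, "states", "ix_states_last_updated")
created = [ix["name"] for ix in inspect(engine).get_indexes("states")]
print(created)
```

Taking `engine` as a parameter is what allows the helper to live at module level, since it no longer relies on the enclosing `_apply_update` scope.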

emlove

One small wrapping change, but I think this looks good now.

emlove · 2017-03-23T23:40:04Z

homeassistant/components/recorder/migration.py

+        _create_index(engine, "events", "ix_events_time_fired")
+    elif new_version == 2:
+        # Create compound start/end index for recorder_runs
+        _create_index(engine, "recorder_runs",


It looks like this one fits on one line now.

Looks like it just barely fits, good eagle eyes! Made the change and pushed the update.

Yeah, it probably needed to be wrapped in a previous rev.

I don't know whether I should be happy or sad that I'm so used to the linter rules that it popped out at me. 😆

emlove · 2017-03-23T23:43:58Z

I also think we should wait until after the 0.41 branch is cut. We should give this one some time to sit in dev since the database will be altered.

balloob · 2017-03-24T16:05:35Z

Added label breaking change so that I am sure to call it out in the release notes as the migration might take some time.

m00dawg · 2017-03-24T16:24:30Z

Ah good call. In my test, my rather huge DB (with months' worth of history) did take quite some time to migrate. Might be good to suggest folks look at the log file when they upgrade and restart (or consider purging the data if they don't want it for history).

New indexes for states table

894699b

homeassistant added integration: recorder platform: recorder.migration cla-signed labels Mar 19, 2017

Added recorder_runs indexes

f9a0f23

m00dawg commented Mar 19, 2017

View reviewed changes

m00dawg changed the title ~~New indexes for states table~~ New indexes for states and recording_runs tables Mar 19, 2017

Merge branch 'dev' of https://github.com/home-assistant/home-assistant …

7d7c4e4

…into dev

homeassistant added the cla-signed label Mar 19, 2017

balloobbot added the platform label Mar 19, 2017

emlove reviewed Mar 19, 2017

View reviewed changes

balloob requested changes Mar 19, 2017

View reviewed changes

m00dawg added 2 commits March 21, 2017 07:53

Merge branch 'dev' of https://github.com/home-assistant/home-assistant …

38aa348

…into dev

Adjusted indexes, POC for ref indexes by name.

d397ccb

homeassistant added the cla-signed label Mar 21, 2017

houndci-bot reviewed Mar 21, 2017

View reviewed changes

Corrected lint errors

c8e37e9

Fixed pydocstyle error

b3b1360

homeassistant added the cla-signed label Mar 21, 2017

Moved create_index function outside apply_update

37e3e5d

emlove approved these changes Mar 23, 2017

View reviewed changes

Moved to single line (just barely)

7cd55c9

balloob approved these changes Mar 24, 2017

View reviewed changes

balloob merged commit 5dfdb9e into home-assistant:dev Mar 24, 2017

balloob added the breaking-change label Mar 24, 2017

fabaff mentioned this pull request Apr 6, 2017

0.42 #6956

Merged

home-assistant locked and limited conversation to collaborators Jun 24, 2017

ghost added integration: migration integration: models and removed platform: recorder.migration labels Mar 21, 2019
