Initial support for concatenated archives, and memory fixes by jhol · Pull Request #88 · USCiLab/cereal · GitHub

Initial support for concatenated archives, and memory fixes #88


Open
wants to merge 9 commits into master
Conversation

jhol
Copy link
Contributor
@jhol jhol commented Mar 31, 2014

This branch does the following...

Memory management fixes

Fixes some memory management issues. Currently cereal has a mixture of strings handled with std::string and strings handled with char*. The use of raw-pointers leads to some counter-intuitive behaviour. For example:

if (foo) {
  const std::string s("test");
  a.setNextName(s.c_str());
}

Typically the caller would expect this to work, because setNextName would take a copy of the string, but it doesn't. In this case there will be a memory error after s goes out of scope. The first 5 patches address this issue by replacing some of cereal's raw pointers with std::strings.

Support for concatenated archives

The JSON and XML input archives currently read in the whole of an input stream until they hit the end of the file. In some cases this is undesirable. For example a file may contain more data after an archive, or it may be received via some kind of socket stream, or via a UNIX pipe. In these cases it is incorrect to read to the end of the file. The correct behavior is to read to the end of the XML or JSON document.

This branch addresses the problem, first by adding a test case to demonstrate it: the test writes two archives into a single stream, then checks that both are read out correctly.

Following the test case, I have modified XMLInputArchive to add support.

Currently if two XML archives are written to an output stream the following output will be generated:

<?xml version="1.0" encoding="utf-8"?>
<cereal>
        <value0>Apples

Pears</value0>
</cereal>

<?xml version="1.0" encoding="utf-8"?>
<cereal>
        <value0>Oranges

Lemons</value0>
</cereal>

Here we are outputting one archive with the string Apples\n\nPears and another with the string Oranges\n\nLemons. cereal's XML output will insert an empty new-line after each XML document.

Currently RapidXML does not provide a mechanism by which it can read only up to the end of the document, so my modification reads until it finds an empty line, then attempts to parse the accumulated XML. If the parse fails, this indicates that we have hit an empty line inside the XML document and that we should continue reading. The drawback of this approach is that the document must be re-parsed every time the input stream emits a blank line, which might become problematic in an XML document containing many empty lines.

There is still some work remaining to fix JSONInputArchive similarly so that it passes the test. It may be possible to do this via RapidJSON's internal stream support, or by counting the number of { and } braces until the tally returns to zero.

@AzothAmmo
Copy link
Contributor

We might be moving away from both RapidXML and RapidJSON in the next release (see #82 and #83), which would affect both changes you've proposed here, though I'm not entirely sure how significant the impact would be.

Can you describe the situation in which you ran into issues using the current setNextName implementation?

@jhol
Copy link
Contributor Author
jhol commented Apr 1, 2014

The setNextName issue we encountered occurs when outputting to XML: you create a temporary C-string (e.g. something from std::string::c_str()) that gets deleted before XMLOutputArchive::NodeInfo::getValueName is executed. At that point the name string has been de-allocated by the application, but the ghost data may still be hanging around in memory, which is why this issue is mostly invisible unless you run the application inside efence or valgrind. Also, none of the test suites call this function right now.

The resolution is to copy the string inside setNextName, which is a more conventional pattern IMO. If setNextName isn't going to copy the string, there should be a big warning in the documentation that the user must keep the string alive.

With that change in place for XMLOutputArchive, I applied a similar modification to XMLInputArchive, JSONInputArchive and JSONOutputArchive. Perhaps the change does not need to be applied to the input archives - in which case those patches can be excluded.

This branch is kind-of like a buffet - take from it what you want.

With respect to the concatenation patches, I would be keen to see the test case included at the very least. The patches to XMLInputArchive are also quite low impact, and not unreasonable in the interim, especially if RapidXML is going to be replaced soon.

Finally, 179d839 should be applied straight away. This is a simple fix for a memory bug caused by a temporary string going out of scope.

@AzothAmmo
Copy link
Contributor

I merged 179d839 into develop.

@jhol
Copy link
Contributor Author
jhol commented Apr 3, 2014

@AzothAmmo thanks for pulling that commit. What should we do about the others, though?

Is there somewhere regular contributors meet, such as a mailing list or an IRC channel, so that we can discuss these things?

@AzothAmmo
Copy link
Contributor

We recently made a mailing list: https://groups.google.com/forum/#!forum/cerealcpp. It hasn't been used much yet.

As far as the other issues, I think they should be investigated in tandem with migrating to PugiXML, especially your desire to be able to have more than one archive in a given stream. I think the migration itself will likely not be a large amount of work.

@AzothAmmo AzothAmmo added this to the v2.0.0 milestone May 4, 2014