Issue when serializing double · Issue #202 · USCiLab/cereal · GitHub

Issue when serializing double #202

Open
henrywoo opened this issue Jul 5, 2015 · 7 comments

@henrywoo
henrywoo commented Jul 5, 2015

For example, x is a double equal to 3545.45, but after serialization it becomes 3545.4500000000007. This causes trouble when I use the serialized data (XML/JSON) in other places.

"x": 3545.4500000000007

I found that this is a common issue for all doubles. Is there any way to avoid it?

I want the result to be exactly:

"x": 3545.45

Or I would like to control the format as with printf. If I want three decimals, it would look like:

"x": 3545.450

Thanks in advance.

@AzothAmmo
Contributor

Currently the only way you can address this in cereal without modifying the JSON code is by adjusting the precision parameter for JSONOutputArchive::Options, though this controls the total number of digits and not just the number of digits used after the decimal point. RapidJSON uses the %#g specifier by default for performing the conversion, where # is the precision.

If you want different behavior you would need to adjust the printf format string double_format in external/rapidjson/writer.h.

For XML output, we use a std::ostringstream to generate the string representation, so you would need to modify that interface to get the behavior you want. Again, the only thing cereal exposes right now is the precision (number of digits used) for the output.
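To illustrate the difference between "total number of digits" and "digits after the decimal point", here is a minimal sketch using only the C++ standard library. It does not touch cereal or RapidJSON; the helper names format_significant and format_fixed are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format with "%.<digits>g".
// `digits` counts ALL significant digits, before and after the decimal point.
std::string format_significant(double value, int digits)
{
   char buf[64];
   const int n = std::snprintf(buf, sizeof(buf), "%.*g", digits, value);
   assert(n > 0 && n < static_cast<int>(sizeof(buf)));
   return std::string(buf, static_cast<std::size_t>(n));
}

// Hypothetical helper: format with "%.<decimals>f".
// `decimals` counts only the digits after the decimal point.
std::string format_fixed(double value, int decimals)
{
   char buf[512];  // large enough for any finite double with a small `decimals`
   const int n = std::snprintf(buf, sizeof(buf), "%.*f", decimals, value);
   assert(n > 0 && n < static_cast<int>(sizeof(buf)));
   return std::string(buf, static_cast<std::size_t>(n));
}
```

With these helpers, format_significant(3545.45, 6) yields 3545.45, while format_fixed(3545.45, 3) yields 3545.450, which is the printf-like control asked for in the original question.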

@m7thon
Contributor
m7thon commented Jul 14, 2015

I had the same problem and updated rapidjson to solve the problem. This now includes a new flag rapidjson::kParseFullPrecisionFlag to parse double in full precision. See Tencent/rapidjson#120 for details.

EDIT: I also don't specify the output precision at all (removed this from json.hpp). I think this now defaults to full precision, but actually outputs the number in a clever way, using as many decimal places as required.

@AzothAmmo
Contributor

Good to know, another reason for us to get a move on with #82.

@jbakosi
jbakosi commented Jul 20, 2015

Can this be a problem when using the binary archive? Or will that use a bit-wise identical representation of floats?

@AzothAmmo
Contributor

The binary archives serialize the bitwise representation so this is not an issue there.
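As a rough illustration of why a bitwise copy cannot lose precision, the following sketch copies the raw bytes of a double into an integer and back. It is not cereal's implementation; the names to_bits and from_bits are hypothetical, and the sketch assumes a 64-bit double.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

// Hypothetical helper: copy the raw bytes of a double into an integer.
std::uint64_t to_bits(double value)
{
   std::uint64_t bits;
   std::memcpy(&bits, &value, sizeof(bits));
   return bits;
}

// Hypothetical helper: rebuild the double from the same raw bytes.
double from_bits(std::uint64_t bits)
{
   double value;
   std::memcpy(&value, &bits, sizeof(value));
   return value;
}
```

Because no base conversion takes place, from_bits(to_bits(x)) has exactly the same bit pattern as x for every double.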

@m7thon
Contributor
m7thon commented Jul 20, 2015

Serializing using a binary archive should save the raw binary representation of the double (base64 encoded), and therefore give a bit-wise identical representation when unserializing.

Saving as a decimal representation involves a conversion from a base-2 representation to a base-10 representation, which is necessarily inexact for some numbers, and therefore involves rounding. The same is true for the conversion back from base-10 to base-2. This can be avoided by using a hexadecimal representation for floats, but this is not yet a standard format.
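For reference, the C and C++ standard libraries can already write and read a hexadecimal floating-point text form through the printf %a conversion and strtod, although JSON itself does not accept such literals. The sketch below is only an illustration of that idea; the helper names to_hex_text and from_hex_text are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: write a double as hexadecimal floating-point text.
// With no precision given, "%a" emits enough hex digits to be exact
// on binary floating-point platforms.
std::string to_hex_text(double value)
{
   char buf[64];
   const int n = std::snprintf(buf, sizeof(buf), "%a", value);
   assert(n > 0 && n < static_cast<int>(sizeof(buf)));
   return std::string(buf, static_cast<std::size_t>(n));
}

// Hypothetical helper: parse the hexadecimal text back into a double.
double from_hex_text(const std::string& text)
{
   return std::strtod(text.c_str(), nullptr);
}
```

Since base 16 is a power of 2, no rounding is involved in either direction, so from_hex_text(to_hex_text(x)) == x for finite doubles.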

For serialization purposes, I would argue that one wants the transformation

double -> serialize -> unserialize

to result in exactly the same double. This can be achieved using a decimal representation, but is tricky:

  1. when serializing, the decimal representation needs to be chosen carefully. Either
    1. with sufficiently high precision. This is inefficient, since e.g. the double 1/10 would be represented as 0.10000000000000001 (printf("%.17g", 1.0/10))
    2. or one needs to find the shortest base-10 representation that is closer to the given double than any other double. For the above example, this will be 0.1.
  2. when unserializing, one needs to find the double (base-2 representation) that is closest to the given base-10 representation. Doing this exactly is a surprisingly tricky problem!

Both of these (1.ii and 2) are done correctly in recent versions of rapidjson when using the kParseFullPrecisionFlag flag. This does involve a performance penalty, so it is not the default in rapidjson.
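A naive way to approximate step 1.ii with the standard library alone is to try increasing numbers of significant digits until the text parses back to the same double. This is only a sketch: it is not how rapidjson does it, the helper name shortest_round_trip is hypothetical, and it assumes the platform's snprintf and strtod round correctly.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical helper: return the decimal text with the fewest significant
// digits (tried in increasing order) that strtod parses back to `value`.
std::string shortest_round_trip(double value)
{
   char buf[64];
   const int max_digits = std::numeric_limits<double>::max_digits10;
   for (int digits = 1; digits <= max_digits; ++digits) {
      const int n = std::snprintf(buf, sizeof(buf), "%.*g", digits, value);
      assert(n > 0 && n < static_cast<int>(sizeof(buf)));
      if (std::strtod(buf, nullptr) == value) {
         return std::string(buf, static_cast<std::size_t>(n));
      }
   }
   // Reached only when no attempt compares equal (e.g. NaN);
   // buf still holds the last, max_digits10 attempt.
   return std::string(buf);
}
```

For the examples in this thread, shortest_round_trip(0.1) gives 0.1 and shortest_round_trip(3545.45) gives 3545.45, instead of the 17-digit forms. The loop costs up to 17 format and parse calls per value, which hints at why dedicated algorithms are used in practice.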

@ajneu
ajneu commented Oct 15, 2015

Yip, doing a roundtrip (serialize and then deserialize) of double fails:

#include <cereal/archives/json.hpp>
#include <cassert>
#include <fstream>
#include <iomanip>
#include <iostream>   // std::cout
#include <limits>     // std::numeric_limits
#include <sstream>

struct MyRecord
{
   double z;                    // here is the double
   template <class Archive>
   void serialize( Archive & ar )
   {
      ar( z );
   }
};

int main()
{
   double arr[] = {2.2};
   for (const auto v : arr) {
      MyRecord rec{v};

      // serialize
      {
         std::ofstream os("out.cereal");
         cereal::JSONOutputArchive archive_out( os );

         archive_out( rec );
      }

      // deserialize
      {
         MyRecord rec2;
         {
            std::ifstream is("out.cereal");
            cereal::JSONInputArchive archive_in( is );
            cereal::JSONOutputArchive archive_out( std::cout );

            archive_in( rec2 );
            archive_out( rec2 );
         }

         // print status
         std::cout << std::endl;
         std::cout << "original       : " << std::setprecision(std::numeric_limits<double>::max_digits10)
                   << rec.z
                   << "\t(printed with std::setprecision(std::numeric_limits<double>::max_digits10))" << std::endl;                   
         std::cout << "after roundtrip: " << std::setprecision(std::numeric_limits<double>::max_digits10)
                   << rec2.z
                   << "\t(printed with std::setprecision(std::numeric_limits<double>::max_digits10))" << std::endl;

         assert(rec.z == rec2.z);  // FAILS: round-trip does not work!
      }
      std::cout << std::endl;
   }

   return 0;
}

It does however seem to work when using
https://github.com/m7thon/cereal/tree/current-rapidjson-and-json-improvements

Note: a correct way of serializing floats and doubles as text is to print them with max_digits10 significant digits (std::numeric_limits<T>::max_digits10). Here's a brief example:

      #include <cassert>
      #include <iomanip>
      #include <limits>
      #include <sstream>

      struct MyRecord
      {
         double z;                    // here is the double
      };

      int main()
      {
         MyRecord rec{2.2};

         std::stringstream ss;
         ss << std::setprecision(std::numeric_limits<double>::max_digits10)
            << rec.z;                     // serialize from double to string

         decltype(rec.z) val;
         ss >> val;                       // deserialize back to double

         assert(val == rec.z);            // round-trip perfect
         return 0;
      }
