VRCJson.TryDeserializeFromJson stores Integers as Doubles

Ximmer

Integer fields in JSON data are stored as Doubles in the DataToken when using VRCJson.TryDeserializeFromJson in Udon#.
Example Code:
DataToken data;
string json = "{ \"MyInt\": 12345 }";
if(VRCJson.TryDeserializeFromJson(json, out data))
{
    Debug.Log("Int" + ((DataDictionary)data)["MyInt"].Int.ToString());
}
This causes the following exception:
[UdonSharp] Assets/JsonTest.cs(29,61): Udon runtime exception detected!
  An exception occurred during EXTERN to 'VRCSDK3DataDataToken.__get_Int__SystemInt32'.
      Parameter Addresses: 0x00000024, 0x00000025
  Attempted to access Double token as Int
I recommend either parsing the JSON data into the correct type or providing getters/casts with automatic type conversion.
Casts would be the preferable place for automatic conversion, while getters like .Int, .Double, and .Float could still throw if the type mismatches.
If I'm casting, I expect the value to convert properly.
Having to write (Int)((Double)(my_data_token)) is pretty dumb from a typecasting standpoint.
If I cast an invalid type, like a dict/list, to an int, then I would expect an exception.
I don't have a strong opinion on strings. I think it would be nice to run an Int64.Parse when casting a string to an int, but it would be acceptable to just throw a "cannot convert string to int" and expect the JSON to be formatted properly.

May 17, 2023

Ximmer

I understand the points that are being made.
But also realize that a double has only 53 bits of mantissa, so beyond 2^53 it can no longer represent every integer below its maximum.
The utility of such large doubles is very low; they are better served by custom math wrappers, where passing the value as a string would be preferred for custom number parsing.
The JSON specification even calls out the range [-(2^53)+1, (2^53)-1] explicitly as a good interoperable range.
In this example precision is lost by treating the value as a double:
Console.WriteLine ((long)(double)8000000000000000001);
This code outputs 8000000000000000000, because a double does not have enough precision to store every int64.
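As a minimal sketch of where that precision runs out, the following plain C# checks whether a long survives a round trip through double. The class and method names are illustrative only and are not part of the VRChat SDK; this has not been tried inside Udon.

```csharp
using System;

// Illustrative helper, not part of any VRChat API.
public static class DoublePrecisionDemo
{
    // Returns true when the value is unchanged after long -> double -> long.
    public static bool RoundTripsThroughDouble(long value)
    {
        double asDouble = (double)value;

        // 2^63 does not fit in a long, so a value that rounded up to it
        // cannot be converted back safely.
        if (asDouble >= 9223372036854775808.0)
        {
            return false;
        }

        return (long)asDouble == value;
    }
}
```

With this helper, 12345 and 9007199254740992 (2^53) round-trip, while 9007199254740993 (2^53 + 1) and 8000000000000000001 do not.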
I shall accept the limitations imposed by the VRCJson implementation, but working with large values should be fully understood before imposing limitations on the implementation.
If Doubles are meant to store int64s, then they would need the precision to store an int64, which they do not have.
I just don't feel that values without an explicit decimal point should be treated as floating point, where I can lose precision.
I will make do if the VRC devs are not willing to budge on the matter, as I only intend my values to reach the millions and not the quintillions that something like IdleHome may reach with its exponential value increases.
Thank you for the explanation on the design choices.

Phasedragon

Ximmer: I understand that this is frustrating, but it's a limitation of storing values in strings without specifying the data type. There's only so much that you can assume, and there's only so much you can do. If it's any consolation, we are planning to release an update adding byte array serialization in the future, which will preserve both the data type and values much more accurately, while also taking up less space than JSON. This is the real solution to your problem.

Phasedragon

This is working as expected. It is not incorrect; it is an unfortunate side effect of trying to put data from a non-typed language (JavaScript) into a typed language (C#).
First things first, in your example code, you don't need to pull the int first and then ToString that. You should just call ToString on the token directly, as that will always be valid no matter what the token contains. This is documented in the DataTokens page. https://docs.vrchat.com/docs/data-tokens.
As for actually trying to pull an int for the purposes of using it as an int, the crux of the issue here is that JSON can store values above int maxvalue, and if the number is above int maxvalue when you try to cast to int, an error will be thrown one way or another. A lot of the design around Data containers/VRCJSON has had careful consideration about who should be responsible for an error, and this is no exception. The overall goal of this is to give you all the tools you need to make robust scripts that don't crash or lose data. It's easy to just avoid losing data if you crash everywhere, and it's easy to avoid crashing if you just lose data everywhere. It is very difficult to walk a fine line between these two things, and especially difficult to make a system where the default pathway encourages everyone to walk that line as well.
The way it currently works is that the getters for all numbers will work if the number inside the token can be safely upcast to the requested type with no exceptions. Unfortunately, JSON numbers can go all the way up to double max, and doubles cannot be safely cast to anything else. If we were to allow the getters to downcast to a different type, then we would have to either A) allow it to crash if the value is outside the range of the requested type, or B) clamp it to be within the range of the requested type.
Ultimately, the best way to handle this is to give the user all the information they need in order to come to a decision on their own. We've done this by providing the "DataToken.IsNumber" property, along with "DataToken.Number". These can be used on all numerical types, and you could use this to do something like:
if (token.IsNumber && token.Number > int.MinValue && token.Number < int.MaxValue)
{
    int number = (int)token.Number;
}
If you absolutely need it as an integer, then this is the safest way to do it, because it allows you to avoid crashing and also handle the situation if it's not within the range of an int. Simply allowing the user to pull token.Int on any token would not encourage them to handle this situation properly, and they may create code that works just fine in their minimal test cases, but breaks in the real world.
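The same range check can be wrapped in a small helper so the out-of-range case is handled in one place. This is only a sketch in plain C#: SafeNumberCast and TryToInt are hypothetical names, the helper takes a double such as the value of token.Number, and it uses inclusive bounds, which is an assumption that differs slightly from the snippet above.

```csharp
using System;

// Hypothetical helper; only the range-check idea comes from the comment above.
public static class SafeNumberCast
{
    // Tries to narrow a double (for example, the value of DataToken.Number) to an int.
    // Returns false instead of throwing when the value does not fit.
    public static bool TryToInt(double number, out int result)
    {
        // NaN fails both comparisons, so it is rejected here as well.
        if (number >= int.MinValue && number <= int.MaxValue)
        {
            result = (int)number; // truncates any fractional part toward zero
            return true;
        }

        result = 0;
        return false;
    }
}
```

A caller would then branch on the return value, for example if (token.IsNumber && SafeNumberCast.TryToInt(token.Number, out int n)), and decide for itself what to do when the value is out of range.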

Ximmer

Phasedragon: Or it could be parsed as both a Double and an Int64, and the appropriate IsDouble/IsInt could report whether the value was parsed correctly, providing exception safety.
If the original value contains a decimal point, then IsInt would return false, as the value is obviously not intended to be an integer.
This is really basic stuff, and unless the implementation under the hood is written in C99 using the union type, I can't see why an implementation like this wasn't considered. Or perhaps it was?
TokenType can become Number, and IsInt/IsDouble can be used to determine the type of number. As already stated, Double will cover all the numerical types for those lazy enough to only want to do if (token.IsNumber) then token.Double.
And those who need int64 precision can do if (token.IsNumber && token.IsInt) then token.Int.
As for converting a token that contains both a double and an integer representation back into JSON, a simple rule would do: if (IntPart > 9007199254740992 || IntPart < -9007199254740992) then write the integer value, otherwise write the decimal.
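To make the dual-parse idea concrete, here is a rough sketch in plain C#. It is not how VRCJson works; DualNumberParse and its TryParse method are made-up names, and the parsing is looser than a strict JSON number grammar (it accepts a leading "+", for instance).

```csharp
using System;
using System.Globalization;

// Hypothetical sketch of the proposal above, not an existing API.
public static class DualNumberParse
{
    // Parses number text as both a double and a long.
    // isInt is true only when the text is a plain integer that fits in Int64,
    // so "12345.0" and "1e3" are reported as non-integers.
    public static bool TryParse(string text, out double asDouble, out long asInt, out bool isInt)
    {
        isInt = long.TryParse(
            text, NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture, out asInt);

        return double.TryParse(
            text, NumberStyles.Float, CultureInfo.InvariantCulture, out asDouble);
    }
}
```

For the text 8000000000000000001 this keeps the exact value in asInt, while asDouble holds the nearest double, 8000000000000000000.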

Phasedragon

Ximmer: DataTokens cannot contain two values at the same time. Once it's been requested, it needs to decide what type it is. Automatically picking a different type depending on what the value is would cause other issues, like TryGetValue<Double> returning false because the type isn't what you expect.