Via the Boost development list I came across Thomas Jensen's TinyJSON parser. As I've also been spending time writing a JSON parser using the Boost tools, I figure we might be able to learn something from each other's approaches.
Firstly though I think our goals are slightly different. I'm writing a JSON parser to fit in with the requirements of using JSON within FOST.3™ whereas his is a more general header only library. It would be hard to take my JSON parser without also taking a lot of the FOST.3™ foundation classes — there are good reasons for that which I'll get to in a moment.
In terms of what comes out of the parser the biggest difference is that I produce a JSON object based on Boost.Variant and he produces one based on Boost.Any. I think he's right that using Boost.Variant will introduce some extra complexity, but I think the simpler access to the final structure and the better type safety are both well worth it. I'm not sure, though, that it is compatible with his aims.
I split the JSON object itself into two parts. The first is a variant structure which is able to handle the simple values and is based on this Boost.Variant (t_null is simply a type representing the empty value, called Null in FOST.3™):
boost::variant< t_null, bool, int64_t, double, wstring >
The complete class looks like this (I've cut some members for brevity):
    class F3UTIL_DECLSPEC Variant {
        boost::variant< t_null, bool, int64_t, double, wstring > m_v;
    public:
        Variant() : m_v( Null ) {}
        explicit Variant( bool b ) : m_v( b ) {}
        explicit Variant( char c ) : m_v( int64_t( c ) ) {}
        explicit Variant( int i ) : m_v( int64_t( i ) ) {}
        explicit Variant( unsigned int i ) : m_v( int64_t( i ) ) {}
        explicit Variant( long l ) : m_v( int64_t( l ) ) {}
        explicit Variant( unsigned long l ) : m_v( int64_t( l ) ) {}
        explicit Variant( int64_t i ) : m_v( i ) {}
        explicit Variant( float f ) : m_v( double( f ) ) {}
        explicit Variant( double d ) : m_v( d ) {}
        explicit Variant( const char *s ) : m_v( widen( s ) ) {}
        explicit Variant( const wchar_t *s ) : m_v( wstring( s ) ) {}
        explicit Variant( const wstring &s ) : m_v( s ) {}

        bool isnull() const;

        template< typename T >
        Nullable< T > get() const {
            const T *p = boost::get< T >( &m_v );
            if ( p ) return *p;
            else return Null;
        }

        bool operator ==( const Variant &v ) const;
        bool operator !=( const Variant &v ) const { return !( *this == v ); }

        template< typename T >
        Variant &operator =( T t ) { m_v = Variant( t ); return *this; }

        template< typename T >
        typename T::result_type apply_visitor( T &t ) const {
            return boost::apply_visitor( t, m_v );
        }
    };
This includes a number of type-promoting constructors and forwarders for Boost's get (the use of Nullable is a standard FOST.3™ idiom) and the static visitor.
The actual JSON object is created from this base (again I've cut some members):
    class F3UTIL_DECLSPEC Json {
    public:
        typedef FSLib::Variant atom_t;
        typedef std::vector< boost::shared_ptr< Json > > array_t;
        typedef FSLib::wstring key_t;
        typedef std::map< key_t, boost::shared_ptr< Json > > object_t;
        typedef boost::variant< atom_t, array_t, object_t > element_t;

        BOOST_STATIC_ASSERT( sizeof( array_t::size_type ) == sizeof( object_t::size_type ) );

        Json();
        template< typename T >
        explicit Json( const T &t ) : m_element( atom_t( t ) ) { }
        explicit Json( const atom_t &a ) : m_element( a ) { }
        Json( const array_t &a ) : m_element( a ) { }
        Json( const object_t &o ) : m_element( o ) { }
        explicit Json( const element_t &e ) : m_element( e ) { }

        template< typename T >
        Nullable< T > get() const {
            const atom_t *p = boost::get< atom_t >( &m_element );
            if ( p ) return ( *p ).get< T >();
            else return Null;
        }

        template< typename T >
        Json &operator =( const T &t ) { m_element = atom_t( t ); return *this; }
        Json &operator =( const array_t &a ) { m_element = a; return *this; }
        Json &operator =( const object_t &o ) { m_element = o; return *this; }

        bool operator ==( const Json &r ) const;
        bool operator !=( const Json &r ) const { return !( *this == r ); }

        template< typename T >
        typename T::result_type apply_visitor( T &t ) const {
            return boost::apply_visitor( t, m_element );
        }

    private:
        element_t m_element;
    };
I wouldn't be at all surprised if all of this machinery was far too much for Thomas. The problem here is that it moves TinyJSON away from just a JSON parser to being a full blown JSON API — not quite so tiny any more.
Neither can I see a way of making this more lightweight by avoiding the wrapper class because you can't do this:
typedef boost::variant< t_null, int, double, std::vector< Json* >, std::map< string, Json* > > Json;
This sort of recursion is only possible with a full blown struct or class which means a load of constructors and forwarders and realistically a whole load of other machinery.
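As a rough illustration of why the wrapper is needed: a typedef cannot name itself inside its own definition, but a struct's name is already visible (as an incomplete type) inside its own body, so it can hold pointers to itself. The sketch below uses only the standard library; JsonNode, kind_t and count_nodes are hypothetical names for illustration and are not part of FOST.3™ or TinyJSON.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the Json class above. The struct can mention
// itself (through a pointer) inside its own definition; a typedef cannot.
struct JsonNode {
    typedef std::vector< std::shared_ptr< JsonNode > > array_t;
    typedef std::map< std::string, std::shared_ptr< JsonNode > > object_t;

    enum kind_t { k_null, k_int, k_array, k_object };

    kind_t kind;
    long long number;  // meaningful when kind == k_int
    array_t array;     // meaningful when kind == k_array
    object_t object;   // meaningful when kind == k_object

    // Even this cut-down version needs one constructor per alternative.
    JsonNode() : kind( k_null ), number( 0 ) {}
    explicit JsonNode( long long n ) : kind( k_int ), number( n ) {}
    explicit JsonNode( const array_t &a ) : kind( k_array ), number( 0 ), array( a ) {}
    explicit JsonNode( const object_t &o ) : kind( k_object ), number( 0 ), object( o ) {}
};

// Counts every node in the tree, including the root.
inline std::size_t count_nodes( const JsonNode &n ) {
    std::size_t total = 1;
    for ( JsonNode::array_t::const_iterator i = n.array.begin(); i != n.array.end(); ++i )
        total += count_nodes( **i );
    for ( JsonNode::object_t::const_iterator i = n.object.begin(); i != n.object.end(); ++i )
        total += count_nodes( *i->second );
    return total;
}
```

Even this toy version already needs a constructor per alternative, which is the start of the machinery I mean.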
Even harder is correct Unicode support. The first thing to realise about Unicode in JSON is that its \u escapes are UTF-16 code units, so the parser ends up working in UTF-16. If you're on Windows this isn't such a big deal, but for various other platforms this is likely to cause some difficulties :/
Here is the string parser that I use:
    struct string_closure : boost::spirit::closure< string_closure, FSLib::wstring, std::vector< utf16 >, utf16 > {
        member1 text;
        member2 buffer;
        member3 character;
    };

    const struct json_string_parser : public grammar< json_string_parser, string_closure::context_t > {
        template< typename scanner_t >
        struct definition {
            definition( json_string_parser const& self ) {
                top = string[ self.text = arg1 ];
                string =
                    chlit< wchar_t >( L'"' )
                    >> *(
                        ( chlit< wchar_t >( L'\\' ) >> L'\"' )[ push_back( string.buffer, L'"' ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'\\' )[ push_back( string.buffer, L'\\' ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'/' )[ push_back( string.buffer, L'/' ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'b' )[ push_back( string.buffer, utf16( 0x08 ) ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'f' )[ push_back( string.buffer, utf16( 0x0c ) ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'n' )[ push_back( string.buffer, L'\n' ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'r' )[ push_back( string.buffer, L'\r' ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L't' )[ push_back( string.buffer, L'\t' ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'u' >> uint_parser< utf16, 16, 4, 4 >()[ push_back( string.buffer, arg1 ) ] )
                        | ( anychar_p[ string.character = arg1 ] - ( chlit< wchar_t >( L'"' ) | chlit< wchar_t >( L'\\' ) ) )[ push_back( string.buffer, string.character ) ]
                    )
                    >> chlit< wchar_t >( L'"' )[ string.text = string.buffer /* this is hard */ ];
            }
            rule< scanner_t, string_closure::context_t > string;
            rule< scanner_t > top;
            rule< scanner_t > const &start() const { return top; }
        };
    } json_string_p;
This parser uses Boost.Phoenix and closures which I think makes it a little easier to follow — but of course I would say that :)
There are a couple of things to notice:
std::vector< utf16 >.
Because JSON is UTF-16 the second point becomes even harder to deal with if you try to mix the buffer character type with a different final string character type. I'm lucky because I have access to all of FOST.3™'s Unicode support, and FSLib::wstring is a std::wstring-like class which has explicit Unicode support and can be constructed and assigned to directly from a UTF-16 buffer.
Consider the following JSON strings:
"\u2014" "\u5b6b\u5b50" "\ud834\udd1e"
Here they are decoded:
"—" "孫子" "𝄞"
The first one is just an em dash, the second is Sun Tzu's name in Chinese, but the last is hard. If you're not using a good browser you probably won't even see it. It's a treble clef and is a single Unicode code point which has to be represented as two UTF-16 code units (a surrogate pair). This needs to be converted to four bytes in UTF-8 (F0 9D 84 9E), not six; the UTF-16 to UTF-8 converter has to go via UTF-32 to get this right.
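To make the "via UTF-32" step concrete, here is a minimal sketch of such a converter using only the standard library. It is not the FOST.3™ implementation; utf16_unit, utf32_char, append_utf8 and utf16_to_utf8 are names made up for this example. The point is that a surrogate pair is first combined into a single code point and only then encoded, which is what gives four bytes rather than six.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

typedef unsigned short utf16_unit;  // hypothetical stand-in for the utf16 type
typedef unsigned long utf32_char;   // wide enough for any code point

// Appends one code point to out as 1 to 4 UTF-8 bytes.
inline void append_utf8( std::string &out, utf32_char cp ) {
    if ( cp < 0x80 ) {
        out += static_cast< char >( cp );
    } else if ( cp < 0x800 ) {
        out += static_cast< char >( 0xC0 | ( cp >> 6 ) );
        out += static_cast< char >( 0x80 | ( cp & 0x3F ) );
    } else if ( cp < 0x10000 ) {
        out += static_cast< char >( 0xE0 | ( cp >> 12 ) );
        out += static_cast< char >( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
        out += static_cast< char >( 0x80 | ( cp & 0x3F ) );
    } else {
        out += static_cast< char >( 0xF0 | ( cp >> 18 ) );
        out += static_cast< char >( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
        out += static_cast< char >( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
        out += static_cast< char >( 0x80 | ( cp & 0x3F ) );
    }
}

// Converts a UTF-16 buffer to UTF-8, combining surrogate pairs into a
// single UTF-32 code point before encoding.
inline std::string utf16_to_utf8( const std::vector< utf16_unit > &in ) {
    std::string out;
    for ( std::size_t i = 0; i < in.size(); ++i ) {
        utf32_char cp = in[ i ];
        if ( cp >= 0xD800 && cp <= 0xDBFF ) {  // high (leading) surrogate
            if ( i + 1 == in.size() || in[ i + 1 ] < 0xDC00 || in[ i + 1 ] > 0xDFFF )
                throw std::invalid_argument( "unpaired high surrogate" );
            utf32_char low = in[ ++i ];
            cp = 0x10000 + ( ( cp - 0xD800 ) << 10 ) + ( low - 0xDC00 );
        } else if ( cp >= 0xDC00 && cp <= 0xDFFF ) {  // low surrogate on its own
            throw std::invalid_argument( "unpaired low surrogate" );
        }
        append_utf8( out, cp );
    }
    return out;
}
```

Feeding it the two units 0xD834 and 0xDD1E gives the code point U+1D11E and the bytes F0 9D 84 9E. Encoding each surrogate separately would give six bytes, which is not valid UTF-8.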
Whether Thomas wants to deal with this, or how it should be dealt with in a lightweight library is really an open question.
What I think Thomas can do is to use my string parser above and parameterise it on a conversion function that can be used to convert from the UTF-16 buffer to the required string type. He can keep the library light by providing a fairly simple implementation that throws an exception on anything non-ASCII, but also allow for better Unicode handling when users supply a more capable (and heavier weight) implementation.
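Something along these lines is what I have in mind, sketched with only the standard library. None of these names exist in TinyJSON or FOST.3™: utf16_unit, ascii_only and finish_string are invented for illustration, and finish_string stands in for the point in the grammar where the buffer is assigned to the final text.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

typedef unsigned short utf16_unit;  // hypothetical stand-in for the utf16 type

// Lightweight default converter: accepts ASCII only, throws on anything else.
struct ascii_only {
    typedef std::string result_type;
    result_type operator()( const std::vector< utf16_unit > &buffer ) const {
        std::string out;
        out.reserve( buffer.size() );
        for ( std::size_t i = 0; i < buffer.size(); ++i ) {
            if ( buffer[ i ] > 0x7F )
                throw std::range_error( "non-ASCII character in JSON string" );
            out += static_cast< char >( buffer[ i ] );
        }
        return out;
    }
};

// Hypothetical hook: the parser would call this where it currently assigns
// the buffer to the final text, so the string type is chosen by Converter.
template< typename Converter >
typename Converter::result_type finish_string(
        const std::vector< utf16_unit > &buffer, const Converter &convert ) {
    return convert( buffer );
}
```

A user who needs full Unicode would pass their own converter with a different result_type, and the parser itself would not have to change.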