SWI-Prolog string handling has evolved over time. The functions that create atoms or strings from a char* or wchar_t* are "old school", as are the functions that get a string as a char* or wchar_t*. The PL_get_unify_put_[nw]chars() family is friendlier when it comes to handling different input, output, encodings and exceptions.
Roughly, the modern API consists of PL_get_nchars(), PL_unify_chars() and PL_put_chars() on terms. For atoms, only half of this API exists: PL_new_atom_mbchars() and PL_atom_mbchars(), which take an encoding, a length and a char*.
However, C++ has no built-in string type (std::string is a library class), and a char* string is converted to a std::string automatically when one is needed. If a C++ interface provides only std::string arguments or return values, that can introduce some inefficiency, because each char* argument must first be copied into a temporary std::string. Therefore, many of the functions and constructors accept either a char* or a std::string as a value (and likewise a wchar_t* or a std::wstring).
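To illustrate why such overloads help, here is a minimal sketch using a hypothetical TextArg class (it is not part of the SWI-Prolog C++ API). It provides separate constructors for const char*, std::string, const wchar_t* and std::wstring, so a string literal binds directly to the C-string overload instead of first being copied into a temporary std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>

// Hypothetical class (not the SWI-Prolog C++ API) showing the overload
// pattern: each text form gets its own constructor, so no temporary
// std::string or std::wstring is created for C-string arguments.
class TextArg {
public:
  // Narrow C string: length is found with strlen(); no std::string is built.
  TextArg(const char* s) : len_(checked_len(s)), wide_(false), from_std_(false) {}
  // Narrow std::string: the length is already stored in the object.
  TextArg(const std::string& s) : len_(s.size()), wide_(false), from_std_(true) {}
  // Wide C string: length is found with wcslen(); no std::wstring is built.
  TextArg(const wchar_t* s) : len_(checked_len(s)), wide_(true), from_std_(false) {}
  // Wide std::wstring.
  TextArg(const std::wstring& s) : len_(s.size()), wide_(true), from_std_(true) {}

  std::size_t length() const { return len_; }
  bool is_wide() const { return wide_; }
  bool from_std_string() const { return from_std_; }

private:
  static std::size_t checked_len(const char* s) {
    assert(s != nullptr);  // a null char* is a caller error
    return std::strlen(s);
  }
  static std::size_t checked_len(const wchar_t* s) {
    assert(s != nullptr);  // a null wchar_t* is a caller error
    return std::wcslen(s);
  }

  std::size_t len_;
  bool wide_;
  bool from_std_;
};
```

A literal such as "abc" matches the const char* constructor exactly (after array-to-pointer decay), whereas turning it into a std::string would need a user-defined conversion, so overload resolution prefers the C-string form.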
For return values, char* is dangerous because it can point to local or stack memory. For this reason, wherever possible, the C++ API returns a std::string, which contains a copy of the string. This can be slightly less efficient than returning a char*, but it avoids some subtle and pervasive bugs that even address sanitizers can't detect. If we wish to minimize the overhead of passing strings, this can be done by passing in a pointer to a string rather than returning a string value; but this is more cumbersome, and modern compilers can often optimize the code to avoid copying the return value.
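The trade-off between returning a copy and filling in a caller-supplied string can be sketched as follows. The functions name_by_value() and name_into() are hypothetical, chosen only for illustration; the dangling-pointer variant is kept as a comment because using its result is undefined behavior.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers (not the SWI-Prolog C++ API) contrasting ways to
// hand text back to a caller.

// Unsafe pattern, shown only as a comment: the returned pointer refers to a
// local array that no longer exists once the function returns.
//   const char* bad_name() { char buf[16] = "item"; return buf; }

// Return by value: the caller receives its own copy of the string.
// Copy elision or a move usually makes this cheap in practice.
std::string name_by_value(int id) {
  std::string result = "item_" + std::to_string(id);
  return result;
}

// Out-parameter: the caller passes a pointer to a string to be filled in.
// This can reuse the caller's existing buffer, but the call site is clumsier.
void name_into(int id, std::string* out) {
  assert(out != nullptr);
  out->assign("item_");
  out->append(std::to_string(id));
}
```

With name_into(), a loop can keep one std::string and let it reuse its allocated capacity across iterations; with name_by_value(), each call yields a fresh, independently owned string, which is simpler to use correctly.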
Many of the classes have an as_string() method; this might be changed in future to to_string(), to be consistent with std::to_string(). However, method names such as as_int32_t() were chosen instead of to_int32_t() because they imply that the representation is already an int32_t, and not that the value is converted to an int32_t. That is, if the value is a float, as_int32_t() will fail with an error rather than (for example) truncating the floating-point value to fit into a 32-bit integer.
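The difference between "as" and "to" semantics can be sketched with a hypothetical Number class. It is not the SWI-Prolog C++ API, and its to_int32_t() method exists only for contrast: as_int32_t() succeeds only when the value already is an integer that fits in 32 bits, while to_int32_t() converts a float by truncating it.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical number type (not the SWI-Prolog C++ API) that holds either an
// integer or a floating-point value, in the spirit of a Prolog number.
class Number {
public:
  explicit Number(std::int64_t i) : is_int_(true), i_(i), d_(0.0) {}
  explicit Number(double d) : is_int_(false), i_(0), d_(d) {}

  // "as" semantics: the value must already be an integer that fits in
  // 32 bits; anything else is reported as an error, never converted.
  std::int32_t as_int32_t() const {
    if (!is_int_)
      throw std::domain_error("as_int32_t: value is a float, not an integer");
    if (i_ < kMin || i_ > kMax)
      throw std::out_of_range("as_int32_t: integer does not fit in 32 bits");
    return static_cast<std::int32_t>(i_);
  }

  // "to" semantics (for contrast only): a float is converted by truncating
  // toward zero; values outside the int32_t range (or NaN) are rejected.
  std::int32_t to_int32_t() const {
    if (is_int_)
      return as_int32_t();
    const double t = std::trunc(d_);
    if (!(t >= kMin && t <= kMax))
      throw std::out_of_range("to_int32_t: value does not fit in 32 bits");
    return static_cast<std::int32_t>(t);
  }

private:
  static constexpr std::int64_t kMin = std::numeric_limits<std::int32_t>::min();
  static constexpr std::int64_t kMax = std::numeric_limits<std::int32_t>::max();

  bool is_int_;
  std::int64_t i_;
  double d_;
};

// Usage: an integer passes through as_int32_t(); a float is rejected by
// as_int32_t() but truncated by to_int32_t().
inline void number_demo() {
  assert(Number(std::int64_t{7}).as_int32_t() == 7);
  bool rejected = false;
  try {
    Number(2.9).as_int32_t();
  } catch (const std::domain_error&) {
    rejected = true;
  }
  assert(rejected);
  assert(Number(2.9).to_int32_t() == 2);
}
```

The integer constructor takes std::int64_t explicitly, so a plain int argument such as Number(7) would be ambiguous between the two constructors; the sketch spells out std::int64_t{7} for that reason.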