The C++ API remains a work in progress.
SWI-Prolog string handling has evolved over time. The functions that create atoms or strings using char* or wchar_t* are "old school", as are the functions that get the string as char* or wchar_t*. The PL_{get,unify,put}_[nw]chars() family is friendlier when it comes to different input, output, encoding and exception handling.
Roughly, the modern API is PL_get_nchars(), PL_unify_chars() and PL_put_chars() on terms. For atoms, only half of this API exists: PL_new_atom_mbchars() and PL_atom_mbchars(), which take an encoding, a length and a char*.
However, there is no native "string" type in C++; char* strings can be implicitly converted to std::string. If a C++ interface provides only std::string arguments or return values, that can introduce some inefficiency; therefore, many of the functions and constructors allow either a char* or a std::string as a value (and likewise wchar_t* or std::wstring).
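The overloading described above can be sketched as follows. This is only an illustration: `lookup`, `Lookup` and `Source` are hypothetical names, not part of the SWI-Prolog C++ API. It shows that a string literal selects the char* overload directly, so no temporary std::string is built for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical names for illustration only; not part of the SWI-Prolog API.
enum class Source { CString, StdString };

struct Lookup {
  Source source;       // which overload handled the call
  std::size_t length;  // length of the text that was passed
};

// Chosen for string literals and other char* values: the argument is used
// as-is, without constructing a temporary std::string.
Lookup lookup(const char* name) {
  return {Source::CString, std::strlen(name)};
}

// Chosen when the caller already holds a std::string.
Lookup lookup(const std::string& name) {
  return {Source::StdString, name.size()};
}
```

Calling `lookup("foo")` uses the first overload because the array-to-pointer conversion is preferred over the user-defined conversion to std::string; `lookup(std::string("foo"))` uses the second.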
For return values, char* is dangerous because it can point to local or stack memory. For this reason, wherever possible, the C++ API returns a std::string, which contains a copy of the string. This can be slightly less efficient than returning a char*, but it avoids some subtle and pervasive bugs that even address sanitizers can't detect. (If we wish to minimize the overhead of passing strings, this can be done by passing in a pointer to a string rather than returning a string value; but this is more cumbersome and modern compilers can often optimize the code to avoid copying the return value.)
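A minimal sketch of the two safe styles mentioned above, using hypothetical helper names (`label_copy`, `label_into`) that are not part of the API. Both build the text in a local buffer; the first returns a std::string copy, the second writes into a string supplied by the caller.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: returns a copy of a locally built buffer. This is
// safe because the std::string owns its own characters.
std::string label_copy(int n) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "item_%d", n);
  return std::string(buf);  // characters are copied before buf goes away
}

// Hypothetical helper: the out-parameter alternative. The caller supplies
// the string, so no string value is returned.
void label_into(int n, std::string* out) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "item_%d", n);
  out->assign(buf);
}

// The dangerous variant is left as a comment on purpose:
//   const char* label_dangling(int n) {
//     char buf[32];
//     std::snprintf(buf, sizeof buf, "item_%d", n);
//     return buf;  // BUG: buf no longer exists once the function returns
//   }
```

The out-parameter form needs a named string at every call site (`std::string s; label_into(42, &s);`), which is the extra cumbersomeness the text refers to.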
Many of the classes have an as_string() method; this might be changed in future to to_string(), to be consistent with std::to_string(). However, method names such as as_int32_t() were chosen instead of to_int32_t() because they imply that the representation is already an int32_t, and not that the value is converted to an int32_t. That is, if the value is a float, as_int32_t() will fail with an error rather than (for example) truncating the floating point value to fit into a 32-bit integer.
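The naming distinction can be illustrated with a small stand-alone class. `DemoValue` is hypothetical and does not reflect how the API stores values; its `to_int32_t()` exists only as a contrast and is not a method the API is known to provide.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical value holder for illustration; not an SWI-Prolog class.
class DemoValue {
public:
  static DemoValue integer(std::int32_t i) {
    DemoValue v;
    v.is_int_ = true;
    v.int_ = i;
    return v;
  }
  static DemoValue real(double d) {
    DemoValue v;
    v.real_ = d;
    return v;
  }

  // "as": the value must already be an int32_t; anything else is an error.
  std::int32_t as_int32_t() const {
    if (!is_int_) throw std::domain_error("value is not an int32_t");
    return int_;
  }

  // "to" (contrast only): converts, silently truncating a float.
  // Assumes the float is within the int32_t range.
  std::int32_t to_int32_t() const {
    return is_int_ ? int_ : static_cast<std::int32_t>(real_);
  }

private:
  DemoValue() {}
  bool is_int_ = false;
  std::int32_t int_ = 0;
  double real_ = 0.0;
};
```

With this sketch, `DemoValue::real(3.7).as_int32_t()` throws, whereas the contrasting `to_int32_t()` would quietly yield 3.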
Many of the "opaque object handles", such as atom_t, term_t, and functor_t, are integers (typically uintptr_t values, which the C standard defines as "an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer"). As such, there is no compile-time detection of passing the wrong handle to a function. This leads to a problem with classes such as PlTerm: C++ overloading cannot be used to distinguish, for example, creating a term from an atom versus creating a term from an integer. There are a number of possible solutions, including:
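- a subclass for each kind of initializer;
- a tag for each kind of initializer;
- changing the C code to use a struct instead of an integer.
It is impractical to change the C code, both because of the number of edits that would be required and because of the possibility that the changes would inhibit some optimizations.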
There isn't much difference between subclasses and tags; but as a matter of design, it's better to specify things as constants than as (theoretically) variables, so the decision was to use subclasses.
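The subclass approach can be sketched as below. All names (`demo_atom_t`, `DemoTerm`, `DemoTermAtom`, `DemoTermInteger`) are hypothetical stand-ins chosen for this illustration, and the handle is assumed to be a plain uintptr_t; the real classes do considerably more.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Stand-in for a C handle type (assumption: it is a plain uintptr_t).
typedef std::uintptr_t demo_atom_t;

// Because the handle is only a typedef, it is the same type as any other
// uintptr_t, so overloads on "atom" versus "integer" cannot coexist.
static_assert(std::is_same<demo_atom_t, std::uintptr_t>::value,
              "handle and integer are indistinguishable to the compiler");

// Hypothetical base class; it only records how the term was built.
class DemoTerm {
public:
  const std::string& kind() const { return kind_; }
  std::uintptr_t raw() const { return raw_; }

protected:
  DemoTerm(const char* kind, std::uintptr_t raw) : kind_(kind), raw_(raw) {}

private:
  std::string kind_;
  std::uintptr_t raw_;
};

// One subclass per kind of initializer: the class name, rather than the
// argument type, says what the value means.
class DemoTermAtom : public DemoTerm {
public:
  explicit DemoTermAtom(demo_atom_t a) : DemoTerm("atom", a) {}
};

class DemoTermInteger : public DemoTerm {
public:
  explicit DemoTermInteger(std::uintptr_t i) : DemoTerm("integer", i) {}
};
```

Here `DemoTermAtom(5)` and `DemoTermInteger(5)` receive the same raw value but produce differently labelled terms, a distinction that two constructors on a single class could not express because their signatures would be identical.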