
About string formatting in modern C++

Good day! In this article, I would like to talk about the string formatting facilities available in modern C++, show my own solutions that I have been using in real projects for several years, and compare the performance of different approaches to string formatting.


String formatting is an operation that produces a resulting string from a template string and a set of arguments. The template string contains text with embedded placeholders, which are replaced by the arguments.


For clarity, a small example:


    int apples = 5;
    int oranges = 7;
    std::string str = format("I have %d apples and %d oranges, so I have %d fruits",
                             apples, oranges, apples + oranges);
    std::cout << str << std::endl;

Here:
Template string: I have %d apples and %d oranges, so I have %d fruits
Placeholders: %d, %d, %d
Arguments: apples, oranges, apples + oranges


Running the example produces the following string:


 I have 5 apples and 7 oranges, so I have 12 fruits 

Now let's see what C++ offers for string formatting.


Legacy C


In C, strings are formatted with the Xprintf family of functions (printf, sprintf, snprintf and so on). These functions work just as well in C++:


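    char buf[100];
    // A negative result or a result >= sizeof(buf) means an error occurred
    // or the text did not fit into the buffer
    int res = snprintf(buf, sizeof(buf), "I have %d apples and %d oranges, so I have %d fruits",
                       apples, oranges, apples + oranges);
    std::string str = "error!";
    if (res >= 0 && static_cast<size_t>(res) < sizeof(buf))
        str = buf;
    std::cout << str << std::endl;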

Despite its apparent clumsiness, this is a fairly good way to format strings:



But, of course, it is not without drawbacks:



The std::to_string() function


Since C++11, the standard library has provided the std::to_string() function, which converts the passed value to a string. It does not accept arbitrary argument types, only the following: int, long, long long, unsigned int, unsigned long, unsigned long long, float, double and long double.



Usage example:


    std::string str = "I have " + std::to_string(apples) + " apples and " +
                      std::to_string(oranges) + " oranges, so I have " +
                      std::to_string(apples + oranges) + " fruits";
    std::cout << str << std::endl;
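A detail of std::to_string worth knowing: for floating-point arguments it produces the same text as std::sprintf with the %f specifier (this is how it is specified from C++11 through C++23), so the result always has six digits after the decimal point. The small sketch below illustrates this; price_line is a hypothetical helper, not part of the original code.

```cpp
#include <string>

// Hypothetical helper: builds a sentence with std::to_string for an integer
// and a floating-point value.
std::string price_line(int apples, double price_per_apple) {
    return "I have " + std::to_string(apples) + " apples, each costs " +
           std::to_string(price_per_apple);
}
```

For example, price_line(5, 0.5) returns "I have 5 apples, each costs 0.500000", and std::to_string(54.54) returns "54.540000".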

The std::stringstream class


std::stringstream is the main string formatting facility provided by the C++ standard library:


    std::stringstream ss;
    ss << "I have " << apples << " apples and " << oranges
       << " oranges, so I have " << apples + oranges << " fruits";
    std::string str = ss.str();
    std::cout << str << std::endl;
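Unlike the printf family, std::stringstream controls how values look through stream manipulators from <iomanip> rather than through placeholder flags. A minimal sketch (the function price_tag and its output format are made up for illustration):

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: a zero-padded item code and a price with two decimals.
std::string price_tag(int code, double price) {
    std::stringstream ss;
    ss << "item " << std::setw(5) << std::setfill('0') << code  // width applies to `code` only
       << ": " << std::fixed << std::setprecision(2) << price;  // fixed notation, 2 decimals
    return ss.str();
}
```

price_tag(42, 3.14159) returns "item 00042: 3.14". std::setw affects only the next output operation, while std::setfill, std::fixed and std::setprecision stay in effect on the stream.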

Strictly speaking, std::stringstream does not do string formatting in the full sense: instead of using placeholders, we insert the arguments between the pieces of the template. This is acceptable in the simplest cases, but in more complex ones the readability of the code suffers significantly:


 ss << "A[" << i1 << ", " << j1 << "] + A[" << i2 << ", " << j2 << "] = " << A[i1][j1] + A[i2][j2]; 

Compare this with:


 std::string str = format("A[%d, %d] + A[%d, %d] = %d", i1, j1, i2, j2, A[i1][j1] + A[i2][j2]); 

std::stringstream also makes it possible to implement several handy wrappers that may come in useful later.


Convert "anything" to a string:


    template<typename T>
    std::string to_string(const T &t)
    {
        std::stringstream ss;
        ss << t;
        return ss.str();
    }

 std::string str = to_string("5"); 
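"Anything" here means any type for which an operator<< into std::ostream exists, including user-defined types. A self-contained sketch, assuming a hypothetical geo::Point type:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// The generic to_string from the text: works for any type with an operator<<.
template<typename T>
std::string to_string(const T &t) {
    std::stringstream ss;
    ss << t;
    return ss.str();
}

namespace geo {  // hypothetical namespace and type, for illustration only
struct Point {
    int x;
    int y;
};

// Declared in the same namespace as Point, so argument-dependent lookup
// finds it from inside to_string even though it comes after the template.
inline std::ostream &operator<<(std::ostream &os, const Point &p) {
    return os << "(" << p.x << ", " << p.y << ")";
}
}  // namespace geo
```

With this, to_string(geo::Point{1, 2}) returns "(1, 2)".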

Convert a string to "anything":


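    template<typename T>
    T from_string(const std::string &str)
    {
        std::stringstream ss(str);
        T t{};  // value-initialized, so the result is defined even if nothing is read
        ss >> t;
        return t;
    }

    // operator>> for std::string stops at whitespace, so strings are returned as is
    template<>
    std::string from_string(const std::string &str)
    {
        return str;
    }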

 int x = from_string<int>("5"); 
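The std::string specialization is there because operator>> for std::string stops at the first whitespace character. The comparison sketch below (from_string_generic is a hypothetical name) shows what the generic version alone would return:

```cpp
#include <sstream>
#include <string>

// Generic conversion without a std::string specialization (for comparison only).
template<typename T>
T from_string_generic(const std::string &str) {
    std::stringstream ss(str);
    T t{};
    ss >> t;  // for std::string this reads a single whitespace-delimited word
    return t;
}
```

from_string_generic<std::string>("hello world") returns just "hello", whereas the specialized from_string<std::string> returns the whole input.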

Converting a string to "anything" with a check:


    template<typename T>
    T from_string(const std::string &str, bool &ok)
    {
        std::stringstream ss(str);
        T t{};
        ss >> t;
        ok = !ss.fail();  // false if no value could be parsed
        return t;
    }

    template<>
    std::string from_string(const std::string &str, bool &ok)
    {
        ok = true;
        return str;
    }

    bool ok = false;
    int x = from_string<int>("x5", ok);
    if (!ok) ...
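Note that this check only tells whether a value could be parsed from the beginning of the string: from_string<int>("5x", ok) returns 5 with ok == true, because extraction simply stops at 'x' without setting failbit. If the whole input must be a valid value, one possible stricter variant also checks that nothing other than whitespace remains. This is a sketch; the name from_string_strict is hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical stricter variant: the value must parse AND nothing but
// whitespace may follow it.
template<typename T>
T from_string_strict(const std::string &str, bool &ok) {
    std::stringstream ss(str);
    T t{};
    ss >> t;
    ok = !ss.fail();
    if (ok) {
        char extra;
        // Skips whitespace and tries to read one more character;
        // success means unparsed characters follow the value.
        if (ss >> extra)
            ok = false;
    }
    return t;
}
```

With this version, "5x" and "x5" are rejected, while "5" and " 5 " are accepted.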

You can also write a couple of wrappers that make it convenient to use std::stringstream in a single expression.


Using a separate std::stringstream object for each argument:


    class fstr final : public std::string
    {
    public:
        fstr(const std::string &str = "")
        {
            *this += str;
        }

        template<typename T>
        fstr &operator<<(const T &t)
        {
            *this += to_string(t);
            return *this;
        }
    };

    std::string str = fstr() << "I have " << apples << " apples and " << oranges
                             << " oranges, so I have " << apples + oranges << " fruits";

Using a single std::stringstream object for the whole string:


    class sstr final
    {
    public:
        sstr(const std::string &str = "")
        {
            // Append rather than use ss_(str): a stream constructed from a string
            // starts writing at position 0 and would overwrite the initial text
            ss_ << str;
        }

        template<typename T>
        sstr &operator<<(const T &t)
        {
            ss_ << t;
            return *this;
        }

        operator std::string() const
        {
            return ss_.str();
        }

    private:
        std::stringstream ss_;
    };

    std::string str = sstr() << "I have " << apples << " apples and " << oranges
                             << " oranges, so I have " << apples + oranges << " fruits";
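One subtlety to keep in mind when a std::stringstream is constructed from an initial string (for example, to give a wrapper like sstr a starting text): with the default open mode the write position starts at the beginning of the buffer, so later << calls overwrite the initial text instead of appending to it. Opening the stream with std::ios_base::ate moves the write position to the end. A small demonstration (write_after is a hypothetical helper):

```cpp
#include <sstream>
#include <string>

// Writes `suffix` into a stringstream initialized with `prefix`.
// Without std::ios_base::ate the put position starts at the beginning,
// so the suffix overwrites the first characters of the prefix.
std::string write_after(const std::string &prefix, const std::string &suffix, bool at_end) {
    std::stringstream ss(prefix, at_end
        ? std::ios_base::in | std::ios_base::out | std::ios_base::ate
        : std::ios_base::in | std::ios_base::out);
    ss << suffix;
    return ss.str();
}
```

write_after("abc", "X", false) returns "Xbc", while write_after("abc", "X", true) returns "abcX".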

Looking ahead: in the measurements at the end of the article, std::to_string turns out to be roughly 3.5 to 5 times faster than the to_string implemented with std::stringstream. It therefore makes sense to use std::to_string for the types it supports and the template to_string for everything else:


    // Types supported by std::to_string are forwarded to it
    std::string to_string(int x) { return std::to_string(x); }
    std::string to_string(unsigned int x) { return std::to_string(x); }
    std::string to_string(long x) { return std::to_string(x); }
    std::string to_string(unsigned long x) { return std::to_string(x); }
    std::string to_string(long long x) { return std::to_string(x); }
    std::string to_string(unsigned long long x) { return std::to_string(x); }
    std::string to_string(float x) { return std::to_string(x); }
    std::string to_string(double x) { return std::to_string(x); }
    std::string to_string(long double x) { return std::to_string(x); }
    std::string to_string(const char *x) { return std::string(x); }
    std::string to_string(const std::string &x) { return x; }

    // Everything else goes through std::stringstream
    template<typename T>
    std::string to_string(const T &t)
    {
        std::stringstream ss;
        ss << t;
        return ss.str();
    }
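Which function gets called is decided by ordinary overload resolution. One consequence: a char argument is an exact match for the template, while the int overload would require a promotion, so characters are printed as characters rather than as numbers. A reduced, self-contained version of the set (only three of the non-template overloads kept) to illustrate:

```cpp
#include <sstream>
#include <string>

// Reduced overload set: non-template overloads win over the template
// when both are exact matches.
std::string to_string(int x) { return std::to_string(x); }
std::string to_string(double x) { return std::to_string(x); }
std::string to_string(const std::string &x) { return x; }

// Fallback for all other types (char, short, user-defined types, ...).
template<typename T>
std::string to_string(const T &t) {
    std::stringstream ss;
    ss << t;
    return ss.str();
}
```

to_string('A') goes to the template and returns "A", while to_string(5) uses the int overload and returns "5".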

The boost::format library


Boost is a powerful collection of libraries that nicely complements the C++ language and its standard library. String formatting is provided by the Boost.Format library (boost::format).


It supports conventional printf-style placeholders:


    std::string str = (boost::format("I have %d apples and %d oranges, so I have %d fruits")
                       % apples % oranges % (apples + oranges)).str();

as well as positional (numbered) ones:


    std::string str = (boost::format("I have %1% apples and %2% oranges, so I have %3% fruits")
                       % apples % oranges % (apples + oranges)).str();

The main drawback of boost::format is its poor performance: it is the slowest of the approaches compared here. In addition, it cannot be used in projects that are not allowed to depend on third-party libraries.


So it turns out that C++ and its standard library do not give us a convenient means of string formatting, so let's write our own.


A wrapper around vsnprintf


Let's write a wrapper around an Xprintf function that allocates enough memory by itself and accepts an arbitrary number of arguments.


To allocate memory, we will use the following strategy:


  1. First, allocate as much memory as is sufficient in most cases.
  2. Try to call the formatting function.
  3. If the call fails, allocate more memory and repeat step 2.

To pass the arguments, we will use the stdarg mechanism and the vsnprintf function.


    std::string format(const char *fmt, ...)
    {
        va_list args;
        va_start(args, fmt);
        std::vector<char> v(1024);  // enough for most strings
        while (true)
        {
            va_list args2;
            va_copy(args2, args);  // vsnprintf may consume the list, so work on a copy
            int res = vsnprintf(v.data(), v.size(), fmt, args2);
            if ((res >= 0) && (res < static_cast<int>(v.size())))
            {
                va_end(args);
                va_end(args2);
                return std::string(v.data());
            }
            // -1: required size unknown, double the buffer;
            // otherwise res is the exact length needed (without the null character)
            size_t size;
            if (res < 0)
                size = v.size() * 2;
            else
                size = static_cast<size_t>(res) + 1;
            v.clear();
            v.resize(size);
            va_end(args2);
        }
    }

 std::string str = format("I have %d apples and %d oranges, so I have %d fruits", apples, oranges, apples + oranges); 

A couple of nuances need explaining here. The return value of the Xprintf functions depends on the platform. On some platforms they return -1 on failure, in which case we double the buffer. On others they return the length of the resulting string (excluding the terminating null character), in which case we can immediately allocate exactly as much memory as needed. More information about how the Xprintf functions behave on various platforms can be found here. Also, the argument list cannot be relied upon after a call (on some platforms vsnprintf() really does "spoil" it), so we copy it before each call.


I started using this function even before C++11 appeared, and with minor changes I still use it today. Its main drawback is that it does not accept std::string arguments, so you must not forget to add .c_str() to every string argument:


    std::string country = "Great Britain";
    std::string capital = "London";
    std::cout << format("%s is a capital of %s", capital.c_str(), country.c_str()) << std::endl;
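Since C++11, the .c_str() calls can be avoided with a variadic template (the mechanism the next section builds on) that maps every std::string argument to const char * before calling snprintf. The sketch below is one possible approach, not the author's code: fmt_arg and format_s are hypothetical names, and the buffer is sized by a first snprintf call on a null buffer, which relies on the C99/C++11 return-value semantics (the length of the full result) rather than the -1 behavior of some older platforms mentioned above.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helpers: every argument passes through fmt_arg, which turns
// std::string into const char * and leaves everything else unchanged.
template<typename T>
const T &fmt_arg(const T &t) { return t; }

inline const char *fmt_arg(const std::string &s) { return s.c_str(); }

// Variadic-template front end for snprintf that accepts std::string directly.
template<typename... Args>
std::string format_s(const char *fmt, const Args &... args) {
    // First pass: a null buffer of size 0 only reports the required length.
    int len = std::snprintf(nullptr, 0, fmt, fmt_arg(args)...);
    if (len < 0)
        return std::string();  // encoding error
    std::vector<char> buf(static_cast<std::size_t>(len) + 1);
    // Second pass: the buffer is now exactly large enough.
    std::snprintf(buf.data(), buf.size(), fmt, fmt_arg(args)...);
    return std::string(buf.data(), static_cast<std::size_t>(len));
}
```

With this, format_s("%s is a capital of %s", capital, country) works without .c_str(). Other class types still cannot be passed safely through snprintf, and because the format string is a runtime parameter, the compiler cannot check the placeholders against the arguments.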

Templates with a variable number of arguments (variadic templates)


Since C++11, C++ supports templates with a variable number of arguments (variadic templates).


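Such templates can be used to pass the arguments to the formatting function. We also no longer need to worry about argument types, because every argument can be converted with the to_string template implemented earlier. Since all arguments arrive as strings, type-specific placeholders such as %d carry no information, so we will use positional placeholders instead: %1, %2, and so on.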


To collect all the arguments, we take the first one, convert it to a string, store it, and repeat the operation recursively on the rest. When there are no arguments, or once they have all been processed (the end of the recursion), we parse the template string, substitute the arguments and obtain the resulting string.
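The collection step can also be written without recursion. In C++11, a pack expansion inside a braced initializer list converts all arguments in a single expression, and the elements of a braced list are guaranteed to be evaluated left to right. A sketch of that alternative (collect_args is a hypothetical name; the vtformat code uses the recursive version):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Stream-based conversion, as in the generic to_string from the text.
template<typename T>
std::string to_string(const T &t) {
    std::stringstream ss;
    ss << t;
    return ss.str();
}

// Converts every argument to a string, in order, without recursion.
template<typename... Args>
std::vector<std::string> collect_args(const Args &... args) {
    return std::vector<std::string>{to_string(args)...};
}
```

collect_args(1, std::string("two"), 2.5) yields {"1", "two", "2.5"}, and collect_args() yields an empty vector.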


Now we have everything needed for a complete formatting function: parsing the template string, collecting all arguments and converting them to strings, substituting them into the template, and producing the resulting string:


    // Parses the template string and substitutes the already converted arguments
    std::string vtformat_impl(const std::string &fmt, const std::vector<std::string> &strs)
    {
        static const char FORMAT_SYMBOL = '%';
        std::string res;
        std::string buf;   // digits of the placeholder being read
        bool arg = false;  // true while a placeholder is being read
        // The extra iteration at i == fmt.size() flushes a placeholder at the very end
        for (int i = 0; i <= static_cast<int>(fmt.size()); ++i)
        {
            bool last = i == static_cast<int>(fmt.size());
            char ch = fmt[i];  // fmt[fmt.size()] returns '\0'
            if (arg)
            {
                if (ch >= '0' && ch <= '9')
                {
                    buf += ch;
                }
                else
                {
                    int num = 0;
                    if (!buf.empty() && buf.length() < 10)  // guard against int overflow
                        num = atoi(buf.c_str());
                    if (num >= 1 && num <= static_cast<int>(strs.size()))
                        res += strs[num - 1];
                    else
                        res += FORMAT_SYMBOL + buf;  // unknown placeholder: keep it as is
                    buf.clear();
                    if (ch != FORMAT_SYMBOL)
                    {
                        if (!last)
                            res += ch;
                        arg = false;
                    }
                }
            }
            else
            {
                if (ch == FORMAT_SYMBOL)
                    arg = true;
                else if (!last)
                    res += ch;
            }
        }
        return res;
    }

    // Recursion step: convert the first argument, then process the rest
    template<typename Arg, typename ... Args>
    inline std::string vtformat_impl(const std::string &fmt, std::vector<std::string> &strs,
                                     Arg &&arg, Args && ... args)
    {
        strs.push_back(to_string(std::forward<Arg>(arg)));
        return vtformat_impl(fmt, strs, std::forward<Args>(args) ...);
    }

    inline std::string vtformat(const std::string &fmt)
    {
        return fmt;
    }

    template<typename Arg, typename ... Args>
    inline std::string vtformat(const std::string &fmt, Arg &&arg, Args && ... args)
    {
        std::vector<std::string> strs;
        return vtformat_impl(fmt, strs, std::forward<Arg>(arg), std::forward<Args>(args) ...);
    }

The algorithm is fairly efficient: it makes a single pass over the format string. If an argument cannot be substituted for a placeholder (for example, %0 or an index larger than the number of arguments), the placeholder is left unchanged and no exception is thrown.


Usage examples:


    std::cout << vtformat("I have %1 apples and %2 oranges, so I have %3 fruits",
                          apples, oranges, apples + oranges) << std::endl;
    // I have 5 apples and 7 oranges, so I have 12 fruits

    std::cout << vtformat("%1 + %2 = %3", 2, 3, 2 + 3) << std::endl;
    // 2 + 3 = 5

    std::cout << vtformat("%3 = %2 + %1", 2, 3, 2 + 3) << std::endl;
    // 5 = 3 + 2

    std::cout << vtformat("%2 = %1 + %1 + %1", 2, 2 + 2 + 2) << std::endl;
    // 6 = 2 + 2 + 2

    std::cout << vtformat("%0 %1 %2 %3 %4 %5", 1, 2, 3, 4) << std::endl;
    // %0 1 2 3 4 %5

    std::cout << vtformat("%1 + 1% = %2", 54, 54 * 1.01) << std::endl;
    // 54 + 1% = 54.540000

    std::string country = "Russia";
    const char *capital = "Moscow";
    std::cout << vtformat("%1 is a capital of %2", capital, country) << std::endl;
    // Moscow is a capital of Russia

    // Any type with an operator<< works. This operator must be declared before the
    // generic to_string template, otherwise ss << t inside it cannot find it
    template<typename T>
    std::ostream &operator<<(std::ostream &os, const std::vector<T> &v)
    {
        os << "[";
        bool first = true;
        for (const auto &x : v)
        {
            if (first)
                first = false;
            else
                os << ", ";
            os << x;
        }
        os << "]";
        return os;
    }

    std::vector<int> v = {1, 4, 5, 2, 7, 9};
    std::cout << vtformat("v = %1", v) << std::endl;
    // v = [1, 4, 5, 2, 7, 9]

Performance comparison


Performance of to_string vs. std::to_string, milliseconds per million calls


                  int, ms  long long, ms  double, ms
to_string             681            704        1109
std::to_string        130            201         291



Performance comparison of formatting functions, milliseconds per million calls


            time, ms
fstr            1308
sstr            1243
format           788
boost::format   2554
vtformat        2022
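The article does not show the benchmark code behind these numbers. For readers who want to run similar measurements, here is a minimal harness sketch (time_ms is a hypothetical helper, not the author's setup). Absolute timings depend heavily on the compiler, optimization flags and standard library, so only relative comparisons on the same machine are meaningful.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Calls f(0) .. f(calls - 1) and returns the elapsed wall-clock time in
// milliseconds. The total length of the produced strings is added to `sink`
// so that the optimizer cannot discard the work.
template<typename F>
double time_ms(F f, std::size_t calls, std::size_t &sink) {
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < calls; ++i)
        sink += f(i).size();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, time_ms([](std::size_t i) { return std::to_string(i); }, 1000000, sink) measures a million std::to_string calls, which matches the "milliseconds per million calls" unit of the tables.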



Thank you for your attention. Comments and additions are welcome.



Source: https://habr.com/ru/post/318962/

