
File paths

It would seem that nothing could be simpler than working with files in C++. Yet people are remarkably inventive in finding the worst possible approach.
Do not do this:

std::string filepath("C:\\");
std::ofstream file(filepath.c_str());



In short, using non-ASCII characters in char strings can have dire consequences. I have already discussed this issue in a post about encodings. Here, the file name depends directly on the encoding of the source file: if someone writes something like this in UTF-8, on Windows XP you can end up with a file whose name contains forbidden characters and with which nothing can be done. So non-ASCII characters must not be used in char strings. But you cannot forbid them to the user (or to a stream or database from which the path is obtained). That would be discrimination based on nationality! Let's fix this right away:

std::wstring filepath = L"C:\\";
std::ofstream file(filepath.c_str());


Visual Studio users who are unfamiliar with the standard can relax until necessity forces them to change the compiler (or, more precisely, the STL implementation). And that is where it begins:
- “wacky gcc” or “wacky stlport” has no ofstream::ofstream(wchar_t*) constructor

The fact is that the current standard does not provide for this constructor (leaving C++0x aside for now). It is purely a Microsoft initiative.
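To make the portability gap concrete, here is a minimal sketch of a wrapper that compiles on both kinds of toolchain. The helper name open_output is hypothetical, and the non-Microsoft branch assumes that narrowing with the current C locale (via std::wcstombs) is acceptable for the path in question.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: opens `file` for writing at the wide path `wpath`.
bool open_output(std::ofstream& file, const std::wstring& wpath)
{
#if defined(_MSC_VER)
    // Non-standard Microsoft extension: open() accepts const wchar_t*.
    file.open(wpath.c_str());
#else
    // Standard library only: narrow with the current C locale first.
    // One wide character may need up to MB_CUR_MAX bytes.
    std::vector<char> buf(wpath.size() * MB_CUR_MAX + 1, '\0');
    const std::size_t n = std::wcstombs(&buf[0], wpath.c_str(), buf.size() - 1);
    if (n == static_cast<std::size_t>(-1))
        return false; // some character has no representation in this locale
    file.open(&buf[0]);
#endif
    return file.is_open();
}
```

In the default "C" locale the narrowing branch only accepts ASCII, so a program relying on it would normally call std::setlocale(LC_ALL, "") at startup.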

What to do?



Several options


The third option needs no explanation, and the first is not complicated either:
#include <boost/filesystem/fstream.hpp>
#include <string>
namespace fs = boost::filesystem;
int main(int argc, char* argv[])
{
    std::wstring filepath(L"C:\\test");
    // fs::ofstream accepts a boost::filesystem::path, which can be built from a wide string
    fs::ofstream file(filepath);
    return 0;
}
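For readers on a newer toolchain, the same idea can be sketched without Boost. This assumes a C++17 compiler, where std::filesystem::path can be built from a std::wstring and std::ofstream has an overload taking a path; the helper name create_file is hypothetical and not part of the original post.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: creates a file at a wide-character path and writes `text`.
bool create_file(const std::wstring& wide_path, const std::string& text)
{
    // path converts the wide string to the platform's native encoding
    const std::filesystem::path p(wide_path);
    std::ofstream file(p); // C++17 overload taking std::filesystem::path
    if (!file)
        return false;
    file << text;
    return file.good();
}
```

A call such as create_file(L"test.txt", "testing") would then behave like the Boost version above.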

The second option, however, can cause trouble for those who have not done their homework.

Using std::locale



A digression.
To the long-suffering MinGW users: you will have to use a third-party STL implementation (for example, STLport), because the native one lacks proper locale support. Specifically, std::locale("") always returns std::locale("C"), whatever your system settings are. STLport does not have this drawback. I described how to set up the MinGW + STLport + Boost combination in an earlier post.
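A quick way to see whether a given toolchain is affected is to ask for the name of the user's locale. The sketch below uses a hypothetical helper, user_locale_name; if it reports "C" on a system configured for another language, std::locale("") is most likely not picking up the user's settings (although "C" can also be a legitimate answer).

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: name of the user's preferred locale, or the classic
// "C" locale name if the environment names a locale that cannot be constructed.
std::string user_locale_name()
{
    try {
        return std::locale("").name();
    } catch (const std::runtime_error&) {
        return std::locale::classic().name();
    }
}
```

Printing the result with std::cout << user_locale_name() is enough for the check.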

All we need to do is follow a simple rule: handle non-ASCII text in “wide” form. That is, we read the path into a std::wstring using an appropriately localized stream and, when using it, narrow it with the user's locale. The idea rests on the fact that if the user sees the characters of their language correctly in the console, then the user's locale knows which encoding a wide string must be narrowed to for the path to be interpreted correctly. Now an example. Suppose we have a file in cp866 encoding that contains a path, and we need to create a file at that path. Here is what we do:
#include <cwchar>
#include <fstream>
#include <iostream>
#include <iterator>
#include <locale>
#include <string>
#include <vector>
#include "facet/codecvt/codecvt_cp866.hpp"

/** @brief Narrows a wide string using the locale loc
    @return The narrowed string, or an empty string if an error occurs */
std::string narrow(const std::wstring& wstr, const std::locale& loc)
{
    const size_t sz = wstr.length();
    if (sz == 0)
        return std::string();
    typedef std::codecvt<wchar_t, char, std::mbstate_t> cvt;
    const cvt& facet = std::use_facet<cvt>(loc);
    std::mbstate_t state = std::mbstate_t();
    // One wide character may produce up to max_length() bytes
    std::vector<char> buffer(sz * facet.max_length() + 1, 0);
    const wchar_t* wcstr = wstr.c_str();
    const wchar_t* wnext;
    char* cnext = &buffer[0];
    cvt::result res = facet.out(state, wcstr, wcstr + sz, wnext,
                                &buffer[0], &buffer[0] + buffer.size() - 1, cnext);
    if (res == cvt::error)
        return std::string();
    return std::string(&buffer[0], cnext);
}

/** @brief Widens a string using the locale loc
    @return The widened string, or an empty wide string if an error occurs */
std::wstring widen(const std::string& str, const std::locale& loc)
{
    const size_t sz = str.length();
    if (sz == 0)
        return std::wstring();
    typedef std::codecvt<wchar_t, char, std::mbstate_t> cvt;
    std::mbstate_t state = std::mbstate_t();
    // The wide string never has more characters than the narrow one has bytes
    std::vector<wchar_t> buffer(sz + 1, 0);
    const char* cstr = str.c_str();
    const char* cnext;
    wchar_t* wnext = &buffer[0];
    cvt::result res = std::use_facet<cvt>(loc).in(state, cstr, cstr + sz, cnext,
                                                  &buffer[0], &buffer[0] + sz, wnext);
    if (res == cvt::error)
        return std::wstring();
    return std::wstring(&buffer[0], wnext);
}

int main(int argc, char* argv[])
{
    // Suppose there is a cp866 file containing a path
    std::ofstream ofile("input.txt", std::ios::binary);
    if (!ofile)
    {
        std::cerr << "Error opening file" << std::endl;
        return 1;
    }
    std::ostreambuf_iterator<char> writer(ofile);
    *(writer) = 0xe2;   // т
    *(++writer) = 0xa5; // е
    *(++writer) = 0xe1; // с
    *(++writer) = 0xe2; // т
    ofile.close();
    // Read the path
    std::locale cp866(std::locale(), new codecvt_cp866);
    std::wifstream ifile("input.txt", std::ios::binary);
    ifile.imbue(cp866);
    std::wstring wpath;
    ifile >> wpath;
    ifile.close();
    // Create a file at this path
    std::ofstream file(narrow(wpath, std::locale("")).c_str());
    file << "testing";
    file.close();
}


The facets are available on GitHub.

SUMMARY



If you take the path from argv, feel free to use it as is (the user knows what they are doing). If the path comes from the “external environment”, read it as a wide string through a properly localized stream and narrow it with the user's locale.

For questions, consult:
0. The standard
1. Stroustrup's book (3rd special edition, appendix)
2. The MinGW documentation.
3. The Boost documentation.
4. The post about facets and encodings.

Straight paths to everyone!

UPD: As Gorthauer87, Migun and naryl correctly noted in the comments, backslashes and platform-specific paths are also a bad idea.
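One way to avoid hard-coded backslashes is to let a path class insert the separator. The sketch below assumes a C++17 compiler with std::filesystem (Boost.Filesystem offers a similar operator/); the helper name join_path is hypothetical.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: appends `name` to `dir` using the platform's
// preferred directory separator instead of a hard-coded backslash.
std::filesystem::path join_path(const std::filesystem::path& dir,
                                const std::string& name)
{
    return dir / std::filesystem::path(name);
}
```

Calling generic_string() on the result yields the same forward-slash form on every platform, which is convenient for logs and configuration files.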

Source: https://habr.com/ru/post/112997/

