C++: Secrets you didn't know about the STL (string)

2024-07-12

Table of contents

1. Why learn string class?

1.1 Strings in C

2. String class in the standard library

2.1 string class

2.2 Common interface description of string class

1. Common construction of string class objects

2. Operations on string objects

3. Description of string structure under vs and g++

3. Simulation implementation of string class

3.2 Shallow Copy

3.3 Deep Copy

3.4 Copy-on-write

3.5 Simulation implementation of string class


1. Why learn string class?

1.1 Strings in C

In C, a string is a sequence of characters terminated by '\0'. For convenience, the C standard library provides the str family of functions, but these functions are separate from the strings they operate on, which does not fit the OOP model. The underlying memory must also be managed by the user, and a small mistake can lead to an out-of-bounds access.

2. String class in the standard library

2.1 string class

https://cplusplus.com/reference/string/string/?kw=string

  • 1. A string is a class that represents a sequence of characters.
  • 2. The standard string class provides support for such objects. Its interface is similar to that of a standard byte container, with additional features designed specifically for operating on strings of single-byte characters.
  • 3. The string class uses char as its character type, with its default char_traits and allocator types (see basic_string for more information on the template).
  • 4. The string class is an instantiation of the basic_string class template that uses char as the character type, with char_traits and allocator as the default template arguments (see basic_string for more information on the template).
  • 5. Note that this class handles bytes independently of the encoding used: if it holds a sequence of multi-byte or variable-length characters (such as UTF-8), all members of the class (such as length or size), as well as its iterators, still operate in terms of bytes, not actual encoded characters.
Summary:
  • 1. string is a class that represents a string.
  • 2. Its interface is basically the same as that of a regular container, plus some functions dedicated to common string operations.
  • 3. Under the hood, string is an alias for an instantiation of the basic_string class template: typedef basic_string<char, char_traits<char>, allocator<char>> string;
  • 4. It is not aware of multi-byte or variable-length characters: such sequences can be stored, but they are processed byte by byte.
When using the string class, include the header with #include <string> and either write using namespace std; or use the qualified name std::string.

2.2 Common interface description of string class

1. Common construction of string class objects

void Teststring()
{
    string s1;              // construct an empty string object s1
    string s2("hello bit"); // construct s2 from a C-style string
    string s3(s2);          // copy-construct s3 from s2
}

2. Operations on string objects

Notes:

  • 1. size() and length() are implemented in exactly the same way. size() was introduced to keep the interface consistent with other containers, and size() is generally the one used.
  • 2. clear() only removes the valid characters from the string; it does not change the size of the underlying space.
  • 3. resize(size_t n) and resize(size_t n, char c) both change the number of valid characters to n. They differ when the number of characters grows: resize(n) fills the extra positions with '\0', while resize(size_t n, char c) fills them with the character c. Note: when resize increases the number of elements, the underlying capacity may change; when it reduces the number of elements, the total size of the underlying space stays the same.
  • 4. reserve(size_t res_arg = 0) reserves space for the string without changing the number of valid elements. When the argument is smaller than the current total size of the underlying space, reserve typically leaves the capacity unchanged.

3. Description of string structure under vs and g++

Note: the following structures were verified on a 32-bit platform, where a pointer occupies 4 bytes.

The structure of string under vs

A string object occupies 28 bytes in total. Its internal structure is slightly more complicated. First, there is a union that defines the storage space for the characters:
  1. When the string length is less than 16, an internal fixed-size character array stores the string.
  2. When the string length is greater than or equal to 16, space is allocated from the heap.

union _Bxty
{   // storage for small buffer or pointer to larger one
    value_type _Buf[_BUF_SIZE];
    pointer _Ptr;
    char _Alias[_BUF_SIZE]; // to permit aliasing
} _Bx;

This design is reasonable: in most cases a string is shorter than 16 characters, so once the string object has been created it already holds a fixed 16-character array and needs no heap allocation, which is efficient.
Second, one size_t field holds the length of the string, and another size_t field holds the total capacity of the space allocated from the heap.
Finally, there is one more pointer used for other purposes.
So the total size is 16 + 4 + 4 + 4 = 28 bytes.
The structure of string under g++

Under g++, string is implemented with copy-on-write. A string object occupies 4 bytes in total and contains only a pointer. That pointer refers to a block of heap space that holds the following fields:
  1. Total size of the space
  2. Length of the valid string
  3. Reference count
  4. Pointer to the heap space used to store the string.

struct _Rep_base
{
    size_type _M_length;
    size_type _M_capacity;
    _Atomic_word _M_refcount;
};

Note that this is the older libstdc++ layout. Since GCC 5, the default std::string uses a small internal buffer and is no longer copy-on-write.

3. Simulation implementation of string class

Note: when implementing the string class yourself, pay attention to the shallow copy problem.

If a String class that manages its own buffer does not explicitly define its copy constructor and assignment operator overload, the compiler synthesizes default ones. When s2 is constructed from s1, the compiler calls the default copy constructor. The resulting problem is that s1 and s2 share the same memory space, so on destruction the same space is released more than once and the program crashes. This kind of copy is called a shallow copy.

3.2 Shallow Copy

Shallow copy: also called a bitwise copy. The compiler copies only the values stored in the object. If the object manages a resource, multiple objects end up sharing that resource. When one of them is destroyed, the resource is released, but the other objects do not know this and still treat it as valid, so an access violation occurs when they continue to operate on it.

3.3 Deep Copy

If a class involves resource management, its copy constructor, assignment operator overload, and destructor must be explicitly provided. Generally, they are provided in a deep copy manner.

3.4 Copy-on-write

Copy-on-write is a form of deferral: it is implemented by adding a reference count on top of a shallow copy, and the real copy is postponed until one of the objects modifies the shared resource.
Reference count: records the number of users of a resource. During construction, the count is set to 1. Each time another object starts using the resource, the count is increased by 1. When an object is destroyed, it first decrements the count by 1 and then checks whether the resource needs to be released. If the count has reached 0, the object was the last user of the resource and releases it; otherwise the resource cannot be released because other objects are still using it.

3.5 Simulation implementation of string class

//string.h
#pragma once
#include <iostream>
#include <cassert>
#include <cstring>
using namespace std;
namespace mystr {
    class string
    {
    public:
        // Iterators: the underlying memory is contiguous, so plain pointers suffice
        typedef char* iterator;
        typedef const char* const_iterator;
        // begin()/end() also make range-based for loops work
        iterator begin() { return _str; }
        iterator end() { return _str + _size; }
        // const overloads for const strings
        const_iterator begin() const { return _str; }
        const_iterator end() const { return _str + _size; }
        //string();
        string(const char* str = "");
        string(const string& s);
        // copy-and-swap: temp is already a copy of the right-hand side
        string& operator=(string temp) { swap(temp); return *this; }
        ~string() { delete[] _str; _str = nullptr; _size = _capacity = 0; }
        // returns a C-style character array
        const char* c_str() const { return _str; }
        size_t size() const { return _size; }
        char& operator[](size_t pos) { assert(pos < _size); return _str[pos]; }
        const char& operator[](size_t pos) const { assert(pos < _size); return _str[pos]; }
        // grow the capacity to at least n
        void reserve(size_t n);
        void push_back(char ch) { insert(_size, ch); }
        void append(const char* str) { insert(_size, str); }
        string& operator+=(char ch) { insert(_size, ch); return *this; }
        string& operator+=(const char* str) { insert(_size, str); return *this; }
        void insert(size_t pos, char ch);
        void insert(size_t pos, const char* str);
        void erase(size_t pos = 0, size_t len = npos);
        size_t find(char ch, size_t pos = 0) const {
            for (size_t i = pos; i < _size; i++) if (_str[i] == ch) return i;
            return npos;
        }
        size_t find(const char* str, size_t pos = 0) const {
            if (pos > _size) return npos;
            // strstr returns nullptr when there is no match
            const char* p = strstr(_str + pos, str);
            return p == nullptr ? npos : static_cast<size_t>(p - _str);
        }
        void swap(string& s);
        string substr(size_t pos = 0, size_t len = npos) const;
        bool operator<(const string& s) const { return strcmp(_str, s._str) < 0; }
        // defined directly; expressing > and <= through each other would recurse forever
        bool operator>(const string& s) const { return strcmp(_str, s._str) > 0; }
        bool operator<=(const string& s) const { return !(*this > s); }
        bool operator>=(const string& s) const { return !(*this < s); }
        bool operator==(const string& s) const { return strcmp(_str, s._str) == 0; }
        bool operator!=(const string& s) const { return !(*this == s); }
        void clear() { _str[0] = '\0'; _size = 0; }
    private:
        // default initializers keep the copy constructor's swap well-defined
        char* _str = nullptr;
        size_t _size = 0;
        size_t _capacity = 0;
        // static members are normally defined outside the class; const integral types are an exception
        const static size_t npos = -1;
    };
    void swap(string& s1, string& s2);
    istream& operator>>(istream& ci, string& s);
    ostream& operator<<(ostream& co, const string& s);
}

//string.cpp
#include "string.h"
namespace mystr {
    string::string(const char* str) :_size(strlen(str)) {
        _str = new char[_size + 1];   // +1 for the terminating '\0'
        _capacity = _size;
        strcpy(_str, str);
    }
    string::string(const string& s) {
        // build a temporary deep copy, then take over its buffer
        string temp(s._str);
        swap(temp);
    }
    void string::reserve(size_t n) {
        if (_capacity < n) {
            char* temp = new char[n + 1];
            strcpy(temp, _str);
            delete[] _str;
            _str = temp;
            _capacity = n;
        }
    }
    void string::insert(size_t pos, char ch) {
        assert(pos <= _size);
        if (_size == _capacity) {
            size_t newcapacity = _capacity == 0 ? 4 : 2 * _capacity;
            reserve(newcapacity);
        }
        // shift [pos, _size] one slot to the right, including the '\0'
        size_t end = _size + 1;
        while (end > pos) _str[end] = _str[end - 1], --end;
        _str[pos] = ch;
        _size++;
    }
    void string::insert(size_t pos, const char* str) {
        assert(pos <= _size);
        size_t len = strlen(str);
        if (len == 0) return;
        if (_size + len > _capacity) reserve(_size + len);
        // shift [pos, _size] len slots to the right, including the '\0'
        size_t end = _size + len;
        while (end > pos + len - 1) _str[end] = _str[end - len], --end;
        memcpy(_str + pos, str, len);
        _size += len;
    }
    void string::erase(size_t pos, size_t len) {
        assert(pos <= _size);
        if (len >= _size - pos) {
            // erase everything from pos to the end
            _str[pos] = '\0';
            _size = pos;
        }
        else {
            // the ranges overlap, so use memmove; +1 also moves the '\0'
            memmove(_str + pos, _str + pos + len, _size - pos - len + 1);
            _size -= len;
        }
    }
    void string::swap(string& s) {
        std::swap(_str, s._str);
        std::swap(_size, s._size);
        std::swap(_capacity, s._capacity);
    }
    string string::substr(size_t pos, size_t len) const {
        assert(pos <= _size);
        if (len > _size - pos) { string sub(_str + pos); return sub; }
        else {
            string sub;
            sub.reserve(len);
            for (size_t i = pos; i < pos + len; i++) sub += _str[i];
            return sub;
        }
    }
    void swap(string& s1, string& s2) { s1.swap(s2); }
    istream& operator>>(istream& ci, string& s) {
        s.clear();
        char ch;
        // read up to the next space or newline; also stop at end of input
        while (ci.get(ch) && ch != ' ' && ch != '\n') s += ch;
        return ci;
    }
    ostream& operator<<(ostream& co, const string& s) {
        for (size_t i = 0; i < s.size(); i++) co << s[i];
        return co;
    }
}

//test.cpp
#include "string.h"
namespace mystr {
    void test1() {
        string s1 = "1111";
        string s2 = s1;
        cout << s1.c_str() << endl << s2.c_str() << endl;
        cout << s1.size() << endl;
    }
    void test2() {
        string s1 = "111";
        string s2 = "222222";
        s1 = s2;
        cout << s1.c_str() << endl;
    }
    void test3() {
        string s1 = "111222333";
        for (auto& i : s1) i += 3;
        cout << s1.c_str() << endl;
        const string s2 = "111222333";
        for (auto& i : s2) cout << i;
        cout << endl;
        for (size_t i = 0; i < s1.size(); i++) cout << (s1[i] += 2);
        cout << endl;
    }
    void test4() {
        string s1 = "sadfsf";
        s1.insert(2, '-');
        cout << s1.c_str() << endl;
        s1.insert(0, '-');
        cout << s1.c_str() << endl;
        s1.insert(2, "11111");
        cout << s1.c_str() << endl;
        s1.insert(0, "222222");
        cout << s1.c_str() << endl;
    }
    void test5() {
        string s1 = "asgfidsgf";
        s1.push_back('-');
        cout << s1.c_str() << endl;
        s1.append("=====");
        cout << s1.c_str() << endl;
        s1 += 'w';
        cout << s1.c_str() << endl;
        s1 += "0000";
        cout << s1.c_str() << endl;
        s1.erase(10);
        cout << s1.c_str() << endl;
        s1.erase(7, 100);
        cout << s1.c_str() << endl;
        s1.erase(3, 2);
        cout << s1.c_str() << endl;
        s1.erase(0);
        cout << s1.c_str() << endl;
    }
    void test6() {
        string s1 = "ksjfghks";
        cout << s1.find('h', 2) << endl;
        cout << s1.find("ghk", 2) << endl;
        cout << s1.find("ghksgs", 2) << endl;
    }
    void test7() {
        string s1 = "sggsdsdf";
        string s2 = "sdgfrgdb";
        cout << s1.c_str() << endl;
        cout << s2.c_str() << endl;
        swap(s1, s2);
        cout << s1.c_str() << endl;
        cout << s2.c_str() << endl;
        s1.swap(s2);
        cout << s1.c_str() << endl;
        cout << s2.c_str() << endl;
        string s3 = s1.substr(2, 5);
        cout << s3.c_str() << endl;
    }
    void test8() {
        string s1, s2;
        cin >> s1 >> s2;
        cout << s1 << endl << s2 << endl;
    }
}
int main() {
    mystr::test8();
    return 0;
}