Review
- 2024-06-22 23:20
1. Introduction
Strings are implemented as a collection of bytes, plus some methods to provide useful functionality when those bytes are interpreted as text. Remember that strings are UTF-8 encoded, so we can include any properly encoded data in them. A String is a wrapper over a Vec<u8>.
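A minimal sketch of the "wrapper over a Vec<u8>" idea, using only standard-library calls. The helper name round_trip is made up for this note; it is not part of the String API.

```rust
/// Hypothetical helper: round-trips text through its raw UTF-8 bytes.
fn round_trip(text: &str) -> String {
    // into_bytes() hands back the underlying Vec<u8> without copying.
    let bytes: Vec<u8> = String::from(text).into_bytes();
    // from_utf8() validates the bytes and rebuilds the String.
    String::from_utf8(bytes).expect("bytes came from a valid String")
}

fn main() {
    let s = String::from("héllo");
    // len() counts bytes, not characters: 'é' takes two bytes in UTF-8.
    println!("{} bytes, {} chars", s.len(), s.chars().count());
    println!("{:?}", s.as_bytes());
    // Bytes that are not valid UTF-8 are rejected instead of being stored.
    println!("invalid rejected: {}", String::from_utf8(vec![0xff, 0xfe]).is_err());
    println!("{}", round_trip("héllo"));
}
```

The point is that a String never holds bytes that fail UTF-8 validation, which is what separates it from a plain Vec<u8>.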
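[!info] String vs String slice vs String literal
- String: mutable (when bound with mut), stored on the heap, type String; a borrowed String has type &String.
- String literal: immutable and read-only, type &'static str, contents known at compile time.
- String slice: immutable, does not own its data and only borrows it; can cover all or part of a String, or part of a string literal; type &str.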
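The three kinds side by side, as a rough sketch. The function byte_len is a made-up helper showing that one &str parameter accepts all of them.

```rust
/// Hypothetical helper: accepts any borrowed string data,
/// whether it comes from a literal, a whole String, or a slice.
fn byte_len(text: &str) -> usize {
    text.len()
}

fn main() {
    // String literal: read-only, lives for the whole program.
    let literal: &'static str = "hello world";

    // String: owned, heap-allocated, growable because it is bound with mut.
    let mut owned: String = String::from(literal);
    owned.push('!');

    // String slices: borrowed views, they own nothing.
    let whole: &str = &owned[..]; // all of the String
    let part: &str = &owned[0..5]; // part of the String
    let lit_part: &str = &literal[6..]; // part of a literal

    println!("{whole} / {part} / {lit_part}");
    println!("{} {} {}", byte_len(literal), byte_len(&owned), byte_len(part));
}
```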
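String type Features
String literals are not suitable for every situation, for two reasons:
- they are immutable
- not every string value can be known when we write our code
The String type covers both cases: it manages data allocated on the heap, so it can be mutated and can store an amount of text that is unknown to us at compile time.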
let x = String::from("hello");
let y = "world";
let mut s = String::from("hello");
let mut m = String::new();
let mut n = "initial contents".to_string();
s.push_str(", world!"); // push_str() appends a literal to a String
println!("{s}"); // This will print `hello, world!`
{
let s = String::from("hello"); // s is valid from this point forward
// do stuff with s
} // this scope is now over, and s is no
// longer valid
Take a look at Figure 4-1 to see what is happening to String under the covers. A String is made up of three parts, shown on the left: a pointer to the memory that holds the contents of the string, a length, and a capacity. This group of data is stored on the stack. On the right is the memory on the heap that holds the contents.
Figure 4-1 ![[e7c56a0c8b44_6dd6b9ad.svg]]
The length is how much memory, in bytes, the contents of the String are currently using. The capacity is the total amount of memory, in bytes, that the String has received from the allocator.
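A small sketch of length versus capacity. The helper len_and_capacity is invented for this note, and the exact capacity depends on the allocator strategy, so only a lower bound is assumed.

```rust
/// Hypothetical helper: returns (len, capacity) after pushing `text`
/// into a String that was created with `reserved` bytes preallocated.
fn len_and_capacity(reserved: usize, text: &str) -> (usize, usize) {
    let mut s = String::with_capacity(reserved);
    s.push_str(text);
    (s.len(), s.capacity())
}

fn main() {
    // len is the number of bytes in use; capacity is what was allocated.
    let (len, cap) = len_and_capacity(16, "hello");
    println!("len = {len}, capacity = {cap}");

    // Growing past the capacity forces a reallocation with a larger buffer.
    let mut s = String::with_capacity(4);
    s.push_str("hello, world!");
    println!("len = {}, capacity = {}", s.len(), s.capacity());
}
```

with_capacity only promises at least the requested number of bytes, so capacity is always greater than or equal to len.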
Given let s1 = String::from("hello");, when we assign s1 to s2 with let s2 = s1;, the String data is copied, meaning we copy the pointer, the length, and the capacity that are on the stack. We do not copy the data on the heap that the pointer refers to. In other words, the data representation in memory looks like Figure 4-2.
Figure 4-2
Earlier, we said that when a variable goes out of scope, Rust automatically calls the drop function and cleans up the heap memory for that variable. But Figure 4-2 shows both data pointers pointing to the same location. This is a problem: when s2 and s1 go out of scope, they will both try to free the same memory. This is known as a double free error and is one of the memory safety bugs we mentioned previously. Freeing memory twice can lead to memory corruption, which can potentially lead to security vulnerabilities.
To ensure memory safety, after the line let s2 = s1;, Rust considers s1 as no longer valid. Therefore, Rust doesn’t need to free anything when s1 goes out of scope.
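The move can be observed by comparing heap pointers. This is a sketch; the helper move_vs_clone is made up for illustration, and it assumes a non-empty string so that a real heap allocation exists.

```rust
/// Hypothetical helper. Returns two flags:
/// .0: does the moved String still point at the original heap buffer?
/// .1: does a clone share that buffer?
fn move_vs_clone(text: &str) -> (bool, bool) {
    let s1 = String::from(text);
    let heap_ptr = s1.as_ptr();

    // Move: only pointer, length and capacity are copied; s1 is invalidated.
    let s2 = s1;
    // println!("{s1}"); // would not compile: borrow of moved value `s1`

    // Clone: a deep copy with its own heap buffer.
    let s3 = s2.clone();

    (s2.as_ptr() == heap_ptr, s3.as_ptr() == s2.as_ptr())
}

fn main() {
    let (same_after_move, shared_with_clone) = move_vs_clone("hello");
    println!("move keeps the buffer: {same_after_move}");
    println!("clone shares the buffer: {shared_with_clone}");
}
```

Because only s2 owns the buffer after the move, it is freed exactly once when s2 goes out of scope.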
let s1 = String::from("Hello, ");
let s2 = String::from("world!");
let s3 = s1 + &s2; // note s1 has been moved here and can no longer be used
The reason s1 is no longer valid after the addition, and the reason we used a reference to s2, has to do with the signature of the method that’s called when we use the + operator. The + operator uses the add method, whose signature looks something like this:
fn add(self, s: &str) -> String {
We can only add a &str to a String; we can’t add two String values together. But wait—the type of &s2 is &String, not &str, as specified in the second parameter to add. The reason we’re able to use &s2 in the call to add is that the compiler can coerce the &String argument into a &str.
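A sketch of the coercion at work. The functions shout and join are invented names; join takes &String on purpose, only to show the coercion (idiomatic code would take &str directly).

```rust
/// Hypothetical helper: takes a string slice.
/// Callers may pass a &String thanks to deref coercion.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

/// Hypothetical helper: concatenates with `+`, which consumes `left`
/// and only borrows `right`.
fn join(left: String, right: &String) -> String {
    // `right` is a &String; the compiler coerces it to &str for add.
    left + right
}

fn main() {
    let s1 = String::from("Hello, ");
    let s2 = String::from("world!");

    println!("{}", shout(&s2)); // &String coerced to &str
    println!("{}", shout(s2.as_str())); // the same conversion, written out

    let s3 = join(s1, &s2); // s1 is moved, s2 is only borrowed
    println!("{s3}");
    println!("{s2}"); // s2 is still usable here
}
```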
let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");
let s = s1 + "-" + &s2 + "-" + &s3;
For more complicated string combining, we can instead use the format! macro:
let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");
let s = format!("{s1}-{s2}-{s3}");
The format! macro works like println!, but instead of printing the output to the screen, it returns a String with the contents. The version of the code using format! is much easier to read, and the code generated by the format! macro uses references so that this call doesn’t take ownership of any of its parameters.
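To check the ownership claim, here is a short sketch in which all three inputs are still used after the format! call. The helper tic_tac_toe is a made-up name.

```rust
/// Hypothetical helper: joins three parts with dashes, borrowing all of them.
fn tic_tac_toe(s1: &str, s2: &str, s3: &str) -> String {
    format!("{s1}-{s2}-{s3}")
}

fn main() {
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");

    // format! only borrows its arguments.
    let s = format!("{s1}-{s2}-{s3}");

    // All three are still valid here; with `s1 + ...` s1 would have been moved.
    println!("{s} was built from {s1}, {s2} and {s3}");
    println!("{}", tic_tac_toe(&s1, &s2, &s3));
}
```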
Indexing into a string is often a bad idea because it’s not clear what the return type of the string-indexing operation should be: a byte value, a character, a grapheme cluster, or a string slice.
If hello is a string such as "Здравствуйте", in which each character takes two bytes, and we were to try to slice only part of a character’s bytes with something like &hello[0..1], Rust would panic at runtime in the same way as if an invalid index were accessed in a vector.
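One way to avoid the panic, sketched with the standard str::get and str::is_char_boundary methods. The helper safe_slice is a made-up name, and hello is assumed to hold Cyrillic text where each character takes two bytes.

```rust
/// Hypothetical helper: returns the byte range as a slice,
/// or None if either end falls inside a character.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let hello = "Здравствуйте"; // each Cyrillic letter is 2 bytes in UTF-8

    // Bytes 0..4 cover exactly two characters.
    println!("{:?}", safe_slice(hello, 0, 4));

    // Bytes 0..1 would split the first character; get() returns None
    // where &hello[0..1] would panic.
    println!("{:?}", safe_slice(hello, 0, 1));

    // is_char_boundary() tells us which byte offsets are safe to cut at.
    println!("{} {}", hello.is_char_boundary(1), hello.is_char_boundary(2));
}
```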
for c in "Зд".chars() {
println!("{c}");
}
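For comparison with chars(), a sketch of the other two standard views of the same string: raw bytes and characters paired with their byte offsets. The helper char_offsets is a made-up name.

```rust
/// Hypothetical helper: pairs each character with its starting byte offset.
fn char_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

fn main() {
    let text = "Зд";

    // bytes() yields the raw UTF-8 bytes, two per Cyrillic letter here.
    for b in text.bytes() {
        println!("{b}");
    }

    // char_indices() yields offsets that are always valid slice boundaries.
    for (offset, c) in char_offsets(text) {
        println!("{offset}: {c}");
    }
}
```

Neither view gives grapheme clusters; the standard library does not provide that, so it needs an external crate.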