Ben Chuanlong Du's Blog

It is never too late to learn.

String in Rust

In [ ]:
:timing
:sccache 1

Tips and Traps

  1. Rust has 2 popular string types, String and str (there are more string types in Rust, but they are not covered here). A String is heap-allocated and can be MUTABLE (unlike strings in Java and Python), while str is an immutable sequence of UTF-8 bytes somewhere in memory (static storage, heap or stack). String owns its memory while str does NOT. Since the size of str is unknown at compile time, one can only handle it behind a pointer. This means that str most commonly appears as &str: a reference to some UTF-8 data, normally called a string slice or just a slice. &str vs String is similar to slice vs array or Vec.
  1. &String is a reference to a String type and is also called a borrowed type. It is nothing more than a pointer which you can pass around without giving up ownership. &String can be coerced to a &str implicitly.

  2. If you want a read-only view of a string, &str is preferred. If you want to own and mutate a string, String should be used. For example, String should be used for returning strings created within a function or (usually) when storing strings in a struct or enum.

  3. Indexing into a string is not available in Rust. The reason for this is that Rust strings are encoded in UTF-8 internally, so the concept of indexing itself would be ambiguous and people would misuse it. Byte indexing is fast, but almost always incorrect (when your text contains non-ASCII symbols, byte indexing may leave you inside a character, which is really bad if you need text processing) while char indexing is not free because UTF-8 is a variable-length encoding, so you have to traverse the entire string to find the required code point.

    There are 2 ways to get chars out of a string. First, you can call the chars method, which returns an iterator. This way is of course not efficient if you want random access. Second, you can get the underlying byte representation of a string by calling the as_bytes method (which returns a byte slice &[u8]). You can then index the byte slice and convert a u8 value to char using the as keyword. Note that this conversion only gives the expected character when the byte is ASCII.

  4. let my_str = "Hello World"; defines a &str (not String).

  5. If you have a &str and want a new String, you can clone it either by to_owned() or to_string() (they are effectively the same). Both of those 2 methods will copy the memory and make a new String.
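
The tips above can be sketched in a few lines. This is only an illustration: the helper name shout is made up for this example and is not part of the standard library.

```rust
// Hypothetical helper: borrows a string slice and returns a newly owned String.
fn shout(text: &str) -> String {
    let mut owned = text.to_owned(); // copies the bytes into a new heap buffer
    owned.push('!');                 // a String can grow and be mutated
    owned
}

fn main() {
    let literal: &str = "hello";            // a string literal is a &'static str
    let owned: String = String::from("hi"); // heap-allocated, owns its bytes

    println!("{}", shout(literal));
    // &String is coerced to &str implicitly when passed to a &str parameter.
    println!("{}", shout(&owned));

    // `owned` is still usable because only a reference was passed.
    println!("{}", owned);
}
```

Taking &str as the parameter lets the same function accept both string literals and borrowed Strings.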

Convert &str to String

There are many ways to convert a &str to a String.

  • &str.to_string
  • &str.to_owned
  • String::from
  • &str.into (Into is the reciprocal of From)
  • String.push_str

The first 4 ways are equivalent, while push_str appends to an existing String. For more detailed discussions, please refer to How do I convert a &str to a String in Rust?
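
Here is a small sketch of the conversions listed above. The function names four_ways and via_push_str are hypothetical and only serve the illustration.

```rust
// Hypothetical helper that returns the results of the four equivalent conversions.
fn four_ways(s: &str) -> [String; 4] {
    let a = s.to_string();
    let b = s.to_owned();
    let c = String::from(s);
    let d: String = s.into(); // the annotation tells `into` which type to produce
    [a, b, c, d]
}

// push_str appends to an existing String instead of creating one directly.
fn via_push_str(s: &str) -> String {
    let mut out = String::new();
    out.push_str(s);
    out
}

fn main() {
    for converted in four_ways("how are you").iter() {
        println!("{}", converted);
    }
    println!("{}", via_push_str("how are you"));
}
```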

Convert String to &str

Assuming s is a value of type String, there are (at least) 4 ways to convert it to &str.

  • &s[..]
  • &*s
  • s.as_str()
  • s.as_ref()

I personally prefer s.as_str() or s.as_ref().
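
As a rough check that these conversions are interchangeable, the sketch below compares them to each other. The helper all_views_agree is a made-up name for this example.

```rust
// Hypothetical helper: checks that all four conversions view the same bytes.
fn all_views_agree(s: String) -> bool {
    let a: &str = &s[..];     // slice the whole string
    let b: &str = &*s;        // deref String to str, then borrow again
    let c: &str = s.as_str(); // explicit method
    let d: &str = s.as_ref(); // the &str annotation selects AsRef<str>

    // Same text and same address: none of these copies the data.
    a == b && b == c && c == d && a.as_ptr() == s.as_ptr() && d.as_ptr() == s.as_ptr()
}

fn main() {
    println!("{}", all_views_agree(String::from("how are you")));
}
```

Note that as_ref needs a type annotation (or another hint) because String implements AsRef for several target types.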

In [2]:
let s = "how are you".to_string();
s
Out[2]:
"how are you"
In [4]:
s.as_str()
Out[4]:
"how are you"
In [9]:
{
    let s2: &str = s.as_ref();
    s2
}
Out[9]:
"how are you"

&str vs String vs AsRef<str> for Function Parameters

  1. &str is preferred over String for function parameters unless you really need to own a string value in the function (in which case you need String).

  2. If you need an even more generic string parameter, or a generic item type for a collection, you can use AsRef<str>. For more discussions, please refer to AsRef .
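
A minimal sketch of the AsRef<str> approach follows. The functions count_words and total_len are hypothetical examples, not library functions.

```rust
// Accepts anything that can be viewed as a string slice: &str, String, &String, ...
fn count_words<S: AsRef<str>>(text: S) -> usize {
    text.as_ref().split_whitespace().count()
}

// A generic item type for a collection: works for slices of &str or of String.
fn total_len<S: AsRef<str>>(items: &[S]) -> usize {
    items.iter().map(|item| item.as_ref().len()).sum()
}

fn main() {
    println!("{}", count_words("how are you"));
    println!("{}", count_words(String::from("how are you")));
    println!("{}", total_len(&["a", "bc"]));
    println!("{}", total_len(&vec![String::from("a"), String::from("bc")]));
}
```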

&str

Primitive, immutable, fixed length.

In [2]:
let mut s: &str = "how are you";
s
Out[2]:
"how are you"
In [4]:
let s2 = String::from("abc");
s2[0]
s2[0]
^^^^^ `String` cannot be indexed by `{integer}`
the type `String` cannot be indexed by `{integer}`
help: the trait `Index<{integer}>` is not implemented for `String`
In [3]:
s[0]
s[0]
^^^^ string indices are ranges of `usize`
the type `str` cannot be indexed by `{integer}`
help: the trait `SliceIndex<str>` is not implemented for `{integer}`
In [11]:
s + 'a'
s + 'a'
^ &str
s + 'a'
    ^^^ char
s + 'a'
  ^ 
cannot add `char` to `&str`
In [8]:
s.chars()
Out[8]:
Chars(['h', 'o', 'w', ' ', 'a', 'r', 'e', ' ', 'y', 'o', 'u'])
In [10]:
s.chars().nth(4)
Out[10]:
Some('a')
In [20]:
s.push('c2')
s.push('c2')
       ^^^^ 
character literal may only contain one codepoint
s.push('c2')
             expected one of `.`, `;`, `?`, `}`, or an operator here
expected one of `.`, `;`, `?`, `}`, or an operator, found `evcxr_variable_store`
s.push('c2')
  ^^^^ 
no method named `push` found for type `&str` in the current scope
In [21]:
s.is_empty()
Out[21]:
false
In [3]:
s.len()
Out[3]:
11

String

In [4]:
let s1: String = "Hello World!";
s1
let s1: String = "Hello World!";
                 ^^^^^^^^^^^^^^ expected struct `String`, found `&str`
let s1: String = "Hello World!";
        ^^^^^^ expected due to this
mismatched types
help: try using a conversion method

"Hello World!".to_string()
In [5]:
let mut s2: String = String::from("Hello World!");
s2
Out[5]:
"Hello World!"
In [12]:
s2 + 'a'
s2 + 'a'
     ^^^ expected `&str`, found `char`
mismatched types
In [13]:
s2.push('a')
Out[13]:
()
In [14]:
s2
Out[14]:
"Hello World!a"

Construct Strings

String::new

String::new creates a new empty string.

In [8]:
String::new()
Out[8]:
""

String::with_capacity creates a new empty string with the given capacity.

In [10]:
let my_str = String::with_capacity(2);
my_str
Out[10]:
""
In [12]:
my_str.capacity()
Out[12]:
2

Cases of String

  1. The to_*case methods return a new String object (mainly because changing the case of a non-ASCII character might change the length of the string). The make_ascii_*case methods change the case in place (as changing the case of ASCII characters won't change the length of the string).

  2. to_*case methods change the case of all characters while to_ascii_*case methods only change the case of ASCII characters and leave non-ASCII characters unchanged.
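
The two points above can be illustrated with the German letter ß, whose uppercase form is two characters long. The helper uppercase_variants is a hypothetical name used only here.

```rust
// Hypothetical helper returning the results of
// (to_uppercase, to_ascii_uppercase, make_ascii_uppercase).
fn uppercase_variants(s: &str) -> (String, String, String) {
    let full = s.to_uppercase();        // Unicode-aware, returns a new String
    let ascii = s.to_ascii_uppercase(); // only a-z are changed, returns a new String
    let mut in_place = s.to_owned();
    in_place.make_ascii_uppercase();    // mutates the existing buffer
    (full, ascii, in_place)
}

fn main() {
    let (full, ascii, in_place) = uppercase_variants("straße");
    // prints: STRASSE STRAßE STRAßE
    println!("{} {} {}", full, ascii, in_place);
}
```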

to_lowercase and to_uppercase

to_ascii_lowercase and to_ascii_uppercase

make_ascii_lowercase and make_ascii_uppercase

chars

contains

get

In [2]:
let s: String = String::from("Hello World!");
s.get(0..3)
Out[2]:
Some("Hel")
In [5]:
let s: String = String::from("Hello World!");
let ss = s.get(0..3).unwrap().to_string();
ss
Out[5]:
"Hel"

join

In [6]:
["a", "b"].join("")
Out[6]:
"ab"
In [7]:
['a', 'b'].join("")
['a', 'b'].join("")
           ^^^^ method not found in `[char; 2]`
no method named `join` found for array `[char; 2]` in the current scope
In [6]:
vec!["a", "b"].join("")
Out[6]:
"ab"
In [7]:
vec![String::from("a"), String::from("b")].join("")
Out[7]:
"ab"

len

matches

An iterator over the disjoint matches of a pattern within the given string slice.

The pattern can be a &str, char, a slice of chars, or a function or closure that determines if a character matches.
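
The two less common pattern kinds, a slice of chars and a closure, might look like this. The function names vowels and uppercase_letters are made up for the example.

```rust
// A slice of chars matches any one of the listed characters.
fn vowels(s: &str) -> Vec<&str> {
    s.matches(&['a', 'e', 'i', 'o', 'u'][..]).collect()
}

// A closure taking a char and returning bool also works as a pattern.
fn uppercase_letters(s: &str) -> Vec<&str> {
    s.matches(|c: char| c.is_ascii_uppercase()).collect()
}

fn main() {
    println!("{:?}", vowels("how are you"));
    println!("{:?}", uppercase_letters("abcXXXabcY"));
}
```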

In [5]:
"abcXXXabcYYYabc".matches("abc").collect::<Vec<_>>()
Out[5]:
["abc", "abc", "abc"]
In [8]:
char::is_numeric('a')
Out[8]:
false
In [7]:
char::is_numeric('1')
Out[7]:
true
In [10]:
"1abc2abc3".matches(char::is_numeric).collect::<Vec<_>>()
Out[10]:
["1", "2", "3"]

replace

parse (Convert String to Other Types)

String Conversions

Convert an integer to string.

In [2]:
let s = 123.to_string();
s
Out[2]:
"123"
In [3]:
1.to_string()
Out[3]:
"1"

Convert a string to bytes.

In [8]:
"1".as_bytes()
Out[8]:
[49]
In [4]:
1.to_string().as_bytes()
Out[4]:
[49]
In [5]:
1i32.to_be_bytes()
Out[5]:
[0, 0, 0, 1]

Convert the string back to integer.

In [6]:
s.parse::<i32>()
Out[6]:
Ok(123)
In [7]:
s.parse::<i32>().unwrap()
Out[7]:
123
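
Calling unwrap panics when the text is not a valid number, so real code usually handles the Result. Here is a minimal sketch; parse_i32 and parse_or are hypothetical helper names.

```rust
use std::num::ParseIntError;

// Trims surrounding whitespace before parsing, and hands the error to the caller.
fn parse_i32(s: &str) -> Result<i32, ParseIntError> {
    s.trim().parse::<i32>()
}

// Falls back to a default value instead of panicking.
fn parse_or(s: &str, default: i32) -> i32 {
    parse_i32(s).unwrap_or(default)
}

fn main() {
    match parse_i32("12a") {
        Ok(n) => println!("parsed {}", n),
        Err(e) => println!("could not parse: {}", e),
    }
    println!("{}", parse_or("123", 0));
}
```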

push

You cannot concatenate a char to a string using the + operator. However, you can use the String.push method to add a char to the end of a String.

push_str

is_empty

split

In [3]:
"".split(",").collect::<Vec<&str>>()
Out[3]:
[""]
In [4]:
"".split(" ").collect::<Vec<&str>>()
Out[4]:
[""]
In [5]:
"1,2,3".split(",")
Out[5]:
Split(SplitInternal { start: 0, end: 5, matcher: StrSearcher { haystack: "1,2,3", needle: ",", searcher: TwoWay(TwoWaySearcher { crit_pos: 0, crit_pos_back: 1, period: 1, byteset: 17592186044416, position: 0, end: 5, memory: 0, memory_back: 1 }) }, allow_trailing_empty: true, finished: false })
In [6]:
let mut it = "1,2,3".split(",");
it
Out[6]:
Split(SplitInternal { start: 0, end: 5, matcher: StrSearcher { haystack: "1,2,3", needle: ",", searcher: TwoWay(TwoWaySearcher { crit_pos: 0, crit_pos_back: 1, period: 1, byteset: 17592186044416, position: 0, end: 5, memory: 0, memory_back: 1 }) }, allow_trailing_empty: true, finished: false })
In [21]:
it.next()
Out[21]:
Some("1")
In [22]:
it.next()
Out[22]:
Some("2")
In [23]:
it.next()
Out[23]:
Some("3")
In [24]:
it.next()
Out[24]:
None
In [7]:
let v: Vec<&str> = "1,2,3".split(",").collect();
v
Out[7]:
["1", "2", "3"]
In [8]:
v.capacity()
Out[8]:
4
In [9]:
let v: Vec<i8> = "1,2,3".split(",").map(|x| x.parse::<i8>().unwrap()).collect();
v
Out[9]:
[1, 2, 3]
In [10]:
v.capacity()
Out[10]:
8

split_whitespace

In [23]:
"how are you".split_whitespace()
Out[23]:
SplitWhitespace { inner: Filter { iter: Split(SplitInternal { start: 0, end: 11, matcher: CharPredicateSearcher { haystack: "how are you", char_indices: CharIndices { front_offset: 0, iter: Chars { iter: Iter([104, 111, 119, 32, 97, 114, 101, 32, 121, 111, 117]) } } }, allow_trailing_empty: true, finished: false }) } }
In [27]:
for word in "how are you".split_whitespace() {
    println!("{}", word);
}
how
are
you
Out[27]:
()

trim(&self) -> &str

Returns a string slice with leading and trailing whitespace removed.

In [2]:
"  how\n".trim()
Out[2]:
"how"

trim_end(&self) -> &str

Returns a string slice with trailing whitespace removed.

In [8]:
"  how\n".trim_end()
Out[8]:
"  how"

trim_start(&self) -> &str

Returns a string slice with leading whitespace removed.

In [9]:
"  how\n".trim_start()
Out[9]:
"how\n"

with_capacity

In [25]:
let ss = String::with_capacity(3);
ss
Out[25]:
""
  1. You cannot pass an integer (or any other non-literal value) directly to println!. The first argument must be a format string literal.

  2. It is suggested that you use println!("{}", var); to print a variable to the terminal so that you do not have to worry about its type.

In [2]:
println!(5)
format argument must be a string literal
In [3]:
println!("{}", 5)
5
Out[3]:
()
In [5]:
println!("My name is {} and I'm {}", "Ben", 34);
My name is Ben and I'm 34
In [6]:
println!("{0} * {0} = {1}", 3, 9);
3 * 3 = 9
In [7]:
println!("{x} * {x} = {y}", x=3, y=9);
3 * 3 = 9

Placeholder Traits

In [12]:
println!("Binary: {v:b}, Hex: {v:x}, Octal: {v:o}", v = 64);
Binary: 1000000, Hex: 40, Octal: 100
In [13]:
println!("{:?}", ("Hello", "World"));
("Hello", "World")

The concat! Macro

Concatenates literals into a static string slice.
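
A short sketch of concat! is given below. The constant names GREETING and MIXED are arbitrary. Since the macro is expanded at compile time, it accepts only literals, not variables.

```rust
// concat! is expanded at compile time and yields a &'static str.
const GREETING: &str = concat!("Hello", ' ', "World", '!');

// Literals of different types (string, integer, char, bool) can be mixed.
const MIXED: &str = concat!("id-", 10, '-', true);

fn main() {
    println!("{}", GREETING);
    println!("{}", MIXED);
}
```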

Concatenate a String and a Char

In [2]:
let mut my_str = String::from("Hello World");
my_str.push('!');
my_str
Out[2]:
"Hello World!"

Concatenate Several Strings Together

The GitHub repo dclong/conccatenation_benchmarks-rs has a summary of different ways of joining strings and their corresponding performance.
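
For reference, a few common ways of joining several strings are sketched below. The function names are hypothetical, and the sketch makes no claim about which approach is fastest.

```rust
fn with_format(a: &str, b: &str, c: &str) -> String {
    format!("{}{}{}", a, b, c)
}

fn with_plus(a: &str, b: &str, c: &str) -> String {
    // The left operand of + must be an owned String; the right operands are &str.
    a.to_owned() + b + c
}

fn with_push_str(a: &str, b: &str, c: &str) -> String {
    // Reserving the total length up front avoids reallocation while appending.
    let mut out = String::with_capacity(a.len() + b.len() + c.len());
    out.push_str(a);
    out.push_str(b);
    out.push_str(c);
    out
}

fn with_concat(a: &str, b: &str, c: &str) -> String {
    [a, b, c].concat()
}

fn main() {
    println!("{}", with_format("how", "are", "you"));
    println!("{}", with_plus("how", "are", "you"));
    println!("{}", with_push_str("how", "are", "you"));
    println!("{}", with_concat("how", "are", "you"));
}
```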

Concatenate Strings in an Array/Vector

In [7]:
["how", "are", "you"].join(" ")
Out[7]:
"how are you"
In [8]:
vec!["how", "are", "you"].join(" ")
Out[8]:
"how are you"

Concatenate Strings in an Iterator

In [15]:
let v = vec!["how", "are", "you"];
v.into_iter().collect::<String>()
Out[15]:
"howareyou"
In [9]:
let arr = ["how", "are", "you"];
arr.into_iter().collect::<String>()
arr.into_iter().collect::<String>()
                ^^^^^^^ value of type `String` cannot be built from `std::iter::Iterator<Item=&&str>`
a value of type `String` cannot be built from an iterator over elements of type `&&str`
help: the trait `FromIterator<&&str>` is not implemented for `String`
In [11]:
let arr = ["how", "are", "you"];
arr.into_iter().copied().collect::<String>()
Out[11]:
"howareyou"
In [7]:
let v = vec!["how", "are", "you"];
v.into_iter().intersperse(" ")
v.into_iter().intersperse(" ")
              ^^^^^^^^^^^ 
use of unstable library feature 'iter_intersperse': recently added
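
Since intersperse is unstable, one stable alternative is to insert the separator by hand while iterating. The helper join_iter below is a hypothetical sketch, not a standard library function.

```rust
// Joins the items of an iterator of string slices with a separator,
// using only stable standard library features.
fn join_iter<'a, I>(iter: I, sep: &str) -> String
where
    I: Iterator<Item = &'a str>,
{
    let mut out = String::new();
    for (i, piece) in iter.enumerate() {
        if i > 0 {
            out.push_str(sep); // separator goes between items only
        }
        out.push_str(piece);
    }
    out
}

fn main() {
    let v = vec!["how", "are", "you"];
    println!("{}", join_iter(v.into_iter(), " "));
}
```

Another option is to collect the iterator into a Vec first and then call join on it.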

Indexing a String

Indexing into a string is not available in Rust. The reason for this is that Rust strings are encoded in UTF-8 internally, so the concept of indexing itself would be ambiguous and people would misuse it. Byte indexing is fast, but almost always incorrect (when your text contains non-ASCII symbols, byte indexing may leave you inside a character, which is really bad if you need text processing) while char indexing is not free because UTF-8 is a variable-length encoding, so you have to traverse the entire string to find the required code point.

There are 2 ways to get chars out of a string. First, you can call the chars method, which returns an iterator. This way is of course not efficient if you want random access. Second, you can get the underlying byte representation of a string by calling the as_bytes method (which returns a byte slice &[u8]). You can then index the byte slice and convert a u8 value to char using the as keyword. Note that this conversion only gives the expected character when the byte is ASCII.

In [6]:
let s = String::from("how are you");
s[0]
s[0]
^^^^ `String` cannot be indexed by `{integer}`
the type `String` cannot be indexed by `{integer}`
help: the trait `Index<{integer}>` is not implemented for `String`
In [7]:
let s = String::from("how are you");
s.chars().next()
Out[7]:
Some('h')
In [8]:
let s = String::from("how are you");
s.as_bytes()[2] as char
Out[8]:
'w'
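
The difference between the two approaches shows up as soon as the text contains a non-ASCII character. In the sketch below, nth_char and ascii_at are hypothetical helper names.

```rust
// Returns the n-th char by walking the string from the start.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

// Byte-based lookup; returns a char only when the byte at index i is ASCII.
fn ascii_at(s: &str, i: usize) -> Option<char> {
    s.as_bytes()
        .get(i)
        .copied()
        .filter(|b| b.is_ascii())
        .map(|b| b as char)
}

fn main() {
    let s = "héllo"; // 5 chars, but 6 bytes because é takes 2 bytes in UTF-8
    println!("{} bytes, {} chars", s.len(), s.chars().count());
    println!("{:?}", nth_char(s, 1)); // the char é
    println!("{:?}", ascii_at(s, 1)); // None: byte 1 is only the first half of é
    println!("{:?}", ascii_at(s, 3)); // the char l, found at byte offset 3
}
```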

Slicing a String
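
Slice ranges are byte offsets, and slicing in the middle of a multi-byte character panics. A non-panicking alternative is the get method. The following is a minimal sketch in which safe_prefix and first_n_chars are hypothetical helpers.

```rust
// Returns None instead of panicking when `end` is out of range
// or does not fall on a char boundary.
fn safe_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

// Takes the first n chars by looking up the byte offset of the n-th char.
fn first_n_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((idx, _)) => &s[..idx],
        None => s, // the string has n chars or fewer
    }
}

fn main() {
    println!("{:?}", safe_prefix("how are you", 3));
    println!("{:?}", safe_prefix("héllo", 2)); // None: offset 2 is inside é
    println!("{}", first_n_chars("héllo", 2));
}
```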

In [4]:
"how are you"[..]
Out[4]:
"how are you"
In [3]:
"how are you"[..3]
Out[3]:
"how"
In [6]:
"how are you"[4..]
Out[6]:
"are you"
In [7]:
"how are you"[4..7]
Out[7]:
"are"

Format Strings

Please refer to Format Strings in Rust 1.58 for detailed discussions.

Third-party Libraries for String Manipulation

indoc https://github.com/dtolnay/indoc This crate provides a procedural macro for indented string literals. The indoc!() macro takes a multiline string literal and un-indents it at compile time so the leftmost non-space character is in the first column.

compact_str https://crates.io/crates/compact_str A memory-efficient string type that transparently stores strings on the stack when possible.
