Manning Publications gave me yet another funtastic opportunity to learn a new technology and contribute back to the technical community at the same time. I am privileged to have helped in the making of Manning’s Rust in Action, written by Tim McNamara.
Rust in Action is for programmers who aspire to be systems programmers without a background in C and C++. This book is not necessarily an in-depth review of every nook and cranny of the Rust programming language (although there are quite a few details). It’s geared more towards programmers who want to learn systems programming through a modern language.
If this book were named “Systems Programming Using Rust,” it wouldn’t be a misnomer. That said, Rust in Action is also a fine name. After all, you learn Rust by applying it to systems programming.
Who is this book for
This book is for folks new to Rust AND systems programming. Readers learn to do various systems programming tasks using Rust. A number of operating system aspects are covered, including memory, the stack, the heap, pointers, virtual memory, file I/O, signals and interrupts, clocks, network programming, and more. In that sense, it’s a very good starting point for CS students learning systems programming fundamentals in a modern language.
From the Rust language point of view, significant material is covered, including program structure, lifetimes, borrowing and ownership semantics, pointers, references, functions, closures, smart pointers, generics, traits, and so on. A number of examples show usage of standard and third-party libraries for argument parsing, file I/O, reference counting, common traits, and so on. Each chapter includes good commentary about Rust tooling, including Cargo commands, debug/release compilation, package management, creating crates, taking third-party crate dependencies, etc.
Having said that, it feels a little light on the deeper aspects of core Rust and systems programming. Being a language guy, I was hoping to dive into Rust, the language, as much as possible. I’m generally ready for clever programs and knock-your-socks-off kinds of abstractions, to learn from but not necessarily to put in production code ;). You won’t find them in this book. That’s not the focus here. Second, proficient systems programmers will learn new syntax and libraries for doing what they already know, albeit with more safety.
Let’s look at it chapter by chapter. The source code for each chapter is available on GitHub.
Chapter 1: Introduction
A number of things stand out in chapter 1. The key selling points of Rust are safety, control, productivity, and fearless concurrency. Rust has a standard build system, package manager, and documentation generator called Cargo. There are library and app templates for Cargo, which help get a project off the ground very fast. It also lets library writers tweak the compiler’s behavior with plugins. Data in Rust is immutable by default, which aids safety and concurrency. Finally, errors in rustc, the Rust compiler, have standard numbers. For example, see error E0308, produced here by writing if a = 10 where a comparison was intended. Neat.
error[E0308]: mismatched types
 --> hello_world.rs:3:6
  |
3 |   if a = 10 {
  |      ^^^^^^
  |      |
  |      expected bool, found ()
  |      help: try comparing for equality: `a == 10`
  |
  = note: expected type `bool`
             found type `()`
error: aborting due to previous error
For more information about this error, try `rustc --explain E0308`.
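For comparison, here is a minimal sketch of the fix rustc suggests, together with immutability by default. The helper names is_ten and bump are my own, not from the book.

```rust
// Hypothetical helper: `==` compares and yields a bool, whereas `a = 10` is an
// assignment whose type is (), which is why rustc reported E0308.
fn is_ten(a: i32) -> bool {
    a == 10
}

// Hypothetical helper: bindings are immutable unless declared with `mut`.
fn bump(start: i32) -> i32 {
    let mut counter = start; // `mut` opts in to mutation
    counter += 1;
    counter
}

fn main() {
    let a = 10; // immutable by default
    // a = 11;  // would not compile (E0384: cannot assign twice to immutable variable)
    if is_ten(a) {
        println!("a is ten, bumped to {}", bump(a));
    }
}
```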
The content in the first chapter is a little sparse. Too much text is about the community and its goals. I would have liked to see something that blew my mind, perhaps something about generics, concurrency, safety, or expressiveness. The example code with UTF-8 German and Japanese text is interesting but not enough for my taste. The book barely mentions the paradigms supported by the language; I think it mentions that Rust is not object-oriented. A paradigm-level comparison with other languages would be nice to have.
Chapter 2: Language Foundations
Chapter 2 is where you really start to get a feel for the language. There are examples of the standard library, the regex library, the argument parsing library, pattern matching, I/O, error handling, reading from stdin, constrained generics, etc.
Cargo makes life really easy. A good standard package manager and build system is a must for a modern language. All the details of linking and libraries are hidden, which keeps programmers productive in Rust.
Some highlights from chapter 2
- Signed and unsigned integers: i8, i16, i32, i64, u8, u16, u32, u64, isize, and usize (pointer-sized, i.e., the native CPU word width).
- String and str are two different types. Compare that to C++’s char *, const char *, std::string, and more.
- The size of an array matters to the type system. For example, [u8; 3] is a different type than [u8; 4].
- Array slices closely resemble C++ spans.
- Rust has Scala-like pattern matching, called match.
- Command-line arguments are not passed as parameters to main(); they come from std::env::args().
- Regex construction, argument parsing, and file I/O (e.g., opening a file, reading a line) all return Result<T, E>, an Either-like type, which at minimum must be unwrapped. Calling unwrap() on an error result panics (somewhat like an uncaught exception).
- Regex.find(&line): passing by reference is the caller’s responsibility, spelled out with & at the call site. This is where you first encounter Rust’s default move semantics.
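A few of these highlights fit in one small program. This is a sketch of my own (the function names total and parse_or_default are hypothetical), not code from the book; the regex crate is left out so that only the standard library is needed.

```rust
use std::env;

// Arrays carry their length in the type: [u8; 3] and [u8; 4] are different types,
// but both can be borrowed as the slice type &[u8], much like a C++ std::span.
fn total(values: &[u8]) -> u32 {
    values.iter().map(|&v| u32::from(v)).sum()
}

// Matching on the Result instead of calling unwrap(), which would panic on Err.
fn parse_or_default(text: &str) -> i64 {
    match text.parse::<i64>() {
        Ok(n) => n,
        Err(_) => 0,
    }
}

fn main() {
    let three: [u8; 3] = [1, 2, 3];
    let four: [u8; 4] = [1, 2, 3, 4];
    println!("{} {}", total(&three), total(&four[1..])); // 6 9

    // Command-line arguments come from std::env::args(), not from main's parameters.
    for arg in env::args().skip(1) {
        println!("{} -> {}", arg, parse_or_default(&arg));
    }
}
```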
Chapter 3: Compound Data Types
The third chapter talks about compound data types, including structs, enumerations, and Rust’s Result type. Structs combined with traits look a lot like objects. One thing that stands out about Rust structs is the separation between a struct’s definition and the application of a trait (an interface) to it. That is, it is possible to attach multiple traits to a struct type after the fact. This is a really nice feature. Interfaces in C++ and Java are intrusive: they must be BOTH known AND attached when the struct or class is defined. Rust looks different. Here’s an example from the book.
#[derive(Debug)]
struct File {
    name: String,
    data: Vec<u8>,
}
trait Read {
    fn read(self: &Self, save_to: &mut Vec<u8>) -> Result<usize, String>;
}
impl Read for File {
    fn read(self: &File, save_to: &mut Vec<u8>) -> Result<usize, String> {
        Ok(0) // stub: always reports that zero bytes were read
    }
}
First, Read is defined after File. Second, the capability of Read (i.e., the read function) is “attached” to File after File is defined. Third, self is of type &File, a non-mutable reference to File. I wonder, though, whether this works across translation units if different translation units apply different sets of traits or different implementations of the same trait. Basically, how does Rust ensure something like C++’s ODR (one definition rule)?
This chapter lacks enough discussion of references and of & (which is covered fully in chapter 4). So it’s unclear why match would use *self while struct members are accessed as self.name and self.state.
Constructors in Rust are named constructors, for example, File::new(…). Here, new is just a convention. I wondered whether they can be overloaded, so I tried: overloading did not compile. Rust has an unsafe keyword and block, inside which you get roughly the level of safety offered by C. There’s a const keyword for values that may never change.
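Below is a minimal sketch of the after-the-fact style, extending the book’s File with a working read() and a second trait, std::fmt::Display, implemented separately. The method body and the output format are my assumptions. On the ODR question, my understanding is that Rust’s coherence (“orphan”) rules cover it: an impl may appear only in a crate that defines the trait or the type, and a second impl of the same trait for the same type is rejected at compile time.

```rust
use std::fmt;

#[derive(Debug)]
struct File {
    name: String,
    data: Vec<u8>,
}

// A trait defined after the struct it will be implemented for.
trait Read {
    fn read(&self, save_to: &mut Vec<u8>) -> Result<usize, String>;
}

// Implementation attached after the fact; here it copies the file's bytes out.
impl Read for File {
    fn read(&self, save_to: &mut Vec<u8>) -> Result<usize, String> {
        save_to.extend_from_slice(&self.data);
        Ok(self.data.len())
    }
}

// A second trait, from the standard library, attached to the same struct separately.
impl fmt::Display for File {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "<{}, {} bytes>", self.name, self.data.len())
    }
}

fn main() {
    let file = File { name: String::from("f1.txt"), data: vec![114, 117, 115, 116] };
    let mut buffer = Vec::new();
    let n = file.read(&mut buffer).unwrap();
    println!("{} read {} bytes", file, n); // <f1.txt, 4 bytes> read 4 bytes
}
```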
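A small sketch of the two access styles, assuming a FileState enum with Open and Closed variants (my own names). Field access through &self auto-dereferences, which is why self.name works. For match, *self was historically needed; since Rust 1.26, matching on the reference directly also compiles.

```rust
#[derive(Debug, PartialEq)]
enum FileState {
    Open,
    Closed,
}

struct File {
    name: String,
    state: FileState,
}

impl FileState {
    // The book's style: explicitly dereference &self to match on the enum value.
    fn is_open(&self) -> bool {
        match *self {
            FileState::Open => true,
            FileState::Closed => false,
        }
    }

    // Matching on the reference directly; the patterns are matched through it.
    fn is_open_ergonomic(&self) -> bool {
        match self {
            FileState::Open => true,
            FileState::Closed => false,
        }
    }
}

impl File {
    // Field access auto-dereferences: self.name works although self is a &File.
    fn describe(&self) -> String {
        let state = if self.state.is_open() { "open" } else { "closed" };
        format!("{} is {}", self.name, state)
    }
}

fn main() {
    let f = File { name: String::from("f1.txt"), state: FileState::Closed };
    println!("{}", f.describe()); // f1.txt is closed
}
```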
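Since there is no overloading, the usual workaround is a second, differently named associated function. A sketch, assuming a constructor named new_with_data; that name is an illustration here, not something the text above specifies.

```rust
#[derive(Debug)]
struct File {
    name: String,
    data: Vec<u8>,
}

impl File {
    // `new` is only a naming convention, not a language feature.
    fn new(name: &str) -> File {
        File { name: String::from(name), data: Vec::new() }
    }

    // Without overloading, a second constructor gets its own name.
    fn new_with_data(name: &str, data: &[u8]) -> File {
        let mut f = File::new(name);
        f.data = data.to_vec();
        f
    }
}

fn main() {
    let empty = File::new("f1.txt");
    let full = File::new_with_data("f2.txt", &[114, 117, 115, 116]);
    println!("{:?}\n{:?}", empty, full);
}
```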
A few thoughts popped up while reading this chapter.
- Does Rust String have small-string optimization? It does not look like it. See some relevant discussion here (https://internals.rust-lang.org/t/short-string-optimization/8436).
- Result is an enum type in Rust. It looks very handy.
- Function new is a convention, for example, File::new(…). That implies named constructors are the default convention. However, in C++, named constructors benefit significantly from overloading. An overloaded new did not compile in Rust. No overloading, seriously?
- Do Rust functions have return type deduction? They do not! Rust employs local type inference, but not global type inference. Here’s some rationale (https://stackoverflow.com/questions/24977365/differences-in-type-inference-for-closures-and-functions-in-rust/24977576#24977576).
- So it looks like there are no expression templates (https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template) as in C++. C++ expression templates allow a mind-boggling level of abstraction at zero cost.
- The “unsafe” keyword and block: roughly the level of safety offered by C. The “const” keyword: values that may never change.
Chapter 4: Lifetimes, Ownership, and Borrowing
This is the chapter I’m primarily interested in. I’ve heard a lot about linear types (strictly speaking, Rust’s are affine) and move semantics by default. In Rust, compound data types have move semantics by default. That is, simple expressions like assignments are in fact moves, and passing objects to and from functions is a move by default. A move occurs whenever a type does not implement the Copy trait. Because of default move semantics, one of the main things the compiler does is track use-after-move errors and fail compilation. That’s kinda neat.
Types that implement the Copy trait are basically the same as trivially copyable types in C++: a simple byte-by-byte copy is sufficient to create another value. Types that implement the Clone trait are roughly analogous to types with a user-defined copy constructor. Primitive types have copy semantics by default.
This is the simplest example of move semantics checked by the Rust compiler. The table contrasts the derive attribute that compiles with the one that does not:
OK | Compiler Error |
#[derive(Debug, Copy, Clone)] | #[derive(Debug)] |
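A runnable version of the OK column might look like the following. Only the struct name and the three lines quoted in the error output below come from the book; the id: u64 field type and the helper copy_then_modify are assumptions. Replacing the derive with #[derive(Debug)] turns the copy into a move and reproduces E0382.

```rust
// Deriving Copy (which requires Clone) makes `let mut sat_b = sat_a;` a bitwise copy.
#[derive(Debug, Copy, Clone)]
struct CubeSat {
    id: u64,
}

fn copy_then_modify() -> (CubeSat, CubeSat) {
    let sat_a = CubeSat { id: 0 };
    let mut sat_b = sat_a; // a copy, not a move
    sat_b.id = 1;
    (sat_a, sat_b) // sat_a is still valid here
}

fn main() {
    let (sat_a, sat_b) = copy_then_modify();
    println!("a: {:?}, b: {:?}", sat_a, sat_b); // a: CubeSat { id: 0 }, b: CubeSat { id: 1 }
}
```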
The best part, of course, is the clarity of the error diagnostics. Take a look for yourself.
error[E0382]: use of moved value: `sat_a`
  --> ch4-check-sats-clone-and-copy-traits.rs:25:32
   |
23 |   let mut sat_b = sat_a;
   |       --------- value moved here
24 |   sat_b.id = 1;
25 |   println!("a: {:?}, b: {:?}", sat_a, sat_b);
   |                                ^^^^^ value used here after move
   |
   = note: move occurs because `sat_a` has type `CubeSat`, which does not implement the `Copy` trait
error: aborting due to previous error
The compiler is making abundantly clear why it does not like your program. Hats off to the Rust compiler writers!
Attempting to overwrite a value that’s still available from elsewhere in the program will cause the compiler to refuse to compile your program. There is a distinction between a value’s lifetime and its scope. When values go out of scope or their lifetimes end for some other reason, their destructors are called. A destructor is a function that removes traces of the value from the program by deleting references and freeing memory.
To provide a custom destructor for a type, implement the Drop trait. This will typically be needed in cases where you have used unsafe blocks to allocate memory. Drop has one method, drop(&mut self), that you can use to conduct any necessary wind-up activities. It is possible to return ownership of objects back to their original (caller’s scope) variables via functions’ return values.
In Rust, copy semantics are controlled using the Copy and Clone traits. Copy is implicit: a shallow, bitwise copy, which is what C++ does by default. Clone is a potentially deep, expensive copy (akin to a copy constructor). Copy implies Clone, because Clone is a supertrait of Copy. That’s weird.
The book mentions four general strategies that can help with ownership issues:
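A minimal sketch of both points, using a hypothetical Tracked type whose Drop implementation counts drops in a static AtomicUsize (a demonstration device, not a recommended pattern). It also shows a lifetime ending before the end of its lexical scope via std::mem::drop.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many Tracked values have been dropped (for demonstration only).
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Tracked {
    name: String,
}

impl Drop for Tracked {
    // Runs automatically when a Tracked value's lifetime ends.
    fn drop(&mut self) {
        println!("dropping {}", self.name);
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Takes ownership, then hands it back to the caller through the return value.
fn inspect(t: Tracked) -> Tracked {
    println!("inspecting {}", t.name);
    t
}

fn main() {
    let a = inspect(Tracked { name: String::from("a") }); // moved in and back out
    {
        let _b = Tracked { name: String::from("b") };
    } // _b goes out of scope here, so its destructor runs
    drop(a); // ends a's lifetime before the end of its lexical scope
    println!("{} values dropped", DROPS.load(Ordering::SeqCst)); // 2 values dropped
}
```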
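The Copy/Clone split in a short sketch; Point and Record are made-up types. One consequence of Copy requiring Clone: writing #[derive(Copy)] without Clone is rejected by the compiler.

```rust
// Copy requires Clone (Clone is a supertrait of Copy), so both are derived.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Vec<String> owns heap memory, so Record can be Clone but not Copy.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    id: u32,
    tags: Vec<String>,
}

fn main() {
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // implicit bitwise copy; p1 stays usable
    let r1 = Record { id: 7, tags: vec![String::from("rust")] };
    let r2 = r1.clone(); // explicit, potentially expensive deep copy
    // let r3 = r1; // this would be a move, after which r1 could no longer be used
    println!("{:?} {:?}", p1, p2);
    println!("{:?} {:?}", r1, r2);
}
```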
- Use references where full ownership is not required (pass-by-reference)
- Duplicate the value (copy-semantics)
- Refactor code to reduce the number of long-lived objects
- Wrap your data in a type designed to assist with movement issues
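Strategies 1 and 4 in a sketch, reusing a CubeSat type with an assumed id field. Rc, the standard library’s single-threaded reference-counted pointer, is used here as one possible “type designed to assist with movement issues”.

```rust
use std::rc::Rc;

#[derive(Debug)]
struct CubeSat {
    id: u64,
}

// Strategy 1: borrow instead of taking ownership.
fn check_status(sat: &CubeSat) -> u64 {
    sat.id
}

fn main() {
    let sat = CubeSat { id: 1 };
    check_status(&sat);
    check_status(&sat); // fine: sat was only lent, never moved

    // Strategy 4: wrap the value in a type that manages shared ownership.
    let shared = Rc::new(CubeSat { id: 2 });
    let second_owner = Rc::clone(&shared); // bumps a reference count; no deep copy
    println!("{} owners of {:?}", Rc::strong_count(&shared), second_owner); // 2 owners of CubeSat { id: 2 }
}
```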
I couldn’t help but wonder: what happens when an object is borrowed but there’s an exception before it is used? Is ownership passed back to the caller?
The main Rust highlights I learned from this chapter are as follows.
- Attempting to overwrite a value that’s still available from elsewhere in the program will cause the compiler to refuse to compile your program.
- There is a distinction between a value’s lifetime and its (lexical) scope. This is true in C++ as well, although not by default.
- An owner cleans up when its value’s lifetime ends. Basically, destructors.
- When values go out of scope or their lifetimes end for some other reason, their destructors are called.
- To provide a custom destructor for a type, implement Drop. This will typically be needed in cases where you have used unsafe blocks to allocate memory. Drop has one method, drop(&mut self), that you can use to conduct any necessary wind-up activities. The argument is a mutable reference to self, comparable to a non-const *this in C++.
- It is possible to return ownership of objects back to their original (caller’s scope) variables via functions’ return values.
The questions that remain unanswered for me are:
- What happens when an object is borrowed but there’s an exception before using it? Is the ownership passed back to the caller? In C++, just documenting this would suffice if the argument is of type Foo&&.
- Move seems to happen conditionally in Rust. If the function fails before using the moved value, it must move the value back to the caller via the return type. This affects the API.
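One common answer to both questions, sketched with a hypothetical Message type: a function that may fail before consuming its argument returns the argument inside the error, as std::sync::mpsc::Sender::send does with SendError. A function that only borrows (&T) never takes ownership, so there is nothing to hand back when it returns early.

```rust
#[derive(Debug, PartialEq)]
struct Message {
    body: String,
}

// Hypothetical function: consumes the message on success, but hands it back
// inside the Err variant on failure so the caller regains ownership.
fn try_send(msg: Message, connected: bool) -> Result<usize, Message> {
    if !connected {
        return Err(msg); // ownership travels back to the caller
    }
    Ok(msg.body.len()) // msg is dropped at the end of this function
}

// Borrowing never transfers ownership, so an early error return needs no hand-back.
fn validate(msg: &Message) -> Result<(), String> {
    if msg.body.is_empty() {
        return Err(String::from("empty message"));
    }
    Ok(())
}

fn main() {
    let msg = Message { body: String::from("hello") };
    if let Err(e) = validate(&msg) {
        println!("invalid: {}", e);
    }
    match try_send(msg, false) {
        Ok(n) => println!("sent {} bytes", n),
        Err(returned) => println!("not sent; caller owns {:?} again", returned),
    }
}
```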
What could be better in this chapter?
- In section 4.5.4, a sentence suggests that Rust allows programmers to opt in to runtime garbage collection. The section talks about reference counting with Rc<T>. Reference counting is not runtime garbage collection. An important difference between the two is determinism: garbage collection as in Java/C# is non-deterministic, whereas reference counting and the resulting destruction are deterministic, right?
- Dereferencing is not explained in this chapter. It has been used earlier, but it still remains elusive. Reference counting is introduced, but there is no discussion of breaking reference cycles; I’m not sure whether that comes later in the book. Other than that, this chapter is a good read.
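Cycles are usually broken with std::rc::Weak. A sketch with a hypothetical Parent/Child pair, where the child’s back-pointer is weak, which also shows the deterministic destruction mentioned above:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical parent/child pair. The child's back-pointer is a Weak reference,
// which does not count toward the strong count, so no strong cycle can form.
struct Parent {
    children: RefCell<Vec<Rc<Child>>>,
}

struct Child {
    parent: RefCell<Weak<Parent>>,
}

fn main() {
    let parent = Rc::new(Parent { children: RefCell::new(Vec::new()) });
    let child = Rc::new(Child { parent: RefCell::new(Weak::new()) });

    *child.parent.borrow_mut() = Rc::downgrade(&parent);
    parent.children.borrow_mut().push(Rc::clone(&child));

    // Prints: parent: strong = 1, weak = 1
    println!("parent: strong = {}, weak = {}", Rc::strong_count(&parent), Rc::weak_count(&parent));
    // A Weak pointer must be upgraded before use; upgrade() yields None once the target is gone.
    println!("child sees parent: {}", child.parent.borrow().upgrade().is_some()); // true

    drop(parent); // strong count reaches zero: Parent is freed right here, deterministically
    println!("child sees parent: {}", child.parent.borrow().upgrade().is_some()); // false
}
```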
Chapter 5: Data In Depth
This chapter goes into great detail to shed light on low-level bit manipulation and CPU instruction processing. It gives programmers a taste of the low-level machine representation of integers. Integer overflow and underflow are interesting concepts known in systems programming circles but not necessarily to freshers. I skimmed this chapter, as the pain of trying to understand low-level bit manipulation was really excruciating.
The following quote stood out from the chapter.
Developing strategies for preventing integer overflow is one of the ways that systems programmers are distinguished from others.
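The standard library offers explicit overflow strategies on every integer type. A short sketch; add_readings is a hypothetical helper. The behavior described for plain + refers to rustc’s default overflow-check settings.

```rust
// Hypothetical helper: refuse to overflow and let the caller decide what to do.
fn add_readings(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

fn main() {
    let x: u8 = 250;
    // For values only known at runtime, plain `a + b` panics on overflow in a
    // debug build and wraps in a release build, so be explicit instead:
    println!("checked:     {:?}", x.checked_add(10)); // None
    println!("wrapping:    {}", x.wrapping_add(10)); // 4, i.e. 260 mod 256
    println!("saturating:  {}", x.saturating_add(10)); // 255, clamped at u8::MAX
    println!("overflowing: {:?}", x.overflowing_add(10)); // (4, true)
    println!("readings:    {:?}", add_readings(200, 55)); // Some(255)
}
```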
Chapter 6: Memory
The chapter on memory is ambitious and at the same time not enough. It leans towards computer memory management, operating systems, virtual memory, paging, and low-level program images (an area dominated by C, never mind C++ and Rust) rather than the “language” aspects that simplify programming. It tries to open up the low-level details of a computer, but if you already know them, there isn’t anything to learn here. At the same time, you feel slightly cheated that a chapter that would have been a fine place for Box, Rc, Arc, arenas, etc., gives them only lip service.
A chapter on memory in a high-level programming language book should perhaps focus on managing struct sizes, cache locality, combined control block and data allocation (std::make_shared kinds of optimizations), small-string optimization (if it exists in Rust), weak pointers, shared_from_this, etc.
This chapter feels like someone rewrote a memory chapter from an operating systems book and sprinkled in some Rust. I felt it overuses unwrap(). To bypass error handling, there are usages like unwrap().unwrap(). This is simplistic and crying out for “for comprehensions” over the Option type.
This chapter sets the expectation of learning about pointers, smart pointers, the stack, the heap, etc. It feels like reading a book on C. It also touches on Rust’s Foreign Function Interface (FFI).
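A sketch of the alternative to unwrap().unwrap(): inside a function that returns Option, ? propagates None early, and and_then chains give something close to a for-comprehension. The map layout and the function names are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical lookup: instead of map.get(key).unwrap().first().unwrap(),
// `?` propagates None early, inside a function that itself returns Option.
fn first_char_of(map: &HashMap<&str, Vec<String>>, key: &str) -> Option<char> {
    let words = map.get(key)?; // None if the key is missing
    let first = words.first()?; // None if the vector is empty
    first.chars().next() // None if the string is empty
}

// The same chain written with combinators.
fn first_char_combinators(map: &HashMap<&str, Vec<String>>, key: &str) -> Option<char> {
    map.get(key)
        .and_then(|words| words.first())
        .and_then(|word| word.chars().next())
}

fn main() {
    let mut map = HashMap::new();
    map.insert("langs", vec![String::from("rust"), String::from("c")]);
    map.insert("empty", Vec::new());
    println!("{:?}", first_char_of(&map, "langs")); // Some('r')
    println!("{:?}", first_char_of(&map, "empty")); // None
    println!("{:?}", first_char_combinators(&map, "missing")); // None
}
```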
The following things stand out in the chapter.
Rust’s std::os::raw::c_char is like C’s char, whose signedness is not fixed by the C standard (its size is always one byte). Same with Rust: c_char is i8 on some platforms and u8 on others. Slices have no compile-time length. Internally, a slice reference is a pointer to some part of an array plus a length.
As a library author, you can simplify downstream application code if your functions accept both &str and String. For example, the following code does not compile because pw is a &str, not a String.
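These claims can be checked with std::mem::size_of. The sketch assumes nothing about the pointer width of your platform, which is why the checks are stated relative to usize.

```rust
use std::mem::size_of;
use std::os::raw::c_char;

fn main() {
    // C's char is one byte by definition; only its signedness varies by platform.
    assert_eq!(size_of::<c_char>(), 1);

    // A slice reference is a "fat" pointer: an address plus a length.
    assert_eq!(size_of::<&[u8]>(), 2 * size_of::<usize>());

    // An array reference is a thin pointer, because the length is part of the type.
    assert_eq!(size_of::<&[u8; 4]>(), size_of::<usize>());

    let data = [1u8, 2, 3, 4, 5];
    let middle: &[u8] = &data[1..4]; // points into data; length 3 is stored at runtime
    println!("{:?} has length {}", middle, middle.len()); // [2, 3, 4] has length 3
}
```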
fn is_strong(password: String) -> bool {
    password.len() > 5
}
let pw = "justok";
is_strong(pw); // does not compile: expected String, found &str
Generics and conversion traits such as AsRef and Into can be used to make the program work. These are the combinations I tried:
Signature | Call site | Result
fn is_strong(passwd: String) -> bool | let s: String = String::from("justok"); is_strong(s); | OK
fn is_strong(passwd: String) -> bool | is_strong(s); (second call) | NOT OK: use-after-move
fn is_strong(passwd: String) -> bool | is_strong("justok"); | NOT OK: type mismatch
fn is_strong(passwd: &str) -> bool | is_strong("justok"); | OK
fn is_strong(passwd: &str) -> bool | let s: String = String::from("justok"); is_strong(s); | does not compile
fn is_strong<T: AsRef<str>>(passwd: T) -> bool | is_strong("justok"); | OK
fn is_strong<T: AsRef<str>>(passwd: T) -> bool | let passwd: String = String::from("justok"); is_strong(passwd); | OK
fn is_strong<T: AsRef<str>>(passwd: T) -> bool | is_strong(passwd); (second call) | NOT OK. Huh! Use-after-move: String is still a compound type
fn is_strong<T: AsRef<str>>(passwd: T) -> bool | let passwd: &str = "justok"; is_strong(passwd); is_strong(passwd); | OK both times: not a use-after-move
fn is_strong<T: Into<String>>(passwd: T) -> bool | let passwd: String = String::from("justok"); is_strong(passwd); | OK
fn is_strong<T: Into<String>>(passwd: T) -> bool | is_strong(passwd); (second call) | NOT OK: use-after-move
fn is_strong<T: Into<String>>(passwd: T) -> bool | let passwd: &str = "justok"; is_strong(passwd); is_strong(passwd); | OK both times: not a use-after-move
AsRef<String> bound | | Exploded: doesn’t have a size known at compile-time. Huh!
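For reference, a sketch of the two generic variants that do compile. The names is_strong and is_strong_owned are mine. Passing &pw instead of pw works because &String also implements AsRef<str>, so the String is lent rather than moved. Many Rust APIs simply take &str and let callers write is_strong(&s), relying on deref coercion from &String to &str.

```rust
// Accepts &str, String, &String, ... through the AsRef<str> bound.
fn is_strong<T: AsRef<str>>(password: T) -> bool {
    password.as_ref().len() > 5
}

// Accepts anything convertible into an owned String (a &str is copied into a new String).
fn is_strong_owned<T: Into<String>>(password: T) -> bool {
    let owned: String = password.into();
    owned.len() > 5
}

fn main() {
    let pw = String::from("justok");
    println!("{}", is_strong(&pw)); // lends pw
    println!("{}", is_strong(&pw)); // still fine: pw was never moved
    println!("{}", is_strong("justok"));
    println!("{}", is_strong_owned("justok"));
    println!("{}", is_strong_owned(pw)); // moves pw; it cannot be used afterwards
}
```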
There are a lot of moving parts here. I struggled here to pick the right type just like rookie C++ programmers struggle in modern C++. Clearly, if you know C++ well, learning Rust requires some non-trivial unlearning and new learning. Hopefully, the samples above give you a decent idea of the level of thinking programmers have to do to pass a string ergonomically and efficiently.
- Overloading is disallowed in Rust. So you have to choose the right version. I feel powerless.
- The book suggests this: “To improve dynamic allocation speed one alternative is to use arrays of uninitialized objects. But you’re circumventing Rust’s lifetime checks.”
For programmers well-versed in C and C++, there’s a difference between “a const pointer to T” and “a pointer to const T”, and they can’t be used interchangeably. It would be very helpful if the book clarified whether that distinction exists in Rust, i.e., between “*mut T” and “*const T”.
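As far as I can tell, the distinction does exist, split across two places: *const T versus *mut T describes the pointee (like C’s const T * versus T *), while whether the pointer itself can be re-pointed (C’s T * const) is decided by let versus let mut on the binding. A sketch; bump is a hypothetical function.

```rust
// Writes through a *mut pointer and reads back through a *const alias of it.
fn bump(value: &mut i32) -> i32 {
    let writable: *mut i32 = value; // like C's `int *`
    let read_only: *const i32 = writable; // like C's `const int *`; *mut T coerces to *const T
    unsafe {
        *writable += 1;
        // *read_only = 0; // would not compile: *const pointers cannot be written through
        *read_only
    }
}

fn main() {
    let mut x = 41;
    println!("{}", bump(&mut x)); // 42

    let a = 1;
    let b = 2;
    let fixed: *const i32 = &a; // binding without `mut`: cannot be re-pointed
    let mut movable: *const i32 = &a; // binding with `mut`: can be re-pointed
    movable = &b;
    unsafe {
        println!("{} {}", *fixed, *movable); // 1 2
    }
}
```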
Chapter 7: Files and Storage
Chapter 7 deals with storage, file I/O, checksums, endianness, and the HashMap and BTreeMap data structures. By the end of the chapter, you will have built a working key-value store that’s guaranteed to be durable to hardware failure at any stage.
Chapter 8: Networking
The chapter on networking looks encouraging, as it peels back the layers of the network stack one by one, starting from HTTP. There’s a lot of error handling and a thorough discussion of trait objects.
Compiling ch-8/ch8-simple was a breeze. The following Cargo.toml downloaded all the necessary dependencies and compiled the main program in under a minute.
[package]
name = "ch8-simple"
version = "0.1.0"
authors = ["Tim McNamara <code@timmcnamara.co.nz>"]
edition = "2018"
[dependencies]
reqwest = "0.9"
And here is the program:
extern crate reqwest;
// Box<dyn Error> lets ? propagate any error type out of main
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = "http://www.rustinaction.com/";
    let mut response = reqwest::get(url)?;
    let content = response.text()?;
    print!("{}", content);
    Ok(())
}
Box<dyn std::error::Error> is a trait object, which enables runtime polymorphism. Trait objects are a form of type erasure: the compiler no longer has access to the original type. &dyn Trait is a reference to something that implements Trait, whereas &Type is a reference to an object of type Type. Trait objects are used to create collections of heterogeneous objects and for dynamic dispatch.
Next, the chapter talks about ergonomic error handling. Rust has a short-form syntax for unwrapping a Result<T, E> and propagating its error. It’s just a ? (question mark). Here’s an example.
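A minimal sketch of a heterogeneous collection behind trait objects; Shape, Square and Rect are invented for illustration.

```rust
// A hypothetical trait and two unrelated types that implement it.
trait Shape {
    fn area(&self) -> f64;
    fn name(&self) -> &'static str;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
    fn name(&self) -> &'static str {
        "square"
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
    fn name(&self) -> &'static str {
        "rect"
    }
}

// The slice holds different concrete types behind Box<dyn Shape>;
// each call to area() is dispatched at runtime through a vtable.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Rect(2.0, 3.0))];
    for s in &shapes {
        println!("{} has area {}", s.name(), s.area());
    }
    println!("total: {}", total_area(&shapes)); // total: 10
}
```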
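use std::fs::File;
use std::net::Ipv6Addr;
fn main() -> Result<(), std::io::Error> {
    let _f = File::open("invisible.txt")?; // io::Error matches the declared error type
    // does not compile: AddrParseError cannot be converted into io::Error
    let _localhost = "::1".parse::<Ipv6Addr>()?;
    Ok(())
}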
Did you notice it? It’s so inconspicuous. Each ? is roughly equivalent to the following expansion, shown here for the File::open call.
let _f = match File::open("invisible.txt") {
    Ok(val) => val,
    Err(err) => {
        // convert the error into the function's declared error type
        let converted = From::from(err);
        return Err(converted);
    }
};
The chapter later describes creating an enumeration type, UpstreamError, that is a union (Either type) of io::Error and AddrParseError. This technique enumerates the error types of two unrelated libraries into one. I’m unsure about the craftsmanship of such composite error types. As more and more libraries are used, such a union type may bloat over time, having to support conversion from a myriad of (unrelated) error types. On the flip side, it allows only enumerated error types to be converted, which adds stronger type safety.
use std::fs::File;
use std::io;
use std::net::{self, Ipv6Addr};
#[derive(Debug)]
enum UpstreamError {
    IO(io::Error),
    Parsing(net::AddrParseError),
}
impl From<io::Error> for UpstreamError {
    fn from(error: io::Error) -> Self {
        UpstreamError::IO(error)
    }
}
impl From<net::AddrParseError> for UpstreamError {
    fn from(error: net::AddrParseError) -> Self {
        UpstreamError::Parsing(error)
    }
}
fn main() -> Result<(), UpstreamError> {
    let _f = File::open("invisible.txt")?; // Calls From::from
    let _localhost = "::1".parse::<Ipv6Addr>()?; // Calls From::from
    Ok(())
}
I liked how the Rust compiler inserts From::from(err) calls where ? is used. The extra conversion function (hook) allows descriptive error messages to be retained and to bubble up. Rust-lang.org has an interesting write-up on how the try! macro evolved into the ? syntax.
An alternative is to use unwrap() or expect(), but that’s like assuming that an Option will always have a value (or that a Result will always be Ok); when the assumption fails, the program panics.
This chapter covers a lot of ground, including MAC addresses, TCP, UDP, DNS, error handling, traits, and Rust crates such as smoltcp, std::net, trust_dns, etc. It has a good balance of language and systems knowledge.
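The enum can go a step further and carry descriptive messages. A sketch, assuming it also implements Display and std::error::Error; parse_ipv6 is a hypothetical helper.

```rust
use std::fmt;
use std::io;
use std::net::{AddrParseError, Ipv6Addr};

#[derive(Debug)]
enum UpstreamError {
    IO(io::Error),
    Parsing(AddrParseError),
}

// A human-readable message that keeps the wrapped error's own description.
impl fmt::Display for UpstreamError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            UpstreamError::IO(e) => write!(f, "I/O error: {}", e),
            UpstreamError::Parsing(e) => write!(f, "parse error: {}", e),
        }
    }
}

// With Display and Debug in place, the enum can also be used as a std::error::Error.
impl std::error::Error for UpstreamError {}

impl From<io::Error> for UpstreamError {
    fn from(error: io::Error) -> Self {
        UpstreamError::IO(error)
    }
}

impl From<AddrParseError> for UpstreamError {
    fn from(error: AddrParseError) -> Self {
        UpstreamError::Parsing(error)
    }
}

// Hypothetical helper: `?` calls From::from to wrap an AddrParseError.
fn parse_ipv6(text: &str) -> Result<Ipv6Addr, UpstreamError> {
    let addr = text.parse::<Ipv6Addr>()?;
    Ok(addr)
}

fn main() {
    match parse_ipv6("not-an-address") {
        Ok(addr) => println!("parsed {}", addr),
        Err(e) => println!("{}", e), // starts with "parse error: "
    }
}
```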
Chapter 9: Time and Time Keeping
This chapter has an interesting discussion of a variety of clocks: real-time clock, system clock, monotonically increasing clock, steady clock, high-accuracy clock, high-resolution clock, fast clock, and atomic clock. This is the longest list of types of clocks I’ve ever seen. Neat.
It also talks about setjmp and longjmp, which never get stale: they are a really old, nifty tool for hacking the program stack.
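Two of these clocks are exposed directly by the standard library. A sketch: std::time::Instant reads a monotonic clock and SystemTime reads the real-time (wall) clock; the helper names are my own.

```rust
use std::thread;
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

// Monotonic clock: Instant never goes backwards, so it suits measuring durations.
fn time_sleep(ms: u64) -> Duration {
    let start = Instant::now();
    thread::sleep(Duration::from_millis(ms));
    start.elapsed()
}

// Real-time (wall) clock: SystemTime can jump if the system clock is adjusted.
fn seconds_since_epoch() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before 1970")
        .as_secs()
}

fn main() {
    println!("slept for {:?}", time_sleep(20));
    println!("{} seconds since the Unix epoch", seconds_since_epoch());
}
```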
My general thoughts after reading this chapter.
- Rust has #[cfg(not(windows))], which works a lot like conditional compilation (#ifdef) in C. Basically, there’s no escape from platform-specific code unless you are using a managed runtime (JVM, .NET).
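A sketch of the Rust counterparts to #ifdef: the #[cfg] attribute removes items at compile time, while the cfg! macro evaluates to a bool. The function name platform_name is hypothetical.

```rust
// Only one of these two definitions is compiled, like #ifdef/#else in C.
#[cfg(windows)]
fn platform_name() -> &'static str {
    "windows"
}

#[cfg(not(windows))]
fn platform_name() -> &'static str {
    "not windows"
}

fn main() {
    println!("compiled for: {}", platform_name());
    // cfg! evaluates to a compile-time bool, similar to `#if defined(...)` in C,
    // but both branches must still type-check.
    if cfg!(target_os = "linux") {
        println!("taking the Linux-specific path");
    }
}
```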
Chapter 10: Processes, Threads, and Containers
This chapter introduces Rust closures (anonymous functions). It also exposes you to the standard library and the crossbeam and rayon crates. Crossbeam is used for asynchronous message passing, whereas rayon is used for parallel programming in Rust.
The book compares thread::sleep with a spinning loop for killing wall-clock time. The data shows that as the number of threads increases beyond the number of (hyper-threaded) cores in a CPU, an operating-system sleep of 20 ms is more accurate than spinning for 20 ms. At 500 threads and beyond, the variance of the “spinning pause” is significantly higher than that of the “OS sleep”. What if the spin loop is much shorter than 20 ms?
Even for a small number of threads (less than 20), comparing the time taken to wait for 20ms using sleep and spin loop strategies, shows that sleep is more accurate (less variance) than spinning.
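The experiment can be approximated with a short sketch. This is not the book’s benchmark code: the thread counts, the helper names, and the use of std::hint::spin_loop are my assumptions, and the numbers it prints will depend on your machine.

```rust
use std::hint;
use std::thread;
use std::time::{Duration, Instant};

// Pause by asking the OS to park the thread until the deadline.
fn pause_sleep(d: Duration) {
    thread::sleep(d);
}

// Pause by busy-waiting: the thread keeps a core busy while polling the clock.
fn pause_spin(d: Duration) {
    let start = Instant::now();
    while start.elapsed() < d {
        hint::spin_loop();
    }
}

// Spawns `n` threads (each a closure) that run `pause`; reports how long each took.
fn measure(n: usize, pause: fn(Duration), d: Duration) -> Vec<Duration> {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            thread::spawn(move || {
                let start = Instant::now();
                pause(d);
                start.elapsed()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

// The longest pause observed, a rough proxy for inaccuracy.
fn worst(samples: &[Duration]) -> Duration {
    samples.iter().copied().max().unwrap_or_default()
}

fn main() {
    let d = Duration::from_millis(20);
    for &n in &[1usize, 4, 16, 64] {
        let slept = measure(n, pause_sleep, d);
        let spun = measure(n, pause_spin, d);
        println!("{:>3} threads: worst sleep {:?}, worst spin {:?}", n, worst(&slept), worst(&spun));
    }
}
```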
Mutex and Arc (Atomically Reference Counted) are not unified into a single type in Rust, which gives programmers added flexibility. Consider a struct with several fields. You may only need a Mutex on a single field, but you could put the Arc around the whole struct. This approach provides faster read access to the fields that are not protected by the Mutex, while the single Mutex retains maximum protection for the field that has read/write access.
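The field-level locking idea in a sketch; Stats, name, hits and count_hits are hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical shared state: `name` is never mutated after construction, so it
// needs no lock; only `hits` changes concurrently and sits behind a Mutex.
struct Stats {
    name: String,
    hits: Mutex<u64>,
}

// One Arc shares the whole struct across threads; the Mutex guards one field.
fn count_hits(threads: usize, per_thread: u64) -> (String, u64) {
    let stats = Arc::new(Stats { name: String::from("api"), hits: Mutex::new(0) });
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let stats = Arc::clone(&stats);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    assert!(!stats.name.is_empty()); // lock-free read of an immutable field
                    *stats.hits.lock().unwrap() += 1; // locked update of the shared counter
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *stats.hits.lock().unwrap();
    (stats.name.clone(), total)
}

fn main() {
    let (name, hits) = count_hits(4, 1_000);
    println!("{}: {} hits", name, hits); // api: 4000 hits
}
```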
Chapter 11: Kernel
Chapter 11 in this book describes how a minimalistic operating system kernel can be built using Rust. Why would you do that? Well, very small embedded devices have just one program running in them. It is intriguing that Rust caters to such environments. I had no idea.
Rust’s build system comes ready for cross-compilation to many platforms. On my Mac installation of Rust, “rustc --print target-list” showed 77 targets. That’s cool.
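Rust enums can specify the size of their discriminant using the #[repr(u8)] attribute. The C++ equivalent would be “enum class Foo : uint8_t { …}”. Writing to raw memory through a pointer can be done in two different ways. The following two Rust snippets write the same byte; the difference is that the first is a volatile write, which the compiler may not elide or reorder relative to other volatile operations, an important property for memory-mapped hardware such as the VGA framebuffer.
let framebuffer = 0xb8000 as *mut u8;
unsafe {
    framebuffer.offset(1).write_volatile(0x30);
}
And direct dereferencing (raw pointers do not support +, so offset() is still needed).
let framebuffer = 0xb8000 as *mut u8;
unsafe {
    *framebuffer.offset(1) = 0x30;
}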
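And #[repr(u8)] in a sketch, assuming a VGA-style color enum with only a few of the sixteen colors listed; vga_attribute is a hypothetical helper that packs the background into the high nibble and the foreground into the low nibble of a text-mode attribute byte.

```rust
use std::mem::size_of;

// With #[repr(u8)], the discriminant is stored in exactly one byte, much like
// `enum class Color : uint8_t` in C++. The variants listed are illustrative.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
enum Color {
    Black = 0x0,
    Blue = 0x1,
    Green = 0x2,
    White = 0xF,
}

// Hypothetical helper: background in the high nibble, foreground in the low nibble.
fn vga_attribute(foreground: Color, background: Color) -> u8 {
    ((background as u8) << 4) | (foreground as u8)
}

fn main() {
    println!("size of Color: {} byte", size_of::<Color>()); // 1
    println!("{:#04x}", vga_attribute(Color::White, Color::Blue)); // 0x1f
}
```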
This chapter goes into a lot of low-level details, such as writing to the VGA framebuffer, the kernel panic handler, the halt instruction, etc. This low-level fiddling with memory reminded me of MS-DOS programming back in the 1990s. “Advanced MS-DOS Programming” by Ray Duncan, anyone?
Chapter 12: Signals, interrupts, exceptions
I feel this chapter’s name should not include interrupts, and perhaps not exceptions either. Simply “signal handling” would do, because hardware interrupts are not discussed much. Most programmers think of exceptions as language-level control flow rather than Intel’s definition of exceptions. That is, unless this chapter is extended to cover hardware interrupts and exceptions.
All in all, this book is a good start for aspiring systems programmers. For seasoned systems programmers who want to learn the language, the book may also be a good start. For hardcore language fanatics, I would recommend The Rust Programming Language by Carol Nichols and Steve Klabnik.