While writing the nix-nar crate, I had to bend Rust’s Iterator API to do things it wasn’t designed for. The resulting code looks strange, so let’s walk through increasingly complex implementations to see why it has to be that way.
The full code for this post is here.
The goal
The nix-nar crate implements an encoder and decoder for the NAR format. NAR is like tar, but its encoding rules are strict enough that there’s only one valid NAR encoding for a given set of files. This is useful if, for instance, we’re doing repeatable builds and storing artefacts in NAR files: two builds are guaranteed to produce the exact same NAR, whereas with tar, the order of files might differ, resulting in different hashes for the archive.
The encoder API is easy, but the decoder has requirements that don’t mesh well together:
- The decoder should list the files in the archive, one at a time. However, it should not load the whole list at once, to avoid problems with archives of very many small files,
- Users should be able to choose whether to load a file from the archive or not. If they choose to skip a file, it shouldn’t be loaded into memory,
- More generally, files should never be fully stored in memory, to avoid problems with big files, and
- The file listing and file reading APIs should follow Rust conventions and use standard traits where possible.
Ultimately, we want usage to look like this:
let dec = Decoder::new(BufReader::new(File::open("my-archive.nar")?))?;
for entry in dec.entries()? {
    let entry = entry?;
    match entry.content {
        Content::Directory => create_dir(entry.path)?,
        Content::Symlink { target } => create_symlink(entry.path, target)?,
        Content::File { mut data, .. } => {
            let mut out = File::create(entry.path)?;
            io::copy(&mut data, &mut out)?;
        }
    }
}
The key bits of Rust machinery in use are the Read and Iterator traits. The data in Content::File above is a struct that behaves like a readable file, but it actually references a slice of bytes in the real underlying file. Because it implements the Read trait, this is transparent to users.
The Iterator implementation in dec.entries() is more convoluted. Before we delve into it, let’s look at a simpler example.
A simple Iterator over a Vec
The simplest iterator we could possibly write is one over a Vec.
pub struct MyVec<T>(Vec<T>);

pub struct MyIterator<'a, T> {
    vec: &'a MyVec<T>,
    idx: usize,
}

impl<T> MyVec<T> {
    pub fn new(vec: Vec<T>) -> Self {
        Self(vec)
    }

    pub fn iter(&self) -> MyIterator<T> {
        MyIterator { vec: self, idx: 0 }
    }
}

impl<'a, T> Iterator for MyIterator<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<Self::Item> {
        if self.idx < self.vec.0.len() {
            self.idx += 1;
            Some(&self.vec.0[self.idx - 1])
        } else {
            None
        }
    }
}
We define the MyVec wrapper around a real Vec, and the MyIterator struct which will be our iterator. The latter holds a reference to MyVec so that it can access elements, and it also stores the current index into the vector.
To implement Iterator, we need to declare the type of the Items returned by next(), and then define next() itself. Per the Iterator trait, next() takes a mutable reference to the iterator, and here it returns read-only references to the underlying values of type T. This function signature will make our job harder in a bit.
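As a quick check of the definitions above, here is a minimal usage sketch. It repeats the two types so that it is self-contained, and the helper name collect_refs is made up for illustration. The point to notice is that the returned references borrow from the MyVec, not from the iterator, which is why they stay usable after the iterator is gone.

```rust
pub struct MyVec<T>(Vec<T>);

pub struct MyIterator<'a, T> {
    vec: &'a MyVec<T>,
    idx: usize,
}

impl<T> MyVec<T> {
    pub fn new(vec: Vec<T>) -> Self {
        Self(vec)
    }

    pub fn iter(&self) -> MyIterator<'_, T> {
        MyIterator { vec: self, idx: 0 }
    }
}

impl<'a, T> Iterator for MyIterator<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<Self::Item> {
        // `get` returns `None` once `idx` runs past the end.
        let item = self.vec.0.get(self.idx)?;
        self.idx += 1;
        Some(item)
    }
}

/// Hypothetical helper: gather every reference the iterator yields.
pub fn collect_refs<T>(vec: &MyVec<T>) -> Vec<&T> {
    // The iterator is a temporary that is dropped when this function
    // returns; the items outlive it because they borrow from `vec`.
    vec.iter().collect()
}

fn main() {
    let v = MyVec::new(vec![1, 2, 3]);
    let refs = collect_refs(&v);
    for item in &refs {
        println!("{item}");
    }
}
```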
Interlude: A Reader with predictable output
In the next section, we’ll write an iterator which reads blocks from files. That is, it will return the first 10 bytes of the file, then the next 10 bytes, and so on. This is simpler than parsing NAR files, and it illustrates all the interesting problems.
Before that, we need something to run the iterator on. We could use a concrete file, but the Read trait lets us generate configurable data at runtime:
use std::io::{self, Read};

// A struct implementing `Read` which generates 'A' for 10 bytes,
// then 'B' for 10 bytes, etc.
pub struct AlphabetReader {
    idx: u64,
    size: u64,
}

impl AlphabetReader {
    pub fn new(size: u64) -> Self {
        Self { idx: 0, size }
    }
}

impl Read for AlphabetReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Report EOF once `size` bytes have been produced; an empty
        // buffer also gets `Ok(0)` instead of an out-of-bounds panic.
        if self.idx >= self.size || buf.is_empty() {
            Ok(0)
        } else {
            buf[0] = b'A' + ((self.idx / 10) % 26) as u8;
            self.idx += 1;
            Ok(1)
        }
    }
}
The main attraction in Read is the fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> function. We’re given a buffer, and we’re supposed to write some bytes into it, up to its length, and then return the number of bytes written. In our implementation, we fill just one byte per call because it’s easy, and it’s more likely to reveal bugs later.
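To make the consequence of one-byte reads concrete, here is a small sketch. It repeats the same idea as the reader above so that it compiles on its own; read_all is a hypothetical helper name. A single read() call hands back one byte even when the buffer is larger, while read_to_string keeps calling read() until it sees Ok(0).

```rust
use std::io::{self, Read};

/// Same idea as the reader above, repeated so this sketch is self-contained.
pub struct AlphabetReader {
    idx: u64,
    size: u64,
}

impl AlphabetReader {
    pub fn new(size: u64) -> Self {
        Self { idx: 0, size }
    }
}

impl Read for AlphabetReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.idx >= self.size || buf.is_empty() {
            Ok(0)
        } else {
            buf[0] = b'A' + ((self.idx / 10) % 26) as u8;
            self.idx += 1;
            Ok(1)
        }
    }
}

/// Hypothetical helper: drain a reader into a `String`.
pub fn read_all(mut reader: impl Read) -> io::Result<String> {
    let mut out = String::new();
    // `read_to_string` loops over `read` until it returns `Ok(0)`.
    reader.read_to_string(&mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    let mut reader = AlphabetReader::new(25);
    let mut buf = [0u8; 16];
    // Only one byte is written, even though the buffer has room for 16.
    let n = reader.read(&mut buf)?;
    println!("one read() call filled {n} of {} bytes", buf.len());
    // The remaining 24 bytes come out of the same reader.
    println!("{}", read_all(reader)?);
    Ok(())
}
```

Callers that need a full buffer should therefore loop, or use read_exact, rather than assume one read() call fills everything.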
An Iterator with eager reads
Now that we have our data source, let’s write a strict block iterator which loads everything into memory. This way, we avoid having to worry about ownership and references for now.
use std::io::Read;

pub struct MyReader<R: Read>(R);

pub struct Blocks<R> {
    reader: R,
    block_size: u64,
}

impl<R: Read> MyReader<R> {
    pub fn new(reader: R) -> Self {
        Self(reader)
    }

    pub fn blocks(self, block_size: u64) -> Blocks<R> {
        Blocks {
            reader: self.0,
            block_size,
        }
    }
}

impl<R: Read> Iterator for Blocks<R> {
    type Item = Result<Vec<u8>, std::io::Error>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = vec![0; self.block_size as usize];
        let mut bytes_read = 0;
        loop {
            match self.reader.read(&mut buf[bytes_read..]) {
                Ok(0) => break,
                Ok(n) => {
                    bytes_read += n;
                    if bytes_read as u64 == self.block_size {
                        break;
                    }
                }
                Err(err) => return Some(Err(err)),
            }
        }
        if bytes_read == 0 {
            None
        } else {
            buf.truncate(bytes_read);
            Some(Ok(buf))
        }
    }
}
We wrap the given Read into MyReader, so that we can define methods on it. Our iterator is Blocks, which takes the Read from MyReader, and also stores the requested block size.
The iterator returns Result<Vec<u8>, std::io::Error>. Everything is wrapped in a Result so that we can pass IO errors to the user. The actual bytes from the file are in the Vec<u8>. Since Vec is an owned type, we don’t have to deal with borrowing here: nothing from outside the iterator ever references data inside of it.
We try this out with a simple loop over the iterator’s values:
let alpha_reader = alphabet_reader::AlphabetReader::new(100);
let reader = blocks1::MyReader::new(alpha_reader);
const BLOCK_SIZE: u64 = 10;
for (idx, block) in reader.blocks(BLOCK_SIZE).enumerate().take(10) {
    let block = block?;
    println!("block {idx}: {}", std::str::from_utf8(&block)?);
}
$ cargo run -q -- blocks1
block 0: AAAAAAAAAA
block 1: BBBBBBBBBB
block 2: CCCCCCCCCC
block 3: DDDDDDDDDD
block 4: EEEEEEEEEE
block 5: FFFFFFFFFF
block 6: GGGGGGGGGG
block 7: HHHHHHHHHH
block 8: IIIIIIIIII
block 9: JJJJJJJJJJ
This works, but reading each block into memory isn’t good. If this were a real archive file, and we encountered a multi-gigabyte Blu-ray image, we’d be in trouble.
Manual iteration with lazy reading
Keeping with the theme of solving simpler problems, let’s now write a type with a next() method that lets us walk through the file blocks without reading them eagerly. This isn’t what we ultimately want because it’s not an Iterator, but it shows that the problem is solvable.
use std::io::{Read, Take};

pub struct MyReader<R: Read> {
    reader: R,
}

pub struct Blocks<R> {
    reader: R,
    block_size: u64,
}

impl<R: Read> MyReader<R> {
    pub fn new(reader: R) -> Self {
        Self { reader }
    }

    pub fn blocks(self, block_size: u64) -> Blocks<R> {
        Blocks {
            reader: self.reader,
            block_size,
        }
    }
}

impl<R: Read> Blocks<R> {
    pub fn next(&mut self) -> Option<Take<R>> {
        //               👇 create a `Read` over a slice of the data
        Some(self.reader.take(self.block_size))
    }
}
This is the same code as before, except for the next() function. The magic happens in the returned Take. This is a type which wraps a Read, but only lets users access a slice of the underlying data.
We try compiling and get an error:
error[E0507]: cannot move out of `self.reader` which is behind a mutable reference
--> src/main.rs:224:18
|
224 | Some(self.reader.take(self.block_size))
| ^^^^^^^^^^^ move occurs because `self.reader` has type `R`, which does not implement the `Copy` trait
The Take isn’t just wrapping the R reader; it’s also taking ownership of it. This would be fine if we called next() only once, but we want to call it multiple times, so the iterator’s return value can’t steal R. We can fix this by calling take on a reference to R. This way, each return value gets a copy of the reference to the reader. This also means we can have multiple references to the reader at the same time, but more on this later.
impl<R: Read> Blocks<R> {
    //                                    👇 reference
    pub fn next(&mut self) -> Option<Take<&R>> {
        //    👇 take reference
        Some((&self.reader).take(self.block_size))
    }
}
error[E0599]: the method `read_exact` exists for struct `std::io::Take<&AlphabetReader>`, but its trait bounds were not satisfied
--> src/main.rs:36:31
|
36 | block.read_exact(&mut buf)?;
| ^^^^^^^^^^ method cannot be called on `std::io::Take<&AlphabetReader>` due to unsatisfied trait bounds
|
= note: the following trait bounds were not satisfied:
`&AlphabetReader: std::io::Read`
which is required by `std::io::Take<&AlphabetReader>: std::io::Read`
The problem now is that Take requires Read, and while Read is implemented for AlphabetReader (the error names the concrete type because the generic R has been instantiated), it is not implemented for &AlphabetReader. We can implement it ourselves, but Rust’s orphan rule won’t let us implement a foreign trait like Read directly for a reference to an arbitrary R, so first we need to wrap R into a struct of our own:
pub struct MyReader<R: Read> {
    inner: MyReaderInner<R>,
}

pub struct MyReaderInner<R> {
    reader: R,
}

pub struct Blocks<R> {
    reader: MyReaderInner<R>,
    block_size: u64,
}
Now, we implement Read for &mut MyReaderInner<R>. The reference has to be mutable in order to call read on it.
impl<R: Read> Read for &mut MyReaderInner<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        let i = self.reader.read(buf)?;
        Ok(i)
    }
}

impl<R: Read> Blocks<R> {
    //                               👇 the tower of types grows ever taller
    pub fn next(&mut self) -> Option<Take<&mut MyReaderInner<R>>> {
        Some(self.reader.take(self.block_size))
    }
}
Let’s see if this works by iterating over the first few blocks:
const BLOCK_SIZE: u64 = 10;
const BUF_SIZE: usize = 10;
let mut idx = 0;
let mut blocks = reader.blocks(BLOCK_SIZE);
while idx < 10 {
    match blocks.next() {
        None => break,
        Some(mut block) => {
            let mut buf = [0; BUF_SIZE];
            block.read_exact(&mut buf)?;
            println!("block {idx}: {}", std::str::from_utf8(&buf)?);
        }
    }
    idx += 1;
}
$ cargo run -q -- blocks_manual
block 0: AAAAAAAAAA
block 1: BBBBBBBBBB
block 2: CCCCCCCCCC
block 3: DDDDDDDDDD
block 4: EEEEEEEEEE
block 5: FFFFFFFFFF
block 6: GGGGGGGGGG
block 7: HHHHHHHHHH
block 8: IIIIIIIIII
block 9: JJJJJJJJJJ
It works! But that’s a lot of ceremony for what should’ve been a simple for loop. It’s twice as long as what we had before. We can do better: we just have to implement Iterator for Blocks. How hard can it be?
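A side note on the manual version before moving on: the standard library already implements Read for &mut R whenever R: Read, and Read::by_ref() hands out exactly such a reference. So, as a sketch that is not the code from this post, the manual next() can also be written without a wrapper struct. The Cursor standing in for a file and the helper name first_blocks are assumptions made for illustration.

```rust
use std::io::{self, Cursor, Read, Take};

pub struct Blocks<R> {
    reader: R,
    block_size: u64,
}

impl<R: Read> Blocks<R> {
    pub fn new(reader: R, block_size: u64) -> Self {
        Self { reader, block_size }
    }

    // The returned `Take` mutably borrows `self`, so the caller has to
    // drop it before asking for the next block.
    pub fn next(&mut self) -> Option<Take<&mut R>> {
        // `by_ref` gives `&mut R`, which is itself a `Read`.
        Some(self.reader.by_ref().take(self.block_size))
    }
}

/// Hypothetical helper: read up to `count` blocks fully into strings.
pub fn first_blocks(data: &[u8], block_size: u64, count: usize) -> io::Result<Vec<String>> {
    let mut blocks = Blocks::new(Cursor::new(data), block_size);
    let mut out = Vec::new();
    for _ in 0..count {
        if let Some(mut block) = blocks.next() {
            let mut text = String::new();
            block.read_to_string(&mut text)?;
            out.push(text);
        }
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    for (idx, block) in first_blocks(b"abcdefghij", 4, 3)?.iter().enumerate() {
        println!("block {idx}: {block}");
    }
    Ok(())
}
```

Because each Take holds a mutable borrow of the Blocks value, the borrow checker rejects holding two blocks at once. That is convenient for correctness, and it is also exactly the shape of signature that the Iterator trait cannot express.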
Iterating towards a lazy Iterator
We already have our next function. Let’s try putting it into the Iterator template:
impl<R> Iterator for Blocks<R> {
    type Item<'a> = Take<&'a mut MyReaderInner<R>>;

    fn next(&mut self) -> Option<Self::Item> {
        Some(self.reader.take(self.block_size))
    }
}
error[E0658]: generic associated types are unstable
--> src/main.rs:242:9
|
242 | type Item<'a> = Take<&'a mut MyReaderInner<R>>;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: see issue #44265 <https://github.com/rust-lang/rust/issues/44265> for more information
We need a lifetime for the &mut MyReaderInner<R> reference, so we try to introduce it on the trait’s associated type, but the compiler doesn’t support that yet. Generic Associated Types, the feature tracked in that issue, are close to being stabilized, but they’re not there yet. Even once they are, Iterator declares Item without a lifetime parameter, so an implementation can’t add one. We need a workaround.
Since we can’t introduce the lifetime in the type, let’s do it in the impl declaration:
//   👇
impl<'a, R> Iterator for Blocks<R> {
    type Item = Take<&'a mut MyReaderInner<R>>;

    fn next(&mut self) -> Option<Self::Item> {
        Some(self.reader.take(self.block_size))
    }
}
error[E0207]: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates
--> src/main.rs:255:10
|
255 | impl<'a, R> Iterator for Blocks<R> {
| ^^ unconstrained lifetime parameter
A lifetime introduced in the impl must be used in the same declaration. Easily done: we can turn Blocks<R> into Blocks<'a, R> by adding a PhantomData<&'a mut R> to it:
//                👇 a lifetime to use later in the Iterator
pub struct Blocks<'a, R> {
    reader: MyReaderInner<R>,
    block_size: u64,
    //                         👇 added to use the 'a
    phantom_data: PhantomData<&'a mut R>,
}

impl<'a, R: Read> Iterator for Blocks<'a, R> {
    //                👇 the lifetime 👆
    type Item = Take<&'a mut MyReaderInner<R>>;

    fn next(&mut self) -> Option<Self::Item> {
        Some(<&mut MyReaderInner<R> as Read>::take(
            &mut self.reader,
            self.block_size,
        ))
    }
}
We also need to explicitly say that we’re using Read::take. Otherwise, Rust guesses that take is coming from the Iterator trait, which is not what we want. This is the only version of the code in this post where Rust gets confused like this, and I don’t have an explanation as to why.
Let’s see if it compiles:
error[E0495]: cannot infer an appropriate lifetime for borrow expression due to conflicting requirements
--> src/main.rs:251:17
|
251 | &mut self.reader,
| ^^^^^^^^^^^^^^^^
|
note: first, the lifetime cannot outlive the anonymous lifetime defined here...
--> src/main.rs:249:17
|
249 | fn next(&mut self) -> Option<Self::Item> {
| ^^^^^^^^^
note: ...so that reference does not outlive borrowed content
--> src/main.rs:251:17
|
251 | &mut self.reader,
| ^^^^^^^^^^^^^^^^
note: but, the lifetime must be valid for the lifetime `'a` as defined here...
--> src/main.rs:246:10
|
246 | impl<'a, R: Read> Iterator for Blocks<'a, R> {
| ^^
note: ...so that the types are compatible
--> src/main.rs:249:50
|
249 | fn next(&mut self) -> Option<Self::Item> {
| __________________________________________________^
250 | | Some(<&mut MyReaderInner<R> as Read>::take(
251 | | &mut self.reader,
252 | | self.block_size,
253 | | ))
254 | | }
| |_________^
= note: expected `<blocks_manual::Blocks<'a, R> as Iterator>`
found `<blocks_manual::Blocks<'_, R> as Iterator>`
That’s a long error, but I think it boils down to the fact that nothing constrains the 'a lifetime of Take<&'a mut MyReaderInner<R>> to be smaller than the implicit lifetime in the &mut self function argument. In the non-Iterator version of the code, the two lifetimes were the same because they were mentioned in the function signature: pub fn next(&mut self) -> Option<Take<&mut MyReaderInner<R>>>.
Let’s try expanding the function signature and constraining the lifetimes:
impl<'a, R: Read> Iterator for Blocks<'a, R> {
    type Item = Take<&'a mut MyReaderInner<R>>;

    //       👇 explicit lifetime          👇 expand Item to make the 'a visible
    fn next(&'a mut self) -> Option<Take<&'a mut MyReaderInner<R>>> {
        Some(<&mut MyReaderInner<R> as Read>::take(
            &mut self.reader,
            self.block_size,
        ))
    }
}
error[E0308]: method not compatible with trait
--> src/main.rs:249:9
|
249 | fn next(&'a mut self) -> Option<Take<&'a mut MyReaderInner<R>>> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ lifetime mismatch
|
= note: expected fn pointer `fn(&mut blocks_manual::Blocks<'a, R>) -> Option<std::io::Take<&mut blocks_manual::MyReaderInner<R>>>`
found fn pointer `fn(&'a mut blocks_manual::Blocks<'a, R>) -> Option<std::io::Take<&'a mut blocks_manual::MyReaderInner<R>>>`
We can’t use next(&'a mut self) because the Iterator trait expects next(&mut self). Let’s unwind a few steps: we added a PhantomData<&'a mut R> to Blocks to introduce the 'a lifetime, but we didn’t constrain it, and this caused problems later. We want the lifetime in Blocks<'a, R> to be the same as the lifetime of the &'a mut MyReaderInner<R> we’ll use later. But that also has to be smaller than the implicit lifetime in next(&mut self), and we can’t add any constraints here because the Iterator trait doesn’t have any.
So, we need the &'a mut MyReaderInner<R> lifetime to be smaller than the &mut self one, for all possible instances of the latter. Practically, I think this means we can’t create the &mut MyReaderInner<R> reference ahead of time. Since we need to create it on the fly, we can stick the struct into a RefCell, and then borrow it:
pub struct Blocks<R> {
    reader: RefCell<MyReaderInner<R>>,
    block_size: u64,
}
This almost works, but we still need the 'a lifetime in Blocks, so that we can later use it in Item (which has to be a reference so that it doesn’t take ownership of the reader). So, we instead push the RefCell into MyReaderInner:
pub struct MyReader<R: Read> {
    inner: MyReaderInner<R>,
}

pub struct MyReaderInner<R> {
    //      👇 this lets us create fresh borrows on demand
    reader: RefCell<R>,
}

pub struct Blocks<'a, R> {
    //      👇 this will be the lifetime of the `Iterator::Item`
    reader: &'a MyReaderInner<R>,
    block_size: u64,
}

//                     👇 no need for a mut reference here...
impl<R: Read> Read for &MyReaderInner<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        //                  👇 ... because we can `borrow_mut` the `RefCell`
        let i = self.reader.borrow_mut().read(buf)?;
        Ok(i)
    }
}

impl<'a, R: Read> Iterator for Blocks<'a, R> {
    type Item = Take<&'a MyReaderInner<R>>;

    fn next(&mut self) -> Option<Self::Item> {
        //               👇 implicit borrow of the `RefCell` creating a new
        //               👇 lifetime smaller than that of `&mut self`
        Some(self.reader.take(self.block_size))
    }
}
This finally compiles, and the code to use it is much more ergonomic:
const BLOCK_SIZE: u64 = 10;
const BUF_SIZE: usize = 10;
for (idx, mut block) in reader.blocks(BLOCK_SIZE).enumerate().take(10) {
    let mut buf = [0; BUF_SIZE];
    block.read_exact(&mut buf)?;
    println!("block {idx}: {}", std::str::from_utf8(&buf)?);
}
$ cargo run -q -- blocks2
block 0: AAAAAAAAAA
block 1: BBBBBBBBBB
block 2: CCCCCCCCCC
block 3: DDDDDDDDDD
block 4: EEEEEEEEEE
block 5: FFFFFFFFFF
block 6: GGGGGGGGGG
block 7: HHHHHHHHHH
block 8: IIIIIIIIII
block 9: JJJJJJJJJJ
Printing the first 10 bytes of the first 10 blocks seems to work. What if we want to print just the first 5 bytes?
const BUF_SIZE: usize = 5;
$ cargo run -q -- blocks2
block 0: AAAAA
block 1: AAAAA
block 2: BBBBB
block 3: BBBBB
block 4: CCCCC
block 5: CCCCC
block 6: DDDDD
block 7: DDDDD
block 8: EEEEE
block 9: EEEEE
It is printing only 5 bytes for each block, which is good, but the second block should’ve been BBBBB. We’re reading the bytes in each block on-demand, but we didn’t account for the case where the user doesn’t consume all the bytes in each block. So, the Blocks iterator can advance, while the position in the underlying file trails behind.
Syncing the Iterator and file position
The underlying problem is that calling Read::take creates an imperfect window into the underlying file data. If we consume all the data in the Take, the cursor in the file reaches the end of the window, and everything works as expected. But if we don’t consume all the data, then the cursor in the file is left at some partway point in the window. The next Take then starts from that position, rather than from the start of the next block.
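The effect is easy to reproduce with standard library types only. In this sketch, a Cursor stands in for the file, and the helper name two_windows is made up for illustration: it takes two consecutive windows of window bytes each and reads only n bytes from each of them.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: open two consecutive `Take` windows over the same
/// reader and read only the first `n` bytes of each.
pub fn two_windows(data: &[u8], window: u64, n: usize) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut cursor = Cursor::new(data);

    let mut first = vec![0u8; n];
    // `by_ref` lends the cursor to the `Take` instead of moving it.
    cursor.by_ref().take(window).read_exact(&mut first)?;

    let mut second = vec![0u8; n];
    // This window starts wherever the previous read stopped, not at
    // offset `window`.
    cursor.by_ref().take(window).read_exact(&mut second)?;

    Ok((first, second))
}

fn main() -> io::Result<()> {
    let data = b"AAAAAAAAAABBBBBBBBBB";
    let (first, second) = two_windows(data, 10, 5)?;
    println!("first:  {}", String::from_utf8_lossy(&first));
    println!("second: {}", String::from_utf8_lossy(&second));
    Ok(())
}
```

With a window of 10 and only 5 bytes read from each, both buffers end up holding AAAAA. Reading all 10 bytes from each window gives the A block followed by the B block, as one would hope.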
To fix this, we keep track of the file position ourselves:
pub struct MyReader<R: Read> {
    inner: MyReaderInner<R>,
}

pub struct MyReaderInner<R> {
    reader: RefCell<R>,
    // 👇 the position in the file
    pos: Cell<u64>,
}

pub struct Blocks<'a, R: Read> {
    reader: &'a MyReaderInner<R>,
    block_size: u64,
    // 👇 the position where the next `Take` should start
    next: u64,
}

impl<'a, R: Read> Read for &'a MyReaderInner<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        let i = self.reader.borrow_mut().read(buf)?;
        // 👇 set the current position as the user pulls from
        // 👇 the `MyReaderInner` wrapper
        self.pos.set(self.pos.get() + i as u64);
        Ok(i)
    }
}

impl<'a, R: Read> Iterator for Blocks<'a, R> {
    type Item = Result<Take<&'a MyReaderInner<R>>, std::io::Error>;

    fn next(&mut self) -> Option<Self::Item> {
        //         👇 skip to the next starting position...
        match self.skip_bytes(self.next - self.reader.pos.get()) {
            Ok(()) => {
                // ... 👇 and prepare the next next starting position
                self.next += self.block_size;
                Some(Ok(self.reader.take(self.block_size)))
            }
            Err(err) => Some(Err(err)),
        }
    }
}

impl<'a, R: Read> Blocks<'a, R> {
    /// Skip bytes by reading them from the `Read`. This would be
    /// more efficient if we also required `Seek`.
    fn skip_bytes(&mut self, mut bytes_to_skip: u64) -> Result<(), std::io::Error> {
        while bytes_to_skip > 0 {
            let mut buf = [0u8; 4096 * 8];
            let n = std::cmp::min(bytes_to_skip, buf.len() as u64);
            match self.reader.read(&mut buf[..n as usize])? {
                0 => return Ok(()),
                n => {
                    bytes_to_skip -= n as u64;
                }
            }
        }
        Ok(())
    }
}
Tracking the position is easy because we implement our own Read instance for &MyReaderInner: we just update the position whenever the user calls read. We do the “skipping” by reading data from the file and immediately discarding it. If we wanted to be more efficient, we could do actual skipping, but that would require the Seek trait.
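Both ways of skipping can be sketched with standard library pieces. This is a standalone illustration rather than the code from this post, and skip_by_reading and skip_by_seeking are hypothetical helper names. The first discards bytes by copying them into io::sink(), which only needs Read; the second assumes the reader also implements Seek and just moves the cursor.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Discard up to `n` bytes by reading them. Returns how many bytes were
/// actually skipped, which is less than `n` if EOF comes first.
pub fn skip_by_reading<R: Read>(reader: &mut R, n: u64) -> io::Result<u64> {
    io::copy(&mut reader.by_ref().take(n), &mut io::sink())
}

/// Skip `n` bytes by seeking. Returns the new absolute position.
/// Note that seeking does not detect EOF: most `Seek` implementations
/// allow moving past the end of the data.
pub fn skip_by_seeking<S: Seek>(seeker: &mut S, n: u64) -> io::Result<u64> {
    if n > i64::MAX as u64 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "offset does not fit in an i64",
        ));
    }
    seeker.seek(SeekFrom::Current(n as i64))
}

fn main() -> io::Result<()> {
    let mut by_read = Cursor::new(b"abcdefghij".to_vec());
    let skipped = skip_by_reading(&mut by_read, 4)?;
    let mut rest = String::new();
    by_read.read_to_string(&mut rest)?;
    println!("read-skipped {skipped} bytes, rest = {rest}");

    let mut by_seek = Cursor::new(b"abcdefghij".to_vec());
    let pos = skip_by_seeking(&mut by_seek, 4)?;
    println!("seek-skipped to position {pos}");
    Ok(())
}
```

The read-based version reports how many bytes it really skipped, which is handy as an EOF signal. The seek-based one is cheaper on real files but needs a separate way to find out whether the new position is still inside the data.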
Stopping at EOF
The big remaining problem is that our blocks iterator never actually stops iterating. When the reader reaches EOF, the iterator will just continue to hand out Takes which contain no data. This is slightly tricky to solve because we only require the Read trait on the reader, and that can’t be queried for EOF. That said, once EOF is reached, skip_bytes will no longer be able to skip anything, so we could use that as the signal to stop iterating:
fn next(&mut self) -> Option<Self::Item> {
    match self.skip_bytes(self.next - self.reader.pos.get()) {
        Ok(0) => {
            self.next += self.block_size;
            Some(Ok(self.reader.take(self.block_size)))
        }
        // 👇 if we failed to skip, we must be at EOF
        Ok(_) => None,
        Err(err) => Some(Err(err)),
    }
}

//                                                           👇 return the bytes not read
fn skip_bytes(&mut self, mut bytes_to_skip: u64) -> Result<u64, std::io::Error> {
    while bytes_to_skip > 0 {
        let mut buf = [0u8; 4096 * 8];
        let n = std::cmp::min(bytes_to_skip, buf.len() as u64);
        // 👇 we don't just pass errors up any more; we treat
        // 👇 errors as reaching EOF
        match self.reader.read(&mut buf[..n as usize]) {
            Ok(0) | Err(_) => return Ok(bytes_to_skip),
            Ok(n) => {
                bytes_to_skip -= n as u64;
            }
        }
    }
    Ok(0)
}
This isn’t perfect because the iterator might still return one empty Take if the file ends on a block boundary, but it’s good enough in practice.
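For comparison, here is a sketch of what EOF detection could look like if we were willing to require BufRead instead of plain Read, which is an assumption this post does not make. BufRead::fill_buf() returns an empty slice only at EOF, so it can be asked before handing out a block. The names at_eof and count_blocks are hypothetical.

```rust
use std::io::{self, BufRead, Cursor, Read};

/// Returns `true` if no more bytes can be read. `fill_buf` only
/// returns an empty slice once the underlying reader is at EOF.
pub fn at_eof<R: BufRead>(reader: &mut R) -> io::Result<bool> {
    Ok(reader.fill_buf()?.is_empty())
}

/// Hypothetical helper: count the blocks in a reader, checking for EOF
/// before each block instead of after a failed skip.
pub fn count_blocks<R: BufRead>(mut reader: R, block_size: u64) -> io::Result<usize> {
    let mut blocks = 0;
    while !at_eof(&mut reader)? {
        // Consume one block; the last one may be shorter than `block_size`.
        io::copy(&mut reader.by_ref().take(block_size), &mut io::sink())?;
        blocks += 1;
    }
    Ok(blocks)
}

fn main() -> io::Result<()> {
    // 20 bytes end exactly on a block boundary: no trailing empty block.
    println!("{}", count_blocks(Cursor::new(vec![0u8; 20]), 10)?);
    // 25 bytes leave a short final block.
    println!("{}", count_blocks(Cursor::new(vec![0u8; 25]), 10)?);
    Ok(())
}
```

With this check, a file that ends exactly on a block boundary yields no trailing empty block. The cost is a stricter trait bound on the reader.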
The thing that doesn’t work
Earlier we glossed over the fact that we’re returning multiple references to the same underlying file. Now, it’s time to face the music. Processing the iterator values out of order just doesn’t work:
const BLOCK_SIZE: u64 = 10;
const BUF_SIZE: usize = 5;
let mut blocks = reader.blocks(BLOCK_SIZE);
let mut block1 = blocks.next().unwrap()?;
let mut block2 = blocks.next().unwrap()?;

// Read first from block2, then block1
let mut buf = [0; BUF_SIZE];
block2.read_exact(&mut buf)?;
println!("block 2: {}", std::str::from_utf8(&buf)?);
block1.read_exact(&mut buf)?;
println!("block 1: {}", std::str::from_utf8(&buf)?);
$ cargo run -q -- blocks_out_of_order
block 2: BBBBB
block 1: BBBBB
It’s the same underlying problem from before. The Takes we return limit the amount of data read from the underlying file, but don’t do anything to ensure that the starting position is correct. For any one Take, if anything else changes the file position after it was created (e.g. the byte skipping code, or a different Take), then it gets out of sync. We could code around this by writing MyTake which keeps track of the file position, and seeks to the right place before reading, but that’s beyond the scope of this post. It would also result in unpredictable runtime performance as seeking then reading is much slower than just reading.
For now, we just put a warning in the docs. To quote the tar crate:
Note that care must be taken to consider each entry within an archive in sequence. If entries are processed out of sequence (from what the iterator returns), then the contents read for each entry may be corrupted.
Conclusion
The main takeaway is that, although the Iterator API is ergonomic to use, it’s also fairly constrained and doesn’t allow us to do everything. Specifically, the type of the values returned by next() can’t mention the lifetime of the &mut self borrow, so those values can’t borrow from the iterator itself. They can be owned values without any references, or references to data that lives outside the iterator (like the &'a MyReaderInner<R> we ended up storing in Blocks), but they can’t be references to data that the iterator owns.
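For completeness, here is a sketch of the signature we were reaching for, assuming a compiler where generic associated types are stable (Rust 1.65 or later, which is newer than the compiler that produced the errors above). The LendingIterator trait below is hypothetical and not part of the standard library, and lend_blocks is a made-up helper. Since this is not Iterator, for loops and adapters like enumerate() don’t work with it.

```rust
use std::io::{self, Cursor, Read, Take};

/// Hypothetical trait: like `Iterator`, but each item may borrow from
/// the `&mut self` passed to `next`.
pub trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
}

pub struct Blocks<R> {
    reader: R,
    block_size: u64,
}

impl<R: Read> LendingIterator for Blocks<R> {
    // The item's lifetime is tied to the borrow of the iterator,
    // which `Iterator::Item` has no way to express.
    type Item<'a> = Take<&'a mut R>
    where
        Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>> {
        Some(self.reader.by_ref().take(self.block_size))
    }
}

/// Hypothetical helper: read up to `count` blocks fully into strings.
pub fn lend_blocks(data: &[u8], block_size: u64, count: usize) -> io::Result<Vec<String>> {
    let mut blocks = Blocks { reader: Cursor::new(data), block_size };
    let mut out = Vec::new();
    while out.len() < count {
        match blocks.next() {
            Some(mut block) => {
                let mut text = String::new();
                block.read_to_string(&mut text)?;
                out.push(text);
            }
            None => break,
        }
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    for (idx, block) in lend_blocks(b"abcdefghij", 4, 3)?.iter().enumerate() {
        println!("block {idx}: {block}");
    }
    Ok(())
}
```

The trade-off is the one this post has been circling: the borrow checker now rules out holding two blocks at once, but we lose the Iterator ecosystem in exchange.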