In this article, I will introduce iterators and discuss how they offer an intuitive interface for addressing a diverse set of programming problems. Specifically, I'll explore the potential application of iterators in Rebol, highlighting how they complement Rebol's elegant and expressive approach to problem-solving.
Iterators are mechanisms used to traverse complex data structures without exposing their internal composition. They provide a standard interface for accessing the component parts of those structures sequentially, significantly simplifying the code built around them. Iterators typically share the following characteristics:
- Encapsulation: They encapsulate the domain logic of a given data structure, meaning you do not need to learn that structure to use it.
- Sequential Access: Each call presents the next logical component from that domain.
- State Maintenance: They maintain their internal state between calls, always being ready for the next iteration.
- Completion: This continues until all components have been traversed and the iteration is complete.
Additionally, just as iterators provide incremental access to data, the amount of data needed in memory can be limited to what is required for each iteration. Depending on their implementations, iterators can offer access to data and domains that are as efficient as they are expressive.
The Iterator API and iterators described below utilize my Iterate module requiring both Rebol 3 by Oldes and my modules for that version of Rebol. Assuming both are installed, all these examples require is initiation of the Iterate module: import r3:rgchris:iterate.
The ‘DOM’ iterator walks a tree structure representing XML content. This iterator returns single word events for each stage of tree traversal: OPEN, CLOSE, EMPTY, and TEXT.
The ‘DOM’ iterator uses blocks created by the default XML module:
import r3:xml
tree: load/as "<a><b/><c/><d><e/></d>f</a>" 'xml
This should be equivalent to:
tree: [
document #[] [
[
"a" _ [
["b" _ _]
["c" _ _]
[
"d" _ [
["e" _ _]
]
]
"f"
]
]
]
]
Creating the iterator is relatively intuitive. ITERATE/NEW takes two arguments: the object containing specific iteration instructions for a given domain, and the values to be navigated.
document: iterate/new iterators/dom tree
This iterator is ready to go.
The iterator creates events as it navigates the DOM block. These events can be processed sequentially responding to the event name. Note that following each event, the properties of the iterator are updated to reflect the current position. In the case of the <d> node, the DOM iterator has a close method that closes the current node without walking through its children, thus skipping the <e> sub-node. Once the iterator has concluded walking the tree structure, it returns NONE, thus ending the WHILE loop.
name-of: :first
; shorthand used for clarity
collect [
while [
iterate/next document
][
switch document/event [
open [
keep to tag! name-of document/node
; we're going to skip children of <d>
;
if "d" == name-of document/node [
iterate/close document
; this only closes the 'd' tag, not the iterator
]
]
close [
keep to tag! rejoin [
#"/" name-of document/node
]
]
empty [
keep to tag! rejoin [
name-of document/node #"/"
]
]
text [
keep document/node
; nodes following a text event are strings
]
]
]
]
The result of the COLLECT wrapper is a block of tags and strings, a linear representation of the tree's structure:
[<document> <a> <b/> <c/> <d> </d> "f" </a> </document>]
This short example demonstrates how an iterator might be useful in serializing a nested tree structure.
Another feature of the DOM iterator is that the hierarchial collection of parent nodes are available for each event. This can be examined and tested against.
document: iterate/new iterators/dom tree
while [
iterate/next document
][
if all [
'empty = document/event
"e" == document/node/1
][
path: collect [
keep document/node/1
foreach node document/path [
keep node/1
]
]
break
]
]
path
; ["e" "d" "a" document]
Path evaluation can be useful for extracting content based on a given set of criteria.
The String iterator steps through a block of strings as if they were concatenated returning chunks of a given length. Though perhaps not the most elegant way to work with strings, this iterator is useful for simulating more complicated content streams, for example receiving network responses in chunks, or portions of large files.
In this example, a block of strings of varying lengths are processed and strings of a fixed length (the size specified by the first value in the block) are returned as events.
text: iterate/new iterators/string [
8 "1234567890" "" "12345678" "90"
]
collect [
while [
iterate/next text
][
keep text/chunk
]
]
; ["12345678" "90123456" "7890"]
In this example, the iterator returns NONE when all input is exhausted, though can resume if more text is added to the source block.
The Values iterator walks through a nested structure of Rebol values. This iterator returns OPEN or CLOSE events when it encounters blocks or objects, otherwise it returns a word value corresponding to the datatype of the current value. The Values iterator streamlines the process of traversing complex data structures, trivializing operations such as serialization.
In this example, we will return a block of values to its serialized form. Although this specific example could be handled just as easily by using MOLD, the iterative approach allows for intervention in the process to customize the process.
walker: iterate/new iterators/values [
1
#(object! [one: 1 two: #["two" 2]])
[one]
two #(none)
]
spacer: ""
openers: #[
#(block!) "["
#(map!) "#["
#(object!) "@["
#(paren!) "("
]
closers: #[
#(block!) #"]"
#(map!) #"]"
#(object!) #"]"
#(paren!) #")"
]
rejoin collect [
while [
iterate/next walker
][
if walker/name [
keep spacer
either word? walker/name [
keep mold to set-word! walker/name
][
keep mold walker/name
]
spacer: #" "
]
switch/default walker/event [
open [
keep spacer
keep select openers type-of walker/value
spacer: ""
]
close [
keep select closers type-of walker/value
spacer: #" "
]
none! [
keep spacer
keep #"_"
spacer: #" "
]
][
keep spacer
keep mold walker/value
spacer: #" "
]
]
]
Just for fun, this example uses custom notation (@[...]) for objects to demonstrate the versatility of this approach:
[1 @[one: 1 two: #["two" 2]] [one] two _]
This level of fine-grained control along with the availability of rich metadata on each event lends to a quite expressive style of coding.
Note: As the iterator walks through a map or object value, it steps through names and values together, both of which are available within the iterator object.
Caveat: at this time the iterator doesn't handle circular values.
The Folders iterator is one of my favourites and addresses a common need: navigating deeply through the contents of a filesystem folder. Amongst the applications are preparing a folder for archive, renaming, filtering, or deleting specific files, or performing a deep search. As the iterator steps through the filesystem, each file or subfolder is encountered with its full path, path relative to the provided folder, and just the filename. Folders generate both an OPEN and CLOSE event, the latter signaling completion of handling of that particular folder. Like the DOM example above, folders can be skipped with the ITERATE/CLOSE method.
A list of a folders contents can be obtained through iteration:
folder: iterate/new iterators/folders %my-folder/
collect [
while [
iterate/next folder
][
if find [open file error] folder/event [
keep folder/path
]
]
]
Note: An ERROR event signals a subfolder whose content could not be ascertained.
Or maybe you want to zap all of those .DS_Store files:
folder: iterate/new iterators/folders %project/
while [
iterate/next folder
][
if folder/file == %.DS_Store [
delete folder/full
]
]
The Folders iterator offers a quite elegant tool to take stock of a folder's contents. There's scope for refinement too: e.g. recognition of symbolic links would be of use (not least to identify potential circular structures).
The iterators as described in this document are experimental and relatively primitive. Each of the concepts could use further development to ensure their utility and rigor. My intent is to at least demonstrate the Iterator concept and potential usefulness. Though I'd also like to encourage some measure of scrutiny with a view to integrating the concept at a lower level in the Rebol environment.
*Note: It's more than likely that the PORT! datatype was deviced in part to accommodate
I've labelled this document 'Part I,' I do have further thoughts that I'll try to put together in 'Part II.'
Hi :) Thanks for sharing your thoughts on this matter.
It's nice that you recognized tree as one of the iterable objects, not just series. You seem to be using a stack of series at their current index as a hint for
nexton where to go.I'm using a different approach in tree-hopping. Instead of a stack, it uses a rolling plan of further actions. But it's just a quick here-and-now kludge, as opposed to how it should be designed (I already saw Brian gave you the design link ;)
So my immediate question is: why not take the next step and replace the generic
while [iterate/next folder] [...]scaffolding with your own e.g.foreach-item folder [...]?It also seems like in your iterators interface the only standardized convention is
/next, the rest is up to the implementation. Correct?Another question I have is about globbing. It seems useful as an iterator, but I can't convince myself yet: does it have any use on a local drive? Because even 1M files, say with 100 chars long path each, is just gonna take little over 100MB of RAM.