Skip to content

Instantly share code, notes, and snippets.

@rgchris
Last active November 28, 2025 15:10
Show Gist options
  • Select an option

  • Save rgchris/9770ab0f7b625ed4b5db84a97fa999ba to your computer and use it in GitHub Desktop.

Select an option

Save rgchris/9770ab0f7b625ed4b5db84a97fa999ba to your computer and use it in GitHub Desktop.

Iterators, Part I

In this article, I will introduce iterators and discuss how they offer an intuitive interface for addressing a diverse set of programming problems. Specifically, I'll explore the potential application of iterators in Rebol, highlighting how they complement Rebol's elegant and expressive approach to problem-solving.

What are Iterators?

Iterators are mechanisms used to traverse complex data structures without exposing their internal composition. They provide a standard interface for accessing the component parts of those structures sequentially, significantly simplifying the code built around them. Iterators typically share the following characteristics:

  • Encapsulation: They encapsulate the domain logic of a given data structure, meaning you do not need to learn that structure to use it.
  • Sequential Access: Each call presents the next logical component from that domain.
  • State Maintenance: They maintain their internal state between calls, always being ready for the next iteration.
  • Completion: This continues until all components have been traversed and the iteration is complete.

Additionally, just as iterators provide incremental access to data, the amount of data needed in memory can be limited to what is required for each iteration. Depending on their implementations, iterators can offer access to data and domains that are as efficient as they are expressive.


The Iterator API and iterators described below utilize my Iterate module requiring both Rebol 3 by Oldes and my modules for that version of Rebol. Assuming both are installed, all these examples require is initiation of the Iterate module: import r3:rgchris:iterate.

‘DOM’

The ‘DOM’ iterator walks a tree structure representing XML content. This iterator returns single word events for each stage of tree traversal: OPEN, CLOSE, EMPTY, and TEXT.

Sample Content

The ‘DOM’ iterator uses blocks created by the default XML module:

import r3:xml
tree: load/as "<a><b/><c/><d><e/></d>f</a>" 'xml

This should be equivalent to:

tree: [
    document #[] [
        [
            "a" _ [
                ["b" _ _]
                ["c" _ _]
                [
                    "d" _ [
                        ["e" _ _]
                    ]
                ]
                "f"
            ]
        ]
    ]
]

Initiation

Creating the iterator is relatively intuitive. ITERATE/NEW takes two arguments: the object containing specific iteration instructions for a given domain, and the values to be navigated.

document: iterate/new iterators/dom tree

This iterator is ready to go.

Navigation

The iterator creates events as it navigates the DOM block. These events can be processed sequentially responding to the event name. Note that following each event, the properties of the iterator are updated to reflect the current position. In the case of the <d> node, the DOM iterator has a close method that closes the current node without walking through its children, thus skipping the <e> sub-node. Once the iterator has concluded walking the tree structure, it returns NONE, thus ending the WHILE loop.

name-of: :first
; shorthand used for clarity

collect [
    while [
        iterate/next document
    ][
        switch document/event [
            open [
                keep to tag! name-of document/node

                ; we're going to skip children of <d>
                ;
                if "d" == name-of document/node [
                    iterate/close document
                    ; this only closes the 'd' tag, not the iterator
                ]
            ]

            close [
                keep to tag! rejoin [
                    #"/" name-of document/node
                ]
            ]

            empty [
                keep to tag! rejoin [
                    name-of document/node #"/"
                ]
            ]

            text [
                keep document/node
                ; nodes following a text event are strings
            ]
        ]
    ]
]

The result of the COLLECT wrapper is a block of tags and strings, a linear representation of the tree's structure:

[<document> <a> <b/> <c/> <d> </d> "f" </a> </document>]

This short example demonstrates how an iterator might be useful in serializing a nested tree structure.

Paths

Another feature of the DOM iterator is that the hierarchial collection of parent nodes are available for each event. This can be examined and tested against.

document: iterate/new iterators/dom tree

while [
    iterate/next document
][
    if all [
        'empty = document/event
        "e" == document/node/1
    ][
        path: collect [
            keep document/node/1

            foreach node document/path [
                keep node/1
            ]
        ]

        break
    ]
]

path
; ["e" "d" "a" document]

Path evaluation can be useful for extracting content based on a given set of criteria.

String

The String iterator steps through a block of strings as if they were concatenated returning chunks of a given length. Though perhaps not the most elegant way to work with strings, this iterator is useful for simulating more complicated content streams, for example receiving network responses in chunks, or portions of large files.

Example

In this example, a block of strings of varying lengths are processed and strings of a fixed length (the size specified by the first value in the block) are returned as events.

text: iterate/new iterators/string [
    8 "1234567890" "" "12345678" "90"
]

collect [
    while [
        iterate/next text
    ][
        keep text/chunk
    ]
]

; ["12345678" "90123456" "7890"]

In this example, the iterator returns NONE when all input is exhausted, though can resume if more text is added to the source block.

Values

The Values iterator walks through a nested structure of Rebol values. This iterator returns OPEN or CLOSE events when it encounters blocks or objects, otherwise it returns a word value corresponding to the datatype of the current value. The Values iterator streamlines the process of traversing complex data structures, trivializing operations such as serialization.

Example

In this example, we will return a block of values to its serialized form. Although this specific example could be handled just as easily by using MOLD, the iterative approach allows for intervention in the process to customize the process.

walker: iterate/new iterators/values [
    1
    #(object! [one: 1 two: #["two" 2]])
    [one]
    two #(none)
]

spacer: ""

openers: #[
    #(block!) "["
    #(map!) "#["
    #(object!) "@["
    #(paren!) "("
]

closers: #[
    #(block!) #"]"
    #(map!) #"]"
    #(object!) #"]"
    #(paren!) #")"
]

rejoin collect [
    while [
        iterate/next walker
    ][
        if walker/name [
            keep spacer

            either word? walker/name [
                keep mold to set-word! walker/name
            ][
                keep mold walker/name
            ]

            spacer: #" "
        ]

        switch/default walker/event [
            open [
                keep spacer
                keep select openers type-of walker/value

                spacer: ""
            ]

            close [
                keep select closers type-of walker/value

                spacer: #" "
            ]

            none! [
                keep spacer
                keep #"_"

                spacer: #" "
            ]
        ][
            keep spacer
            keep mold walker/value

            spacer: #" "
        ]
    ]
]

Just for fun, this example uses custom notation (@[...]) for objects to demonstrate the versatility of this approach:

[1 @[one: 1 two: #["two" 2]] [one] two _]

This level of fine-grained control along with the availability of rich metadata on each event lends to a quite expressive style of coding.

Note: As the iterator walks through a map or object value, it steps through names and values together, both of which are available within the iterator object.

Caveat: at this time the iterator doesn't handle circular values.

Folders

The Folders iterator is one of my favourites and addresses a common need: navigating deeply through the contents of a filesystem folder. Amongst the applications are preparing a folder for archive, renaming, filtering, or deleting specific files, or performing a deep search. As the iterator steps through the filesystem, each file or subfolder is encountered with its full path, path relative to the provided folder, and just the filename. Folders generate both an OPEN and CLOSE event, the latter signaling completion of handling of that particular folder. Like the DOM example above, folders can be skipped with the ITERATE/CLOSE method.

Example

A list of a folders contents can be obtained through iteration:

folder: iterate/new iterators/folders %my-folder/

collect [
    while [
        iterate/next folder
    ][
        if find [open file error] folder/event [
            keep folder/path
        ]
    ]
]

Note: An ERROR event signals a subfolder whose content could not be ascertained.

Or maybe you want to zap all of those .DS_Store files:

folder: iterate/new iterators/folders %project/

while [
    iterate/next folder
][
    if folder/file == %.DS_Store [
        delete folder/full
    ]
]

The Folders iterator offers a quite elegant tool to take stock of a folder's contents. There's scope for refinement too: e.g. recognition of symbolic links would be of use (not least to identify potential circular structures).

Closing Thoughts

The iterators as described in this document are experimental and relatively primitive. Each of the concepts could use further development to ensure their utility and rigor. My intent is to at least demonstrate the Iterator concept and potential usefulness. Though I'd also like to encourage some measure of scrutiny with a view to integrating the concept at a lower level in the Rebol environment.

*Note: It's more than likely that the PORT! datatype was deviced in part to accommodate

I've labelled this document 'Part I,' I do have further thoughts that I'll try to put together in 'Part II.'

@hiiamboris
Copy link

Understood. Thanks for your comments, and for the lazy listing idea for my consideration :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment