rgchris/iterate.md

Last active November 28, 2025 15:10

Star (1) You must be signed in to star a gist
Fork (0) You must be signed in to fork a gist

Select an option

Learn more about clone URLs
Clone this repository at <script src="https://gist.github.com/rgchris/9770ab0f7b625ed4b5db84a97fa999ba.js"></script>
Save rgchris/9770ab0f7b625ed4b5db84a97fa999ba to your computer and use it in GitHub Desktop.

Raw

Iterators, Part I

In this article, I will introduce iterators and discuss how they offer an intuitive interface for addressing a diverse set of programming problems. Specifically, I'll explore the potential application of iterators in Rebol, highlighting how they complement Rebol's elegant and expressive approach to problem-solving.

What are Iterators?

Iterators are mechanisms used to traverse complex data structures without exposing their internal composition. They provide a standard interface for accessing the component parts of those structures sequentially, significantly simplifying the code built around them. Iterators typically share the following characteristics:

Encapsulation: They encapsulate the domain logic of a given data structure, meaning you do not need to learn that structure to use it.
Sequential Access: Each call presents the next logical component from that domain.
State Maintenance: They maintain their internal state between calls, always being ready for the next iteration.
Completion: This continues until all components have been traversed and the iteration is complete.

Additionally, just as iterators provide incremental access to data, the amount of data needed in memory can be limited to what is required for each iteration. Depending on their implementations, iterators can offer access to data and domains that are as efficient as they are expressive.

The Iterator API and iterators described below utilize my Iterate module requiring both Rebol 3 by Oldes and my modules for that version of Rebol. Assuming both are installed, all these examples require is initiation of the Iterate module: import r3:rgchris:iterate.

‘DOM’

The ‘DOM’ iterator walks a tree structure representing XML content. This iterator returns single word events for each stage of tree traversal: OPEN, CLOSE, EMPTY, and TEXT.

Sample Content

The ‘DOM’ iterator uses blocks created by the default XML module:

import r3:xml
tree: load/as "<a><b/><c/><d><e/></d>f</a>" 'xml

This should be equivalent to:

tree: [
    document #[] [
        [
            "a" _ [
                ["b" _ _]
                ["c" _ _]
                [
                    "d" _ [
                        ["e" _ _]
                    ]
                ]
                "f"
            ]
        ]
    ]
]

Initiation

Creating the iterator is relatively intuitive. ITERATE/NEW takes two arguments: the object containing specific iteration instructions for a given domain, and the values to be navigated.

document: iterate/new iterators/dom tree

This iterator is ready to go.

Navigation

The iterator creates events as it navigates the DOM block. These events can be processed sequentially responding to the event name. Note that following each event, the properties of the iterator are updated to reflect the current position. In the case of the <d> node, the DOM iterator has a close method that closes the current node without walking through its children, thus skipping the <e> sub-node. Once the iterator has concluded walking the tree structure, it returns NONE, thus ending the WHILE loop.

name-of: :first
; shorthand used for clarity

collect [
    while [
        iterate/next document
    ][
        switch document/event [
            open [
                keep to tag! name-of document/node

                ; we're going to skip children of <d>
                ;
                if "d" == name-of document/node [
                    iterate/close document
                    ; this only closes the 'd' tag, not the iterator
                ]
            ]

            close [
                keep to tag! rejoin [
                    #"/" name-of document/node
                ]
            ]

            empty [
                keep to tag! rejoin [
                    name-of document/node #"/"
                ]
            ]

            text [
                keep document/node
                ; nodes following a text event are strings
            ]
        ]
    ]
]

The result of the COLLECT wrapper is a block of tags and strings, a linear representation of the tree's structure:

[<document> <a> <b/> <c/> <d> </d> "f" </a> </document>]

This short example demonstrates how an iterator might be useful in serializing a nested tree structure.

Paths

Another feature of the DOM iterator is that the hierarchial collection of parent nodes are available for each event. This can be examined and tested against.

document: iterate/new iterators/dom tree

while [
    iterate/next document
][
    if all [
        'empty = document/event
        "e" == document/node/1
    ][
        path: collect [
            keep document/node/1

            foreach node document/path [
                keep node/1
            ]
        ]

        break
    ]
]

path
; ["e" "d" "a" document]

Path evaluation can be useful for extracting content based on a given set of criteria.

String

The String iterator steps through a block of strings as if they were concatenated returning chunks of a given length. Though perhaps not the most elegant way to work with strings, this iterator is useful for simulating more complicated content streams, for example receiving network responses in chunks, or portions of large files.

Example

In this example, a block of strings of varying lengths are processed and strings of a fixed length (the size specified by the first value in the block) are returned as events.

text: iterate/new iterators/string [
    8 "1234567890" "" "12345678" "90"
]

collect [
    while [
        iterate/next text
    ][
        keep text/chunk
    ]
]

; ["12345678" "90123456" "7890"]

In this example, the iterator returns NONE when all input is exhausted, though can resume if more text is added to the source block.

Values

The Values iterator walks through a nested structure of Rebol values. This iterator returns OPEN or CLOSE events when it encounters blocks or objects, otherwise it returns a word value corresponding to the datatype of the current value. The Values iterator streamlines the process of traversing complex data structures, trivializing operations such as serialization.

Example

In this example, we will return a block of values to its serialized form. Although this specific example could be handled just as easily by using MOLD, the iterative approach allows for intervention in the process to customize the process.

walker: iterate/new iterators/values [
    1
    #(object! [one: 1 two: #["two" 2]])
    [one]
    two #(none)
]

spacer: ""

openers: #[
    #(block!) "["
    #(map!) "#["
    #(object!) "@["
    #(paren!) "("
]

closers: #[
    #(block!) #"]"
    #(map!) #"]"
    #(object!) #"]"
    #(paren!) #")"
]

rejoin collect [
    while [
        iterate/next walker
    ][
        if walker/name [
            keep spacer

            either word? walker/name [
                keep mold to set-word! walker/name
            ][
                keep mold walker/name
            ]

            spacer: #" "
        ]

        switch/default walker/event [
            open [
                keep spacer
                keep select openers type-of walker/value

                spacer: ""
            ]

            close [
                keep select closers type-of walker/value

                spacer: #" "
            ]

            none! [
                keep spacer
                keep #"_"

                spacer: #" "
            ]
        ][
            keep spacer
            keep mold walker/value

            spacer: #" "
        ]
    ]
]

Just for fun, this example uses custom notation (@[...]) for objects to demonstrate the versatility of this approach:

[1 @[one: 1 two: #["two" 2]] [one] two _]

This level of fine-grained control along with the availability of rich metadata on each event lends to a quite expressive style of coding.

Note: As the iterator walks through a map or object value, it steps through names and values together, both of which are available within the iterator object.

Caveat: at this time the iterator doesn't handle circular values.

Folders

The Folders iterator is one of my favourites and addresses a common need: navigating deeply through the contents of a filesystem folder. Amongst the applications are preparing a folder for archive, renaming, filtering, or deleting specific files, or performing a deep search. As the iterator steps through the filesystem, each file or subfolder is encountered with its full path, path relative to the provided folder, and just the filename. Folders generate both an OPEN and CLOSE event, the latter signaling completion of handling of that particular folder. Like the DOM example above, folders can be skipped with the ITERATE/CLOSE method.

Example

A list of a folders contents can be obtained through iteration:

folder: iterate/new iterators/folders %my-folder/

collect [
    while [
        iterate/next folder
    ][
        if find [open file error] folder/event [
            keep folder/path
        ]
    ]
]

Note: An ERROR event signals a subfolder whose content could not be ascertained.

Or maybe you want to zap all of those .DS_Store files:

folder: iterate/new iterators/folders %project/

while [
    iterate/next folder
][
    if folder/file == %.DS_Store [
        delete folder/full
    ]
]

The Folders iterator offers a quite elegant tool to take stock of a folder's contents. There's scope for refinement too: e.g. recognition of symbolic links would be of use (not least to identify potential circular structures).

Closing Thoughts

The iterators as described in this document are experimental and relatively primitive. Each of the concepts could use further development to ensure their utility and rigor. My intent is to at least demonstrate the Iterator concept and potential usefulness. Though I'd also like to encourage some measure of scrutiny with a view to integrating the concept at a lower level in the Rebol environment.

*Note: It's more than likely that the PORT! datatype was deviced in part to accommodate

I've labelled this document 'Part I,' I do have further thoughts that I'll try to put together in 'Part II.'

hiiamboris commented Nov 27, 2025

Hi :) Thanks for sharing your thoughts on this matter.

It's nice that you recognized tree as one of the iterable objects, not just series. You seem to be using a stack of series at their current index as a hint for next on where to go.

I'm using a different approach in tree-hopping. Instead of a stack, it uses a rolling plan of further actions. But it's just a quick here-and-now kludge, as opposed to how it should be designed (I already saw Brian gave you the design link ;)

So my immediate question is: why not take the next step and replace the generic while [iterate/next folder] [...] scaffolding with your own e.g. foreach-item folder [...]?

It also seems like in your iterators interface the only standardized convention is /next, the rest is up to the implementation. Correct?

Another question I have is about globbing. It seems useful as an iterator, but I can't convince myself yet: does it have any use on a local drive? Because even 1M files, say with 100 chars long path each, is just gonna take little over 100MB of RAM.

Author

rgchris commented Nov 27, 2025

Thanks for sharing your thoughts on this matter.

More untimely code dump draft than white paper, alas.

I already saw Brian gave you the design link ;)

Yes, and it's remiss of me not to have noticed it beforehand, although—well it's linked here now 😊. From where I'm coming from, it would seem to vindicate development choices for this batch of modules. I certainly agree that some native iterator construct—more specific than PORT!—should exist.

It's nice that you recognized tree as one of the iterable objects, not just series.

Indeed, and would have included faces if Rebol/View still worked on any systems that anyone (including me) still used 😔. In-place recursion really makes for messy, unclear code. Like you, tabbing is one of those tasks that has been in the back of my mind.

You seem to be using a stack of series at their current index as a hint for next on where to go.

Yes, and I recognize that this can be problematic if the underlying series are modified elsewhere or if the series are big. It's worked well enough for the things I've been using these modules to do, which has largely been my standard to this point.

I'm using a different approach in tree-hopping. Instead of a stack, it uses a rolling plan of further actions.

I'm not quite clear on the distinction. Forgive me for not grasping it on first glance, I will try again. I certainly have no reason to stick to stacks if there's something more appropriate.

So my immediate question is: why not take the next step and replace the generic while [iterate/next folder] [...] scaffolding with your own e.g. foreach-item folder [...]?

I'm ever reluctant to add new general-purpose functions if an existing function has the capacity (or semantic license) to take on the same purpose. If foreach node node-walker [...] is the best idiomatic phrasing, then we should be doing that (and I'm not certain if that is the case—I know that JavaScript sets iterators into the FOR ... OF framework, so perhaps it is). For my purposes for the moment, WHILE is clear enough in intent.

It also seems like in your iterators interface the only standardized convention is /next, the rest is up to the implementation. Correct?

Pretty much. That wouldn't be my end point if I were to go push further, I suspect. Somewhere in there was a /close method that, for example, closed the current directory if you wished to skip it.

Another question I have is about globbing. It seems useful as an iterator, but I can't convince myself yet: does it have any use on a local drive? Because even 1M files, say with 100 chars long path each, is just gonna take little over 100MB of RAM.

Am I right in understanding 'globbing' as a term for the deep directory listings? A question is, what alternatives (in Rebol or Red) are there if you have that 1M files? It would seem that whatever mechanism you use to obtain the files, an iterator is as good and efficient as any interface for actually processing them, right?

hiiamboris commented Nov 28, 2025

A question is, what alternatives (in Rebol or Red) are there if you have that 1M files? It would seem that whatever mechanism you use to obtain the files, an iterator is as good and efficient as any interface for actually processing them, right?

The alternative is to let a function list everything and give you back a block to deal with. One could argue that a native iterator implementation would indeed be both efficient and flexible enough to be used on remote drives. But being implemented on Redbol level I expect it would add an unnecessary overhead, so in this case I'm not convinced that iterators bring anything to the table.

Forgive me for not grasping it on first glance, I will try again. I certainly have no reason to stick to stacks if there's something more appropriate.

On a directory listing example (simplified):

add root listing to the plan block
visit the first entry in the plan, add its files if it's a directory to the plan
visit the second entry in the plan (it may be the first file listed by the previously visited directory), add its files if it's a directory to the plan
visit next entry, and so on...
(ideally it would free the slots in the plan it already processed, but for my needs so far that would only have been a slowdown, so I didn't go that far)

My original idea was to just do plan and let it grow itself with new code :) But Red couldn't manage such dynamicity, even with do/next, so I had to wrap every command into a separate block 🤷

But do not confuse this tree-hopping kludge with the iterator design. In it, depending on the desired traversal mode (breadth-first or depth-first) you may write something like for each file after glob %dir [code] or for each file below glob %dir [code], where:

for <iterator> [code] is the loop evaluator
each file <iterator> is a transformation of the globbing iterator into one that assigns each item to the file word
after <node> is an iterator that traverses the tree breadth-first, starting from <node>
below <node> is an iterator that traverses the tree depth-first, starting from <node>
glob %dir is a hypothetical function that makes a tree node from a file! value (ideally, it should be as lazy as possible in accessing the file system)

It may seem a bit more wordy than just foreach file glob %dir [code] (where glob %dir would just list the dir deeply) but is infinitely composable.

Author

rgchris commented Nov 28, 2025

The alternative is to let a function list everything and give you back a block to deal with. One could argue that a native iterator implementation would indeed be both efficient and flexible enough to be used on remote drives. But being implemented on Redbol level I expect it would add an unnecessary overhead, so in this case I'm not convinced that iterators bring anything to the table.

The possibilities for efficiency and flexibility are just one part—it's the interface where I find the value to be. Not to belabour the point on recursion, but taking just two examples, collecting files for archive and nixing nested files of a given filename: traditionally it is on the user to manage paths as they traverse a tree, whereas the iterator can provide this metadata almost inherently; the full path to operate on, the relative path for reference, the filename to evaluate. Could call it cognitive efficiency, perhaps. Sure, there is some cost in maintaining this information per cycle, but for most purposes the user is bearing this cost anyway.

It may seem a bit more wordy than just foreach file glob %dir [code] (where glob %dir would just list the dir deeply) but is infinitely composable.

Yeah, I don't see a problem with that, the word usage is logical and clearly reflects intent.

hiiamboris commented Nov 28, 2025

Understood. Thanks for your comments, and for the lazy listing idea for my consideration :)

rgchris/iterate.md

Select an option

No results found

Select an option

No results found

Iterators, Part I

What are Iterators?

‘DOM’

Sample Content

Initiation

Navigation

Paths

String

Example

Values

Example

Folders

Example

Closing Thoughts

hiiamboris commented Nov 27, 2025

Uh oh!

rgchris commented Nov 27, 2025

Uh oh!

hiiamboris commented Nov 28, 2025

Uh oh!

rgchris commented Nov 28, 2025

Uh oh!

hiiamboris commented Nov 28, 2025

Uh oh!