@DavisVaughan
Created February 12, 2026 18:19
dplyr-revdep-original-plan.md

Reverse Dependency Fix Plan

Overview

This plan outlines an automated approach to fix reverse dependency (revdep) failures caused by breaking changes in an upstream R package. The goal is to create a first-pass fix for each affected package that can then be reviewed and submitted as pull requests.

Parameters

Before executing this plan, the following must be specified:

| Parameter | Description | Example |
|-----------|-------------|---------|
| UPSTREAM_PACKAGE | The package with breaking changes | dplyr |
| REVDEP_ISSUE_URL | GitHub issue listing affected packages (grouped by category) | https://github.com/tidyverse/dplyr/issues/7763 |
| REVDEP_BASE_DIR | Base directory for all revdep work | ~/Desktop/revdeps |
| RELEASE_DATE | Target CRAN release date for the upstream package | January 15, 2026 |

The issue at REVDEP_ISSUE_URL should list the affected packages grouped by category of breaking change. Each subprocess will use this categorization to guide its diagnosis and fix approach.
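
As a minimal sketch, the four parameters could be written to `_config.json` once, before any subprocess starts. The `write_config` helper and the JSON key names are assumptions (the plan only says the file holds the parameters for the run), and the values are assumed to contain no double quotes or backslashes.

```shell
# Hypothetical helper: write the run parameters to {REVDEP_BASE_DIR}/_config.json.
# Key names are an assumption. No JSON escaping is done, so values must not
# contain double quotes or backslashes.
write_config() {
  local base_dir="$1" upstream="$2" issue_url="$3" release_date="$4"
  mkdir -p "$base_dir"
  cat > "$base_dir/_config.json" <<EOF
{
  "upstream_package": "$upstream",
  "revdep_issue_url": "$issue_url",
  "revdep_base_dir": "$base_dir",
  "release_date": "$release_date"
}
EOF
}
```

With the example values from the table, a call might look like `write_config "$HOME/Desktop/revdeps" dplyr "https://github.com/tidyverse/dplyr/issues/7763" "January 15, 2026"`.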

Architecture

Subprocess Model

Each reverse dependency will be processed by a dedicated subprocess (Claude Code agent) to:

  • Ensure isolation between packages
  • Enable parallel processing when appropriate
  • Contain failures to individual packages
  • Maintain clear state per package

Directory Structure

{REVDEP_BASE_DIR}/
├── _summary.md                 # Overall progress tracking
├── _config.json                # Parameters for this run
├── {package_name}/
│   ├── library/                # Package-specific R library
│   ├── {package_name}/         # Git checkout of package source
│   │   └── .Rprofile           # Sets library path
│   ├── status.json             # Processing status
│   ├── notes.md                # Agent notes and findings
│   ├── fix.patch               # Diff of the fix (Phase 5)
│   └── message.txt             # PR message, fixed packages only (Phase 5)
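
The per-package skeleton in the tree above could be created with a small helper like the following. This is only a sketch: the `scaffold_package` name, the initial `"pending"` status, and the `notes.md` heading are assumptions, not part of the plan.

```shell
# Sketch: create the per-package skeleton shown in the directory tree.
# The initial "pending" status and the notes.md heading are assumptions.
scaffold_package() {
  local base_dir="$1" package="$2"
  local pkg_dir="$base_dir/$package"
  mkdir -p "$pkg_dir/library"
  # Do not overwrite state left by an earlier attempt at the same package.
  if [ ! -f "$pkg_dir/status.json" ]; then
    printf '{\n  "package": "%s",\n  "status": "pending"\n}\n' "$package" \
      > "$pkg_dir/status.json"
  fi
  if [ ! -f "$pkg_dir/notes.md" ]; then
    printf '# Notes for %s\n' "$package" > "$pkg_dir/notes.md"
  fi
}
```

Because existing files are left alone, re-running the helper when retrying a package keeps earlier notes and status.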

Workflow Per Package

Each subprocess will follow this workflow:

Phase 1: Setup

  1. Create directory structure

    mkdir -p {REVDEP_BASE_DIR}/{package}/library
  2. Fork and clone repository

    cd {REVDEP_BASE_DIR}/{package}
    gh repo fork {source_url} --clone=false
    gh repo clone {forked_repo} {package}
    cd {package}
    git checkout main  # or master

    For packages with mailto URLs (CRAN-only), use:

    gh repo fork https://github.com/cran/{package} --clone=false
  3. Add .Rprofile for library isolation (use hardcoded absolute path)

    .libPaths(c("{REVDEP_BASE_DIR}/{package}/library", .libPaths()))
  4. Install dependencies

    cd {REVDEP_BASE_DIR}/{package}/{package}
    Rscript -e "pak::pak()"
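
Step 3 asks for a hardcoded absolute path in `.Rprofile`. One way to produce it, sketched here with a hypothetical `write_rprofile` helper, is to resolve the base directory before writing the file. The helper assumes the checkout directory from step 2 already exists.

```shell
# Sketch: write the .Rprofile from step 3 with an absolute library path.
# Assumes {base_dir}/{package}/{package} (the git checkout) already exists.
write_rprofile() {
  local base_dir="$1" package="$2"
  local abs_base
  # Resolve to an absolute path so the profile works from any working directory.
  abs_base=$(cd "$base_dir" && pwd) || return 1
  printf '.libPaths(c("%s/%s/library", .libPaths()))\n' "$abs_base" "$package" \
    > "$base_dir/$package/$package/.Rprofile"
}
```

Calling `write_rprofile ~/Desktop/revdeps somepkg` (with `somepkg` as a placeholder name) would write a single `.libPaths()` line pointing at that package's private library.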

Phase 2: Diagnose

  1. Identify the specific failure

    • The package's category from REVDEP_ISSUE_URL indicates the likely problem
    • Run Rscript -e "devtools::test()" with the development version of the upstream package installed to confirm the error
    • Search codebase for usage of deprecated/removed functions
  2. Search for affected code

    • Use grep/ripgrep to find usage of the problematic function/feature
    • Check NAMESPACE file for imports
    • Check DESCRIPTION for dependencies
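
The search in step 2 can be sketched with plain `grep`. The `find_usages` helper below is hypothetical, and it assumes a `grep` that supports `--include` (GNU and BSD grep do); ripgrep would work along the same lines.

```shell
# Sketch: list lines in R sources, NAMESPACE, and DESCRIPTION that mention a pattern.
# Assumes a grep with --include support (GNU or BSD grep).
find_usages() {
  local pattern="$1" pkg_dir="$2"
  # -r recurse, -n show line numbers, -F treat the pattern as a fixed string
  # so that "(" and "." in function calls are matched literally.
  grep -rnF --include='*.R' --include='*.Rmd' \
    --include='NAMESPACE' --include='DESCRIPTION' \
    -e "$pattern" "$pkg_dir"
}
```

For the `dplyr::id()` example used later in this plan, `find_usages 'dplyr::id(' .` would locate direct calls and `find_usages 'importFrom(dplyr,id)' .` would locate the NAMESPACE import. The function exits non-zero when nothing matches.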

Phase 3: Fix

Apply fixes based on the category from REVDEP_ISSUE_URL. The subprocess should:

  1. Identify all files containing the problematic pattern
  2. Apply the documented fix pattern
  3. Update NAMESPACE if imports changed
  4. Update DESCRIPTION if new dependencies needed

Important: Fixes must work with both:

  • Development version of the upstream package
  • Current CRAN version of the upstream package

Phase 4: Validate

  1. Install development version of upstream and test

    Rscript -e "pak::pak('{upstream_github_repo}')"
    Rscript -e "devtools::check()"
  2. Install CRAN version of upstream and test

    Rscript -e "pak::pak('{UPSTREAM_PACKAGE}')"
    Rscript -e "devtools::check()"
  3. Verification criteria:

    • R CMD check passes (0 errors, 0 warnings ideally)
    • Tests pass with both upstream versions
    • No new NOTEs introduced
  4. Quick validation alternative

    • For faster iteration, use Rscript -e "devtools::test()"
    • Run full devtools::check() only for final validation
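
Steps 1 and 2 can be combined into one helper that records a pass/fail result per upstream version. The sketch below makes several assumptions: `Rscript`, pak, and devtools are installed, the working directory is the package checkout, and the `Rscript` call exits non-zero when the check fails. The function names are hypothetical.

```shell
# Sketch: run the full check against both upstream versions and report each result.
# Assumptions: Rscript, pak, and devtools are available; the current directory is
# the package checkout; Rscript exits non-zero when the check fails.
check_against() {
  # $1 is a pak package reference: a CRAN name or an "owner/repo" GitHub ref.
  Rscript -e "pak::pak('$1')" && Rscript -e "devtools::check()"
}

validate_both() {
  local dev_ref="$1" cran_ref="$2" dev=fail cran=fail
  # Check logs go to stderr so that stdout carries only the two result lines.
  if check_against "$dev_ref" >&2; then dev=pass; fi
  if check_against "$cran_ref" >&2; then cran=pass; fi
  printf 'dev_upstream_check=%s\ncran_upstream_check=%s\n' "$dev" "$cran"
  # Succeed only when the fix works with both versions.
  [ "$dev" = pass ] && [ "$cran" = pass ]
}
```

A call such as `validate_both "{upstream_github_repo}" "{UPSTREAM_PACKAGE}"` prints two lines whose names mirror the check fields recorded in status.json. For quick iteration, `devtools::check()` in `check_against` could be swapped for `devtools::test()`.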

Phase 5: Document

  1. Record status in status.json

    {
      "package": "package_name",
      "status": "fixed|failed|partial_fix|needs_manual_review|source_unavailable",
      "category": "category_name",
      "files_changed": ["R/file.R", "NAMESPACE"],
      "cran_upstream_check": "pass|fail",
      "dev_upstream_check": "pass|fail",
      "notes": "Any special considerations"
    }
  2. Create diff for review

    git diff > {REVDEP_BASE_DIR}/{package}/fix.patch
  3. Create PR message (for fixed packages only)

    Write {REVDEP_BASE_DIR}/{package}/message.txt:

    Hi there, we are working on the next version of {UPSTREAM_PACKAGE} and your package was flagged in our reverse dependency checks.
    
    {Brief description of the problem and how it was resolved}
    
    {UPSTREAM_PACKAGE} will be released on {RELEASE_DATE}. If you could please send an update of your package to CRAN before then, that would help us out a lot! Thanks!
    

    The description should be specific to the fix made, e.g.:

    • "The dplyr::id() function has been removed. We updated your NAMESPACE to remove the import and added id to globalVariables() since it's used as a column name."
    • "The add argument to group_by() has been removed. We updated the call to use .add instead."

    CRITICAL: The message.txt file MUST follow this exact template format. Do not deviate from this structure. The message should:

    • Start with the exact greeting: "Hi there, we are working on the next version of {UPSTREAM_PACKAGE}..."
    • Include a brief, specific description of the problem and fix (1-3 sentences)
    • End with the exact closing about the release date and CRAN update request
    • NOT include any additional sections, headers, or formatting beyond this template
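
Since the template is fixed and only the description varies, message.txt could be rendered mechanically. The following is a sketch with a hypothetical `render_message` helper; the template text is copied from step 3.

```shell
# Sketch: render message.txt from the fixed template in step 3.
# Only the description differs between packages; the helper name is hypothetical.
render_message() {
  local upstream="$1" release_date="$2" description="$3"
  cat <<EOF
Hi there, we are working on the next version of ${upstream} and your package was flagged in our reverse dependency checks.

${description}

${upstream} will be released on ${release_date}. If you could please send an update of your package to CRAN before then, that would help us out a lot! Thanks!
EOF
}
```

For example, `render_message dplyr "January 15, 2026" "The add argument to group_by() has been removed. We updated the call to use .add instead." > message.txt` yields exactly three paragraphs with no extra headers or formatting.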

Execution Strategy

Option A: Sequential Processing

Process packages one at a time:

  • Recommended for initial runs to observe patterns
  • Lower resource usage
  • Easier to monitor and debug

for each package in package_list:
    spawn_subprocess(package)
    wait_for_completion()
    record_result()

Option B: Parallel Processing

Process multiple packages concurrently:

  • Group by fix category (packages in the same category usually need the same kind of fix)
  • Max concurrency: 5 packages at a time (STRICT LIMIT - never exceed this)
  • Wait for agents to complete before launching new ones
  • Timeout: 10 minutes per package (mark as failed and move on if exceeded)
  • Useful after workflow is validated
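
The concurrency cap and timeout can be illustrated outside of Claude Code with ordinary shell tools. This is a sketch only: it assumes GNU coreutils `timeout`, an `xargs` that supports `-P`, and a hypothetical worker executable that takes a package name and exits 0 on success. It is not how the Task tool itself schedules agents.

```shell
# Sketch: run one worker per package with a concurrency cap and a per-package timeout.
# Assumptions: GNU coreutils `timeout` is available, xargs supports -P, and the
# worker is a hypothetical executable taking a package name and exiting 0 on success.
run_pool() {
  local max_jobs="$1" timeout_secs="$2" worker="$3"
  shift 3
  # One package name per line; xargs keeps at most $max_jobs workers running
  # and starts the next one only when a slot frees up.
  printf '%s\n' "$@" |
    xargs -P "$max_jobs" -I{} sh -c '
      if timeout "$1" "$2" "$3"; then
        echo "$3 ok"
      else
        echo "$3 failed"
      fi
    ' _ "$timeout_secs" "$worker" {}
}
```

With the limits above, a run might be started as `run_pool 5 600 ./process_package.sh pkgA pkgB pkgC`, where `process_package.sh` and the package names are placeholders. A worker that exceeds 600 seconds is killed and reported as failed, and lines are printed in completion order.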

Prioritization

Suggested processing order:

  1. Tidyverse/related packages - highest visibility, may reveal patterns
  2. Simple fixes - quick wins, build confidence
  3. Largest categories - maximize impact
  4. Complex cases - may need manual intervention

Subprocess Implementation

Each package will be processed by an independent subprocess using Claude Code's Task tool with run_in_background=true:

Task(
    subagent_type="general-purpose",
    prompt="<detailed prompt with full workflow>",
    run_in_background=true
)

Important: Background execution is required so that:

  • Agents can run without requiring user permission for each tool call
  • Multiple packages can be processed in parallel
  • The main process can monitor progress and launch new agents as others complete

Each subprocess receives a comprehensive prompt containing:

  • Package name and source URL
  • Fix category (from REVDEP_ISSUE_URL)
  • Complete workflow instructions (setup, diagnose, fix, validate, document)
  • Directory structure requirements
  • Validation criteria
  • Upstream package dev install command

The subprocess operates fully independently - no shared state between packages except the summary file. This ensures:

  • Clean isolation between packages
  • No cross-contamination of fixes
  • Easy retry of individual packages if needed
  • Clear audit trail per package

Progress Tracking

Maintain {REVDEP_BASE_DIR}/_summary.md:

# Revdep Fix Progress

## Configuration
- Upstream package: {UPSTREAM_PACKAGE}
- Issue: {REVDEP_ISSUE_URL}
- Started: {date}

## Statistics
- Total: N
- Fixed: X
- Failed: Y
- In Progress: Z
- Pending: W

## By Category
| Category | Fixed | Failed | Pending |
|----------|-------|--------|---------|
| ...      | ...   | ...    | ...     |

## Package Status
| Package | Category | Status | Notes |
|---------|----------|--------|-------|
| ...     | ...      | ...    | ...   |
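
The Statistics section can be recomputed from the per-package status files instead of being maintained by hand. The sketch below uses a hypothetical `tally_statuses` helper and a deliberately crude `sed` extraction, which assumes each status.json has `"status"` on its own line as in the Phase 5 example. A real JSON parser would be more robust.

```shell
# Sketch: count packages per status by reading {base_dir}/*/status.json.
# Assumes "status" sits on its own line, as in the Phase 5 example; this is a
# crude text extraction, not a JSON parser.
tally_statuses() {
  local base_dir="$1" f
  for f in "$base_dir"/*/status.json; do
    [ -f "$f" ] || continue
    sed -n 's/^[[:space:]]*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$f" |
      head -n 1
  done | sort | uniq -c | awk '{ print $2 ": " $1 }'
}
```

Running `tally_statuses {REVDEP_BASE_DIR}` prints one `status: count` line per distinct status, sorted by status name.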

Error Handling

Package cannot be forked

  • Check if CRAN mirror exists at https://github.com/cran/{package}
  • Record as "source_unavailable" and skip
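
The choice between the maintainer's repository and the CRAN mirror can be made before calling `gh repo fork`. A minimal sketch, assuming that a source URL starting with `mailto:` is the only signal for a CRAN-only package (the `resolve_fork_source` helper is hypothetical):

```shell
# Sketch: pick the repository to fork for a package.
# Assumption: a source URL starting with "mailto:" means there is no public
# repository, so the CRAN mirror at github.com/cran is used instead.
resolve_fork_source() {
  local package="$1" source_url="$2"
  case "$source_url" in
    mailto:*) printf 'https://github.com/cran/%s\n' "$package" ;;
    *)        printf '%s\n' "$source_url" ;;
  esac
}
```

The result can be passed straight to the fork step, for example `gh repo fork "$(resolve_fork_source "$package" "$source_url")" --clone=false`. If that fork still fails, the package is recorded as "source_unavailable".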

Dependencies fail to install

  • Try installing with dependencies = FALSE
  • Record specific failing dependencies
  • May indicate upstream issues unrelated to our changes

Fix requires architectural changes

  • Mark as "needs_manual_review"
  • Document the issue clearly
  • These will be handled separately

Tests fail for reasons unrelated to upstream package

  • Document pre-existing failures
  • Verify our changes don't make things worse
  • Mark as "partial_fix" if upstream-specific issues resolved

Package uses internal/unexported functions

  • These are harder to fix as there may not be a public API replacement
  • Document what internal function was used
  • Mark as "needs_manual_review" if no clear replacement exists

Output for Review

After all packages are processed:

  1. Summary report in {REVDEP_BASE_DIR}/_summary.md
  2. Per-package patches in {REVDEP_BASE_DIR}/{package}/fix.patch
  3. Status files in {REVDEP_BASE_DIR}/{package}/status.json

Review diffs with:

cd {REVDEP_BASE_DIR}/{package}/{package}
git diff

Create PRs manually after review.

Notes

  • Do not add NEWS.md or changelog entries to fixes
  • Work autonomously - do not stop to ask for permission, make reasonable decisions and proceed
  • If a package fails or needs manual review, document it and move on to the next package
  • Keep processing until all packages are complete or have a final status
  • Prefer devtools::test() for quick iteration; use devtools::check() for final validation
  • Some test failures may be pre-existing or due to missing API keys/credentials - document these but don't block on them