ClearML Remote Training with GitHub

Problem

ClearML remote workers clone your repository in an isolated environment.
They do not have access to your local SSH keys, SSH config, or GitHub login state.

This commonly causes jobs to fail with errors like:

fatal: Could not read from remote repository
Please make sure you have the correct access rights

This document describes a stable, low-friction setup that works on shared school infrastructure and avoids recurring Git authentication problems.

Core Idea

Your local Git setup and the repository URL ClearML uses do not need to be the same.

Locally, you use SSH so you can push without entering credentials each time
ClearML clones over HTTPS, which needs no credentials as long as the repository is public

These concerns are intentionally separated.
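The relationship between the two URL forms can be sketched in a few lines of Python. This is a minimal illustration, not part of ClearML: the helper name `ssh_to_https` is hypothetical, and it assumes the standard scp-style remote (`git@HOST:OWNER/REPO.git`) without an SSH alias as the host.

```python
import re

# Matches scp-style SSH remotes such as git@github.com:OWNER/REPO.git
_SSH_REMOTE = re.compile(r"^git@(?P<host>[^:]+):(?P<path>.+?)(?:\.git)?$")


def ssh_to_https(url: str) -> str:
    """Return the HTTPS form of an scp-style SSH remote (hypothetical helper)."""
    if url.startswith("https://"):
        # Already in the form the remote worker should clone from.
        return url
    match = _SSH_REMOTE.match(url)
    if match is None:
        raise ValueError(f"Unrecognized remote URL: {url!r}")
    return f"https://{match['host']}/{match['path']}.git"


print(ssh_to_https("git@github.com:OWNER_USERNAME/YOUR_REPO.git"))
```

The printed URL is the value you would hand to ClearML as the repository, while your local origin remote keeps the SSH form.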

Requirements

One team member must:

Create a public GitHub repository
Add the other team members as collaborators
Share the repository link with the team

Local Git Setup (Do This Once)

If you have already cloned the repository, point its origin remote at the SSH URL:

git remote set-url origin git@github.com:OWNER_USERNAME/YOUR_REPO.git

From this point on, all local work (commit, push, pull) happens normally over SSH.

ClearML Setup (Force HTTPS Explicitly)

In your training script, explicitly tell ClearML which repository to clone.

Example:

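from clearml import Task

# Register the run with ClearML; the project and task names are placeholders.
task = Task.init(
    project_name='Pendulum-v1/yourname',
    task_name='Pendulum_test',
)

# Docker image the remote worker uses to run the job.
task.set_base_docker('deanis/2023y2b-rl:latest')

# Override the detected repository so the worker clones the public HTTPS URL.
task.set_script(
    repository='https://github.com/OWNER_USERNAME/YOUR_REPO.git',
    branch='your_branch_name',
    working_dir='.',
    entry_point='RL_pendulum_training.py',
)

# Stop the local run here and enqueue the task for a remote worker.
task.execute_remotely(queue_name='default')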

This ensures:

ClearML does not reuse the SSH remote URL it detects in your local clone
The remote worker clones over HTTPS
Public repos clone without authentication

Daily Workflow

Make changes locally
Commit everything you want to run remotely
Push using SSH
Run the training script
ClearML queues the job and clones over HTTPS

You do not switch remotes back and forth.
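Because the worker clones from GitHub, a commit that was never pushed will not be part of the job. A small pre-flight check can catch this before you enqueue. The sketch below is one possible approach and not a ClearML feature: the function names are hypothetical, and it assumes your branch has an upstream configured (without one, Git prints no "ahead" count and the check reports 0).

```python
import re
import subprocess


def commits_ahead(branch_header: str) -> int:
    """Parse the '## branch...upstream [ahead N]' header of `git status -sb`."""
    match = re.search(r"\[ahead (\d+)", branch_header)
    return int(match.group(1)) if match else 0


def unpushed_commits() -> int:
    """Ask Git how many local commits the upstream branch does not have yet."""
    status = subprocess.run(
        ["git", "status", "--short", "--branch"],
        capture_output=True, text=True, check=True,
    ).stdout
    lines = status.splitlines()
    # The first line is the branch header; the rest list changed files.
    return commits_ahead(lines[0]) if lines else 0


# Demonstration on a sample header, so no repository is needed to try it.
print(commits_ahead("## my_branch...origin/my_branch [ahead 2]"))
```

In a training script you could call `unpushed_commits()` before `execute_remotely` and stop with a clear message when the result is not zero.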

Key Understandings for ClearML Projects

ClearML queues jobs on a remote server
A job should represent a specific experiment, usually defined by a set of hyperparameters
ClearML works best when each job explores a clear parameter configuration
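One way to keep each job tied to a single configuration is to derive the task name from the hyperparameters themselves. The sketch below is only an illustration: the `Experiment` class, its field names, and its default values are hypothetical and not taken from the course code.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Experiment:
    # Hypothetical fields and defaults, chosen only for illustration.
    learning_rate: float = 3e-4
    batch_size: int = 64
    n_steps: int = 2048

    def task_name(self, prefix: str = "Pendulum") -> str:
        """Build a readable name that differs whenever a parameter differs."""
        parts = [f"{key}={value}" for key, value in asdict(self).items()]
        return f"{prefix}_" + "_".join(parts)


exp = Experiment(learning_rate=1e-3)
print(exp.task_name())
```

The resulting string could be passed as `task_name` to `Task.init`, so the ClearML job list shows at a glance which configuration each job explored.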

Key Parameters to Explore

Learning rate
Batch size
Number of steps
Reward function

Reward functions are the hardest to tune because they interact strongly with all other parameters.

Git Hygiene Requirements

ClearML expects a relatively clean Git state.

Uncommitted changes often cause jobs to fail
Large diffs or experimental local code make runs hard to reproduce
Always commit and push before enqueueing a job

Running ClearML from a messy working tree is a common source of failure.
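A guard at the top of the training script can refuse to enqueue from a messy working tree. This is a minimal sketch under the assumption that the script runs inside the repository; the function names are hypothetical, and it relies only on `git status --porcelain`, which prints one line per changed or untracked file and nothing when the tree is clean.

```python
import subprocess
from typing import List


def dirty_paths(porcelain_output: str) -> List[str]:
    """Return the paths listed by `git status --porcelain` (empty means clean)."""
    # Each line is a two-character status code, a space, then the path.
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def working_tree_is_clean() -> bool:
    """Run Git in the current directory and report whether anything is uncommitted."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    return not dirty_paths(result.stdout)


# Demonstration on sample output, so no repository is needed to try it.
print(dirty_paths(" M train.py\n?? notes.txt\n"))
```

Calling `working_tree_is_clean()` before enqueueing, and exiting with the list of dirty paths when it returns False, turns a confusing remote failure into an immediate local message.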

Recommended Team Repository Structure

A shared team repository should have the following properties:

Shared access for all team members
A clean, stable core training wrapper
One branch per team member for experimentation

Branch Discipline

Each team member explores parameters in their own branch
Experimental branches should remain reasonably clean
Avoid long-lived, chaotic branches with unrelated changes

Team Workflow for Hyperparameter Exploration

As a group, agree on which hyperparameters to explore
Each team member focuses on one parameter space
Avoid exhaustive grid searches; they are usually inefficient
Prefer parameter sweeps over sensible ranges
When promising values emerge, other team members adjust their sweeps
Gradually converge toward better configurations

This allows parallel exploration without stepping on each other’s work.
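As an alternative to a full grid, each team member could draw a handful of random configurations from agreed ranges. The sketch below uses random sampling as one possible way to sweep; the ranges, parameter names, and the `sample_configs` helper are assumptions for illustration, not values recommended by the course.

```python
import random


def sample_configs(n: int, seed: int = 0) -> list:
    """Draw n random configurations from hypothetical, illustrative ranges."""
    rng = random.Random(seed)  # Seeded so a teammate can reproduce the sweep.
    configs = []
    for _ in range(n):
        configs.append({
            # Sample the exponent uniformly so small and large rates are
            # equally represented on a log scale (1e-5 to 1e-2).
            "learning_rate": 10 ** rng.uniform(-5, -2),
            "batch_size": rng.choice([32, 64, 128, 256]),
            "n_steps": rng.choice([1024, 2048, 4096]),
        })
    return configs


for config in sample_configs(3, seed=42):
    print(config)
```

Giving each team member a different seed or a different range keeps the sweeps from overlapping, and narrowing the ranges after promising results matches the convergence step described above.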

Troubleshooting Appendix (Advanced)

Multiple GitHub Accounts and SSH Aliases

Some users have multiple GitHub accounts configured via SSH aliases:

Host buas_git
HostName github.com
User git
IdentityFile ~/.ssh/id_rsa_school

Do not use SSH aliases in the repository URL that ClearML sees.

Instead:

Keep the remote URL standard:
git@github.com:OWNER_USERNAME/YOUR_REPO.git
Pin the SSH key for this repo only:
git config core.sshCommand "ssh -i ~/.ssh/id_rsa_school -o IdentitiesOnly=yes"

This affects only the current repository and does not interfere with other projects.

Summary

Use SSH locally
Force HTTPS for ClearML
Commit before enqueueing
Keep branches clean
Coordinate parameter exploration as a team

This setup avoids authentication issues and scales well for shared ClearML infrastructure in coursework.

txoof/clearml_setup.md
