Saturday, 3 August 2024

Using Colab with GitHub Files

Colab supports opening Jupyter notebooks from GitHub out of the box, authorising access through OAuth. A notebook can also be pushed back to GitHub with the Colab menu File -> Save a copy in GitHub.
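Besides the File menu, Colab can also open a notebook directly from a GitHub URL: replace the `https://github.com/` prefix with `https://colab.research.google.com/github/`. The sketch below shows that rewrite. The helper name `colab_url_for` is hypothetical, and the notebook path in the example is illustrative only.

```python
def colab_url_for(github_url: str) -> str:
    """Turn a GitHub notebook URL into the equivalent Colab URL (hypothetical helper)."""
    prefix = "https://github.com/"
    if not github_url.startswith(prefix):
        raise ValueError(f"not a github.com URL: {github_url}")
    # Colab mirrors the GitHub path (owner/repo/blob/branch/file) under /github/
    return "https://colab.research.google.com/github/" + github_url[len(prefix):]

# Illustrative notebook path only; substitute a real notebook in the repository
print(colab_url_for("https://github.com/romenlaw/NaiveNeuralNetwork/blob/main/example.ipynb"))
```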

However, I also have .py files that I created and that the notebook imports. To push these files to GitHub, a GitHub personal access token (PAT) needs to be created. The instructions are available here.

Once the token is created, run the following in a Colab cell to clone the repository:

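GITHUB_ACCESS_TOKEN='put PAT here'  # the personal access token created above
# IPython replaces $GITHUB_ACCESS_TOKEN with the value of the Python variable
!git clone https://$GITHUB_ACCESS_TOKEN:x-oauth-basic@github.com/romenlaw/NaiveNeuralNetwork
%cd NaiveNeuralNetwork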
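The notebook itself is saved to GitHub from the Colab menu, so a token typed literally into the cell above would be committed along with it. The sketch below avoids that by prompting for the token at runtime with Python's standard `getpass` module, then building the same `token:x-oauth-basic` clone URL. The helper names `clone_url` and `ask_for_token` are hypothetical.

```python
from getpass import getpass
from urllib.parse import quote


def clone_url(token: str, repo: str) -> str:
    """Build the authenticated HTTPS clone URL used above (hypothetical helper)."""
    # Same token:x-oauth-basic form as the !git clone line; quote() escapes unusual characters
    return f"https://{quote(token, safe='')}:x-oauth-basic@github.com/{repo}"


def ask_for_token() -> str:
    # Prompts without echoing, so the PAT never appears in the notebook source or output
    return getpass("GitHub personal access token: ")

# Usage in a Colab cell:
#   GITHUB_ACCESS_TOKEN = ask_for_token()
#   url = clone_url(GITHUB_ACCESS_TOKEN, "romenlaw/NaiveNeuralNetwork")
#   !git clone $url
```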
Then set the git identity, and commit and push the changed files:
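# identity recorded in the commits; re-run it on a fresh VM
!git config --global user.email "my github user email"
!git config --global user.name "my github user name"
# stage the modified module and check what will be committed
!git add NaiveValue.py
!git status

!git commit -m "commit from colab"

!git push origin main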
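One thing to watch: IPython does not stop a cell when a `!` shell command fails, so a rejected `git push` is easy to miss in the output. The sketch below is a hypothetical helper, `run_checked`, that runs each command through the standard `subprocess` module, echoes its output, and raises when the exit code is non-zero.

```python
import subprocess
import sys


def run_checked(args: list[str]) -> str:
    """Run a command, echo its output, and raise if it exits with a non-zero status."""
    result = subprocess.run(args, capture_output=True, text=True)
    print(result.stdout, end="")
    print(result.stderr, end="", file=sys.stderr)
    if result.returncode != 0:
        # Surface the failure instead of letting the notebook carry on silently
        raise subprocess.CalledProcessError(result.returncode, args, result.stdout, result.stderr)
    return result.stdout

# Usage in Colab, mirroring the commands above:
#   run_checked(["git", "add", "NaiveValue.py"])
#   run_checked(["git", "commit", "-m", "commit from colab"])
#   run_checked(["git", "push", "origin", "main"])
```

Note that `git commit` exits with status 1 when there is nothing to commit, so this helper raises in that case too.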
This way, I can get around the workplace firewall constraints and fully utilise Colab and other online IDEs such as Kaggle.com.

For a development environment, it is crucial to enable auto-reload so that edits to the imported .py files take effect without restarting the runtime:
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
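The sketch below shows what the extension saves you from. It uses the standard `importlib.reload` to write a throwaway module into a temporary directory, import it, edit the file, and reload it by hand. The module name `scratch_module` is hypothetical. `%autoreload 2` does the equivalent for all modules before each cell runs. It also tries to upgrade functions and classes that were imported by name (`from module import name`), which a plain reload does not do.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway module in a temp directory, standing in for a file such as NaiveValue.py
workdir = Path(tempfile.mkdtemp())
module_file = workdir / "scratch_module.py"
module_file.write_text("VERSION = 1\n")
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

import scratch_module
print(scratch_module.VERSION)  # 1

# Edit the file on disk; the already-imported module still holds the old value
module_file.write_text("VERSION = 200\n")  # different size, so the cached .pyc is not reused
print(scratch_module.VERSION)  # still 1

# Manual reload: re-executes the module's source into the same module object
importlib.reload(scratch_module)
print(scratch_module.VERSION)  # 200
```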


One drawback of this approach is that the Colab virtual machine can be lost and reallocated, which wipes out any files stored on it. So make sure to push to GitHub every now and then.
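Before the VM is recycled, it helps to check what has not been committed yet. The sketch below parses the output of `git status --porcelain`, git's stable, script-friendly format, into a list of changed or untracked paths. The function names and the sample output are hypothetical, and paths containing unusual characters (which git quotes) are not handled.

```python
import subprocess


def changed_paths(porcelain: str) -> list[str]:
    """Extract paths from `git status --porcelain` (v1) output."""
    paths = []
    for line in porcelain.splitlines():
        if not line:
            continue
        # Each line is "XY PATH": two status characters, a space, then the path
        path = line[3:]
        # Renames are reported as "ORIG -> NEW"; keep the new name
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def unsaved_work(repo_dir: str = ".") -> list[str]:
    # Run git inside the cloned repository, e.g. the NaiveNeuralNetwork folder
    out = subprocess.run(["git", "status", "--porcelain"], cwd=repo_dir,
                         capture_output=True, text=True, check=True).stdout
    return changed_paths(out)


sample = " M NaiveValue.py\n?? scratch.py\nR  old.py -> new.py\n"
print(changed_paths(sample))  # ['NaiveValue.py', 'scratch.py', 'new.py']
```

An empty list only means the working tree is clean. Commits that were made but not yet pushed are lost with the VM too.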

❗There is another problem with this approach: the files pushed with git and the notebook saved through the Colab menu can conflict, because the two reach GitHub independently. Saving the notebook from the menu creates a commit directly on GitHub that the local clone knows nothing about, so the clone falls behind and a later git push is rejected until that commit is pulled in. Therefore, it's better to push the individual files throughout the session and only save the notebook at the end of it to avoid such conflicts.
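If the notebook has already been saved from the Colab menu mid-session, you can check the clone before pushing. The sketch below runs `git fetch` and then `git rev-list --left-right --count HEAD...@{upstream}`, which prints two counts: commits only in the local branch, and commits only on GitHub. A non-zero second count means GitHub has a commit, such as a notebook saved from the menu, that should be pulled in first (for example with `git pull --rebase`). The helper names are hypothetical.

```python
import subprocess


def parse_left_right(output: str) -> tuple[int, int]:
    """Parse "<ahead>\t<behind>" as printed by `git rev-list --left-right --count`."""
    ahead, behind = output.split()
    return int(ahead), int(behind)


def ahead_behind(repo_dir: str = ".") -> tuple[int, int]:
    def git(*args: str) -> str:
        return subprocess.run(["git", *args], cwd=repo_dir,
                              capture_output=True, text=True, check=True).stdout

    # Refresh the view of GitHub, including commits made from the Colab menu
    git("fetch")
    return parse_left_right(git("rev-list", "--left-right", "--count", "HEAD...@{upstream}"))


print(parse_left_right("2\t1\n"))  # (2, 1): two local commits to push, one remote commit to pull first
```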