Sunday, 5 May 2024

Accessing Google Drive from Colab

One drawback of the free version of Google Colab is that you cannot store files on its file system permanently. Every time the runtime is reset, the files you uploaded to its environment are lost.

Fortunately, there is an easy solution at hand: upload the files to your Google Drive and use one of the Google Drive libraries to access them from Colab programmatically. I use pydrive2 as an example.

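from google.colab import auth
from oauth2client.client import GoogleCredentials
from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive

# Authenticate the Colab user so that application default credentials are available
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)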

# Look the file up in Drive by its title and MIME type
filename = 'DataFile.csv'
mimetype = 'text/csv'
query = {'q': f"title = '{filename}' and mimeType = '{mimetype}'"}
files = drive.ListFile(query).GetList()
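
The query above interpolates the file name directly into the query string, so a name containing a single quote would break it. Here is a minimal sketch of a helper that escapes the values first, assuming the Drive query syntax's backslash escaping for single quotes. The name build_query is hypothetical and not part of pydrive2.

```python
def build_query(filename, mimetype):
    """Build a ListFile query dict with quotes and backslashes escaped.

    Hypothetical helper for illustration; not part of pydrive2.
    """
    def escape(value):
        # Escape backslashes first, then single quotes, so the added
        # backslashes are not escaped a second time.
        return value.replace("\\", "\\\\").replace("'", "\\'")

    return {'q': f"title = '{escape(filename)}' and mimeType = '{escape(mimetype)}'"}
```

It could then be used as drive.ListFile(build_query(filename, mimetype)).GetList() in place of the hand-built query.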

import pandas as pd
from io import StringIO

# There should be only one file matching the criteria
file1 = files[0]
print('title: {}, id: {}'.format(file1['title'], file1['id']))
# Download the file content as a string and parse it as CSV
content = file1.GetContentString()
dataset = pd.read_csv(StringIO(content))
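
The code above takes files[0] on the assumption that exactly one file matches. Here is a minimal sketch of making that assumption explicit, so that a missing or duplicated file fails with a clear message rather than an IndexError or a silently wrong file. The name pick_single is hypothetical, and the helper only relies on each item supporting the ['id'] lookup used above.

```python
def pick_single(files, filename):
    """Return the only matching file, or raise if there are zero or several.

    Hypothetical helper for illustration; not part of pydrive2.
    """
    if not files:
        raise FileNotFoundError(f"No file named {filename!r} found in Drive")
    if len(files) > 1:
        # List the ids so the duplicates can be told apart
        ids = ", ".join(f["id"] for f in files)
        raise ValueError(f"{len(files)} files named {filename!r} found (ids: {ids})")
    return files[0]
```

With this in place, file1 = pick_single(files, filename) would replace file1 = files[0].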

From here on, we can process the data using pandas and other libraries.