
Configuring Rclone for Unity

By Tanya Lama

Objective

Configure rclone on Unity for the transfer of files to and from cloud storage.


Note that there may be a file size limit on uploads, depending on what platform you are using for storage (e.g., the Box limit is 15GB). Any file larger will return an error. Google Drive does not appear to have an upper limit.
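Before uploading, it can help to list any files that exceed the destination's limit. The sketch below uses plain find; the helper name find_oversized and its default of 15G (the Box limit quoted above) are illustrative assumptions, not part of rclone.

```shell
# Hypothetical helper: list files under DIR that are larger than LIMIT.
# LIMIT uses find's size suffixes (k, M, G) and defaults to 15G.
find_oversized() {
    local dir="$1" limit="${2:-15G}"
    find "$dir" -type f -size "+${limit}" -print
}
```

For example, find_oversized ./myprojectspace prints nothing when every file is under the limit. rclone also offers a --max-size flag that skips files above a given size during a transfer.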

Ensure that you also have Rclone downloaded & installed on your local machine: https://rclone.org/downloads/

rclone is installed on Unity via module load.

Need to install rclone on your local machine? Follow these instructions:

Install rclone on a Mac

Download the latest version of rclone.

sudo -v ; curl https://rclone.org/install.sh | sudo bash

On macOS, the install script places rclone in /usr/local/bin (on Linux, it goes into /usr/bin).

See what you have in your project space on the cluster

du -h --max-depth=1 ./myprojectspace/ | sort -h -r

In your home directory on Unity, check to be sure rclone is installed (via module or /usr/bin):

module load rclone/1.62.2
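A quick way to confirm that the module load worked is to ask the shell where rclone lives and print its version. This is a minimal sketch; check_rclone is a hypothetical helper name, and the hint it prints assumes the module version shown above.

```shell
# Hypothetical helper: report the rclone version, or explain how to load it.
check_rclone() {
    if command -v rclone >/dev/null 2>&1; then
        # The first line of "rclone version" is the version string.
        rclone version | head -n 1
    else
        echo "rclone not found; try: module load rclone/1.62.2" >&2
        return 1
    fi
}
```

Run check_rclone after loading the module; a non-zero exit status means rclone is not on your PATH.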

Time to configure our rclone remote

rclone config

Our remote is called sbugoogledrive

Tanya has already stepped through the install instructions below to set up remote backup with Google Drive.

Need to configure a new remote?

  1. No remotes found - make a new one
    n) New remote
    s) Set configuration password
    q) Quit config
    n/s/q> n
    name> sbugoogledrive
  2. Type of storage to configure.
    Choose a number from below, or type in your own value #
    Select the number listed for Google Drive (#18 here; the number varies between rclone versions)
  3. Google Application Client Id
    client_id> press Enter to leave blank
  4. OAuth Client Secret
    client_secret> press Enter to leave blank
  5. Scope that rclone should use when requesting access from drive.
    scope> select 1 for full access to all files
  6. ID of the root folder
    root_folder_id> press Enter for default
  7. Service Account Credentials JSON file path
    service_account_file> press Enter for default
  8. Edit advanced config? (y/n)
    y/n> n
  9. Remote config. Use auto config? (Important: select No here, because Unity has no web browser to open.)
    y/n> n

Return to your local terminal

At this point rclone should direct you back to your local terminal to run a command that looks something like this:
rclone authorize "drive" "eyJzY29wZsdfsdfyaXZlIn0"
This command opens a web browser, where you log in to Google Drive using your netid and password/DUO authentication and authorize rclone for access.

Rclone wants to access your Google Account

Select Allow

Enter verification code

Copy the code from your local terminal and enter it in the Unity terminal

Configure this as a Shared Drive?

y/n> n

[sbugoogledrive]

Yes this is OK
y/e/d> y

You should have sbugoogledrive listed under Current remotes:

Current remotes:

Name Type
==== ====
sbugoogledrive drive

If unsure, run rclone config again.
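To confirm the remote without stepping through the menus, rclone listremotes prints each configured remote name followed by a colon. A minimal sketch, where has_remote is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if a remote with the given name is configured.
# "rclone listremotes" prints one remote per line, e.g. "sbugoogledrive:".
has_remote() {
    rclone listremotes | grep -Fxq "$1:"
}
```

For example, has_remote sbugoogledrive && echo "remote is configured".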

How to back up whole files or folders

Please do this in an interactive job, not on the head node

srun --pty -t 120:00 --mem=8G -p cpu bash

Moving a single file

rclone copy filetomove.txt "sbugoogledrive:directorytocopyto/"

Moving a whole folder

rclone copy ./folder "sbugoogledrive:unity_backup/folder" --skip-links
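Before a large transfer, a dry run shows what rclone would copy without transferring anything. The sketch below wraps both steps; backup_folder is a hypothetical helper name, and --dry-run and --progress are standard rclone flags.

```shell
# Hypothetical helper: preview a folder copy, then perform it.
backup_folder() {
    local src="$1" dest="$2"
    # Preview only: nothing is transferred with --dry-run.
    rclone copy "$src" "$dest" --skip-links --dry-run || return 1
    # Real transfer, with live progress output.
    rclone copy "$src" "$dest" --skip-links --progress
}
```

For example, backup_folder ./folder "sbugoogledrive:unity_backup/folder" previews and then runs the folder copy shown above.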

Moving files from remote to remote (BOX to GoogleDrive or OneDrive to BOX etc)

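The following command backs up files directly from OneDrive to Google Drive without saving a local copy (the data streams through the node running rclone). Note that bsub is the LSF job-submission command; on a Slurm cluster, submit the same rclone command with sbatch or run it inside an srun session instead.
bsub -n 1 -R rusage[mem=2000] -W 120 -q short "rclone copy onedrive:migrated/10-30-2020-HP-backup googledrive:mghpcc_backup/10-30-2020-hp-backup"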

This command requests 8 cores and a longer time limit to copy big folders quickly (about 72 GB in 1 hour):
bsub -n 8 -R rusage[mem=2000] -W 72:00 -q long "rclone copy onedrive:migrated/project_canada_lynx_wgs/output_canada_lynx_wgs googledrive:mghpcc_backup/project_canada_lynx_wgs/output_canada_lynx_wgs"
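A Slurm version of the 8-core request above might look like the following sketch. The resource values mirror the LSF request (8 cores, 2000 MB per core, 72 hours); the cpu partition name is taken from the srun example earlier, and whether that partition allows 72-hour jobs is an assumption to check against the cluster's documentation. submit_remote_copy is a hypothetical helper name.

```shell
# Hypothetical helper: submit a remote-to-remote copy as a Slurm batch job.
# Resource values are assumptions carried over from the LSF example.
submit_remote_copy() {
    local src="$1" dest="$2"
    sbatch --cpus-per-task=8 --mem=16G --time=72:00:00 --partition=cpu \
        --wrap "rclone copy '$src' '$dest'"
}
```

Load the rclone module before submitting; by default Slurm passes the submitting shell's environment to the job.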

Checking the integrity of transferred files

Now that you’ve transferred files, we’ll use rclone check.
rclone check confirms that the files in the source and destination match by comparing sizes and hashes (MD5 or SHA1) and provides a report. Ensure that you are in the same directory as the copied files.

Checking a single file

rclone check file.vcf.gz "sbugoogledrive:unity_backup/project_bat1k_longevity/"

Checking a whole folder

rclone check foldername "sbugoogledrive:unity_backup/project_bat1k_longevity/" --one-way

Output should read:
NOTICE: sbugoogledrive root ‘sbugoogledrive:unity_backup/project_bat1k_longevity/’: 0 differences found
NOTICE: sbugoogledrive root ‘sbugoogledrive:unity_backup/project_bat1k_longevity/’: 35 matching files

You can now delete files from Unity that have been backed up successfully :)
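rclone check exits with a non-zero status when it finds differences, so the deletion step can be gated on its result. A minimal sketch, where verified_backup is a hypothetical helper that only reports the outcome and leaves the actual deletion to you:

```shell
# Hypothetical helper: report whether SRC is fully present and identical at DEST.
# It never deletes anything itself.
verified_backup() {
    local src="$1" dest="$2"
    if rclone check "$src" "$dest" --one-way; then
        echo "verified: $src matches $dest"
    else
        echo "differences found: keep $src" >&2
        return 1
    fi
}
```

For example, verified_backup foldername "sbugoogledrive:unity_backup/project_bat1k_longevity/" && echo "safe to remove foldername".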

My recommendation re: data management

I back up my cluster AND my local machine to Google Drive using rclone. I then use the Google Drive desktop application on my local machine (Mac). This gives me full control over my files. NOTHING is stored locally on my Mac. I use external hard drives as secondary backup for raw reads and other important files.

Separate Instructions for Authorizing BOX

Execute the following on your local machine:

rclone authorize "box"

This opens a browser asking you to authorize Box.
rclone in your local terminal will then print an authorization code.
Paste the code into the cluster terminal. Note that this is super finicky: select EVERYTHING between the -> and <- arrows, including the curly braces.
->
{"access_token":"XXXXXXXXXXXXXXXXXXXXX","token_type":"bearer","refresh_token":"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX","expiry":"2020-08-14T10:37:06.2877126+08:00"} <-

Now it should be configured to upload directly to Box

module load rclone/1.62.2
rclone copy filetomove.txt "box:directorytocopyto"

Check Box to see if the file has been uploaded.
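Instead of opening Box in a browser, the upload can also be confirmed from the cluster terminal. rclone lsf lists the entries of a remote directory one per line; remote_has_file below is a hypothetical helper built on it.

```shell
# Hypothetical helper: succeed if FILENAME is listed in the remote directory.
remote_has_file() {
    local remote_dir="$1" filename="$2"
    rclone lsf "$remote_dir" | grep -Fxq "$filename"
}
```

For example, remote_has_file "box:directorytocopyto" filetomove.txt && echo "upload found".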

Now that you’ve transferred files, verify them with rclone check (ensure that you are in the same directory as the copied files).
rclone check confirms that the files in the source and destination match by comparing sizes and hashes (MD5 or SHA1) and provides a report:
#a single file

rclone check file.vcf.gz "box:box_backups/project_canada_lynx_WGS/"

#a whole folder

rclone check tbi "box:box_backups/project_canada_lynx_WGS/" --one-way

Output should read:

NOTICE: box root ‘box_backups/project_canada_lynx_WGS’: 0 differences found
NOTICE: box root ‘box_backups/project_canada_lynx_WGS’: 35 matching files

You can now delete files from Unity that have been backed up successfully :)

Appendix and FAQ

Find this document incomplete? Leave a comment!

tags: `tools` `scripts`

scratch

my Unity backup on 2-2-2022

[in an interactive session]

bsub -n 1 -R rusage[mem=2000] -W 120 -q short "rclone copy onedrive:migrated/pictures ./pictures"

bsub -n 1 -R rusage[mem=2000] -W 120 -q short "rclone copy onedrive:migrated/msc-thesis googledrive:mghpcc_backup/msc_thesis"

bsub -n 1 -R rusage[mem=2000] -W 4:00 -q short "rclone copy ./pictures googledrive:mghpcc_backup/pictures"

bad:
rclone check ./fq.gz googledrive:project_red_squirrel_rad/data_red_squirrel_rad/raw_reads_novogene/*.fq.gz>
bsub -n 8 -R rusage[mem=2000] -W 72:00 -q long "rclone copy onedrive:migrated/project_canada_lynx_wgs/output_canada_lynx_wgs googledrive:mghpcc_backup/project_canada_lynx_wgs/output_canada_lynx_wgs"

bsub -n 1 -R rusage[mem=2000] -W 1:00 -q short "rclone copy onedrive:migrated/project_canada_lynx_wgs/project_furbearer_book googledrive:mghpcc_backup/project_canada_lynx_wgs/project_furbearer_book"

bsub -n 1 -R rusage[mem=2000] -W 1:00 -q short "rclone copy onedrive:migrated/project_canada_lynx_wgs/rsconnect googledrive:mghpcc_backup/project_canada_lynx_wgs/rsconnect"

bsub -n 1 -R rusage[mem=2000] -W 1:00 -q short "rclone copy onedrive:migrated/project_canada_lynx_wgs/Eukaryotes_Genomes_from_NCBI_2019.csv googledrive:mghpcc_backup/project_canada_lynx_wgs/"

bsub -n 1 -R rusage[mem=2000] -W 1:00 -q short "rclone copy onedrive:migrated/project_canada_lynx_wgs/SNPs_per_CHR_pop_assignment.xlsx googledrive:mghpcc_backup/project_canada_lynx_wgs/SNPs_per_CHR_pop_assignment.xlsx"

##super fast
bsub -n 8 -R rusage[mem=2000] -W 72:00 -q long "rclone copy onedrive:migrated/project_canada_lynx_wgs/output_canada_lynx_wgs googledrive:mghpcc_backup/project_canada_lynx_wgs/output_canada_lynx_wgs"

#72g in like an hour

googledrive:mghpcc_backup/project_canada_lynx_wgs/output_canada_lynx_wgs
Total objects: 32
Total size: 8.991 GBytes (9654030492 Bytes)
[tl50a@c40b03 ~]$ rclone size onedrive:migrated/project_canada_lynx_wgs/output_canada_lynx_wgs
[tl50a@c40b03 ~]$ rclone size googledrive:mghpcc_backup/project_canada_lynx_wgs/output_canada_lynx_wgs

rclone size onedrive:migrated/box_backups (3T)
rclone size googledrive:mghpcc_backup/box_backups

onedrive:migrated/box_backups/project_canada_lynx_WGS

bsub -n 8 -R rusage[mem=2000] -W 48:00 -q long "rclone copy onedrive:migrated/box_backups/project_red_squirrel_RAD googledrive:mghpcc_backup/box_backups/project_red_squirrel_RAD"

onedrive:migrated/box_backups/project_canada_lynx_WGS/download_canada_lynx_wgs/VCFs/mLynCan4_v1.p_lynx

bsub -n 4 -R rusage[mem=2000] -W 0:45 -q short "rclone copy onedrive:migrated/box_backups/project_canada_lynx_WGS/scripts_canada_lynx_wgs/final-backup-01-06-2021 googledrive:mghpcc_backup/box_backups/project_canada_lynx_WGS/scripts_canada_lynx_wgs/final-backup-01-06-2021"

still running: rclone copy onedrive:migrated/box_backups/project_canada_lynx_WGS googledrive:mghpcc_backup/box_backups/project_canada_lynx_WGS

rclone check onedrive:migrated/box_backups/project_canada_lynx_WGS/download_canada_lynx_wgs/cleancopy_novogene googledrive:mghpcc_backup/box_backups/project_canada_lynx_WGS/download_canada_lynx_wgs/cleancopy_novogene

onedrive:migrated/box_backups/project_canada_lynx_WGS/R_canada_lynx_wgs/outliers_GEA googledrive:mghpcc_backup/box_backups/project_canada_lynx_WGS/R_canada_lynx_wgs/outliers_GEA