---
title: 'Configuring Rclone & Backing up to Google Drive/BOX/OneDrive'
disqus: hackmd
---

Configuring Rclone for Unity
===

By Tanya Lama

## Objective

Configure rclone on Unity for the transfer of files to and from cloud storage.

## Table of Contents

[TOC]

Note that there may be a file size limit on uploads, depending on which platform you use for storage (e.g., the Box limit is 15GB). Any larger file will return an error. Google Drive does not appear to have an upper limit.

## Ensure that you also have Rclone downloaded & installed on your **local** machine: https://rclone.org/downloads/

### rclone is installed on Unity via module load. Need to install rclone on your local machine? Follow these instructions:

### Mac install rclone

Download the latest version of rclone.

```
sudo -v ; curl https://rclone.org/install.sh | sudo bash
```

On a Mac, this installs rclone into /usr/local/bin/ (on Linux, the same script installs into /usr/bin/).

## See what you have in your project space on the cluster

```
# -h prints human-readable sizes, so sort with -h (not -n) to order them correctly
du -h --max-depth=1 ./myprojectspace/ | sort -h -r
```

## In your home directory on **Unity**, check to be sure rclone is installed (via module or /usr/bin):

```
module load rclone/1.62.2
```

## Time to configure our rclone remote

```
rclone config
```

## Our remote is called sbugoogledrive

Tanya has already stepped through the install instructions below to set up remote backup with Google Drive.

## Need to configure a new remote?

1. No remotes found - make a new one
   `n) New remote`
   `s) Set configuration password`
   `q) Quit config`
   `n/s/q> n`
   `name> sbugoogledrive`
2. Type of storage to configure. Choose a number from below, or type in your own value
   Select 18 for Google Drive (the number can differ between rclone versions, so pick the entry labeled Google Drive).
3. Google Application Client Id
   `client_id>` press Enter to leave blank
4. OAuth Client Secret
   `client_secret>` press Enter to leave blank
5. Scope that rclone should use when requesting access from drive.
   `scope>` select 1 for full access to all files
6. ID of the root folder
   `root_folder_id>` press Enter for default
7. Service Account Credentials JSON file path
   `service_account_file>` press Enter for default
8. Edit advanced config? (y/n)
   `y/n>` n
9. Remote config. Use auto config?
   **Important: select No here!** Unity is a remote machine without a web browser, so the authorization step has to happen on your local machine.

## Return to your local terminal

At this point rclone on Unity prints a command for you to run in your **local** terminal. It looks something like this:

`rclone authorize "drive" "eyJzY29wZsdfsdfyaXZlIn0"`

This command will open a web browser, where you will log in to Google Drive using your NetID and password/DUO authentication and authorize rclone for access.

## Rclone wants to access your Google Account

Select Allow

## Enter verification code

Copy the code from your **local** terminal and enter it in the **Unity** terminal

## Configure this as a Shared Drive?

`y/n>` n

## [sbugoogledrive] Yes this is OK

`y/e/d>` y

## You should have sbugoogledrive listed under Current remotes:

Current remotes:

`Name                 Type`
`====                 ====`
`sbugoogledrive       drive`

## If unsure, run rclone config again.

## How to back up whole files or folders

### Please do this in an interactive job, **not on the head node**

`srun --pty -t 120:00 --mem=8G -p cpu bash`

### Moving a single file

```
rclone copy filetomove.txt "sbugoogledrive:directorytocopyto/"
```

Note that `rclone copy` leaves the original file in place on Unity.

### Moving a whole folder

```
rclone copy ./folder "sbugoogledrive:unity_backup/folder" --skip-links
```

### Moving files from remote to remote (BOX to GoogleDrive or OneDrive to BOX etc)

The following command backs up files directly **from OneDrive to Google Drive** without saving them to local disk (the data streams through the node running rclone):

`bsub -n 1 -R rusage[mem=2000] -W 120 -q short "rclone copy onedrive:migrated/10-30-2020-HP-backup googledrive:mghpcc_backup/10-30-2020-hp-backup"`

This version requests 8 cores and a longer wall time, and moved a big folder quickly (72G in 1hr):

`bsub -n 8 -R rusage[mem=2000] -W 72:00 -q long "rclone copy onedrive:migrated/project_canada_lynx_wgs/output_canada_lynx_wgs googledrive:mghpcc_backup/project_canada_lynx_wgs/output_canada_lynx_wgs"`

Note that `bsub` is LSF scheduler syntax. On a Slurm cluster such as Unity, run the quoted `rclone copy` command inside the interactive job described above, or submit it with `sbatch`.

### Checking the integrity of transferred files

Now that you've transferred files, we'll use `rclone check`. `rclone check` confirms that the files in the source and destination match by comparing sizes and hashes (MD5 or SHA1) and provides a report. **Ensure that you are in the same directory as the copied files.**

### Checking a single file

```
rclone check file.vcf.gz "sbugoogledrive:unity_backup/project_bat1k_longevity/"
```

### Checking a whole folder

```
rclone check foldername "sbugoogledrive:unity_backup/project_bat1k_longevity/" --one-way
```

Output should read:

`NOTICE: sbugoogledrive root 'sbugoogledrive:unity_backup/project_bat1k_longevity/': 0 differences found`
`NOTICE: sbugoogledrive root 'sbugoogledrive:unity_backup/project_bat1k_longevity/': 35 matching files`

The key part is **0 differences found**. You can now delete files from Unity that have been backed up successfully :)

## My recommendation re: data management

I back up my cluster AND my local machine to Google Drive using rclone. I then use the Google Drive desktop application on my local machine (Mac). This gives me full control over my files. NOTHING is stored locally on my Mac. I use external hard drives as secondary backup for raw reads and other important files.

## Separate Instructions for Authorizing BOX

Execute the following on your **local machine**:

```
rclone authorize "box"
```

This opens a browser asking you to authorize Box. Rclone in your local terminal will then give you an authorization code. Paste the code into the **cluster terminal**. Note that this is super finicky. Select EVERYTHING between the ---> and <--- arrows:

`---> {"access_token":"XXXXXXXXXXXXXXXXXXXXX","token_type":"bearer","refresh_token":"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX","expiry":"2020-08-14T10:37:06.2877126+08:00"} <---`

Now it should be configured to upload directly to Box

```
# load whichever rclone version your cluster provides (see `module avail rclone`)
module load rclone/1.51.0
rclone copy filetomove.txt "box:directorytocopyto"
```

Check Box to see if the file has been uploaded.
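Instead of opening Box in a browser, the upload can also be confirmed from the cluster terminal. The following is a minimal sketch, assuming the remote is named `box` as above; `remote_has_file` is a hypothetical helper name, not part of rclone. It relies on `rclone lsf`, which prints one entry per line for the given remote folder.

```shell
# Hypothetical helper (not part of rclone): succeed if FILE is listed
# in the top level of REMOTE_DIR.
# `rclone lsf` prints one entry per line; grep -Fxq matches the whole
# line literally and stays quiet.
remote_has_file() {
    remote_dir="$1"
    file="$2"
    rclone lsf "$remote_dir" | grep -Fxq -- "$file"
}

# Example, after `module load rclone` on the cluster:
#   remote_has_file "box:directorytocopyto" "filetomove.txt" && echo "uploaded"
```

This only confirms that a file with that name exists on the remote. It does not compare contents, which is what `rclone check` is for.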
Now that you've transferred files, we'll use **rclone check** (ensure that you are in the same directory as the copied files). `rclone check` confirms that the files in the source and destination match by comparing sizes and hashes (MD5 or SHA1) and provides a report:

A single file:

```
rclone check file.vcf.gz "box:box_backups/project_canada_lynx_WGS/"
```

A whole folder:

```
rclone check tbi "box:box_backups/project_canada_lynx_WGS/" --one-way
```

Output should read:

```
NOTICE: box root 'box_backups/project_canada_lynx_WGS': 0 differences found
NOTICE: box root 'box_backups/project_canada_lynx_WGS': 35 matching files
```

You can now delete files from UMass SharedCluster that have been backed up successfully :)

## Appendix and FAQ

:::info
**Find this document incomplete?** Leave a comment!
:::

###### tags: `tools` `scripts`

## scratch

## my Unity backup on 2-2-2022 [in an interactive session]

`bsub -n 1 -R rusage[mem=2000] -W 120 -q short "rclone copy onedrive:migrated/pictures ./pictures"`

`bsub -n 1 -R rusage[mem=2000] -W 120 -q short "rclone copy onedrive:migrated/msc-thesis googledrive:mghpcc_backup/msc_thesis"`

`bsub -n 1 -R rusage[mem=2000] -W 4:00 -q short "rclone copy ./pictures googledrive:mghpcc_backup/pictures"`

bad: `rclone check ./fq.gz googledrive:project_red_squirrel_rad/data_red_squirrel_rad/raw_reads_novogene/*.fq.gz>`

`bsub -n 1 -R rusage[mem=2000] -W 1:00 -q short "rclone copy onedrive:migrated/project_canada_lynx_wgs/project_furbearer_book googledrive:mghpcc_backup/project_canada_lynx_wgs/project_furbearer_book"`

`bsub -n 1 -R rusage[mem=2000] -W 1:00 -q short "rclone copy onedrive:migrated/project_canada_lynx_wgs/rsconnect googledrive:mghpcc_backup/project_canada_lynx_wgs/rsconnect"`

`bsub -n 1 -R rusage[mem=2000] -W 1:00 -q short "rclone copy onedrive:migrated/project_canada_lynx_wgs/Eukaryotes_Genomes_from_NCBI_2019.csv googledrive:mghpcc_backup/project_canada_lynx_wgs/"`

`bsub -n 1 -R rusage[mem=2000] -W 1:00 -q short "rclone copy onedrive:migrated/project_canada_lynx_wgs/SNPs_per_CHR_pop_assignment.xlsx googledrive:mghpcc_backup/project_canada_lynx_wgs/SNPs_per_CHR_pop_assignment.xlsx"`

super fast (72g in like an hour):

`bsub -n 8 -R rusage[mem=2000] -W 72:00 -q long "rclone copy onedrive:migrated/project_canada_lynx_wgs/output_canada_lynx_wgs googledrive:mghpcc_backup/project_canada_lynx_wgs/output_canada_lynx_wgs"`

`googledrive:mghpcc_backup/project_canada_lynx_wgs/output_canada_lynx_wgs`
`Total objects: 32`
`Total size: 8.991 GBytes (9654030492 Bytes)`

`[tl50a@c40b03 ~]$ rclone size onedrive:migrated/project_canada_lynx_wgs/output_canada_lynx_wgs`

`[tl50a@c40b03 ~]$ rclone size googledrive:mghpcc_backup/project_canada_lynx_wgs/output_canada_lynx_wgs`

`rclone size onedrive:migrated/box_backups` (3T)

`rclone size googledrive:mghpcc_backup/box_backups`

`onedrive:migrated/box_backups/project_canada_lynx_WGS`

`bsub -n 8 -R rusage[mem=2000] -W 48:00 -q long "rclone copy onedrive:migrated/box_backups/project_red_squirrel_RAD googledrive:mghpcc_backup/box_backups/project_red_squirrel_RAD"`

`onedrive:migrated/box_backups/project_canada_lynx_WGS/download_canada_lynx_wgs/VCFs/mLynCan4_v1.p_lynx`

`bsub -n 4 -R rusage[mem=2000] -W 0:45 -q short "rclone copy onedrive:migrated/box_backups/project_canada_lynx_WGS/scripts_canada_lynx_wgs/final-backup-01-06-2021 googledrive:mghpcc_backup/box_backups/project_canada_lynx_WGS/scripts_canada_lynx_wgs/final-backup-01-06-2021"`

still running: `rclone copy onedrive:migrated/box_backups/project_canada_lynx_WGS googledrive:mghpcc_backup/box_backups/project_canada_lynx_WGS`

`rclone check onedrive:migrated/box_backups/project_canada_lynx_WGS/download_canada_lynx_wgs/cleancopy_novogene googledrive:mghpcc_backup/box_backups/project_canada_lynx_WGS/download_canada_lynx_wgs/cleancopy_novogene`

`onedrive:migrated/box_backups/project_canada_lynx_WGS/R_canada_lynx_wgs/outliers_GEA googledrive:mghpcc_backup/box_backups/project_canada_lynx_WGS/R_canada_lynx_wgs/outliers_GEA`
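
The paired `rclone size` calls in these notes compare a source folder with its copy by eye. A minimal sketch of scripting that comparison follows, assuming `rclone size --json` reports a `bytes` field (for example `{"count":32,"bytes":9654030492}`). `remote_bytes` and `same_size` are hypothetical helper names, not rclone commands.

```shell
# Hypothetical helper: print the total byte count that
# `rclone size --json` reports for a path.
# sed extracts the number that follows "bytes": in the JSON output.
remote_bytes() {
    rclone size --json "$1" | sed -n 's/.*"bytes":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeed only if both paths report the same,
# non-empty total size.
same_size() {
    src_bytes=$(remote_bytes "$1")
    dst_bytes=$(remote_bytes "$2")
    [ -n "$src_bytes" ] && [ "$src_bytes" = "$dst_bytes" ]
}

# Example:
#   same_size onedrive:migrated/box_backups googledrive:mghpcc_backup/box_backups \
#       && echo "sizes match"
```

Matching sizes are only a quick sanity check. `rclone check` is still the step that compares hashes file by file.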