# Cloud solution on Google Cloud Platform (GCP)
[GitHub Repo :link:](https://github.com/xu-ziwei/Met-db)
Set up the environment:
```
pip install adodbapi
pip install mysql-connector-python
pip install google-cloud-storage
```
:warning: If the `adodbapi` install fails with
`cannot import name 'build_py_2to3' from 'distutils.command.build_py'`,
run `pip install setuptools==57.5.0` and then retry the install.
## Creating an Organization:
* Organizations are associated with a domain using Google Workspace or Cloud Identity
Once you have created your Google Workspace or Cloud Identity account and associated it with a domain, your organization resource will be automatically created for you. The resource will be provisioned at different times depending on your account status:
[Migrating projects between organization resources :link:](https://cloud.google.com/resource-manager/docs/project-migration)
[Adding Organization Administrators :link:](https://cloud.google.com/resource-manager/docs/creating-managing-organization#adding_an_organization_admin)
## Price
Check the Cloud Storage bucket pricing: [price :link:](https://cloud.google.com/storage/pricing?hl=en&_ga=2.210296533.-1179392142.1695303245#europe)
## Manage your Google Cloud
* Resource Manager [quickstart :link:](https://cloud.google.com/resource-manager/docs/quickstarts)
* [object storage with the gsutil tool :link:](https://cloud.google.com/storage/docs/discover-object-storage-gsutil#cloud-shell)
### Cloud CLI
For every action in the console, there is an equivalent `gcloud` command.
The CLI can also be used to upload files to the cloud.
* [CLI introduction :link:](https://cloud.google.com/sdk/gcloud)
* [install :link:](https://cloud.google.com/sdk/docs/install)
## Cloud Info :floppy_disk:
**MySQL** :key:
**Instance ID** `met-db`
**Password:** `Metsystem`
### File Structure
**UniExplorer structure**
* Job_id
  * Acuqire_0
    * well_name
      * time
        * .bmp

**Google Cloud Storage structure**
* Job_name
  * well_name
    * Countinue 1
      * time
        * .bmp
The `e96_wells` file is stored in the `e96/` folder in the cloud.
:question: Is `e96_wells` a necessary file for opening UniExplorer? :question:
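A minimal sketch of how the local UniExplorer layout above could be walked to collect every `.bmp` image together with its folder components. The helper `iter_local_images` and the `LocalImage` record are hypothetical (not taken from `cloud.py`), and the sketch assumes images sit exactly at `Job_id/<acquire folder>/well_name/time/*.bmp` below the data root; files at any other depth are skipped.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class LocalImage(NamedTuple):
    """Folder components of one image in the local UniExplorer tree."""
    job_id: str
    acquire: str
    well_name: str
    time: str
    path: Path


def iter_local_images(root: str) -> Iterator[LocalImage]:
    """Yield every .bmp found at root/Job_id/Acquire/well_name/time/*.bmp.

    Hypothetical helper: files that are not exactly five levels deep
    (four folders plus the file name) are ignored.
    """
    root_path = Path(root)
    for bmp in sorted(root_path.rglob("*.bmp")):
        parts = bmp.relative_to(root_path).parts
        # Expect exactly: Job_id, acquire folder, well_name, time, file name
        if len(parts) != 5:
            continue
        job_id, acquire, well_name, time, _filename = parts
        yield LocalImage(job_id, acquire, well_name, time, bmp)
```

For example, `for img in iter_local_images(r"D:\DataStore"): print(img.job_id, img.well_name, img.time)` would list what an upload run has to process, which makes it easier to compare against the cloud layout.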
### Visualization and Sharing:
* App Engine or Cloud Run: If you aim to build a web-based interface to visualize or share your microscopy images, you can use App Engine for fully managed apps or Cloud Run for containerized applications. They can pull images from GCS for visualization.
* Google Data Studio: For visualizing metadata or analysis results from your microscopy images, Data Studio can help in creating interactive dashboards.
## Changing the data file structure: reading and analysing data with Python
Keep documentation on the schema, the unique identifier assignment and the automation script, to build a robust solution for storing and retrieving the microscopy images.
* Automation script
  - [x] Generate a unique storage path (EXP name)
  - [x] Create the MySQL tables on Google Cloud
  - [x] Upload the images to Google Cloud Storage
  - [x] Set up access to the cloud
    - [x] Use a JSON service-account key: `from google.cloud import storage`, then `client = storage.Client.from_service_account_json('<PATH_TO_SERVICE_ACCOUNT_JSON>')`
  - [x] Insert a new record in the Google Cloud SQL database
  - [x] Change the schema for the empty MySQL database
  - [x] Check data types when searching the `.sdf` file
  - [x] Convert data types for insertion
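The image-upload and service-account steps above can be sketched as follows. This is a minimal sketch, not the code in `cloud.py`: the helper names `upload_tree` and `open_bucket` are hypothetical, and the sketch copies a local folder one-to-one under a prefix instead of reproducing the `Job_name/well_name/...` restructuring. The Cloud Storage calls it relies on (`Client.from_service_account_json`, `Client.bucket`, `Bucket.blob`, `Blob.upload_from_filename`) come from the `google-cloud-storage` library.

```python
import os
from typing import List


def upload_tree(bucket, local_root: str, prefix: str) -> List[str]:
    """Upload every file under local_root to bucket, keeping the relative layout.

    `bucket` is anything with a google.cloud.storage.Bucket-like .blob(name)
    method. Blob names always use "/" regardless of the local OS separator.
    """
    prefix = prefix.strip("/")
    uploaded = []
    for dirpath, dirnames, filenames in os.walk(local_root):
        dirnames.sort()  # deterministic traversal order
        for filename in sorted(filenames):
            local_path = os.path.join(dirpath, filename)
            rel_parts = os.path.relpath(local_path, local_root).split(os.sep)
            blob_name = "/".join(([prefix] if prefix else []) + rel_parts)
            bucket.blob(blob_name).upload_from_filename(local_path)
            uploaded.append(blob_name)
    return uploaded


def open_bucket(key_path: str, bucket_name: str):
    """Create a Cloud Storage bucket handle from a service-account JSON key."""
    from google.cloud import storage  # requires google-cloud-storage

    client = storage.Client.from_service_account_json(key_path)
    return client.bucket(bucket_name)
```

For example, `upload_tree(open_bucket("met-raw-images.json", "<bucket_name>"), r"D:\DataStore\<job folder>", "2023_08_18_EXP7_Plate2_day0")`, where the bucket name and job folder are placeholders. Passing the bucket in as a parameter keeps the folder-walking logic testable without cloud credentials.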
## For Marc, or anyone who wants to run the code :wink:
Open the `Conda Prompt`.
At first you should see
`(base) C:\Users\oCelloScope>`
Activate the Python environment:
`conda activate Met`
Then it should show
`(Met) C:\Users\oCelloScope>`
Switch to the `D:` drive:
`D:`
You will see
`(Met) D:\>`
Go into the DataStore folder:
`cd DataStore`
If you see `(Met) D:\DataStore>` on screen, congrats :+1:. You can now go to the uploading or downloading section.
Make sure that `config.json` and `met-raw-images.json` are present in the same folder as the SQL CE file. For example, if your SQL CE file path is `D:/DataStore/DataStore.sdf`, then the config and key files should be located at `D:/DataStore/config.json` and `D:/DataStore/met-raw-images.json`, respectively.
### Uploading (local to cloud)
* Run `python cloud.py -m upload` in the Conda Prompt.
* The ID of the last uploaded job is saved in `last_update.txt`. When you run the upload command again, it continues from where it left off based on this file. An interrupted run may stop halfway through a job: e.g. if `last_update.txt` contains 23, all data for Job_id 23 has been uploaded and part of the data for Job_id 24 may have been uploaded too, so the script starts again from Job_id 24, checking and uploading its data.
* ERROR and WARNING messages are logged to the local file `migration_log.txt`.
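The resume rule described above can be sketched like this, assuming the progress file holds a single integer Job_id and jobs are processed in ascending ID order. The helper names are hypothetical and not taken from `cloud.py`.

```python
import os
from typing import Iterable, List


def read_last_uploaded(progress_file: str) -> int:
    """Return the last Job_id recorded as fully uploaded, or 0 if none is recorded."""
    if not os.path.exists(progress_file):
        return 0
    with open(progress_file) as f:
        text = f.read().strip()
    return int(text) if text else 0


def jobs_to_upload(all_job_ids: Iterable[int], last_done: int) -> List[int]:
    """Jobs after the last completed one; the first of them may be partially uploaded."""
    return sorted(job_id for job_id in all_job_ids if job_id > last_done)


def mark_uploaded(progress_file: str, job_id: int) -> None:
    """Record job_id as fully uploaded (call only after the whole job succeeded)."""
    with open(progress_file, "w") as f:
        f.write(str(job_id))
```

A run loop would then be `for job_id in jobs_to_upload(ids, read_last_uploaded("last_update.txt")): ...`, calling `mark_uploaded` only after a job finishes. Because the first pending job is retried in full, the per-file upload has to tolerate files that already exist in the bucket.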
### Downloading (cloud to local)
* Run `python cloud.py -m download` in the Conda Prompt.
* ERROR and WARNING messages are logged to the local file `migration_log.txt`.
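Both modes write their problems to `migration_log.txt`. A minimal sketch of such a log with the standard `logging` module, assuming a dedicated logger; the logger name `"migration"` and the message format are illustrative, not necessarily how `cloud.py` configures it.

```python
import logging


def make_migration_logger(log_path: str = "migration_log.txt") -> logging.Logger:
    """Return a logger that appends WARNING-and-above records to log_path."""
    logger = logging.getLogger("migration")
    logger.setLevel(logging.WARNING)  # INFO and DEBUG messages are dropped
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")  # append mode
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calls such as `log.warning("Job %s: file already exists", job_id)` or `log.error(...)` then end up in the file, while routine progress messages can still be printed to the console.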
### Interrupting
Press `Ctrl` + `C` to interrupt the script (this raises a `KeyboardInterrupt`).
## For Silja, or anyone who wants to understand the code
`config.json` includes:
```json
{
  "db_host": "<Google Cloud SQL (MySQL) host IP>",
  "db_name": "<MySQL database name>",
  "db_user": "<MySQL database user account>",
  "db_password": "<MySQL database user password>",
  "service_account_file_path": "<path to the Google service account key>",
  "bucket_name": "<Google Cloud Storage bucket name>",
  "data_source": "<path to the local SQL CE .sdf file>"
}
```
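A minimal sketch of loading this file and failing early when a key is missing. The key names are the ones listed above; the function name `load_config` and the error message are illustrative.

```python
import json
from typing import Any, Dict

# Keys documented for config.json
REQUIRED_KEYS = (
    "db_host", "db_name", "db_user", "db_password",
    "service_account_file_path", "bucket_name", "data_source",
)


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Read config.json and raise KeyError if any expected key is missing."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing: {', '.join(missing)}")
    return config
```

Note that Windows paths inside JSON need escaped backslashes (`"D:\\DataStore\\DataStore.sdf"`) or forward slashes (`"D:/DataStore/DataStore.sdf"`); a single backslash makes `json.load` fail or silently change the path.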
### `main()`
* Load the configuration from `config.json`.
* Establish connections to the MySQL and SQL CE databases.
* Set up the Google Cloud Storage client.
* Depending on the provided argument (`upload` or `download`), call the respective function.
* Finally, close all database connections and report that the data migration is complete.
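The argument handling and the "always close connections" step can be sketched as follows. Only the `-m upload` / `-m download` switch comes from this README; `parse_args`, `run` and the callables passed to it are hypothetical stand-ins for the real connection and migration code.

```python
import argparse
from typing import Callable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the `-m upload` / `-m download` switch used to run cloud.py."""
    parser = argparse.ArgumentParser(description="Migrate UniExplorer data to/from GCP")
    parser.add_argument(
        "-m", dest="mode",
        choices=["upload", "download"],
        required=True,
        help="direction of the data migration",
    )
    return parser.parse_args(argv)


def run(mode: str,
        upload: Callable[[], None],
        download: Callable[[], None],
        close: Callable[[], None]) -> None:
    """Call the action for `mode`, then always close the connections."""
    try:
        if mode == "upload":
            upload()
        else:
            download()
    finally:
        close()  # runs even if the action raises, e.g. KeyboardInterrupt on Ctrl+C
```

Wrapping the work in `try`/`finally` means the MySQL and SQL CE connections are released even when the script is interrupted with `Ctrl` + `C`; an invalid `-m` value makes `argparse` print a usage error and exit.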
### `upload_data(...)`
* Query all jobs from SQL CE.
* Skip jobs that were already uploaded (based on the `last_uploaded.txt` file).
* For each job:
  * Determine the local paths and the corresponding cloud paths for the job data.
  * If local data exists, upload it to Google Cloud Storage using the `copy_to_cloud()` function, store the paths using the `store_paths()` function, and upload the job's `e96_wells` data using the `uploade96_to_cloud()` function.
  * Insert the job into the MySQL database, including its JobTask, JobEvent, AcquireTask, AcquireSettings, InstrumentInformation and ScanArea data.
  * If inserting the data fails (for example because it already exists), log the error and move on to the next job.
### `download_data(...)`
* Contains the nested functions `fetch_filtered_data()`, `refresh_list()` and `download_selected()`:
  * `fetch_filtered_data()`: fetches job data, optionally filtered by a search term.
  * `refresh_list()`: refreshes the UI list of jobs based on a search term.
  * `download_selected()`: downloads the selected jobs from the cloud to local storage.
* Sets up a basic GUI where the user selects the jobs to download, using a checkbox next to each job. After selecting the jobs, the user clicks the `"Download Selected"` button to start downloading them.
* For each selected job, the `download_from_cloud()` function downloads the main job data and the `downloade96_from_cloud()` function downloads the `e96_wells` data.
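The download step for one job can be sketched as below. This is a hypothetical stand-in for `download_from_cloud()`, not its actual code: it assumes all objects of a job share one name prefix and mirrors them into a local folder. `Bucket.list_blobs(prefix=...)` and `Blob.download_to_filename` are `google-cloud-storage` calls.

```python
import os
from typing import List


def download_prefix(bucket, prefix: str, local_root: str) -> List[str]:
    """Download every blob whose name starts with prefix into local_root.

    `bucket` needs a google.cloud.storage.Bucket-style list_blobs(prefix=...)
    returning blobs with .name and download_to_filename(path).
    """
    written = []
    for blob in bucket.list_blobs(prefix=prefix):
        if blob.name.endswith("/"):  # skip "folder" placeholder objects
            continue
        rel_name = blob.name[len(prefix):].lstrip("/")
        if not rel_name:
            continue
        local_path = os.path.join(local_root, *rel_name.split("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        blob.download_to_filename(local_path)
        written.append(local_path)
    return written
```

Cloud Storage prefixes are plain string prefixes, so passing `"EXP7"` would also match `"EXP70/..."`; ending the prefix with `/` (e.g. `"2023_08_18_EXP7_Plate2_day0/"`) restricts the download to one job folder.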
### Database
**Insert into the Job table:**
* For each job in SQL CE, check that no job with the same Name already exists in the MySQL Job table.
* If it does not exist, insert the job into the MySQL Job table and get its auto-generated Id.

**Insert into the JobTask table:**
* For each job task in SQL CE related to the current job, prepare the data for insertion into the MySQL JobTask table.
* Use the newly generated Id from the MySQL Job table as the Job_id value for the JobTask records.
* Insert the job task data into the JobTask table in MySQL.

The same insert steps apply to the JobEvent, ScanArea, AcquireSettings, AcquireTask and InstrumentInformation tables.
What about the Scan table? :question:

**Insert into the PathStorage table:**
* Using `Job_id` as the index, insert the local path and the cloud path.
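The "insert if new, then reuse the generated Id" pattern above can be sketched as follows. To keep the sketch runnable without a server it uses an in-memory `sqlite3` database as a stand-in for MySQL, and the tables carry only the columns needed here (the JobTask `Name` column is hypothetical). With `mysql-connector-python` the same flow applies, but the placeholder is `%s` instead of `?`; `cursor.lastrowid` returns the new `AUTO_INCREMENT` value there as well.

```python
import sqlite3
from typing import Optional


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the MySQL schema (only the columns used here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Job (Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)")
    conn.execute(
        "CREATE TABLE JobTask (Id INTEGER PRIMARY KEY AUTOINCREMENT, Job_id INTEGER, Name TEXT)"
    )
    return conn


def insert_job_if_new(conn, name: str) -> Optional[int]:
    """Insert a Job row unless one with the same Name exists; return its new Id."""
    cur = conn.cursor()
    cur.execute("SELECT Id FROM Job WHERE Name = ?", (name,))
    if cur.fetchone() is not None:
        return None  # already migrated: skip instead of duplicating
    cur.execute("INSERT INTO Job (Name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid  # auto-generated Id of the inserted row


def insert_job_task(conn, job_id: int, task_name: str) -> None:
    """Insert a child row that points back to the Id generated in the Job table."""
    cur = conn.cursor()
    cur.execute("INSERT INTO JobTask (Job_id, Name) VALUES (?, ?)", (job_id, task_name))
    conn.commit()
```

Using the Id generated on the MySQL side (rather than the SQL CE Id) keeps child rows consistent even when MySQL already contains jobs from another disk.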
### Change `.sdf` schema for empty MySQL (`create_tables.sql`)
* Replaced `IDENTITY (1,1)` with `AUTO_INCREMENT`.
* Replaced `NTEXT` with `TEXT`, because MySQL has no `NTEXT` type, and `NVARCHAR` with plain `VARCHAR` (MySQL only accepts `NVARCHAR` as shorthand for `NATIONAL VARCHAR`).
* Added `PRIMARY KEY (Id)` to designate the Id column as the primary key.
* Replaced `BIT` with `BOOLEAN` for boolean columns (in MySQL, `BIT` is a bit-field type, while `BOOLEAN` is a synonym for `TINYINT(1)`).
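The type replacements listed above can be automated with regular expressions. A minimal sketch, assuming the SQL CE schema is available as `CREATE TABLE` text; `sqlce_ddl_to_mysql` is a hypothetical helper, and adding `PRIMARY KEY (Id)` is left out because it depends on each table's columns.

```python
import re

# (pattern, replacement) pairs mirroring the manual schema edits
_SQLCE_TO_MYSQL = [
    (re.compile(r"IDENTITY\s*\(\s*1\s*,\s*1\s*\)", re.IGNORECASE), "AUTO_INCREMENT"),
    (re.compile(r"\bNTEXT\b", re.IGNORECASE), "TEXT"),
    (re.compile(r"\bNVARCHAR\b", re.IGNORECASE), "VARCHAR"),
    # Caveat: a column literally named "Bit" would also be rewritten.
    (re.compile(r"\bBIT\b", re.IGNORECASE), "BOOLEAN"),
]


def sqlce_ddl_to_mysql(ddl: str) -> str:
    """Apply the SQL CE -> MySQL type renames to a CREATE TABLE statement."""
    for pattern, replacement in _SQLCE_TO_MYSQL:
        ddl = pattern.sub(replacement, ddl)
    return ddl
```

For example, `sqlce_ddl_to_mysql("Id INT IDENTITY (1,1) NOT NULL, Name NVARCHAR(50)")` gives `"Id INT AUTO_INCREMENT NOT NULL, Name VARCHAR(50)"`. The word boundaries (`\b`) stop `NTEXT` and `BIT` from matching inside longer identifiers such as `Bitmap`.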
## Notes
:red_circle: Do not delete the local `DataStore.sdf` file, because it is not uploaded to the cloud. (Do I really need the `.sdf` file in the cloud?)
:red_circle: If you want to upload from a new disk, make sure you create a new database in the cloud and a new `config.json` for the connection.
:red_circle: Make sure each experiment name is unique (e.g. `2023_08_18_EXP7_Plate2_day0`), because these names are the primary key of the `PathStorage` table.
> A system of folders and database files, where all UniExplorer data (images, analysis results etc.) is stored. See manual sections 2.2.3 and 4.4.
>
> **Data handling by oCelloScope**
>
> The UniExplorer program controls all data storage. Since very large amounts of images can be recorded and analysed, an integrated database is used to organise all data.
>
> As all data is controlled by an internal database, it is important that you do NOT directly delete files from the data area on your hard disk, but that you perform the clean-up from UniExplorer.
>
> If you need to use a network drive (DRIVE, NOT CLOUD) for (temporary) data storage, you will always need to map the data drive used (assign a letter to the drive). Otherwise you will get an error. It is not recommended to use network drives for data storage, as this will most likely slow down data analysis considerably!
>
> All UniExplorer program files are stored in one folder named “C:\UniExplorer” (unless you specified a different path during installation.) Several program files and some sub-folders are in this folder. All data (images, analysis results etc.) is saved in the DataStore folder (named “DataStore” unless you renamed it during installation.) The folder is in the path you specified during installation.
>
> You should never manipulate data directly from the Windows File Manager, but do all handling from within UniExplorer, as a database is used to keep track of all your jobs and their related files. If you manipulate data directly, it will lead to errors when using UniExplorer.