ecohealthalliance
# Using EHA's High-Performance Servers

[![hackmd-github-sync-badge](https://hackmd.io/6Jj4m4CYRcKOXZjqeIWNlg/badge)](https://hackmd.io/6Jj4m4CYRcKOXZjqeIWNlg)

EHA has two high-performance Linux servers which can be used for modeling and analysis work:

- ***Aegypti***: 2 10-core 3.0GHz processors (40 virtual cores), 256GB of RAM
- ***Prospero***: 2 18-core 3.7GHz processors (72 virtual cores), 512GB of RAM, 4 GPUs (2 NVIDIA GTX, 2 NVIDIA 1080)

These servers can be accessed from anywhere you have Internet access, and are excellent for long-running, compute- or memory-intensive jobs.

Please read this entire guide before using one of the servers, as well as the rest of the [EHA modeling and analytics guide](https://ecohealthalliance.github.io/eha-ma-handbook/).

- [Getting an account](#getting-an-account)
- [RStudio interface and R Setup](#rstudio-interface-and-r-setup)
- [Hard Disks](#hard-disks)
- [Moving files to and from the server disk](#moving-files-to-and-from-the-server-disk)
- [Shell interface](#shell-interface)
- [Making the most of the servers' computing power.](#making-the-most-of-the-servers-computing-power.)
- [Server Etiquette and Communication](#server-etiquette-and-communication)
- [Backup and Redundancy](#backup-and-redundancy)
- [Shiny Apps](#shiny-apps)
- [Installing more Software and Reporting Bugs](#installing-more-software-and-reporting-bugs)
- [Architecture](#architecture)

## Getting an account

To get an account on the server, contact the admins (currently Robert Young (young@ecohealthalliance.org)) and provide a preferred username and password.

## RStudio interface and R Setup

For much of your work, you can use the RStudio interface to the servers, which is very similar to RStudio Desktop. Just visit <https://aegypti.ecohealthalliance.org/rstudio> or <https://prospero.ecohealthalliance.org/rstudio>.

Both machines have a large number of R packages pre-installed, but you are free to install additional packages that you need.
These packages are stored in your own directory and are accessible only to you; installing them does not affect other users. Prospero has GPU-enabled versions of some R and Python packages, such as **xgboost** and **TensorFlow**.

## Hard Disks

There are approximately 7 terabytes of space on the shared hard disk. Whether you log on to `prospero` or `aegypti`, you'll find your files are the same. The two computers share a common hard drive for user files, so you can easily switch computers without moving your work. Note that, since RStudio saves information about your session on the hard disk, you will likely experience some issues if you try to use RStudio for the same project on both machines at once.

Because the disk is network-attached, it can be slightly slower than direct hard disk access. If you are running a process where hard disk speed is essential (this will be rare), place files in the `local/` folder inside your home directory. This always points to a location on the local hard disk and will have faster read/write speeds.

If you are working on a project together with others and wish to use a common dataset that is too large for GitHub, you can place it under the `shared/` directory in your home directory. Note that everyone has the ability to read and write in this directory (with the exception of some subdirectories), so be careful what you edit! Make your own project subdirectory.

If you have files on the server drive that you aren't actively working on but will likely pick up again, transfer them to the `archive/` folder in your home directory. This stores the files on a separate 4TB hard drive, which is slower than other locations.

### Moving files to and from the server disk

The preferred way to transfer code between a local computer and the server disk is git and GitHub. If you are working on a project locally, push it to GitHub, and pull it down from the server.

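
The push-then-pull round trip can be sketched as follows. This is only an illustration: a temporary bare repository stands in for GitHub so the commands run without a network connection, and the directory names (`laptop`, `server`), the file `analysis.R`, and the demo identity are all made up. In real use, the remote would be your project's GitHub URL.

```shell
# Scratch area for the demo; "github.git" is a stand-in for the GitHub repository.
work=$(mktemp -d)
git init --quiet --bare "$work/github.git"

# On the "local computer": clone, commit a script, and push it.
git clone --quiet "$work/github.git" "$work/laptop" 2>/dev/null
cd "$work/laptop"
echo 'print("hello")' > analysis.R
git add analysis.R
git -c user.name=demo -c user.email=demo@example.org \
  commit --quiet -m "Add analysis script"
git push --quiet origin HEAD

# On the "server": clone once, then pull whenever there are new commits.
git clone --quiet "$work/github.git" "$work/server"
cd "$work/server"
git pull --quiet --ff-only
ls
```

After the first clone, only the `git push` (local) and `git pull` (server) steps need repeating.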
RStudio on the server also has buttons in the "Files" pane to move files up and down. There's an "Upload" button in the pane, and an "Export" button under "More" to download.

Finally, you can use the `scp` command from the shell to move files in bulk (see below and [this tutorial](https://hpc-carpentry.github.io/hpc-intro/15-transferring-files/index.html)).

## Shell interface

When running long-running jobs, it is usually preferable to run a self-contained script in the background on the server rather than using the RStudio console. A script is less likely to hang, easier to kill, and running it does not block you from continuing to use the RStudio interface.

The simplest way to do this is to write R scripts that are self-contained, and run them from the shell with the command `Rscript`, like so:

    $ Rscript path/to/my/script.R

Note that you should always navigate to the top level of your project directory when running code from the terminal. To avoid issues related to the path you should [use the **here** package in your R scripts](https://github.com/jennybc/here_here) to simplify file paths.

- If you are not already familiar with using the shell, take some time to learn it. We suggest the [Software Carpentry](https://swcarpentry.github.io/shell-novice/) lessons.
- You can use the shell from the RStudio interface by going to the *Tools \> Terminal \> New Terminal* menu, and a terminal pane will open next to your R console.
- You can connect to the shell directly without RStudio from your computer using the SSH (secure shell) program. From a shell terminal on your computer run `ssh -p 22022 username@computer.ecohealthalliance.org`.
- To avoid having to enter a password, you should set up a *public/private keypair*, where you store a public key on the server that matches a private key on your computer. Instructions to set this up are found [here](https://hacker-tools.github.io/remote-machines/#ssh-keys).
- We also recommend you set up an SSH `config` file to simplify connecting to the host. Create a file in your computer's home directory called `~/.ssh/config`. On a Mac, you can do so by opening the terminal and typing

      mkdir -p ~/.ssh
      touch ~/.ssh/config
      open -e ~/.ssh/config

  Paste the following in this file, change `yourusername` to your user name, save and close:

      Host aegypti aegypti.ecohealthalliance.org
        HostName aegypti.ecohealthalliance.org
        User yourusername
        Port 22022
        StrictHostKeyChecking no
        UserKnownHostsFile /dev/null

      Host prospero prospero.ecohealthalliance.org
        HostName prospero.ecohealthalliance.org
        User yourusername
        Port 22022
        StrictHostKeyChecking no
        UserKnownHostsFile /dev/null

  With this in place, you can log in by typing `ssh aegypti` or `ssh prospero` in the terminal. It will also ensure that when we change the servers' configuration, your computer doesn't panic and think there's a security issue.
- `mosh` is an alternative to SSH that is more robust to intermittent Internet connections. Once you have done the `.ssh/config` setup above, you can drop in mosh instead of ssh to connect, e.g., `mosh aegypti` or `mosh prospero`. Get mosh at <https://mosh.org/>.
- When you log in via SSH or the RStudio terminal, you will be dropped into a program called [`byobu`](http://byobu.co/), which is a thin overlay over the shell. The important thing about `byobu` is that it persists even if you close the window, so your scripts can run when you are not connected. You can press F1 to configure `byobu` or opt not to use it. Between `mosh` and `byobu`, you'll almost never lose your session. (More on `byobu`, including using it for multiple sessions, [here](https://simonfredsted.com/1588).)
- If you want to check your jobs on the go, you can reach the servers from your phone! The RStudio interface doesn't work that well on mobile screens, but you can use the shell interface. [JuiceSSH](https://juicessh.com/) is a good SSH and mosh client for Android.
  Recommendations for iOS apps are welcome.
- The default editor for the shell on the servers is [`micro`](https://micro-editor.github.io/), which is minimal and fairly easy to use (Ctrl-S to save, Ctrl-Q to quit, Ctrl-E for help). [`nano`](https://www.nano-editor.org/) and [`vim`](https://www.vim.org/) are also installed if you want to [change your default editor](https://www.a2hosting.com/kb/developer-corner/linux/setting-the-default-text-editor-in-linux).
- Some other helpful resources for SSH, shell and related tools include [this Software Carpentry lesson](http://v4.software-carpentry.org/shell/ssh.html), [this series of course notes](https://hacker-tools.github.io/lectures/), and the book *Bioinformatics Data Skills*, which you can borrow from Noam.

## Making the most of the servers' computing power.

In most cases, R tasks will not go any faster on a high-performance machine. To take advantage of more power, you generally have to *parallelize* your code. I am partial to this [quick intro to parallelization in R](http://librestats.com/2012/03/15/a-no-bs-guide-to-the-basics-of-parallelization-in-r/), and the [**furrr**](https://davisvaughan.github.io/furrr/) package is a quick way to get going with parallelization if you're already using the **purrr** package for repeat tasks. Note that, as the servers run Linux, you can use the simpler parallelization options such as `mclapply()` and `future(multiprocess)`, which do not require setting up a cluster as on Windows.

The servers are also useful if you are running code that needs a large amount of memory (often big geospatial analyses), or just something that needs to run all weekend while your computer does other things.

If you are running big jobs, there are probably good ways to make them more efficient, so that they use fewer resources and finish faster. The short online book [*Efficient R Programming*](https://csgillespie.github.io/efficientR/index.html) is an excellent primer on how to speed up code.

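
When deciding how many parallel workers to start, a rough rule of thumb is to take the smaller of the idle cores and the number of workers that fit in free memory, since each worker may hold its own copy of the data. The helper below is a hypothetical sketch of that arithmetic, not an installed tool; `suggest_workers` and its arguments are invented for illustration, and all inputs are whole numbers you read off `htop` or the `byobu` status bar.

```shell
# Suggest a worker count for a parallel job (hypothetical helper).
# Arguments, all integers: total cores, cores currently busy,
# free RAM in GB, RAM in GB that a single worker needs.
suggest_workers() {
  by_cpu=$(( $1 - $2 ))   # idle cores
  by_ram=$(( $3 / $4 ))   # workers that fit in free memory (rounded down)
  if [ "$by_cpu" -lt "$by_ram" ]; then
    workers=$by_cpu
  else
    workers=$by_ram
  fi
  # Never suggest fewer than one worker.
  if [ "$workers" -lt 1 ]; then
    workers=1
  fi
  echo "$workers"
}

# Example: 40 cores with about 10 busy, 188 GB of RAM free, 8 GB per worker.
suggest_workers 40 10 188 8
```

In this example memory, not CPU, is the limiting factor: 30 cores are idle, but only 23 workers of 8 GB each fit in 188 GB.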
Finally, your scripts should save your results without intervention, and it is good practice to have them save results from intermediate steps and parallel processes in case they are interrupted and so you can monitor progress. We recommend using the [`targets`](https://ecohealthalliance.github.io/eha-ma-handbook/2-projects.html#targets) package for many such jobs.

## Server Etiquette and Communication

Aegypti and Prospero are shared resources and only work if we use them politely.

The servers are *not* good for:

- Storing private collections of data. *There is no expectation of privacy on the servers*. Admins have access to everyone's folders. If you need them for a sensitive project, please contact an admin.
- Long-term data or code archival. The servers have a lot of space but it can quickly fill up with simulated or short-term data sets from other sources. In general, you should be storing your code on GitHub, and data in other locations such as Amazon S3, Azure Blob Storage, or the archive disk (see above under "Hard Disks").

Please make way for others to use the servers. We have an \#eha-servers Slack room for coordinating their use. Check in there if you have questions or before running a big job.

Several shell utilities are useful for checking on your own and others' usage of the servers.

- The `byobu` status bar at the bottom of the terminal shows a quick summary of machine usage. For instance, `40x10.18 251.8G25% 7.0T14%` means that 10.18 of 40 cores are being used, 25% of 251.8GB of memory is being used, and 14% of 7TB of hard disk is being used.
- Running `htop` in the terminal will pull up a detailed display of total CPU and memory usage, as well as a list of all running processes on the machine and their individual CPU and memory usage. You can sort and filter processes by usage or user, and also kill your own processes here. Press `?` for help on keyboard shortcuts and `q` to quit.
  You should use `htop` to check on currently running jobs before starting yours, and to monitor your own usage during a big run.
- Running `ncdu` in any folder pulls up a nice interface for finding your biggest files and deleting them. Press `?` for help and `q` to quit.
- The `nice` command lets you run scripts at a lower priority. If you have a long-running job, use this to get out of the way of short scripts. `nice -n 10 Rscript path/to/my/script.R` will run your script at lower priority. For regular users, niceness ranges from 0 (the default) to 19 (lowest priority). You can also adjust the niceness of an ongoing process from `htop`. Note that users can only make processes nicer, not higher-priority.
- In R, you can view memory used by objects in the RStudio "Environment" pane or in total with `lobstr::mem_used()`, or [profile](https://support.rstudio.com/hc/en-us/articles/218221837-Profiling-with-RStudio#profiling-memory-example) your code to see how it uses memory.

Beware some common disk and memory hogs:

- When thinking about how much RAM you might need, remember that many implementations of parallelized code will make copies of the objects being manipulated to send to each core. This potentially multiplies your RAM usage by the number of cores you run on: if you were only using 8GB, but parallelize to 20 cores, you could end up taking up 160GB of RAM! Before running a large job over many cores, it makes sense to take a look at its memory footprint on just one core.
- Your RStudio environment can take up considerable RAM, so please clear it when not in use. Closing the browser window of an RStudio session will not immediately quit your session, and you will still be holding on to memory from the objects in your environment. To explicitly exit your RStudio session in a browser, go to `File -> Quit Session` in the menu bar. This will quit your session and free up RAM for other users.
  This also frees up disk space, as RStudio keeps a mirror of memory on disk to recover from crashes or in long idle periods.
- Very large files can slow down operations considerably when they are being tracked by git, as RStudio continually scans them for updates. Add large files or directories of many files to a [`.gitignore`](https://git-scm.com/docs/gitignore) file to avoid tracking them.
- `.Rproj.user` directories in each of your R projects hold large clones of your sessions. They might be orphaned after a session crash. These files can be safely deleted, as can files named `core` that are generated after a crash.
- Don't save your session environment. The `.RData` files created can take up a lot of space. Explicitly save only important objects you create individually. `saveRDS()` is a better choice for saving objects than `save()` because it avoids name conflicts on load. Save objects with `saveRDS(object, "filename")` and load them with `object <- readRDS("filename")`.

## Backup and Redundancy

While the servers are not for long-term storage, we do back up information on the servers off-site for rapid recovery after disaster or other catastrophic loss. We can restore from backup any user data changed or deleted in the past 30 days. In the event of loss of the servers, we can restore the server virtual machines and backed-up data to cloud-based servers. Contact a server admin if you need to restore lost data.

Note that RStudio project states (`.Rproj.user/`) and user `tmp/` directories are not backed up.

## Shiny Apps

We no longer host Shiny Apps on the servers. If you need to host Shiny Apps, R Markdown reports, or similar, use our RStudio Connect server (connect.eha.io).

## Scheduled jobs

The servers can run regularly scheduled jobs using `cron`, but permission to do this is assigned on a per-user basis. Contact an admin if you need to run scheduled jobs.

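
As a rough sketch of what a scheduled job might look like once an admin has granted you `cron` access, the snippet below prints a crontab entry that runs an R script every night at 02:00 at reduced priority. The project path, script name, and log file are hypothetical placeholders; the five leading fields are the standard crontab schedule (minute, hour, day of month, month, day of week).

```shell
# Build and print a crontab entry for a nightly low-priority R job.
# project_dir, script, and the log file name are placeholders; adjust to your project.
project_dir="$HOME/my-project"
script="scripts/nightly.R"
cron_line="0 2 * * * cd $project_dir && nice -n 10 Rscript $script >> nightly.log 2>&1"
printf '%s\n' "$cron_line"
```

You would then run `crontab -e` on the server and paste the printed line into the editor. Changing into the project directory first keeps relative paths in the script working, in line with the advice above to run scripts from the top level of the project.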
## Installing more Software and Reporting Bugs

Users can install some software on the system using [Homebrew](https://docs.brew.sh/Homebrew-on-Linux). It will install the software in your home directory, so it will be available only to you. You will first need to run the following command in the shell on either aegypti or prospero to add the binary directory to your path:

    eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"

This only needs to be run once. After that, installing most software with Homebrew is as simple as typing the following:

    brew install name-of-software-package

You can also use the `brew` command to get information before installing. For instance:

```
➜  ~ brew search bayes
==> Formulae
mrbayes ✔
➜  ~ brew info mrbayes
mrbayes: stable 3.2.7 (bottled), HEAD
Bayesian inference of phylogenies and evolutionary models
https://nbisweden.github.io/MrBayes/
/home/noamross/.linuxbrew/Cellar/mrbayes/3.2.7_2 (21 files, 7.2MB) *
  Poured from bottle on 2021-12-09 at 10:06:52
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/mrbayes.rb
License: GPL-3.0-or-later
==> Dependencies
Build: pkg-config ✔
Required: beagle ✔, open-mpi ✘
==> Options
--HEAD
	Install HEAD version
==> Analytics
install: 131 (30 days), 279 (90 days), 295 (365 days)
install-on-request: 131 (30 days), 279 (90 days), 295 (365 days)
build-error: 0 (30 days)
➜  ~ brew install mrbayes
```

Note that, like R packages, software installed in this way may need to be updated or reinstalled when we upgrade the server operating system.

If there's a program you cannot install with Homebrew, or you're having problems with a Homebrew install, please [file an issue in this repository](https://github.com/ecohealthalliance/eha-servers). Administrative users can often install needed programs quickly, but if you want something as part of the long-term setup for all servers you should note this, so it can become part of our regular upgrades.

Note that user shell configurations (or anything else configured outside of user home directories) will also reset when we upgrade the servers.

## Architecture and Maintenance

Technically, users only have access to Docker containers (virtual machines) running on top of the base computers. Those Docker containers are defined in files in the [`reservoir` repository](https://github.com/ecohealthalliance/reservoir/). If you wish to make a change to the config, such as adding a new program, you can make a pull request or file an issue in that repository.

We expect to restart the servers for maintenance occasionally and will post a warning in the \#eha-servers channel. This will usually occur a few weeks after the release of a new R version, though not only then.

We manage the containers, the base machines, and other components like user accounts using [Ansible](https://www.ansible.com/). This repository contains the Ansible configurations, and the [`admin.md`](admin.md) file provides administrative documentation.

Servers are backed up with [duplicity](http://duplicity.nongnu.org/) to [Backblaze B2](https://www.backblaze.com/b2/cloud-storage.html).
