# JULIA FOR STATISTICS AND DATA SCIENCE. YOUR FIRST STEPS IN STATISTICS AND INTEGRATION WITH YOUR CURRENT PLATFORM.
*An [SSA](https://www.statsoc.org.au/) workshop by [Yoni Nazarathy](https://yoninazarathy.com/), July 8-9, 2021.*
Last edit: Day of July 8. (Subject to mild updates).
---
Welcome to the workshop!
The workshop spans two days. On each day the schedule is as follows (times are AEST):
* **9am – 11am** – Lecture by the instructor (live online Q&A is encouraged).
* **11am – 11:30am** – Structured activity with online support by the instructor.
* **11:30am – 3pm** – Time **without** the instructor. Participants are encouraged to continue working on the activity and prepare questions. The Zoom link can still be used for chat between peers.
* **3pm – 4pm** – Meeting again with the instructor to share progress on the activity, get feedback, and explore further.
Please use this [Zoom link](https://us02web.zoom.us/j/88573062519?pwd=M2c5TllmdVZ0b0dLNzRkMjk2L3BSdz09) only after you have registered for the event.
Code and data for the workshop are in this [GitHub Repo](https://github.com/yoninazarathy/2021SSA-Julia/). You can also view the main Jupyter notebook for day 1 via [NBViewer](https://nbviewer.jupyter.org/github/yoninazarathy/2021SSA-Julia/blob/master/2021-SSA-Julia-Workshop-Nazarathy.ipynb), and similarly the notebook for [day 2](https://nbviewer.jupyter.org/github/yoninazarathy/2021SSA-Julia/blob/master/Day2.ipynb).
Note that on the first day (Thursday) there will be an **install party** starting at 8am (also on the Zoom link). The instructor will be available to help with installation for those who haven't yet installed the software.
## Day 1 (Thursday) focuses purely on Julia in the Jupyter environment.
The topics and activities for this day are:
1. **About Julia**: A quick summary of what to expect and key resources.
    a. [Julia downloads](https://julialang.org/downloads/) and [Platform specific instructions](https://julialang.org/downloads/platform/)
    b. [Main Julia page](https://julialang.org/)
    c. [Julia docs](https://docs.julialang.org/en/v1/); see also [The Julia Express](https://github.com/bkamins/The-Julia-Express)
    d. [JuliaCon](https://juliacon.org/2021/)
    e. [Julia Discourse](https://discourse.julialang.org/)
    f. [Julia packages](https://juliapackages.com/)
    g. [Julia VSCode](https://www.julia-vscode.org/)
    h. [Statistics with Julia book](https://statisticswithjulia.org/)
    i. [UQ Course "Math for programming" to use Julia](https://courses.smp.uq.edu.au/MATH2504/)
2. **Installation and basic usage** (see also installation notes below). The REPL and [Julia for VSCode](https://www.julia-vscode.org/) will be demonstrated briefly, but most of the work will be done in [Jupyter notebooks](https://www.edureka.co/blog/wp-content/uploads/2018/10/Jupyter_Notebook_CheatSheet_Edureka.pdf).
3. **Julia language basics**: variables, types, flow control, functions, structs, arrays, broadcasting, and more.
4. **Basic scripting activities**: plotting, basic statistics, linear algebra, working with files, working with data frames. Using the in-built packages `Random`, `LinearAlgebra`, and `Statistics`, and additional key packages [`Plots.jl`](http://docs.juliaplots.org/latest/), [`DataFrames.jl`](https://dataframes.juliadata.org/stable/), [`CSV.jl`](https://csv.juliadata.org/stable/), [`HTTP.jl`](https://github.com/JuliaWeb/HTTP.jl), and [`JSON.jl`](https://github.com/JuliaIO/JSON.jl).
5. **Using the package manager** via the built-in package [`Pkg`](https://docs.julialang.org/en/v1/stdlib/Pkg/).
6. **Calling R or Python from Julia**: Usage of [RCall.jl](https://juliainterop.github.io/RCall.jl/stable/) and [PyCall.jl](https://github.com/JuliaPy/PyCall.jl).
7. **Start of main activity**: a *Monte Carlo epidemic simulation* (see information below). This activity will be worked on individually during the 11:30am – 3pm time slot.
8. **Solution:** At 3pm the instructor will rejoin to help with a solution and answer specific questions.
## Day 2 (Friday) is about further Julia packages and integration of Julia code in R or Python.
The topics and activities for this day are:
1. **Recap** of the *Monte Carlo epidemic simulation* from the previous day, focusing on several other Julia features.
1. **A few key stats packages**: Usage of the following packages, relating them to the main activity:
    a. [StatsBase.jl](https://juliastats.org/StatsBase.jl/stable/)
    b. [StatsPlots.jl](https://github.com/JuliaPlots/StatsPlots.jl)
    c. [Distributions.jl](https://juliastats.org/Distributions.jl/latest/)
    d. [GLM.jl](https://juliastats.org/GLM.jl/stable/)
    e. [Flux.jl](https://fluxml.ai/Flux.jl/stable/)
1. **Language integration in your current language:** Calling Julia from R and Python using [JuliaCall](https://cran.r-project.org/web/packages/JuliaCall/index.html) for R and [PyJulia](https://pyjulia.readthedocs.io/en/stable/) for Python. Note that with Julia 1.6 there is currently a [problem](https://discourse.julialang.org/t/segfault-when-jl-init-in-r-also-python-with-julia-1-6-on-macos/58110), so Julia 1.5 is needed for this part. See the [JuliaCall CRAN Ref](https://cran.r-project.org/web/packages/JuliaCall/JuliaCall.pdf).
1. **Start of main activity:** Statistical analysis of output from the Monte Carlo experiment in *either* your current language (R or Python) or directly in Julia, or both.
## The main *Monte Carlo epidemic simulation*
![Epidemic simulation animation](https://github.com/yoninazarathy/2021SSA-Julia/blob/master/epidemic-animation.gif?raw=true)
We create an agent-based simulation of epidemics in which agents move in a rectangular space and infect each other.
Each agent is in one of the states `S` (susceptible), `I` (infected), or `R` (removed/recovered). That is, this is a form of an SIR model. Time is in discrete steps. The infection duration is a random variable.
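As a rough illustration of the state logic described above, the sketch below encodes the three states and a random infection duration. The names `HealthState`, `Agent`, `infect!`, and `advance!`, as well as the uniform duration on `5:15` time steps, are assumptions made for this example and are not taken from the workshop notebook.

```julia
using Random

# The three states from the text: S (susceptible), I (infected), R (removed/recovered).
# Long names are used here to avoid a clash with `LinearAlgebra.I`.
@enum HealthState susceptible infected removed

# Hypothetical agent record: grid position, current state, remaining infected steps.
mutable struct Agent
    x::Int
    y::Int
    state::HealthState
    steps_left::Int
end

# Assumption: the infection duration is uniform on 5:15 time steps.
random_duration(rng::AbstractRNG) = rand(rng, 5:15)

# S -> I: only a susceptible agent can become infected.
function infect!(agent::Agent, rng::AbstractRNG)
    if agent.state == susceptible
        agent.state = infected
        agent.steps_left = random_duration(rng)
    end
    return agent
end

# One discrete time step: count down and move I -> R when the duration runs out.
function advance!(agent::Agent)
    if agent.state == infected
        agent.steps_left -= 1
        if agent.steps_left <= 0
            agent.state = removed
        end
    end
    return agent
end
```

A removed agent is never reinfected in this sketch, which matches the usual SIR convention; any other distribution for the duration can be swapped into `random_duration`.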
#### Key parameters
* Number of agents (we will use $10^n$ for $n=1,2,3,4,5$).
* Density of agents (assumed fixed). The physical space is also fixed, so the number of agents is roughly fixed.
* Aspect ratio of the space (assumed quite low). The space is **a rectangular discrete grid** with an aspect ratio far from $1$ (a very narrow rectangle).
* Initial infection probability (assumed fixed).
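One possible way to turn these parameters into grid dimensions is sketched below. It assumes that density means agents per grid cell and that the aspect ratio is height divided by width; the helper `grid_dims` and the values `0.1` and `0.04` are illustrative choices, not values from the workshop.

```julia
# Hypothetical helper: grid size (width, height) for a given number of agents.
# Assumptions: density = agents per grid cell, aspect = height / width (0 < aspect <= 1).
function grid_dims(n_agents::Integer, density::Real, aspect::Real)
    area = n_agents / density                          # number of grid cells needed
    width = max(1, round(Int, sqrt(area / aspect)))    # long side
    height = max(1, round(Int, sqrt(area * aspect)))   # short side
    return width, height
end

# Grid sizes for 10^n agents, n = 1, ..., 5, at an illustrative density and aspect ratio.
for n in 1:5
    w, h = grid_dims(10^n, 0.1, 0.04)
    println("agents = ", 10^n, ": grid ", w, " x ", h)
end
```

Because of rounding, `width * height` only approximates the target area, which is in line with the number of agents being "roughly" fixed for a given space.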
At each time step an agent moves randomly from its slot to one of the 9 possible slots (the 8 neighbouring slots or staying in place). Agents do not cross the boundary, so a thin rectangle affects how infections spread.
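The movement rule can be sketched as follows. How the boundary is enforced is not specified here, so this example assumes that a move which would leave the grid is clamped back to the edge; the function names `move` and `random_walk` are hypothetical.

```julia
using Random

# One random move on a width x height grid with cells (1:width, 1:height).
# The offsets (dx, dy) range over {-1, 0, 1}^2, giving 9 options including staying put.
# Assumption: a move that would leave the grid is clamped back to the edge.
function move(x::Int, y::Int, width::Int, height::Int, rng::AbstractRNG)
    dx = rand(rng, -1:1)
    dy = rand(rng, -1:1)
    return clamp(x + dx, 1, width), clamp(y + dy, 1, height)
end

# Walk for n_steps from the corner (1, 1) and return all visited positions.
function random_walk(n_steps::Int, width::Int, height::Int, rng::AbstractRNG)
    x, y = 1, 1
    path = [(x, y)]
    for _ in 1:n_steps
        x, y = move(x, y, width, height, rng)
        push!(path, (x, y))
    end
    return path
end

# A thin 50 x 3 grid: the walker hits the short-side boundary often.
path = random_walk(1000, 50, 3, MersenneTwister(1))
```

With clamping, an agent on an edge is more likely to stay on that edge than an interior agent is to stay in place. An alternative that avoids this is to redraw the move until it lands inside the grid.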
An ideal end goal is to investigate how the aspect ratio affects the course of the infection, and in particular the final number of infected. Regression analysis will be used for this. An intermediate goal is to observe the course of the infection.
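For the regression step, a dependency-free stand-in is a least-squares line fitted with Julia's backslash operator. This is only a sketch: the workshop itself lists GLM.jl for this purpose, and the helper name `fit_line` is an assumption. The check at the end uses points placed exactly on a known line to verify the function; it is not simulation output.

```julia
using LinearAlgebra

# Ordinary least squares for y ≈ b0 + b1 * x, solved with the backslash operator.
function fit_line(x::AbstractVector, y::AbstractVector)
    X = [ones(length(x)) x]      # design matrix with an intercept column
    b0, b1 = X \ y               # least-squares solution for a tall matrix
    return b0, b1
end

# Sanity check on points that lie exactly on y = 2 + 3x (not simulation results).
x_check = [0.1, 0.2, 0.3, 0.4]
y_check = 2 .+ 3 .* x_check
b0, b1 = fit_line(x_check, y_check)
println("intercept ≈ ", round(b0, digits = 6), ", slope ≈ ", round(b1, digits = 6))
```

In the activity, `x` would hold the aspect ratios used across runs and `y` the corresponding final numbers of infected.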
> A key point with such a simulation is that it is **not** computationally cheap. Doing it in R or Python will often yield much slower run times than doing it in Julia. However, many of the participants may be more comfortable doing the statistical (output) analysis in their current language (R or Python).
## What to **install** before the workshop
This is a hands-on workshop, so it is strongly recommended that you have a working installation of Julia beforehand. You can install Julia on Linux, Mac, or Windows quite easily. Having the latest Julia (currently `1.6.1`) is most recommended, but `1.4` or above should be fine.
1. If you don't have Julia installed, [download Julia](https://julialang.org/downloads/) for your platform and install it. You will then have an application, "Julia", which you can run.
2. Running "Julia" (e.g. by clicking the icon on the desktop or similar) opens a REPL (Read Evaluate Print Loop) window, and in this window you type commands.
3. For example, in the REPL you can type `1+1` followed by the enter/return key and see `2`.
4. In the REPL type: `using Pkg; Pkg.add("IJulia")` and enter/return. This will start installing the [`IJulia.jl`](https://github.com/JuliaLang/IJulia.jl) package for running Jupyter notebooks.
5. Then type `using IJulia` and enter/return.
6. Then type `notebook()` and enter/return. On this first call to `notebook()`, you may be asked: `install Jupyter via Conda, y/n? [y]:`. Type `y` and enter/return. This will install `conda` and then launch the Jupyter environment in your web browser. You now have Jupyter running and can create new Jupyter notebooks (hit "New"->"Julia 1.6.1" and a new tab will open). The next time you run `notebook()` you won't be asked about installation again.
7. If you haven't used Jupyter notebooks previously, here is a [tutorial](https://www.dataquest.io/blog/jupyter-notebook-tutorial/), one of many.
Each time you restart Julia, repeat steps (2), (5), and (6) above to get back to Jupyter.
It is also recommended that you install the Julia packages mentioned above. You can do this in the REPL by hitting `]` to enter the package manager mode, but you may also do it in Jupyter.
In Jupyter, first run `using Pkg` (to say you wish to use functionality from the in-built `Pkg` package, which manages packages), and then (in the next line of the same cell, for example) `Pkg.add("Plots")`, where `"Plots"` is the name of the package. Repeat this for each and every package mentioned above. Namely, in addition to `Plots`, do it for `DataFrames`, `CSV`, `HTTP`, `JSON`, `RCall` (only if you have R installed), `PyCall`, `StatsBase`, `StatsPlots`, `Distributions`, `GLM`, and `Flux`.
In the REPL you get more information about package installation than in Jupyter. You can use the `Pkg` commands as above. However if you are in the REPL, then going into the package manager mode is even nicer. Just hit `]`. Then use `add Plots` (or replace `Plots` with any other package). To leave the package manager mode hit backspace.
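As a convenience, the installation of all the packages listed above can be scripted in one go. The sketch below works in either the REPL or a Jupyter cell; the names `workshop_packages` and `install_workshop_packages` are made up for this example, and `RCall` is only added on request because it needs a working R installation.

```julia
using Pkg

# Packages named in the workshop outline (RCall is handled separately because it needs R).
workshop_packages = ["Plots", "DataFrames", "CSV", "HTTP", "JSON", "PyCall",
                     "StatsBase", "StatsPlots", "Distributions", "GLM", "Flux"]

# Hypothetical helper: install the listed packages; pass with_r = true if R is installed.
function install_workshop_packages(; with_r::Bool = false)
    pkgs = with_r ? vcat(workshop_packages, "RCall") : workshop_packages
    Pkg.add(pkgs)   # Pkg.add accepts a vector of package names
end
```

Calling `install_workshop_packages()` starts the download (this needs an internet connection and can take several minutes); use `install_workshop_packages(with_r = true)` if R is installed.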

