# JULIA FOR STATISTICS AND DATA SCIENCE. YOUR FIRST STEPS IN STATISTICS AND INTEGRATION WITH YOUR CURRENT PLATFORM.

*An [SSA](https://www.statsoc.org.au/) workshop by [Yoni Nazarathy](https://yoninazarathy.com/), July 8-9, 2021.*

Last edit: Day of July 8. (Subject to mild updates).

---

Welcome to the workshop! The workshop spans two days, and each day follows the same schedule (times are AEST):

* **9am – 11am** – Lecture by the instructor (live online Q&A is encouraged).
* **11am – 11:30am** – Structured activity with online support by the instructor.
* **11:30am – 3:00pm** – Time **without** the instructor. Participants are encouraged to continue working on the activity and prepare questions. The Zoom link can still be used for chat between peers.
* **3pm – 4pm** – Meeting again with the instructor to share progress on the activity and get feedback with further exploration.

Please use this [Zoom link](https://us02web.zoom.us/j/88573062519?pwd=M2c5TllmdVZ0b0dLNzRkMjk2L3BSdz09) only after you have registered for the event.

Code and data for the workshop are in this [GitHub Repo](https://github.com/yoninazarathy/2021SSA-Julia/). You can also see the main Jupyter notebook for day 1 via [NBViewer](https://nbviewer.jupyter.org/github/yoninazarathy/2021SSA-Julia/blob/master/2021-SSA-Julia-Workshop-Nazarathy.ipynb), and similarly for [day 2](https://nbviewer.jupyter.org/github/yoninazarathy/2021SSA-Julia/blob/master/Day2.ipynb).

Note that on the first day (Thursday), there will be an **install party** starting at 8am (also on the Zoom link). The instructor will be available to help with installation for those who haven't yet installed the software.

## Day 1 (Thursday) focuses purely on Julia in the Jupyter environment.

The topics and activities for this day are:

1. **About Julia**: A quick summary of what to expect and key resources.
    a.
[Julia downloads](https://julialang.org/downloads/) and [Platform specific instructions](https://julialang.org/downloads/platform/)
    b. [Main Julia page](https://julialang.org/)
    c. [Julia docs](https://docs.julialang.org/en/v1/), see also [The Julia Express](https://github.com/bkamins/The-Julia-Express)
    d. [JuliaCon](https://juliacon.org/2021/)
    e. [Julia Discourse](https://discourse.julialang.org/)
    f. [Julia packages](https://juliapackages.com/)
    g. [Julia VSCode](https://www.julia-vscode.org/)
    h. [Statistics with Julia book](https://statisticswithjulia.org/)
    i. [UQ course "Math for programming", which uses Julia](https://courses.smp.uq.edu.au/MATH2504/)
2. **Installation and basic usage** (see also the installation notes below). We won't make much use of the REPL or of [Julia for VSCode](https://www.julia-vscode.org/), but both will be demonstrated briefly before moving on to [Jupyter notebooks](https://www.edureka.co/blog/wp-content/uploads/2018/10/Jupyter_Notebook_CheatSheet_Edureka.pdf).
3. **Julia language basics**: variables, types, flow control, functions, structs, arrays, broadcasting, and more.
4. **Basic scripting activities**: plotting, basic statistics, linear algebra, working with files, and working with data frames. This uses the built-in packages `Random`, `LinearAlgebra`, and `Statistics`, and the additional key packages [`Plots.jl`](http://docs.juliaplots.org/latest/), [`DataFrames.jl`](https://dataframes.juliadata.org/stable/), [`CSV.jl`](https://csv.juliadata.org/stable/), [`HTTP.jl`](https://github.com/JuliaWeb/HTTP.jl), and [`JSON.jl`](https://github.com/JuliaIO/JSON.jl).
5. **Using the package manager** and the built-in package [`Pkg`](https://docs.julialang.org/en/v1/stdlib/Pkg/).
6. **Calling R or Python from Julia**: Usage of [RCall.jl](https://juliainterop.github.io/RCall.jl/stable/) and [PyCall.jl](https://github.com/JuliaPy/PyCall.jl).
7. **Start of main activity**: a *Monte Carlo epidemic simulation* (see information below). This activity will be worked on individually during the 11:30am – 3:00pm time slot.
8. **Solution:** At 3:00pm the instructor will rejoin to help with a solution and specific questions.

## Day 2 (Friday) is about further Julia packages and integration of Julia code in R or Python.

The topics and activities for this day are:

1. **Recap** of the *Monte Carlo epidemic simulation* from the previous day, focusing on several other Julia features.
1. **A few key stats packages**: Usage of the following packages, relating them to the main activity:
    a. [StatsBase.jl](https://juliastats.org/StatsBase.jl/stable/)
    b. [StatsPlots.jl](https://github.com/JuliaPlots/StatsPlots.jl)
    c. [Distributions.jl](https://juliastats.org/Distributions.jl/latest/)
    d. [GLM.jl](https://juliastats.org/GLM.jl/stable/)
    e. [Flux.jl](https://fluxml.ai/Flux.jl/stable/)
1. **Integration with your current language:** Integration in R and Python using [JuliaCall](https://cran.r-project.org/web/packages/JuliaCall/index.html) for R and [PyJulia](https://pyjulia.readthedocs.io/en/stable/) for Python. Note that with Julia 1.6 there is currently a [problem](https://discourse.julialang.org/t/segfault-when-jl-init-in-r-also-python-with-julia-1-6-on-macos/58110), so Julia 1.5 is needed. See the [JuliaCall CRAN Ref](https://cran.r-project.org/web/packages/JuliaCall/JuliaCall.pdf).
1. **Start of main activity:** Statistical analysis of output from the Monte Carlo experiment in *either* your current language (R or Python) or directly in Julia, or both.

## The main *Monte Carlo epidemic simulation*

![Epidemic simulation animation](https://github.com/yoninazarathy/2021SSA-Julia/blob/master/epidemic-animation.gif?raw=true)

We create an agent-based simulation of epidemics where agents move in a rectangular space and infect each other.
Each agent is in one of the states `S` (susceptible), `I` (infected), or `R` (removed/recovered). That is, this is a form of an SIR model. Time is in discrete steps. The infection duration is a random variable.

#### Key parameters

* Number of agents (we will use $10^n$ with $n=1,2,3,4,5$).
* Density of agents, assumed fixed. The physical space is also fixed, so the number of agents is roughly fixed.
* Aspect ratio of the space (assumed quite low). The space is **a rectangular discrete grid** with an aspect ratio far from $1$ (a very narrow rectangle).
* Initial infection probability, assumed fixed.

Agents move randomly from their current slot to one of the 9 possible slots (including staying in place), but agents do not cross the boundary. Hence a thin rectangle plays a role in how infections spread.

An ideal end goal is to investigate how the aspect ratio affects the course of the infection, and particularly the final number of infected agents. Regression analysis will be used for this. An intermediate goal is to observe the course of the infection.

> A key point with such a simulation is that it is **not** computationally cheap. Doing it in R or Python will often yield much slower run times than doing it in Julia. However, many of the participants may be more comfortable doing the statistical (output) analysis in their current language (R or Python).

## What to **install** before the workshop

This is a hands-on workshop, so it is strongly recommended that you have a working installation of Julia beforehand. You can install Julia on Linux, Mac, or Windows quite easily. Having the latest Julia (currently `1.6.1`) is recommended, but `1.4` or above should be fine.

1. If you don't have Julia installed, [download Julia](https://julialang.org/downloads/) for your platform and install it. You will then have an application, "Julia", which you can run.
2. Running "Julia" (e.g. by clicking the icon on the desktop or similar) opens a REPL (Read Evaluate Print Loop) window, in which you type commands.
3. E.g.
in the REPL you can type `1+1` followed by the enter/return key and see `2`.
4. In the REPL type `using Pkg; Pkg.add("IJulia")` and press enter/return. This will start installing the [`IJulia.jl`](https://github.com/JuliaLang/IJulia.jl) package for running Jupyter notebooks.
5. Then type `using IJulia` and press enter/return.
6. Then type `notebook()` and press enter/return. On this first call to `notebook()`, you may be asked: `install Jupyter via Conda, y/n? [y]:`. Type `y` and press enter/return. This will install `conda` and then launch the Jupyter environment in your web browser. You now have Jupyter running and can create new Jupyter notebooks (hit "New"->"Julia 1.6.1" and a new tab will open). Note that the next time you run `notebook()` you won't be asked about installation again.
7. If you haven't used Jupyter notebooks previously, here is a [tutorial](https://www.dataquest.io/blog/jupyter-notebook-tutorial/), one of many.

Every time you start Julia and wish to launch Jupyter again, you can repeat steps (2), (5), and (6) above.

It is also recommended that you install the Julia packages mentioned above. You can do this in the REPL by hitting `]` to enter the package manager mode, but you may also do it in Jupyter. In Jupyter, run `using Pkg` (to load the built-in `Pkg` package, which is used for managing packages), and then (in the next line of the same cell, for example) `Pkg.add("Plots")`, where `"Plots"` is the name of the package. You can repeat this for each and every package mentioned above. Namely, in addition to `Plots`, do it for `DataFrames`, `CSV`, `HTTP`, `JSON`, `RCall` (only if you have R installed), `PyCall`, `StatsBase`, `StatsPlots`, `Distributions`, `GLM`, and `Flux`.

In the REPL you get more information about package installation than in Jupyter. You can use the `Pkg` commands as above. However, if you are in the REPL, then going into the package manager mode is even nicer. Just hit `]`. Then use `add Plots` (or replace `Plots` with any other package).
To leave the package manager mode hit backspace.
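
The package installation described above can also be done in a single call, since `Pkg.add` accepts a vector of package names. The following is a minimal sketch of that idea. The names `workshop_packages`, `package_list`, `install_workshop_packages`, and the `have_r` keyword are illustrative only and are not part of the workshop materials.

```julia
using Pkg  # built-in package manager

# Packages named in this document (RCall is handled separately because it needs R).
workshop_packages = ["Plots", "DataFrames", "CSV", "HTTP", "JSON", "PyCall",
                     "StatsBase", "StatsPlots", "Distributions", "GLM", "Flux"]

# Hypothetical helper: the packages to install, adding RCall only if R is available.
package_list(; have_r::Bool = false) =
    have_r ? vcat(workshop_packages, "RCall") : copy(workshop_packages)

# Hypothetical helper: install everything with one call to Pkg.add
# (needs an internet connection and can take several minutes).
install_workshop_packages(; have_r::Bool = false) =
    Pkg.add(package_list(have_r = have_r))
```

Running `install_workshop_packages()` in the REPL or in a Jupyter cell would then install the whole list; pass `have_r = true` only if R is installed.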
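
Returning to the main activity, the rules from the *Monte Carlo epidemic simulation* section (random moves to one of 9 slots, no crossing of the boundary, `S`/`I`/`R` states, random infection duration) can be sketched roughly as below. This is only one possible starting point and not the workshop's solution. Several details are assumptions that the description does not fix: a susceptible agent is infected when it shares a slot with an infected agent, the infection duration is uniform on 5 to 15 steps, and the boundary is enforced by clamping coordinates. All type, function, and parameter names are hypothetical.

```julia
using Random

# Agent states of the SIR model (long names avoid clashing with LinearAlgebra.I).
@enum Status susceptible infected removed

# One agent: grid position, state, and remaining infection time.
mutable struct Agent
    x::Int
    y::Int
    status::Status
    time_left::Int
end

# Assumption: duration is uniform on 5:15 steps; the text only says it is random.
random_duration(rng) = rand(rng, 5:15)

# Create one agent at a uniformly random slot, infected with probability p0.
function new_agent(rng, width, height, p0)
    sick = rand(rng) < p0
    Agent(rand(rng, 1:width), rand(rng, 1:height),
          sick ? infected : susceptible, sick ? random_duration(rng) : 0)
end

init_agents(rng, n, width, height, p0) = [new_agent(rng, width, height, p0) for _ in 1:n]

# One discrete time step: move, then recover or infect.
function step!(rng, agents, width, height)
    # Move by -1, 0, or 1 in each direction (9 options); clamp keeps agents on the grid.
    for a in agents
        a.x = clamp(a.x + rand(rng, -1:1), 1, width)
        a.y = clamp(a.y + rand(rng, -1:1), 1, height)
    end
    # Record the slots occupied by infected agents after the move.
    infected_slots = Set{Tuple{Int,Int}}()
    for a in agents
        a.status == infected && push!(infected_slots, (a.x, a.y))
    end
    for a in agents
        if a.status == infected
            a.time_left -= 1
            a.time_left <= 0 && (a.status = removed)
        elseif a.status == susceptible && (a.x, a.y) in infected_slots
            # Assumption: sharing a slot with an infected agent always infects.
            a.status = infected
            a.time_left = random_duration(rng)
        end
    end
    return agents
end

# Run until no infected agents remain (or max_steps); return how many were ever infected.
function simulate(; n = 1000, width = 200, height = 10, p0 = 0.05,
                  max_steps = 10_000, seed = 1)
    rng = MersenneTwister(seed)
    agents = init_agents(rng, n, width, height, p0)
    for _ in 1:max_steps
        any(a -> a.status == infected, agents) || break
        step!(rng, agents, width, height)
    end
    return count(a -> a.status != susceptible, agents)
end
```

Calling, for example, `simulate(width = 200, height = 10)` and `simulate(width = 50, height = 40)` with the same number of agents gives one data point each for comparing aspect ratios; repeating over many seeds would produce the kind of output the regression analysis needs.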