Introduction to Python GIS

---
tags: PythonGIS
title: Introduction to Python GIS
---

# Introduction to Python GIS 7-10.3.2022

:::info
* **Course environment:** https://notebooks-beta.rahtiapp.fi/welcome or
:::spoiler own installation
We highly recommend using the notebooks environment. If you do not want to do that, please have all packages needed for the course installed on your own computer. We recommend using Miniconda to install the [needed packages](https://github.com/Automating-GIS-processes/notebooks/blob/master/environment.yml).
:::
* **Material**: https://autogis-site.readthedocs.io/en/csc/
* **Teachers:**
    * Håvard Aagesen (University of Helsinki) L1-L5
    * Samantha Wittke (CSC) L6
    * Kylli Ek (CSC) L7
* **Contact**: giscoord@csc.fi
* [Public course page](https://ssl.eventilla.com/event/pENQa?)
:::

:::spoiler Course program
Day 1, Monday 7.3.2022
9:00-9:30 Practicalities and introduction round
9:30-10:30 Lesson 1: GIS in Python; Spatial Data Model, Geometric Objects, Shapely
10:30-10:45 Coffee break
10:45-12:15 Lesson 1 continues
12:15-13:00 Lunch break
13:00-14:30 Lesson 2: Working with (Geo)DataFrames
14:30-14:45 Coffee break
14:45-16:15 Lesson 2 continues

Day 2, Tuesday 8.3.2022
9:00-10:30 Lesson 3: Geocoding and spatial queries
10:30-10:45 Coffee break
10:45-12:15 Lesson 3 continues
12:15-13:00 Lunch break
13:00-13:15 Running Python scripts on CSC's Puhti supercomputer
13:15-14:30 Lesson 4: Geometric operations, reclassifying data
14:30-14:45 Coffee break
14:45-16:15 Lesson 4 continues

Day 3, Wednesday 9.3.2022
9:00-10:30 Lesson 5: Visualization, static and interactive maps
10:30-10:45 Coffee break
10:45-12:15 Lesson 5 continues
12:15-13:00 Lunch break
13:00-14:30 Lesson 6: Raster data processing in Python
14:30-14:45 Coffee break
14:45-16:15 Lesson 6 continues

Day 4, Thursday 10.3.2022
Optional for course participants, open to everybody. The Zoom link for Thursday will be published here by 8.3.2022 at the latest.
12:30-13:30 Lesson 7: Running Python code in CSC's Puhti supercomputer
13:30-13:45 Coffee break
13:45-15:15 Lesson 7 continues with hands-on exercise
:::

---

## Course environment: JupyterLab

For type-along and exercises we will use **JupyterLab in CSC's Notebook service**, for which you only need an up-to-date web browser (Firefox, Chrome or Safari are recommended; others may or may not work).

* Please try logging in to https://notebooks-beta.rahtiapp.fi/welcome using your HAKA, Virtu or CSC user account. If you do not have any of these accounts (e.g. participants from companies), you will receive a separate email (by the Friday before the course starts at the latest).
* First day only:
    1. Click 'Join workspace' in the top bar and insert the code `csc-xujqlrzi` (please do not share the code outside this course).
        * Now you should see 'Python GIS course' in the list on the dashboard.
    2. Click `Start session` under 'Python GIS course'.
    3. **Wait** for a moment; JupyterLab will open.
    4. Click on `my-work` in the left panel to switch into it.
    5. Use the "paint brush looking symbol" in the symbol bar and paste this link into the pop-up: `https://github.com/csc-training/notebooks.git`
        * ![](https://i.imgur.com/uBywsWT.png)
    6. Now you should see the course notebooks under `my-work`.

### After each course day

* Close the web browser tab with JupyterLab.
* Click `Delete session` in the Notebooks dashboard. Even if you do not do this, the JupyterLab session ends automatically 8 hours after it was started.
* Everything you have in `my-work` is stored in the cloud during the course. To download a single file to your own computer, right-click it > Download. To download everything in `notebooks`, open a terminal and pack the folder into one compressed archive with `tar -zcvf ./my-work/notebooks.tar.gz ./my-work/notebooks`, then download that file.

## Practicalities

### Recommended set up

* Two screens help keep Zoom, HackMD/material and JupyterLab at workable sizes.
* If you have only one screen, consider using a tablet/smartphone as a second screen.
* Please ask the presenter to increase the font size etc. if something is not readable for you.
* Headphones / headset, for better audio.

### During the course

* If you have questions:
    - Preferably ask with audio.
    - If the question is less related to the current topic, or a later answer is fine, ask here in HackMD.
* If you need help during exercises:
    - Ask with audio or in the chat.
    - Share your screen if asked, for troubleshooting.
* When you have finished an exercise, change your name in the Zoom participants list, for example to "Maria E(xercise)3 done".
* Please mute yourself when not talking, but keep your video on during the whole course if your Internet connection allows.

## Optional Lesson 7 on Thursday about Python in Puhti HPC

* On **Thursday** we will provide an extra optional session (12:30-15:15) for exercises and questions around using **Python on Puhti HPC** for parallel code execution, with a separate [Zoom link](https://cscfi.zoom.us/j/68275862802?pwd=TGI5TExkVHExZUpBZzBGdWo0ajJmUT09). This part is open and free for everyone, in case you have interested colleagues.
* [Python in Puhti lesson page](https://hackmd.io/RfoAOIGUTjuvQD07-Y7HnA)

## Setting up a similar environment on your own computer

* The easiest way is to install Anaconda / Miniconda. It is not the only way, and if you are familiar with other tools, there is no need to switch to Anaconda.
* Some background info: Anaconda is a Python distribution that comes with conda, an environment and package manager; Miniconda is its lightweight sibling that contains only conda, Python and a small number of other packages.
    * package: in our case, Python packages/libraries. Packages often have dependencies (other packages they need to run, e.g. geopandas depends on pandas, i.e. it uses parts of pandas). Packages and their dependencies do not always work well together, and different packages may depend on different versions of the same package. Conda finds versions of the packages that work together, so that you do not need to worry about it.
    * environment: different projects may need different packages, or different versions of the same packages, that cannot be installed side by side. Within one Python installation you can typically have only one version of each package, but environments (which are isolated and do not interact with other environments) make this possible.
    * Find more information [on the Conda help pages](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).
* For installation instructions, see the [AutoGIS course page](https://autogis-site.readthedocs.io/en/csc/course-info/installing-miniconda.html).
* TL;DR (short version):
    1. Install and set up Anaconda or Miniconda for your operating system (OS) and shell.
    2. Run `conda env create -f https://raw.githubusercontent.com/csc-training/notebooks/master/environment.yml`
    3. Run `conda activate autogis-environment` -> now you are within the environment with all needed packages.
    4. If you get a 'package not found' error when running the notebooks, make sure you have done step 3. If the error is still there, use `conda install -c conda-forge packagename` to install the missing package into the environment.

## Questions and Answers

* You can ask a question like this
    * And answer like this
    * And more like this
    * also with name (if you want) [name=Samantha]

> You can add a note/tip like this

```python=
# and add code like this
a = 15
b = 10
c = a + b
```

### Day 1

### General

* Validity of guest accounts: one month (expires 05.04.2022)
* PythonGIS notebooks environment: one month (expires 05.04.2022)
* How to run code cells in Jupyter: Shift + Enter

### Lesson 1

> [Shapely documentation](https://shapely.readthedocs.io/en/stable/project.html)

* I get "ImportError" when I try to run
    * A common cause of this error is a typo in the import statement, so check it carefully.

> Some info about [f-strings in Python](https://realpython.com/python-f-strings/)

* Could we just define `xcoords = line.xy[0]` instead of `xcoords = list(line.xy[0])`?
    * In this case (using Shapely), a list is easier to work with than the array we get without using `list`.
* How to create new code cells?
    * Place your cursor in a cell.
    * Press the 'plus sign' in the top toolbar.
    * You can also change the type of the cell from code to markdown (text) from the dropdown in the top toolbar.

> List comprehension: a short way of writing a for loop on one line, e.g. `[[p.x, p.y] for p in [point1, point2, point3]]`
> More info on [Python list comprehension](https://realpython.com/list-comprehension-python/)

* List of tuples vs list of lists when creating a polygon: does it matter?
    * Shapely accepts both, so either way is fine.

> Polygons with holes: `Polygon(shell=xxx, holes=yyy)`, where `xxx` is a list of coordinates for the outer ring and `yyy` is a list of lists of coordinates, one list per hole (there can be multiple holes); the `shell` and `holes` keywords can also be left out.
> Type `help(Polygon)` to get help on the Polygon class.

* What does buffer mean?
    * A buffer is a polygon created around another geometry (a point, line or polygon) at a given distance from the original geometry.
    * See for example [some general information on buffers](https://www.gislounge.com/buffers-in-gis/).
    * We will also get back to this in later lessons.

> Quick way to assign multiple variables:
> `min_x, min_y = 0, -90`
> assigns `min_x = 0` and `min_y = -90`

> Some information on [topology](https://docs.qgis.org/3.16/en/docs/gentle_gis_introduction/topology.html?highlight=topology) -> what is checked with `is_valid`

### Lesson 2

* Warning when running `data.rename(columns=colnames, inplace=True)`: `<ipython-input-17-f16364132c2b>:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame`
    * It is just a warning and can be ignored at this point.
* Just curious: what would happen if we didn't use `inplace=True`?
    * `rename` would return a new DataFrame that you would need to store in a variable (e.g. `data = data.rename(columns=colnames)`); the current `data` variable would not be changed.

> Useful for those of us not so familiar with pandas: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

* `.nunique()` vs `.value_counts()` vs `.shape[0]`?
    * For the check-your-understanding exercise, all of them work.

> Within f-strings, use different quotation marks for the string you want to print and for any strings inside that string.

* Did you know the units of the polygon beforehand?
    * Yes, from the metadata (CRS) provided where we downloaded the dataset.
* Why sometimes r-strings and sometimes f-strings?
    * f-strings: a way to format strings (e.g. for printing) by embedding variables or expressions inside `{}`.
    * r-strings (raw strings): backslashes are treated as literal characters instead of escape characters, which is handy for e.g. Windows file paths.
* Confidence level for CRS?
    * This only concerns the metadata:
        * Different software writes the WKT CRS string differently, so it can look slightly different depending on the software.
        * The confidence level shows how confidently the WKT string can be matched to an EPSG code.

### Day 2

### General

> Materials of past CSC GIS courses, incl. PyQGIS: https://research.csc.fi/gis-learning-materials

### Lesson 3

* At the Nominatim request: what to put for `user_agent`?
    * Use "csc_xx", with xx being your initials.
    * Nominatim uses it to distinguish requests from different users/applications.
* At the Nominatim request: `AdapterHTTPError`
    * Run the cell again, or
    * adjust the timeout setting to give the request more time.
    * Make sure you have a unique `user_agent` name.
    * Nominatim trouble: try again later; there might have been too much traffic coming from the notebooks server.
* Where did the coordinates come from?
    * We got them from Nominatim, which returns coordinates for addresses.
* Why are we joining the data?
    * The DataFrames have different data columns; we join them to get both together in one DataFrame, e.g. we want to keep the id for each address.
* How to convert between x, y, z and lat, lon?
    > Original question: How do we make a coordinate transformation if we have the coordinates in WGS84 ECEF rectangular coordinates (origin in the center of the Earth)?
    * If those have an EPSG code / PROJ string, you can use the conversion strategy shown in Lesson 2.
    * https://proj.org/operations/conversions/cart.html?highlight=cartesian "Convert geodetic coordinates to cartesian coordinates (in the forward path)."
    * https://gis.stackexchange.com/questions/366676/proj-pyproj-convert-between-ocentric-and-ographic-latitudes
    * https://epsg.io/4978 is the EPSG code for WGS84 ECEF (geocentric).
* sjoin vs join
    * `sjoin` (spatial join): joins based on the geometries.
    * `join`: joins based on other fields, e.g. the ID of a polygon.
    * Also check out `a = gpd.sjoin_nearest(df2, df1)` (not in the lesson material).

:::info
**Quote of the day**
"It is a good idea not to have typos" - H.A.
:::

### CSC services

Question to all users and future users: What would you need to get started using CSC services?

* Competitive prices (e.g. compared to AWS), ease of use/training, possibility of a dedicated computation resource

Q: If I want to process large datasets (that are stored on a university network drive), do I need to move those datasets into Puhti, or can Puhti somehow read data from a network drive outside CSC?
* It is likely best to move the data to Puhti, although Puhti can basically also read data over HTTPS or from S3 storage.

### Lesson 4

* Why `apply(functionname, arguments)` and not call the function 'normally'?
    * `apply` is a pandas method that applies the function to the whole DataFrame in one call (e.g. to every row); without `apply`, you would need to loop through the DataFrame yourself and call the function for each row separately.
* Is there a difference between first creating `wgs84 = CRS.from_epsg(4326)` and then running `to_crs(wgs84)`, and just doing `districts.to_crs(4326)`? Both seemed to produce the same results (or maybe I have some values preloaded or something).
    * Both work fine in this case.

### Day 3

* How to see variables in Jupyter?
    * To see all currently defined variables in Jupyter, type `%whos` in a code cell.
        * Commands starting with `%` are called magic commands, and Jupyter has some really nice and useful ones (but if you plan to export a notebook to a Python script, it is better to avoid them).
    * For more information about a specific variable, you can also use `%pinfo varname`.
    * Another option is to install an extension such as https://github.com/lckr/jupyterlab-variableInspector
* Is it possible to get a certificate or something for participating in the course? I would like to get 0.5 or 1 credit from my university.
    * Yes, you will get a certificate at the end of the course. It is sent by an automatic system, so please let us know via servicedesk@csc.fi if you have not received it within a week or so.

### Lesson 5

* Would it be easy to get rid of decimals from the legend?
    * It is possible to loop over the legend elements and change their text, e.g. the number format:

```python=
# An example of how to do it: remove decimals from the legend labels of a
# classified map, e.g. after ax = data.plot(column=..., scheme=..., legend=True).
# Assumes every label has the form "lower, upper", e.g. "0.00, 10.50".
for label in ax.get_legend().get_texts():
    lower, upper = label.get_text().split(",")
    # float() ignores surrounding whitespace; int() then drops the decimals
    label.set_text(f"{int(float(lower))}, {int(float(upper))}")
```

### Raster lesson

New material is available in `shared`; copy it to your `my-work` directory from the terminal:
`cp -R /home/jovyan/shared/Raster /home/jovyan/my-work/notebooks/Raster`
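If you prefer to do the copy from a notebook cell instead of the terminal, here is a minimal sketch using Python's standard `shutil.copytree`. It assumes the same source and destination paths as the `cp -R` command in the Raster lesson section and Python 3.8 or newer (needed for `dirs_exist_ok`); `copy_material` is a hypothetical helper name, not part of the course material.

```python
# Sketch: copy the Raster lesson material from a notebook cell instead of
# running `cp -R` in the terminal. Paths are the ones from the cp command;
# copy_material is a hypothetical helper name.
import shutil
from pathlib import Path

SRC = Path("/home/jovyan/shared/Raster")
DST = Path("/home/jovyan/my-work/notebooks/Raster")


def copy_material(src: Path, dst: Path) -> Path:
    """Recursively copy src into dst and return dst.

    dirs_exist_ok=True (Python 3.8+) merges into an existing dst instead of
    raising an error; files with the same name in dst are overwritten.
    """
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


if SRC.exists():
    copy_material(SRC, DST)
    print(f"Copied {SRC} -> {DST}")
else:
    print(f"{SRC} not found; are you running this in the course JupyterLab?")
```

One difference from the terminal command: if the destination folder already exists, `cp -R` copies the source *into* it (creating `Raster/Raster`), whereas `copytree(..., dirs_exist_ok=True)` merges the contents into the existing folder. Because it overwrites files with the same name, do not re-run it after you have edited the copied notebooks.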