# Software Carpentry Workshop: Lesson 3: Python Programming: User-Defined Functions
---
**Instructor**: Jees Augustine
**Time**: 1 Hour 30 Minutes
**Lesson Overview** So far we have covered basic [Python](https://python.org) usage:
- Creating and Manipulating Variables
- Data Types and Data Structures
- Importing Data Files using Pandas
- Calling Built-in Functions
- Subsetting Data with Pandas
- Writing simple Python scripts
**Now we will cover:**
- General structure of a Python program
- Writing user-defined (your own) functions
- Placement of function definitions within a program
- Default arguments/scope
- Passing arguments to Python scripts from the command line
- Final product: a script with well-defined parts that includes user-defined functions and accepts command-line arguments
### General structure of a Python program
Let's start with the Python script Balan made yesterday morning:
```python
##This is PlotLifeExp.py script
####### import statements ##########
import pandas as pd
import matplotlib.pyplot as plt
####### reading data into Python ##########
#read data into python
my_file = pd.read_table("gapminder.txt")
####### data analysis #########
#select information about Canada
Canada = my_file.loc[my_file['country'] == "Canada"]
#plot lifeExp
Canada.plot.line(x='year',y='lifeExp',label = "Life Expectancy",figsize=(8, 6))
plt.suptitle('Life Expectancy in Canada Over the years', fontsize = 20)
plt.xlabel('Year', fontsize = 16)
plt.ylabel('Life Expectancy', fontsize = 16)
plt.show()
```

We can subdivide this Python program into a few parts:
1. `Import statements` tell us what libraries are used in the program
2. `Reading data` tells us what data is used in the program
3. `Data analysis` tells us how data is analyzed. In this script, we select parts of the dataset we are interested in and plot it
Let's focus on the `Data analysis` part of this script. Here, we simply select the rows that contain information about Canada. But what if you wanted to know about Sweden? You could just write another line of code:
```python
Sweden = my_file.loc[my_file['country'] == "Sweden"]
```
But notice that we are repeating most of the code… It would be nicer to have a function, say one called `Get_countryData`, that we can use to get information for any country of our choice.
## Introduction example
```python
def celsius_fahrenheit():
    # F = (C * 9/5) + 32
    fahrenheit = (22 * (9 / 5)) + 32
    return fahrenheit
```
```python
celsius_fahrenheit()
```
71.6
* you can see here that we have a function that converts Celsius to Fahrenheit, using a Celsius value of 22 (room temperature)
* note how we use `def` to define a function with a given name, in this case `celsius_fahrenheit`
* the parentheses allow you to pass arguments to the function, which we haven't done in this case
* once the function has been created, you can run it by typing `function_name()`
* if arguments are desired, they would be included inside the parentheses
* one big issue with this function is that it is far too limited: it converts only one Celsius value to Fahrenheit
* let's make it a bit more flexible by adding an argument
```python
def celsius_fahrenheit(C):
    # F = (C * 9/5) + 32
    fahrenheit = (C * (9 / 5)) + 32
    return fahrenheit
```
```python
celsius_fahrenheit()
```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-51-b76cf3b6336a> in <module>()
----> 1 celsius_fahrenheit()
TypeError: celsius_fahrenheit() missing 1 required positional argument: 'C'
* notice the error we receive if we called it like we did before, which is telling us that an argument is required
* let's give it one
```python
celsius_fahrenheit(C=22)
```
71.6
* you can see this recapitulated what we did before by passing 22 celsius and getting fahrenheit back
* this is more flexible, however, because we aren't constrained to using only 22
```python
celsius_fahrenheit(30)
```
86.0
* another important point is that you can define a default value, which is useful for avoiding errors and making functions 'dummy-proof', though it isn't always a good idea
```python
def celsius_fahrenheit(C=22):
    # F = (C * 9/5) + 32
    fahrenheit = (C * (9 / 5)) + 32
    return fahrenheit
```
```python
celsius_fahrenheit()
```
71.6
* you can see that if no argument is passed, it automatically uses the default of 22 specified in the definition
* but we can still pass an argument and override the default
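A minimal sketch of overriding the default, reusing the `celsius_fahrenheit` definition above. The variable names are only for illustration. It also hints at why a default "isn't always a good idea": a forgotten argument no longer raises a `TypeError`, so the call quietly converts 22 instead.

```python
def celsius_fahrenheit(C=22):
    # F = (C * 9/5) + 32
    fahrenheit = (C * (9 / 5)) + 32
    return fahrenheit

default_result = celsius_fahrenheit()      # no argument: C falls back to 22
override_result = celsius_fahrenheit(30)   # positional argument replaces the default
keyword_result = celsius_fahrenheit(C=30)  # keyword argument does the same thing

print(default_result, override_result, keyword_result)
```

Both ways of passing 30 give the same result; only the call with no argument uses the default.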
---
## Challenge
* create a function that converts Fahrenheit to Celsius, much like the one we already have
* also set default argument values so that it will produce some result, no matter what
* lastly, determine the difference between the supplied temperature, which you are converting to Celsius, and a 2nd argument that supplies a temperature that is already in Celsius
```python
def fahrenheit_celsius(F=67, check=22):
    # C = (F - 32) * (5 / 9)
    celsius = (F - 32) * (5 / 9)
    diff_cel = check - celsius
    return diff_cel
```
```python
fahrenheit_celsius()
```
2.5555555555555536
```python
fahrenheit_celsius(F=5)
```
37.0
```python
fahrenheit_celsius(5)
```
37.0
```python
fahrenheit_celsius(check=6)
```
-13.444444444444446
```python
fahrenheit_celsius(F=33, check=67)
```
66.44444444444444
```python
fahrenheit_celsius(32, 48)
```
48.0
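The calls above mix positional and keyword arguments. As a small sketch of how Python matches them to parameters (reusing the `fahrenheit_celsius` solution; the comparisons are illustrative checks, not part of the challenge): positional values fill the parameters from left to right, keyword values go to the named parameter, and anything left out takes its default.

```python
def fahrenheit_celsius(F=67, check=22):
    # C = (F - 32) * (5 / 9)
    celsius = (F - 32) * (5 / 9)
    diff_cel = check - celsius
    return diff_cel

# A single positional value is bound to the first parameter, F
same_as_keyword = fahrenheit_celsius(5) == fahrenheit_celsius(F=5)

# Naming 'check' leaves F at its default of 67
same_as_default_f = fahrenheit_celsius(check=6) == fahrenheit_celsius(67, 6)

# With keywords, the order in which arguments are written does not matter
same_any_order = fahrenheit_celsius(F=33, check=67) == fahrenheit_celsius(check=67, F=33)

print(same_as_keyword, same_as_default_f, same_any_order)
```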
## Advanced example
* so far what we have done is pretty simple, but you now know the basics of creating modular code with functions
* let's go back to the gapminder dataset and write a meaningful function based on it
* thinking back to yesterday, what do you have to do first to start working with that data?
* now you can do the actual data import
```python
##This is PlotLifeExp.py script
####### import statements ##########
import pandas as pd
import matplotlib.pyplot as plt
####### reading data into Python ##########
#read data into python
my_file = pd.read_table("gapminder.txt")
####### data analysis #########
#select information about Canada
Canada = my_file.loc[my_file['country'] == "Canada"]
#plot lifeExp
Canada.plot.line(x='year',y='lifeExp',label = "Life Expectancy",figsize=(8, 6))
plt.suptitle('Life Expectancy Over the years', fontsize = 20)
plt.xlabel('Year', fontsize = 16)
plt.ylabel('Life Expectancy', fontsize = 16)
plt.show()
```

## Let's write some user-defined functions
* the script above includes data for only the one country you chose
* however, what if you want to plot a country other than 'Canada'?
* you could make a new script that does what you want and run it in the shell
* but let's do this entirely in Python using functions
### Let's write functions for the following functionality
* a function to get the data for one country
* a function to abstract all the plotting calls
```python
def get_country(country):
    ####### data analysis #########
    # select information about the given country name
    country_df = my_file.loc[my_file['country'] == country]
    return country_df
```
```python
country_df = get_country('India')
country_df.head()
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>country</th>
<th>continent</th>
<th>year</th>
<th>lifeExp</th>
<th>pop</th>
<th>gdpPercap</th>
</tr>
</thead>
<tbody>
<tr>
<th>696</th>
<td>India</td>
<td>Asia</td>
<td>1952</td>
<td>37.373</td>
<td>372000000</td>
<td>546.565749</td>
</tr>
<tr>
<th>697</th>
<td>India</td>
<td>Asia</td>
<td>1957</td>
<td>40.249</td>
<td>409000000</td>
<td>590.061996</td>
</tr>
<tr>
<th>698</th>
<td>India</td>
<td>Asia</td>
<td>1962</td>
<td>43.605</td>
<td>454000000</td>
<td>658.347151</td>
</tr>
<tr>
<th>699</th>
<td>India</td>
<td>Asia</td>
<td>1967</td>
<td>47.193</td>
<td>506000000</td>
<td>700.770611</td>
</tr>
<tr>
<th>700</th>
<td>India</td>
<td>Asia</td>
<td>1972</td>
<td>50.651</td>
<td>567000000</td>
<td>724.032527</td>
</tr>
</tbody>
</table>
</div>
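Note that `get_country` reads `my_file` even though it is not one of its parameters: Python looks that name up in the global (script-level) scope when the function is called. A minimal sketch of the two scoping rules involved, using a plain list of dictionaries as a stand-in for the DataFrame so it runs without `gapminder.txt` (the names `records` and `get_country_rows`, and the rows themselves, are assumptions for illustration):

```python
# Stand-in for my_file: a few illustrative rows, not the real dataset
records = [
    {"country": "India", "year": 1952},
    {"country": "India", "year": 1957},
    {"country": "Canada", "year": 1952},
]

def get_country_rows(country):
    # 'records' is not a parameter, so Python finds it in the global scope
    selected = [row for row in records if row["country"] == country]
    return selected

india_rows = get_country_rows("India")
print(len(india_rows))

# 'selected' was created inside the function, so it does not exist out here
try:
    selected
except NameError:
    print("'selected' is local to get_country_rows")
```

Passing the data in as a second parameter instead would make the dependency explicit, at the cost of a slightly longer call.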
```python
def plotter(country_dataframe):
    # plot lifeExp for the DataFrame that was passed in
    country_dataframe.plot.line(x='year', y='lifeExp', label="Life Expectancy", figsize=(8, 6))
    plt.suptitle('Life Expectancy Over the Years', fontsize=20)
    plt.xlabel('Year', fontsize=16)
    plt.ylabel('Life Expectancy', fontsize=16)
    plt.show()
```
```python
plotter(country_df)
```
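Before moving everything into one script, it helps to see why function definitions usually go near the top: a `def` statement has to run before the first call to that function. A small sketch with a hypothetical `greet` function (not part of the lesson's script):

```python
# Calling a function before its def statement has run raises a NameError
try:
    greet("world")
except NameError as err:
    early_call_error = str(err)
    print("Too early:", early_call_error)

def greet(name):
    return "Hello, " + name

# After the definition has run, the same call works
late_call_result = greet("world")
print(late_call_result)
```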

#### Let's try to group them all together in a single script
```python
##This is PlotLifeExp.py script
####### import statements ##########
import pandas as pd
import matplotlib.pyplot as plt
####### User Defined Functions Begin ##########
def get_country(country):
    # select the rows for the given country name
    # (my_file is read further down, before this function is first called)
    country_df = my_file.loc[my_file['country'] == country]
    return country_df
def plotter(country_dataframe):
    # plot lifeExp for the DataFrame that was passed in
    country_dataframe.plot.line(x='year', y='lifeExp', label="Life Expectancy", figsize=(8, 6))
    plt.suptitle('Life Expectancy Over the Years', fontsize=20)
    plt.xlabel('Year', fontsize=16)
    plt.ylabel('Life Expectancy', fontsize=16)
    plt.show()
####### User Defined Functions End ##########
####### reading data into Python ##########
my_file = pd.read_table("gapminder.txt")
####### data analysis #########
# select information about Canada
canada = get_country("Canada")
# plot lifeExp
plotter(canada)
####### data analysis #########
# select information about Greece
greece = get_country("Greece")
# plot lifeExp
plotter(greece)
```
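The overview lists command-line arguments as the last piece of the final script. One possible way to add them, assuming the country name is given as the first argument (for example `python PlotLifeExp.py Greece`) with "Canada" as a fallback. The helper name `country_from_args` and the fallback are assumptions for illustration, and the data and plotting calls are left as a comment so the sketch runs without `gapminder.txt`.

```python
import sys

def country_from_args(argv, default="Canada"):
    # argv[0] is the script name; the first real argument, if any, is argv[1]
    if len(argv) > 1:
        return argv[1]
    return default

def main(argv):
    country = country_from_args(argv)
    # In the full script, get_country(country) and plotter(...) would be called here
    print("Selected country:", country)
    return country

if __name__ == "__main__":
    main(sys.argv)
```

Keeping the argument handling in its own function means it can be tried out with an ordinary list, without going through the shell.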


### But let's do some more tests to understand what functions return
```python
type(country_df)
```
pandas.core.frame.DataFrame
> So it's a DataFrame
### Let's see if every function returns something every time
> #### Challenge 2
> Try to see what your plotter function might return
> Hint: it is returning nothing
>> #### Solution Challenge 2
>> `plotter` has no `return` statement, so calling it gives back `None`. Check the type with:
>> type(plotter(greece))
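
A function with no `return` statement still hands something back: the special value `None`. A minimal sketch with a hypothetical `announce` function standing in for `plotter`, so it runs without any data or plotting:

```python
def announce(country_name):
    # Does its work (printing) but has no return statement
    print("Plotting life expectancy for", country_name)

result = announce("Greece")

print(result is None)
print(type(result).__name__)
```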
What if you want to
```python
```