String exercises in python

# B2.1.2 String exercises in python

:::info
This note is part of the coding course for the IB Computer Science program. If you want to check out other parts of the same course, you can find them in [this link to the teaching material](/68GDv_RgT-yh9oERMvdnFw).
:::

:::warning
:warning: Note still under construction :warning:
:::

## Review of types

Remember strings? We saw them when we reviewed the basic data types:

* Numbers (bytes, integers, doubles, floats). In Python we have `int`, `float` and `complex`.
* Booleans (true or false). In Python we have `bool`.
* Characters (numbers that represent characters such as "A", "ñ", " ", "?", ":+1:"). In Python they are represented as a `str` of length one.
* Strings (sequences of characters). In Python they are represented by the `str` type.

:::success
You should be able to discuss different types in an exam, for example which type we would need to represent a specific piece of data such as the number of balloons at a party, the name of the person who arrived first at that party, or whether the party was OK or not.
:::

## Now let's try some stuff with strings

The classic first line of code is printing "Hello world!". In Python it is as simple as:

```python!
print("Hello world!")
```

As you may remember from [Functions in Python](/Apsl9c53SSKET1Mlj7iM2Q), `print` is a function that receives one or more arguments (a `str` or anything that can be converted to a string) and **outputs them** to the console.

For these exercises we're going to use `print` to see how we transform strings. Since it would be a bit boring to use just the regular "Hello world!", we're going to use part of the lyrics of a song. Try this:

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
print(lyric)
```

If you don't know which song it is, here is a hint:

![image](https://hackmd.io/_uploads/S1csT7QWWl.png)

:::warning
I suggest substituting this lyric with something that resonates more with you, for the funsies.
:::

## Treat the string as a list

Remember [Lists in python](/RWrVgWBOTOebitrb0MSFVw)? They have indexes, and we can use indexes with strings too!

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
print(lyric[1])        # the character at index 1: "e"
print(len(lyric))      # the number of characters in the string
sublyric = lyric[24:]  # from index 24 to the end: "it's our moment ..."
print(sublyric)
```

## Accessing and making substrings

With two indexes we can cut a substring out of the string: `[start:end]` includes `start` but not `end`.

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
sublyric = lyric[40:71]  # "You know together we're glowin'"
print(sublyric)
```

If you need more info, you can check [Slicing in Python](/PNGqCnHNRPmCp-6WHUHH2Q).

### Reversing character by character

First, the magic spell as it is. Try it!

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
sublyric = lyric[::-1]
print(sublyric)
```

Witchcraft, right? Now the explanation. This is done with the "slicing" tool. When we slice, we usually use one colon (`:`) to mark the start and the end of the substring. A second colon adds a third value, the **step**: `[start:end:step]`. A step of `-1` walks through the string backwards, and when the start and the end are left empty Python takes the whole string, so `[::-1]` returns the string reversed.

Be careful with the start, though. The following slice starts at index 0 and walks backwards, so there is nothing before it and it only prints `W`:

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
sublyric = lyric[0::-1]
print(sublyric)
```

For more information (or a refresher), go to [Slicing in Python](/PNGqCnHNRPmCp-6WHUHH2Q).

## Finding

`find()` returns the index of the **first** appearance of a specific substring (or `-1` if it isn't there):

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
print(lyric.find("up"))  # 12
```

If we want the last one, we can use `rfind()`, which searches from the right (the end of the string):

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
print(lyric.rfind("up"))  # 20
```

If we want something in the middle (such as the second "up"), we need to try something else. For example, we can search again inside a substring that starts right after the first "up":

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
firstIndexOfUp = lyric.find("up")
nextRelativeIndexOfUp = lyric[firstIndexOfUp + 1:].find("up")
totalIndexofSecondOfUp = firstIndexOfUp + nextRelativeIndexOfUp + 1
print(totalIndexofSecondOfUp)  # 16
```

Steps of this in pseudocode:

```
* Find the first index where we have "up"
* Create a substring that starts one character after that index, so it starts with "p, up, up, it's..."
* Find the index of "up" inside that substring. This index is relative: 0 is the "p" of "p, up, up, it's..."
* Add the first index, the relative index and 1 (because the substring started 1 character after the first index)
* Output it
```

As a one-liner it would look like this:

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
print(lyric.find("up") + lyric[lyric.find("up") + 1:].find("up") + 1)
```

or like this (a bit more readable but still very concise):

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
print(
    lyric.find("up")
    + lyric[lyric.find("up") + 1:].find("up")
    + 1
)
```

## Replacing

The method `replace()` asks for the target of the replacement and the new text, and returns a new string where **every** occurrence of the target has been replaced:

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
print(lyric.replace("up", "down"))
```

Replacing like this is **tricky** because it may lead to errors if you replace without context.
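Here is a minimal sketch of the problem, using a made-up sentence instead of the lyric: `replace()` also matches the target when it is hidden inside another word. It also shows the optional third argument of `replace()`, which limits how many replacements are made, and `replaceWholeWord`, a hypothetical helper (not a built-in) that only swaps complete words.

```python
text = "Pick up the cup"

# replace() changes EVERY occurrence of the substring, even inside other words
print(text.replace("up", "down"))     # Pick down the cdown

# The optional third argument limits how many replacements are made
print(text.replace("up", "down", 1))  # Pick down the cup


def replaceWholeWord(sentence, old, new):
    # Hypothetical helper: only replaces words that are exactly equal to `old`,
    # so "cup" is left alone. Words are split on whitespace.
    words = sentence.split()
    for index in range(len(words)):
        if words[index] == old:
            words[index] = new
    # Join the words back together with single spaces
    return " ".join(words)


print(replaceWholeWord(text, "up", "down"))  # Pick down the cup
```

Because the helper splits on whitespace, punctuation stays attached to the words: in the lyric, `"up,"` would not be matched by `"up"`. It also collapses several spaces in a row into one. None of this checks the *meaning* of the text, which is exactly what went wrong in the story below.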
:::success
**Story time**

![image](https://hackmd.io/_uploads/ryp5e_bVbe.png)
_Now the **points** are named **Renfecitos**_

One spectacular case happened here in Spain with Renfe. They renamed their loyalty points from "Puntos Renfe" (Renfe points) to "Renfecitos" (little Renfes), and someone in the web administration decided to replace every "puntos" on the website with "Renfecitos". As a result, the live (PROD) website showed sentences like "Puede adquirir su billete en los Renfecitos de venta" ("You can buy your ticket at the Renfecitos of sale"), because the original "puntos de venta" simply meant "points of sale". It didn't make any sense.

Apart from being cautious when replacing strings, this also teaches the importance of having QA to make sure that this doesn't happen.

Sources:
https://www.eleconomista.es/transportes-turismo/noticias/12897554/07/24/renfe-se-rie-de-su-metedura-de-pata-con-los-renfecitos.html
https://www.20minutos.es/noticia/5528557/0/renfe-humor-revuelo-renfecitos-redes/
:::

## Splitting

:::info
If you want to make a list of characters, you can just pass the string to the `list` constructor. This is called **casting**.

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
newlist = list(lyric)
print(newlist)
```

Will print:

![image](https://hackmd.io/_uploads/rkxTVxmW-l.png)
:::

### Split()

`split()` without arguments cuts the string wherever there is whitespace and returns a list with the pieces (the words):

```python=
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
newList = lyric.split()
print(newList)
print(len(newList))
```

If we pass a string as an argument, `split()` cuts at every appearance of that string instead (and the separator disappears from the result):

```python=
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
newList = lyric.split("e")
print(newList)
print(len(newList))
```

#### Process data from a line in a CSV file

This is going to be expanded [here](https://hackmd.io/LG5DF-vbStKA5nUfx6-Nsg?both#Exam-exercise-the-CSV-of-countries), but let's keep it simple. I have these lines of strings:

![image](https://hackmd.io/_uploads/ByQPmIvU-l.png)

And I want to process them. In this case I have:

```python=
sampleLine = "072, Botswana, PAISES,23.8028,-22.1802"
```

And I want a list that I can process. I don't want the "072" nor "PAISES", but I want the name, the longitude and the latitude (the numbers at the end). Something like this:

![image](https://hackmd.io/_uploads/BJct9LPIZl.png)

So I could do this:

```python=
sampleLine = "072, Botswana, PAISES,23.8028,-22.1802"
sampleList = list()
splitLine = sampleLine.split(",")  # split by the comma
# only adding the elements that we need
sampleList.append(splitLine[1])
sampleList.append(splitLine[3])
sampleList.append(splitLine[4])
print(sampleList)
```

Notice that splitting by "," keeps the space that comes after some of the commas, so the name is stored as " Botswana". If that matters, `strip()` removes the spaces at both ends: `splitLine[1].strip()`.

Or we could do this:

```python!
sampleLine = "072, Botswana, PAISES,23.8028,-22.1802"
sampleList = sampleLine.split(",")  # split by the comma
# taking out the elements that we DON'T need
sampleList.pop(0)
sampleList.pop(1)  # tricky: "PAISES" was at index 2, but after popping the first element it is at index 1
print(sampleList)
```

Another option is using slicing together with `find()` and `rfind()`. Note that this version uses a line without spaces after the commas, because it searches for ",PAISES" exactly:

```python!
sampleLine = "072,Botswana,PAISES,23.8028,-22.1802"
sampleList = [
    # name: between the first comma and ",PAISES"
    sampleLine[sampleLine.find(",")+1:sampleLine.find(",PAISES")],
    # longitude: after ",PAISES," (8 characters) up to the last comma
    sampleLine[sampleLine.find(",PAISES,")+8:sampleLine.rfind(",")],
    # latitude: everything after the last comma
    sampleLine[sampleLine.rfind(",")+1:]
]
print(sampleList)
```

When we do this for a whole file, we will need to go through a sequence of lines (with a loop).

:::warning
:arrow_right_hook: Little deviation

##### Casting

The problem with these implementations is that those numbers (which we probably want to use to make some calculation or draw a map) are stored as strings, so we cannot do arithmetic with them. So if I write this code:

```python=
sampleLine = "072, Botswana, PAISES,23.8028,-22.1802"
sampleList = list()
splitLine = sampleLine.split(",")  # split by the comma
# only adding the elements that we need
sampleList.append(splitLine[1])
sampleList.append(splitLine[3])
sampleList.append(splitLine[4])
print(sampleList)
print(sampleList[1]*sampleList[2])  # multiplying the 2 numbers
```

I'm going to get this error (a `TypeError`, because Python can't multiply two strings):

![image](https://hackmd.io/_uploads/SyhRUIvLbl.png)

To fix that, we need to use the ancient technique of **casting**.
Usually, for this, you pass whatever you want to convert to a constructor function. Do you want something converted into a string? `str(potato)`. In this case these numbers are `float`, so we cast them when we save them or when we use them (better when we save them, so we don't have to cast every time we use them). This would be the working code:

:::success
```python!
sampleLine = "072, Botswana, PAISES,23.8028,-22.1802"
sampleList = list()
splitLine = sampleLine.split(",")  # split by the comma
# only adding the elements that we need
sampleList.append(splitLine[1])
sampleList.append(float(splitLine[3]))  # casting
sampleList.append(float(splitLine[4]))  # casting
print(sampleList)
print(sampleList[1]*sampleList[2])
```

Output:

![image](https://hackmd.io/_uploads/ByPPw8vL-l.png)
:::

## Concatenating

:::info
To concatenate is to put one element right after the other.
:::

In Python we concatenate strings with the `+` operator: `"golden" + "!"` gives `"golden!"`. Remember that addition is commutative (`2 + 3` is the same as `3 + 2`) but concatenation is not: `"ab" + "cd"` gives `"abcd"`, while `"cd" + "ab"` gives `"cdab"`.

## Exercises

### Counting how many words we have in a string

Count how many words we have in the string `lyric`.

Solution
:::spoiler
```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
print(len(lyric.split()))
```
:::

### Reversing the lyric word by word

So if it is `We're goin' up` it should be `up goin' We're`.

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
myList = lyric.split()
newLyric = ""
for x in range(len(myList)):
    # walk the list from the last word to the first
    newLyric = newLyric + myList[len(myList)-1-x] + " "
print(newLyric)
```

Reversing the list with slicing instead:

```python!
lyric = "We're goin' up, up, up, it's our moment You know together we're glowin' Gonna be, gonna be golden"
myList = lyric.split()[::-1]
newLyric = ""
for word in myList:
    newLyric = newLyric + word + " "
print(newLyric)
```

### Check if a word (or a string) is a palindrome

A palindrome is a word that reads the same from left to right and from right to left.
For example, `radar` is a palindrome and `palindrome` is not. A phrase could be `A nut for a jar of tuna` (ignoring spaces and uppercase). In this case let's keep it to single words made only of letters.

#### Solution in pseudocode

This solution describes what the code should do, and from this description you should be able to type it.

For this we're going to need a **flag variable** that starts as `True` and becomes `False` as soon as we find something that breaks the palindrome.

Then we need to loop through the string. We need the indexes, so we need to use `range` or `enumerate`. In each iteration we check if the letter in that position is equal to the corresponding letter reading the other way. For example, in the word `radar`, we check the first r against the last r and the first a against the last a; the d in the middle would be compared with itself, so it doesn't need checking (that's why the loop only goes through the first half). If the letters are not equal, it's not a palindrome and we can break the loop.

```python=
word = "andana"
palindrome = True
for index in range(len(word)//2):
    # compare the letter at index with its mirror position from the end
    if word[index] != word[len(word)-1-index]:
        palindrome = False
        break
print(palindrome)
```

Another way is reversing the string and checking if both are the same:

```python!
word = "radar"
print(word[::-1] == word)
```

Credit E.C.

### Construct a string that repeats each word

So if we have "Hello world!" we should get a string that says "Hello Hello world! world!"

```python=
phrase = "Hello world!"
mylist = phrase.split()
print(mylist)
newPhrase = ""
for word in mylist:
    newPhrase = newPhrase + word + " " + word + " "
print(newPhrase)

# the same, shorter: += and splitting directly in the for
newPhrase = ""
for word in phrase.split():
    newPhrase += word + " " + word + " "
print(newPhrase)
```

### Construct a method that validates if a string is an email

We're going to do this step by step: by the end we're going to have the full validator, but we're adding the constraints to the string one by one.
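For reference, here is a minimal sketch of where the steps below lead, written with the "return `False` as soon as one check fails" strategy. The name `isValidEmail` is our own choice (not a built-in), and the sketch checks only the six constraints described in this section, so it is not a complete email validator: for example, it still accepts `name@.com`. It uses `rfind()` to get the last dot, so that a dot in the name part, as in `name.surname@domain.com`, does not break the "dot after the `@`" check.

```python
def isValidEmail(email):
    # Step 1: there must be an "@"
    atIndex = email.find("@")
    if atIndex == -1:
        return False
    # Step 2: the "@" cannot be the first or the last character
    if atIndex == 0 or atIndex == len(email) - 1:
        return False
    # Step 3: there must be a dot (rfind gives us the LAST one)
    dotIndex = email.rfind(".")
    if dotIndex == -1:
        return False
    # Step 4: that dot must come after the "@"
    if dotIndex < atIndex:
        return False
    # Step 5: at least 2 characters after the last dot (.es, .cat, .ru...)
    if len(email) - dotIndex - 1 < 2:
        return False
    # Step 6: no spaces anywhere
    if " " in email:
        return False
    return True


print(isValidEmail("fakeemail@blabalaba.com"))   # True
print(isValidEmail("nameit@emailfromchile.cl"))  # True
print(isValidEmail("@name.com"))                 # False
print(isValidEmail("name@domain.c"))             # False
```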
As you may know, emails need to have the form name@domain.extension, such as fakeemail@blabalaba.com or nameit@emailfromchile.cl.

To start simple, let's check that we can find the `@`.

---

Now that we have found the `@`, we need to make sure it's not the first nor the last character of the string: @name is not a valid email, and neither is name@.

---

Now we need to check if we have a dot in the string. If we don't, then we don't have an email.

---

Now we need to make sure that the dot comes later than the `@` symbol.

---

We also need at least 2 characters after the dot, before the end of the string: the extension must be at least 2 characters long (.es, .cat, .st, .ru, etc.).

---

To end this, we're going to make sure that there are no spaces anywhere.

:::info
**Regular expressions**
It's more common to use regular expressions (a small language for describing text patterns). They are a bit complicated, but useful. To validate whether a string is an email you can use `^[\w.-]+@([\w-]+\.)+[\w-]{2,}$` (the `-` goes last inside the first brackets so that Python's `re` module also accepts the pattern). You can test it here: https://regexr.com/3e48o

Regular expressions are outside the syllabus but can be a good addition to your IA. More about which email addresses are valid: https://en.wikipedia.org/wiki/Email_address#Valid_email_addresses
:::

## Exam-like exercises

### Hidden message

The following algorithm performs a task using string methods in Python.

![image](https://hackmd.io/_uploads/ByMr_5DB-x.png)

Copy and complete the trace table for this algorithm. The values for columns I, J, C, X, Y and Z for the first row have been done for you.
[6]

| I | J | C | X | Y | Z | R | S |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| 3 | | 0 | "ADONUS" | 3 | 2 | | |

:::spoiler
| I | J | C | X | Y | Z | R | S |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| 3 | | 0 | "ADONUS" | 3 | 2 | "NU" | "UN" |
| 6 | "UN" | 1 | "FERGUS" | 0 | 3 | "FER" | "REF" |
| 9 | "REFUN" | 2 | "NASREEN" | 1 | 4 | "ASRE" | "ERSA" |
| 12 | "ERSAREFUN" | 3 | "TUPPENCE" | 0 | 3 | "TUP" | "PUT" |
| 15 | "PUTERSAREFUN" | 4 | "DAMOCLES" | 2 | 3 | "MOC" | "COM" |
| 15 | "COMPUTERSAREFUN" | | | | | | |
:::

### Checking passwords

A function is required to set new passwords. The new passwords must be at least eight characters long and there must be no two consecutive repeated characters. For example, the password “fEedBack” would be accepted, but the password “FEEDBACK” would fail because of the two consecutive repeated ‘E’ characters.

Construct in Python a function called checkPassword() that, given a password parameter, returns True if the conditions are met and False otherwise. [4]

Solution:
:::spoiler
```python=
def checkPassword(password):
    if len(password) < 8:
        return False
    for index in range(len(password)-1):
        # compare each character with the next one
        if password[index] == password[index + 1]:
            return False
    return True
```

For validation, one strategy is the following:

```python=
def validate(whatever):
    if condition1 to not be valid:
        return False
    if condition2 to not be valid:
        return False
    # ...
    if conditionN to not be valid:
        return False
    return True
```
:::

### isAlpha exercise

In Python there is a string method called `isalpha()`. It can be used on any string to check if all the characters inside it are letters (a-z or A-Z; strictly speaking, other alphabetic characters such as "ñ" also count). It returns True if that is the case and False otherwise. For example, a script like this

![image](https://hackmd.io/_uploads/ByTK4W5UWe.png)

will output “Cat”, but if we add any special characters or numbers to the string, it will output False.
For example, `"My Sharona".isalpha()` would return False because there is a space between My and Sharona. Also, `"MySharona10".isalpha()` would return False.

a) Write the output of this program [1]

![image](https://hackmd.io/_uploads/SJ5cV-cI-e.png)

_This requires knowing slicing._

Solution:
:::spoiler
False
True
:::

In word processing, there are subprograms that check what has been written, either to infer or to correct the user input. Let's assume that a word processor program has stored a variable called “lastPhrase” and we want to verify that everything in that sentence, except for the spaces, is a letter.

b) Construct in Python the method “onlyLettersAndSpaces()” that, given lastPhrase as a parameter, checks if, besides the spaces, all the characters are letters (a-z)(A-Z). It will return True if all characters are either spaces or alphabetical characters and False otherwise. You may use the described method isalpha() in your solution. [4]

### Counting trees

7) Construct a method in Python called treeCounter() that counts and outputs how many times the word “tree” is written (exactly as it is) in a given sentence sent to the method as a parameter (you may call the parameter sentence). [3]

```python!
def treeCounter(sentence):  # write the parameter!
    words = sentence.split()  # split, not slice
    count = 0
    for word in words:
        if word == "tree":  # quotes important
            count = count + 1
    print(count)
```

### Longest word

9) Construct in Python the method “longestWord(s)” that, given a string as a parameter, returns the longest word it has. For example, “Luxury is vulgarity” would return “vulgarity”, and “Please, lend me your ear” would return “Please,”. [4]

Hint:
:::spoiler
For this you will need to split the string and then find the word with the maximum length using len().
:::

Solution:

```python=
```
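One possible solution, sketched following the hint above. It assumes that words are separated by whitespace (so punctuation stays attached, which is why "Please," keeps its comma) and that, when two words are equally long, the first one is returned, because the exercise doesn't say what to do on a tie.

```python
def longestWord(s):
    # Split on whitespace: "Please, lend me your ear" -> ["Please,", "lend", ...]
    words = s.split()
    longest = ""
    for word in words:
        # Strict ">" keeps the first word when two words have the same length
        if len(word) > len(longest):
            longest = word
    return longest


print(longestWord("Luxury is vulgarity"))       # vulgarity
print(longestWord("Please, lend me your ear"))  # Please,
```

A more compact alternative is `max(s.split(), key=len)`, which also returns the first of several equally long words, but raises a `ValueError` if the string has no words, while the loop above returns `""`.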