Toby Hodges
# Introduction to Regular Expressions

2020-03-18
Supriya Khedkar & Toby Hodges

Download example files here: https://oc.embl.de/index.php/s/kQavCkbomGu4zyM

Please add an 'x' under this line when you've got the example files
xXXxxXXXxXXXXx

## Sign-in

Please add your name below
- Patrick Hasenfeld
- Jia Le Lim
- Laura R. de la Ballina
- Patricia Carvajal
- Pamela Ferretti
- Loukia Georgatou-Politou
- Luca Santangeli
- Jesus Alvarado
- Umut Yildiz
- Martin Gutierrez
- Tafsut Tala-Ighil
- Biruhalem Taye
- Abi Wood
- Christopher Hall
- Silvia Natale
- Anusha Gopalan
- Kimberly Meechan
- William Andrés López Arboleda
- Renato Alves
- Katharina Zirngibl
- Filipa Torrao
- Mariya Timotey Miteva

## Markdown essentials

- type headings with '# ' at the start of the line
- bullet points with '- ' like this
- __bold__/**bold** and *italic*/_italic_
- [links with a combination of `[` and `(` brackets](https://bio-it.embl.de/)
- `code` (i.e. monospaced font) with backticks

```
and multiple lines of code with triple backticks on their own lines
```

```python
def print_explanation():
    explanation = '''
    you can even turn on syntax highlighting for most languages
    by naming the language immediately after the opening three backticks
    '''
    print(explanation)


print_explanation()
```

## Notes

_please use this area to take notes throughout the course_

### Introduction

- Regular expressions are a way to describe patterns in text.
- They can be used in many different applications.
- You can use them in Microsoft Word, for example.

Example: how can we identify a phone number (a run of 12 digits) in a long document? Searching for each occurrence of 0, or generating all possible 12-digit numbers and searching for each one, would take far too long. Instead, we use tokens to describe the phone number, and look for the pattern of a 12-digit number, whatever its value.

This needs a regular expression engine; the tools learnt today will be applicable to multiple programming languages. Install a text editor like Notepad++, Atom, TextWrangler, ...
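A minimal Python sketch of the phone-number idea, using the standard `re` module (which comes up again under Applied Examples). It assumes a phone number is written as exactly 12 consecutive digits, and the sample text is made up for illustration; the `\d`, `\b` and `{}` syntax is explained in the sections that follow.

```python
import re

# Made-up text; the "12 consecutive digits" phone format is an assumption.
text = """Call the lab on 491234567890 or the office on 496221387000.
Order number 12345 is not a phone number, and neither is 4912345678901."""

# \d{12}: exactly twelve digits.
# \b on each side: the twelve digits must not be part of a longer run of digits.
phone_pattern = re.compile(r'\b\d{12}\b')

phone_numbers = phone_pattern.findall(text)
print(phone_numbers)  # ['491234567890', '496221387000']
```

The 13-digit number is skipped because no 12-digit slice of it has a word boundary on both sides.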
https://regex101.com/
- You can give it a text string as input, and a regular expression (a character string describing a pattern) to look for in that text.
- It can find regular expression matches in a text, but it cannot replace them in your files: for that you need a text editor.

### Fundamentals

regex101.com
- copy & paste the content of the `example.gff` file
- search for HDAC (you can also search for numbers)
- the search is case sensitive (to make it case insensitive, certain tools and languages provide an option, e.g. `grep -i` on the command line or the `i` flag in regex101)

#### Exercise 2.1

**Find every instance of a histone deacetylase (HDAC) in the file, based on the gene name.** Are there any potential problems with your search? Think about how you might make your search as specific as possible, to avoid spurious matches.

_If 'HDAC' also appears in a field other than the gene name, that could be problematic. To reduce spurious matches, we could search specifically for `Name=HDAC`, to specify the field as well as the value._

**How to combine a search**

To look for HDAC1 and HDAC2 at the same time, enter `Name=HDAC[12]`. The set `[12]` matches one single character, either `1` or `2`, so this will not match HDAC12 as a whole: it only matches the first part (`Name=HDAC1`) of `Name=HDAC12`.

You can also give ranges: `[A-Z]`, `[0-9]`

#### Exercise 2.2

a) In total, how many lines mention HDAC 1-5 in `example.gff`?

Solution: `Name=HDAC[1-5]`

To avoid also matching HDAC11 to HDAC59, we can add a second set inverted with `^`, which matches any character *except* the ones listed: `Name=HDAC[1-5][^0-9]`

Outside a set, a hyphen has no special meaning, so the literal string `1-5` is matched by `Name=HDAC1-5`. To include a literal hyphen inside a set, put it first: `Name=HDAC[-1-5]` matches `Name=HDAC` followed by either `-` or a digit from 1 to 5.

b) Which of the following expressions could you use to match any four-letter word beginning with an uppercase letter, followed by two vowels, and ending with 'd'?

```
i) [upper][vowel][vowel]d
ii) [A-Z][a-u][a-u]d
iii) [A-Z][aeiou][aeiou]d
iv) [A-Z][aeiou]*2;d
```

c) Try playing around with the character ranges that you have learned above. What does `[A-z]` evaluate to? What about `[a-Z]`? Can you create a range that covers all letter and number characters?
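The sets from Exercise 2.2 can also be tried outside regex101, for example with Python's `re` module. This sketch uses a few made-up GFF-style attribute strings (not the real contents of `example.gff`) to show why the inverted set `[^0-9]` is needed, and how the ASCII order affects ranges such as `[A-z]`.

```python
import re

# Made-up GFF-style attribute fields (not copied from example.gff).
lines = [
    "ID=gene:1;Name=HDAC1;biotype=protein_coding",
    "ID=gene:2;Name=HDAC4;biotype=protein_coding",
    "ID=gene:3;Name=HDAC11;biotype=protein_coding",
    "ID=gene:4;Name=HDAC9;biotype=protein_coding",
]

# [1-5] is a range: one single digit from 1 to 5.
loose = re.compile(r'Name=HDAC[1-5]')
# [^0-9] is an inverted set: the next character must NOT be a digit,
# so the HDAC11 line is no longer matched.
strict = re.compile(r'Name=HDAC[1-5][^0-9]')

loose_hits = [line for line in lines if loose.search(line)]
strict_hits = [line for line in lines if strict.search(line)]
print(len(loose_hits))   # 3 (HDAC1, HDAC4 and, unintentionally, HDAC11)
print(len(strict_hits))  # 2 (HDAC1 and HDAC4)

# [A-z] follows the ASCII order, so it also covers [ \ ] ^ _ and `
print(re.fullmatch(r'[A-z]', '_') is not None)     # True
print(re.fullmatch(r'[A-Za-z]', '_') is not None)  # False

# [a-Z] runs "backwards" (a comes after Z), so Python rejects it.
try:
    re.compile(r'[a-Z]')
    backwards_range_ok = True
except re.error:
    backwards_range_ok = False
print(backwards_range_ok)  # False
```

Note that `[^0-9]` needs a character to be present, so `Name=HDAC1` at the very end of a line would not match; the word boundary `\b` from the Tokens & Wildcards section avoids that.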
Solution: `[A-z]` matches all letters regardless of case, but because it follows the ASCII order it also matches the six characters between `Z` and `a` (`` [\]^_` ``). `[a-Z]` is an invalid range, because `a` comes after `Z` in that order. To cover all letter and number characters: `[A-z0-9]`. `[0-z]` also works, because the characters are ordered as: numbers, uppercase letters, lowercase letters (both of these also include a few symbols, such as `_`). `[A-Za-z0-9]` covers only letters and numbers.

i
ii
iii
xxXxxXxxxxxxx
iv

Answer: iii). Option i) does not work, because `[upper]` and `[vowel]` are just sets of the letters they contain: e.g. `[upper][vowel][vowel][d]` would match "pwld" (because 'p' is in the first set, 'w' in the second, 'l' in the third).

_Please type 'X' under this line when you're done with these exercises_
xxxxxxxxxxxx

When specifying character ranges such as `[1-z]`, the order assumed is based on the [ASCII table](https://www.asciitable.com/). The order is: the space character, then ``!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~``.

#### Exercise 2.3

Use an inverted set to only match the human autosomes (chr1-22), i.e. filtering out chromosomes chrX, chrY and chrM. How many records with autosomes can you find in file `example.gff`?

_Please type 'X' under this line when you're done with these exercises_
Xxxxx

Solution: `chr[^XYM]`

#### Summary: Sets and Ranges

Groups of characters that can be matched in a certain position are specified between `[]`
- Letters, numbers, symbols: `[&k8Y]`
- Uppercase: `[A-Z]`
- Lowercase: `[a-z]`
- Numbers: `[0-9]`

More examples:
- `[N-Z]`
- `[5-9]`
- `[a-f0-9]`

For inverted sets (non-matching characters) use `^` as shown here: `[^268]`

### Tokens & Wildcards

Token | Matches                                                    | Set Equivalent |
------|------------------------------------------------------------|----------------|
`\d`  | Any digit                                                  | `[0-9]`        |
`\s`  | Any whitespace character (space, tab, newlines)            | `[ \t\n\r\f]`  |
`\w`  | Any 'word character': letters, numbers, and underscore     | `[A-Za-z0-9_]` |
`\b`  | A word boundary (a position between characters, not a character itself) | n/a |
`\D`  | The inverse of `\d`, i.e. any character except a digit     | `[^0-9]`       |
`\S`  | Any non-whitespace character                               | `[^ \t\n\r\f]` |
`\W`  | Any non-word character                                     | `[^A-Za-z0-9_]`|

#### Exercise 3.1

Match dates of the form 31.01.2017 (DAY.MONTH.YEAR) in the example file `person_info.csv`.
Pay attention not to wrongly match phone numbers. How many matches do you find?

_Please type 'X' under this line when you're done with these exercises_
xxxxxxxxxxxxxx

Solution: `\d\d[.]\d\d[.]\d\d\d\d` or `\d\d\.\d\d\.\d\d\d\d`

Note: `.` matches any character; `\.` (or `[.]`) matches the dot character itself.

#### Exercise 3.2

How can you refine the regex from the previous exercise to prevent matching strings with preceding or succeeding digits, such as 131.01.20171?

Solution: `\b\d\d\.\d\d\.\d\d\d\d\b`

#### Exercise 3.3

When designing a regular expression to match exactly four digits, what would be the difference between using the two regular expressions `\b\d\d\d\d\b` and `\D\d\d\d\d\D`?

Solution: `\D\d\d\d\d\D` includes the flanking non-digit characters in the match, and needs a character on each side (so it fails at the very start or end of the text). `\b` matches a position rather than a character, so `\b\d\d\d\d\b` matches only the four digits, but it also fails when the digits touch a letter or underscore (e.g. `ab1234`).

_Please type 'X' under this line when you're done with these exercises_
xxxxxxxxxxXxx

#### Exercise 3.4

~~Count how many sequences in `example_protein.fasta` are of transcript_biotype "protein_coding". Hint: sequence records have a header that starts with the character "`>`".~~

_Please type 'X' under this line when you're done with these exercises_
xxxxxx

### Repeats

Symbol | Behaviour                                      | Example | Matches  | Doesn't Match |
-------|------------------------------------------------|---------|----------|---------------|
`+`    | Match one or more of preceding character/set   | `AGC+T` | AGCCCT   | AGT           |
`?`    | Match zero or one of preceding character/set   | `AGC?T` | AGT      | AGCCT         |
`*`    | Match zero or more of preceding character/set  | `AGC*T` | AGCCCCCT | AGG           |

#### Exercise 4.1

a) Which of the following strings will be matched by the regular expression `MVIA*CP`?
i) MVIAACP
ii) MVICP
iii) MVIACP
iv) all of the above

a) i) ii) iii) iv)
xxxxxx

b) Write a regular expression to match the strings "ATGCTTTCG" and "ATCTCG" but not "ATGGCCG".

Solution: `ATG?CT+CG`

_Please type 'X' under this line when you're done with these exercises_
xxxxxxxxxxx

#### Exercise 4.2

Use `{}` to search `example_protein.fasta` for tryptophan (W) followed by tyrosine (Y), with an increasing number of leucine (L) residues in between. Start by searching for this pattern with three leucines (i.e. 'WLLLY'), then reduce this to two, and one. Is this working as you expect?

How would you search for a range of lengths of the leucine repeat? Try searching for any patterns with between one and four leucines. What happens if you leave out the upper limit inside the `{}`? Or the lower limit? Can you leave out both?

Please type 'X' under this line when you're done with these exercises
xxxxxxxxx

Solution: `WL{1,4}Y`

Note:
- `[]`: set
- `()`: group
- `{}`: number of occurrences of the preceding character/set; can also have a lower and an upper limit, like `{1,3}`

### Capture Groups

Example: reorder band names alphabetically. To do so, we need to move "The" to the end of each line.

First: select the lines that we want to rearrange.

These are called capture groups because we can capture a region of the input, and reuse it in the replacement value.
- To define a capture group, put that part of the pattern in parentheses: `(...)`
- To reintroduce it in the replacement string: `$1` or `\1`, depending on the tool
- Some tools only support nine numbered groups in the replacement (`\1` to `\9`, e.g. `sed`)

#### Exercise 5.1

The file `example_protein_malformed.fasta` is missing the `>` character at the beginning of the headers. Use a capture group to add them.

Solution:
- regex for identifying and grouping: `(^ENSP.+)`
- substitution: `>$1`

#### Exercise 5.2

The file `file_paths.txt` contains file paths of image files. The files are organised in folders based on vacations, but the files themselves have cryptic names. You want the files to be prefixed by the vacation and moved into a shared folder.
At the end the list should look like:

```
/Users/Jane/shared/vacation-pics/France-2015-IMG-06650.jpg
/Users/Jane/shared/vacation-pics/France-2015-IMG-06651.jpg
...
/Users/Jane/shared/vacation-pics/France-2017-IMG-08449.jpg
...
/Users/Jane/shared/vacation-pics/Greece-2016-IMG-07895.jpg
...
```

Use a capture group to transform the file paths accordingly.

Solution: `Pictures\/(\w+-\d{4})\/(IMG-\d+\.jpg)`

Substitution: `shared/vacation-pics/$1-$2`

Tip: when designing regular expressions be as specific as possible; greedy regular expression definitions like `.+` could cause problems.

### Alternative Matching

Example: `Homo Sapiens|Mus Musculus` matches either of the two names: the `|` signifies "or".

#### Exercise 6.1

If you study the contents of the file `person_info.csv`, you will see that some variation exists in the address formatting. For example, some of the addresses use 'First Street' while others use '1st Street' or some other variation. Can you find all the lines containing information on a person living on 1st/First Street/street, using a single regular expression?

Solution(s):
- `[Ff]irst [Ss]treet|1st [Ss]treet`
- `([Ff]irst|1st)\s([Ss]treet)` (inside a set, `|` is an ordinary character, so `[F|f]` would also match `|irst`)

Please type 'X' under this line when you're done with these exercises
xxxxxxxxxxxxxxxx

#### Exercise 6.2

The FASTQ file `example.fastq` contains sequence reads with quality scores, in the format

```
@sequence header line with barcode sequence
sequence
+
quality scores for basecalls
```

Unfortunately, the barcode sequences in the header lines are wrong, and the barcodes are still attached to the front of the sequences. There are three barcodes that we are interested in: AGGCCA, TGTGAC, and GCTGAC.

a) How many reads are there in the file for each of these barcodes?
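To count reads per barcode outside a regex tester, a Python sketch could look like this. The FASTQ snippet is made up (it is not `example.fastq`), and the sketch assumes every record is exactly four lines, with the barcode at the very start of the sequence line.

```python
import re
from collections import Counter

# Made-up FASTQ snippet (not taken from example.fastq): four lines per read.
fastq = """@read1 1:N:0:NNNNNN
AGGCCAACGTACGT
+
IIIIIIIIIIIIII
@read2 1:N:0:NNNNNN
TGTGACGGTTAACC
+
IIIIIIIIIIIIII
@read3 1:N:0:NNNNNN
AGGCCATTTTGGGG
+
IIIIIIIIIIIIII
@read4 1:N:0:NNNNNN
CCCCCCAAAAAAAA
+
IIIIIIIIIIIIII
"""

# Alternation inside a group: any one of the three barcodes,
# anchored to the start of the sequence.
barcode_pattern = re.compile(r'^(AGGCCA|TGTGAC|GCTGAC)')

lines = fastq.splitlines()
sequences = lines[1::4]  # the sequence is the 2nd line of every 4-line record

counts = Counter()
for seq in sequences:
    m = barcode_pattern.match(seq)
    if m:
        counts[m.group(1)] += 1  # group 1 holds whichever barcode matched

print(dict(counts))  # {'AGGCCA': 2, 'TGTGAC': 1}
```

Only the sequence lines are tested, because quality lines can also contain letters and could in principle start with the same characters as a barcode.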
> Solution:
> AGGCCA: 25
> TGTGAC: 29
> GCTGAC: 19

b) Write a regular expression that will find these barcodes, and a replacement string that will remove them from the start of the sequences in which they are found.

> Solution: `^(AGGCCA|TGTGAC|GCTGAC)`
> Replacement string: nothing, because we want to remove this region

c) Of course, the format of the file means that you should probably remove the quality scores associated with those sequence positions too. Rewrite your regex so that the barcode sequence AND its corresponding quality scores (i.e. the first six characters on the sequence and quality lines) are removed.

> Solution: `^(AGGCCA|TGTGAC|GCTGAC)([ATGCN]+\n\+\n)(.{6})`
> Replacement string: `$2`

d) Finally, can you build on the regex and replacement string from part c) to replace the incorrect index sequences in the header lines with the barcodes for each relevant sequence record?

> Solution: `(\d:\w:\d:)[ATGCN]{6}\n(AGGCCA|TGTGAC|GCTGAC)([ATGCN]+\n\+\n)(.{6})`
> Replacement: `$1$2\n$3`

Tip: to group character strings (e.g. to apply "or") without capturing the group, so that it does not get a number, put `?:` at the beginning of the parentheses: `(?:...)`

Please type 'X' under this line when you're done with these exercises
x

### Feedback

https://de.surveymonkey.com/r/denbi-course?sc=hdhub&id=000259

In question 11, we would particularly love to hear about what we could do to improve the way we teach **online** in the future - we will probably need to do this a lot in 2020!
:)

### Applied Examples

#### Command line

`perl -ane '<perl expression>'`
- `-a`: automatically split each line into fields (stored in the array `@F`); the default delimiter is whitespace
- `-n`: loop over every line of the input
- `-e <perl_expression>`: run _perl_expression_ on every line

It is Very Important to use single quotes around your Perl expression, so that the shell passes it to Perl unchanged (inside double quotes, the shell would expand things like `$` variables).

R: x xxxX
Python: XXxx

#### R

To use regular expressions in R one could use the stringr package (on its own and also in combination with the tidyverse).

Most of the commands that we learned today work with stringr (though backslashes need to be doubled inside R strings, e.g. `"\\w"`), as shown in this cheat sheet: http://edrub.in/CheatSheets/cheatSheetStringr.pdf

stringr commands can be used with `filter` (a tidyverse command). For example, the solution to applied examples question 1 with stringr would be:
`data %>% filter(str_detect(Organism_name, "^Vibrio_*"))`
(`_*` means zero or more underscores, so this keeps every name that starts with "Vibrio")

#### Python

- as with everything in Python, regular expressions are all about objects!
- you need to `import re` (a module from the standard library) to work with them
- use raw strings to describe your regex pattern
    - type 'r' before you open the string
    - this saves you from _Backslash Hell_
    - e.g. `r'\w\s[A-Z]'`, which is equivalent to `'\\w\\s[A-Z]'`
- two approaches you can take:
    - compile a regex object with `re.compile(pattern)`
    - use functions and pass the pattern as an argument
- regex object option:

```python
import re

pattern = r'example pattern'
regex_object = re.compile(pattern)
print(dir(regex_object))  # list the attributes and methods of the regex object
# match: see if regex matches *at the start* of the target string
# search: see if regex matches exist anywhere in target string
# findall: returns list of substrings matching regex from target string
# finditer: as above, but returns iterator instead of list
print(regex_object.match('a string without the pattern at its start'))  # None
print(regex_object.match('example pattern at the start of this string'))  # a match object
```

- more about match objects later...
- regex functions option:

```python
import re

pattern = r'example pattern'
print(re.match(pattern, 'example pattern at the start of the string'))
print(re.search(pattern, 'string with an example pattern inside it'))
```

- if a match is found, a match object is returned (otherwise, `None`)
    - match objects have several useful methods

```python
import re

pattern = r'[A-Z][a-z]{4,6}'  # an uppercase letter followed by 4 to 6 lowercase letters
patternObj = re.compile(pattern)
matchObj = patternObj.match('Where are we going?')
print(matchObj.group())  # the matched text: 'Where'
print(matchObj.start())  # index where the match starts: 0
print(matchObj.end())    # index just after the match ends: 5
print(matchObj.span())   # (start, end) as a tuple: (0, 5)
```

- as `None` is returned when no match is found, use `if`/`else` to handle both outcomes

```python
import re

# note the leading space: re.match only looks at the start of the string,
# so this prints 'No match' (re.search would find the date)
m = re.match(r'\d\d-\d\d-2019', ' 04-09-2019')
if m:
    print('Match found: ', m.group())
else:
    print('No match')
```

- if using capture groups, the `group` method becomes more interesting

```python
import re

pattern = r'([CG]{3})+(TATA[AT]A[AT])([CG]+)'
match = re.match(pattern, 'CGCTATAAAAGGGC')
print(match.group())   # whole match: 'CGCTATAAAAGGGC'
print(match.group(0))  # group 0 is also the whole match
print(match.group(2))  # second capture group: 'TATAAAA'
```

- more at https://docs.python.org/3/howto/regex.html

## Recommended Resources & Further Reading (feel free to add your own)

- [The material for this course](https://tobyhodges.gitbooks.io/introduction-to-regular-expressions/content/) - in case you want to check back in the future (or recommend it to others 😉)
- [Library Carpentry Regular Expressions lesson](https://librarycarpentry.org/lc-data-intro/) - another nice introductory lesson, with some good exercises
- [regular-expressions.info](http://www.regular-expressions.info) - the definitive guide to regex.
- [Debuggex](https://www.debuggex.com) - a handy interface to visualise how your regex will match a pattern.
- [Regex Crossword](https://regexcrossword.com) - play crosswords with regular expressions. Provides a very good opportunity to practise your skills.
- [regex101](https://regex101.com/) - feature-rich regular expression tester / explorer
- [Awesome Regex](https://github.com/aloisdg/awesome-regex) - curated list of regex resources

### Copy of Zoom Chat

10:29:37 From Toby : a few people may still not have the link to the notes, in which case - https://hackmd.io/@JKgItcTtSZmVTddVipOPMA/regex2020/edit
10:35:51 From Pamela_Ferretti : /hand
10:35:58 From Pamela_Ferretti : Can you repeat? This last aprt
10:36:00 From patrick hasenfeld : why the hyphen?
10:36:03 From Pamela_Ferretti : yes
10:36:07 From Silvia : Where are questions exercise?
10:36:37 From Silvia : The file
10:37:09 From Toby : The exercises are added to the notes as we go along
10:37:26 From Toby : these exercises start at line 108
10:38:56 From Laura : /hand
10:42:29 From Jia Le Lim : Quick question, is the \ necessary in front of the d => like \d
10:43:08 From Jia Le Lim : Ah okay, thank you
10:45:18 From Anusha Gopalan : /hand
10:45:36 From patrick hasenfeld : interestingly [1-z] also matches uppercase letters
10:46:07 From Toby : it does!
10:59:20 From Renato Alves : I added a note about the order of the characters in ranges such as [1-z] to the collaborative document.
11:05:20 From supriya : I have added summary of the regex fundamentals to the shared document
11:06:13 From Toby : exercise on line 144
11:08:48 From Jia Le Lim : chr[^XYM]
11:08:56 From Anusha Gopalan : chr[^oMXY]
11:11:03 From Patricia Carvajal : /hand
11:11:27 From Patricia Carvajal : chr[0-9] matches only one character after chr
11:11:46 From Patricia Carvajal : how do we match x number of characters?
11:12:14 From Laura : /hand. Sorry, just one question. Is ^ both a gap and a negation?
11:20:08 From Anusha Gopalan : /hand : why isn’t it matching the second instance from “is an..”
11:21:09 From Anusha Gopalan : thank you!
11:27:37 From Jia Le Lim : /hand : is \bis\b equivalent to ^is$
11:31:20 From Jia Le Lim : Thank you!
11:31:49 From Festus Nyasimi : how do you match ^ in a line
11:37:06 From Festus Nyasimi : And again matching ^ at the start of a line
11:44:07 From Anusha Gopalan : \hand: So both \. And [.] match literally “.”?
11:44:07 From Patricia Carvajal : /hand
11:44:50 From Patricia Carvajal : I got a little lost, how would you match a character "^" at the end of a line?
11:45:23 From Anusha Gopalan : It doesn’t define a particular ascii?
11:46:37 From Festus Nyasimi : the . can't be escaped in a set?
11:46:38 From Jia Le Lim : \^$ ?
11:47:21 From Patricia Carvajal : thanks!!
12:09:13 From Martin Gutierrez : /hand could you explain what ? does again?
12:10:05 From Martin Gutierrez : ok thanks!
12:19:35 From patrick : could youshow the exercide again?
12:25:13 From Pamela_Ferretti : As you prefer
12:25:15 From Pamela_Ferretti : :)
12:25:21 From patrick : yes
12:28:03 From Patricia Carvajal : /hand
12:28:21 From Patricia Carvajal : can't unmute the mic
12:28:23 From Patricia Carvajal : ok
12:28:35 From Patricia Carvajal : T{3}, would match 3 Ts
12:28:53 From Patricia Carvajal : {1,3} what does it indicate?
12:29:29 From Patricia Carvajal : thanks
12:29:57 From Anusha Gopalan : /hand : can you look for repeats of combinations of characters - like TA repeating three times a sequence like GCCTATATACGGA ?
12:29:59 From Patricia Carvajal : yes, it does
12:29:59 From Mariya to Toby (Privately) : yes Toby this is the keyboard - Alt+è do not work, also +è do not work
12:30:43 From Anusha Gopalan : Ok thank you
12:30:58 From jesus alvarado : (AT){3}?
12:33:22 From jesus alvarado : See you
12:33:29 From Toby : back at 13:33
12:33:39 From Renato Alves : Enjoy lunch /bye
12:34:01 From Patricia Carvajal : Enjoy
12:34:16 From Luca Santangeli : Enjoy :)
13:38:57 From supriya : We are at exercise 4.2 (shared document line251)
13:42:19 From Laura : /hand. Sorry, could you say the answer to the last part of the exercise? What happens if you leave out the upper limit inside the `{}`? Or the lower limit? Can you leave out both?
14:00:27 From Silvia : Is it possible to copy and paste everything in excel?
14:02:33 From Toby : @Silvia: are you asking whether it’s possible to use regular expressions in Excel, or suggesting that the kind of “group_1, group_2, group_3” reorganisation would be possible with Excel too?
14:04:05 From Silvia : Since your answer.. both :)
14:05:06 From Toby : the answer to 1. is “yes, I think so, but it looks kinda complicated to set up”: https://analystcave.com/excel-regex-tutorial/
14:07:04 From Toby : and to 2: yes, this kind of example, where every line looks the same and groups are anyway separated by some standard character e.g. “,” can be solved easily enough with Excel too. But that approach is less useful where the pattern only occurs on a selection of lines, and/or where the patterns appear at unpredictable points in the line(s)
14:07:46 From Toby : (see the FASTA file in the exercise for example)
14:09:46 From Silvia : Thank you!
14:10:24 From Patricia Carvajal : identifies well, but I have not been able to add the simbol ">" at the substitution part
14:17:49 From patrick : - is not in w+
14:19:12 From Renato Alves : Careful that the . in .jpg needs to be escaped.
14:23:51 From Anusha Gopalan : /hand
14:24:28 From Christopher Hall : /hand can I count the / and choose between them in Regex, or this a problem for slices and programming languages
14:27:40 From patrick : /hand for dcumentation, whats was the term if you use loose regeular expressions lie .+ and not specific ones
14:28:01 From Renato Alves : greedy was the word Toby used
14:28:10 From patrick : thanks
14:30:54 From Patricia Carvajal : what file is this one?
14:31:24 From Toby : we don’t have a file for this, sorry.
14:32:05 From Patricia Carvajal : ok, thanks
14:32:18 From Toby : I’ve pasted it at the bottom of the shared notes (line 338)
14:32:29 From Toby : in case you want to use exactly what Supriya had
14:40:11 From Anusha Gopalan : /hand
14:40:21 From Laura : I think you are also missing the first F
14:41:49 From Filipa : could we do it like this? or is it not safe? (1st|First) (Street|street)
14:42:20 From Patricia Carvajal : how about ([F|f]irst|1st)\s([Ss]treet)
14:42:50 From Filipa : ah yes, ok thank you
14:43:04 From Toby : I like Patricia’s
14:43:06 From Festus Nyasimi : ([Ff]irst|1st) ([sS]treet|[Aa]ve)
14:43:11 From Toby : think it would work
14:51:08 From Patricia Carvajal : /hand
14:51:13 From Patricia Carvajal : we used
14:51:28 From Patricia Carvajal : | for some sort of an "or" expression
14:51:46 From Patricia Carvajal : do we have an equivalent for "and"
14:51:47 From Patricia Carvajal : ?
14:52:45 From Toby : “and” would be an extension of the regular expression i.e. “patternApatternB” is effectively the same as “patternA AND patternB” already
14:54:36 From Patricia Carvajal : it would be more like a "starts with AND finishes with" (regardless on what's in the middle)
14:57:28 From Renato Alves : Patricia: in the example Toby mentioned you can use "patternA.*patternB" the dot represents any character and * as many matches as possible.
15:00:19 From supriya : Patricia you could also use anchor to denote begins with and ends with and * for any characters in between as Renato suggested above
15:01:57 From patrick : the quality line is not \w
15:02:01 From Festus Nyasimi : The quality scores are mostly symbols
15:08:35 From patrick : why is $1 the whole line, you just capered the last part of the string
15:14:51 From Festus Nyasimi : How do group counting occur when groups are embedded in groups
15:16:24 From Toby : https://de.surveymonkey.com/r/denbi-course?sc=hdhub&id=000259
15:18:31 From supriya : file_applied_examples_proks.txt
15:18:36 From Toby : @Festus: from the order that the opening parentheses appear
15:18:51 From supriya : exercise_applied_examples.txt
15:19:26 From Festus Nyasimi : Thanks
15:19:48 From Toby : i.e. with (fes(tus ny)asim)i $1 = festus nyasim & $2 = tus ny
15:20:05 From Toby : see here for more info https://stackoverflow.com/questions/1313934/how-are-nested-capturing-groups-numbered-in-regular-expressions
15:20:59 From Renato Alves : See also https://www.regular-expressions.info/named.html if you prefer to use a name or tag instead of $1, $2, ....
15:21:34 From Toby : +1 thanks, Renato
15:22:30 From Laura : /hand, in my computer it says regex command not found, do we need to install something before?
15:25:01 From Toby : the “mac-bork35:regex khedkar$” part is Supriya’s prompt - you shouldn’t type this part
15:25:15 From Toby : (the command begins at “cat …”)
15:25:20 From Laura : I’ve just realised that, sorry :)
15:25:25 From Laura : Got confused
15:25:26 From Toby : no worries
15:26:39 From Patricia Carvajal : sorry, got confused on the perl part
15:27:00 From Toby : could you explain the option flags again?
15:27:31 From patrick : maybe with awk
15:27:32 From Laura : No, no I got confused
15:28:51 From patrick : is F[0] the first element of the line?
15:32:45 From patrick : lookd like a japanese emoji
15:32:56 From Toby : haha
15:35:19 From Toby : “the file itself is not clean” is the tagline of “Bioinformatics: The Movie”
15:35:26 From Toby : :(
15:36:11 From Laura : /hand
15:36:12 From patrick : are you mossing the genus_family_ATCC lines?
15:36:20 From patrick : *missing
15:37:19 From Toby : to learn more about this, you might want to read our Linux command line course materials :)
15:37:20 From patrick : no worries, they are includes
15:37:36 From Toby : https://bio-it.embl.de/course-materials/
15:37:56 From Patricia Carvajal : I have to go. Thanks a lot for the course =)
15:38:29 From Laura : Great, thanks!
15:40:20 From Martin Gutierrez : I have to leave. Thanks for the great course!
15:46:51 From william A. lópez : Sorry
16:00:36 From Toby : https://de.surveymonkey.com/r/denbi-course?sc=hdhub&id=000259
16:01:10 From patrick : thanks, very well presented
16:01:13 From Umut Yildiz : Thanks for the course! See you!
16:01:26 From Katharina Zirngibl : Thank you very much!!
16:01:27 From Laura : Thanks a lot!! =)
16:01:29 From user : Thank you very much!
16:01:29 From jesus alvarado : Thanks!!
16:01:30 From Renato Alves : Wooo! Thanks Suprya and Toby
16:01:31 From william A. lópez : Thanks
16:01:35 From Jia Le Lim : Bye! Thanks!
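The group-numbering rule from the chat (groups are numbered by the order in which their opening parentheses appear) and the named groups Renato linked to can be checked in Python. This is a small sketch: Python writes a named group as `(?P<name>...)` and refers to it in a replacement as `\g<name>`; other tools may use a different syntax, and the date example is made up.

```python
import re

# Nested groups: numbered by the position of their opening parenthesis.
m = re.match(r'(fes(tus ny)asim)i', 'festus nyasimi')
print(m.group(1))  # festus nyasim
print(m.group(2))  # tus ny

# Named groups: refer to a group by name instead of by number,
# both on the match object and in a re.sub-style replacement.
date_pattern = re.compile(r'(?P<day>\d\d)\.(?P<month>\d\d)\.(?P<year>\d{4})')
date = date_pattern.match('31.01.2017')
print(date.group('year'))  # 2017

# Reorder DAY.MONTH.YEAR into YEAR-MONTH-DAY using the group names.
iso_date = date_pattern.sub(r'\g<year>-\g<month>-\g<day>', '31.01.2017')
print(iso_date)  # 2017-01-31
```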
