Homework 3 - Regular Expression Puzzles

All questions were completed using the text editor BBEdit.

Question 1:

What we have:

First String    Second      1.22      3.4
Second          More Text   1.555555  2.2220
Third           x           3         124

What we want:

First String,Second,1.22,3.4
Second,More Text,1.555555,2.2220
Third,x,3,124

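How to do it: Find each run of two or more whitespace characters (the gaps between columns) and replace it with a single comma. Single spaces inside a field, such as the one in “First String”, are left alone.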

- Find: \s{2,}
- Replace: ,
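
The same search can be tried outside BBEdit. Below is a minimal sketch in Python's `re` module, assuming the three sample lines above are the whole input; the variable names are made up for this illustration and are not part of the BBEdit workflow.

```python
import re

# Sample input from the question; the gaps between columns are runs of spaces.
lines = [
    "First String    Second      1.22      3.4",
    "Second          More Text   1.555555  2.2220",
    "Third           x           3         124",
]
text = "\n".join(lines)

# Two or more whitespace characters in a row become one comma.
# A single newline or a single space is only one whitespace character,
# so line breaks and the space in "More Text" are not touched.
csv_text = re.sub(r"\s{2,}", ",", text)
print(csv_text)
```

Note that `\s` also matches line breaks, so on input with blank lines or trailing spaces this pattern could merge lines; `[ \t]{2,}` would be a stricter alternative in that case.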

Question 2:

What we have:

Ballif, Bryan, University of Vermont
Ellison, Aaron, Harvard Forest
Record, Sydne, Bryn Mawr

What we want:

Bryan Ballif (University of Vermont)
Aaron Ellison (Harvard Forest)
Sydne Record (Bryn Mawr)

How to do it: Capture the last name (1), skip the comma and space, capture the first name (2), skip the next comma and space, and capture everything else on the line (3). Then reorder the captures and wrap the third one in parentheses.

- Find: (\w+),\s*(\w+),\s(.*)
- Replace: \2 \1 (\3)
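
As a quick check of the capture-and-reorder idea, here is a small Python sketch that applies the pattern `(\w+),\s*(\w+),\s(.*)` to the three sample lines. Python's `re.sub` accepts the same `\1`, `\2`, `\3` backreferences in the replacement; the variable names are only for this example.

```python
import re

text = "\n".join([
    "Ballif, Bryan, University of Vermont",
    "Ellison, Aaron, Harvard Forest",
    "Record, Sydne, Bryn Mawr",
])

# Group 1 = last name, group 2 = first name, group 3 = rest of the line.
pattern = r"(\w+),\s*(\w+),\s(.*)"

# Write first name, last name, then the institution in parentheses.
reordered = re.sub(pattern, r"\2 \1 (\3)", text)
print(reordered)
```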

Question 3:

What we have:

0001 Georgia Horseshoe.mp3 0002 Billy In The Lowground.mp3 0003 Winder Slide.mp3 0004 Walking Cane.mp3

What we want:

0001 Georgia Horseshoe.mp3
0002 Billy In The Lowground.mp3
0003 Winder Slide.mp3
0004 Walking Cane.mp3

How to do it: Capture each “.mp3” extension (1) together with the whitespace character that follows it, then replace the match with the captured extension and a line break.

- Find: (\.mp3)\s
- Replace: \1\n
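
A minimal Python sketch of this step, using the find pattern `(\.mp3)\s` and the replacement `\1\n`. The dot is escaped so that it matches only a literal period. The variable names are invented for the example.

```python
import re

text = (
    "0001 Georgia Horseshoe.mp3 0002 Billy In The Lowground.mp3 "
    "0003 Winder Slide.mp3 0004 Walking Cane.mp3"
)

# Keep the captured ".mp3" and swap the whitespace after it for a line break.
# The last ".mp3" has nothing after it, so it is left as it is.
one_per_line = re.sub(r"(\.mp3)\s", r"\1\n", text)
print(one_per_line)
```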

Question 4:

What we have:

0001 Georgia Horseshoe.mp3
0002 Billy In The Lowground.mp3
0003 Winder Slide.mp3
0004 Walking Cane.mp3

What we want:

Georgia Horseshoe_0001.mp3
Billy In The Lowground_0002.mp3
Winder Slide_0003.mp3
Walking Cane_0004.mp3

How to do it: Capture the four-digit number (1), skip the space, capture the title in front of “.mp3” (2), and capture the “.mp3” extension (3). Then write the title, an underscore, the number, and the extension.

- Find: (\d{4})\s(.*)(\.mp3)
- Replace: \2_\1\3
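
The renaming can be sketched in Python with the pattern `(\d{4})\s(.*)(\.mp3)` and the replacement `\2_\1\3`. This is an illustration on the four sample names only, with made-up variable names.

```python
import re

text = "\n".join([
    "0001 Georgia Horseshoe.mp3",
    "0002 Billy In The Lowground.mp3",
    "0003 Winder Slide.mp3",
    "0004 Walking Cane.mp3",
])

# Group 1 = track number, group 2 = title, group 3 = literal ".mp3".
pattern = r"(\d{4})\s(.*)(\.mp3)"

# Title first, then an underscore, the number, and the extension.
renamed = re.sub(pattern, r"\2_\1\3", text)
print(renamed)
```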

Question 5:

What we have:

Camponotus,pennsylvanicus,10.2,44
Camponotus,herculeanus,10.5,3
Myrmica,punctiventris,12.2,4
Lasius,neoniger,3.3,55

What we want:

C_pennsylvanicus,44
C_herculeanus,3
M_punctiventris,4
L_neoniger,55

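How to do it: Capture the first letter of the genus (1), the rest of the genus (2), the species (3), and the first numeric column (4). Replace the whole match with the first letter, an underscore, and the species. The match stops before the last comma, so the final column stays in place. The fourth group, (\w*..), matches the digits before the decimal point, then the point and one more digit; this works here because every value in that column has exactly one digit after the decimal point.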

- Find: (\w)(\w+),(\w+),(\w*..)
- Replace: \1_\3
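
To see why the last column survives even though it is never captured, the pattern `(\w)(\w+),(\w+),(\w*..)` can be run on the sample data in Python. This is a sketch for checking the behavior, with hypothetical variable names.

```python
import re

text = "\n".join([
    "Camponotus,pennsylvanicus,10.2,44",
    "Camponotus,herculeanus,10.5,3",
    "Myrmica,punctiventris,12.2,4",
    "Lasius,neoniger,3.3,55",
])

pattern = r"(\w)(\w+),(\w+),(\w*..)"

# Show what each match covers: it ends after the first numeric column,
# so the trailing ",44" (or ",3", ",4", ",55") lies outside the match.
for match in re.finditer(pattern, text):
    print(repr(match.group(0)))

# Only the matched part is replaced; the unmatched tail is kept as it is.
abbreviated = re.sub(pattern, r"\1_\3", text)
print(abbreviated)
```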

Question 6:

What we have:

Camponotus,pennsylvanicus,10.2,44
Camponotus,herculeanus,10.5,3
Myrmica,punctiventris,12.2,4
Lasius,neoniger,3.3,55

What we want:

C_penn,44
C_herc,3
M_punc,4
L_neon,55

How to do it: Capture the first letter of the genus (1), skip the rest of the genus and the comma, capture the first four letters of the species (2), skip the rest of the species and the first numeric column, and capture the final comma and count (3). Replace the match with the first letter, an underscore, the four species letters, and the captured comma and count.

- Find: (\w)\w+,(\w{4})\w+,\d+\.\d+(,\d+)
- Replace: \1_\2\3
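
A short Python sketch of the four-letter abbreviation, using the pattern `(\w)\w+,(\w{4})\w+,\d+\.\d+(,\d+)` with the comma and decimal point written out explicitly. It assumes every species name has more than four letters, as in the sample data; the variable names are made up.

```python
import re

text = "\n".join([
    "Camponotus,pennsylvanicus,10.2,44",
    "Camponotus,herculeanus,10.5,3",
    "Myrmica,punctiventris,12.2,4",
    "Lasius,neoniger,3.3,55",
])

# Group 1 = first genus letter, group 2 = first four species letters,
# group 3 = the last comma plus the count.
pattern = r"(\w)\w+,(\w{4})\w+,\d+\.\d+(,\d+)"

short_names = re.sub(pattern, r"\1_\2\3", text)
print(short_names)
```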

Question 7:

What we have:

Camponotus,pennsylvanicus,10.2,44
Camponotus,herculeanus,10.5,3
Myrmica,punctiventris,12.2,4
Lasius,neoniger,3.3,55

What we want:

Campen, 44, 10.2
Camher, 3, 10.5
Myrpun, 4, 12.2
Lasneo, 55, 3.3

How to do it: Capture the first three letters of the genus (1) and of the species (2), skipping the rest of each word and the comma after it. Capture the first numeric column (3) and the last one (4). Then write the two three-letter pieces together, followed by the fourth capture and then the third, each separated by a comma and a space.

- Find: (\w{3})\w+,(\w{3})\w+,(.*),(\d+)
- Replace: \1\2, \4, \3
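
The reordering of the two numeric columns can be checked with a small Python sketch of the pattern `(\w{3})\w+,(\w{3})\w+,(.*),(\d+)`. The variable names are only for this example.

```python
import re

text = "\n".join([
    "Camponotus,pennsylvanicus,10.2,44",
    "Camponotus,herculeanus,10.5,3",
    "Myrmica,punctiventris,12.2,4",
    "Lasius,neoniger,3.3,55",
])

# Groups 1 and 2 = three-letter prefixes of genus and species,
# group 3 = first numeric column, group 4 = last numeric column.
pattern = r"(\w{3})\w+,(\w{3})\w+,(.*),(\d+)"

# Join the prefixes, then write the last column before the first one.
swapped = re.sub(pattern, r"\1\2, \4, \3", text)
print(swapped)
```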

Question 8:

Problem 1: Look at the pathogen_binary column. What input are you expecting to see in this column?

I expect to see “0s” in place of the “NAs”, so I replaced them using the following method:

- Find: (.*)(,NA,)
- Replace: \1,0,

Problem 2: What are the main errors in bombus_spp and host_plant?

Both bombus_spp and host_plant contain a number of stray extra characters.
First, I tried to remove them with the search in part 1 below, which deletes every character that is not a letter, digit, whitespace character, comma, slash, period, or colon.
However, I noticed that this also removed the underscores in the first row. I then tried to isolate the first row so that characters would only be removed from lines 2-21. I spent about 5 hours trying to figure this out with no luck, but the search in part 2 below is what I used to select the lines after the first row.

Method, part 1:
- Find: [^,/.a-zA-Z\d\s:]
- Replace: (leave empty, so the matched characters are deleted)

Method, part 2:
- Find: (?<=\n).*(?=\n)
- Replace: (don't replace anything)
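
The underscore problem can be reproduced in a small Python sketch. The two input lines below are hypothetical stand-ins, not the real data file: the first mimics a row of column names with underscores, the second a row with stray symbols. The sketch also shows one possible adjustment, adding `_` to the list of characters to keep, which would only be appropriate if no stray underscores occur in the data rows (an assumption that has not been checked against the actual file).

```python
import re

# Hypothetical sample, invented for illustration only.
first_row = "bombus_spp,host_plant,bee_caste"
data_row = "bimac$ulatus,Mon@arda,worker"
text = first_row + "\n" + data_row

# Character class from part 1: "_" is not in the keep list,
# so underscores are deleted along with the stray symbols.
original_class = r"[^,/.a-zA-Z\d\s:]"
stripped = re.sub(original_class, "", text)
print(stripped)

# Possible adjustment (assumption): also keep "_".
with_underscore = r"[^,/.a-zA-Z\d\s:_]"
kept = re.sub(with_underscore, "", text)
print(kept)
```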

Problem 3: Is there anything visibly wrong with bee_caste?

There are extra spaces before the comma that follows “worker”, so I removed them using the following method:

- Find: \s+,
- Replace: ,

Grace (Rei) Jia

2025-01-29