All questions were completed using the text editor BBEdit.
Question 1:
What we have:
First String    Second      1.22      3.4
Second      More Text   1.555555  2.2220
Third       x           3         124
What we want:
First String,Second,1.22,3.4
Second,More Text,1.555555,2.2220
Third,x,3,124
How to do it: Find every run of two or more whitespace characters and replace it with a single comma. Single spaces inside a field (like "First String") are left alone.
- Find: \s{2,}
- Replace: ,
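The same substitution can be sketched in Python with the `re` module. This is only an illustration: the function name `spaces_to_commas` and the exact spacing in the sample string are assumptions, on the premise that the fields are separated by two or more spaces.

```python
import re

# Sample input; the column spacing here is assumed for illustration.
raw = (
    "First String    Second      1.22      3.4\n"
    "Second      More Text   1.555555  2.2220\n"
    "Third       x           3         124\n"
)

def spaces_to_commas(text: str) -> str:
    # Replace each run of two or more whitespace characters with one comma.
    # A single space or a single line break is too short to match.
    return re.sub(r"\s{2,}", ",", text)

print(spaces_to_commas(raw))
```

One caveat: `\s` also matches line breaks, so a blank line or trailing spaces before a line break would be turned into a comma as well. Using ` {2,}` (a literal space) avoids that.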
Question 2:
What we have:
Ballif, Bryan, University of Vermont
Ellison, Aaron, Harvard Forest
Record, Sydne, Bryn Mawr
What we want:
Bryan Ballif (University of Vermont)
Aaron Ellison (Harvard Forest)
Sydne Record (Bryn Mawr)
How to do it: Capture the last name (1), match the comma and space, capture the first name (2), match the second comma and space, capture everything else (3), then reorder the groups and wrap the third one in parentheses.
- Find: (\w+),\s*(\w+),\s(.*)
- Replace: \2 \1 (\3)
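As a rough cross-check, the reordering can be reproduced in Python. The function name `reorder_names` is made up for this sketch, and the pattern allows optional whitespace after both commas.

```python
import re

names = (
    "Ballif, Bryan, University of Vermont\n"
    "Ellison, Aaron, Harvard Forest\n"
    "Record, Sydne, Bryn Mawr"
)

def reorder_names(text: str) -> str:
    # Group 1 = last name, group 2 = first name, group 3 = rest of the line.
    # "." does not match a line break by default, so group 3 stops at line end.
    return re.sub(r"(\w+),\s*(\w+),\s*(.*)", r"\2 \1 (\3)", text)

print(reorder_names(names))
```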
Question 3:
What we have:
0001 Georgia Horseshoe.mp3 0002 Billy In The Lowground.mp3 0003 Winder Slide.mp3 0004 Walking Cane.mp3
What we want:
0001 Georgia Horseshoe.mp3
0002 Billy In The Lowground.mp3
0003 Winder Slide.mp3
0004 Walking Cane.mp3
How to do it: Capture “.mp3” (1) along with the space that follows it, then replace the space with a line break.
- Find: (\.mp3)\s
- Replace: \1\n
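A small Python sketch of the line-splitting step, assuming a pattern with an escaped dot and a `\1` backreference. The function name `split_tracks` is hypothetical.

```python
import re

one_line = (
    "0001 Georgia Horseshoe.mp3 0002 Billy In The Lowground.mp3 "
    "0003 Winder Slide.mp3 0004 Walking Cane.mp3"
)

def split_tracks(text: str) -> str:
    # "\." matches a literal dot; the whitespace after ".mp3" is swapped
    # for a line break while the captured ".mp3" is written back.
    return re.sub(r"(\.mp3)\s", r"\1\n", text)

print(split_tracks(one_line))
```

The last file name has no whitespace after it, so it is left unchanged and no extra blank line is added.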
Question 4:
What we have:
0001 Georgia Horseshoe.mp3
0002 Billy In The Lowground.mp3
0003 Winder Slide.mp3
0004 Walking Cane.mp3
What we want:
Georgia Horseshoe_0001.mp3
Billy In The Lowground_0002.mp3
Winder Slide_0003.mp3
Walking Cane_0004.mp3
How to do it: Capture the four-digit number (1), match the space, capture everything in front of “.mp3” (2), and capture “.mp3” (3). Then put the title first, add an underscore and the number, and put “.mp3” back at the end.
- Find: (\d{4})\s(.*)(\.mp3)
- Replace: \2_\1\3
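The renaming step could look like this in Python. This is a sketch with three plain capture groups and an escaped dot, and `move_track_number` is an assumed name. A character class such as `[^.mp3]` excludes the single characters ".", "m", "p" and "3" rather than the string ".mp3", so the sketch does not use one.

```python
import re

tracks = (
    "0001 Georgia Horseshoe.mp3\n"
    "0002 Billy In The Lowground.mp3\n"
    "0003 Winder Slide.mp3\n"
    "0004 Walking Cane.mp3"
)

def move_track_number(text: str) -> str:
    # Group 1 = four-digit number, group 2 = title, group 3 = ".mp3".
    return re.sub(r"(\d{4})\s(.*)(\.mp3)", r"\2_\1\3", text)

print(move_track_number(tracks))
```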
Question 5:
What we have:
Camponotus,pennsylvanicus,10.2,44
Camponotus,herculeanus,10.5,3
Myrmica,punctiventris,12.2,4
Lasius,neoniger,3.3,55
What we want:
C_pennsylvanicus,44
C_herculeanus,3
M_punctiventris,4
L_neoniger,55
How to do it: Capture the first letter of the first word (1), match the rest of that word and the comma, capture the second word (2), and match the comma and the first number so they are removed. Replace with the first letter, an underscore, and the second word. The last comma and number are not part of the match, so they stay as they are.
- Find: (\w)\w+,(\w+),\d+\.\d+
- Replace: \1_\2
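A Python sketch of one way to get this result, assuming the first number always has the form digits, dot, digits. The function name `abbreviate_genus` is hypothetical, and the pattern uses two capture groups.

```python
import re

ants = (
    "Camponotus,pennsylvanicus,10.2,44\n"
    "Camponotus,herculeanus,10.5,3\n"
    "Myrmica,punctiventris,12.2,4\n"
    "Lasius,neoniger,3.3,55"
)

def abbreviate_genus(text: str) -> str:
    # Group 1 = first letter of the genus, group 2 = species.
    # The first number is matched but not captured, so it is dropped;
    # the trailing ",44" is outside the match and is kept unchanged.
    return re.sub(r"(\w)\w+,(\w+),\d+\.\d+", r"\1_\2", text)

print(abbreviate_genus(ants))
```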
Question 6:
What we have:
Camponotus,pennsylvanicus,10.2,44
Camponotus,herculeanus,10.5,3
Myrmica,punctiventris,12.2,4
Lasius,neoniger,3.3,55
What we want:
C_penn,44
C_herc,3
M_punc,4
L_neon,55
How to do it: Capture the first letter of the first word (1), match the rest of that word and the comma, capture the first four letters of the second word (2), match the rest of the second word and the first number, and capture the last comma and number (3). Replace with the first letter, an underscore, the four letters, and the last number.
- Find: (\w)\w+,(\w{4})\w*,\d+\.\d+(,\d+)
- Replace: \1_\2\3
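The shortened version can be sketched the same way in Python. The commas and the decimal point are matched literally here instead of with `.`, and `abbreviate_both` is an assumed name.

```python
import re

ants = (
    "Camponotus,pennsylvanicus,10.2,44\n"
    "Camponotus,herculeanus,10.5,3\n"
    "Myrmica,punctiventris,12.2,4\n"
    "Lasius,neoniger,3.3,55"
)

def abbreviate_both(text: str) -> str:
    # Group 1 = first letter of the genus, group 2 = first four letters of
    # the species, group 3 = the final comma and count.
    # "\w*" lets a species name of exactly four letters still match.
    return re.sub(r"(\w)\w+,(\w{4})\w*,\d+\.\d+(,\d+)", r"\1_\2\3", text)

print(abbreviate_both(ants))
```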
Question 7:
What we have:
Camponotus,pennsylvanicus,10.2,44
Camponotus,herculeanus,10.5,3
Myrmica,punctiventris,12.2,4
Lasius,neoniger,3.3,55
What we want:
Campen, 44, 10.2
Camher, 3, 10.5
Myrpun, 4, 12.2
Lasneo, 55, 3.3
How to do it: Capture the first three letters of the first word (1) and of the second word (2), matching the rest of each word and the comma after it. Then capture the first number (3) and the last number (4). Replace with the two three-letter pieces joined together, a comma, the last number (4), another comma, and the first number (3).
- Find: (\w{3})\w+,(\w{3})\w+,(.*),(\d+)
- Replace: \1\2, \4, \3
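For comparison, here is the same find and replace as a Python sketch. The function name `make_codes` is made up for the example.

```python
import re

ants = (
    "Camponotus,pennsylvanicus,10.2,44\n"
    "Camponotus,herculeanus,10.5,3\n"
    "Myrmica,punctiventris,12.2,4\n"
    "Lasius,neoniger,3.3,55"
)

def make_codes(text: str) -> str:
    # Groups 1 and 2 = first three letters of genus and species,
    # group 3 = first number, group 4 = last number.
    # ".*" is greedy but backtracks to the last comma on the line.
    return re.sub(r"(\w{3})\w+,(\w{3})\w+,(.*),(\d+)", r"\1\2, \4, \3", text)

print(make_codes(ants))
```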
Question 8:
Problem 1: Look at the pathogen_binary column-what input are you expecting to see in this column?
- Find: (.*)(,NA,)
- Replace: \1,0,
Problem 2: What are the main errors in bombus_spp and host_plant?
There are a bunch of additional characters included in both bombus_spp and host_plant.
First, I removed the additional characters using method 1 below.
However, I noticed that this also removed the underscores in the first row. I tried to isolate the first row and only remove characters from lines 2-21. I spent about 5 hours trying to figure this out with no luck, but method 2 below is the pattern I used to separate the first row from the remaining lines. It matches every line that has a line break on both sides, which excludes the first row (and the last row if the file does not end with a line break).
Method 1:
- Find: [^,/.a-zA-Z\d\s:]
- Replace: (leave empty)
Method 2:
- Find: (?<=\n).*(?=\n)
- Replace: (leave empty)
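Outside BBEdit, one possible way to protect the first row is to split it off before cleaning. The Python sketch below does that; the function name `clean_body` and the two-row sample are hypothetical and are not taken from the actual data file.

```python
import re

# Hypothetical sample: a header row plus one messy data row.
sample = "bombus_spp,host_plant\nBombus# sp,Plant& name\n"

def clean_body(text: str) -> str:
    # Split off the first row so its underscores are never touched.
    header, line_break, body = text.partition("\n")
    # Same character class as method 1: delete anything that is not a comma,
    # slash, period, letter, digit, whitespace, or colon.
    cleaned = re.sub(r"[^,/.a-zA-Z\d\s:]", "", body)
    return header + line_break + cleaned

print(clean_body(sample))
```

If underscores are also acceptable in the data rows, a simpler option might be to add `_` to the character class in method 1 so that underscores are never removed.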
Problem 3: Is there anything visibly wrong with bee_caste?
- Find: ,
- Replace:,