Data is messy, and there are several reasons why it can be useful to extract the character after a number in a string in R. In my case, I was looking at a string that included times dropped in without much structure. Oftentimes, the times indicated 8A or 8P for AM or PM. I needed to extract the A or P in order to know which it was.
The function below handles several edge cases, such as a missing value or a string that doesn't include any numbers.
It also treats a number that is more than one digit long as a single number. For instance, 12 counts as one number and not two separate numbers.
library(stringr)

extract_chars <- function(strings) {
  results <- list()
  for (i in seq_along(strings)) {
    string <- strings[i]
    # Missing values have nothing to extract
    if (is.na(string)) {
      results[[i]] <- character(0)
      next
    }
    # Use str_extract_all() to extract the character after each number, counting 2-digit numbers as 1 number
    result <- str_extract_all(string, "(?<=\\d{1,2})\\D")[[1]]
    # Check if result is empty (no digit followed by a non-digit character)
    if (length(result) == 0) {
      results[[i]] <- character(0)
      next
    }
    # Merge any 2-character element with the element after it.
    # seq_along(result)[-1] is empty for a single match, so the loop never indexes past the end.
    for (j in seq_along(result)[-1]) {
      if (!is.na(result[j - 1]) && nchar(result[j - 1]) == 2) {
        result[j - 1] <- paste0(result[j - 1], result[j])
        result[j] <- ""
      }
    }
    # Remove empty elements from result
    result <- result[result != ""]
    results[[i]] <- result
  }
  return(results)
}
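In this code, we use the regular expression (?<=\\d{1,2})\\D to extract any non-digit character that immediately follows a digit. The lookbehind (?<=\\d{1,2}) checks for one or two preceding digits without including them in the match, and \\D matches a single non-digit character. Because only the character right after the end of a run of digits can match, a number such as 12 produces one match rather than two.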
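To see the pattern on its own, here is a minimal sketch that applies it to strings like the AM/PM case described at the start. The sample strings and the names times and markers are made up for illustration, and the sketch assumes the stringr package is installed.

```r
library(stringr)

# Hypothetical sample strings; "times" is an illustrative name, not real data.
times <- c("Open 8A to 10P", "Closed")

# The lookbehind keeps the digits out of the match, so only the
# character that follows each number is returned.
markers <- str_extract_all(times, "(?<=\\d{1,2})\\D")

print(markers[[1]])  # "A" "P"
print(markers[[2]])  # character(0)
```

Note that 10P yields a single "P": the 1 is followed by a digit, so only the character after the 0 matches.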
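Once we have the result vector, we loop through it to merge any two-character element with the element that follows it, and then remove any empty elements from the result vector. Since \\D matches exactly one character, every element is a single character, so this pass acts as a safeguard and leaves the matches unchanged.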
We can then call the function as below. For these two strings it returns c and d for the first, and a, b, c and d for the second:
strings <- c("ab12c34de56", "1a23b45c67d")
result <- extract_chars(strings)
print(result)
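If you would rather avoid the stringr dependency, the same idea can be sketched in base R. This is an alternative under stated assumptions, not the function above: extract_after_digit is a hypothetical helper name, and the pattern is shortened to (?<=\\d)\\D because base R's Perl-style engine generally expects a fixed-length lookbehind. It matches the same characters, since a non-digit that follows one or two digits always follows a digit.

```r
# Base-R sketch; extract_after_digit is a hypothetical helper name.
extract_after_digit <- function(strings) {
  # Start with an empty result for every input, which also covers NA.
  out <- rep(list(character(0)), length(strings))
  keep <- !is.na(strings)
  # gregexpr() finds every match per string; regmatches() pulls the text out.
  hits <- gregexpr("(?<=\\d)\\D", strings[keep], perl = TRUE)
  out[keep] <- regmatches(strings[keep], hits)
  out
}

res <- extract_after_digit(c("ab12c34de56", "1a23b45c67d", NA))
print(res[[1]])  # "c" "d"
print(res[[2]])  # "a" "b" "c" "d"
print(res[[3]])  # character(0)
```

Because gregexpr() and regmatches() are vectorized, this version needs no explicit loop over the input.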