The Smart Chef

Extract character after number in a string

Data is messy, and it is often useful to extract the character that follows a number in a string in R. In my case, I was working with strings that had times dropped in without much structure. Often the times were written as 8A or 8P for AM or PM, and I needed to extract the A or P to know which one was meant.

The function below handles edge cases such as NA values and strings that don't include any numbers. It relies on str_extract_all() from the stringr package.

It also treats a number that is more than one digit long as a single number. For instance, 12 counts as one number, not two separate numbers.

  


library(stringr)

extract_chars <- function(strings) {
  # One list element per input string
  results <- vector("list", length(strings))

  for (i in seq_along(strings)) {
    string <- strings[i]

    # NA input has nothing to extract
    if (is.na(string)) {
      results[[i]] <- character(0)
      next
    }

    # Extract every non-digit character that immediately follows a digit.
    # Only the last digit of a number has a non-digit after it, so a
    # multi-digit number such as 12 yields a single match.
    # [[1]] takes the character vector for this one string; it is
    # character(0) when nothing matches.
    results[[i]] <- str_extract_all(string, "(?<=\\d)\\D")[[1]]
  }

  return(results)
}

In this code, the regular expression (?<=\\d)\\D extracts any non-digit character that immediately follows a digit. The lookbehind (?<=\\d) checks only the character directly before the match, so no quantifier is needed to handle multi-digit numbers: in 12c, the 1 is followed by another digit, so only c is extracted and 12 counts as a single number.

Each match is exactly one character, so the result vector needs no further merging or cleanup. NA values and strings without a match return an empty character vector.
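As a rough sketch of the same idea without stringr, a lookbehind pattern can also be passed to base R's gregexpr() and regmatches() with perl = TRUE. The helper name extract_chars_base is made up for this illustration, and the explicit NA handling is an assumption chosen to match the behaviour described above.

```r
# Hypothetical base R variant (helper name invented for illustration).
extract_chars_base <- function(strings) {
  # Default result for every input: nothing extracted
  results <- rep(list(character(0)), length(strings))

  # Only run the regex on non-NA inputs
  ok <- !is.na(strings)

  # perl = TRUE enables the lookbehind (?<=\\d);
  # gregexpr() locates every non-digit that directly follows a digit
  matches <- gregexpr("(?<=\\d)\\D", strings[ok], perl = TRUE)

  # regmatches() pulls out the matched characters, one vector per string
  results[ok] <- regmatches(strings[ok], matches)
  results
}

res <- extract_chars_base(c("ab12c34de56", "1a23b45c67d", NA, "abc"))
print(res)
```

Unlike the loop version, this handles the whole vector in one call, which may be convenient for long inputs.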

We can then run extract_chars() as follows:

  
strings <- c("ab12c34de56", "1a23b45c67d")
result <- extract_chars(strings)
print(result)
# [[1]]
# [1] "c" "d"
#
# [[2]]
# [1] "a" "b" "c" "d"
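For the AM/PM case that motivated this, \\D can pick up spaces or punctuation that happen to follow a digit. One possible refinement, sketched here in base R so it runs on its own, is to narrow the character class to [AP]. The sample strings and variable names below are invented for illustration and are not from my original data.

```r
# Hypothetical schedule strings, invented for illustration
times <- c("open 8A close 5P", "11A-2P", "call at 9, back by 10P")

# [AP] instead of \\D: keep only an A or P that directly follows a digit,
# so the comma after 9 in the third string is ignored
markers <- regmatches(times, gregexpr("(?<=\\d)[AP]", times, perl = TRUE))

# Translate each marker to AM or PM with a named lookup vector
lookup <- c(A = "AM", P = "PM")
periods <- lapply(markers, function(m) unname(lookup[m]))
print(periods)
```

Each element of periods lines up with the corresponding input string, so the markers can be paired back with the times they came from.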