Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Regex and strncpy not working as I understood them

I’m running into multiple issues with the following code:

  1. When I’m running this code on a file containing multiple URLs per line, the regex while loop runs indefinitely.
  2. On a file containing only one HTTP URL on each line, valgrind complains that matches[count] = malloc(sizeof(char) * match_length); and matches[count][match_length] = 0; are Invalid size 1 write. I think my allocation code sucks (sizeof(matches) / sizeof(matches[0]) after my while loops compute to 1).
  3. I thought strncpy would add a null byte at the end of my string automatically but it appears it doesn’t 🤷
  4. Might be related to 3. : After those while loops, if I print each element of the array, they contains a newline, which shouldn’t be the case, only way to get rid of those is to run matches[count][match_length] = 0;

I think I misunderstood lots of part about regex, strncpy and malloc. I haven’t done C programming since college and wanted to get back at it.

    char **matches = (char **)malloc(sizeof(char *));
    char *line = NULL;
    size_t l = 0, count = 0, size = 1;
    ssize_t read = 0;
    
    regex_t rgx;
    if (regcomp(&rgx, "((gopher|https?)://[[:alnum:][:punct:]]+)", REG_ICASE | REG_EXTENDED)) {
        return 0;
    }
    
    while ((read = getline(&line, &l, file)) != -1) {
        regmatch_t match = { 0, 0 };
        int match_length, error;
        if ((error = regexec(&rgx, line + match.rm_eo, 1, &match, 0)) != 0)
            continue;
    
        do {
            printf("Error: %d, Line: %s\n", error, line + match.rm_eo);
            match_length = match.rm_eo - match.rm_so;
            matches[count] = malloc(sizeof(char) * match_length);
            strncpy(matches[count], line + match.rm_so, match_length);
            matches[count][match_length] = 0;
    
            if (++count == size) {
                size *= 2;
                matches = realloc(matches, sizeof(char *) * size);
                if (!matches) {
                    regfree(&rgx);
                    free(line);
                    return 0;
                }
            }
        } while ((error = regexec(&rgx, line + match.rm_eo + 1, 1, &match, REG_NOTBOL)) == 0);
    }

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

There are multiple problems:

  • you should allocate one more byte in matches[count] = malloc(sizeof(char) * match_length); for the null terminator.

  • strncpy does not set a null byte at the end of the destination array if the source string has a length greater or equal to the third length argument. This function semantics are confusing and error prone, learn why you should stop using strncpy already!

  • instead of strncpy(matches[count], line + match.rm_so, match_length); you should use:

    memcpy(matches[count], line + match.rm_so, match_length);
    matches[count][match_length] = '\0';
    

    Note that allocating and copying can be achieved with:

    matches[count] = strndup(line + match.rm_so, match_length);
    
  • matches[count][match_length] = 0; is causing the valgrind error message as you write beyond the allocation length of match_length.

  • matches = realloc(matches, sizeof(char *) * size); causes a memory leak if realloc fails: the original pointer matches is lost and the memory it pointed to is still allocated.

  • you can strip the newline at the end of line before applying the regex matcher.

  • the positions stored in the match structure are relative to the beginning of the string argument to regexec, but you pass line + match.rm_so and line + match.rm_eo + 1 so the offsets are not relative to the line array and cannot be used.

  • after the loop, you cannot use (sizeof(matches) / sizeof(matches[0]) to determine the number of entries in the array. Both matches and matches[0] are pointers, hence the value 1. You must pass count back to the caller along with the matches pointer.

Here is a modified version:

char **filter_urls(FILE *file, size_t *countp) {
    size_t count = 0, size = 1;
    char **matches = (char **)malloc(sizeof(char *) * size);
    char *line = NULL;
    size_t line_size = 0;
    int fail = 0;
    ssize_t len;
    regex_t rgx;

    if (matches == NULL)
        return NULL;

    if (regcomp(&rgx, "((gopher|https?)://[[:alnum:][:punct:]]+)", REG_ICASE | REG_EXTENDED)) {
        free(matches);
        return NULL;
    }
    
    while ((len = getline(&line, &line_size, file)) != -1) {
        // strip the trailing newline if any (optional)
        //if (len > 0 && line[len - 1] == '\n') {
        //    line[--len] == '\0'
        //}
        regmatch_t match = { 0, 0 };
        int flags = 0;
        char *ptr = line;
        while (regexec(&rgx, ptr, 1, &match, flags) != 0) {
            size_t match_length = match.rm_eo - match.rm_so;
            matches[count] = strndup(ptr + match.rm_so, match_length);
            if (matches[count] == NULL) {
                fail = 1;
                break;
            }
            if (++count == size) {
                size_t new_size = size * 2;
                char **new_matches = realloc(matches, sizeof(char *) * new_size);
                if (new_matches == NULL) {
                    fail = 1;
                    break;
                }
                size = new_size;
                matches = new_matches;
            }
            ptr += match.rm_eo;
            flags = REG_NOTBOL;
        }
    }
    regfree(&rgx);
    free(line);
    if (fail) {
        // allocation failure: free all
        while (count > 0) {
            free(matches[--count]);
        }
        free(matches);
        return NULL;
    }
    matches[count] = NULL;
    *countp = count;
    return matches;
}
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading