• Posts
  • RSS
  • ◂◂RSS
  • Contact

  • Dirname is Evil

    February 11th, 2016
    tech
    I recently was writing some code [1] that needed to know the parent directory of a file:

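      size_t final_slash = filename.find_last_of('/');
      if (final_slash == std::string::npos) return ".";  // no slash: current directory
      if (final_slash == 0) return "/";  // "/foo": the parent is the root
      return filename.substr(0, final_slash);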
    

    Why do this when there's dirname(3)? Because dirname is evil:

    • It has big traps.
    • Different implementations have different traps.

    On some systems, dirname modifies its input. For example, here's an implementation that's nearly [2] POSIX-conforming:

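      #include <stddef.h>  /* NULL */
      char* dirname(char* path) {
        static char dot[] = ".";
        if (!path) return dot;
        char* last_slash = NULL;
        for (char* p = path; *p; p++) {
          if (*p == '/') last_slash = p;
        }
        if (!last_slash) return dot;
        if (last_slash == path) last_slash++;  /* "/foo" -> "/": keep the root slash */
        *last_slash = '\0';  /* truncates the caller's string in place */
        return path;
      }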
    
    There are nice things about this: it doesn't need to allocate any memory and it's thread safe. This is what glibc does and is probably the most common behavior. Still, modifying your input string may not be what you want!
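    If you do call dirname and still need the original string afterwards, one workaround is to hand it a scratch copy and copy the answer out before freeing that copy. Here's a minimal sketch, assuming a POSIX system with <libgen.h>; dirname_into is a hypothetical name, not a standard function. It only deals with the input-modifying trap: where dirname uses static storage it is still not thread-safe.

      ```c
      #define _POSIX_C_SOURCE 200809L
      #include <libgen.h>
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical helper: writes the directory part of `path` into `out`
         without letting dirname touch `path`. Returns 0 on success, -1 if
         allocation fails or `out` is too small. */
      int dirname_into(const char *path, char *out, size_t out_len) {
        char *scratch = strdup(path);  /* dirname may write into its argument */
        if (!scratch) return -1;
        /* dir may point into scratch, at static storage, or at a read-only "." */
        const char *dir = dirname(scratch);
        size_t len = strlen(dir);
        int rc = -1;
        if (len < out_len) {
          memcpy(out, dir, len + 1);  /* copy out before scratch is freed */
          rc = 0;
        }
        free(scratch);
        return rc;
      }
      ```

    Copying into a caller-supplied buffer also means you never hold on to dirname's return value itself.
    
    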

    Systems can choose, however, to define it in other ways. For example, here's an implementation that leaves its input alone, but instead isn't thread-safe.

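      #include <limits.h>  /* PATH_MAX */
      #include <stdint.h>  /* SIZE_MAX */
      #include <string.h>  /* memcpy */
      char* dirname(char* path) {
        static char buffer[PATH_MAX];  /* shared by every caller */
        static char dot[] = ".";
        if (!path) return dot;
        size_t last_slash_pos = SIZE_MAX;  /* SIZE_MAX: no slash seen yet */
        for (size_t i = 0; path[i]; i++) {
          if (i >= PATH_MAX) return dot;  /* too long for buffer; give up */
          if (path[i] == '/') last_slash_pos = i;
        }
        if (last_slash_pos == SIZE_MAX) return dot;
        if (last_slash_pos == 0) last_slash_pos = 1;  /* "/foo" -> "/" */
        memcpy(buffer, path, last_slash_pos);
        buffer[last_slash_pos] = '\0';
        return buffer;
      }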
    

    Instead of modifying its argument, this version of dirname uses internal storage. This means that it's not thread safe, and you can't trust its return value to stick around if you call anything that might possibly also call dirname.
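    If several threads need dirname, one option is to serialize calls with a mutex and copy the result out while still holding the lock. This is a sketch with a hypothetical locked_dirname_dup, and it assumes every dirname call in the program goes through the same wrapper: the lock does nothing about library code that calls dirname directly.

      ```c
      #define _POSIX_C_SOURCE 200809L
      #include <libgen.h>
      #include <pthread.h>
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical lock: it only helps if every dirname call in the
         program goes through locked_dirname_dup. */
      static pthread_mutex_t dirname_lock = PTHREAD_MUTEX_INITIALIZER;

      /* Returns a malloc'd copy of the directory part of `path` (the caller
         frees it), or NULL if an allocation fails. */
      char *locked_dirname_dup(const char *path) {
        char *scratch = strdup(path);  /* dirname may modify its argument */
        if (!scratch) return NULL;
        pthread_mutex_lock(&dirname_lock);
        /* Copy the result before another call can overwrite static storage,
           and before scratch (which it may point into) is freed. */
        char *result = strdup(dirname(scratch));
        pthread_mutex_unlock(&dirname_lock);
        free(scratch);
        return result;
      }
      ```
    
    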

    One more thing: dirname returns a char*, not a const char*, but it's not always safe to modify its return value. For example, glibc does:

      char *dirname (char *path) {
        static const char dot[] = ".";
        ...
        /* This assignment is ill-designed
           but the XPG specs require to
           return a string containing "."
           in any case no directory part is
           found and so a static and constant
           string is required.  */
        path = (char *) dot;
        return path;
      }
    

    This means if you give dirname a slashless string and pass the output to something that modifies its input, you'll pass compile-time const checking but you're in for problems at runtime. [3]

    So if you're going to use dirname, you have to treat it as both thread-unsafe and input-modifying, and never write to what it returns. At that point it's much easier to use something else that's better specified.
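    As one example of something better specified, here's a sketch of a hypothetical parent_dir that takes a const input, keeps no static state, and writes into a buffer the caller owns. It aims to follow the POSIX rules for trailing and repeated slashes, except that it always collapses a leading "//" to "/" (a case POSIX leaves implementation-defined).

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical dirname replacement: never writes to `path`, keeps no
         static state, and stores the result in caller-owned `out`.
         Returns 0 on success, -1 if `out_len` is too small. */
      int parent_dir(const char *path, char *out, size_t out_len) {
        const char *result = ".";  /* answer for NULL, "" and slashless names */
        size_t len = 1;
        if (path && *path) {
          size_t end = strlen(path);
          while (end > 1 && path[end - 1] == '/') end--;  /* trailing slashes */
          while (end > 0 && path[end - 1] != '/') end--;  /* last component */
          while (end > 1 && path[end - 1] == '/') end--;  /* slashes before it */
          if (end > 0) {  /* end == 0 means there was no directory part */
            result = path;
            len = end;
          }
        }
        if (len >= out_len) return -1;  /* no room for the result plus NUL */
        memcpy(out, result, len);
        out[len] = '\0';
        return 0;
      }
      ```

    Passing the buffer in, the way snprintf does, is what lets several threads call this at once, and the const parameter means the compiler will object if the function ever tries to write to its input.
    
    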

    (Warning: I haven't actually tried running or even compiling these code samples.)


    [1] Update 2016-02-12: that code no longer needs anything like dirname at all because I rewrote it to handle everything with pipes instead of PID files.

    [2] I've left out the bit where it's supposed to ignore trailing '/' characters.

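    [3] Writing through that pointer modifies an object defined as const, which is undefined behavior in C. Since a static const array like dot usually ends up in read-only memory, a crash is more likely than a silently changed return value for future calls.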
