Friday, September 18, 2015

malloc vs mmap

While reading the Mach paper, I remembered a question I had while taking my undergrad OS course. We built a memory allocator that did the same operations as malloc: allocating and freeing space on the heap. To implement it, we asked the OS for space using mmap.

I'm confused about how mmap works. I can't tell how it relates to files: some of the Stack Overflow posts I have been reading suggest that you can use mmap to read files, whereas I thought you had to go through the filesystem with an fread call. Maybe setting up a scenario would help: I have a file (~/Downloads/hclinton.txt) that I want to read in a program. When would I want to use mmap to allocate memory for this file, and when would I want to use malloc?

6 comments:

  1. This comment has been removed by the author.

  2. There are two ways to operate on files. The first is a malloc'd buffer combined with fread or fwrite to move data into or out of that buffer. The other, as you said, is the mmap system call. mmap does not read any file data up front: it only establishes a mapping for the file in the process's address space. The file is then read into main memory lazily, when an access to file data touches a page that is not yet present in the page table. The page fault handler loads that page, so subsequent accesses to file data within the same page do not fault.
    The perplexing bit is that all the articles I found on the comparison between fread/fwrite and mmap seem to point to mmap being the optimal choice. When is fread/fwrite preferable over mmap?
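
The two approaches described above can be sketched side by side. This is a minimal illustration, not a benchmark: the function names `count_lines_fread` and `count_lines_mmap`, the helper `count_newlines`, and the 4096-byte buffer size are all assumptions made for the example, and error handling is kept to returning -1.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count newline bytes in a buffer. */
static size_t count_newlines(const char *data, size_t len) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n') n++;
    return n;
}

/* Approach 1: malloc a buffer and copy the file into it chunk by chunk
 * with fread. Returns the line count, or -1 on error. */
long count_lines_fread(const char *path) {
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    char *buf = malloc(4096);          /* buffer size is an arbitrary choice */
    if (!buf) { fclose(f); return -1; }
    long lines = 0;
    size_t got;
    while ((got = fread(buf, 1, 4096, f)) > 0)
        lines += (long)count_newlines(buf, got);
    int err = ferror(f);
    free(buf);
    fclose(f);
    return err ? -1 : lines;
}

/* Approach 2: map the file and read it as ordinary memory. Pages are
 * brought in by page faults as the loop touches them.
 * Returns the line count, or -1 on error. */
long count_lines_mmap(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }   /* mmap rejects length 0 */
    size_t len = (size_t)st.st_size;
    char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                         /* the mapping stays valid after close */
    if (data == MAP_FAILED) return -1;
    long lines = (long)count_newlines(data, len);
    munmap(data, len);
    return lines;
}
```

Both functions should report the same count for the same file. The difference is in how the bytes reach the process: fread copies them into the malloc'd buffer, while the mmap version walks the mapping directly and lets the kernel fault pages in on first access.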

  3. Okay, I think I figured it out. This was taken from a SO post in :
    __
    Second, malloc() is essentially implemented in terms of mmap(), but it's a sort of intelligent library wrapper around the system call: it will request new memory from the system when needed, but until then it will pick a piece of memory in an area that's already committed to the process.
    ___

    So both malloc and mmap will at some point request memory from the kernel; malloc just handles parceling that memory out in much smaller chunks, as well as handling fragmentation to some degree.

    So malloc says, "here's a tiny chunk of memory, and when you free it, I'll take care of fragmentation"

    mmap says, "here's a page. deal with it"
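
To make the division of labor concrete, here is a toy sketch of an allocator that gets whole pages from mmap and parcels them out in small pieces. It is a bump allocator only: the names `arena_t`, `arena_init`, `arena_alloc`, and `arena_destroy` are hypothetical, the 16-byte alignment is an assumption, and unlike a real malloc it cannot free individual chunks or deal with fragmentation.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical toy arena: one mmap'd region handed out in small chunks. */
typedef struct {
    char  *base;   /* start of the mapped region */
    size_t size;   /* total bytes mapped */
    size_t used;   /* bytes already handed out */
} arena_t;

/* Ask the kernel for whole pages. Returns 0 on success, -1 on failure. */
int arena_init(arena_t *a, size_t pages) {
    assert(a != NULL && pages > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, pages * page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    a->base = p;
    a->size = pages * page;
    a->used = 0;
    return 0;
}

/* Carve n bytes out of the region, rounded up to a multiple of 16.
 * Returns NULL when the region is exhausted. */
void *arena_alloc(arena_t *a, size_t n) {
    size_t need = (n + 15) & ~(size_t)15;
    if (need < n || need > a->size - a->used) return NULL;
    void *p = a->base + a->used;
    a->used += need;
    return p;
}

/* Give the whole region back to the kernel in one call. */
void arena_destroy(arena_t *a) {
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

A caller would run `arena_init(&a, 1)` once and then call `arena_alloc` many times without entering the kernel again, which is the "memory that's already committed to the process" behavior the quoted answer describes.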

  4. http://stackoverflow.com/questions/8870008/what-if-i-allocate-memory-using-mmap-instead-of-malloc

  5. This comment has been removed by the author.

  6. The discussion of when to use read/write (or the buffered library versions, fread/fwrite) vs. mmap is complicated. We can talk about that in class. mmap also has other uses, such as setting up shared memory regions between processes and causing memory to be allocated at a specific virtual address.
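
As one illustration of the shared-memory use, the sketch below maps an anonymous region with MAP_SHARED, forks, and lets the child write a value that the parent then reads. The function name `share_int_with_child` is hypothetical, and a real program would need proper synchronization; here the parent simply waits for the child to exit.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes on glibc */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the child stores `value` in a shared int and the
 * parent reads it back. Returns the value seen by the parent, or -1 on
 * error, so `value` must be non-negative to keep -1 unambiguous. */
int share_int_with_child(int value) {
    assert(value >= 0);
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        /* Child: with MAP_SHARED this write lands in the same physical
         * page the parent has mapped. */
        *shared = value;
        _exit(0);
    }

    /* Parent: waiting for the child is the only synchronization used. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    int seen = *shared;
    munmap(shared, sizeof *shared);
    return seen;
}
```

With MAP_PRIVATE in place of MAP_SHARED, the child's write would go to its own copy of the page and the parent would still see 0.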

    Note that malloc typically acquires new virtual memory using sbrk, not mmap. malloc uses mmap only in special cases, such as very large requests in some implementations.
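
For comparison with mmap, this is roughly what acquiring memory through sbrk looks like. It is a sketch under assumptions: `grow_heap` is a hypothetical helper, not how any particular malloc is written, and sbrk is a legacy interface that some systems mark as deprecated.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk under strict -std= modes on glibc */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: move the program break up by n bytes.
 * Returns the start of the newly added region, or NULL on failure. */
void *grow_heap(size_t n) {
    assert(n <= (size_t)INTPTR_MAX);
    void *old_break = sbrk((intptr_t)n);   /* sbrk returns the previous break */
    return old_break == (void *)-1 ? NULL : old_break;
}
```

The region returned by `grow_heap(4096)` is contiguous with the existing heap, which is the main contrast with mmap: mmap can place a new region anywhere in the address space, while sbrk only extends one end of a single segment.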
