What is memory? (part 2): The anatomy of a process

  1. Everything is just data and context. When hardware and software components interact with each other in a computer system, they’re interpreting bytes that have a specific meaning in context. The example we covered was a text file, which contained ASCII-encoded bytes that our text editor rendered as Latin characters, numbers, and some punctuation. We’ll see some new examples of this principle in practice in this article.
  2. We also learned that virtual memory is an abstraction that gives a process the illusion that it has exclusive access to all memory on the system via a huge, sparse array called its virtual address space. The process requests memory from the operating system, which fills the array with chunks of memory that the process can read and write. Today, we’ll learn about segments, which are chunks of a process’s address space that are initialized before it starts to run, and which form the “skeleton” of a process. I’ll explain what I mean by “skeleton” below.

What happens when you “run” a process?

$ ./run_program

A note on ELF and executable file formats

  1. The files that are output by compilers and linkers after parsing and compiling your source code.
  2. Parsed by loaders when bootstrapping a new process’s address space.
  1. We’ll be describing everything in terms of ELF, because most of the tooling for creating and analyzing ELF files is free and open source in Unix environments.
  2. The job of a loader is to parse binary executable files and initialize the address space of a new process based on their contents. Note that not every ELF file is an executable; shared libraries, for example, are ELF files too. Don’t worry about them for now. We’ll discuss them in another article.
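To make "parsing an ELF file" a little more concrete, here is a minimal sketch of the very first check a parser could make. Every ELF file begins with the four bytes 0x7f, 'E', 'L', 'F'; the helper name looks_like_elf is made up for this illustration and is not taken from any real loader.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Every ELF file starts with these four bytes: 0x7f, 'E', 'L', 'F'. */
static const unsigned char ELF_MAGIC[4] = {0x7f, 'E', 'L', 'F'};

/* looks_like_elf is a hypothetical helper name used only for illustration.
 * It returns true if the buffer is long enough and starts with the magic. */
bool looks_like_elf(const unsigned char *buf, size_t len)
{
    return len >= sizeof ELF_MAGIC &&
           memcmp(buf, ELF_MAGIC, sizeof ELF_MAGIC) == 0;
}
```

This is the "data and context" principle again: the same four bytes would be meaningless noise to a text editor, but to a loader they mean "keep parsing, this is ELF". A real loader checks much more than this before trusting a file.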

Segments form the skeleton of a process

  1. Creating a list of all of the segments it finds in the program’s ELF file.
  2. Allocating space for the segment in the new process’s address space by asking the operating system for a chunk of memory at the appropriate address.
  3. Initializing that newly allocated space with the contents of the segment.
  4. Asking the operating system to ensure that the segment has the correct access permissions.
Traced by User:Stannered, original by en:User:Dysprosia, BSD <http://opensource.org/licenses/bsd-license.php>, via Wikimedia Commons

An example: printing our lucky_number

// lucky_number.c
#include <stdio.h>

static int lucky_number = 7;

int main() {
    printf("My lucky number is %d!\n", lucky_number);
    lucky_number = 2;
    printf("My lucky number is now %d!\n", lucky_number);
    return 0;
}
$ clang -o lucky_number lucky_number.c
$ ./lucky_number
My lucky number is 7!
My lucky number is now 2!
  1. Code: As we mentioned above, the program’s instructions will reside in a segment. In this case, the segment will contain the bytes corresponding to the instructions in the main() function, and also some other code that we don’t see in our source but that runs before and after main(). The CPU has to read and execute the process’s instructions, and we don’t want anything to be able to change our process’s code once it has been compiled and has started to run, so the code segment should have read and execute permissions, but not write permissions.

    In case you’re wondering, the bytes for printf() are actually located in another code segment. This is because printf() is defined in a separate shared library. As mentioned above, you can disregard this for now – we’ll come back to shared libraries in a later post.
  2. Writable data: We’ve also defined some data in our program. Specifically, we have a global int called lucky_number, which is present for the entirety of the runtime of the program, and which is read twice (when printing) and written once (between the printf() calls). That variable will therefore be stored in the process’s writable data segment. Because we need to read and write that variable, it will have read and write permissions. We will never be executing anything stored in the lucky_number variable, so the segment should not have execute permissions. Note that not having execute permissions is very important for writable segments. Otherwise, a malicious piece of code could write some instructions there, and potentially hijack a program by having it execute code that it never meant to.
  3. Read-only data: In addition to lucky_number, we’ve also defined some static, global data that is constant, and is never mutated or executed. That data is the "My lucky number is %d!\n" and "My lucky number is now %d!\n" print format strings. That’s right, the first parameters we’ve passed to printf() are in fact global, read-only variables. Specifically, in this case, they’re two arrays of chars. We don’t need to know the specifics of how strings work in systems code for this post. We’ll talk more about strings later. Because those strings are neither written nor executed, they should only have read permissions.
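The permission rules for the three segments above can be written down as a tiny check. The sketch below is illustrative only: the enum names and the function name are invented for this example, although the bit values mirror the PF_X, PF_W, and PF_R permission bits used in ELF program headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; the values mirror ELF's PF_X, PF_W and PF_R bits. */
enum { SEG_EXEC = 0x1, SEG_WRITE = 0x2, SEG_READ = 0x4 };

/* Hypothetical check: true if a segment could be both written and executed,
 * which is the dangerous combination described for writable data above. */
bool is_writable_and_executable(uint32_t flags)
{
    return (flags & SEG_WRITE) != 0 && (flags & SEG_EXEC) != 0;
}
```

With these definitions, the code segment is SEG_READ | SEG_EXEC, the writable data segment is SEG_READ | SEG_WRITE, and the read-only data segment is just SEG_READ. None of the three trips the check, which is exactly the property we want.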

Analyzing the lucky_number ELF file

$ readelf --segments lucky_number

Elf file type is EXEC (Executable file)
Entry point 0x401040
There are 11 program headers, starting at offset 64

Program Headers:
Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x000268 0x000268 R   0x8
INTERP         0x0002a8 0x00000000004002a8 0x00000000004002a8 0x00001c 0x00001c R   0x1
    [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x000438 0x000438 R   0x1000
LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x000215 0x000215 R E 0x1000
LOAD           0x002000 0x0000000000402000 0x0000000000402000 0x000160 0x000160 R   0x1000
LOAD           0x002e10 0x0000000000403e10 0x0000000000403e10 0x000224 0x000228 RW  0x1000

...

Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03 .init .plt .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .dynamic .got .got.plt .data .bss
...
Elf file type is EXEC (Executable file)
Entry point 0x401040
There are 11 program headers, starting at offset 64
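Those three lines of readelf output are a rendering of fields stored in the ELF header at the very start of the file. As a rough sketch, assuming a 64-bit, little-endian ELF file, the fields can be pulled out of the first 64 bytes like this. The struct and function names are hypothetical; the byte positions follow the standard ELF64 header layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian unsigned integer that is `size` bytes long (size <= 8). */
static uint64_t read_le(const unsigned char *p, size_t size)
{
    uint64_t value = 0;
    for (size_t i = 0; i < size; i++)
        value |= (uint64_t)p[i] << (8 * i);
    return value;
}

/* Hypothetical struct holding only the fields discussed in this article. */
struct elf_summary {
    uint64_t entry;      /* e_entry: address of the first instruction to run */
    uint64_t phoff;      /* e_phoff: file offset of the program header table */
    uint16_t phentsize;  /* e_phentsize: size of one program header entry */
    uint16_t phnum;      /* e_phnum: number of program header entries */
};

/* Assumes `hdr` holds the 64-byte header of a little-endian ELF64 file. */
struct elf_summary summarize_elf64_header(const unsigned char hdr[64])
{
    struct elf_summary s;
    s.entry     = read_le(hdr + 24, 8);
    s.phoff     = read_le(hdr + 32, 8);
    s.phentsize = (uint16_t)read_le(hdr + 54, 2);
    s.phnum     = (uint16_t)read_le(hdr + 56, 2);
    return s;
}
```

For lucky_number we would expect entry to be 0x401040, phoff to be 64, and phnum to be 11. Each ELF64 program header entry is 56 bytes, so 11 entries take up 616 == 0x268 bytes, which matches the FileSiz that readelf reports for the PHDR segment.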
Program Headers:
Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x000268 0x000268 R   0x8
  • Type tells us the type of segment. In this case, the segment is of type “PHDR”, which means that it contains the file’s program / segment headers. The only other type we’ll concern ourselves with in this article is the “LOAD” type.
  • Offset tells the loader the location of the segment in the ELF file. This particular segment starts at an offset of 0x40 == 64 bytes into the file. Recall what we read above: “There are 11 program headers, starting at offset 64”. In other words, the PHDR entry describes the table of program headers itself. The ELF header at the very start of the file tells the loader where that table begins, and each entry in the table then tells the loader where one segment is located.
  • VirtAddr specifies the address at which the segment should be loaded into the process’s virtual address space. This entry actually isn’t relevant for the PHDR segment, because it’s not a LOAD type. Confusing, I know. That’s just how things are sometimes in computer systems – it doesn’t always make sense, and sometimes it’s just been this way forever and at this point won’t change.
  • PhysAddr can be ignored completely. This entry is a vestige of the past, and isn’t relevant for any of the segments we’ll be looking at.
  • FileSiz specifies the size of the segment in the file itself. Sometimes, the size of the segment in the file doesn’t match the size of the segment that’s actually loaded into memory. This can happen, for example, if the segment has a section that is all zeroes (such as a .bss section), so there’s no point in taking up extra space in the file for it.
  • MemSiz is, as you probably guessed, the size of the segment that’s actually loaded into memory in the process’s address space. Going back to the last point about FileSiz vs. MemSiz, MemSiz can sometimes be larger than FileSiz if the segment should be padded with zeros.
  • Flg encodes the permissions that the segment should be loaded with. readelf helpfully renders this as R (read), W (write), and E (execute) for us.
  • Align specifies the alignment that the segment requires when it’s loaded into memory. For this PHDR entry it is 0x8 (8 bytes, the size of a 64-bit address), while the LOAD segments in the listing above use 0x1000 (4096 bytes), which is the usual page size on x86-64.
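To illustrate the Flg column, here is a small sketch of how a tool could turn the permission bits into the three-character strings seen in the listing, such as "R E" or "RW ". The function and enum names are invented for this example, and this is not readelf's actual implementation; only the bit values (execute = 1, write = 2, read = 4) come from the ELF format.

```c
#include <stdint.h>

/* Illustrative names; the values mirror ELF's PF_X, PF_W and PF_R bits. */
enum { SEG_EXEC = 0x1, SEG_WRITE = 0x2, SEG_READ = 0x4 };

/* Hypothetical formatter: writes a 3-character, NUL-terminated string into
 * `out`, using a space for every permission that is not set. */
void format_segment_flags(uint32_t flags, char out[4])
{
    out[0] = (flags & SEG_READ)  ? 'R' : ' ';
    out[1] = (flags & SEG_WRITE) ? 'W' : ' ';
    out[2] = (flags & SEG_EXEC)  ? 'E' : ' ';
    out[3] = '\0';
}
```

A flags value of 5 (read plus execute) comes out as "R E", and a value of 6 (read plus write) comes out as "RW ", matching the code and writable data segments of lucky_number.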
Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align

LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x000215 0x000215 R E 0x1000

03 .init .plt .text .fini
$ objdump -d lucky_number
lucky_number: file format elf64-x86-64

Disassembly of section .init:
0000000000401000 <_init>:
401000: f3 0f 1e fa endbr64
401004: 48 83 ec 08 sub $0x8,%rsp
401008: 48 8b 05 e9 2f 00 00 mov 0x2fe9(%rip),%rax # 403ff8 <__gmon_start__>
40100f: 48 85 c0 test %rax,%rax
401012: 74 02 je 401016 <_init+0x16>
401014: ff d0 callq *%rax
401016: 48 83 c4 08 add $0x8,%rsp
40101a: c3 retq

Disassembly of section .text:
0000000000401040 <_start>: <<<< Byte Lab readers: Here it is! >>>>
401040: f3 0f 1e fa endbr64
401044: 31 ed xor %ebp,%ebp
401046: 49 89 d1 mov %rdx,%r9
401049: 5e pop %rsi
40104a: 48 89 e2 mov %rsp,%rdx
40104d: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
401051: 50 push %rax

0000000000401130 main:
401130: 55 pushq %rbp
401131: 48 89 e5 movq %rsp, %rbp
401134: 48 83 ec 10 subq $16, %rsp
401138: c7 45 fc 00 00 00 00 movl $0, -4(%rbp)
40113f: 8b 34 25 30 40 40 00 movl 4210736, %esi
401146: 48 bf 04 20 40 00 00 00 00 00 movabsq $4202500, %rdi
401150: b0 00 movb $0, %al
401152: e8 d9 fe ff ff callq -295 <printf@plt>
401157: c7 04 25 30 40 40 00 02 00 00 00 movl $2, 4210736
401162: 8b 34 25 30 40 40 00 movl 4210736, %esi
401169: 48 bf 1c 20 40 00 00 00 00 00 movabsq $4202524, %rdi
401173: 89 45 f8 movl %eax, -8(%rbp)
401176: b0 00 movb $0, %al
401178: e8 b3 fe ff ff callq -333 <printf@plt>
40117d: 31 c9 xorl %ecx, %ecx
40117f: 89 45 f4 movl %eax, -12(%rbp)
401182: 89 c8 movl %ecx, %eax
401184: 48 83 c4 10 addq $16, %rsp
401188: 5d popq %rbp
401189: c3 retq
40118a: 66 0f 1f 44 00 00 nopw (%rax,%rax)
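The disassembly of main prints some operands in decimal, which hides a nice detail. Take the instruction bytes 8b 34 25 30 40 40 00 at address 40113f: the last four bytes are the address the instruction reads from, stored least significant byte first because x86-64 is little-endian. The sketch below (read_le32 is a hypothetical helper name) decodes such an operand by hand.

```c
#include <stdint.h>

/* Interpret four bytes as a little-endian 32-bit unsigned integer:
 * the first byte is the least significant one. */
uint32_t read_le32(const unsigned char bytes[4])
{
    return (uint32_t)bytes[0]
         | (uint32_t)bytes[1] << 8
         | (uint32_t)bytes[2] << 16
         | (uint32_t)bytes[3] << 24;
}
```

Decoding 30 40 40 00 gives 0x404030, which is the 4210736 shown in the disassembly. Likewise, the bytes 04 20 40 00 at the start of the first movabsq operand decode to 0x402004, which is 4202500. The same bytes are "an instruction" to the CPU and "an address" to us, depending on context.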
Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align

LOAD           0x002000 0x0000000000402000 0x0000000000402000 0x000160 0x000160 R   0x1000
LOAD           0x002e10 0x0000000000403e10 0x0000000000403e10 0x000224 0x000228 RW  0x1000

04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .dynamic .got .got.plt .data .bss
$ strings -t x ./lucky_number

2004 My lucky number is %d!
201c My lucky number is now %d!
...
$ readelf --symbols ./lucky_number | grep lucky_number
35: 0000000000404030 4 OBJECT LOCAL DEFAULT 23 lucky_number
$ objdump -F -D ./lucky_number | grep 404030
0000000000404030 <lucky_number> (File Offset: 0x3030):
404030: 07 (bad)
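The file offsets and virtual addresses in these listings are tied together by the Offset, VirtAddr, and FileSiz columns of the LOAD segments. A minimal sketch of the conversion, using a made-up struct and function name, could look like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical struct with the three readelf columns needed here. */
struct segment {
    uint64_t offset;  /* Offset column: where the segment starts in the file */
    uint64_t vaddr;   /* VirtAddr column: where it starts in memory */
    uint64_t filesz;  /* FileSiz column: how many bytes come from the file */
};

/* If `vaddr` falls inside the file-backed part of `seg`, store the matching
 * file offset in *file_offset and return true. Otherwise return false. */
bool vaddr_to_file_offset(const struct segment *seg, uint64_t vaddr,
                          uint64_t *file_offset)
{
    if (vaddr < seg->vaddr || vaddr - seg->vaddr >= seg->filesz)
        return false;
    *file_offset = seg->offset + (vaddr - seg->vaddr);
    return true;
}
```

Plugging in the RW segment (Offset 0x2e10, VirtAddr 0x403e10, FileSiz 0x224) and the address 0x404030 of lucky_number gives a file offset of 0x3030, the same value objdump reports. The read-only segment (Offset 0x2000, VirtAddr 0x402000) maps address 0x402004 to file offset 0x2004, which is where strings found the first format string. An address such as 0x404034 lies past FileSiz in the RW segment, so it has no bytes in the file and is part of the zero-filled region.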

Summary

Byte Lab

Byte Lab is a technical blog that discusses systems engineering, and the tech industry.