L-Systems in C, Part 1: Pointers

This is the first post in a (planned) series covering my time spent learning C by experimenting with procedurally generating plants using L-Systems. This post gets to grips with the basics of pointers in C and how they relate to strings.

If you're coming from a memory-managed language, C might seem pretty bare-bones and low-level: you have direct access to memory, and are responsible for managing it. You're given a relatively austere type system and the ability to address its types via pointers and pointer arithmetic. That's all you need! All the data structures you're used to having in your favorite language's standard library are built upon C's sequential memory model, which you're forced to interact with directly. Learning C is less about learning language syntax than it is about learning how all these higher-level tools are implemented, getting your hands metaphorically dirty and building them yourself. Often badly. There will be segfaults and undefined behavior. There will be frustration. There will be so many eureka moments.

Back to L-Systems. They start with string production rules, which means we need to start with strings. Actually, it means we have to start with pointers. A pointer in C is a language construct that allows you to reference an address in memory as a specific type. To understand what that means, take a look at the following code:

int32_t *my_int = malloc(sizeof(int32_t));

It's doing a few things:

It declares a pointer to an int32_t (a 32-bit signed integer) called my_int
It calls malloc to allocate sizeof(int32_t) bytes (the size of an int32_t, or four bytes). malloc returns the address of the first byte allocated.
It assigns the address returned by malloc to my_int

We now own 4 bytes 'somewhere' in memory. If we want to read or write to it, we do so via our pointer which knows that exact location, and the format in which to read and write data there. It's important to understand that there are three different types in play here:

The pointer's type: int32_t* (which I read as 'A pointer to an int32_t')
The type being addressed by the pointer: int32_t
The type used to represent the memory address as a plain number: uintptr_t. This is an unsigned integer type from <stdint.h> whose width depends on the target architecture you're compiling to, but which can hold any valid object pointer converted to it. It's typically 64 bits on a 64-bit system, 32 bits on a 32-bit system and so on. (unsigned long is wide enough on many platforms, but not all: on 64-bit Windows it is only 32 bits.)

The distinction between these types is important, as reflected in the following:

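// Illustrative only: writing to a hard-coded address like this will almost certainly crash a real program.
int32_t *my_int = (int32_t *)0x6000014298e0;
*my_int = 1234;
int32_t my_other_int = *my_int;
uintptr_t my_addr = (uintptr_t)my_int;

// %p expects a void *, so the pointer and the address are both cast for printing.
printf("%p \n%i \n%i \n%p", (void *)my_int, *my_int, my_other_int, (void *)my_addr);
// > 0x6000014298e0
// > 1234
// > 1234
// > 0x6000014298e0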

Step by step, this code:

Points my_int to memory address 0x6000014298e0.
Writes the value 1234 as a signed int32 to the four bytes starting at 0x6000014298e0.
Reads the four bytes starting at 0x6000014298e0 as a signed int32 into the variable my_other_int.
Casts my_int to a raw uintptr_t and writes its value (the memory address 0x6000014298e0) to the variable my_addr.

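So we can treat a pointer as a number, and use it to get to some other number in memory. What we can also do is something called pointer arithmetic. Since a pointer holds an address, which is just a number, we can add to and subtract from it to get addresses relative to it. One subtlety: the arithmetic is counted in elements of the pointed-to type, not in raw bytes. Adding 1 to an int32_t * moves the address forward by four bytes, while adding 1 to a char * moves it by one.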

To see what I mean, take a look at this code:

char *ptr_to_char = malloc(sizeof(char) * 4);
*ptr_to_char = 'a';
*(ptr_to_char + 1) = 'b';
*(ptr_to_char + 2) = 'c';
ptr_to_char[3] = '\0';
printf("%s", ptr_to_char);
// > abc

We're using the char type here, which is a single byte (signed or unsigned, depending on the platform) that's often used to represent an ASCII character. Step by step:

Allocate sizeof(char) * 4 bytes (sizeof(char) is one byte, so that's four bytes total) of memory and point ptr_to_char at it.
Set the first byte to 'a'
Set the second and third bytes to 'b' and 'c' by calculating their addresses relative to ptr_to_char
Set the fourth byte to a null terminator ('\0') using array syntax. ptr_to_char[3] is syntactic sugar for *(ptr_to_char + 3).
Print ptr_to_char to stdout formatted as a string

This demonstrates how a pointer can act like 'index zero' of an array of bytes in memory, and we can address the rest of that array through simple pointer arithmetic. Understanding that arrays really are just contiguous, allocated sequences in memory is super important. This realization leads us back to the original string-related goal of this post: a string is just an array of char.

Specifically, C has the concept of a 'null terminated string': a char * pointing to a sequence of bytes in memory that ends with a '\0' - a null terminator. This concept allows us to do things like printf("%s", ptr_to_char) as the printf function simply reads bytes starting at ptr_to_char until it encounters a null terminator. It's also pretty error-prone, as forgetting or incorrectly placing null terminators quickly leads to disaster and undefined behavior.

Now that we've got the basics of memory, pointers and strings covered, we'll be able to start actually using them to generate some L-System strings... in the next post. Stay tuned!