L-Systems in C, Part 1: Pointers
This is the first post in a (planned) series covering my time spent learning C by experimenting with procedurally generating plants using L-Systems. This post gets to grips with the basics of pointers in C and how they relate to strings.
If you're coming from a memory-managed language, C might seem pretty bare-bones and low-level: you have direct access to memory, and are responsible for managing it. You're given a relatively austere type system and the ability to address its types via pointers and pointer arithmetic. That's all you need! All the data structures you're used to having in your favorite language's standard library are built upon C's sequential memory model, which you're forced to interact with directly. Learning C is less about learning language syntax than it is about learning how all these higher level tools are implemented, getting your hands metaphorically dirty and building them yourself. Often badly. There will be segfaults and undefined behavior. There will be frustration. There will be so many eureka moments.
Back to L-Systems. They start with string production rules, which means we need to start with strings. Actually, it means we have to start with pointers. A pointer in C is a language construct that allows you to reference an address in memory as a specific type. To understand what that means, take a look at the following code:

```c
int32_t *my_int = malloc(sizeof(int32_t));
```
It's doing a few things:
- It declares a pointer to an `int32_t` (a 32-bit signed integer) called `my_int`.
- It calls `malloc` to allocate `sizeof(int32_t)` bytes (the size of an `int32_t`, or four bytes). `malloc` returns the address of the first byte allocated, or `NULL` if the allocation fails.
- It assigns the address returned by `malloc` to `my_int`.
We now own four bytes 'somewhere' in memory. If we want to read from or write to them, we do so via our pointer, which knows that exact location and the format in which to read and write data there. It's important to understand that there are three different types in play here:
- The pointer's type: `int32_t *` (which I read as 'a pointer to an `int32_t`').
- The type being addressed by the pointer: `int32_t`.
- The integer type that can represent the memory address stored in the pointer: `uintptr_t`. This is an unsigned integer type whose width varies depending on the target architecture you're compiling to, but any valid pointer (converted to `void *`) is guaranteed to survive a round trip through it. In practice it's 64 bits on a 64-bit system, 32 bits on a 32-bit system and so on. It's often the same size as `unsigned long`, but not always: on 64-bit Windows `unsigned long` is only 32 bits, which is why `uintptr_t` is the portable choice.
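The distinction between these types is important, as the following snippet shows. It hardcodes the address that `malloc` happened to return above, purely for illustration: writing to an address you haven't been given is undefined behavior.

```c
int32_t *my_int = (int32_t *)0x6000014298e0; // illustrative address only
*my_int = 1234;
int32_t my_other_int = *my_int;
uintptr_t my_addr = (uintptr_t)my_int;

// %p expects a void *; PRId32 and PRIxPTR (from <inttypes.h>) match int32_t and uintptr_t
printf("%p\n%" PRId32 "\n%" PRId32 "\n0x%" PRIxPTR "\n",
       (void *)my_int, *my_int, my_other_int, my_addr);
// > 0x6000014298e0
// > 1234
// > 1234
// > 0x6000014298e0
```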
Step by step, this code:
- Points `my_int` at memory address `0x6000014298e0`.
- Writes the value 1234 as a signed 32-bit integer to the four bytes starting at `0x6000014298e0`.
- Reads the four bytes starting at `0x6000014298e0` as a signed 32-bit integer into the variable `my_other_int`.
- Casts `my_int` (an `int32_t *`) to a raw `uintptr_t` and writes its value (the memory address `0x6000014298e0`) to the variable `my_addr`.
- Prints the pointer, both integers and the address.
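So we can treat a pointer as a number, and use it to get to some other number in memory. We can also do something called pointer arithmetic. Since a pointer holds an address, we can add integers to it and subtract integers from it to get addresses relative to it. The arithmetic is scaled by the size of the type being pointed to: adding `n` to an `int32_t *` moves it forward `n * sizeof(int32_t)` bytes (that is, `n` whole `int32_t`s), while adding `n` to a `char *` moves it exactly `n` bytes.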
To see what I mean, take a look at this code:

```c
char *ptr_to_char = malloc(sizeof(char) * 4);
*ptr_to_char = 'a';
*(ptr_to_char + 1) = 'b';
*(ptr_to_char + 2) = 'c';
ptr_to_char[3] = '\0';
printf("%s", ptr_to_char);
// > abc
free(ptr_to_char); // give the four bytes back once we're done
```
We're using the `char` type here, which is a single byte (signed or unsigned, depending on the platform) that's often used to represent an ASCII character. Step by step:
- Allocate `sizeof(char) * 4` bytes of memory (`sizeof(char)` is always one byte, so that's four bytes total) and point `ptr_to_char` at it.
- Set the first byte to `'a'`.
- Set the second and third bytes to `'b'` and `'c'` by calculating their addresses relative to `ptr_to_char`.
- Set the fourth byte to a null terminator (`'\0'`) using array syntax. `ptr_to_char[3]` is, by definition, shorthand for `*(ptr_to_char + 3)`.
- Print `ptr_to_char` to `STDOUT`, formatted as a string.
This demonstrates how a pointer can act like 'index zero' of an array of bytes in memory, and we can address the rest of that array through simple pointer arithmetic. Understanding that arrays really are just contiguous, allocated sequences in memory is super important. This realization leads us back to the original string-related goal of this post: a string is just an array of `char`.
Specifically, C has the concept of a 'null terminated string': a `char *` pointing to a sequence of bytes in memory that ends with a `'\0'`, a null terminator. This concept allows us to do things like `printf("%s", ptr_to_char)`, as the `printf` function simply reads bytes starting at `ptr_to_char` until it encounters a null terminator. It's also pretty error-prone, as forgetting or incorrectly placing null terminators quickly leads to disaster and undefined behavior.
Now that we've got the basics of memory, pointers and strings covered, we'll be able to start actually using them to generate some L-System strings... in the next post. Stay tuned!