- Apple Silicon
- Prerequisites and basic concepts 📚
- ft_strlen
- ft_strcpy
- ft_strcmp
- ft_write
- ft_read
- ft_strdup
- Resources 📖
Apple Silicon is based on the ARM64 architecture, while the assembly code in this repository is written for the x86_64 architecture.
If you wish to use and test your assembly code seamlessly on those chips, you need to add the following lines to your Makefile:
CC = gcc
ifeq ($(shell uname -m), arm64)
CC += -ld_classic --target=x86_64-apple-darwin
endif
Instead of rewriting all the assembly code for the ARM64 architecture, we compile our C code for the x86_64 architecture, so that the resulting binary can run through Rosetta 2.
Don't forget the -f macho64 flag in your nasm command when assembling on Macs.
Assembly works mainly with registers. They can be compared to little pre-defined boxes that can store data.
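The general-purpose registers are:
rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp, r8, r9, ..., r15
There is also rip, the instruction pointer. It is not a general-purpose register: it holds the address of the next instruction to execute and cannot be written with mov.
We have to be careful when putting data in those registers, because several of them have a conventional role: instructions, function calls and syscalls can read or overwrite them.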
Among the above registers for example:
- rax is used to store the return value of a function.
- rcx is used as a counter in loops.
- rsp is used to store the stack pointer.
- rbp is used to store the base pointer.
- rdi, rsi, rdx, rcx, r8, r9 are used, in that order, to pass arguments, roughly like function(rdi, rsi, rdx, rcx, r8, r9).
This may seem like an inconvenience, but it is actually very useful to manipulate the behavior of the program.
For instance, if we want to call sys_write:
- We put the syscall number in rax (4 for sys_write, which becomes 0x2000004 on x86_64 macOS once the Unix syscall class 0x2000000 is added).
- We put the file descriptor in rdi.
- We put the address of the buffer in rsi.
- We put the number of bytes to write in rdx.
- We execute the syscall instruction.
The conclusion is: the little boxes are very useful as intermediate storage for data, but we have to be careful not to overwrite them when we (or the system) need them.
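To see the same register layout from the C side, here is a minimal sketch. It assumes a Linux system with glibc, where the number is SYS_write (1) rather than the macOS value, and the helper name raw_write is made up for this example. The syscall() function places its first argument in rax and the next three in rdi, rsi and rdx before executing the syscall instruction, which matches the steps above.

```c
#define _DEFAULT_SOURCE
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: asks the kernel to write without calling write(). */
long raw_write(int fd, const void *buf, unsigned long count)
{
    /* number -> rax, fd -> rdi, buf -> rsi, count -> rdx, then `syscall` */
    return syscall(SYS_write, fd, buf, count);
}
```

Calling raw_write(1, "hi\n", 3) should print hi on the standard output and return the number of bytes written.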
Assembly is read from top to bottom.
The instructions can be grouped in labels, which are used to mark a specific point in the code. They are followed by a colon.
One thing that can be confusing is that although labels look like functions in other languages like C, they are not.
For example:
entry_point:
xor rax, rax
do_something:
mov rax, 0
do_something_else:
mov rax, 1
return_label:
ret
In this example, entry_point is the entry point of the program/function.
do_something and do_something_else will be executed one after the other, even without a "jump" instruction.
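The closest thing C has to this fall-through behavior is a switch without break statements. The sketch below is only a model of the listing above: the enum, the function name run_from and the starting value of rax are assumptions made for the illustration.

```c
/* One constant per label of the assembly listing. */
enum label { ENTRY_POINT, DO_SOMETHING, DO_SOMETHING_ELSE, RETURN_LABEL };

/* Starts executing at `start` and returns what ends up in rax. */
long run_from(enum label start)
{
    long rax = -1;  /* arbitrary value left in rax before we start */

    switch (start) {
    case ENTRY_POINT:
        rax ^= rax;  /* xor rax, rax */
        /* falls through */
    case DO_SOMETHING:
        rax = 0;     /* mov rax, 0 */
        /* falls through */
    case DO_SOMETHING_ELSE:
        rax = 1;     /* mov rax, 1 */
        /* falls through */
    case RETURN_LABEL:
        break;       /* ret */
    }
    return rax;
}
```

Starting from ENTRY_POINT or from DO_SOMETHING gives the same result, because nothing stops the execution between two labels.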
The syntax used in this repository is the Intel syntax. It is the most common syntax used in assembly programming, and a requisite of the subject. It is characterized by the fact that the destination operand is on the left and the source operand is on the right.
Virtually all the lines in assembly are composed of an instruction followed by its operand(s).
A few examples:
- mov rax, 0 copies the value 0 into the register rax.
- add rax, 1 adds 1 to the value in the register rax.
- cmp rax, 0 compares the value in the register rax with 0.
- jmp do_something jumps to the label do_something.
It is very important to remember that many instructions have implicit side effects on registers or flags.
For example:
- The cmp instruction sets the flags register according to the result of the comparison.
- The loop instruction decrements the rcx register and jumps to the label if rcx is not zero.
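The hidden work done by the loop instruction can be written out in C. This is a sketch of its behavior only; loop_model is a name made up for the example, and rcx is an ordinary variable standing in for the register.

```c
#include <stdint.h>

/* Returns how many times the body of a `loop`-based loop runs. */
uint64_t loop_model(uint64_t rcx)
{
    uint64_t iterations = 0;

    do {
        iterations++;    /* the body of the loop */
        rcx--;           /* `loop` first decrements rcx... */
    } while (rcx != 0);  /* ...then jumps back while rcx is not zero */

    return iterations;
}
```

Since the decrement happens before the test, starting with rcx at 0 does not skip the loop: the counter wraps around and the body runs 2^64 times.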
Like in C, we can work with addresses.
The square brackets [] are used to dereference an address.
For example, if we want to move the value at the address 0x1234 into the register rax, we can do:
mov rax, [0x1234]
If we want to compare the value stored 3 bytes after the address in rax with 0, we can do:
cmp [rax + 3], 0
Here we should technically use a size specifier (BYTE, WORD, DWORD, QWORD) to tell the assembler how many bytes to compare, as in cmp BYTE [rax + 3], 0. nasm rejects the line without one, but we leave it out for the sake of this explanation.
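The size specifier changes the result of the comparison, not just the syntax. The C sketch below models the same address read as a BYTE and as a QWORD; both function names are invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Model of `cmp BYTE [rax + 3], 0`: looks at exactly one byte. */
int byte_is_zero(const unsigned char *rax)
{
    return rax[3] == 0;
}

/* Model of `cmp QWORD [rax + 3], 0`: looks at eight bytes from that address. */
int qword_is_zero(const unsigned char *rax)
{
    uint64_t value;

    memcpy(&value, rax + 3, sizeof value);  /* 8-byte load, alignment-safe */
    return value == 0;
}
```

With a buffer holding the bytes 1, 2, 3, 0, 5 followed by zeros, the byte at offset 3 is zero but the eight bytes starting there are not, so the two comparisons disagree.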
In order to use the functions we write in assembly in a C program, we need to export them.
To do so, we can use the global directive.
For example, if we want to export the function ft_strlen, we can do:
global ft_strlen
ft_strlen:
...
The ft_strlen function returns the length of a string. It is a very simple function that iterates over the string until it finds the null-terminator (\0).
To implement it in assembly, we need to recapitulate the behavior of the function:
- Set a counter to 0.
- Look at the character at the position given by the counter.
- If it is the null-terminator, return the counter.
- Otherwise, increment the counter and go back to step 2.
To replicate this behavior in assembly, we will need to learn a few instructions:
- mov to copy data.
- jmp to jump.
- cmp to compare data.
- je to jump if equal.
- inc to increment a register.
- ret to return from the function.
mov rax, rdi
This instruction copies the value in rdi to rax.
entry_point:
jmp some_other_label
some_label:
...
some_other_label:
...
The first line of entry_point jumps to the label some_other_label (and skips some_label) unconditionally.
Jumps work like a goto in C. They can be used to skip parts of the code, or to create loops (as they can jump to a label that is located earlier in the code).
cmp rax, rdi
This instruction compares the value in rax with the value in rdi.
If we want to check if rax is equal to rdi, we can do:
cmp rax, rdi
je equal
Which leads us to the next instruction.
cmp some_register, some_other_register
je equal
This instruction jumps to the label equal only if the two compared values are equal.
inc rax
This instruction increments the value in rax by 1. Simple.
ret
This instruction returns from the program/function.
Remember that in the System V AMD64 ABI, the return value of a function is stored in the rax register. Whatever is in rax when we call the ret instruction will be the return value of the program/function.
Now that we know the instructions we need, we can implement the ft_strlen function.
The first thing we need to do is to set the counter to 0. We can do this by moving 0 to rcx (or any other register, but remember rcx is commonly used as a counter).
mov rcx, 0
Then, we need to define our recurring loop.
We need to look at every character in the string passed in rdi (where the first argument is passed in this calling convention, as seen before).
So, rdi first points at the first character of the string, and rcx is our counter initialized to 0.
Like we would look into *(str + i) in C, we can use assembly's square brackets [] as follows, with BYTE to say that we compare a single character:
cmp BYTE [rdi + rcx], 0
What happens next?
- If the character is not the null-terminator, we need to increment the counter and go back to the beginning of the loop.
- If it is the null-terminator, we need to return the counter.
So, with the instructions we learned before, we can use:
- cmp to compare the character with 0.
- je to jump to the end of the function if it is the null-terminator.
- inc to increment the counter.
- jmp to go back to the beginning of the loop.
Our loop would then look like:
count_loop: ; any name works except loop, which is an instruction mnemonic
cmp BYTE [rdi + rcx], 0
je end
inc rcx
jmp count_loop
Finally, we need to return the counter. We can do this by moving the value in rcx to rax and returning.
Remember that "whatever is in rax when we call the ret instruction will be the return value of the program/function."
end:
mov rax, rcx
ret
And that's it! We have implemented the ft_strlen function in assembly.
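For readers more comfortable with C, the same control flow can be modelled with goto, one statement per instruction. This is a sketch for comparison, not the code of the repository: strlen_model and the label names are made up, and the variables are named after the registers they stand for.

```c
#include <stddef.h>

size_t strlen_model(const char *rdi)
{
    size_t rcx = 0;            /* mov rcx, 0 */

next_char:
    if (rdi[rcx] == '\0')      /* cmp BYTE [rdi + rcx], 0 */
        goto end;              /* je end */
    rcx++;                     /* inc rcx */
    goto next_char;            /* jmp back to the start of the loop */

end:
    return rcx;                /* mov rax, rcx ; ret */
}
```

An empty string takes the je branch on the first comparison, so the function returns 0 without ever incrementing the counter.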
The ft_strcpy function is a function that copies a string into another string, and returns a pointer to the destination string.
The logic is very similar to the ft_strlen function:
- Set a counter to 0.
- Copy the character at the counter's position from the source string to the destination string.
- Increment the counter.
- If the character we just copied is not the null-terminator, go back to step 2.
- Otherwise, exit the loop and return a pointer to the destination string.
A pointer to our dst string is passed in rdi, and a pointer to our src string is passed in rsi.
We can start the same way we did with ft_strlen by setting the counter to 0.
ft_strcpy:
mov rcx, 0
We can then start our loop.
We need to copy every character in rsi to rdi. However, in assembly, we can't copy data directly from one address to another (mov [rdi], [rsi] would not work).
We therefore need to copy the data from the source address to a register, and then copy it to the destination address.
We could use any register to store the character (like r8 as seen before), but it is more appropriate to use al for this purpose: it is the lowest byte of rax, so it holds exactly one character.
copy_loop: ; any name works except loop, which is an instruction mnemonic
mov al, [rsi + rcx]
mov [rdi + rcx], al
inc rcx
cmp al, 0
jne copy_loop
In this loop:
- We copy the character at rsi + rcx to al.
- We copy al to the address rdi + rcx.
- We increment the counter.
- We check if the character is the null-terminator.
- If it is not, we go back to the beginning of the loop.
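The loop described above, together with the return of the untouched destination pointer, can be modelled in C as follows. This is a sketch for comparison only: strcpy_model is an invented name and the variables are named after the registers they stand for.

```c
char *strcpy_model(char *rdi, const char *rsi)
{
    unsigned long rcx = 0;      /* mov rcx, 0 */
    char al;

    do {
        al = rsi[rcx];          /* mov al, [rsi + rcx] */
        rdi[rcx] = al;          /* mov [rdi + rcx], al */
        rcx++;                  /* inc rcx */
    } while (al != '\0');       /* cmp al, 0 ; jne back to the loop start */

    return rdi;                 /* mov rax, rdi ; ret */
}
```

Because the test comes after the copy, the null-terminator itself is copied before the loop stops, which is exactly what we want.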
Finally, we need to return the pointer to the destination string.
Given that we received this pointer in rdi and that we did not move it, we can simply copy it to rax and return.
mov rax, rdi
ret
The ft_strcmp function compares two strings and returns the difference between the first two characters that differ. For example, "abc" and "abd" would return -1, as 'c' - 'd' = -1.
Its logic is not much more complex than the previous functions, but the particularities of assembly make it a bit more challenging (for me at least, maybe I have a shitty logic).
Let's recap the behavior of the function anyway:
- Set a counter to 0.
- Compare the characters of both strings, position by position.
- If they are different (or if one of the strings ends), subtract the second character from the first character and return the result.
From this exercise on, I will only explain the new instructions and concepts we use.
sub rax, rdi performs a subtraction such that rax ← rax - rdi.
movzx rax, BYTE[rdi + rcx] moves the byte at the address rdi + rcx to rax, and fills the remaining bits with 0.
movzx will adapt to the keyword used to specify the size of the data we want to move. For example, movzx rax, WORD [rdi + rcx] would move a word (16 bits) to rax.
This instruction is useful to us as rax is a 64-bit register, and we only want to compare the characters as bytes (8 bits).
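jz label jumps to label if the zero flag is set. It is the same instruction as je under another name: cmp sets the zero flag exactly when its two operands are equal.
For example, if we want to jump to end if one of register1 or register2 is 0, we need one jump per comparison, because each cmp overwrites the flags set by the previous one:
cmp register1, 0
jz end
cmp register2, 0
jz end
I will not show code like in the previous ones, just describe the logic with more depth.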
My intermediate registers will be rax and r8 (not the most efficient code but easier to understand).
- We set rcx, rax and r8 to 0.
- We start our loop.
- We copy the characters at rdi + rcx and rsi + rcx to rax and r8 (with movzx).
- We check that the characters are not the null-terminator. If one of them is, we jump straight to the subtraction.
- We compare the characters.
- If they are different, we subtract them and return.
- If not, we increment the counter and go back to the beginning of the loop.
That's it! With all the precautions I mentioned and the new instructions, you should be able to implement the ft_strcmp function.
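If the steps are hard to picture, here is a C model of the logic described above. It is a sketch and not the assembly solution: strcmp_model is an invented name, the variables are named after the registers, and the cast to unsigned char plays the role of movzx.

```c
int strcmp_model(const char *rdi, const char *rsi)
{
    unsigned long rcx = 0;                 /* mov rcx, 0 */
    long rax;
    long r8;

    for (;;) {
        rax = (unsigned char)rdi[rcx];     /* movzx rax, BYTE [rdi + rcx] */
        r8 = (unsigned char)rsi[rcx];      /* movzx r8, BYTE [rsi + rcx] */
        if (rax == 0 || r8 == 0)           /* one of the strings ended */
            break;
        if (rax != r8)                     /* cmp rax, r8 */
            break;
        rcx++;                             /* inc rcx */
    }
    return (int)(rax - r8);                /* sub rax, r8 ; ret */
}
```

Zero-extending the bytes matters for characters above 127: they must compare as large positive values, not as negative ones.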
The ft_write function is a function that, provided a file descriptor, a buffer and a size, writes the buffer to the file descriptor. It returns the number of bytes written, or -1 if an error occurred.
To implement this function, we need to know how to make a syscall.
As we saw in the introduction, to make a syscall, we need to:
- Put the syscall number in rax.
- Put its arguments in the appropriate registers (here rdi, rsi and rdx).
- Trigger the syscall with the syscall instruction, which reads the values in the registers and behaves accordingly.
In our case, the parameters we receive from C are already in the right registers, so we can skip the following instructions, which would do nothing:
mov rdi, rdi ; file descriptor
mov rsi, rsi ; buffer
mov rdx, rdx ; count
We can then put the syscall number in rax and call the syscall.
On macOS (x86_64, including under Rosetta 2 on Apple Silicon), the syscall number for sys_write is 0x2000004. On Linux, it is 1.
mov rax, 0x2000004
; mov rdi, rdi
; mov rsi, rsi
; mov rdx, rdx
syscall
ret
We could stop here, as the function would properly write to the file descriptor and return the number of bytes written (the syscall putting its return value in rax).
However, we need to return -1 if an error occurred, and set the errno variable accordingly, as asked in the subject.
To handle any error and jump to another label, we can use the jc instruction right after syscall. It jumps to the label if the Carry Flag is set, which is how the macOS kernel signals that a syscall failed (rax then holds the error code).
jc error
errno is the variable C programs read to find out why a call failed. The kernel does not write to it: it only hands us the error code in rax, and the C library wrappers normally copy that code into errno.
Since we bypass those wrappers, we need to set it ourselves.
To do so, we are provided with __errno_location (or ___error on Mac). This function returns a pointer to the errno variable.
We can call it with the instruction call __errno_location. As with any function, it will put the return value in rax.
However, rax already contains the return value of the write syscall. We need to save it before calling __errno_location. Here we park it in r8. Strictly speaking, r8 is a caller-saved register that the called function is allowed to overwrite, so pushing the value on the stack (see ft_strdup) is the safer option, and it also keeps the stack 16-byte aligned for the call.
error:
mov r8, rax
call __errno_location
We now have the address of the errno variable in rax. We can put the previously saved error code in it by dereferencing the address. errno is a 4-byte int, so we store r8d, the low 32 bits of r8.
mov [rax], r8d
Finally, we can return -1 and exit the function.
mov rax, -1
ret
And that's it! We have implemented the ft_write function.
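What happens after the syscall instruction can be summed up with a small C model. It is only an illustration: the function name is invented, and the Carry Flag, which C cannot read, is passed in as a plain boolean parameter.

```c
#include <errno.h>
#include <stdbool.h>

/* Models the instructions that follow `syscall` on macOS. */
long finish_syscall_macos(bool carry_flag, long rax)
{
    if (!carry_flag)     /* jc error is not taken */
        return rax;      /* ret: rax holds the number of bytes written */

    errno = (int)rax;    /* error: store the error code where errno lives */
    return -1;           /* mov rax, -1 ; ret */
}
```

Writing to errno in C goes through the same pointer that ___error (or __errno_location) returns, so the assignment does the job of the call and the store in the assembly version.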
The ft_read function is a function that, provided a file descriptor, a buffer and a size, reads from the file descriptor to the buffer. It returns the number of bytes read, or -1 if an error occurred.
It's literally implementing the ft_write function but with the sys_read syscall.
Same logic, same instructions, same everything.
So we just replace 0x2000004 with 0x2000003 and we're good to go.
Easy.
The ft_strdup function is a function that duplicates a string. It allocates memory for the new string, copies the string to it, and returns a pointer to the new string.
Doesn't sound as easy as ft_read.
Luckily, we can use the functions we implemented before, and the malloc function. But we also need to learn two new instructions.
We have another way to store data in assembly: the stack.
The stack is literally a pile of data, organized in a LIFO (Last In, First Out) way.
If I push 1, 2 and 3 on the stack, it will look like:
3
2
1
If I pop the stack here, I will get 3.
In assembly, we can manipulate the stack as follows:
push rax
push rbx
These instructions push the value of rax, then the value of rbx on the stack:
value of rbx
value of rax
We then have the pop instruction, which takes the last value pushed on the stack and puts it in the operand we specify.
pop any_register
After this instruction, any_register will contain the value of rbx.
In this implementation, we will use push and pop to save the values of our registers when calling other functions that manipulate them.
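The LIFO behavior can be mimicked in C with an array and an index. This is a toy model with made-up names and no bounds checking, and it ignores one detail: the real stack grows towards lower addresses, each push moving rsp down by 8 bytes.

```c
#include <stdint.h>

#define STACK_SLOTS 16  /* arbitrary capacity chosen for this sketch */

struct stack_model {
    uint64_t slots[STACK_SLOTS];
    int top;  /* number of values currently stored */
};

/* Model of `push value`: store the value on top of the pile. */
void model_push(struct stack_model *s, uint64_t value)
{
    s->slots[s->top++] = value;
}

/* Model of `pop register`: remove and return the most recent value. */
uint64_t model_pop(struct stack_model *s)
{
    return s->slots[--s->top];
}
```

Pushing 1, 2 and 3 and then popping three times gives back 3, 2 and 1, in that order.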
Let's recap the steps of the function:
- Get the length of the string (ft_strlen).
- Allocate memory for the new string (malloc).
- Copy the string to the new string (ft_strcpy).
We also know that:
- When ft_strdup is called, rdi contains s (the pointer to the string to duplicate).
- ft_strlen reads a string in rdi and returns the length in rax.
- malloc reads the size in rdi and returns a pointer to the allocated memory in rax.
- ft_strcpy reads the source string in rsi and the destination string in rdi.
So:
- We don't need to touch rdi before calling ft_strlen: s is already there.
- We push rdi (that contains s) to the stack, to use it later, as the functions we call are allowed to overwrite it.
- We call ft_strlen, that returns the length in rax.
- We increment the length by 1 (for the null-terminator).
- We move the length to rdi to call malloc with the right size.
- We call malloc, that returns a pointer to the allocated memory in rax.
- We pop s from the stack to rsi (second argument of ft_strcpy).
- If malloc failed (rax is 0), we return right away: rax already holds the NULL we must return.
- We move the pointer to the allocated memory to rdi (first argument of ft_strcpy).
- We call ft_strcpy, that copies the string to the allocated memory.
- We can directly execute ret, as ft_strcpy leaves the pointer to the new string in rax.
And that's it! We have implemented the last mandatory function of the project.
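As a point of comparison, the whole sequence fits in a few lines of C. This sketch uses the standard strlen and strcpy as stand-ins for ft_strlen and ft_strcpy, strdup_model is an invented name, and the NULL check on malloc is a precaution added for this example.

```c
#include <stdlib.h>
#include <string.h>

char *strdup_model(const char *s)
{
    const char *saved = s;         /* push rdi: keep s for later */
    size_t len = strlen(s);        /* call ft_strlen */
    char *copy = malloc(len + 1);  /* inc rax ; mov rdi, rax ; call malloc */

    if (copy == NULL)              /* malloc failed: nothing to copy into */
        return NULL;
    return strcpy(copy, saved);    /* pop rsi ; mov rdi, rax ; call ft_strcpy ; ret */
}
```

The caller owns the returned memory and is expected to release it with free, as with the real strdup.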
When building on Linux instead of macOS:
- The syscall number for sys_write is 1 on Linux.
- The syscall number for sys_read is 0 on Linux.
- The ___error function is named __errno_location on Linux.
- The symbols don't need to be prefixed with an underscore on Linux.
- The Carry Flag is not set when an error occurs on Linux. We need to check if rax is negative to detect an error. rax then holds the negated error code, so it must be negated before being stored in errno.
- The nasm flag -f elf64 should be used instead of -f macho64.
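The Linux error convention from the list above can be modelled in C in the same spirit. The function name is invented for this sketch, and rax is an ordinary parameter standing in for the register as the kernel leaves it.

```c
#include <errno.h>

/* Models the instructions that follow `syscall` on Linux. */
long finish_syscall_linux(long rax)
{
    if (rax >= 0)         /* no error: rax is the byte count */
        return rax;

    errno = (int)-rax;    /* the kernel returned -errno, so negate it */
    return -1;            /* what the C caller expects on failure */
}
```

For instance, a write on a closed file descriptor leaves -EBADF in rax, which this model turns into a return value of -1 with errno set to EBADF.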