Instruction Set Specification
The "eBPF Instruction Set Specification, v1.0" is outdated and incomplete. Although eBPF is not strictly versioned, it does seem that it is far beyond "v1.0".
This page aims to be a "diff" between that spec and the current kernel implementation.
Standardization efforts
This blog entry states that the BPF ISA version is now v3 (since kernel version v5.1).
It seems people are working to update the kernel documentation, and the updated version covers most of the content on this page.
Architecture
- The stack / frame: In eBPF terms, the stack pointer is just a frame pointer. Each eBPF function call has its own 512-byte stack.
  - R10: This register points to the base of the stack, that is, the very end of the stack range (R10[-512 : 0]).
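To make the addressing concrete, here is a minimal sketch of how a custom runtime might model one frame. The names (struct frame, frame_r10, frame_store_u64, and so on) are hypothetical and not taken from the kernel; only the 512-byte size and the "R10 points at the end of the range" convention come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE 512 /* per-call stack size described above */

/* Hypothetical model of one eBPF stack frame (not a kernel structure). */
struct frame {
    uint8_t mem[FRAME_SIZE];
};

/* R10 points one past the last byte, so valid offsets are negative. */
static inline uint8_t *frame_r10(struct frame *f)
{
    assert(f != NULL);
    return f->mem + FRAME_SIZE;
}

/* Returns 1 if `size` bytes at R10 + off stay inside R10[-512 : 0]. */
static inline int frame_access_ok(int off, int size)
{
    return size > 0 && off < 0 && off >= -FRAME_SIZE && off + size <= 0;
}

/* Store a 64-bit value at R10 + off; 0 on success, -1 if out of range. */
static inline int frame_store_u64(struct frame *f, int off, uint64_t val)
{
    if (!frame_access_ok(off, (int)sizeof(val)))
        return -1;
    memcpy(frame_r10(f) + off, &val, sizeof(val));
    return 0;
}

/* Load a 64-bit value from R10 + off; 0 on success, -1 if out of range. */
static inline int frame_load_u64(struct frame *f, int off, uint64_t *out)
{
    if (!frame_access_ok(off, (int)sizeof(*out)))
        return -1;
    memcpy(out, frame_r10(f) + off, sizeof(*out));
    return 0;
}
```

With this model, the slot at R10 - 8 is the topmost 8 bytes of the frame, and an access at offset 0 or above is rejected because it would fall outside the range.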
Instruction Encoding
Instructions are encoded in host endianness.
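As a rough illustration of what host endianness means in practice, the sketch below copies a 64-bit instruction slot to and from raw bytes without any byte swapping. The struct and function names are hypothetical; the field widths (8-bit opcode, two 4-bit registers packed in one byte, 16-bit offset, 32-bit immediate) follow the usual eBPF layout, and the order of the two register nibbles is deliberately left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative layout of one 64-bit instruction slot. */
struct insn {
    uint8_t opcode;
    uint8_t regs; /* dst_reg and src_reg nibbles, kept opaque in this sketch */
    int16_t off;
    int32_t imm;
};

_Static_assert(sizeof(struct insn) == 8, "one slot is 64 bits");

/* Decode from raw bytes: a plain copy, so off/imm keep host byte order. */
static inline struct insn insn_decode(const uint8_t raw[8])
{
    struct insn in;

    assert(raw != NULL);
    memcpy(&in, raw, sizeof(in));
    return in;
}

/* Encode to raw bytes: again a plain copy, no byte swapping. */
static inline void insn_encode(const struct insn *in, uint8_t raw[8])
{
    assert(in != NULL && raw != NULL);
    memcpy(raw, in, sizeof(*in));
}
```

A consequence of this design is that the same program bytes are not portable between little-endian and big-endian hosts without conversion.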
Wide instructions
A wide instruction is a 128-bit instruction:
| 64-bit insn1 | 64-bit insn2 |
The spec states that "the wide instruction encoding... appends a second 64-bit immediate value (imm64) after the basic instruction for a total of 128 bits", but that is not how the kernel does it.
In practice, insn1.imm32 holds the lower 32 bits and insn2.imm32 holds the upper 32 bits (see ___bpf_prog_run#LD_IMM_DW):
imm64 = (u64)(u32)insn1.imm32 | ((u64)(u32)insn2.imm32 << 32);
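If you are writing this in portable C, the casts matter: imm32 is a signed 32-bit field, so it has to be widened as an unsigned value before shifting or OR-ing. The following is a small sketch under that assumption; struct insn and wide_imm64 are hypothetical names, not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal instruction slot; only imm matters here. */
struct insn {
    uint8_t opcode;
    uint8_t regs;
    int16_t off;
    int32_t imm;
};

/* Combine the imm32 fields of a wide instruction (two consecutive slots). */
static inline uint64_t wide_imm64(const struct insn *insn)
{
    assert(insn != NULL);
    /* Go through uint32_t first so a negative imm is not sign-extended. */
    return (uint64_t)(uint32_t)insn[0].imm |
           ((uint64_t)(uint32_t)insn[1].imm << 32);
}
```

Without the uint32_t cast, a lower half of -1 would sign-extend to all ones and overwrite the upper half.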
Instructions
You might want to see how each instruction is actually executed by the verifier. Here is an incomplete list of them.
Arithmetic and jumps
- BPF_NEG: No, this opcode is not a bitwise-not as the spec states (dst = ~src). It is actually just DST = -DST; and has nothing to do with the src register.
- BPF_DIV, BPF_MOD: An implementation must check for division by zero. Linux rewrites the instruction into several instructions with explicit, hard-coded zero checks.
- In jump instructions, the off (offset) field is target_pc - (current_jump_insn_pc + 1), that is, it is counted from the instruction following the jump.
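The semantics in this list can be sketched as plain C helpers. The function names are hypothetical. The values produced for a zero divisor (0 for division, destination unchanged for 64-bit modulo) are an assumption based on current Linux behaviour, so check them against the verifier before relying on them. jump_target assumes off is counted from the instruction after the jump, so target_pc = pc + 1 + off.

```c
#include <assert.h>
#include <stdint.h>

/* BPF_NEG (64-bit): arithmetic negation of dst; src is not involved. */
static inline uint64_t alu64_neg(uint64_t dst)
{
    return (uint64_t)0 - dst;
}

/* BPF_DIV (64-bit, unsigned) with an explicit zero check.
 * Assumption: a zero divisor yields 0. */
static inline uint64_t alu64_div(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst / src : 0;
}

/* BPF_MOD (64-bit, unsigned) with an explicit zero check.
 * Assumption: a zero divisor leaves dst unchanged. */
static inline uint64_t alu64_mod(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst % src : dst;
}

/* Target of a jump located at index pc with 16-bit offset off. */
static inline int64_t jump_target(int64_t pc, int16_t off)
{
    assert(pc >= 0);
    return pc + 1 + off;
}
```

For instance, a jump with off = 0 simply falls through to the next instruction, and off = -1 makes an instruction jump to itself.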
Function calls
The eBPF interpreter and JIT compilers in Linux rely on preprocessing done by the verifier. To understand how function calls actually work, you will need to dig into the verifier: kernel/bpf/verifier.c#do_misc_fixups
In short, BPF_CALL has multiple semantics, differentiated by the src_reg field:
- Calling a helper function.
  - The src_reg field in the instruction must be zeroed.
  - During do_misc_fixups, the imm32 field (which originally contains the ID of the helper function) is replaced with a function pointer relative to __bpf_call_base.
- Calling one of the BPF Kernel Functions (kfuncs), which requires JIT compilation.
  - The src_reg field must be BPF_PSEUDO_KFUNC_CALL.
  - During fixup_kfunc_call, the imm32 field is replaced in a similar way to helper function calls.
- Doing a BPF_PSEUDO_CALL:
  - The src_reg field must be BPF_PSEUDO_CALL.
  - It is just a relative function call. A new stack frame is allocated for each call. A libbpf example:

        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        /* Subprogram: static and not inlined, so the call below stays a real call. */
        static __noinline void my_func(void)
        {
            bpf_printk("Pseudo-called\n");
        }

        SEC("some_sec")
        int handle(void *ctx)
        {
            my_func(); // BPF_PSEUDO_CALL
            return 0;
        }

  - During bpf_patch_call_args, the instruction is replaced with an internal one (JMP_CALL_ARGS).
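For a custom runtime, the three cases boil down to a switch on src_reg. Below is a minimal sketch: enum call_kind, classify_call and subprog_entry are hypothetical names, the two BPF_PSEUDO_* values mirror include/uapi/linux/bpf.h, and the assumption that a BPF_PSEUDO_CALL target is relative to the instruction after the call (call_pc + 1 + imm32) should be checked against the verifier.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from include/uapi/linux/bpf.h. */
#define BPF_PSEUDO_CALL       1
#define BPF_PSEUDO_KFUNC_CALL 2

/* Hypothetical classification of a BPF_CALL instruction. */
enum call_kind {
    CALL_HELPER,  /* src_reg == 0: imm32 is a helper ID */
    CALL_SUBPROG, /* BPF_PSEUDO_CALL: imm32 is a relative offset */
    CALL_KFUNC,   /* BPF_PSEUDO_KFUNC_CALL: imm32 identifies a kfunc */
    CALL_INVALID
};

static inline enum call_kind classify_call(uint8_t src_reg)
{
    switch (src_reg) {
    case 0:
        return CALL_HELPER;
    case BPF_PSEUDO_CALL:
        return CALL_SUBPROG;
    case BPF_PSEUDO_KFUNC_CALL:
        return CALL_KFUNC;
    default:
        return CALL_INVALID;
    }
}

/* Entry index of the callee for a BPF_PSEUDO_CALL at index call_pc.
 * Assumption: imm32 is counted from the instruction after the call. */
static inline int64_t subprog_entry(int64_t call_pc, int32_t imm)
{
    assert(call_pc >= 0);
    return call_pc + 1 + imm;
}
```

A runtime that only supports helpers can treat everything except CALL_HELPER as a load-time error, which keeps the interpreter loop simple.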
TIP
If you are implementing your own eBPF runtime, you don't need to follow the internals of Linux. All the above explanations just aim to help with reading Linux source code and understanding eBPF semantics.
Relocation
See libbpf for some info about map relocation.
(WIP)