Architecture 1001: x86-64 Assembly

Introduction

About this Class

  • x86 - it's called x86 because of the progression of Intel chips from 8086, 80186, 80286, etc.

  • x86-64 - the 64-bit extension of x86, used in most modern desktops, laptops, and servers, as well as many supercomputers

Refresher: Binary to Hex to Decimal

Decimal (base 10)    Binary (base 2)    Hexadecimal (aka Hex, base 16)
00                   0000b              0x00
01                   0001b              0x01
02                   0010b              0x02
03                   0011b              0x03
04                   0100b              0x04
05                   0101b              0x05
06                   0110b              0x06
07                   0111b              0x07
08                   1000b              0x08
09                   1001b              0x09
10                   1010b              0x0A
11                   1011b              0x0B
12                   1100b              0x0C
13                   1101b              0x0D
14                   1110b              0x0E
15                   1111b              0x0F

Example:

Given: 0x1337

To Decimal:

= 1 x 16^3 + 3 x 16^2 + 3 x 16^1 + 7 x 16^0

= 1 x 4096 + 3 x 256 + 3 x 16 + 7 x 1

= 4096 + 768 + 48 + 7

= 4919 or 4919d

To Binary:

Place value:  4096  2048  1024  512  256  128  64  32  16  8  4  2  1
Bit:             1     0     0    1    1    0   0   1   1  0  1  1  1

= 1001100110111 or 1001100110111b
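
The two conversions above can be sketched in Python. This is a minimal illustration, not the only way to do it; the function names hex_to_decimal and decimal_to_binary are made up for this example.

```python
def hex_to_decimal(hex_digits: str) -> int:
    """Sum digit * 16**position, working from the rightmost digit."""
    total = 0
    for position, digit in enumerate(reversed(hex_digits)):
        total += int(digit, 16) * 16 ** position
    return total


def decimal_to_binary(value: int) -> str:
    """Collect remainders of repeated division by 2, most significant bit first."""
    if value == 0:
        return "0"
    bits = []
    while value > 0:
        bits.append(str(value % 2))
        value //= 2
    return "".join(reversed(bits))


value = hex_to_decimal("1337")
print(value)                     # 4919
print(decimal_to_binary(value))  # 1001100110111
```

Python's built-ins int("1337", 16) and bin(4919) give the same results; the long form is shown only to mirror the manual steps.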

Refresher: two's complement negative numbers

  • Signed numbers - can hold positive or negative values (and zero)

  • Unsigned numbers - can hold only non-negative values

  • Signed char - holds 0x00 to 0x7F as 0 to 127, while 0x80 to 0xFF represent -128 to -1

  • Unsigned char - holds 0 to 255

Negative values are represented as the "two's complement" of their positive value, which is computed by flipping all bits and adding 1.

Example:

Given: 0xFF is -1

= 15 x 16^1 + 15 x 16^0

= 255 in decimal (when read as unsigned)

= 11111111b in binary

= 00000000b (flip)

= 00000000b + 1

= 00000001b

= 1, so 0xFF represents -1

Example:

Given: -128

= 10000000b in binary (0x80)

= 01111111b (flip)

= 01111111b + 1

= 10000000b

= 128 in decimal, so 0x80 represents -128
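
The flip-and-add-1 rule can be sketched in Python as follows. The helper names twos_complement and as_signed are hypothetical, and the 8-bit default width is an assumption chosen to match the char examples.

```python
def twos_complement(value: int, bits: int = 8) -> int:
    """Flip every bit of `value`, add 1, and keep only the low `bits` bits."""
    mask = (1 << bits) - 1
    flipped = value ^ mask
    return (flipped + 1) & mask


def as_signed(raw: int, bits: int = 8) -> int:
    """Interpret a raw bit pattern as a signed two's complement number."""
    sign_bit = 1 << (bits - 1)
    if raw & sign_bit:
        # Top bit set: the magnitude is the two's complement of the pattern.
        return -twos_complement(raw, bits)
    return raw


print(hex(twos_complement(0x01)))  # 0xff
print(as_signed(0xFF))             # -1
print(as_signed(0x80))             # -128
print(as_signed(0x7F))             # 127
```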

ranges

Questions:

1. What is the hexadecimal two's complement representation of the lowest possible value of an 8-byte signed integer?

0x8000000000000000 or 8000000000000000 or 8000000000000000h

2. What does that value correspond to in decimal?

-9223372036854775808 (that is, -2^63)
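
Both answers can be checked with a short Python sketch. It relies on int.to_bytes and int.from_bytes, where signed=True asks for a two's complement interpretation; the variable names are chosen only for this example.

```python
lowest_raw = 0x8000000000000000  # only the top bit of the 8 bytes is set

# Reinterpret the same 8 bytes as a signed two's complement integer.
raw_bytes = lowest_raw.to_bytes(8, "big")
lowest = int.from_bytes(raw_bytes, "big", signed=True)

print(lowest)                # -9223372036854775808
print(lowest == -(2 ** 63))  # True
```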

Refresher: C data type sizes

size of basic C data types
  • char - single byte

  • short - two bytes

  • word (WORD) - 16 bits; Intel's native data size from when x86 was a 16-bit architecture

  • double word (DWORD) - 32 bits; introduced when x86 expanded to 32-bit

  • quad word (QWORD) - 64 bits; introduced for 64-bit
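
As a rough check of these sizes, Python's struct module can report how many bytes each fixed-width format occupies. The mapping from names to format codes below is an assumption made for illustration: the C standard does not fix the size of short, although it is two bytes on mainstream x86 compilers.

```python
import struct

# "<" selects struct's standard sizes, independent of the host platform.
SIZES = {
    "char": "<B",
    "short / WORD": "<H",
    "DWORD": "<I",
    "QWORD": "<Q",
}

for name, fmt in SIZES.items():
    print(f"{name}: {struct.calcsize(fmt)} byte(s)")
```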

Background: Endianness

  • Little Endian - (little end first) the least significant byte (LSB) of a word or larger value is stored at the lowest address, e.g. 0x12345678 -> 0x78, 0x56, 0x34, 0x12

Intel is Little Endian.

  • Big Endian - (big end first) the most significant byte (MSB) of a word or larger value is stored at the lowest address, e.g. 0x12345678 -> 0x12, 0x34, 0x56, 0x78

Network traffic is Big Endian, as are many RISC systems. ARM started out as Little Endian and is now Bi-Endian.

  • Endianness applies only to values in memory, NOT IN REGISTERS

  • Endianness applies to the order of bytes, NOT THE ORDER OF BITS
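
The two byte orders can be reproduced in Python with int.to_bytes. This is only a sketch: memory_layout is a hypothetical helper, and the output lists bytes from the lowest address upward.

```python
def memory_layout(value: int, size: int, byteorder: str) -> str:
    """Show the bytes of `value` as stored from the lowest address upward."""
    return " ".join(f"0x{b:02X}" for b in value.to_bytes(size, byteorder))


print(memory_layout(0x12345678, 4, "little"))  # 0x78 0x56 0x34 0x12
print(memory_layout(0x12345678, 4, "big"))     # 0x12 0x34 0x56 0x78

# Reading little-endian bytes back as big-endian yields a different number.
stored = (0x12345678).to_bytes(4, "little")
print(hex(int.from_bytes(stored, "big")))      # 0x78563412
```

Only whole bytes are reordered; the bits inside each byte keep their order.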

Endianness

Computer Registers

Memory Hierarchy

Computer Memory Hierarchy

The top three levels of the hierarchy are short-term (volatile) memory, and the bottom three levels are long-term (persistent) memory.

x86-64 General Purpose Registers

  • Registers - small memory storage areas built into the processor (volatile memory)

  • x86-64 - has 16 "General Purpose" Registers plus the instruction pointer, which points at the next instruction to be executed

  • On x86-32, registers are 32 bits wide

  • On x86-64, registers are 64 bits wide

  • Intel Register Evolution

8-bit 8008
16-bit 8086
32-bit 80386
64-bit AMD Opteron/Intel Pentium 4 (old)
64-bit AMD Opteron/Intel Pentium 4 (New)
Cheat Sheet from this link
  • RAX - Stores the function return values

  • RBX - Base Pointer to the data section

  • RCX - Counter for string and loop operations

  • RDX - I/O Pointer

  • RSI - Source Index Pointer for string operations

  • RDI - Destination Index Pointer for string operations

  • RSP - Stack (top) Pointer, points to the last value that was put on the stack

  • RBP - Stack Frame Base Pointer, points to the base of the current stack frame

  • RIP - Instruction Pointer, pointer to the next instruction to execute

First Instruction

  • No-Operation (NOP) - takes no registers and no values, and does nothing. It is there only to pad/align bytes or to delay time. Its one-byte encoding is 0x90.
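
To illustrate the padding use, here is a small Python sketch that appends 0x90 bytes until a code fragment reaches an alignment boundary. The pad_with_nops helper and the 16-byte default are assumptions made for this example; real assemblers and compilers may choose longer multi-byte NOP forms instead.

```python
NOP = 0x90  # one-byte NOP encoding


def pad_with_nops(code: bytes, alignment: int = 16) -> bytes:
    """Append NOP bytes until len(code) is a multiple of `alignment`."""
    padding = (-len(code)) % alignment
    return code + bytes([NOP]) * padding


# Example 3-byte fragment (the encoding of "mov rbp, rsp").
fragment = bytes([0x48, 0x89, 0xE5])
padded = pad_with_nops(fragment)

print(len(padded))   # 16
print(padded.hex())  # 4889e590909090909090909090909090
```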

The Stack

Overview

  • Stack - a Last-In-First-Out (LIFO) data structure where data is pushed onto the top of the stack and popped off the top; also a conceptual area of RAM set aside for this purpose

Different OSes start the stack at different addresses by their own convention. Some also randomize the starting address with Address Space Layout Randomization (ASLR).

High and Low Addresses

By convention, the stack grows toward lower addresses. Adding to the stack means the top of the stack is now at a lower address.

  • RSP - points at the top of the stack, which is the lowest address currently in use

  • You can find on the stack:

    • Return addresses of function calls

    • Local variables

    • Arguments passed to a function

    • Saved register values

    • Dynamically allocated memory via alloca()

Simple Stack Diagram

Push and Pop Instructions

  • Push - an instruction that automatically decrements the stack pointer, RSP, by 8 and stores its operand at the new top of the stack

  • r/mX - a term used in the Intel manuals for r/m8, r/m16, r/m32 or r/m64: a register or memory operand of that size

  • [ ] - square brackets mean: treat the value inside as a memory address and fetch the value at that address

    • Register -> rbx

    • Memory, base-only -> [rbx]

    • Memory, base+index*scale -> [rbx+rcx*X], where the scale X is 1, 2, 4 or 8

    • Memory, base+index*scale+displacement -> [rbx+rcx*X+Y], where Y is a constant displacement
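
The address arithmetic behind these forms can be sketched in Python. The effective_address function and the register values are hypothetical; the sketch models only the calculation, not the memory fetch that the brackets trigger.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def effective_address(base: int, index: int = 0, scale: int = 1,
                      displacement: int = 0) -> int:
    """Compute base + index*scale + displacement, wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & MASK64


rbx = 0x1000  # hypothetical register contents
rcx = 3

print(hex(effective_address(rbx)))                # [rbx]            -> 0x1000
print(hex(effective_address(rbx, rcx, 8)))        # [rbx+rcx*8]      -> 0x1018
print(hex(effective_address(rbx, rcx, 8, 0x10)))  # [rbx+rcx*8+0x10] -> 0x1028
```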

r/mX Addressing
push
  • Pop - an instruction that reads the value at the top of the stack into its operand and then increments RSP by 8

pop
  • Push/Pop in 64-bit mode decrement and increment RSP by 8

  • Push/Pop in 32-bit mode decrement and increment ESP by 4

  • Push/Pop in 16-bit mode decrement and increment SP by 2
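
The 64-bit push/pop behavior can be modeled with a toy Python class. Everything here is illustrative: Stack64, its starting address, and its 64-byte size are assumptions, and a real stack lives in process memory at an address chosen by the OS.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


class Stack64:
    """Toy model of a downward-growing stack in 64-bit mode."""

    def __init__(self, top_address: int = 0x7FFFFFFFE000, size: int = 64):
        self.base = top_address - size  # lowest address backed by memory
        self.memory = bytearray(size)
        self.rsp = top_address          # the stack starts out empty

    def push(self, value: int) -> None:
        if self.rsp - 8 < self.base:
            raise OverflowError("stack overflow in this toy model")
        self.rsp -= 8                   # decrement RSP first
        offset = self.rsp - self.base
        # Store the value in little-endian byte order, as x86 does.
        self.memory[offset:offset + 8] = (value & MASK64).to_bytes(8, "little")

    def pop(self) -> int:
        if self.rsp + 8 > self.base + len(self.memory):
            raise IndexError("pop from an empty stack")
        offset = self.rsp - self.base
        value = int.from_bytes(self.memory[offset:offset + 8], "little")
        self.rsp += 8                   # increment RSP after reading
        return value


stack = Stack64()
start = stack.rsp
stack.push(0x1)
stack.push(0x2)
print(hex(start - stack.rsp))  # 0x10
print(stack.pop())             # 2
print(stack.pop())             # 1
print(stack.rsp == start)      # True
```

Two pushes move RSP down by 16 bytes in total, and the values come back in reverse order, which is the LIFO behavior described earlier.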

rbp, rsp

Examples (RBP is red, RSP is blue):

basic example
moving rbp and rsp

If a diagram draws the High and Low Addresses flipped, remember that moving toward the Low Address is (-) and moving toward the High Address is (+).

standard visualization of addresses
