Zulip Chat Archive

Stream: Program verification

Topic: RISC-V ISA in Lean


SnowFox (Oct 02 2020 at 06:38):

(Migrating from 'Is there code for X?" "bidirectional maps?" = https://leanprover.zulipchat.com/#narrow/stream/217875-Is-there.20code.20for.20X.3F/topic/bidirectional.20maps.3F/near/212013731) @Mario Carneiro

SnowFox (Oct 02 2020 at 07:18):

@Mario Carneiro Can you explain why you're using inductive Props in your x86.lean?

Mario Carneiro (Oct 02 2020 at 07:18):

because it's a spec

Mario Carneiro (Oct 02 2020 at 07:18):

it's not intended to be directly executable

Mario Carneiro (Oct 02 2020 at 07:18):

but the advantage is that you don't need separate encode/decode

SnowFox (Oct 02 2020 at 07:26):

Hmm. Not something I can use then.

Mario Carneiro (Oct 02 2020 at 07:26):

No, I don't expect that aspect will translate

Mario Carneiro (Oct 02 2020 at 07:27):

I wrote a proof producing disassembler for x86/ARM before which is closer to what you need, but it is at least for now closed source

SnowFox (Oct 02 2020 at 07:29):

Okay.

SnowFox (Oct 02 2020 at 07:31):

I've heard good things about metaprogramming in Lean but ... the documentation doesn't demonstrate how. Do you have any reference I can use for implementing make_defs? https://leanprover.github.io/reference/metaprogramming.html

Mario Carneiro (Oct 02 2020 at 07:34):

for the most part it's just functional programming

Mario Carneiro (Oct 02 2020 at 07:34):

you have to interact with the lean API to make definitions

Mario Carneiro (Oct 02 2020 at 07:34):

and the API is admittedly not so well documented

Mario Carneiro (Oct 02 2020 at 07:35):

Rob Lewis made a metaprogramming tutorial that is pretty good

Mario Carneiro (Oct 02 2020 at 07:35):

https://www.youtube.com/playlist?list=PLlF-CfQhukNnq2kDCw2P_vI5AfXN7egP2

SnowFox (Oct 02 2020 at 07:36):

Thanks. I'll watch the series.

Jannis Limperg (Oct 02 2020 at 11:31):

There's also the metaprogramming section of the Hitchhiker's guide and, if I may be so bold, my blog post on some gotchas of the API (but that'll only make sense once you have a bit of an overview). Beyond that, you'll have to read init/meta/expr.lean and init/meta/tactic.lean, which are reasonably documented.

SnowFox (Oct 02 2020 at 12:57):

Okay. I've watched the short series linked above and bookmarked the blog post. I've adjusted the python scripts to generate at least a decent mostly-correct subset of RISC-V instruction to format with embedded constants list. add_inductive and add_defn_equations appear to be my next steps, which look straight forward. Afterwards, I may open an issue in the riscv-opcode repo proposing an alternative format and a new code generator to make certain things cleaner and better reflect the specification.

Thanks for all the links everyone. I'll resume this tomorrow morning. :)

SnowFox (Oct 02 2020 at 13:52):

Okay, I couldn't resist but... ouch. add_inductive, as the documentation for its sibling add_ginductive suggests, doesn't generate the auxiliary definitions necessary for matching against it... I couldn't figure out how to use add_ginductive however. Could someone correct this code?

meta def make_defs (l : list (name × type)) : tactic unit :=
tactic.add_inductive `rvim [] 0 `(Type) (l.map (λ n, t⟩, n, expr.const `rvim []⟩))

SnowFox (Oct 12 2020 at 06:11):

@Mario Carneiro Any insight on this issue?

Mario Carneiro (Oct 12 2020 at 06:20):

meta def make_defs' (l : list (name × type)) : tactic unit :=
do env  tactic.get_env,
  env'  env.add_ginductive options.mk
    [] [] [((`rvim', `(Type)),
      l.map (λ n, t⟩,
        { constr := `rvim' ++ n,
          type := expr.const `rvim' [] }))]
    ff,
  tactic.set_env env'

run_cmd make_defs' [(`a, ()), (`b, ())]
#print rvim'

SnowFox (Oct 12 2020 at 06:25):

There we go, thanks again!

Mario Carneiro (Oct 12 2020 at 06:27):

Note that I changed your code a bit to prefix rvim on the names of the constructors. Lean will let you name the constructors something else, but the front end inductive command doesn't, so it seems ill-advised

SnowFox (Oct 12 2020 at 06:47):

@Mario Carneiro Noticed. Though now I'm stuck trying to control the types of the constructors. As these need the registers and immediate values. I was expecting something like type := `(nat -> `rvim) to work. But that says "invalid return type". As it does with just `(`rvim).

Mario Carneiro (Oct 12 2020 at 06:48):

rvim needs to be a local constant

SnowFox (Oct 12 2020 at 06:49):

Like let rvim := `rvim in `(rvim)?

SnowFox (Oct 12 2020 at 06:50):

Or let rvim := @expr.const ff `rvim [] in `(rvim) ? Neither worked.

Mario Carneiro (Oct 12 2020 at 06:51):

The tricky part is that rvim doesn't exist yet, so `(rvim) isn't going to work

Mario Carneiro (Oct 12 2020 at 06:51):

You can use let rvim : expr := expr.const `rvim' [], and then splice it in

Mario Carneiro (Oct 12 2020 at 06:51):

i.e. `((%%rvim : Type) → (%%rvim : Type))

Mario Carneiro (Oct 12 2020 at 06:52):

actually, it seems it should be a constant not a local constant

Mario Carneiro (Oct 12 2020 at 06:53):

which is extra tricky because that definitely doesn't typecheck

SnowFox (Oct 12 2020 at 06:53):

Well, this worked. let rvim : expr := expr.const `rvim [] in `(nat -> (%%rvim : Type))

Mario Carneiro (Oct 12 2020 at 06:53):

I think the usual approach if you need more advanced inference would be to create a new local constant with the appropriate type, and then substitute the constant for it at the end

SnowFox (Oct 12 2020 at 07:03):

Okay, I have rvim expanded with the correct number of natural arguments for each instruction. :)

SnowFox (Oct 12 2020 at 07:03):

But it is getting slow... which is worrying considering how early this is still.

SnowFox (Oct 12 2020 at 07:05):

What is the vscodium trick to expand all the variants on a def? def q : rvim -> nat := _

SnowFox (Oct 12 2020 at 07:06):

I thought it was under ctrl+dot with the _ in focus. But that doesn't include the option for some reason.

SnowFox (Oct 12 2020 at 07:07):

I've seen a different method on a video

SnowFox (Oct 12 2020 at 07:11):

Video used {! x -> y} and that worked. I guess the language server just has a hole..

Mario Carneiro (Oct 12 2020 at 07:17):

processing new inductives is a little sluggish, especially if you are using nested or mutual inductives

Mario Carneiro (Oct 12 2020 at 07:19):

You can double check if you have accidentally defined a nested inductive because if you #print rvim it will say def rvim := .. instead of inductive rvim ...

SnowFox (Oct 12 2020 at 07:20):

It says inductive.

Mario Carneiro (Oct 12 2020 at 07:20):

how many variants are we talking here?

SnowFox (Oct 12 2020 at 07:21):

I have 63 instructions.

Mario Carneiro (Oct 12 2020 at 07:23):

hm, I wonder whether no_confusion is as expensive as I suspected

Mario Carneiro (Oct 12 2020 at 07:23):

it's basically quadratic in size

SnowFox (Oct 12 2020 at 07:23):

inductive fmt : Type
| R     (opcode7 funct3 funct7 : nat) -- (rd rs1 rs2     : nat)
| I     (opcode7 funct3        : nat) -- (rd rs1     imm : nat)
| ISH   (opcode7 funct3 funct7 : nat) -- (rd rs1     imm : nat)
| ISHW  (opcode7 funct3 funct7 : nat) -- (rd rs1     imm : nat)
| S     (opcode7 funct3        : nat) -- (   rs1 rs2 imm : nat)
| B     (opcode7 funct3        : nat) -- (   rs1 rs2 imm : nat)
| U     (opcode7               : nat) -- (rd         imm : nat)
| J     (opcode7               : nat) -- (rd         imm : nat)
| FENCE (opcode7               : nat)

meta def make_defs (l : list (name × fmt)) : tactic unit :=
  tactic.updateex_env $ λ env, env.add_ginductive options.mk
    [] [] [(
      (`rvim, `(Type)),
      l.map (λ n, t⟩,
        { constr := `rvim ++ n,
          type := let rvim : expr := expr.const `rvim [] in
                  match t with
                  | (fmt.R _ _ _) := `(nat -> nat -> nat -> (%%rvim : Type))
                  | (fmt.I _ _) := `(nat -> nat -> nat -> (%%rvim : Type))
                  | (fmt.ISH _ _ _) := `(nat -> nat -> nat -> (%%rvim : Type))
                  | (fmt.ISHW _ _ _) := `(nat -> nat -> nat -> (%%rvim : Type))
                  | (fmt.S _ _) := `(nat -> nat -> nat -> (%%rvim : Type))
                  | (fmt.B _ _) := `(nat -> nat -> nat -> (%%rvim : Type))
                  | (fmt.U _) := `(nat -> nat -> (%%rvim : Type))
                  | (fmt.J _) := `(nat -> nat -> (%%rvim : Type))
                  | (fmt.FENCE _) := `((%%rvim : Type))
                  end
        }
      )
    )]
  ff

open fmt

-- rv{32,64}[im] sans pseudo instructions
run_cmd make_defs [
--#eval list.length [
  -- rv32i
  (`BEQ,        B    0b1100011 0b000),
  (`BNE,        B    0b1100011 0b001),
  (`BLT,        B    0b1100011 0b100),
  (`BGE,        B    0b1100011 0b101),
  (`BLTU,       B    0b1100011 0b110),
  (`BGEU,       B    0b1100011 0b111),
  (`JALR,       I    0b1100111 0b000),
  (`JAL,        J    0b1101111),
  (`LUI,        U    0b0110111),
  (`AUIPC,      U    0b0010111),
  (`ADDI,       I    0b0010011 0b000),
  (`SLLI,       ISH  0b0010011 0b001 0b0000000),
  (`SLTI,       I    0b0010011 0b010),
  (`SLTIU,      I    0b0010011 0b011),
  (`XORI,       I    0b0010011 0b100),
  (`SRLI,       ISH  0b0010011 0b101 0b0000000),
  (`SRAI,       ISH  0b0010011 0b101 0b0100000),
  (`ORI,        I    0b0010011 0b110),
  (`ANDI,       I    0b0010011 0b111),
  (`ADD,        R    0b0110011 0b000 0b0000000),
  (`SUB,        R    0b0110011 0b000 0b0100000),
  (`SLL,        R    0b0110011 0b001 0b0000000),
  (`SLT,        R    0b0110011 0b010 0b0000000),
  (`SLTU,       R    0b0110011 0b011 0b0000000),
  (`XOR,        R    0b0110011 0b100 0b0000000),
  (`SRL,        R    0b0110011 0b101 0b0000000),
  (`SRA,        R    0b0110011 0b101 0b0100000),
  (`OR,         R    0b0110011 0b110 0b0000000),
  (`AND,        R    0b0110011 0b111 0b0000000),
  (`LB,         I    0b0000011 0b000),
  (`LH,         I    0b0000011 0b001),
  (`LW,         I    0b0000011 0b010),
  (`LBU,        I    0b0000011 0b100),
  (`LHU,        I    0b0000011 0b101),
  (`SB,         S    0b0100011 0b000),
  (`SH,         S    0b0100011 0b001),
  (`SW,         S    0b0100011 0b010),
  (`FENCE,      I    0b0001111 0b000),
  -- FENCEI belongs in Zifencei... why is it here?
  -- (`FENCE_I,    I    0b0001111 0b001),
  -- Shouldn't we have ECALL and EBREAK?
  -- rv32m
  (`MUL,        R    0b0110011 0b000 0b0000001),
  (`MULH,       R    0b0110011 0b001 0b0000001),
  (`MULHSU,     R    0b0110011 0b010 0b0000001),
  (`MULHU,      R    0b0110011 0b011 0b0000001),
  (`DIV,        R    0b0110011 0b100 0b0000001),
  (`DIVU,       R    0b0110011 0b101 0b0000001),
  (`REM,        R    0b0110011 0b110 0b0000001),
  (`REMU,       R    0b0110011 0b111 0b0000001),
  -- rv64i
  (`ADDIW,      I    0b0011011 0b000),
  (`SLLIW,      ISHW 0b0011011 0b001 0b0000000),
  (`SRLIW,      ISHW 0b0011011 0b101 0b0000000),
  (`SRAIW,      ISHW 0b0011011 0b101 0b0100000),
  (`ADDW,       R    0b0111011 0b000 0b0000000),
  (`SUBW,       R    0b0111011 0b000 0b0100000),
  (`SLLW,       R    0b0111011 0b001 0b0000000),
  (`SRLW,       R    0b0111011 0b101 0b0000000),
  (`SRAW,       R    0b0111011 0b101 0b0100000),
  (`LD,         I    0b0000011 0b011),
  (`LWU,        I    0b0000011 0b110),
  (`SD,         S    0b0100011 0b011),
  -- rv64m
  (`MULW,       R    0b0111011 0b000 0b0000001),
  (`DIVW,       R    0b0111011 0b100 0b0000001),
  (`DIVUW,      R    0b0111011 0b101 0b0000001),
  (`REMW,       R    0b0111011 0b110 0b0000001),
  (`REMUW,      R    0b0111011 0b111 0b0000001)
]

def is_16_bit_instruction (n : ) : Prop :=
n.land 0b11  0b11

def is_32_bit_instruction (n : ) : Prop :=
n.land 0b11 = 0b11  n.land 0b11100  0b11100

def is_valid_instruction (n : ) : Prop :=
is_16_bit_instruction n 
is_32_bit_instruction n

Mario Carneiro (Oct 12 2020 at 07:28):

one thing you can do to make the ISA easier to analyze is to bucket the instructions, for example putting all binops in one variant

Mario Carneiro (Oct 12 2020 at 07:29):

It wouldn't be hard to bucket these by type now, although that may not be exactly the layout you want

SnowFox (Oct 12 2020 at 07:34):

RISC-V uses tri-ops generally; though compressed ops do include binary ops where the first source register is repeated as the destination. ADD rd rs1 rs2 is rd = rs1 + rs2. I like the current order, ordered by XLEN and ISA extensions; because these are the units they are introduced. If I were to add more order, it'd probably be either (opcode, funct3, funct7)-order or format-shape order.

SnowFox (Oct 12 2020 at 07:34):

reg[rd] = reg[rs1] + reg[rs2] **

SnowFox (Oct 12 2020 at 07:35):

Oh, you meant reduce the inductive variants and use defs to map the instructions to the fewer variants?

Mario Carneiro (Oct 12 2020 at 07:36):

If you have an inductive type with 63 constructors, proving anything by cases on the type will be an exercise in patience (particularly if you have two variables, e.g. to prove determinism)

SnowFox (Oct 12 2020 at 07:37):

I'm open to any way to reduce the complexity.

Mario Carneiro (Oct 12 2020 at 07:39):

e.g.

inductive binop
| ADD | SUB | SLL -- | ...

inductive loadop
| LUI | LD -- | ...

inductive rvim
| binop (b : binop) (rd rs rt : ) : rvim
| load (b : loadop) (rd imm : ) : rvim

Mario Carneiro (Oct 12 2020 at 07:41):

Bucketing based on format shape is the obvious thing to do, but you should also separate out variants that treat their arguments significantly differently in the same format-shape, not sure if this is a major concern in riscv

Mario Carneiro (Oct 12 2020 at 07:43):

Note that proving no_confusion with a bucketed inductive like this is literally lower asymptotic complexity, since if you have a 7 * 7 bucketed inductive then proving that different variants are different is 8 * 7 ^ 2 large (8 inductives of 7 variants each) instead of 49 ^ 2 with a 49 variant inductive

SnowFox (Oct 12 2020 at 07:49):

Note all of these instructions are either a pick-3 of rd, rs1, rs2, imm; or both rd, imm; or zero (in the single case of FENCE), so they could all be reduced to 3 cases (so far; though there are more variations with more ISA extensions active). The format constants (opcode7, funct3, funct7) could be flattened into a single natural of 17 bits. The opcode7 internally embeds the tag for the format variant.

SnowFox (Oct 12 2020 at 07:50):

Rather, it doesn't explicitly embed the tag but is sufficient to reconstruct it.

SnowFox (Oct 12 2020 at 07:51):

The pick-3 can be seen at the top, in the comments of the inductive fmt section.

SnowFox (Oct 12 2020 at 07:53):

One deviating hack I've considered is to allow the rd/rs1/rs2 to exceed their 5-bit limit to represent pseudo-registers a la SSA. But this adds a mode where assembly is illegal; which is fine. Assembly should happen only after register allocation anyway and these instructions are useful before register allocation.

SnowFox (Oct 12 2020 at 07:53):

and imm, for offset symbols.

Mario Carneiro (Oct 12 2020 at 07:56):

The format constants (opcode7, funct3, funct7) could be flattened into a single natural of 17 bits.

I would suggest translating these to variants as early as possible though, because lean's handling of 17 bit numbers will not be satisfactory to you

SnowFox (Oct 12 2020 at 07:56):

@Mario Carneiro BTW: "LUI" is load upper-immediate; isn't a load involving memory so I wouldn't call it a "load operation". It only writes the immediate to the upper bits of a register.

Mario Carneiro (Oct 12 2020 at 07:56):

yeah you can tell I'm mocking things quickly

Mario Carneiro (Oct 12 2020 at 07:57):

it's not actually about the type of operation that it does but rather it's "mode of operation", i.e. the way it handles its arguments

Mario Carneiro (Oct 12 2020 at 07:57):

just imagine you are doing a proof by cases on these variants, what is the most useful categorization?

Mario Carneiro (Oct 12 2020 at 07:58):

if some of these only interact with certain parts of the machine state (e.g. ALU ops vs memory ops) that's another possible categorization

SnowFox (Oct 12 2020 at 07:59):

RISC-V is indeed a load-store architecture; which means that the ALU operations do not touch memory at all; and memory operations do no arithmetic (besides immediate offsets added to the register for the address)

Mario Carneiro (Oct 12 2020 at 08:00):

One deviating hack I've considered is to allow the rd/rs1/rs2 to exceed their 5-bit limit to represent pseudo-registers a la SSA. But this adds a mode where assembly is illegal; which is fine. Assembly should happen only after register allocation anyway and these instructions are useful before register allocation.

This should be fine; you can have an assembly function that turns an instruction into a bit pattern, and it can fail on some arguments

Mario Carneiro (Oct 12 2020 at 08:02):

although it sounds like you might need an extended semantics with a machine with infinite registers to make sense of the extra instructions

Mario Carneiro (Oct 12 2020 at 08:02):

unless they are just garbage instructions with no semantics

Mario Carneiro (Oct 12 2020 at 08:04):

I know some compilers use a v-code style approach, where you have the same instructions as the machine but with infinite registers; I prefer to just have an IR for that, where I get to pick a nicer set of high level instructions

SnowFox (Oct 12 2020 at 08:08):

Regarding proofs; all will target the non-{pseudo,compressed} instructions. Pseudoinstructions won't exist in my compiler. Compressed instructions are only abbreviations of the larger instructions. Compression is applied when certain registers are used and when the destination is the first source register and in a few other cases. Currently all instructions are 32-bits. Compressed instructions are 16. Higher sizes exist in the specification but have no members and their format isn't frozen.

SnowFox (Oct 12 2020 at 08:09):

v-code?

SnowFox (Oct 12 2020 at 08:10):

My IR will be RVSDG-based enriched with equivalence graphs a la eqsat.

SnowFox (Oct 12 2020 at 08:11):

RVSDG is very lambda calculus ish. My source language will be RDT based.

Mario Carneiro (Oct 12 2020 at 08:11):

well you also need a low level IR for doing register allocation

Mario Carneiro (Oct 12 2020 at 08:12):

unless you are going to go straight from there to machine code

SnowFox (Oct 12 2020 at 08:12):

RVSDG = https://www.sintef.no/contentassets/11da6d67207348db98a30ddbdf3b0bba/reissmann_poster.pdf
Equality Saturation = https://www.cs.cornell.edu/~ross/publications/eqsat/
Resourceful Dependent Types = http://www.cse.chalmers.se/~abela/talkTYPES18.pdf + http://www.cse.chalmers.se/~abela/types18.pdf

Mario Carneiro (Oct 12 2020 at 08:14):

I forget where I heard about v-code architecture, possibly cranelift

SnowFox (Oct 12 2020 at 08:15):

My understanding of RVSDG is that register colouring and instruction selection can be done straight from it without additional IRs.

Mario Carneiro (Oct 12 2020 at 08:17):

Is there a more complete reference for RVSDG than the poster?

SnowFox (Oct 12 2020 at 08:17):

BTW: eqsat enriched here means that the RVSDG IR is non-destructively optimized by adding equivalent programs in an equivalence graph. All optimizations are applied in all orders until the resource bounds are reached.

SnowFox (Oct 12 2020 at 08:17):

Yes.

SnowFox (Oct 12 2020 at 08:18):

I thought it had a link at the bottom but it just says the reference.

SnowFox (Oct 12 2020 at 08:19):

And that didn't seem to reference what I expected.. This is a great start. https://arxiv.org/abs/1912.05036

Mario Carneiro (Oct 12 2020 at 08:20):

I also found Reissmann's thesis

SnowFox (Oct 12 2020 at 08:22):

Also a good resource. Though this arxiv paper is the GOTO reference IIUC. The poster is just a nice dense visual overview. :)

SnowFox (Oct 12 2020 at 08:30):

As an added bonus; with RDT as the source and RVSDG in the middle; I'll have linearity throughout the whole compiler. Which should have very good implications for memory usage.

SnowFox (Oct 12 2020 at 08:47):

Re splitting up rvim. If I split by these categories; the largest I have is 21. First by XLEN {32, 64}, then by ISA extension {i, m}, then into {control flow (8), arithmetic (21), memory (8)} with these numbers for rv32i. rv64i adds (0, 9, 3) respectively. rv32m +8 arith, rv64m +5 arith.

SnowFox (Oct 12 2020 at 08:47):

But this might be a bit too fine.. ?

Mario Carneiro (Oct 12 2020 at 08:51):

The XLEN thing can hopefully be parameterized

Mario Carneiro (Oct 12 2020 at 08:52):

ISA extension is probably not worth it unless you intend to do something in particular with e.g. compiling to just rv64i

Mario Carneiro (Oct 12 2020 at 08:53):

For arithmetic ops in particular I would try my best to get uniform handling of a large number of those

Mario Carneiro (Oct 12 2020 at 08:53):

i.e. operations which differ only in the mathematical binary function being applied

SnowFox (Oct 12 2020 at 08:54):

I will be compiling to restrictive configurations; but most extensions can be emulated so are safe to allow so long that they do get reduced to the target. I.e. multiplication without the M extension.

Mario Carneiro (Oct 12 2020 at 08:54):

Another way to represent that is to have a predicate that tells you whether an instruction is in the configuration

Mario Carneiro (Oct 12 2020 at 08:55):

which seems like a good idea especially for risc-v because it has so many extensions

SnowFox (Oct 12 2020 at 08:55):

You'll love the vector extension ;)

Mario Carneiro (Oct 12 2020 at 08:56):

egh.. throw those all in one variant so they don't pollute the rest of the code

SnowFox (Oct 12 2020 at 08:57):

Note these aren't all R, the format detection code I wrote in python to generate these doesn't yet understand the vectors. https://gist.github.com/4fd94ffef4917b2f42d9bdb48e72fa8b

SnowFox (Oct 12 2020 at 08:57):

441

Mario Carneiro (Oct 12 2020 at 08:58):

I mean, this is just a bad representation, total abuse of the concept of an instruction

SnowFox (Oct 12 2020 at 08:58):

Certainly a lot more consumable than Intel's many many thousands of instructions.

Mario Carneiro (Oct 12 2020 at 08:58):

see also https://www.sigarch.org/simd-instructions-considered-harmful/

Mario Carneiro (Oct 12 2020 at 08:59):

in lean you can at least separate those mostly into orthogonal representation pieces (like size + op)

Mario Carneiro (Oct 12 2020 at 08:59):

there are 441 things you can do, in the same sense that there are 441 kinds of burgers I can get at the local burger shop

SnowFox (Oct 12 2020 at 09:00):

Very important note for RISC-V vectors; they are hardware configuration agnostic. I.e. unlike SSE, SSE3, AVX, AVX2, AVX512; RISC-V's V extension covers all of them generically. The vector lengths are user controlled and the hardware implements and optimizes however it wants to, if at all.

Mario Carneiro (Oct 12 2020 at 09:01):

I find 37 occurrences of ADD in that list. clearly it's not orthogonal enough

SnowFox (Oct 12 2020 at 09:04):

https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc

Mario Carneiro (Oct 12 2020 at 09:08):

One thing I notice in the docs is that instructions are indeed grouped, and they even use "dot notation" for separating out the orthogonal parts, e.g. vw(add|sub)(u?).{vw}{vx} has some clear orthogonality going on

SnowFox (Oct 12 2020 at 09:08):

Note that many of these instructions really are just names for variants of the same instruction.

SnowFox (Oct 12 2020 at 09:08):

As you just pointed out. :)

Mario Carneiro (Oct 12 2020 at 09:09):

I think it is very important that a formalization of the spec makes as much orthogonality explicit as possible

SnowFox (Oct 12 2020 at 09:09):

Agreed and this is the goal.

Mario Carneiro (Oct 12 2020 at 09:10):

I still don't have a great grasp on how many instructions x86 has if you do this properly, but it's a lot less than 1500

SnowFox (Oct 12 2020 at 09:11):

Imagine how low that'd be for RISC-V ;)

Mario Carneiro (Oct 12 2020 at 09:11):

it's just unfortunate that so many ISA specs "case bash" in their documentation

Mario Carneiro (Oct 12 2020 at 09:12):

so it's not clear when something is one part of a family or a special exception until you read all the exponentially many doc strings to ensure they are all the same

SnowFox (Oct 12 2020 at 09:33):

@Mario Carneiro Not sure if you remember; but I started with only the "opcode map" and not the instructions themselves. Note that most opcodes are the same. The 32-bit instructions have 7 bits, where the low two are both high. Thus 5 bits, 32 opcodes. Note that SUB is really an ADD with one of its basically unused bits set high.

SnowFox (Oct 12 2020 at 09:35):

See the top of page 129. https://github.com/riscv/riscv-isa-manual/releases/download/draft-20200727-8088ba4/riscv-spec.pdf

SnowFox (Oct 12 2020 at 09:37):

All the branch instructions have the BRANCH opcode, row = 11, column = 000, size = 11 (32-bit length). 1100011,

Mario Carneiro (Oct 12 2020 at 09:37):

For the most part the decoding is less important than the semantic classes of the instructions. It's not clear to me if the difference between ADD and SUB is any less/more than the difference between ADD and XOR, except in the encoding

SnowFox (Oct 12 2020 at 09:38):

ADD and XOR have the same opcode, but are different functions. See their funct3 bits. ADD is 000, XOR is 100.

Mario Carneiro (Oct 12 2020 at 09:38):

I see that

Mario Carneiro (Oct 12 2020 at 09:39):

ADD and SUB have the same funct3 but differ in the extra bits

SnowFox (Oct 12 2020 at 09:39):

Yes. Similarly for shifting left and right.

Mario Carneiro (Oct 12 2020 at 09:39):

but that seems like they just wanted 10 ALU ops and ran out of bits for the encoding

SnowFox (Oct 12 2020 at 09:41):

Not quite; SUB is exactly the same hardware as ADD with one minor tweak. SUB is effectively a fused instruction to invert the sign, then add. Thus subtraction is an argument of addition.... ish

Mario Carneiro (Oct 12 2020 at 09:42):

hm, where's add-carry?

SnowFox (Oct 12 2020 at 09:42):

Er, the left/right shifts use funct3. The right shifts logical vs arithmetic use the extra bit in funct7. Which just decides if it sign-extends or not.

SnowFox (Oct 12 2020 at 09:42):

No add-carry because there are no carry flags or overflow exceptions ;)

Mario Carneiro (Oct 12 2020 at 09:42):

I guess a SUB is an ADC + NOT

Mario Carneiro (Oct 12 2020 at 09:42):

well I mean the operation x , y -> x + y + 1

Mario Carneiro (Oct 12 2020 at 09:43):

or x + y + c if possible

Mario Carneiro (Oct 12 2020 at 09:43):

it's pretty important to have an op for that if you want to do e.g. bignum addition

SnowFox (Oct 12 2020 at 09:44):

Right, not an instruction. Easy to emulate of course. Just as rv32i does multiplication and division without multiplication or division.

SnowFox (Oct 12 2020 at 09:44):

Bignum implementations should defer carries anyway. Familiar with radix 2^51?

SnowFox (Oct 12 2020 at 09:45):

https://www.chosenplaintext.ca/articles/radix-2-51-trick.html

SnowFox (Oct 12 2020 at 09:52):

RISC-V avoids exceptions and flags. There are no arithmetic flags. I.e. division by zero acts like it does in Lean, but in hardware! There are floating point flags, available in the control and status registers (CSR); fflags.

Mario Carneiro (Oct 12 2020 at 09:55):

I don't object to that, but I hope there is still a way to perform a 65-bit addition

SnowFox (Oct 12 2020 at 10:02):

Lets ask godbolt. https://godbolt.org/z/WPs5Mq

Add then set less-than unsigned.

widdening_add:                          # @widdening_add
        add     a0, a0, a1
        sltu    a1, a0, a1
        ret

Mario Carneiro (Oct 12 2020 at 10:09):

clever

SnowFox (Oct 12 2020 at 10:10):

These are equivalent.

uint128 widdening_add(uint64 a, uint64 b) {
    return (uint128) a + (uint128) b;
}

uint128 widdening_add2(uint64 a, uint64 b) {
    uint64 c = a + b;
    return (uint128) c +
      (((uint128) (c < b)) << 64);
}

SnowFox (Oct 12 2020 at 10:12):

To contrast to x86...

widdening_add:
        mov     rax, rdi
        xor     edx, edx
        xor     edi, edi
        add     rax, rsi
        adc     rdx, rdi
        ret
widdening_add2:
        add     rdi, rsi
        mov     edx, 0
        mov     rax, rdi
        jnc     .L3
        add     rdx, 1
.L3:
        ret

SnowFox (Oct 12 2020 at 10:13):

Register movement noise and it doesn't even generate the same code.

Mario Carneiro (Oct 12 2020 at 10:14):

I don't think that's optimal code though

Mario Carneiro (Oct 12 2020 at 10:15):

I'm pretty sure you can get it down to one instruction if you handcode the assembly

SnowFox (Oct 12 2020 at 10:15):

BTW: I use RISC-V's assembly to debug my x86 code... I avoid x86. :D

Mario Carneiro (Oct 12 2020 at 10:16):

I don't want to suggest that x86 is a good ISA by any stretch, but ADC looks like a useful instruction to me

Mario Carneiro (Oct 12 2020 at 10:18):

Aside: even norm_num likes adc

SnowFox (Oct 12 2020 at 10:23):

I believe that RISC-V is the best ISA option there is. It may not be perfect but the complaints are small and few. I think I can trim RISC-V down by using the real opcode map instead of the instruction list; then generalize over the formats and get something pretty small.

With this, I guess I'm going to set aside our progress so far and start fresh. Focusing on keeping the inductives reasonable. An unfortunate reason to push towards this design, though I hope it'll work out better anyway.

Mario Carneiro (Oct 12 2020 at 10:27):

The main reason I chose x86 instead of RISC-V for the MM0 project was because x86 hardware is more accessible

Mario Carneiro (Oct 12 2020 at 10:28):

I think that from a technical perspective RISC-V is an obvious improvement on all axes, because it was actually designed instead of accrued

SnowFox (Oct 12 2020 at 10:32):

I don't know the implications on hardware demands for your project; but for mine, where there are sensitive security critical details involved... I'd really rather start with a well designed ISA I can memorize and reason with.

I'm "okay" with running my code in an emulator or using a JIT to x86 recompiler to run my code. Simply because it will take time to implement my compiler and language, to implement what I want to in it, and verify the whole chain there. This is a project for the future, where I can only see success for RISC-V and x86 necessarily dying out.

Worst case, someone else can just extend my compiler with an x86 backend. ;)

SnowFox (Oct 12 2020 at 10:33):

The emulator and JIT obviously ruin the security properties of the language and compiler; but are fine to just run code.

Mario Carneiro (Oct 12 2020 at 10:33):

Well, as long as it can be soundly specified one ISA is as good as another

Mario Carneiro (Oct 12 2020 at 10:35):

but other arches are on the todo list, after the initial MVP is done

SnowFox (Oct 12 2020 at 10:36):

What are your targets?

Mario Carneiro (Oct 12 2020 at 10:36):

But for the purpose of minimizing the chain of trust, it's no good to emulate the hardware because then you have to trust the emulator and the x86 machine it's running on

Mario Carneiro (Oct 12 2020 at 10:37):

Right now, I target a simple subset of x86-64 + a simple subset of posix

Mario Carneiro (Oct 12 2020 at 10:38):

the output is an ELF executable file, and the spec describes how the bits of that file turn into operational behavior

SnowFox (Oct 12 2020 at 10:39):

Right; I can't trust x86 at all and I certainly can't trust my RISC-V code when emulating it. The emulator is exclusively to just run it on legacy hardware until both all my code is ready, and RISC-V is cheap enough and has taken over the world. Yes, I'm calling all x86 legacy. :)

SnowFox (Oct 12 2020 at 10:42):

Kernel-wise; I'll target seL4 and will write a basic wrapper to run it on POSIX environments. OS-wise, I'll target Robigalia. Which is barely a thing so far.... but we're passionate to have a safe operating system. :)

SnowFox (Oct 12 2020 at 10:45):

I'm focusing on cryptography, capabilities, and verified distributed protocols.

SnowFox (Oct 12 2020 at 10:46):

Do you plan to implement all of x86?

SnowFox (Oct 12 2020 at 10:47):

I'd like to at least handle all of RV{32,64}IMCV. Note the lack of atomics and floats. Though they could come later.

SnowFox (Oct 12 2020 at 10:48):

Might throw in B too, the bit manipulation extension can be good for cryptography. But that is much less ready than V. V might even be stable soon.

SnowFox (Oct 12 2020 at 10:50):

I believe that SiFive is waiting on V to be stable before they offer a cheap Linux-ready SBC. Holding back with the expensive devboards until then.

Mario Carneiro (Oct 12 2020 at 10:50):

SnowFox said:

Do you plan to implement all of x86?

Certainly not. x86 is the means not the end. I've got about 30 instructions and even that might have been more than needed

Mario Carneiro (Oct 12 2020 at 10:52):

https://github.com/digama0/mm0/blob/master/examples/x86.mm0 is about 1600 lines to specify x86 + linux enough to compile a C-like programming language

Mario Carneiro (Oct 12 2020 at 10:55):

The main inductive type for instructions has 5+6+7+4=22 variants

SnowFox (Oct 12 2020 at 10:56):

aside: My godbolt test was a widening addition with 2x 64 bit numbers, not two 65 bit numbers. Not sure which you intended. If you meant the 2x 65-bit addition, then it is equivalent to 2x 128-bit addition. 3x 64-bit adds + one sltu to extract the carry bit.

Mario Carneiro (Oct 12 2020 at 10:56):

The unit of work in a bignum addition is 64 + 64 + 1 bit addition to produce a 65 bit result

SnowFox (Oct 12 2020 at 10:59):

Then I guess that'd be 3x add + 2x sltu.

SnowFox (Oct 12 2020 at 11:00):

But I think that'd be a silly method. When working with lots of or large numbers, you really want to defer the carries. Carries are too slow to handle immediately.

SnowFox (Oct 12 2020 at 11:01):

Especially if you have lots of operations to do.

Mario Carneiro (Oct 12 2020 at 11:05):

For verification though, with a single add loop, you probably don't want the complexity of deferred carries

Mario Carneiro (Oct 12 2020 at 11:05):

like in the norm_num example, I don't know if deferred carries help at all

Mario Carneiro (Oct 12 2020 at 11:05):

because it's serial

SnowFox (Oct 12 2020 at 11:09):

The difference is when you do something like adding 50 numbers. With the smaller radix; you split each number into smaller parts and let them grow until they could overflow, and finally apply the carries all at once. For radix 2^51 with 64-bit words, you can add 2^13 pre-normalized numbers before you need to normalize the carries. Next with associativity, you can see how parallel you can make this go. Not serial.

Mario Carneiro (Oct 12 2020 at 11:10):

No I mean proof checking is serial

Mario Carneiro (Oct 12 2020 at 11:11):

so something like norm_num has no advantage producing proofs of numerical facts with deferred carries

SnowFox (Oct 12 2020 at 11:13):

/me shrugs; just not where you'd use either then :P

Mario Carneiro (Oct 12 2020 at 11:14):

well, lean does addition in base 2 instead of 2^64 but the principle is the same

Mario Carneiro (Oct 12 2020 at 11:15):

amusingly, I've written norm_num three times now in three different proof systems and each time the base is different: lean's is base 2, metamath uses base 10, and MM0 uses base 16

SnowFox (Oct 12 2020 at 11:15):

Heh

SnowFox (Oct 12 2020 at 11:19):

@Mario Carneiro You mentioned "nested inductives" earlier, what does this mean? I understand mutually inductive types, haven't seen nested. Is there a syntax for it or just a thing you can do with the low level tools?

Mario Carneiro (Oct 12 2020 at 11:19):

inductive foo
| mk : list foo -> foo

Mario Carneiro (Oct 12 2020 at 11:20):

you use the inductive you are defining as a parameter to another inductive type

SnowFox (Oct 12 2020 at 11:20):

Ah. Okay.

Mario Carneiro (Oct 12 2020 at 11:20):

functions don't count - this is a regular inductive

inductive foo
| mk : (nat -> foo) -> foo

SnowFox (Oct 12 2020 at 11:21):

Noted. What is special about the nested variant that makes it a def instead of an inductive; in so far as #print sees it?

Mario Carneiro (Oct 12 2020 at 11:25):

lean's kernel doesn't actually support nested inductives, so instead it does some smoke and mirrors around a definition like

mutual inductive foo, list_foo
with foo : Type
| mk : list_foo -> foo
with list_foo : Type
| nil : list_foo
| cons : foo -> list_foo -> list_foo

which is itself defined in terms of

inductive foo' : bool -> Type
| mk : foo' tt -> foo' ff
| nil : foo' tt
| cons : foo' ff -> foo' tt -> foo' tt

which is a regular (indexed) inductive type that lean's kernel understands. Then you have to define all the maps from this type to the original type you wanted, and prove the recursion principle, and here the support is a little lacking, and you get a weak recursion principle that isn't good enough to prove all the facts you would like

Mario Carneiro (Oct 12 2020 at 11:27):

So I advise you to avoid the conditions under which lean triggers the nested/mutual compilation strategy, and write foo' directly if you need to

SnowFox (Oct 12 2020 at 11:28):

Huh, sounds unfortunate.. Noted.

SnowFox (Oct 12 2020 at 11:50):

@Mario Carneiro What do you think of this so far?

abbreviation register := nat

inductive reg_imm
| reg (reg : register)
| imm (imm : nat)

inductive compare | eq | ne | lt | ge | ltu | geu

inductive operation
| add (invert : opt_param bool ff)
| shift (amount : ) (arith : opt_param bool ff)
| xor
| or
| and

inductive ast
| branch (c : compare) (rs1 rs2 : register) (offset : nat)
| jump (reg : register) (base : option nat) (offset : nat)
| arith (op : operation) (rd rs1 : register) (rs2_imm : reg_imm)
| load (dest : register) (width : nat) (base : register) (offset : nat)
| store (src : register) (width : nat) (base : register) (offset : nat)

Mario Carneiro (Oct 12 2020 at 11:51):

is the shift amount always an immediate?

SnowFox (Oct 12 2020 at 11:52):

Er, that amount shouldn't be in that position.

SnowFox (Oct 12 2020 at 11:52):

But it is the only integer here, so replace that with direction : bool

Mario Carneiro (Oct 12 2020 at 11:53):

looks pretty good. Is this the whole thing?

SnowFox (Oct 12 2020 at 11:53):

Or just two separate operations.

SnowFox (Oct 12 2020 at 11:53):

This covers RV32I.

Mario Carneiro (Oct 12 2020 at 11:53):

if so it's a pretty massive reduction from the original

SnowFox (Oct 12 2020 at 11:54):

Very!

SnowFox (Oct 12 2020 at 11:56):

There is also what I started with; before we derailed with the raw instructions. Which I'll probably use this round.

inductive rv_base_opcode
| load   | load_fp  | custom_0   | misc_mem | op_imm | auipc      | op_imm_32
| store  | store_fp | custom_1   | amo      | op     | lui        | op_32
| madd   | msub     | nmsub      | nmadd    | op_fp  | reserved_0 | custom_2
| branch | jalr     | reserved_1 | jal      | system | reserved_2 | custom_3

Mario Carneiro (Oct 12 2020 at 11:56):

This doesn't cover encoding though; hopefully these classes align well with the opcodes

Mario Carneiro (Oct 12 2020 at 11:56):

right, that

SnowFox (Oct 12 2020 at 11:58):

They do. All operations over immediates are rv_base_opcode.op_imm. All operations over registers are rv_base_opcode.op. The specifics only map to funct3 and funct7 where funct7 is basically one bit rarely set, occasionally a different bit. :P

SnowFox (Oct 12 2020 at 11:58):

... mostly lots of zeros :)

Mario Carneiro (Oct 12 2020 at 11:59):

I mean ast

Mario Carneiro (Oct 12 2020 at 11:59):

clearly rv_base_opcode maps well to the opcodes, AFAICT it's 1-1

SnowFox (Oct 12 2020 at 11:59):

Right, I meant that ast.arith has only two opcodes.

SnowFox (Oct 12 2020 at 12:00):

ast.branch maps to opcodes.branch, ast.jump with jal and jalr. load and store, are well, exactly those.

Mario Carneiro (Oct 12 2020 at 12:01):

you should try writing the function ast -> (opc, funct3, funct7), hopefully you can pattern match your way to victory

Mario Carneiro (Oct 12 2020 at 12:03):

or I guess that could even map directly to nat (really fin (2^32)) using helper functions like fmt_R (opcode7 funct3 funct7 rd rs1 rs2 : nat) : nat

Mario Carneiro (Oct 12 2020 at 12:04):

that seems like the cleanest way to write down all that encoding information on this ast basis

SnowFox (Oct 12 2020 at 12:10):

Why does vscodium not always offer to expand case splits for you? o.o

SnowFox (Oct 12 2020 at 13:16):

Hmm. Does it make sense to expand these out or use "modes" of the base operation?

inductive operation
| add | sub | addw
| sll | srl | sra
| xor
| or
| and
| mul | mulh | mulhs | mulhsu | mulw
| div | divu | divw | divuw
| rem | remu | remw | remuw

Mario Carneiro (Oct 12 2020 at 13:17):

how are they specified?

SnowFox (Oct 12 2020 at 13:17):

rv64i adds addw. rvm adds the last 3 rows.

SnowFox (Oct 12 2020 at 13:18):

Eh, RV64I adds a few more.

Mario Carneiro (Oct 12 2020 at 13:18):

The main question is whether you are always going to have to case split the subparts anyway

SnowFox (Oct 12 2020 at 13:19):

Yes.

Mario Carneiro (Oct 12 2020 at 13:19):

if there are functions where you can say (mul _) := stuff where _ is the subop, then it's a win

SnowFox (Oct 12 2020 at 13:19):

The Ws are operations over sign-extended 32-bits on 64-bit systems. the rest are XLEN-wise.

SnowFox (Oct 12 2020 at 13:20):

All but the logical operations have a W variant.

SnowFox (Oct 12 2020 at 13:20):

So I guess I should split arithmetic into two modes and split logic out.

SnowFox (Oct 12 2020 at 13:21):

I also forgot about set less than (unsigned) which I guess is logic.

SnowFox (Oct 12 2020 at 13:23):

However, there is no mul[h[[s]u]]w.

SnowFox (Oct 12 2020 at 13:23):

Only mulw, of all those options.

SnowFox (Oct 12 2020 at 13:24):

Which I guess is fine to "allow" and later mark as invalid.

SnowFox (Oct 12 2020 at 13:24):

As I'd do for subtraction with an immediate.

SnowFox (Oct 12 2020 at 13:25):

(Because addition with an immediate has signed immediates already)

SnowFox (Oct 12 2020 at 13:39):

RV{32,64}IM with a few illegal cases, but easy to normalize (subi -> addi) or mark as invalid (no mulhw et al).

abbreviation register := nat

inductive reg_imm
| reg (reg : register)
| imm (imm : nat)

inductive compare | eq | ne | lt | ge | ltu | geu

inductive arithmetic
| add | sub
| sll | srl | sra
| slt | sltu
| mul | mulh | mulhu | mulhsu
| div | divu
| rem | remu

inductive logical
| xor
| or
| and

inductive ast
| arith (op : arithmetic) (w : bool) (rd rs1 : register) (rs2_imm : reg_imm)
| logic (op : logical)  (rd rs1 : register) (rs2_imm : reg_imm)
| lui (rd : register) (offset : nat)
| auipc (rd : register) (offset : nat)
| jump (reg : register) (base : option nat) (offset : nat)
| branch (c : compare) (rs1 rs2 : register) (offset : nat)
| load (dest : register) (width : fin 4) (base : register) (offset : nat)
| store (src : register) (width : fin 4) (base : register) (offset : nat)
| fence (mode pred succ : nat)
| call
| break

SnowFox (Oct 12 2020 at 14:03):

@Mario Carneiro get_opcode and get_funct

namespace ast
  def get_opcode : ast -> opcode
  | (branch _ _ _ _) := opcode.branch
  | (jump _ (some _) _) := opcode.jalr
  | (jump _ none _) := opcode.jal
  | (arith _ ff _ _ (reg_imm.reg _)) := opcode.op
  | (arith _ ff _ _ (reg_imm.imm _)) := opcode.op_imm
  | (arith _ tt _ _ (reg_imm.reg _)) := opcode.op_32
  | (arith _ tt _ _ (reg_imm.imm _)) := opcode.op_imm_32
  | (logic _ _ _ (reg_imm.reg _)) := opcode.op_32
  | (logic _ _ _ (reg_imm.imm _)) := opcode.op_imm_32
  | (lui _ _) := opcode.lui
  | (auipc _ _) := opcode.auipc
  | (load _ _ _ _) := opcode.load
  | (store _ _ _ _) := opcode.store
  | (fence _ _ _) := opcode.misc_mem
  | call := opcode.system
  | break := opcode.system

  open arithmetic
  open logical
  open compare

  -- funct7 << 3 + funct3
  def get_funct : ast -> nat
  -- rvi
  | (branch c _ _ _) :=
    match c with
    | eq := 0b000
    | ne := 0b001
    | lt := 0b100
    | ge := 0b101
    | ltu := 0b110
    | gtu := 0b111
    end
  | (jump _ _ _) := 0
  | (arith add  _ _ _ _) := 0b000
  | (arith sub  _ _ _ _) := 0b000 + nat.shiftl 3 0b0100000
  | (arith sll  _ _ _ _) := 0b001
  | (arith slt  _ _ _ _) := 0b010
  | (arith sltu _ _ _ _) := 0b011
  | (arith srl  _ _ _ _) := 0b101 + nat.shiftl 3 0b0100000
  | (arith sra  _ _ _ _) := 0b101
  | (logic xor  _ _ _)   := 0b100
  | (logic or   _ _ _)   := 0b110
  | (logic and  _ _ _)   := 0b111
  | (load _ width _ _)   := width.val
  | (store _ width _ _)  := width.val
  | (lui _ _) := 0
  | (auipc _ _) := 0
  | (fence _ _ _) := 0
  | call := 0
  | break := 0
  -- rvm
  | (arith mul    _ _ _ _) := 0b000 + nat.shiftl 3 0b0000001
  | (arith mulh   _ _ _ _) := 0b001 + nat.shiftl 3 0b0000001
  | (arith mulhsu _ _ _ _) := 0b010 + nat.shiftl 3 0b0000001
  | (arith mulhu  _ _ _ _) := 0b011 + nat.shiftl 3 0b0000001
  | (arith div    _ _ _ _) := 0b100 + nat.shiftl 3 0b0000001
  | (arith divu   _ _ _ _) := 0b101 + nat.shiftl 3 0b0000001
  | (arith rem    _ _ _ _) := 0b110 + nat.shiftl 3 0b0000001
  | (arith remu   _ _ _ _) := 0b111 + nat.shiftl 3 0b0000001
end ast

SnowFox (Oct 12 2020 at 14:04):

@Mario Carneiro How does this partitioning look?

SnowFox (Oct 12 2020 at 14:14):

s/ast/inst/g. Not quite a tree...

Mario Carneiro (Oct 12 2020 at 14:17):

why the match on branch conditions, instead of putting it in the top level?

Mario Carneiro (Oct 12 2020 at 14:19):

Unless funct7 is actually 3 bits above funct3, you probably shouldn't mix them like that, and instead either return an inductive type (with a case for funct3 and another for funct3 + funct7), or a precomposed nat containing the entire instruction (using auxiliary functions type_1 : funct3 -> nat and type_2 : funct3 -> funct7 -> nat)

SnowFox (Oct 12 2020 at 14:21):

Actually I should shift it more than 3, as they are a fixed distance across. This is consistent for all 32-bit instructions.

SnowFox (Oct 12 2020 at 14:21):

That match will be moved. I'm currently enumerating the opcodes of interest to map them to formats; then I'll write an encoder and decoder which masks appropriately.

Mario Carneiro (Oct 12 2020 at 14:22):

I mean you have those instruction formats, right? you should keep them separate until you have all the fields

SnowFox (Oct 12 2020 at 14:25):

The format only decides the layout of the immediates and which are present between rd rs1 rs2 funct3 funct7. funct7 "doesn't exist for immediates" but there are a few immediates which effectively use it. Specifically the shift right arithmetic. So it "acts" half-way between I and R.

Mario Carneiro (Oct 12 2020 at 14:27):

that is, you can define

def fmt_R (opcode7 funct3 funct7 rd rs1 rs2 : nat) : nat :=
opcode7 + rd.shiftl 7 + funct3.shiftl 12 +
rs1.shiftl 15 + rs2.shiftl 20 + funct7.shiftl 25

and then write encode : ast -> nat that calls fmt_R for R-type instructions

Mario Carneiro (Oct 12 2020 at 14:28):

and fmt_I doesn't have a funct7 argument so there is no bad typing

SnowFox (Oct 12 2020 at 14:42):

The format only influences decoding (as we need to know how to mask the correct values out); and for the immediate layout while encoding. The encoder can assume every instruction has an opcode, has all the registers, and has both function parts without caring about the format.

SnowFox (Oct 12 2020 at 14:44):

Only the immediate explosion depends on the format -- for encoding. Actually decoding could probably be pretty lazy too.

Mario Carneiro (Oct 12 2020 at 14:54):

  def opcode.encode : opcode  nat := sorry

  def branch_funct3 : compare  nat
  | eq := 0b000
  | ne := 0b001
  | lt := 0b100
  | ge := 0b101
  | ltu := 0b110
  | gtu := 0b111

  def B_type (opcode funct3 rs1 rs2 imm : nat) : nat := sorry
  def J_type (opcode rd imm : nat) : nat := sorry
  def I_type (opcode funct3 rd rs1 imm : nat) : nat := sorry
  def R_type (opcode funct3 funct7 rd rs1 rs2 : nat) : nat := sorry

  def encode_arith : arithmetic   × 
  | add := (0b000, 0)
  | sub := (0b000, 0b0100000)
  | sll := (0b001, 0)
  | slt := (0b010, 0)
  | sltu := (0b011, 0)
  | srl := (0b101, 0b0100000)
  | sra := (0b101, 0)
  | _ := sorry

  def encode : ast -> nat
  | (branch c rs1 rs2 off) := B_type opcode.branch.encode (branch_funct3 c) rs1 rs2 off
  | (jump reg (some rs1) off) := I_type opcode.jalr.encode 0b000 reg rs1 off
  | (jump reg none off) := J_type opcode.jal.encode reg off
  | (arith op w rd rs1 (reg_imm.reg rs2)) :=
    let (funct3, funct7) := encode_arith op in
    R_type (if w then opcode.op_32 else opcode.op).encode funct3 funct7 rd rs1 rs2
  | (arith op w rd rs1 (reg_imm.imm imm)) :=
    let (funct3, _) := encode_arith op in
    I_type (if w then opcode.op_imm_32 else opcode.op_imm).encode funct3 rd rs1 imm
  | _ := sorry

Mario Carneiro (Oct 12 2020 at 14:57):

I'm ignoring the funct7 field of arith with immediate here, I suppose that means that subi doesn't exist?

SnowFox (Oct 12 2020 at 14:58):

Well, subi doesn't exist anyway.

Mario Carneiro (Oct 12 2020 at 14:59):

yeah, the type of this function doesn't really allow for error handling so I guess it doesn't matter what it does in that case

SnowFox (Oct 12 2020 at 14:59):

But srli does.

Mario Carneiro (Oct 12 2020 at 14:59):

oh?

SnowFox (Oct 12 2020 at 15:00):

Yes, the shifts are the exception to the format rule here; they don't fit in any of them. :P

Mario Carneiro (Oct 12 2020 at 15:00):

no ISA is complete without a few exceptions to the rule

SnowFox (Oct 12 2020 at 15:00):

Because you encode the shift amount in the lower bits of the immediate; the upper bits are all zeros except for the single bit which indicates if a right shift is arithmetic vs logical.

SnowFox (Oct 12 2020 at 15:00):

arithmetic means sign-extended shift.

Mario Carneiro (Oct 12 2020 at 15:03):

ah, the funct7 bits are switched between srl and sra

SnowFox (Oct 12 2020 at 15:04):

Yeah I noticed that while splitting the funct7 out to its own function.

SnowFox (Oct 12 2020 at 15:18):

  def encode (i : inst) :  :=
  i.get_opcode.to_nat + i.rd.shiftl 7 + i.funct3.shiftl 12
  + i.rs1.shiftl 15 + i.rs2.shiftl 20 + i.funct7.shiftl 25 + i.imm

SnowFox (Oct 12 2020 at 15:18):

This works for all 32-bit instructions.

SnowFox (Oct 12 2020 at 15:19):

The format only influences the positions of the immediates. :)

SnowFox (Oct 12 2020 at 15:21):

Returning to the magic metaprogramming we started with; implemented for all instructions. I'll adapt this to the opcode, funct3, and funct7 mapping to get bidirectional maps.

SnowFox (Oct 12 2020 at 15:23):

Actually. I may take it a step further regarding the opcode map. To actually use the rows/columns to encode them. :) Which is basically just the binary count of the position +3 while skipping any where the lower 5 bits are all high.

SnowFox (Oct 12 2020 at 15:24):

The condition map is at least a direct count. Is there a @[derive to_nat] for inductives?

Mario Carneiro (Oct 12 2020 at 15:27):

The point of the different functions is to make it easier to express which format a given instruction has, so that you have to pass 6 arguments vs 7 and such

Mario Carneiro (Oct 12 2020 at 15:28):

Regarding the opcode inductive, you could just skip it and put numbers in where I had things like opcode.op.encode

Mario Carneiro (Oct 12 2020 at 15:30):

There is no derive to_nat, although it wouldn't be hard to autogenerate

SnowFox (Oct 12 2020 at 15:30):

Could.

SnowFox (Oct 12 2020 at 15:30):

Probably will.

SnowFox (Oct 12 2020 at 15:31):

But it is nice to have the full list and to use them by-name to avoid mistakes. But this'll be verified eventually anyway... So it would be fine to just put the numbers.

Mario Carneiro (Oct 12 2020 at 15:31):

You can also skip the inductive and instead do def opcode.op := 0b0100011 or whatever

SnowFox (Oct 12 2020 at 15:33):

True. Although the biggest detail for annoyance will be flipping the encoder to decode; ideally using the metaprogramming trick to avoid the duplication.

SnowFox (Oct 12 2020 at 15:51):

!! Lean's language server appears to have a memory leak?

SnowFox (Oct 12 2020 at 15:51):

Filled my ram to 100% started using emergency swap. o.o

Mario Carneiro (Oct 12 2020 at 17:42):

keep your finger on the Lean: Restart button

SnowFox (Oct 13 2020 at 13:08):

@Mario Carneiro I'm stuck again with generating definitions. Is there anything stopping an interface for constructing these a la `(«def» q := 3) or just with def as it is obvious that we're inside an opened parenthesis so it mustn't be a command anyway.

run_cmd [
    [`load,   `load_fp,  `custom_0, `misc_mem, `op_imm,   `auipc,      `op_imm_32],
    [`store,  `store_fp, `custom_1, `amo,      `op,       `lui,        `op_32    ],
    [`madd,   `msub,     `nmsub,    `nmadd,    `op_fp,    `reserved_0, `custom_2 ],
    [`branch, `jalr,     `reserved, `jal,      `system,   `reserved_1, `custom_3 ]
  ].enum >>= λ i, n⟩, n.enum >>= λ j, n⟩, /- def n := 3 + j.shiftl 2 + i.shiftl 5 -/

Mario Carneiro (Oct 13 2020 at 13:09):

well commands are not expressions, so that doesn't work literally

SnowFox (Oct 13 2020 at 13:10):

or any similar interface if not exactly using the expr-quotes.

Mario Carneiro (Oct 13 2020 at 13:12):

I mean for the longest time there wasn't even an API to add defs at all (with all the bells and whistles)

Mario Carneiro (Oct 13 2020 at 13:13):

you can use tactic.add_decl, I think that might not add equation lemmas though

Mario Carneiro (Oct 13 2020 at 13:15):

there is also environment.add_defn_eqns for more advanced definitions

SnowFox (Oct 13 2020 at 13:16):

Re reducing variations. I've arrived at embedding the width in the register and using an "alt" flag. Usually to distinguish between (un)signed operations, though also to select the high half of mults (which then respects source register (un)sign flags). This lines up well for tuning the operations.

Mario Carneiro (Oct 13 2020 at 13:16):

Anyway lean 4 will add something a lot like you are proposing; quotation and antiquotation now works on every syntactic class

SnowFox (Oct 13 2020 at 13:18):

Awesome. When I started with Lean, I dove into Lean 4's code ... then I found myself writing Lean 4 by mistake. :P

SnowFox (Oct 13 2020 at 13:19):

As in, while using lean3.

SnowFox (Oct 13 2020 at 13:25):

I was trying to use tactic.add_defn_equations before; though with add_decl, I feel much closer but have hit at least an issue with lists/tactics/monad switching... Not sure how to fix this in Lean...

  run_cmd [
    [`load,   `load_fp,  `custom_0, `misc_mem, `op_imm,   `auipc,      `op_imm_32],
    [`store,  `store_fp, `custom_1, `amo,      `op,       `lui,        `op_32    ],
    [`madd,   `msub,     `nmsub,    `nmadd,    `op_fp,    `reserved_0, `custom_2 ],
    [`branch, `jalr,     `reserved, `jal,      `system,   `reserved_1, `custom_3 ]
  ].enum >>= λ i, n⟩, n.enum >>= λ j, n⟩,
  tactic.add_decl (declaration.defn n [] `() `(3 + j.shiftl 2 + i.shiftl 5)
                     reducibility_hints.abbrev tt)

SnowFox (Oct 13 2020 at 13:31):

There we go. .mmap

  run_cmd [
    [`load,   `load_fp,  `custom_0, `misc_mem, `op_imm,   `auipc,      `op_imm_32],
    [`store,  `store_fp, `custom_1, `amo,      `op,       `lui,        `op_32    ],
    [`madd,   `msub,     `nmsub,    `nmadd,    `op_fp,    `reserved_0, `custom_2 ],
    [`branch, `jalr,     `reserved, `jal,      `system,   `reserved_1, `custom_3 ]
  ].enum.mmap $ λ i, n⟩, n.enum.mmap $ λ j, n⟩,
  tactic.add_decl (declaration.defn n [] `() `(3 + j.shiftl 2 + i.shiftl 5)
                     reducibility_hints.abbrev tt)

Mario Carneiro (Oct 13 2020 at 13:32):

you probably want mmap' there since you aren't returning anything

Mario Carneiro (Oct 13 2020 at 13:33):

you may or may not want to precompute those numbers btw, by using let n := 3 + j.shiftl 2 + i.shiftl 5 in reflect n

Mario Carneiro (Oct 13 2020 at 13:34):

oh, actually what you have definitely won't work because you haven't antiquoted i and j

SnowFox (Oct 13 2020 at 13:34):

/-
invalid field notation, 'branch' is not a valid "field" because environment does not contain 'nat.branch'
  op
which has type

-/

Mario Carneiro (Oct 13 2020 at 13:35):

did you write op.branch somewhere?

SnowFox (Oct 13 2020 at 13:35):

Yes. And I'm running the run_cmd inside the op namespace.

Mario Carneiro (Oct 13 2020 at 13:36):

you might want to qualify those names, op is kind of general for an unqualified name

SnowFox (Oct 13 2020 at 13:36):

It seems to be clobbering the namespace... does the add_decl bypass the namespace?

SnowFox (Oct 13 2020 at 13:37):

Seem so. (`op ++ n) worked.

Mario Carneiro (Oct 13 2020 at 13:37):

Yeah, you said add_decl `op so it's making a definition called `op

Mario Carneiro (Oct 13 2020 at 13:38):

you can sometimes use ``op to do name resolution but it won't work here because the thing doesn't exist yet

SnowFox (Oct 13 2020 at 13:38):

Makes sense. I assumed the namespace would inject itself but I guess that only affects the commands. The tactic just bypasses it.

SnowFox (Oct 13 2020 at 13:39):

Current code -> https://gist.github.com/4d56511885f5b44514b0fcf006a5546a

SnowFox (Oct 13 2020 at 13:50):

Any more tips @Mario Carneiro?

Mario Carneiro (Oct 13 2020 at 13:52):

It seems a bit odd to put the alt flag in the register instead of the instruction

SnowFox (Oct 13 2020 at 13:54):

The flag basically means that the register itself is treated as signed in most cases. The mul is the only deviation, where rd.alt means it acts on the high half of the result, but then the rs1.alt and rs2.alt represent their own signs.

SnowFox (Oct 13 2020 at 15:15):

I found one edge case with my approach with the type tagged registers (the alt flag). I can't encode one variant of the the SRAI hint, where the immediate is zero; because I use the sign of the immediate to differentiate between left and right shifts. The alt in this case applies to the destination, to distinguish logical vs signed arithmetic shifts. One minor edge case!

SnowFox (Oct 13 2020 at 15:16):

I could specially encode this with its own hack... but I'll leave it out for now.

SnowFox (Oct 13 2020 at 15:23):

Considering it is a single unused custom hint; which may never be used... pretty safe to ignore. :)

SnowFox (Oct 13 2020 at 15:28):

Actually... I just overconstrained it. I don't care about the direction in this position because it has already been declared that it is an arithmetic shift; which is only valid for right shifts.

SnowFox (Oct 13 2020 at 15:45):

Actually the hint issue does exist; SRL with rd = x0 and imm = 0 is inexpressible.

SnowFox (Oct 15 2020 at 09:04):

@Mario Carneiro Major update. Still a lot left to do. Now the registers may only be legally composed for each of the instructions. This should cover all of RV{32,64,128}IMFDQZicsr, or soon. I need to do another pass over all the variations. The core instruction type unifies all these extensions well. Abstracting over int vs float, (un)signed, and all widths. https://gist.github.com/4d56511885f5b44514b0fcf006a5546a

Bolton Bailey (Aug 27 2022 at 20:05):

I'm interested in getting RISC-V semantics in Lean to help with my STARK verification project. Sorry to resurrect a two-year old thread, but I thought I'd ask @SnowFox , did you do any more work on this? Is there any repository you have that I could potentially use?

Karl Palmskog (Aug 27 2022 at 20:12):

maybe useful for RISC-V specification: https://github.com/rems-project/sail https://github.com/mit-plv/riscv-coq

Bolton Bailey (Aug 27 2022 at 20:21):

I have heard of this project, and in fact it's at the top of the list for what my team is considering using now. The only issue is that it's written in Coq, and I would prefer to use Lean.

Karl Palmskog (Aug 27 2022 at 20:21):

you could add a Lean backend to Sail (possibly following the Coq backend for Sail)

Bolton Bailey (Aug 27 2022 at 20:23):

That sounds like it could be the path of least resistance. Thanks!

Andrés Goens (Aug 28 2022 at 13:57):

I would also be very interested in that (adding such a Lean backend to Sail)

Sofia Snow (Aug 29 2022 at 09:52):

@Bolton Bailey, @Karl Palmskog, @Andrés Goens: Yes, work continued, but along a few different dimensions, very incomplete and I've been meaning to get back to this. https://gist.github.com/sofia-snow/31b32cc03c433faedde58ef63db08b0e

Notably this design focuses on dataflow operations and expressing all the variations with minimal explosion. All control flow is factored out and will be handled by an RVSDG based IR. Ex. Conditional branches will be encoded as slt simple nodes composed with gamma nodes, fused respectively when encoding.

I've shifted focus towards other components of my compiler, notably the type theory, and optimizer. No repository yet. I welcome collaborating with others.

I'd like to integrate it with sail, but I feel that sail is too low level and anything it generates promotes poor interfaces. I'm after a higher level specification which can generate all which sail does, and more, with appropriate abstractions. Thus can generate similar output to sail and cross-verify their consistency.

Sofia Snow (Aug 29 2022 at 10:16):

CSRs and memory accesses need state edges, RVSDG models partial orderings for states. A tweak to RVSDG I've applied is to make states linear and use an additional node which acts like gamma (generalized switch statement) to partition and join state rather than choice. Notably along with this, strict memory partitioning. My exact memory model isn't yet written down but it'll allow mutable sharing when explicitly stated as intentional.

Sofia Snow (Aug 29 2022 at 10:20):

Similar to Lean's totality checker, some uses can be proven eventually consistent for all orders, but others may require manual proofs, or you may explicitly allow divergence.

Sofia Snow (Aug 29 2022 at 10:34):

Also notable I'll be using an E-graph representation extending RVSDG to run equality saturation, using user-defined and user-proven rewrites. Meaning it'll represent an exponentially large set of equivalent programs, then use an SMT solver to select the optimal program per a cost function (for each dimension along the Pareto frontier of the cost function), and I want a full correctness proof from the source language (very Lean like syntax, with resource grades for exact or bounded usage and taint tracking (notably sensitive data may not be published or leaked through side channels)); to the machine code (at least RISC-V and WebAssembly; if successful, may consider SPIR-V or a tiny subset of x86_64 or arm64).

RVSDG.

https://arxiv.org/abs/1912.05036
https://www.sintef.no/contentassets/11da6d67207348db98a30ddbdf3b0bba/reissmann_poster.pdf

Equality saturation.

https://arxiv.org/abs/1012.1802
https://arxiv.org/abs/2004.03082
https://arxiv.org/abs/2108.02290

Andrea Nardi (Oct 05 2022 at 07:12):

Sofia Snow
I wanted to work on a similar project for an optimizing compiler, but instead of assembly code it would optimize fpga netlists. In my mind it is very similat as working with a what you(and the literature) refer to as dataflow. In the fpga world (differently from cpus), every "operation" has a well defined and predictable "runtime" often referred to as latency in clock cycles. It would be interesting using equality saturation to optimize for latency. You could take it a step further and allow different possibly aproximative methods of calculation which have different accuracy (e.g. lookup table vs cordic for sin(x)) and pareto optimize for latency and accuracy.
Did not know the method of saturation equality. Those seem like nice background material.

Sofia Snow (Oct 05 2022 at 08:45):

(Conversation continued in a private message.)

Joh-Tob Schäg (Mar 23 2023 at 00:13):

I found the talk about the Vector ISA interesting. Would you generally suspect Lean or Z3 being quicker for this kind of work? I would be interested in a super optimizer for certrain tractable functions.
I ask because i am running an actual Vector ISA https://twitter.com/Anno0770/status/1619333399227011073 although it is not RISC-V based https://sxauroratsubasa.sakura.ne.jp/documents/guide/pdfs/Aurora_ISA_guide.pdf .

Mario Carneiro (Mar 23 2023 at 00:44):

Lean and Z3 are not really interchangeable, there are very few problems where they both apply. Z3 is an automated theorem prover, and is usually used as a backend for other tools which provide a more easy to use interface than raw SMT-LIB. It's not really intended as something you use to "do formalizations in", as it has no facilities for mathematical library management

Mario Carneiro (Mar 23 2023 at 00:46):

it is however something you can use to do every step of a formalization in; for example F# is a language which basically uses Z3 as a backend to prove all the individual proof steps and handles the higher level structure. F# is probably a reasonable choice for a RISC-V formalization project, more directly comparable to lean

Mario Carneiro (Mar 23 2023 at 00:51):

If you aren't proving any theorems, or only trivial theorems, then probably Lean is a good choice. I don't have enough experience with F# to say how well it would fare at just defining a bunch of inductive types and recursive functions which is what this would amount to.

If you are proving theorems, such as determinism or type correctness of the ISA, then Lean will do well if they are proofs expressible nicely using the type theory (e.g. induction and recursion), while if it is stuff like proving bit twiddling identities over fixed width bit vectors then Z3 blows lean out of the water. Of course, lean is intended as a front end for a variety of kinds of automation, so it is not impossible to get an F#-like interface with frequent calls to an SMT solver like Z3; this just doesn't happen out of the box like it does in F#.

Joh-Tob Schäg (Mar 23 2023 at 01:06):

Thank you. Looks like i Z3 would be better suited to derive fast programs. The ISAs is someone else concern i just want to have as much fun with it as possible.

Mario Carneiro (Mar 23 2023 at 01:35):

I'm not sure why you need either one TBH if you just want a superoptimizer. That can be written in a regular programming language

Joh-Tob Schäg (Mar 23 2023 at 02:16):

By superoptimizer i mean a program that if run long enough is guaranteed to the find the optimal i,plementation (assuming infinite memory). Having some SMT capabilities really, really cuts down the search tree. Especially to discover values for bit twidling. So that kind of stuff https://fitzgeraldnick.com/2020/01/13/synthesizing-loop-free-programs.html

Karl Palmskog (Mar 23 2023 at 06:41):

I think you mean F* (Fstar) rather than F# (Fsharp)

Mario Carneiro (Mar 23 2023 at 10:35):

yes I did

Andrés Goens (Mar 23 2023 at 11:18):

SMT has certainly been used more for these kinds of "superoptimization" tasks. @Joh-Tob Schäg , you might want to look into CVC5, which is also an SMT solver like Z3, but it has algorithms for syntax guided synthesis, which is commonly also used for "superoptimization". Bus as others mentioned above, Lean is probably not the tool for that job.

Constantin Gierczak (Apr 13 2023 at 10:12):

I have some experience using F*, so I might give some insight. It doesn't seem like the best option if you want to generate information. It really is more geared towards verifying programs, e.g. cryptographic protocol implementations, memory allocators.


Last updated: Dec 20 2023 at 11:08 UTC