Stream: Is there code for X?

Topic: bidirectional maps?

SnowFox (Oct 01 2020 at 11:18):

Hello. I have an inductive type a la inductive t | a | b | cwhich uniquely maps to a natural number | a := 1 | b := 5 | c := 30. I want two functions t_to_nat : t -> nat and f_of_nat : nat -> t, but don't want to manually define them both. Is there a mathlib type I should be using? finmap sounds appropriate but I don't have sigma types here (yet).

Eric Wieser (Oct 01 2020 at 11:19):

What should f_of_nat 2 be?

SnowFox (Oct 01 2020 at 11:20):

Ah, good point. Lets say I have a reserved illegal variant. :)

SnowFox (Oct 01 2020 at 11:24):

Or just option; both work.

SnowFox (Oct 01 2020 at 11:25):

It might also be fine to just assume a safe default; putting it on the caller to ensure the number is valid.

Eric Wieser (Oct 01 2020 at 11:36):

It seems like somewhere there ought to be some noncomputable def function.left_inverse_of_injective (f : α → β) (h : injective f) : β → α := sorry which you could use here

Hm?

SnowFox (Oct 01 2020 at 11:41):

I don't see any such or similar function.

SnowFox (Oct 01 2020 at 11:51):

@Eric Wieser This?

/-- The inverse of a function (which is a left inverse if f is injective
and a right inverse if f is surjective). -/
noncomputable def inv_fun (f : α → β) : β → α := inv_fun_on f set.univ


Eric Wieser (Oct 01 2020 at 11:53):

Yes, docs#function.inv_fun and docs#function.partial_inv look like what you want

SnowFox (Oct 01 2020 at 11:56):

@Eric Wieser Given the noncomputable keyword, I'm not seeing how?

SnowFox (Oct 01 2020 at 12:07):

I mean, I now have a noncomputable def of_nat but I need it to be computable.

Eric Wieser (Oct 01 2020 at 12:13):

Ah, you didn't say it needed to be computable ;)

SnowFox (Oct 01 2020 at 12:14):

I thought that was the default! :P

Johan Commelin (Oct 01 2020 at 12:15):

We generally like noncomputable stuff (-;

SnowFox (Oct 01 2020 at 12:24):

One idea I had, adding @[pattern] to the to_nat and trying to implement def of_nat | (to_nat x) := x but that hits "invalid function application in pattern, cannot be reduced to a constructor, possible solution mark as inaccessible"

SnowFox (Oct 01 2020 at 12:29):

Unless someone has a better idea; I guess I'll go with a list (t \x nat) and hack together lookups over this...

Mario Carneiro (Oct 01 2020 at 17:23):

@SnowFox I think you need to be clearer about your actual application. It's certainly possible to write a tactic that does the necessary copy paste to make two functions the way you would expect

SnowFox (Oct 02 2020 at 05:21):

To be concrete; I want a map between representations of RISC-V opcodes. An inductive type to introduce names and exhaustive coverage, and its numerical representation. However, some mappings depend on the integer length, XLEN. I.e. (opcode.system, any XLEN) <=> nat.shiftl 3 5 + nat.shiftl 4 2 + 3, and both (opcode.c_fld, 32 or 64) and (opcode.c_lq, 128) map to nat.shiftl 1 13 + 0.

Kyle Miller (Oct 02 2020 at 05:27):

Ok, so you want an assembler/dissassembler pair? (assembler being mnemonics -> words and disassembler being words -> mnemonics, the mnemonics encoded as inductive types?)

Precisely, yes.

Kyle Miller (Oct 02 2020 at 05:27):

Are you saying RISC-V has multiple mnemonics that have the same encoding?

Kyle Miller (Oct 02 2020 at 05:29):

I'm not sure if you're going to be able to get around having to write a disassembler. At least in Lean you'll be able to prove the disassembler is a right inverse to the assembler to verify you probably did it right!

SnowFox (Oct 02 2020 at 05:29):

The mnemonics for RV{32,64,128}I are shared despite being incompatible due to the differing integer lengths; but RVC, the compressed extension has several mnemonics which don't exist under different XLENs, so they share the word/encoding.

SnowFox (Oct 02 2020 at 05:30):

All uncompressed mnemonics are XLEN independent. Only the compressed opcodes depend on XLEN.

Kyle Miller (Oct 02 2020 at 05:30):

Just out of curiosity, are the compressed extensions like ARM Thumb instructions?

SnowFox (Oct 02 2020 at 05:31):

Er, not mnemonics in the sense of addi, sub, etc. These are the opcodes which determine the set of instructions by format. I.e. addi and slli are both under the op_imm opcode.

SnowFox (Oct 02 2020 at 05:32):

I don't know the details of ARM Thumb instructions but I do know they differ. IIUC Thumb instructions are necessary, whereas for RISC-V the compressed instructions are only a hot subset of the uncompressed.

Kyle Miller (Oct 02 2020 at 05:34):

So this function is only for encoding the instruction format as an integer? How many of these are there?

SnowFox (Oct 02 2020 at 05:36):

Unless I've miscounted, 28 uncompressed (XLEN independent) and 33 compressed (XLEN dependent). But some of these are reserved or custom, for user defined instruction ranges.

SnowFox (Oct 02 2020 at 05:37):

There are 6 uncompressed + 9 compressed "instruction formats", which are different to the opcodes.

Kyle Miller (Oct 02 2020 at 05:37):

I've never done this, but in principle I think you could do Lean metaprogramming to read in your definition for the assembler and generate an inverse function. I'm not aware of anything like this that already exists, and I'm also new to Lean so take my advice with that in mind

SnowFox (Oct 02 2020 at 05:38):

Page 112 has the RVC opcodes.

SnowFox (Oct 02 2020 at 05:39):

Page 129 has the standard RVI opcode map.

Kyle Miller (Oct 02 2020 at 05:39):

If you have a nice data format to represent all the instruction types, you might also consider doing code generation, either within Lean or in another language. (A number assemblers use a database of opcode and instruction encodings after all.)

Kyle Miller (Oct 02 2020 at 05:40):

The proof that these are inverses of each other in a suitable sense could probably be mostly automated with tactics

Mario Carneiro (Oct 02 2020 at 05:40):

You should definitely do this with a tactic

SnowFox (Oct 02 2020 at 05:40):

Yes, this would be ideal. I considered it too but didn't know how to approach this.

Kyle Miller (Oct 02 2020 at 05:41):

@Mario Carneiro Define the function itself with a tactic?

Mario Carneiro (Oct 02 2020 at 05:41):

You can calibrate the input to be whatever is easiest and the tactic makes the definitions

SnowFox (Oct 02 2020 at 05:41):

Pages 130 to 136 have the actual instructions.

Mario Carneiro (Oct 02 2020 at 05:43):

I've done disassembly for an ISA before and you get a lot more flexibility to generate the functions and proof producing assembler/disassembler with a custom input format

SnowFox (Oct 02 2020 at 05:44):

I could write code to extract the opcodes from the .tex. Though there may be better formats available. I know rv8 has some opcode maps but they are old. I could manually update these; but at that point I may as well just start from scratch and define the format myself.

Mario Carneiro (Oct 02 2020 at 05:45):

Oh, I was thinking of you manually transcribing the tex (or the part of it you need), possibly with some regex preprocessing first

Mario Carneiro (Oct 02 2020 at 05:46):

If you want to stay up to date with a changing spec it would make sense to parse the .tex but otherwise it's just a one time job

SnowFox (Oct 02 2020 at 05:46):

https://github.com/riscv/riscv-opcodes These should be up to date

Mario Carneiro (Oct 02 2020 at 05:47):

aha, it's already formalized (in python, but still)

Mario Carneiro (Oct 02 2020 at 05:47):

so you could write a python program that spits out a lean file

SnowFox (Oct 02 2020 at 05:48):

I'd like to take the lean route. Metaprogramming, reading in these opcode files.

Kyle Miller (Oct 02 2020 at 05:48):

@Mario Carneiro Did you mean anything specifically by a custom input format? Like, write some top-level command to read in a data file and generate Lean code?

Mario Carneiro (Oct 02 2020 at 05:53):

inductive reg | rs1 | rs2 | rd

inductive instr : Type
| imm (_ : name) (r1 r2 : reg) (bits : nat) : instr
| reg3 (_ : name) (r1 r2 r3 : reg) : instr

meta def make_defs (_ : list instr) : tactic unit := tactic.skip

open reg instr
run_cmd make_defs [
imm ADD rd rs1 12,
reg3 SUB rd rs1 rs2
]


Mario Carneiro (Oct 02 2020 at 05:53):

the details are entirely made up, but something like this

Kyle Miller (Oct 02 2020 at 05:54):

Ah, ok. I guess you could take the Python program in that repository and generate the code that goes into the run_cmd

Mario Carneiro (Oct 02 2020 at 05:54):

the idea is that you have some terse syntax for defining instruction formats tailored for RISC-V, and then make_defs massages it into lean exprs

Mario Carneiro (Oct 02 2020 at 05:56):

and it could do what it likes with the data, for example defining the inductive type of instructions, a function to list byte, a function from list byte, a tactic that parses an instruction, etc etc

SnowFox (Oct 02 2020 at 05:57):

Yes, something like this would be ideal. But I really don't know anything about tactic writing yet.

Mario Carneiro (Oct 02 2020 at 05:58):

You could also write a program (either in python or lean) to autogenerate lean definitions as strings

Mario Carneiro (Oct 02 2020 at 05:59):

or you could do the simpler thing and just write it manually (with lots of find/replace and multicursor editing)

Mario Carneiro (Oct 02 2020 at 06:00):

but I think the effort spent to write the metaprogram would pay off

SnowFox (Oct 02 2020 at 06:01):

That's basically what I already have, though only for the opcode maps, not the addi/slli/add/sub/etc instructions. Though that may have been unnecessarily lower level.

Mario Carneiro (Oct 02 2020 at 06:03):

I don't think opcode maps are the best way to represent the instructions, since it's unnecessarily case-bashy

Kyle Miller (Oct 02 2020 at 06:04):

@Mario Carneiro It should be possible to make it so the tactic reads in the opcode file directly right? I just don't know what kind of file I/O you can do. (The parsing into some data type should be easy enough with the parser combinator library, assuming it's anything like Haskell's)

Mario Carneiro (Oct 02 2020 at 06:04):

Yeah you can do file IO

Mario Carneiro (Oct 02 2020 at 06:05):

I would be inclined to copy and paste into the lean file for stability though

SnowFox (Oct 02 2020 at 06:06):

Or maybe a hash commitment on the Lean side to detect changes.

Mario Carneiro (Oct 02 2020 at 06:07):

well you don't really need to guard things on the lean side because your code will break

SnowFox (Oct 02 2020 at 06:07):

Depends on what changes, but probably.

Mario Carneiro (Oct 02 2020 at 06:07):

but putting everything in the lean file makes it more portable, you don't need to worry about users not being able to read your files the same way

Mario Carneiro (Oct 02 2020 at 06:08):

plus parsing is going to be pretty slow, comparatively speaking

Mario Carneiro (Oct 02 2020 at 06:09):

the approach I showed is basically leveraging the C++ lean parser to do the dirty work

Kyle Miller (Oct 02 2020 at 06:10):

Mario Carneiro said:

but putting everything in the lean file makes it more portable, you don't need to worry about users not being able to read your files the same way

I don't really understand this concern -- if it's a data file in the project, what would be the matter? But it does seem reasonable re-encoding the data file as a Lean file (especially since the repository has a translator with multiple backends already)

SnowFox (Oct 02 2020 at 06:13):

Hmm. Reading over the python, I think it might be decent to emit at least the array to paste into the Lean file.

SnowFox (Oct 02 2020 at 06:13):

Time to learn Python I guess...

Mario Carneiro (Oct 02 2020 at 06:13):

My main experience with loading large files into lean is in the metamath import; running a pure lean parser over the metamath file directly was unusably slow, but translating the file into lean and then loading that was much better

SnowFox (Oct 02 2020 at 06:16):

Regarding registers; I'm using abbreviations for fin 32 for each of xreg freg and vreg. The inductive type mocked above doesn't look right to me.

Mario Carneiro (Oct 02 2020 at 06:16):

Like I said, completely made up

Mario Carneiro (Oct 02 2020 at 06:17):

if you showed me a snippet of your ISA model I could make something a little more accurate

Mario Carneiro (Oct 02 2020 at 06:18):

I have a model of x86 here if it helps

Mario Carneiro (Oct 02 2020 at 06:19):

I haven't done RISC-V before though

SnowFox (Oct 02 2020 at 06:19):

The spec is my model. Page 130-136 list all the instructions, starting with the uncompressed format list for the standard extensions.

fixed

Thanks.

Mario Carneiro (Oct 02 2020 at 06:20):

I mean your lean interpretation of the spec

SnowFox (Oct 02 2020 at 06:23):

Not sure you want that ;)

SnowFox (Oct 02 2020 at 06:28):

Any and all restructuring or Lean tips welcome.

Mario Carneiro (Oct 02 2020 at 06:30):

Here's my hand translation of the first few instructions:

inductive type : Type
| R_type (funct3 : nat) : type
| I_type (funct3 : nat) : type
| B_type (funct3 : nat) : type
| U_type : type
| J_type : type

meta def make_defs (_ : list (name × nat × type)) : tactic unit := tactic.skip

open type
run_cmd make_defs [
(LUI,   0b0110111, U_type),
(AUIPC, 0b0010111, U_type),
(JAL,   0b1101111, J_type),
(JALR,  0b1100111, I_type 0b000),
(BEQ,   0b1100011, B_type 0b001)
]
`

Mario Carneiro (Oct 02 2020 at 06:31):

also this should probably move to the #Program verification stream before everyone else gets bored to tears :grinning:

SnowFox (Oct 02 2020 at 06:35):

My objective is an Equality Saturation enriched RVSDG -> RISC-V optimizing compiler for a Lean-like language enriched with Resourceful Dependent Types. With sensitive data annotated a la resources to taint along data dependencies. I want a language and optimizing compiler for cryptography, such that cryptography can be specified as a clean reference and optimizations may be specified orthogonality and be applied without fear of leaking sensitive data through crypto-unsafe optimizations, like branching on secrets.

SnowFox (Oct 02 2020 at 06:36):

Well, that sounds like a good move. ;)

SnowFox (Oct 02 2020 at 06:39):

Last updated: May 07 2021 at 22:14 UTC