I was trying to solve [this issue](https://github.com/radare/radare2/issues/8385) but maybe I didn't understand what it was about...
I saw that the doc printed by running `ag?` was:
`| agf [addr] Show ASCII art graph of given function`
but agf and agft ignored every argument provided. So I thought I'd implement it to solve that issue, but on telegram @pancake said:
>"It was not a bug. How to make a callgraph and a graph with all the functions?"
But I don't get what he's saying; what does the callgraph have to do with the agf command?
Sorry for the confusion ^^
-------------
don't say sorry, this thing needs some guidance and discussion before going forward with more changes. that's why this pad matters.
related issues
https://github.com/radare/radare2/issues/9716
https://github.com/radare/radare2/issues/9867 <- mainly this
------------
Basically the questions are:
1) Which ag subcommands should be used to do the following types of graphs?
- callgraph (agc)
- bb graph (agf)
- xrefs graph (one node for current function, and add edges to xrefs with contents)
- refs graph (same, but for refs instead of xrefs (forward references))
- data graph (references can be code or data, data graph is like refs one but for non-code)
the import graph is something that we must think about too.. that is, taking all imports and showing the xrefs to them, which is a global xrefs graph on all imports.
this makes me think that the argument should be a list of offsets, not just one, or an expression to glob the flags, like `agx sym.imp.*`
if we do this glob or list support we can just handle global graphs this way and avoid having to add more commands. but this limits the way we handle that, because imports should be taken from rbin and not from rflags. or it will be a mess in debugger mode
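A rough sketch of the "list of offsets or flag glob" idea, just to make the discussion concrete. Everything here is an assumption for illustration: the flag table is made up, `resolve_targets` is a hypothetical helper (not an r2 API), and the comma separator is only one possible choice. As noted above, a real implementation would probably take imports from rbin rather than from flags.

```python
from fnmatch import fnmatchcase

# Hypothetical flag table (name -> address); real values would come from r2.
FLAGS = {
    "sym.imp.puts": 0x1030,
    "sym.imp.malloc": 0x1040,
    "sym.main": 0x1149,
    "entry0": 0x1060,
}

def resolve_targets(arg, flags=FLAGS):
    """Expand a comma-separated list of offsets and flag globs into sorted addresses."""
    targets = set()
    for token in arg.split(","):
        token = token.strip()
        if not token:
            continue
        try:
            # Plain numeric offset, e.g. 0x1149
            targets.add(int(token, 0))
            continue
        except ValueError:
            pass
        # Otherwise treat the token as a glob over flag names
        for name, addr in flags.items():
            if fnmatchcase(name, token):
                targets.add(addr)
    return sorted(targets)

print([hex(a) for a in resolve_targets("sym.imp.*")])
```

With a helper like this, something like `agx sym.imp.*` and `agx 0x1149,entry0` would go through the same code path, which is the point of the proposal.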
the visual mode has those xrefs and refs graphs with '<' and '>' keys.
**** pancake comment ****
* i would use agx for xrefs graphs
* agr is dupe of agc ?
2) how to specify current function or all functions?
- all-functions graphs tend to be extremely costly to generate, even more so on big binaries
- not sure if global graphs make sense in ascii art, too big to scroll around in 80x25
**** pancake comment ****
* maybe we should go for using uppercase letters like this:
* agC -> full program callgraph, not just one function.. but one function callgraph is just a refs graph :?
* what if we want refs + xrefs graph?
3) we need to specify the output format (or just use a config var for this):
- interactive ascii art graph -- i would go to suffix 'v' like agcv or agfv
- non-interactive ascii art graph -- just agc, agf, ..
- graphviz output -- we can use a different suffix like 'd' (from dot)
- the graphviz format sucks quite a lot, yfiles supports an xml one which is pretty nice
- json format (agj) --- just append a 'j' at the end like all r2 commands imho
see:
- graph.format: Specify output format for graphs (dot, gml, gmlfcn)
we must review all those graph. variables imho
Implementation details:
* honor scr.interactive for global graphs?
**** cyanpencil ****
> agr is dupe of agc ?
agr calls `r_core_anal_codexrefs` while agc calls `r_core_anal_coderefs`.
I think that the `ag?` documentation is wrong/outdated for many commands
>maybe we should go for using uppercase letters like this:
> agC -> full program callgraph, not just one function
I agree. But to be consistent we would also need to change the agJ command (which prints formatted disassembly in json)
yep, uppercase J is misleading and inconsistent, so ok to change agJ too
**** cyanpencil ****
This is the current documentation for ag? (because it is relevant):
```
| Usage: ag[?f] Graphviz/graph code
| ag [addr] output graphviz code (bb at addr and children)
| ag- Reset the current ASCII art graph (see agn, age, agg?)
| aga [addr] idem, but only addresses
| agr[j] [addr] output graphviz call graph of function
| agg display current graph created with agn and age (see also ag-)
| agc[*j] [addr] output graphviz call graph of function
| agC[j] Same as agc -1. full program callgraph
| agd [fcn name] output graphviz code of diffed function
| age[?] title1 title2 Add an edge to the current graph
| agf [addr] Show ASCII art graph of given function
| agg[?] [kdi*] Print graph in ASCII-Art, graphviz, k=v, r2 or visual
| agj [addr] idem, but in JSON format
| agJ [addr] idem, but in JSON format with formatted disassembly (like pdJ)
| agk [addr] idem, but in SDB key-value format
| agl [fcn name] output graphviz code using meta-data
| agn[?] title body Add a node to the current graph
| ags [addr] output simple graphviz call graph of function (only bb offset)
| agt [addr] find paths from current offset to given address
| agv Show function graph in web/png (see graph.web and cmd.graph) or agf for asciiart
```
---
Supposing we apply in practice what pancake wrote above (that is try to find a non-confusing and consistent way of using suffixes to specify the kind of graph and the output format), I tried to list them all to see what the result would be like:
**graphs**
```
- callgraph -> agc
- simple callgraph -> ags (only offsets)
- bb graph -> agf
- xrefs graph -> agx
- refs graph -> agr
- data graph -> aga (not yet implemented?)
- diff graph -> agd
- custom graph -> agg
```
**output formats**
```
- non interactive ascii -> blank (e.g. agf)
- interactive ascii -> v (e.g. agfv)
- graphviz dot -> d (e.g. agfd)
- graphviz w/ metadata -> dm (e.g. agfdm)
- json -> j (e.g. agfj)
- json w/ disassembly -> J (e.g. agfJ)
- gml -> g (e.g. agfg)
- gmlfcn -> gf (e.g. agfgf)
- SDB key-value -> k (e.g. agfk)
- r2 commands -> * (e.g. agf*)
- web/png -> w (e.g. agfw)
```
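To check that the two tables above compose without ambiguity, here is a small sketch that splits a command name into graph type and output format. It assumes the proposed scheme exactly as listed (one letter for the graph type right after `ag`, and everything after it is the format suffix); `parse_ag` is a hypothetical helper, not something that exists in r2.

```python
# Graph type letters and format suffixes copied from the proposal above.
GRAPH_TYPES = {
    "c": "callgraph",
    "s": "simple callgraph",
    "f": "bb graph",
    "x": "xrefs graph",
    "r": "refs graph",
    "a": "data graph",
    "d": "diff graph",
    "g": "custom graph",
}

OUTPUT_FORMATS = {
    "": "non interactive ascii",
    "v": "interactive ascii",
    "d": "graphviz dot",
    "dm": "graphviz w/ metadata",
    "j": "json",
    "J": "json w/ disassembly",
    "g": "gml",
    "gf": "gmlfcn",
    "k": "SDB key-value",
    "*": "r2 commands",
    "w": "web/png",
}

def parse_ag(cmd):
    """Split e.g. 'agfdm' into ('bb graph', 'graphviz w/ metadata')."""
    if not cmd.startswith("ag") or len(cmd) < 3:
        raise ValueError("not an ag<type>[format] command: %r" % cmd)
    kind, suffix = cmd[2], cmd[3:]
    if kind not in GRAPH_TYPES:
        raise ValueError("unknown graph type: %r" % kind)
    if suffix not in OUTPUT_FORMATS:
        raise ValueError("unknown output format: %r" % suffix)
    return GRAPH_TYPES[kind], OUTPUT_FORMATS[suffix]

print(parse_ag("agfdm"))
```

Because the graph type is always exactly one letter, suffixes like `g`/`gf` or `d`/`dm` don't clash with the type letters `g` and `d`. The custom graph helpers (ag-, agn, age) and an uppercase "global" variant are not covered by this sketch.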
Here are my questions/doubts:
- What about the (ag-, agn, age, agg) commands to create a custom graph? Should we leave them as they are? (In my opinion we should rename them, but I don't know how many things depend on them)
<pancake> yes, keep them, they are useful and used by some scripts and ppl. do you think they may conflict with something? imho it should be fine to have 'e' and 'n' spare for this
- Command aga and ags are exactly the same in the code, despite their different description. Maybe we should remove one of them? (for this reason I proposed `aga` for the data graph)
<pancake> agree, i prefer aga too
- Could not find any command/code for displaying the data graph... I think it is not implemented yet, in fact inside `r_core_anal_coderefs()` in file canal.c there is a `// TODO: display only code or data refs?`
<pancake> yeah, the idea is that agc is a callgraph, but refs graph includes data references too, like strings and such.
- Not entirely sure why command `agt` starts with ag... what does it have to do with graphs?
<pancake> agree, agt command must be out of ag, maybe use abt because it follows basic blocks paths? or abuse apt (ap is under-used, only to find preludes, we should extend it or something and ap may mean also "anal paths")
- Not every graph type works with every output format type (for example, json with formatted disassembly works only where there is disassembly), so what would be an easy way of dealing with this?
<pancake> no need to show disasm in the json output, the point of json is to provide a way for 3rd party ppl to process the information found in the graph, so all graph outputs should have a json format
- `r_core_anal_codexrefs()` currently supports only the r2 custom graph commands output...?
yeah, this must be extended
**** cyanpencil update ****
- I found out what commands actually depend on (ag-, agn, age, agg). For example pressing `G` while in the graph view mode calls `r_core_cmd0 (core, "ag-;.dtg*;aggi");`. The other commands which depend on them are pressing `>` or `<` while in graph view, and also command `dtgi` which calls `r_core_cmd0 (core, "ag-;.dtg*;aggi");`. Maybe I was wrong, we shouldn't rename them to avoid breaking any existing things.
<pancake> yep, i wouldn't remove those
- Speaking of which, there's also the `dtg` command which displays the debug trace callbacks, and supports only graphviz, custom graph (age, agn) and interactive ascii output formats. Should it follow the same behaviour of the other ag* commands (that is, support all the possible output formats with their suffixes)?
<pancake> maybe just use graph.trace variable to display that info in all graphs instead of having a command for this
- Also, imho, we should at least mention the existence of `dtg` in the ag? documentation (especially if it follows the same behaviour of the other ag* commands). I wasn't aware of this command, I found out about it only by searching in the code for commands that depended on agn/age...
<pancake> agree, go for it!
----
And I totally forgot about the import graph:
> which is a global xrefs on all imports.
- Doesn't this explode very quickly? Running `ii` on a simple binary like /bin/ls gives 117 different imports...
- I think that the most intuitive command for the import graph would be `agi`
----
>what if we want refs + xrefs graph?
I thought about it, but no easy way of defining a command that does this comes to my mind...
---
>honor scr.interactive for global graphs?
I think that it is almost guaranteed that a global graph will not fit into the terminal screen, so I agree that if scr.interactive is set we should default to the interactive view
But this is a problem regarding not only global graphs: for example there are a lot of functions which have a bb graph much bigger than the terminal window, and running `agf` prints garbage.
Another option would be checking before printing any graph in non-interactive ascii if it is bigger than the current window, and if that's the case prompt the user with a `r_cons_yesno()` asking to switch to interactive ascii. *(which already happens in some way: for example running agf on entry0 in /bin/ls prompts `Do you want to print 4433 lines? (y/N)` because the graph is huge)*
However, I have no idea if that would be easy/hard to implement.
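For what it's worth, the check itself looks simple in isolation. Below is a minimal sketch of the "does it fit, otherwise ask" decision, assuming the graph has already been rendered to a string. The function names are made up, the `interactive` flag only stands in for scr.interactive, and none of this says how hard it is to hook into the actual r2 graph code.

```python
import shutil

def fits_terminal(rendered, size=None):
    """True if every line of the rendered graph fits in the terminal window."""
    cols, rows = size or shutil.get_terminal_size(fallback=(80, 25))
    lines = rendered.splitlines()
    return len(lines) <= rows and all(len(line) <= cols for line in lines)

def choose_view(rendered, interactive, size=None):
    """Return 'print' to dump the ascii graph, or 'ask' to offer the interactive view."""
    if fits_terminal(rendered, size):
        return "print"
    # Only prompt when the session is interactive; scripts just get the output.
    return "ask" if interactive else "print"

huge = "\n".join(["x"] * 4433)
print(choose_view(huge, interactive=True, size=(80, 25)))
```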
---
**How do I propose to solve [#9716](https://github.com/radare/radare2/issues/9716)**
The problem is that agj, when provided without argument, uses core->offset as the argument, so doing `agj @ 0` is equal to doing `agj 0`.
When the argument is 0, it prints the json data for the graph of the functions in the whole range (graph.from, graph.to). If the range was never set by the user (that means both graph.from and graph.to are equal to `UT64_MAX`, or `0xffffffffffffffff`), it prints the json of every function in the file.
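To make the current behaviour easier to reason about, here is a small model of it as described above. This is only a sketch of the decision logic, not the actual C code: `agj_scope` and its return values are hypothetical names.

```python
UT64_MAX = 0xFFFFFFFFFFFFFFFF

def agj_scope(arg, offset, graph_from=UT64_MAX, graph_to=UT64_MAX):
    """Model which functions agj would cover, following the description above."""
    # Without an argument, the current offset is used instead.
    addr = offset if arg is None else arg
    if addr != 0:
        return ("function", addr)
    # Address 0 is special: it means "use the graph range".
    if graph_from == UT64_MAX and graph_to == UT64_MAX:
        return ("all",)  # range never set: every function in the file
    return ("range", graph_from, graph_to)

# `agj @ 0` and `agj 0` end up in the same branch:
print(agj_scope(None, 0), agj_scope(0, 0x1000))
```

The surprising part is the first case: seeking to 0 and running plain `agj` silently turns into "every function in the file".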
*What I think we can do:*
1. If we adopt the convention that global graphs are printed with a capital letter, then we could remove that `UT64_MAX` check: to get the global bb graph in json format, one would use `agFj`. In this way, address 0 would be a special argument only if the graph range has been set by the user.
2. Alternatively, we could change how the graph range is given by the user. We could for example use a graph range only if the user gives two arguments to a graph function, like this:`agj 0x000a00 0x000b00`. In this way, address 0 would never be a special argument, and would be always treated like any other address (imho this is more intuitive and less error prone).
<pancake> the problem with using spaces to do that is that rnum_math handles spaces and "1 + 2" is a valid expression, so i would use commas or just a one-word glob expression
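A sketch of what the comma variant of option 2 could look like, so that address 0 is never special. The syntax (`from,to`) and the helper name `parse_range_arg` are assumptions for discussion only; in r2 each part would go through the usual number/expression parsing rather than Python's `int`.

```python
def parse_range_arg(arg):
    """Parse 'addr' or 'from,to' into a tagged tuple; 0 is treated like any address."""
    parts = [p.strip() for p in arg.split(",")]
    if len(parts) == 1:
        return ("address", int(parts[0], 0))
    if len(parts) == 2:
        start, end = int(parts[0], 0), int(parts[1], 0)
        if start > end:
            raise ValueError("range start is after range end")
        return ("range", start, end)
    raise ValueError("expected 'addr' or 'from,to'")

print(parse_range_arg("0x000a00,0x000b00"))
print(parse_range_arg("0"))
```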
<thestr4ng3r> I would like to fix this asap, because it is relevant for Cutter. So, which way should I fix it? Or should cyanpencil do this explicitly because it is part of gsoc?