add markdown formatter / exporter #1976
base: main
Thank you @mayel, I think the general direction is good and I think we can continue exploring it. The testing structure will be very important too. Just a heads up, we will be slow with reviews on our side, since we are focused on Elixir v1.18 and launching Livebook Teams.
This is awesome @mayel! And thanks for letting me know about this PR :) Having spent some time thinking about this, a few requirements suggestions for discussion/debate. My ideal Markdown export would generate:
I would also recommend generating all of this by default, so tooling can start to rely on these files existing :)
@mjrusso Thanks for the feedback! The structure of the files and markdown contents should already match that of the html docs. In terms of a single file, that was the intention of the ZIP archive containing all the md docs, so it can easily be downloaded from hexdocs and devs can choose which files/modules they want to add as context rather than always including everything, but now I'm thinking we could add a CLI flag to generate either a single file or a ZIP with separate files, and leave the discussion of what the default should be for later?... I've pushed some WIP I hadn't staged which includes generating an
Perfect :) I was mostly trying to enumerate my ideal requirements independent of what was already written, just for ease of debate.
In general my preference would be to make Markdown generation (in whatever form we decide) the default, with no other configuration options (other than disabling it if you really don't want it) so tools can rely on a common approach. On the topic of the zip, single file generation, etc.:
Since we can already download a tarball of all docs from hex.pm (the md files, if generated, would be included by default there as well, correct?), I think we can forego the zip archive. Easy enough to get the markdown files from there. (And would these be fetched by default with Thinking through this a bit more, I think we could forego the single-file generation (at least for now). Realistically for AI tooling integration that works we are going to need a server in between that can manage pulling the right chunks of documentation for any given task. The individual md files being produced here provide the right building blocks. (Also, instead of "Download Markdown version", perhaps "View Markdown version", which just links to the index.md file.)
Ah the downloadable docs from hex.pm had completely slipped me by! It may be useful to also include that link in the doc footers next to the ePub. And yeah all makes sense to me, hoping I find some time to work on it a bit more soon :)
The epub is included in that ZIP so I'm guessing yes
OK I'm starting to feel pretty good with the generated output (tested with a bunch of projects), probably missing some things but could use some feedback on the implementation and test coverage :)
@doc """
Transform AST into a markdown string.
"""
def to_markdown_string(ast, fun \\ fn _ast, string -> string end)
Why not render the original content instead? 🤔
We do when the original content is markdown. This was needed for cases where an AST node is created manually, like for type specs.
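For readers following along, here is a minimal sketch of what an AST-to-Markdown walk with that callback could look like. It is not the code from this PR: the module name `MarkdownSketch`, the tag coverage, and the assumed node shape (`{tag, attrs, children, meta}` tuples, binaries, and lists, with `attrs` as a keyword list) are all assumptions made for illustration.

```elixir
defmodule MarkdownSketch do
  # Hypothetical module, for illustration only.
  # Assumed AST shape: a binary, a list of nodes, or a
  # {tag, attrs, children, meta} tuple with attrs as a keyword list.

  # The callback receives each tuple node and its rendered string,
  # and returns the string to use (identity by default).
  def to_markdown_string(ast, fun \\ fn _ast, string -> string end)

  def to_markdown_string(binary, _fun) when is_binary(binary), do: binary

  def to_markdown_string(list, fun) when is_list(list) do
    Enum.map_join(list, "", &to_markdown_string(&1, fun))
  end

  def to_markdown_string({tag, attrs, children, _meta} = ast_node, fun) do
    inner = to_markdown_string(children, fun)
    fun.(ast_node, render(tag, attrs, inner))
  end

  # Only a handful of tags are covered; unknown tags fall back to their text.
  defp render(:p, _attrs, inner), do: inner <> "\n\n"
  defp render(:code, _attrs, inner), do: "`" <> inner <> "`"
  defp render(:em, _attrs, inner), do: "*" <> inner <> "*"
  defp render(:strong, _attrs, inner), do: "**" <> inner <> "**"

  defp render(:a, attrs, inner) do
    "[" <> inner <> "](" <> Keyword.get(attrs, :href, "") <> ")"
  end

  defp render(_tag, _attrs, inner), do: inner
end

# A manually built node, similar in spirit to what is done for type specs.
ast = {:p, [], ["See ", {:a, [href: "Enum.html"], [{:code, [], ["Enum"], %{}}], %{}}, "."], %{}}
IO.write(MarkdownSketch.to_markdown_string(ast))
```

The callback runs after each tuple node is rendered, so a caller can post-process individual nodes (for example, dropping links) without changing the walk itself.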
I see. In this case maybe we should push the functionality to the retriever, so it adds specs both in text and html format.
Or maybe we use a separate function that knows how to render the specs for a given node with the given format.
What about the autolink functions? They depend on the parsed AST of the extra markdown docs and transform it.
Edit: ah I'm not currently using to_markdown_string to use that transformed AST for the guides, but would be good to do so IMO...
Do we want the markdown links in this case? Would the markdown links be useful for man pages? cc @eksperimental
Dunno. And on the LLM side not sure if links would make a difference, any idea @mjrusso?
I agree with @mjrusso that working hyperlinks between all .md would be nice to have in general. For man pages, I'm not sure they're required, but they might be useful to be able to extract the links and do some ad-hoc processing. Man pages are often referring to other man pages on the form "name(section)", especially under SEE ALSO in the bottom of a man page.
For the OTP man pages, it's probably more useful to refer to functions using Erlang syntax like keyfind/3 or lists:keyfind/3. But maybe for modules, we can refer to them using the man page syntax like maps(3erl).
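As a rough illustration of the reference styles mentioned above, the following sketch formats module and function references. `ManRefSketch` and its function names are hypothetical helpers, not part of ExDoc or OTP, and the default section `"3erl"` is simply taken from the `maps(3erl)` example.

```elixir
defmodule ManRefSketch do
  # Hypothetical helper, for illustration only.

  # Man page style reference to a module, e.g. maps(3erl).
  def module_ref(module, section \\ "3erl"), do: "#{module}(#{section})"

  # Erlang style reference to a remote function, e.g. lists:keyfind/3.
  def function_ref(module, name, arity), do: "#{module}:#{name}/#{arity}"

  # Erlang style reference to a local function, e.g. keyfind/3.
  def function_ref(name, arity), do: "#{name}/#{arity}"
end

IO.puts(ManRefSketch.module_ref(:maps))
IO.puts(ManRefSketch.function_ref(:lists, :keyfind, 3))
IO.puts(ManRefSketch.function_ref(:keyfind, 3))
```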
Thanks everyone for the work so far. I believe this is a great direction and, at the same time, it shows we need to do some cleanup before moving forward:
Is this about DRY and not having duplicate logic (which I tried to address by introducing the
Yeah it's generating separate files now, following the same structure/naming as the html ones.
Ah yeah seems so, I can look at that PR to see if there's any approach or piece of code (thinking especially of the templates) that looks better and port them to this one?
Hi everyone. I would like to discuss this in more detail. I think we could open an issue so we don't divert the conversation from this PR? While doing the formatter I noticed the duplication but also the limitations of the current approach.
I created a gist with the markdown docs for the Erlang stdlib. I think the results look good, but as mentioned in other comments it could be nice to have links working. I also think that specs/types/callbacks should be inside I also noticed that the I did a quick attempt at fixing markdown_to_man.escript and the generated output looks nice enough: though one can probably spend an infinite amount of time fixing the many many small formatting issues that pop up in various places.
how are links not working? and yeah it's either having links or formatting there, not sure which is preferable...
were you going to open an issue @eksperimental? otherwise not sure how to proceed here @josevalim?
I should also say that we are adding search over HexDocs, which would allow you to search only certain packages for a given term, and submit the filtered results to an LLM. Would that be better than giving the whole docs of a bunch of deps? Which I assume would consume too many tokens?
Yes, there is downstream work required to effectively use the documentation (at least for LLM consumption), which also happens to look a lot like a search problem. This Livebook is a simple prototype; there's tons of opportunities for improvement but it does work and provide reasonable results. (This happens to use hex2txt to get the docs, but the nice part is that the approach is general and could work with any Markdown as input. I want a standalone app like this that I can run locally and that is exposed as a Model Context Protocol server, but that's getting off topic :)
The links in specs are working, but the autolinks in the markdown documentation do not (that is
I'm going to guess that this depends on what it will be used for. For the use case that @zuiderkwast wants (that is, converting to man pages), the formatting is preferable as there are no links in man pages anyway. Either way it is easy enough for some postprocessing tool to strip links and re-format the specs.
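To make the postprocessing idea concrete, here is a minimal sketch of stripping inline Markdown links while keeping their text. `StripLinksSketch` is a hypothetical module, and the regex is an assumption that only covers simple `[text](target)` links; reference-style links, images, and targets containing spaces or parentheses are not handled.

```elixir
defmodule StripLinksSketch do
  # Hypothetical postprocessing helper, for illustration only.

  # Matches inline links of the form [text](target), where the target
  # contains no whitespace or closing parenthesis.
  defp inline_link, do: ~r/\[([^\]]+)\]\([^)\s]+\)/

  # Replaces each inline link with its link text.
  def strip_links(markdown) when is_binary(markdown) do
    Regex.replace(inline_link(), markdown, "\\1")
  end
end

IO.puts(StripLinksSketch.strip_links("See [`Enum.map/2`](Enum.html#map/2) for details."))
```

A real tool would more likely work on the parsed document than on raw text, so that link-like text inside code blocks is left alone.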
This is a quick proof of concept of https://elixirforum.com/t/generate-docs-markdown-similar-to-epub/67946