add markdown formatter / exporter #1976

Open: wants to merge 10 commits into base: main. The diff below shows changes from 8 commits.
5 changes: 4 additions & 1 deletion lib/ex_doc/autolink.ex
@@ -99,10 +99,13 @@ defmodule ExDoc.Autolink do
 if app in config.apps do
 path <> ext <> suffix
 else
+# TODO: remove this if/when hexdocs.pm starts including .md files
+ext = ".html"
+
 config.deps
 |> Keyword.get_lazy(app, fn -> base_url <> "#{app}" end)
 |> String.trim_trailing("/")
-|> Kernel.<>("/" <> path <> ".html" <> suffix)
+|> Kernel.<>("/" <> path <> ext <> suffix)
 end
 else
 path <> ext <> suffix
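The hunk above keeps the configured output extension for the apps being documented but forces `.html` for links into dependencies, because (per the TODO) hexdocs.pm does not serve `.md` files. The standalone sketch below reproduces that branch so the resulting URLs can be checked in isolation. The module `LinkSketch`, its `target/6` signature, and the `https://hexdocs.pm/` default base URL are illustrative assumptions, not ExDoc's API.

```elixir
defmodule LinkSketch do
  # Assumed default base URL for dependency docs (illustrative only).
  @base_url "https://hexdocs.pm/"

  # Returns the link target for `path` (e.g. a module page) in `app`.
  # `apps` are the apps being documented; `deps` maps dependency apps to doc URLs.
  def target(app, path, ext, suffix, apps, deps) do
    if app in apps do
      # Local docs: use the output extension, which may be ".md".
      path <> ext <> suffix
    else
      # Dependency docs are assumed to be HTML only, so `ext` is overridden.
      ext = ".html"

      deps
      |> Keyword.get_lazy(app, fn -> @base_url <> "#{app}" end)
      |> String.trim_trailing("/")
      |> Kernel.<>("/" <> path <> ext <> suffix)
    end
  end
end
```

With `.md` output, a link to a local module stays `Foo.md#bar`, while a link into a dependency such as `:jason` still resolves to an `.html` page on hexdocs.pm.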
66 changes: 65 additions & 1 deletion lib/ex_doc/doc_ast.ex
@@ -27,7 +27,7 @@ defmodule ExDoc.DocAST do
 meta param source track wbr)a

 @doc """
-Transform AST into string.
+Transform AST into an HTML string.
 """
 def to_string(ast, fun \\ fn _ast, string -> string end)

@@ -64,6 +64,70 @@ defmodule ExDoc.DocAST do
Enum.map(attrs, fn {key, val} -> " #{key}=\"#{val}\"" end)
end

@doc """
Transform AST into a markdown string.
"""
def to_markdown_string(ast, fun \\ fn _ast, string -> string end)
Review thread:

Member: Why not render the original content instead? 🤔

Contributor Author: We do when the original content is markdown. This was needed for cases where an AST node is created manually, like for type specs.

Member: I see. In that case, maybe we should push the functionality to the retriever, so it adds specs in both text and HTML formats.

Member: Or maybe we use a separate function that knows how to render the specs for a given node in the given format.

Contributor Author (@mayel, Jan 2, 2025): What about the autolink functions? They depend on the parsed AST of the extra markdown docs and transform it.

Edit: ah, I'm not currently using to_markdown_string to render that transformed AST for the guides, but it would be good to do so IMO...

Member: Do we want the markdown links in this case? Would the markdown links be useful for man pages? cc @eksperimental

Contributor Author: Dunno. And on the LLM side, I'm not sure links would make a difference. Any idea, @mjrusso?

Reply: I agree with @mjrusso that working hyperlinks between all .md files would be nice to have in general. For man pages, I'm not sure they're required, but they might be useful for extracting the links and doing some ad-hoc processing. Man pages often refer to other man pages in the form "name(section)", especially under SEE ALSO at the bottom of a man page.

For the OTP man pages, it's probably more useful to refer to functions using Erlang syntax, like keyfind/3 or lists:keyfind/3. For modules, though, maybe we can refer to them using the man page syntax, like maps(3erl).

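The reference conventions suggested in the last comment can be sketched as a few small formatting helpers. This is a hypothetical illustration, not part of the PR: the module `ManRefSketch`, its function names, and the default `3erl` section (taken from the `maps(3erl)` example) are all assumptions.

```elixir
defmodule ManRefSketch do
  # Assumed default section, taken from the maps(3erl) example in the thread.
  @default_section "3erl"

  # Module reference in man page syntax, e.g. "maps(3erl)".
  def module_ref(module, section \\ @default_section) do
    "#{module}(#{section})"
  end

  # Function reference in Erlang syntax: "keyfind/3" when local,
  # "lists:keyfind/3" when qualified with a module.
  def function_ref(nil, fun, arity), do: "#{fun}/#{arity}"
  def function_ref(module, fun, arity), do: "#{module}:#{fun}/#{arity}"

  # A SEE ALSO line listing related modules as man page references.
  def see_also(modules) do
    "SEE ALSO\n  " <> Enum.map_join(modules, ", ", &module_ref/1)
  end
end
```

For example, `ManRefSketch.see_also([:maps, :lists])` yields a SEE ALSO line listing `maps(3erl), lists(3erl)`.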

def to_markdown_string(binary, _fun) when is_binary(binary) do
ExDoc.Utils.h(binary)
end

def to_markdown_string(list, fun) when is_list(list) do
result = Enum.map_join(list, "", &to_markdown_string(&1, fun))
fun.(list, result)
end

def to_markdown_string({:comment, _attrs, inner, _meta} = ast, fun) do
fun.(ast, "<!--#{inner}-->")
end

def to_markdown_string({:code, _attrs, inner, _meta} = ast, fun) do
result = """
```
#{inner}
```
"""

fun.(ast, result)
end

def to_markdown_string({:a, attrs, inner, _meta} = ast, fun) do
result = "[#{to_markdown_string(inner, fun)}](#{attrs[:href]})"
fun.(ast, result)
end

def to_markdown_string({:hr, _attrs, _inner, _meta} = ast, fun) do
result = "\n\n---\n\n"
fun.(ast, result)
end

def to_markdown_string({tag, _attrs, inner, _meta} = ast, fun) when tag in [:p, :br] do
result = to_markdown_string(inner, fun) <> "\n\n"
fun.(ast, result)
end

def to_markdown_string({:img, attrs, _inner, _meta} = ast, fun) do
result = "![#{attrs[:alt]}](#{attrs[:src]} \"#{attrs[:title]}\")"
fun.(ast, result)
end

# ignoring these: area base col command embed input keygen link meta param source track wbr
def to_markdown_string({tag, _attrs, _inner, _meta} = ast, fun) when tag in @void_elements do
result = ""
fun.(ast, result)
end

def to_markdown_string({_tag, _attrs, inner, %{verbatim: true}} = ast, fun) do
result = Enum.join(inner, "")
fun.(ast, result)
end

def to_markdown_string({_tag, _attrs, inner, _meta} = ast, fun) do
result = to_markdown_string(inner, fun)
fun.(ast, result)
end

## parse markdown

defp parse_markdown(markdown, opts) do
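For comparison, here is a minimal sketch of a recursive AST-to-markdown renderer over the same `{tag, attrs, children, meta}` node shape. It recurses into the children of every element, so nested emphasis, links, and inline code survive. It also treats `code` as a fenced block only when it is wrapped in `pre`, as in HTML. The module name, the handled tag set, and the assumption that a code block's `class` attribute holds the language are illustrative, not ExDoc's implementation.

```elixir
defmodule MarkdownSketch do
  # Hypothetical renderer (not ExDoc's code): turns a list of
  # {tag, attrs, children, meta} nodes into a markdown string.

  # Build the fence at compile time so the source has no literal triple backtick.
  @fence String.duplicate("`", 3)

  # Text nodes pass through unchanged (no HTML escaping).
  def render(text) when is_binary(text), do: text

  # Lists of nodes are rendered in order and concatenated.
  def render(nodes) when is_list(nodes), do: Enum.map_join(nodes, "", &render/1)

  # <pre><code>: a fenced block; the code's class, if any, is used as the language.
  def render({:pre, _attrs, [{:code, attrs, inner, _}], _meta}) do
    lang = Keyword.get(attrs, :class, "")
    "#{@fence}#{lang}\n#{render(inner)}\n#{@fence}\n\n"
  end

  # A <code> that is not wrapped in <pre> is inline code.
  def render({:code, _attrs, inner, _meta}), do: "`#{render(inner)}`"

  def render({:p, _attrs, inner, _meta}), do: render(inner) <> "\n\n"
  def render({:em, _attrs, inner, _meta}), do: "*#{render(inner)}*"
  def render({:strong, _attrs, inner, _meta}), do: "**#{render(inner)}**"
  def render({:a, attrs, inner, _meta}), do: "[#{render(inner)}](#{attrs[:href]})"

  # <h1>..<h6> become "#"-prefixed lines; the level is read from the tag name.
  def render({tag, _attrs, inner, _meta}) when tag in [:h1, :h2, :h3, :h4, :h5, :h6] do
    level = tag |> Atom.to_string() |> String.last() |> String.to_integer()
    String.duplicate("#", level) <> " " <> render(inner) <> "\n\n"
  end

  # Any other element: keep its rendered children and drop the tag itself.
  def render({_tag, _attrs, inner, _meta}), do: render(inner)
end
```

Distinguishing `pre > code` from bare `code` keeps inline code spans inline instead of turning every `code` node into a fenced block.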
247 changes: 247 additions & 0 deletions lib/ex_doc/formatter.ex
@@ -0,0 +1,247 @@
defmodule ExDoc.Formatter do
@moduledoc false

alias ExDoc.{Markdown, GroupMatcher, Utils}

@doc """
Autolinks and renders all docs.
"""
def render_all(project_nodes, filtered_modules, ext, config, opts) do
base = [
apps: config.apps,
deps: config.deps,
ext: ext,
extras: extra_paths(config),
skip_undefined_reference_warnings_on: config.skip_undefined_reference_warnings_on,
skip_code_autolink_to: config.skip_code_autolink_to,
filtered_modules: filtered_modules
]

project_nodes
|> Task.async_stream(
fn node ->
language = node.language

autolink_opts =
[
current_module: node.module,
file: node.moduledoc_file,
line: node.moduledoc_line,
module_id: node.id,
language: language
] ++ base

docs =
for child_node <- node.docs do
id = id(node, child_node)

autolink_opts =
autolink_opts ++
[
id: id,
line: child_node.doc_line,
file: child_node.doc_file,
current_kfa: {child_node.type, child_node.name, child_node.arity}
]

specs = Enum.map(child_node.specs, &language.autolink_spec(&1, autolink_opts))
child_node = %{child_node | specs: specs}
render_doc(child_node, ext, language, autolink_opts, opts)
end

%{
render_doc(node, ext, language, [{:id, node.id} | autolink_opts], opts)
| docs: docs
}
end,
timeout: :infinity
)
|> Enum.map(&elem(&1, 1))
end

defp render_doc(%{doc: nil} = node, _ext, _language, _autolink_opts, _opts),
do: node

defp render_doc(%{doc: doc} = node, ext, language, autolink_opts, opts) do
rendered = autolink_and_render(doc, ext, language, autolink_opts, opts)
%{node | rendered_doc: rendered}
end

defp id(%{id: mod_id}, %{id: "c:" <> id}) do
"c:" <> mod_id <> "." <> id
end

defp id(%{id: mod_id}, %{id: "t:" <> id}) do
"t:" <> mod_id <> "." <> id
end

defp id(%{id: mod_id}, %{id: id}) do
mod_id <> "." <> id
end

defp autolink_and_render(doc, ".md", language, autolink_opts, _opts) do
doc
|> language.autolink_doc(autolink_opts)
|> ExDoc.DocAST.to_markdown_string()
end

defp autolink_and_render(doc, _html_ext, language, autolink_opts, opts) do
doc
|> language.autolink_doc(autolink_opts)
|> ExDoc.DocAST.to_string()
|> ExDoc.DocAST.highlight(language, opts)
end

@doc """
Builds extra nodes by normalizing the config entries.
"""
def build_extras(config, ext) do
groups = config.groups_for_extras

language =
case config.proglang do
:erlang -> ExDoc.Language.Erlang
_ -> ExDoc.Language.Elixir
end

source_url_pattern = config.source_url_pattern

autolink_opts = [
apps: config.apps,
deps: config.deps,
ext: ext,
extras: extra_paths(config),
language: language,
skip_undefined_reference_warnings_on: config.skip_undefined_reference_warnings_on,
skip_code_autolink_to: config.skip_code_autolink_to
]

extras =
config.extras
|> Task.async_stream(
&build_extra(&1, groups, ext, language, autolink_opts, source_url_pattern),
timeout: :infinity
)
|> Enum.map(&elem(&1, 1))

ids_count = Enum.reduce(extras, %{}, &Map.update(&2, &1.id, 1, fn c -> c + 1 end))

extras
|> Enum.map_reduce(1, fn extra, idx ->
if ids_count[extra.id] > 1, do: {disambiguate_id(extra, idx), idx + 1}, else: {extra, idx}
end)
|> elem(0)
|> Enum.sort_by(fn extra -> GroupMatcher.index(groups, extra.group) end)
end

defp build_extra(
{input, input_options},
groups,
ext,
language,
autolink_opts,
source_url_pattern
) do
input = to_string(input)
id = input_options[:filename] || input |> filename_to_title() |> Utils.text_to_id()
source_file = input_options[:source] || input
opts = [file: source_file, line: 1]

{source, ast} =
case extension_name(input) do
extension when extension in ["", ".txt"] ->
source = File.read!(input)
ast = [{:pre, [], "\n" <> source, %{}}]
{source, ast}

extension when extension in [".md", ".livemd", ".cheatmd"] ->
source = File.read!(input)

ast =
source
|> Markdown.to_ast(opts)
|> sectionize(extension)

{source, ast}

_ ->
raise ArgumentError,
"file extension not recognized, allowed extension is either .cheatmd, .livemd, .md, .txt or no extension"
end

{title_ast, ast} =
case ExDoc.DocAST.extract_title(ast) do
{:ok, title_ast, ast} -> {title_ast, ast}
:error -> {nil, ast}
end

title_text = title_ast && ExDoc.DocAST.text_from_ast(title_ast)
title_html = title_ast && ExDoc.DocAST.to_string(title_ast)
content_html = autolink_and_render(ast, ext, language, [file: input] ++ autolink_opts, opts)

group = GroupMatcher.match_extra(groups, input)
title = input_options[:title] || title_text || filename_to_title(input)

source_path = source_file |> Path.relative_to(File.cwd!()) |> String.replace_leading("./", "")
source_url = Utils.source_url_pattern(source_url_pattern, source_path, 1)

%{
source: source,
content: content_html,
group: group,
id: id,
source_path: source_path,
source_url: source_url,
title: title,
title_content: title_html || title
}
end

defp build_extra(input, groups, ext, language, autolink_opts, source_url_pattern) do
build_extra({input, []}, groups, ext, language, autolink_opts, source_url_pattern)
end

defp extra_paths(config) do
Map.new(config.extras, fn
path when is_binary(path) ->
base = Path.basename(path)
{base, Utils.text_to_id(Path.rootname(base))}

{path, opts} ->
base = path |> to_string() |> Path.basename()
{base, opts[:filename] || Utils.text_to_id(Path.rootname(base))}
end)
end

defp disambiguate_id(extra, discriminator) do
Map.put(extra, :id, "#{extra.id}-#{discriminator}")
end

defp sectionize(ast, ".cheatmd") do
ExDoc.DocAST.sectionize(ast, fn
{:h2, _, _, _} -> true
{:h3, _, _, _} -> true
_ -> false
end)
end

defp sectionize(ast, _), do: ast

defp filename_to_title(input) do
input |> Path.basename() |> Path.rootname()
end

def filter_list(:module, nodes) do
Enum.filter(nodes, &(&1.type != :task))
end

def filter_list(type, nodes) do
Enum.filter(nodes, &(&1.type == type))
end

def extension_name(input) do
input
|> Path.extname()
|> String.downcase()
end
end
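The private `id/2` helper in `ExDoc.Formatter` above joins the module id onto each child id while keeping the `c:` (callback) and `t:` (type) markers at the front. Below is a standalone copy, made public so it can be exercised directly. The child id shapes used in the examples, such as `"map/2"` and `"c:init/1"`, are assumptions about how ExDoc names child nodes.

```elixir
defmodule IdSketch do
  # Same clauses as the private id/2 in ExDoc.Formatter, made public here.
  # Callback ids keep their "c:" prefix in front of the module id.
  def id(%{id: mod_id}, %{id: "c:" <> id}), do: "c:" <> mod_id <> "." <> id
  # Type ids likewise keep "t:" in front.
  def id(%{id: mod_id}, %{id: "t:" <> id}), do: "t:" <> mod_id <> "." <> id
  # Any other child id is simply appended after the module id.
  def id(%{id: mod_id}, %{id: id}), do: mod_id <> "." <> id
end
```

Keeping the prefix first means a callback and a function with the same name and arity in one module still get distinct ids, for example `c:GenServer.init/1` versus `GenServer.init/1`.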
11 changes: 7 additions & 4 deletions lib/ex_doc/formatter/epub.ex
@@ -4,6 +4,7 @@ defmodule ExDoc.Formatter.EPUB do
 @mimetype "application/epub+zip"
 @assets_dir "OEBPS/assets"
 alias __MODULE__.{Assets, Templates}
+alias ExDoc.Formatter
 alias ExDoc.Formatter.HTML
 alias ExDoc.Utils

@@ -17,16 +18,18 @@ defmodule ExDoc.Formatter.EPUB do
 File.mkdir_p!(Path.join(config.output, "OEBPS"))

 project_nodes =
-HTML.render_all(project_nodes, filtered_modules, ".xhtml", config, highlight_tag: "samp")
+Formatter.render_all(project_nodes, filtered_modules, ".xhtml", config,
+highlight_tag: "samp"
+)

 nodes_map = %{
-modules: HTML.filter_list(:module, project_nodes),
-tasks: HTML.filter_list(:task, project_nodes)
+modules: Formatter.filter_list(:module, project_nodes),
+tasks: Formatter.filter_list(:task, project_nodes)
 }

 extras =
 config
-|> HTML.build_extras(".xhtml")
+|> Formatter.build_extras(".xhtml")
 |> Enum.chunk_by(& &1.group)
 |> Enum.map(&{hd(&1).group, &1})

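`Formatter.build_extras/2`, which the EPUB formatter now calls instead of the HTML formatter's copy, renames extras whose ids collide by appending a counter. Below is a standalone sketch of that logic operating on plain maps. The `ExtraIdSketch` name and the map-only input are assumptions for illustration. The counting and suffixing steps mirror the `ids_count` / `Enum.map_reduce` code in `lib/ex_doc/formatter.ex`.

```elixir
defmodule ExtraIdSketch do
  # Mirrors the duplicate-id handling in ExDoc.Formatter.build_extras/2:
  # only extras whose id occurs more than once get a numeric suffix,
  # and the shared counter advances only when a suffix is actually used.
  def disambiguate(extras) do
    # Count how many extras share each id.
    ids_count = Enum.reduce(extras, %{}, &Map.update(&2, &1.id, 1, fn c -> c + 1 end))

    extras
    |> Enum.map_reduce(1, fn extra, idx ->
      if ids_count[extra.id] > 1,
        do: {Map.put(extra, :id, "#{extra.id}-#{idx}"), idx + 1},
        else: {extra, idx}
    end)
    |> elem(0)
  end
end
```

Because the counter is shared across all colliding ids, `[a, b, a, b]` becomes `a-1, b-2, a-3, b-4` rather than restarting at 1 for each id.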