Feature request: automatically generated ids for headers

zwol — 2014-09-03T17:49:39-04:00 — #1

One of the big lacunae in Markdown (IMNSHO) is that there's no way to get id="" attributes in the generated HTML, and therefore, no way to use fragment references when linking to the containing document.

A general mechanism for adding id attributes to generated HTML elements seems Too Hard, and any mechanism for adding user-specified attributes to generated elements is going to have backward compatibility headaches. However, id attributes are most useful on headers, and it's possible to generate id attributes algorithmically from the header text, so I'd like to propose that Markdown should do that. A reasonable algorithm might look something like:

1. Apply aggressive Unicode normalization (NFKC + lowercasing) to the text of the header.
2. Replace all characters not in Unicode categories [L*] and [N*] (that is, that are neither letters nor numbers) with - characters.
3. Compress all runs of - characters to a single -.
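
For illustration, the three steps above might be sketched in JavaScript roughly as follows. The function name headerId is made up for this sketch, and it assumes a runtime with String.prototype.normalize and Unicode property escapes in regular expressions. Applied literally, the steps leave any leading or trailing - in place; trimming it would be an extra rule that the proposal does not state.

```javascript
// Hypothetical helper (not part of any spec): derive an id from header text
// by applying the three proposed steps literally.
function headerId(text) {
  return text
    .normalize("NFKC")                 // step 1a: compatibility normalization
    .toLowerCase()                     // step 1b: lowercasing
    .replace(/[^\p{L}\p{N}]/gu, "-")   // step 2: non-letters/non-numbers become "-"
    .replace(/-+/g, "-");              // step 3: collapse runs of "-"
}

console.log(headerId("JS Conf slides - 2011")); // js-conf-slides-2011
console.log(headerId("Hello, World!"));         // hello-world-  (trailing "-" is kept)
```

Two headers with the same text still produce the same id under this sketch, so uniqueness within the document would need a separate rule.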

I'm sure this can be bikeshedded, but the important thing is to get implementations consistently doing something, not the exact details.

imsky — 2014-09-03T18:06:23-04:00 — #2

This needs to be optional or at least namespaced. Two cases present problems:

1. Existing IDs in the DOM, e.g. # Main should not conflict with <div id="main">
2. Existing IDs in the Markdown can cause unnecessary confusion, e.g. # Main ... # Main causes two <h1 id="main"> elements. GitHub solves this with an additional suffix, though that seems like a kludge.
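
To make the suffix idea concrete, here is a minimal sketch of one possible counter-suffix scheme. It is an assumption for illustration, not a description of what GitHub actually does; the name makeIdDeduper and the "-1", "-2" suffix format are invented here. Seeding it with ids already present on the page is one way a template engine could also avoid the first kind of conflict.

```javascript
// Hypothetical helper: returns a function that hands out ids that are
// unique among everything it has seen so far.
// `reserved` is an optional list of ids already used elsewhere on the page.
function makeIdDeduper(reserved = []) {
  const used = new Set(reserved);
  return function unique(id) {
    let candidate = id;
    let n = 1;
    // Append -1, -2, ... until the candidate is unused.
    while (used.has(candidate)) {
      candidate = `${id}-${n}`;
      n += 1;
    }
    used.add(candidate);
    return candidate;
  };
}

const unique = makeIdDeduper(["main"]); // e.g. <div id="main"> already exists
console.log(unique("main"));  // main-1
console.log(unique("main"));  // main-2
console.log(unique("intro")); // intro
```

The drawback raised in this thread still applies: the suffix an author ends up with depends on what precedes the header, so the resulting id is harder to predict when writing links.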

poke — 2014-09-03T18:12:07-04:00 — #3

I actually think the opposite. If you would be adding automatically generated ids to the standard specification, then there absolutely must be a clear specification of how that’s going to work. So when using links in the document that link to separate sections/headers, those links can be specified exactly without having implementation-specific differences.

zwol — 2014-09-03T20:13:16-04:00 — #4

@imsky said: This needs to be optional or at least namespaced.

I would like not to have to make this feature depend on adding a file-metadata mechanism, which seems like a can of worms in itself.

@imsky said: Existing IDs in the DOM, e.g. # Main should not conflict with <div id="main">

I sort-of feel like resolving this class of conflict should be the responsibility of whatever template engine is embedding Markdown content in a larger structure with existing IDs. Of course the problem with that logic is then the author of the Markdown content can no longer predict what the fragment IDs are going to be.

@imsky said: # Main ... # Main causes two <h1 id="main"> elements. GitHub solves this with an additional suffix, though that seems like a kludge.

I'm not sure there's any way to deal with that that isn't gonna feel like a kludge to some extent.

zwol — 2014-09-03T20:17:07-04:00 — #5

@poke said: If you would be adding automatically generated ids to the standard specification, then there absolutely must be a clear specification of how that’s going to work.

You misunderstand me -- I definitely think the algorithm should be spelled out in the spec; I just think there are several possible ways the algorithm could work and any one of them would be fine. All it needs to do is generate unique-in-the-document fragment IDs, and I suppose it would be nice (but not strictly necessary, because you can always \-escape) if they were valid CSS #name tokens.

tabatkins — 2014-09-03T21:00:28-04:00 — #6

Markdown Extra allows special attributes to add classes and ids to some constructs, notably headers. It looks like:

I'm a header {#the-id}
============

zwol — 2014-09-03T21:31:14-04:00 — #7

That syntax seems as good as any I could come up with (I'd probably have omitted the #, but that's a detail) ... but I do not think it obviates the need for mechanically generated ids in general. It's valuable to have a guarantee that all subheadings will have an id, whether or not it was explicitly specified.

codinghorror — 2014-09-03T21:35:24-04:00 — #8

I agree that whenever you have headers, you pretty much always want an implicit anchor there too.

tabatkins — 2014-09-03T21:43:22-04:00 — #9

@zwol: The reason for the # in there is that it also supports classes, by preceding the word with a dot (.). It's CSS syntax, basically.

My own experience with auto-genning IDs suggests that it's often not a great idea. If you try to gen them from text, you get some terrible anchors, and they're not stable against wording changes; if you gen them from a counter or something similar, they're not stable against reordering or adding/deleting sections above them. You can make it slightly better if you pay attention to the outline generated by the headings, as then sub-headings before you won't matter, but still.

In my spec-generation tool that accepts some Markdown, I auto-gen IDs from text but log a warning that the author should be providing an explicit ID.

If we do end up supporting autogenned IDs on headings, however, I do suggest it be based on the outline level of the heading, like "heading-2-1" for the first h2 after the second h1. As I said above, it makes it slightly more robust against document changes; only ancestors and preceding siblings of ancestors (in the outline tree) can affect your ID, rather than all preceding headings like what you get if you just number them sequentially.
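
A rough sketch of that outline-based numbering, in JavaScript. The function name outlineIds is invented for this example, the input is assumed to be the heading levels (1 to 6) in document order, and treating a skipped level as 0 is an assumption, since the scheme above does not say what to do in that case.

```javascript
// Hypothetical helper: map heading levels in document order to ids such as
// "heading-2-1" (the first h2 after the second h1).
function outlineIds(levels) {
  const counters = []; // counters[i] counts headings at level i + 1 on the current path
  return levels.map((level) => {
    // Leaving a deeper subtree: forget the counters below this level.
    counters.length = Math.min(counters.length, level);
    // Assumption: a skipped level (e.g. h1 followed directly by h3) counts as 0.
    while (counters.length < level) counters.push(0);
    counters[level - 1] += 1;
    return "heading-" + counters.join("-");
  });
}

console.log(outlineIds([1, 2, 1, 2, 2, 3]));
// [ 'heading-1', 'heading-1-1', 'heading-2', 'heading-2-1', 'heading-2-2', 'heading-2-2-1' ]
```

Adding an h3 under the first h1 leaves every id under the second h1 unchanged, which is the robustness property described above. Inserting a new h1 at the top still renumbers everything after it.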

imsky — 2014-09-03T21:54:41-04:00 — #10

Right, and how about Markdown content being loaded into independent containers - now the template engine has to check IDs globally to make sure that "news-3" does not conflict with another "news-3" somewhere else on the page.

Stack Overflow and Reddit don't add any attributes to headers. GitHub uses a workaround: it puts a named anchor link inside the header and uses JS to make document fragments (page.html#heading URLs) work:

<h1><a name="user-content-js-conf-slides---2011" class="anchor" href="#js-conf-slides---2011" rel="noreferrer"><span class="octicon octicon-link"></span></a>JS Conf slides - 2011</h1>

In Babelmark 2, fewer parsers add attributes to headers than don't: http://johnmacfarlane.net/babelmark2/?text=%23+Heading

Namespacing or leaving the anchoring behavior up to the template engine are the best ways to ensure that there are no conflicts with existing elements on the page.

imsky — 2014-09-03T21:59:46-04:00 — #11

Stability is a great point, I've seen a few spec documents where structure was preserved simply for the sake of keeping references intact.

Another thing about implicit anchors: users won't know what they are unless you add a linking element of some kind so they can copy the URL. That's what GitHub does and it works for them, at the cost of broken functionality when JS is disabled.

barraponto — 2014-09-03T22:06:38-04:00 — #12

I think automatically generated ids would be awesomely useful, but should the feature also add some sort of link, to make it easier to extract the full URL with the appropriate fragment? GitHub does that, but it seems to me that it should be the responsibility of the template engine that is embedding the Markdown content.

There are some issues with autogenerating ids that are hard to crack: should there be a limit on the length of the ids, so that they are still useful as URL fragments? Should the header text be transliterated to eliminate or transform non-ASCII characters? I think transliteration rules for several languages are still under heavy debate...

Instead, I think we should push for browser features for deep-linking to custom fragments, as proposed by Simon in his CSS Fragments spec and already implemented in some browser extensions.

stof — 2014-09-03T22:06:46-04:00 — #13

The functionality is not broken at all without JavaScript. The element is not added in JavaScript but by the renderer. Hiding it until you hover over the title is a matter of approximately 5 lines of CSS.

imsky — 2014-09-03T22:10:19-04:00 — #14

Disable JavaScript and visit https://gist.github.com/mythz/957816#track-b

Click on the link icon to test the anchors as well.

This is broken in Chrome 37 and Firefox 32 (with javascript.enabled set to false).

poke — 2014-09-03T22:13:29-04:00 — #15

It is. The point is that the exported HTML for a header looks for example like this:

<h2><a name="user-content-configuration" class="anchor" href="#configuration" aria-hidden="true"><span class="octicon octicon-link"></span></a>Configuration</h2>

As you can see, there is no id, and the name (which works as an anchor) is user-content-configuration while the link obviously goes to just #configuration.

GitHub actually uses a hashchange event listener to see if there is a user-content-foo for a hash #foo and navigates there using JavaScript. It is broken without JavaScript.

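$.hashChange(function () {
    var t, e;
    // Only act when there is a hash and no element natively matches it.
    if (location.hash && !document.querySelector(':target'))
        // Look up the prefixed anchor name and scroll to that element.
        return t = 'user-content-' + location.hash.slice(1),
            e = document.getElementsByName(t),
            $(e).scrollTo()
})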

imsky — 2014-09-03T22:31:25-04:00 — #16

CSS Fragments would be a cool feature for developers, but not so great for documents. Consider this:

http://example.com/lorem.html#css(.content:nth-child(2))

There's nothing semantic communicated in the link at all.

That said, deep linking of some kind would be very useful. Perhaps a fuzzy search query can be helpful here, for example:

<h1>Production</h1>
...
<h1>Models</h1>
<h2>Currently in production</h2>
<h3>Trucks</h3>
<h3>Buses</h3>

A search "deep link" for Production would look like this: page.html#@production, implying /^production$/i.
A link for Currently in production would look like this: page.html#@models/*production, implying /^.*production$/i under the element matched by /^models$/i.
Finally, a link for Buses would look like this: page.html#@models/*production/buses or like this: page.html#@models/*/buses.

The above approach has its issues, but it could be used to provide stability without requiring precision.
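
As a rough sketch of how such a lookup could work, under some assumptions: headings are given as a flat list of { level, text } objects in document order, each path segment becomes a case-insensitive anchored pattern in which * stands for .*, and "under" means anywhere in the section below the matched heading, not only direct children. The names segmentToRegExp and resolveDeepLink are invented for this example.

```javascript
// Hypothetical helper: turn one path segment into an anchored,
// case-insensitive pattern, with "*" meaning ".*".
function segmentToRegExp(segment) {
  const escape = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + segment.split("*").map(escape).join(".*") + "$", "i");
}

// Hypothetical helper: resolve a fragment such as "@models/*production/buses"
// (the part after "#") to the first matching heading, or null.
function resolveDeepLink(fragment, headings) {
  if (!fragment.startsWith("@")) return null;
  const patterns = fragment.slice(1).split("/").map(segmentToRegExp);

  // Search headings[start..end) for pattern p, then recurse into its section.
  function search(p, start, end) {
    for (let i = start; i < end; i++) {
      if (!patterns[p].test(headings[i].text)) continue;
      if (p === patterns.length - 1) return headings[i];
      // The section under heading i ends at the next heading
      // of the same or a shallower level.
      let sectionEnd = i + 1;
      while (sectionEnd < end && headings[sectionEnd].level > headings[i].level) {
        sectionEnd++;
      }
      const found = search(p + 1, i + 1, sectionEnd);
      if (found) return found;
    }
    return null;
  }
  return search(0, 0, headings.length);
}

const headings = [
  { level: 1, text: "Production" },
  { level: 1, text: "Models" },
  { level: 2, text: "Currently in production" },
  { level: 3, text: "Trucks" },
  { level: 3, text: "Buses" },
];
console.log(resolveDeepLink("@models/*/buses", headings).text); // Buses
```

In a browser the fragment would come from location.hash.slice(1), and the heading list could be built from the h1 to h6 elements in document order.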