Topic on Talk:Requests for comment/Markdown

John Vandenberg (talkcontribs)

It is not sufficient to support 'markdown'.

There are many flavours, and they are usually incompatible for some edge cases. But on a large document, like a wikipedia page, those edge cases will appear very frequently.

As a result we need to be very clear about which md flavour will be used, and CommonMark is the obvious candidate.

https://github.com/jgm/CommonMark

If we are considering storing markdown, we also need to carefully plan how we will handle new versions of the spec. It is no longer under our control, like mediawiki syntax is.

RobLa-WMF (talkcontribs)
There are many flavours [of Markdown], and they are usually incompatible for some edge cases. But on a large document, like a wikipedia page, those edge cases will appear very frequently.

Agreed (and thanks for educating all of us about the CommonMark spec!). We should work out a sensible strategy to live in a world where the phrase "wiki markup" becomes increasingly synonymous with “Markdown”. CommonMark is a very credible specification effort (John MacFarlane has a lot of credibility leading this, especially given the ongoing success of pandoc). The R Markdown site's explanation of Pandoc Markdown describes many of the interesting complexities of supporting “Markdown”.

One particular trick we would need to work out is describing the object model for Markdown. Pandoc has a credible implementation of an interoperable object model, though I’m not sure if the implementation is the spec, or if there’s a spec for the object model. Given that the Parsoid team has created a second, nearly complete implementation of MediaWiki markup, my fear is that we’ll fall into the trap of thinking “great, we have a new implementation to convert MediaWiki markup to HTML! We can throw the old implementation away!” and then return to the dark ages of “the implementation is the spec”.

Given that many smart people outside of the Wikimedia movement have thought deeply about the conversion of wiki markup to/from HTML, it behooves us to collaborate with that community. My hope is that we start that conversation from a shared spirit of solving some very complicated (and important) issues.

John Vandenberg (talkcontribs)
 
We should work out a sensible strategy to live in a world where the phrase "wiki markup" becomes increasingly synonymous with “Markdown”

IMO that world is one in which the MediaWiki (and wiki) technical community has failed (that includes me): the world has overtaken it with a less useful format, the generic terms 'wiki' and 'markup' have lost their meaning, and Wikimedia content is stored in a format that MediaWiki no longer believes in or invests in. But I am not seeing that world yet. Where is it coming from?

Which wikis support Markdown? I only see w:GrokOla (software) (proprietary), w:Gitit (software) and w:PmWiki mentioning Markdown on w:Comparison of wiki software, and 'Hazel' on w:List of wiki software. If there are more wiki engines that support it, we should document them on Wikipedia and/or here.

The ones most of us developers will be aware of are GitHub, Bitbucket and GitLab.

Bitbucket only supports Markdown as far as I can see, but GitLab also has w:RDoc and w:AsciiDoc (both of which are far better than Markdown, in features and in standardisation/specification).

GitHub wikis support multiple formats, including Markdown; MediaWiki syntax is one of them, implemented using WikiCloth, which is listed on Alternative_parsers.

How can we avoid that world?

Invest heavily in Markup spec and in alternative/reference implementations, including providing a saner version of the syntax that caters for the needs of other organisations that consider the security and processing overhead of wikitext to be problems compared to Markdown. It should be the default for new MediaWiki installs, and older installs should have a nice tool that converts the crazy unspecified wikitext into 'wikitext-simplified' where possible.

Work with other wiki vendors that might want to make 'markdown' syntax an implicit choice, not clearly identified as Markdown syntax, stressing that this causes confusion for users.

And work with other wiki vendors to increase compatibility between their syntax and ours, so that 'wikitext' and 'wikitext-simplified' are viable long term formats.

RobLa-WMF (talkcontribs)

You're right, we should be investing more in a clear markup specification. In particular, we need to figure out how to tie that work into the Parsoid/MediaWiki DOM spec work that is underway, as well as the test suite that the Parsoid team has developed over the years.

I started looking at how to make an edit to the Markup spec page, and admittedly, I'm not sure what to do with that page. I'm pretty sure that page isn't actively used as a reference and hasn't seen substantive edits for years. Am I correct about this?

Working with other wiki vendors is likely going to mean "make changes that are uncomfortable for those invested in MediaWiki syntax". We could invest heavily in convincing the world that two ticks for italic and three ticks for bold is a superior way of expressing <i>italic</i> and <b>bold</b>, but it's not clear who that helps.

If we're successful, a nice future we can aspire to is having different layers:

  • Wikitext layer (MediaWiki 2.0+ syntax, which looks very familiar to MediaWiki users despite subtle differences).
  • MediaWiki DOM (an object model which is capable of representing Wikitext AND the resulting HTML, and transformations between Wikitext and HTML)
  • HTML rendering
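
To make the layering concrete, here is a very rough Python sketch of the three layers for just the two-tick and three-tick markup mentioned above. Everything in it (the `Node` class, `parse`, `to_html`, `to_wikitext`) is a hypothetical name made up for illustration; it is not Parsoid, not the MediaWiki DOM spec, and it ignores nesting, five-tick bold italic, and every other real-world complication.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from html import escape


@dataclass
class Node:
    """One item in a toy object model (hypothetical, not the MediaWiki DOM)."""
    kind: str  # "text", "italic" or "bold"
    text: str


# Three apostrophes must be tried before two, otherwise bold is read as italic.
QUOTES = re.compile(r"'''(.+?)'''|''(.+?)''")

TAGS = {"italic": "i", "bold": "b"}
MARKS = {"italic": "''", "bold": "'''"}


def parse(wikitext: str) -> list[Node]:
    """Wikitext layer -> object model."""
    nodes: list[Node] = []
    pos = 0
    for match in QUOTES.finditer(wikitext):
        if match.start() > pos:
            nodes.append(Node("text", wikitext[pos:match.start()]))
        if match.group(1) is not None:
            nodes.append(Node("bold", match.group(1)))
        else:
            nodes.append(Node("italic", match.group(2)))
        pos = match.end()
    if pos < len(wikitext):
        nodes.append(Node("text", wikitext[pos:]))
    return nodes


def to_html(nodes: list[Node]) -> str:
    """Object model -> HTML rendering."""
    parts = []
    for node in nodes:
        body = escape(node.text, quote=False)
        tag = TAGS.get(node.kind)
        parts.append(f"<{tag}>{body}</{tag}>" if tag else body)
    return "".join(parts)


def to_wikitext(nodes: list[Node]) -> str:
    """Object model -> wikitext, so the same model serves both directions."""
    return "".join(
        MARKS.get(node.kind, "") + node.text + MARKS.get(node.kind, "")
        for node in nodes
    )


source = "''italic'' and '''bold'''"
dom = parse(source)
print(to_html(dom))                # <i>italic</i> and <b>bold</b>
print(to_wikitext(dom) == source)  # True
```

The point of the middle layer is that both the HTML and the wikitext are produced from the same model, so neither serialisation has to act as "the spec" for the other.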

I don't think we should consider the uptake of incompatible syntax a serious failure or setback. Our lives would certainly be easier now if there was a well-defined, fully-interoperable syntax and DOM for wikitext<->HTML. That said, we've got a large and vibrant community of people writing content for our wikis and writing tools that help them.

In order to attract people who are working on incompatible systems, we'll probably have to entertain new ways of thinking about our syntax and our DOM. Thankfully, MediaWiki markup is not a proprietary format with billions of dollars invested in locking people into an investor-controlled ecosystem.

Vojtěch Dostál (talkcontribs)

Because this page was linked from the Tech News, it would be advisable to write a short introduction for dummies like me :). Why is this important, what is the aim, and why should we all care? Thanks a million! :-)

Reply to "CommonMark"