Extension:ChessBrowser/PGN schema

Description of standard

PHP parser

  • Proof of concept parser should be converted to its own class.
    • Existing "else if" blocks can be made into sub parsers.
  • The import format for this parser deviates from section 8.2.1 of the PGN standard. The standard implies that moves can be directly adjacent, e.g. Re5Qd6Kc3, whereas this parser requires at least one space between adjacent symbol tokens.

Developer notes

  • PHP should take import format and deliver export format.
  • JavaScript can be simplified by only accepting export format PGN. As the JavaScript is delivered by the extension, it should only take input that the PHP places on the page.
    • Parsing output can probably be done token by token without regexes
    • If extension is later expanded to resolve phab:T239438, additional PHP function can take the export format and translate data for JavaScript
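
The token-by-token idea above could look roughly like the sketch below. It assumes the export format already arrives as an array of tokens (as in the export format proposal on this page) and classifies each token by inspecting its characters instead of using regexes. The names isIntegerToken and groupMoves are hypothetical and not part of the extension.

```javascript
// Hypothetical helper: true when every character of the token is a digit.
function isIntegerToken(token) {
	if (token.length === 0) {
		return false;
	}
	for (var i = 0; i < token.length; i++) {
		var c = token.charAt(i);
		if (c < '0' || c > '9') {
			return false;
		}
	}
	return true;
}

// Walk the token array once and group symbol tokens under their move number.
// Symbol tokens that appear before any integer token are ignored in this sketch.
function groupMoves(tokens) {
	var moves = [];
	var current = null;
	for (var i = 0; i < tokens.length; i++) {
		var token = tokens[i];
		if (isIntegerToken(token)) {
			current = { number: parseInt(token, 10), plies: [] };
			moves.push(current);
		} else if (current !== null) {
			current.plies.push(token);
		}
	}
	return moves;
}
```

For example, groupMoves(["1", "e4", "e5", "2", "Nf3"]) yields two entries, one per move number, each holding its plies.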

HTML

The HTML for the display is currently built by the front end (JavaScript) and is partly controlled by the "config"; e.g., if the config stipulates "delay", the front end does not display the "faster/slower" buttons.

Discuss:

  • Should the back-end (php) provide the html?
  • Is the current html produced by the script good UX?
  • Is it possible to get UX feedback?

CSS

Ideally, the CSS should be provided by the extension and be overridable per project.

JavaScript parser

Written by Kipod, but it will need to be modified as heavy parsing is moved to PHP. Parsing in JS may eventually be made obsolete (see phab:T239438).

RFC: Proposal for export format from server to front-end script

This is loosely based on the artifacts generated by calling "analyzePgn()" on the javascript viewer.

Data will be passed from PHP to JavaScript by way of a JSON object. Some config values may be taken as user input by way of XML attributes in the wikitext, for example <pgn config=...>...</pgn> or <pgn title=...>...</pgn>. The PHP parser will output a <div>...</div> with an attribute, config="...", whose value is the JSON object. The JavaScript module can then retrieve this JSON object and perform the necessary client-side operations.

The JavaScript gadget which the extension's module is based on supports multiple games on the same page. In these cases, the JSON could contain an array of "games" instead of a single game, with a single game being a special case where the array has a length of 1.
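
As a rough sketch of the retrieval step, the function below takes the raw string value of the config="..." attribute and always returns an array of games. The name readGames and the exact wrapper shape (a top-level "games" key) are assumptions for illustration; the proposal does not fix them.

```javascript
// Hypothetical helper: parse the value of the config="..." attribute and
// normalize it so callers always receive an array of game objects.
function readGames(configAttribute) {
	var parsed = JSON.parse(configAttribute);
	// Assumption: a page with several games wraps them in a "games" array.
	if (parsed !== null && typeof parsed === 'object' && Array.isArray(parsed.games)) {
		return parsed.games;
	}
	// Also accept a bare array of games.
	if (Array.isArray(parsed)) {
		return parsed;
	}
	// A single game is the special case of an array of length 1.
	return [parsed];
}
```

In the browser, the input would come from something like div.getAttribute('config') on the element the PHP parser emits.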

The content of this JSON object is as follows:

{
"metadata": [ {"Title":"Value"}...],
"pieces": [{"piece":"Q", "color":"d"},...],
"boards": ["superfen0", "superfen1", ...],
"san": [ "1", "e4", "e5", "2", "e1",...],
"comments": { "7": "bla", "32": "bla lba" }
}

Metadata

The metadata field contains the tag pairs from the PGN input. They must include the seven tag roster (spec 8.1.1) and may include arbitrarily many additional tag pairs.
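
A front end (or a test) could check the roster requirement with something like the sketch below. It assumes the metadata array has the shape shown in the JSON outline above, i.e. one single-key object per tag pair; missingRosterTags is a hypothetical name.

```javascript
// The Seven Tag Roster names from the PGN standard, in their standard order.
var SEVEN_TAG_ROSTER = ['Event', 'Site', 'Date', 'Round', 'White', 'Black', 'Result'];

// Hypothetical helper: returns the roster tags missing from a metadata array
// shaped like [ {"Event": "..."}, {"Site": "..."}, ... ].
function missingRosterTags(metadata) {
	var seen = {};
	metadata.forEach(function (pair) {
		Object.keys(pair).forEach(function (name) {
			seen[name] = true;
		});
	});
	return SEVEN_TAG_ROSTER.filter(function (name) {
		return seen[name] !== true;
	});
}
```

An empty result means all seven tags are present; additional tag pairs are simply ignored by the check.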

Pieces

The pieces field provides enough data to draw each piece, i.e., type and color.

Boards

The boards field contains a sequence of board states. One entry of the array, a board or superFEN, would be like a FEN string. Unlike FEN which does not keep track of where a piece came from, a superFEN would associate a unique piece (specified in the pieces field) with its position on the board. SuperFEN would still maintain other features of FEN such as: sequences of empty squares are condensed to an integer and each piece is represented by a character. It could be enhanced by not using line breaks at all or describing a single line of 64 entries.

Of course, this is but one option. The important thing is that each "board" will carry enough information to know which pieces (indexes in the "pieces" array) are in the game, and which square each of them occupies.
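
To make the idea concrete, here is one possible decoding, offered only as a sketch. The encoding itself is an assumption: ranks are separated by "/" and listed from rank 8 down as in FEN, digits stand for runs of empty squares, and each piece is a single letter whose position in a hypothetical 32-letter alphabet is its index in the "pieces" array. None of this is settled by the proposal.

```javascript
// Assumption: each of the 32 starting pieces gets one letter; the letter's
// position in this alphabet is the piece's index in the "pieces" array.
var PIECE_ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEF';

// Hypothetical decoder: returns an object mapping piece index -> square name.
function decodeSuperFen(superFen) {
	var squares = {};
	var ranks = superFen.split('/');
	for (var r = 0; r < ranks.length; r++) {
		var file = 0;
		var rankNumber = 8 - r; // like FEN, the first row is rank 8
		for (var i = 0; i < ranks[r].length; i++) {
			var c = ranks[r].charAt(i);
			if (c >= '1' && c <= '8') {
				file += Number(c); // run of empty squares
			} else {
				var pieceIndex = PIECE_ALPHABET.indexOf(c);
				if (pieceIndex === -1) {
					throw new Error('unknown piece character: ' + c);
				}
				squares[pieceIndex] = 'abcdefgh'.charAt(file) + rankNumber;
				file += 1;
			}
		}
	}
	return squares;
}
```

With this encoding, a board holding only piece 0 on e4 and piece 1 on d1 would be written as "8/8/8/8/4a3/8/8/3b4".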

SAN

The san field describes the game in Standard Algebraic Notation as an array of PGN integer and symbol tokens (spec 7). As an alternative, the PHP could generate the SAN as HTML rather than passing the information to the JavaScript to be drawn.

Comments

The comments field contains a dictionary of comments (spec 5) where the key is the ply and the value is the comment.

Rationale

By using consistent individual pieces, very little logic is required to draw the board: tell all the pieces in this board to move to their intended places, and tell all those which are not in this board to hide. This logic is identical or similar to the existing script.
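
The drawing logic described here can be sketched as a pure function that computes, for every piece, whether it is shown and where. The placement object (piece index to square name) and the name layoutPieces are assumptions made for this illustration; the actual script would apply the result to DOM elements.

```javascript
// Hypothetical helper: "placement" maps piece index -> square name for one
// board; pieces without an entry are not on this board and get hidden.
function layoutPieces(pieceCount, placement) {
	var layout = [];
	for (var i = 0; i < pieceCount; i++) {
		var onBoard = Object.prototype.hasOwnProperty.call(placement, i);
		layout.push({
			piece: i,
			visible: onBoard,
			square: onBoard ? placement[i] : null
		});
	}
	return layout;
}
```

Because the same piece objects persist across boards, moving between positions only changes each piece's square or visibility.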

Comments

  • Note: Full pgn standard allows for "comments" interspersed among the notation. Current javascript does not support nesting, and does not support alternatives (i.e., the user can not see the actual positions for algebraic notations in comments). This limitation is mainly because of the parsing. If the php-based pgn parser supports nesting and alternatives, it should be fairly straightforward to augment the data structure, and write a contract which will make it fairly easy for the front-end to show alternatives too. -- kipod
  • In general this seems reasonable. We shouldn't pass user input straight into the output config object for security reasons. Since PGN operates on the level of tokens, it's better to specify the SAN as a token sequence rather than as chopped up SAN. This means that the JavaScript only needs to figure out what to do with a token of that type since it's guaranteed a token. If it were given something like "1.e4", that's technically 3 tokens that need to be parsed, whereas in "1","e4" the "1" is an integer (move number) token and the "e4" is a symbol (SAN) token. I'm interested in the superFEN idea, but wonder what it buys us that FEN doesn't. If we have a sequence of FEN positions, could we not reverse engineer what piece needs to be moved? It seems that our job is made slightly easier because of the SAN data: we already know what movement gets us from one FEN state to another. Plus we get the data for the FEN tab for free. If we did go the superFEN route, we'd need to come up with a 36 character set to specify each individual piece in the superFEN and associate it with the pieces array. Not hard, but it would differ from FEN's, and we'd need to convert it to FEN or drop support for giving FEN for any board position. Wugapodes (talk) 04:06, 1 December 2019 (UTC)
    i played a bit with this idea (i.e., back-end passes the "parsing" result as array of FENs). it _is_ doable - consecutive array of FEN indeed contains enough information, but one has to ask, what is the value? admittedly, converting raw FENs to the actual data needed is somewhat lighter work than simply parsing the PGN, but the diff is not as impressive as one might think. maybe going from ~300 lines of code to 100+ lines. is it worth it?
    i was thinking that one of the advantages of splitting the work is that this will basically allow the same "front-end" to be used with quite a few other games similar to chess, such as, say, shogi, checkers, and other turn-based board games where the board is a matrix of "squares", and position is governed by "file/row" combo. however, if we take the "give me a series of fen" approach, i am not at all sure this has any advantage over "give me algebraic notation" approach, i.e., what we actually do now.
    maybe the best approach is something like "give me augmented PGN" (or rather augmented ASN): the main challenge with working with ASN is that "e4" does not tell you which piece moved to e4: you know it's a pawn (no [RNBQK] prefix), but you have to work out which pawn is it. if the backend will pass instead "e2e4", i.e. state origin square explicitly, the "analyze" part becomes a breeze - actually, easier than working with consecutive FENs, and less data.
    if we want to also output the FEN (TBH, this is not a hard requirement, and we can simply drop the "fen" tab, which has dubious value anyway), the "board-to-fen" routine is short and sweet: this is what the current boardToFen() looks like:
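    	// Build the piece-placement part of a FEN string from the board.
    	// bindex(file, row) and each piece's fen() method are defined elsewhere in the script.
    	function boardToFen(board) {
    		var res = [],
    			len = function (s) { return s.length; };

    		for (var r = 0; r < 8; r++) {
    			var row = '';
    			for (var f = 0; f < 8; f++) {
    				// piece letter for an occupied square, a space for an empty one
    				row += board[bindex(f, r)] ? board[bindex(f, r)].fen() : ' ';
    			}
    			// condense each run of spaces into its length
    			res.push(row.replace(/(\s+)/g, len));
    		}
    		return res.reverse().join('/'); // fen begins with row 8 file a - go figure...
    	}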
    
    so saving us the need to generate FEN is no big deal. of course, this is not a real FEN - it only contains the part that describes the board, not the remainder, which tells you whose turn it is, who can still castle, etc., but the current script does not give this information, and nobody complained so far, and if we go with "series of FEN", the script will have to strip this remainder anyway... peace - קיפודנחש (talk) 23:16, 3 December 2019 (UTC)
    So I would argue that we combine these ideas. I think the "boards" field should just be FEN. Since FEN is just a description of the matrix and piece locations within it, the idea can easily be extended to matrix-based games like shogi and checkers without a problem. The "SAN" field would in fact not be SAN, instead it would be the explicit origin-target pair movement that gets us from one board state to another. That way the JavaScript can be very general: it takes the FEN and draws the board; user clicks "next" and the JavaScript chooses the object at the origin (whatever piece that is) and moves it to the target; this pattern can be used for any game played on a matrix board with piece movement. I would say we still want to keep the board states being passed to the JavaScript because it makes jumping to board states easy: if someone clicks "17.Nf6+" from the game notation tab, the JavaScript just needs to get the board state and draw it. Moving forward and backward from that new location would work the same way as before: take piece at origin, move to target. Wugapodes (talk) 19:26, 5 December 2019 (UTC)
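
A minimal sketch of the combined approach discussed in this thread: expand the piece-placement part of a FEN into squares, then apply an origin-target move such as "e2e4". The function names and the four-character move format are assumptions for illustration, and the special moves (castling, en passant, promotion) are deliberately left out because they would need extra data in the contract.

```javascript
// Expand the piece-placement part of a FEN string into an object keyed by
// square name, e.g. { e2: 'P', e8: 'k' }. Anything after the first space
// (side to move, castling rights, ...) is ignored.
function fenToSquares(fen) {
	var squares = {};
	var ranks = fen.split(' ')[0].split('/');
	ranks.forEach(function (row, r) {
		var file = 0;
		for (var i = 0; i < row.length; i++) {
			var c = row.charAt(i);
			if (c >= '1' && c <= '8') {
				file += Number(c); // run of empty squares
			} else {
				squares['abcdefgh'.charAt(file) + (8 - r)] = c;
				file += 1;
			}
		}
	});
	return squares;
}

// Apply one origin-target move: take whatever sits on the origin square and
// put it on the target. Returns a new object; the input is not modified.
function applyMove(squares, move) {
	var origin = move.slice(0, 2);
	var target = move.slice(2, 4);
	var next = Object.assign({}, squares);
	next[target] = next[origin];
	delete next[origin];
	return next;
}
```

Stepping forward uses applyMove, while jumping to an arbitrary position (for example after a click in the notation) can simply call fenToSquares on the stored board for that ply.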

prototype/demo

so i started playing with it, and built a small lua+jscript prototype.

it is made of the following components:

  • Module:Parse-pgn - a generic lua module, that takes a pgn, and returns 4 items:
    1. boards (array of FENs, for the positions in the pgn, first to last)
    2. plys, an array of "ply" arrays, described at the top of the module
    3. notations: array of the notations associated with the plys
    4. metadata
  • Module:ChessBowserPrototype exports a single function (demo), which takes two parameters: "pgn" and "display move". if display move is missing, the last board of the game is assumed.
    lays out the notation, board + legends, and buttons to be used by the animation script, including a "data" attribute to be used by the display and animation script.
    to support readers with javascript disabled, lays out the pieces in the "display move" state.
  • Template:ChessBrowserProto/styles.css a templatestyle page to be used with the animator
  • User:קיפודנחש/chess-animator.js the animation script. expects all the layout (board, buttons, notations, legends) to be in place when it starts.
    responds to:
    1. goto start, goto end, advance, retreat, flip (turn board 180), autoplay/pause, faster, slower.
    2. click on notation to goto this state
limitations
  • pgn parser does not allow comments
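
The commands the animation script responds to can be modelled as a small state transition function, sketched below. This is not the code of chess-animator.js; the state fields, the command names, and the speed rule (halving or doubling the delay, with a 100 ms floor) are all assumptions made for illustration.

```javascript
// Hypothetical reducer: "state" holds the current board index, the autoplay
// delay in milliseconds, and two flags; "boardCount" is the number of boards.
function navigate(state, command, boardCount) {
	var last = boardCount - 1;
	var next = {
		index: state.index,
		delay: state.delay,
		flipped: state.flipped,
		playing: state.playing
	};
	switch (command) {
		case 'start': next.index = 0; break;
		case 'end': next.index = last; break;
		case 'advance': next.index = Math.min(state.index + 1, last); break;
		case 'retreat': next.index = Math.max(state.index - 1, 0); break;
		case 'flip': next.flipped = !state.flipped; break;
		case 'autoplay': next.playing = !state.playing; break;
		// Assumed speed rule: halve or double the delay, never below 100 ms.
		case 'faster': next.delay = Math.max(state.delay / 2, 100); break;
		case 'slower': next.delay = state.delay * 2; break;
	}
	return next;
}
```

A click on a notation entry would bypass this and set the index directly to the clicked ply.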

to see the demo

i18n

The server can preload the different strings used by the front-end as mw.messages entries.

This will allow projects to stuff the translations "manually" for languages whose translations are not yet integrated with the extension, or to override specific strings.

strings that can be translated:

  • hints ("title=") for buttons
  • LEGENDS: allow replacement of file legends abcdefgh (e.g., for Hebrew, אבגדהוזח), allow replacement of row legends 12345678 (e.g., for Arabic ۱۲۳۴۵۶۷۸۹۰)
  • NOTATIONS: notations are made of move #, piece, row, col, and some other characters (e.g., "x" for capture). all those should allow replacement.
    Q: for the col/row, is it always same as the "legends"?
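
Character replacement for notations could be as simple as the sketch below. It assumes, pending the open question above, that the files in the notation use the same replacements as the file legends, and it uses the Hebrew legend example from this page. The names hebrewFiles and localizeNotation are hypothetical; in the extension the replacement table would presumably come from the preloaded messages.

```javascript
// Hypothetical per-project overrides, using the Hebrew file legends above.
var hebrewFiles = { a: 'א', b: 'ב', c: 'ג', d: 'ד', e: 'ה', f: 'ו', g: 'ז', h: 'ח' };

// Replace each character that has an override and keep the rest as-is.
// Piece letters are upper case in SAN, so "B" (bishop) is not confused
// with the "b" file.
function localizeNotation(san, replacements) {
	var out = '';
	for (var i = 0; i < san.length; i++) {
		var c = san.charAt(i);
		out += Object.prototype.hasOwnProperty.call(replacements, c) ? replacements[c] : c;
	}
	return out;
}
```

Piece letters, row digits, and symbols such as "x" can be localized the same way by adding entries to the replacement table.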