Extension talk:StringFunctions#Page name variations Eep² 14:01, 20 August 2007 (UTC)



Juraj, it looks like you've committed a new version of StringFunctions to SVN this week. Unfortunately, you've re-introduced mb_strlen in your len implementation. Why? This function requires users to install a nonstandard PHP library, and its functionality should be completely handled by mwSplit. Did you find a case where it didn't work right? If so, please let me know so we can fix it properly without reintroducing a reliance on the multibyte extension. --Algorithm 01:59, 3 September 2008 (UTC)

Yes, I did (well, actually I didn't even request it, but dantman committed my partial changes to the code). I have already rewritten some other functions as well (though I have not committed them yet). The reason for all this rewriting is explained in the comments of bug 6455 (I thought you were watching that bug), along with a link to the benchmark test. Doesn't the core MediaWiki code itself need any mb_* functions whatsoever? Btw, if you decide to search for other implementations of strlen without mb_strlen, please update the benchmark test. We have to find something fast enough to be suitable. Otherwise, we'll never get it into the Wikimedia cluster. The worst thing about all this is that Wikipedia's current strlen template is much slower than mwSplit. On the other hand, Wikipedia's strlen template does not process more than a hundred characters. -- jsimlo(talk|cont) 14:01, 3 September 2008 (UTC)
Something else I wanted to reach you about: what about actually working with strip markers? We could retrieve the contents of the strips from the parser and work with them instead of adding 1 char for each such strip. strlen would then return much better results for <nowiki> tags, which are not that uncommon. While I commit my latest changes, I would love for you to look into retrieval of the markers' contents, if you're interested. -- jsimlo(talk|cont) 14:06, 3 September 2008 (UTC)
My opinion regarding the expansion of strip markers has not changed. As for the rest: by all means, if optimization is required to implement these functions, then it should certainly be done. If we need multibyte, so be it. Some suggestions for implementation:
  • First, make sure that the MW_PARSER_VERSION check doesn't break current versions. Jlerner made a useful suggestion in the extension talk page that I believe should be implemented.
  • Second, if we are returning to the multibyte functions, it would be best to abandon any pretense of using mwSplit. Let's just kill that function entirely. Instead, we should split at the strip markers *only*. Keep treating the markers as having a length of 1, but use mb functions for the rest of it. I'll try coding something up once I'm home from work today.
  • Finally, how closely would you prefer we collaborate on this? If you like, we could set up some kind of communication channel via IRC in the evenings. Keep in mind that I'm on Pacific time (UTC-7), so this may not be feasible. --Algorithm 19:09, 3 September 2008 (UTC)
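As a rough illustration of the second suggestion (split at the strip markers only, count each marker as 1, and use a multibyte-aware length for the rest), here is a minimal sketch. It is written in Python rather than the extension's PHP, so Python's `len()` on a string stands in for mb_strlen. The marker pattern and the name `marker_aware_len` are assumptions made for this example and are not taken from the StringFunctions code.

```python
import re

# ASSUMPTION: a hypothetical strip-marker format used only for this sketch;
# the real marker syntax depends on the MediaWiki parser version.
MARKER_RE = re.compile(r"\x7fUNIQ[^\x7f]*QINU\x7f")


def marker_aware_len(text: str) -> int:
    """Return the length of text, counting each strip marker as 1."""
    # Split at the markers only; the pieces in between are plain text.
    segments = MARKER_RE.split(text)
    # N segments are separated by N - 1 markers.
    marker_count = len(segments) - 1
    # len() counts code points, standing in for a multibyte-aware strlen.
    return sum(len(segment) for segment in segments) + marker_count


if __name__ == "__main__":
    marker = "\x7fUNIQabc123-nowiki-00000001-QINU\x7f"
    print(marker_aware_len("héllo" + marker + "wörld"))
```

In this sketch the text between markers is never split character by character, which is the point of dropping mwSplit in favour of the multibyte functions.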
Well, I have already written implementations of some other functions, but that does not really matter. We should find and benchmark several approaches to all the functions anyway. Therefore, instead of direct collaboration (in the sense of communication), I would suggest that we each try to implement and optimize some of the functions on our own and then compare the results in benchmark tests. This is what I have done with strlen, although I was all alone there (see Bench script). More brains might give better results. -- jsimlo(talk|cont) 02:25, 4 September 2008 (UTC)

Enhance StringFunctions --jdpond 20:14, 31 August 2009 (UTC)


What do you think of the idea of adding the ability to use escaped characters (e.g. newline, "\n") as part of StringFunctions? I would be glad to do the work and commit it.


Downside would be that backslashes in current wiki code would be interpreted as escaped characters (probably not a problem).

Well, the downside is too big to be ignored. You might consider adding new, separate functions that work with escapes, but we must not try to teach common users to escape the escapes. No ordinary user is going to get it; trust me, I have already tried that elsewhere. -- jsimlo(talk|cont) 21:58, 31 August 2009 (UTC)
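To make the trade-off concrete, here is a minimal sketch of what a separate, escape-aware helper might do. It is in Python rather than the extension's PHP, and the function name `unescape` and the set of supported escapes are assumptions chosen for illustration, not part of StringFunctions.

```python
import re

# ASSUMPTION: a small, hypothetical set of supported escapes.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}


def unescape(text: str) -> str:
    """Expand backslash escapes; leave unknown sequences untouched."""
    def replace(match: re.Match) -> str:
        # Look up the character after the backslash; if it is not a
        # known escape, return the original two characters unchanged.
        return ESCAPES.get(match.group(1), match.group(0))

    return re.sub(r"\\(.)", replace, text, flags=re.DOTALL)


if __name__ == "__main__":
    # A Windows-style path shows the downside: "\n" in "C:\new"
    # is read as a newline unless the user doubles the backslash.
    print(repr(unescape(r"C:\new")))
    print(repr(unescape(r"C:\\new")))
```

Keeping this behaviour in its own function would leave existing wiki text with literal backslashes unaffected, at the cost of users having to pick the escape-aware variant explicitly.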