Topic on Extension talk:Scribunto

Alternative to mw.text.split?

Summary by FeRDNYC

Worrying about the performance of mw.text.split is an unnecessary microoptimization for most purposes.

Jonathan3 (talk | contribs)

I just want to be able to split comma-separated Cargo list fields. The note on mw.text.split at Extension:Scribunto/Lua_reference_manual says "this function can be over 60 times slower than a reimplementation that is not Unicode-aware". I modified something from the internet. Would it make any noticeable speed difference? Does it look like it will work all right? Are there better ways? It seems to work, but I don't know Lua yet.

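-- Source: http://lua-users.org/wiki/MakingLuaLikePhp (modified)
-- Credit: http://richard.warburton.it/
-- Splits str on commas using plain (non-pattern), byte-oriented matching.
local function explode(str)
    local pos, arr = 1, {}  -- Lua strings are 1-indexed
    -- The anonymous iterator returns the start and end of the next comma, or nil when none is left.
    for st, sp in function() return string.find(str, ',', pos, true) end do
        table.insert(arr, string.sub(str, pos, st - 1))
        pos = sp + 1
    end
    -- Whatever follows the last comma (possibly an empty string) is the final item.
    table.insert(arr, string.sub(str, pos))
    return arr
end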
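
On the "are there better ways?" part of the question, one shorter alternative is string.gmatch. The following is only a sketch: split_list is a hypothetical name, and it assumes that whitespace around items should be trimmed and that empty items can be dropped, which may or may not suit a given Cargo field.

```lua
-- Hypothetical helper: split a comma-separated list, trimming each item.
-- Assumption: empty or whitespace-only items (as in "a,,b") are dropped.
local function split_list(str)
    local items = {}
    -- "[^,]+" matches each maximal run of non-comma bytes.
    for item in string.gmatch(str, "[^,]+") do
        -- Strip leading and trailing whitespace from the item.
        local trimmed = string.match(item, "^%s*(.-)%s*$")
        if trimmed ~= "" then
            items[#items + 1] = trimmed
        end
    end
    return items
end

local values = split_list("red, green ,blue")
print(table.concat(values, "|"))  -- red|green|blue
```

Unlike explode above, this version never returns empty strings, so it is not a drop-in replacement where empty list entries matter.
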
Dinoguy1000 (talk | contribs)

You shouldn't worry about Lua performance until you have to; if the code you posted finishes in 10 microseconds on a typical list, then even at the quoted factor of 60, mw.text.split will "only" take ~600 microseconds - i.e. still less than a millisecond. Even when you do have to worry about performance (e.g. page parses are consistently taking more than a second), you'll probably find the largest gains elsewhere, not in this microoptimization.
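
If it ever does become worth checking, measuring is more reliable than guessing. Below is a rough timing sketch using os.clock; time_per_call is a hypothetical helper, the sample string and iteration count are arbitrary, and the numbers it prints will vary by machine. The splitter is the same byte-oriented approach as the code posted above, written with a plain loop.

```lua
-- Byte-oriented comma splitter (same approach as the explode function in this thread).
local function explode(str)
    local pos, arr = 1, {}
    while true do
        local st, sp = string.find(str, ',', pos, true)
        if not st then break end
        arr[#arr + 1] = string.sub(str, pos, st - 1)
        pos = sp + 1
    end
    arr[#arr + 1] = string.sub(str, pos)
    return arr
end

-- Hypothetical helper: average CPU seconds per call of fn(arg) over `iterations` runs.
local function time_per_call(fn, arg, iterations)
    local start = os.clock()
    for _ = 1, iterations do
        fn(arg)
    end
    return (os.clock() - start) / iterations
end

local sample = "apple,banana,cherry,date,elderberry"
local per_call = time_per_call(explode, sample, 100000)
print(string.format("explode: %.3f microseconds per call", per_call * 1e6))

-- Inside Scribunto, the same helper could time mw.text.split for comparison:
-- time_per_call(function(s) return mw.text.split(s, ',', true) end, sample, 100000)
```

The commented-out line only works where the mw library is loaded, which is why it is not executed here.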

Jonathan3 (talk | contribs)

Thanks :-)

FeRDNYC (talk | contribs)

@Jonathan3 And if you were going to use any accelerated, non-Unicode-aware reimplementation of mw.text.split, the sensible one to use would be the one that's provided right in the documentation you linked to. When the wheel has already been invented twice, why reinvent it a third time?

I agree 1000% with @Dinoguy1000 that fretting over such microoptimizations is misguided and unnecessary. (In fact, IMHO those extensive code blocks containing reimplemented mw.text.split and mw.text.gsplit functions should be removed from the Lua reference manual and relocated... well, either somewhere else, or nowhere at all. At most, a footnote regarding the speed of the functions is the appropriate level of making-users-worry-over-nothing.) But given that they are, for the moment, still there...
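
For readers who want a sense of what a non-Unicode-aware split involves, here is a minimal sketch. It is not a verbatim copy of the code in the reference manual; the parameter names are my own, and unlike mw.text.split it does not try to handle a separator that matches the empty string.

```lua
-- Sketch of a byte-oriented split; NOT the manual's code.
-- sep is a Lua pattern, or literal text when plain is true.
local function split(text, sep, plain)
    local result = {}
    local start = 1
    while true do
        local first, last = string.find(text, sep, start, plain)
        if not first or last < first then
            -- No further separator, or an empty match (unsupported in this sketch):
            -- keep the remainder as the final item and stop.
            result[#result + 1] = string.sub(text, start)
            break
        end
        result[#result + 1] = string.sub(text, start, first - 1)
        start = last + 1
    end
    return result
end

local parts = split("Zürich,São Paulo,東京", ",", true)
print(#parts)    -- 3
print(parts[2])  -- São Paulo
```

Splitting by bytes is safe for UTF-8 text when the separator is ASCII, such as a comma, because ASCII bytes never occur inside a multibyte UTF-8 sequence. Unicode awareness only starts to matter when the separator or pattern itself involves non-ASCII characters or character classes.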

Jonathan3 (talk | contribs)

"When the wheel has already been invented twice, why reinvent it a third time?"

Purely because it was shorter, silly as that sounds now :-)

I had a feeling that Lua would be faster than byzantine templates, so had speed on my mind, and "60 times slower" made me ask the question here... so perhaps you are right about editing the reference manual!