Topic on Talk:Quarry/Flow

Queries for orphaned pages don't work

30
Mess (talkcontribs)
Achim55 (talkcontribs)

You are right, my queries are killed as well. It's not the first time Quarry has slown down and - surprise!! - there are again a lot of queries that are "running" or queued since some weeks. Needs a clean restart, @Yuvipanda or someone else: please reset it, thanks.

Milimetric (WMF) (talkcontribs)

Hm, @Yuvipanda isn't maintaining this anymore sadly, but I'm watching these pages. I'll go try and figure out how to reset it.

Achim55 (talkcontribs)
Mess (talkcontribs)

User:Milimetric (WMF), may I suggest you to implement a maximum time limit of (just say) 1 hour for queued/hanged queries in order to prevent this overload? I can't figure out why some queries can avoid the regular limit of 30 minutes and can't be killed like any other query when the time period is over.

Milimetric (WMF) (talkcontribs)

Interesting, yeah, I thought there was a solid time limit in place. I'll double check that code if I can but sadly this is very low on my priority list. So far I found out I don't have rights to the Quarry box so I pinged some people to get that. I'll try and help if I can.

Mess (talkcontribs)
Milimetric (WMF) (talkcontribs)

Ok, I got access to the project and restarted the celery workers (the things that run the queries). Let me know if things don't improve and I can try restarting more stuff.

Achim55 (talkcontribs)
Mess (talkcontribs)

@Milimetric (WMF), maybe it should be better a whole cleaning: queries like , or are still running!

Milimetric (WMF) (talkcontribs)

I will now reboot the boxes, maybe that will help. Which editor? The quarry code editor?

Achim55 (talkcontribs)

Oh no, sorry, the editing of this page I had in mind. It differs a lot from wikis' and adding links is not an optimal solution as it isn't made for intuitive use. Dan, thank you for your efforts here!

Milimetric (WMF) (talkcontribs)

Ok, I rebooted all the boxes, and verified that some queries are running ok. I still see the status of the queries that @Mess pointed out as "Running" but that can't be because the box was rebooted. So that must just be invalid state in the db, those aren't running anymore.

Milimetric (WMF) (talkcontribs)

@Achim55, yeah, Flow is a work in progress. For me, I'm a newcomer to wikitext so it's much much easier to work with Flow. If you're interested in working on it, I know the people on the team and they're super nice and some of the smartest people I know. Here's some links if you're interested:

browse the code: https://github.com/wikimedia/mediawiki-extensions-Flow

contribute: https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/Flow

The team hangs out in IRC under #wikimedia-collaboration

Mess (talkcontribs)

@Milimetric (WMF), I tried to re-run several times [my query], but nothing happened - and it's the same also for the [other one] re-executed by @Achim55. I'm starting to doubt about the codes of our queries, but they can't be faulty, as they worked regularly in the past.

Achim55 (talkcontribs)

Confirmed. Doesn't work yet.

Mess (talkcontribs)

I'm trying also to use the "LIMIT" clause to reduce the output in [this query test], but it's useless. It's like in some particular cases statements as "NOT IN" (or maybe what follows inside the parentheses) influence negatively the whole query (if I remove that, the query test runs without problems). Anyway, I don't know where to bang my head and solve this crazy puzzle.

Milimetric (WMF) (talkcontribs)

@Mess & @Achim55, I should have looked at your queries earlier. Yes, those NOT IN clauses are very intense to compute over categorylinks and pagelinks tables, those tables have a huge number of records. I would suggest re-writing. Sorry I don't have much time now to help. What I would try to do is re-write the NOT INs as JOINs. This might give the optimizer a better chance to make a plan that runs faster. If that doesn't work, do me a favor and explain in plain English what you're looking for and I can think about it when I have some more time.

Mess (talkcontribs)

@Milimetric (WMF), you're absolutely right. I've recently created two separated queries ( 20065and 20068) as a temporary solution for my task, even if I've already realized there was something "strange" in the query codes for their excessive time-consuming extraction (that we didn't care in the past simply because they didn't stopped themselves like now). I'll work again on the original code to find an ultimate solution, but probably I can carry on also with this alternative.

Mess (talkcontribs)

@Milimetric (WMF): HOORAY! I DID IT! I've found what was bad in my query: in practice, the part where it searched for orphaned redirects was the real "point of failure", as it was badly implemented, so I just fixed it with a better code and now the query 18648 not only works again, but also accomplishes its task in only 3 minutes (a great improvement if compared with the 15 minutes it took previously). So, @Achim55, follow the advices above and try to find the part of code that takes too much time to run (I suggest you to split up the query and execute every single piece separately).

Achim55 (talkcontribs)

Pardon?? "Write better code so you will no longer recognise the malfunction of Quarry." Fine. I think one should fasten the brake a bit more so that some action will become necessary...

Milimetric (WMF) (talkcontribs)

Well, no matter how amazing Quarry is, a statement like select * from revision will always break it on large enough wikis. No tool is substitute for writing good code. Think of writing bad code like throwing lots of kilowatts of energy in the trash. You wouldn't leave a 100W bulb on all day for no reason, so don't write inefficient SQL :)

Achim55 (talkcontribs)

Sorry for having been misunderstood: As I pointed out many months ago queries like "SELECT * FROM page" are regularly not killed but stay "running". That's the point. The "killing mechanism" is broken. In the near future we can celebrate an anniversary, I found a query that "runs" for nearly one year now. I don't know what happens in the background, but it's our experience that quarry gets slower the more queries are "running" for more than one hour instead of being killed.

Milimetric (WMF) (talkcontribs)

Oh, as far as I can tell those queries are killed, the interface just doesn't show that status as "Killed". So there are some bugs but they're not affecting the system performance as far as I can tell. There's a discussion going on about replacing Quarry with a tool developed independent of WMF, so we can take advantage of regular updates. Here's that thread: https://phabricator.wikimedia.org/T169452

Zhuyifei1999 (talkcontribs)

Queries are killed, but the status updating system may break due to various reasons. See phab:project/board/800/ column "Bugs (Endless Query Run)". Unfortunately many of the old logs are lost so I'm unable to debug them. Please file a ticket if you see any such behaviour.

Achim55 (talkcontribs)

@Zhuyifei1999, that's fine so far. So let me ask in a different way: Half a year ago needed 10 ... 12 minutes for one run. Today it needs 25...30 minutes. Why is quarry working extremely slowly now?

Zhuyifei1999 (talkcontribs)

This is not Quarry's problem. The query runs on the wiki replicas and they may get slower due to excess use and more data processing, and for Wiki Replica issues it's best to consult the DBA. However, I may try to optimize the query a bit.

Zhuyifei1999 (talkcontribs)

Never mind. I'd suggest you to ask the DBAs regarding query optimization or wait till quarry switch to the new Wiki Replica servers some time soon (in next next few weeks/months).

Zhuyifei1999 (talkcontribs)

Quarry has switched to the new Replica servers yesterday, which is said to be 5x faster, so you may want to try the query again. Note that 'USE stopthisrun' won't work on the new servers (and idk what even are 'stopthisrun')

Achim55 (talkcontribs)

Thank you very much for notifying. I use non-existing db 'stopthisrun' for quickly killing my query if I see an error after I already had started it.

Reply to "Queries for orphaned pages don't work"