Any idea why that happens? I get it after long-running queries. Sample: quarry:query/86096. If I happen to have the window open, I sometimes see the actual results beforehand, meaning the query was successful. Usually I forget to export them before they disappear.
Talk:Quarry
Hello! It seems to be an internal bug in Quarry. Could you open a bug report on Phabricator for it? Thanks!
Just had the same problem with query 86864. Is there a bug report about it by now? I could not find one with that error, so probably not (if there is, please link it here). I refreshed the page and submitted the query again; hopefully it works now.
No. I might be mistaken, but frequently nothing happens once a problem is filed on Phabricator. It just creates costs for the WMF and work for bug triagers.
It wouldn't be an issue, or unexpected, if that happened frequently, but it is the most common outcome, even for major problems, including some that have been open for over a decade. Changing that is, I think, step 1, and I made a concrete proposal for it here (several more are linked there).
Hi there,
I've started trying out Quarry recently, and I really like the service. It's focused, snappy, and makes results easy to link to. However, I've hit a couple of limitations in what I want to do.
I figured from the start that I'd ultimately need to code up something more complex on Toolforge, but I realized a couple of things would allow for more heavy lifting with Quarry:
- Is there any reason the watchlist table is entirely redacted, instead of just having the user fields blanked? I guess this one's more of a general question about the DB replicas.
- How exactly is the result-set data from successful queries stored? And would it be possible to expose it through SQL, even temporarily in a cache somehow, perhaps similarly to the ToolsDB databases? My thinking is that this could allow decomposing queries, then joining or filtering their result sets, all asynchronously through Quarry.
I can file feature request tickets on Phabricator, but I thought I'd ask here first in case I'm missing something obvious.
> Is there any reason the watchlist table is entirely redacted
T59617: Make watchlist table available as curated foo_p.watchlist_count on labsdb
> How exactly is the result-set data stored from successful queries?
https://github.com/toolforge/quarry/blob/main/quarry/web/results.py
Perfect, that answers my questions exactly. I'll look into it further and maybe I can contribute some on the software end.
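The idea above of decomposing queries and then joining their result sets can be tried locally with Python's built-in sqlite3 module. This is a minimal sketch under simplifying assumptions: each run's output is available as a header list plus rows (results.py appears to write run output to SQLite, but the helper load_result_set, the table names run_a and run_b, and the sample rows here are all hypothetical, not Quarry's actual schema):

```python
import sqlite3


def load_result_set(conn: sqlite3.Connection, table: str, headers: list, rows: list) -> None:
    """Store one result set (column headers plus rows) as a SQLite table."""
    # Column names are double-quoted as identifiers; column types are optional in SQLite.
    cols = ", ".join(f'"{h}"' for h in headers)
    placeholders = ", ".join("?" for _ in headers)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)


conn = sqlite3.connect(":memory:")

# Hypothetical output of two separate Quarry runs.
load_result_set(conn, "run_a", ["page_id", "page_title"],
                [(1, "Foo"), (2, "Bar"), (3, "Baz")])
load_result_set(conn, "run_b", ["page_id", "edits"],
                [(1, 10), (3, 7)])

# Join the two result sets as if they were ordinary tables.
joined = conn.execute(
    "SELECT a.page_title, b.edits "
    "FROM run_a AS a JOIN run_b AS b ON a.page_id = b.page_id "
    "ORDER BY a.page_id"
).fetchall()
print(joined)  # [('Foo', 10), ('Baz', 7)]
```

Filtering works the same way: any WHERE clause can be applied to run_a or run_b once they are loaded.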
Is there a character limit for query names? A new editor, Yesh0305, is writing ridiculously long query names, and the table at https://quarry.wmcloud.org/query/runs/all gets pushed out of shape. I've posted to their talk page, but I don't think they even realize they have one. I looked at their global contributions to reach out to them on their home Wikipedia (which I think is tewiki), but there were none, so they must use a different username on Quarry. Maybe there could be a reasonable character limit on names, like 20 to 30 characters. What do you think?
You mean https://quarry.wmcloud.org/Yesh0305 ? @User:Yesh0305
I don't think a name such as "Compare each top editor's total edits against the overall average: How much more than average as percentage" is problematic. It's actually a fairly descriptive name.
Personally, I keep my names short, partly out of laziness and partly because the name becomes the download file name. But in principle, in a list of queries, a name like the one above can be sensible.
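The two positions above can coexist: the full name could stay as long as its author likes, while the runs table shortens it only for display. A minimal sketch; the function display_name and the 40-character cutoff are hypothetical, not part of Quarry:

```python
def display_name(title: str, max_len: int = 40) -> str:
    """Shorten a query title for display in a table cell."""
    title = " ".join(title.split())  # collapse runs of whitespace
    if len(title) <= max_len:
        return title
    # Leave room for the ellipsis character and drop trailing spaces before it.
    return title[: max_len - 1].rstrip() + "…"


long_name = ("Compare each top editor's total edits against the overall average: "
             "How much more than average as percentage")
print(display_name(long_name))
print(display_name("Deletions per year"))  # Deletions per year
```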
It says the query was stopped but I did not stop it.
Bug report: https://phabricator.wikimedia.org/T377010 I can't run any queries because they get stopped!
I had that too. I assumed a DB admin stopped it because it had been running for a long time.
That could be the case. It could also be because of database issues or because some limit was hit. At a minimum, I think it should display an error message or some info. One of the queries has now run through, and with the bug report above, I guess this is resolved here.
From a locally running Python script, what's the best way to run a query and download the result?
Supposedly Manual:Pywikibot/MySQL can't work with Quarry, only from Toolforge.
There has never been support for Quarry in Pywikibot, but support for Wikimedia Superset was added recently. Use SupersetPageGenerator in code or -supersetquery from the command line.
Interesting suggestion; I should try to figure out how to get Superset to work. Access to create new datasets seems to be limited.
Is there a way to trigger a rerun of a query from Python and then load the result of the most recent run without knowing the run number?
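For the download half of the question, here is a minimal sketch using only the standard library. It assumes that Quarry serves the newest run's output at a URL of the form /query/<query_id>/result/latest/<resultset>/<format> and that the JSON format contains "headers" and "rows" keys; check both against the download links on a query page before relying on them. Triggering a new run is not covered, since that requires a logged-in session. The function names are hypothetical:

```python
import json
import urllib.request

QUARRY = "https://quarry.wmcloud.org"


def latest_result_url(query_id: int, resultset: int = 0, fmt: str = "json") -> str:
    # Assumed route for the most recent run's output; no run number needed.
    return f"{QUARRY}/query/{query_id}/result/latest/{resultset}/{fmt}"


def parse_result(payload: str) -> list:
    # Assumed JSON shape: {"headers": [...], "rows": [[...], ...], ...}
    data = json.loads(payload)
    return [dict(zip(data["headers"], row)) for row in data["rows"]]


def fetch_latest(query_id: int, resultset: int = 0) -> list:
    """Download and parse the latest result set of a Quarry query."""
    req = urllib.request.Request(
        latest_result_url(query_id, resultset),
        headers={"User-Agent": "quarry-download-sketch/0.1"},  # identify your script
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return parse_result(resp.read().decode("utf-8"))
```

Usage: rows = fetch_latest(71599) would return one dict per result row, keyed by column name.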
According to https://quarry.wmcloud.org/query/71599, the English Wikipedia deleted 440,817 pages in 2022 and only 54,216 pages in 2023. Does anyone know of a possible explanation for this?
I think COUNT(log_namespace = 0) is incorrect: COUNT() counts every row where its argument is not NULL, so rows where the comparison is false are counted too. When I reproduce the stats using:
SELECT LEFT(log_timestamp, 4) AS year, COUNT(*) FROM logging_logindex WHERE log_namespace = 0 AND log_type = 'delete' AND log_action = 'delete' GROUP BY LEFT(log_timestamp, 4);
I get 89,872 deleted main-space pages in 2022 and 81,099 in 2023. For even namespaces (log_namespace % 2 = 0), it's 440,817 and 391,807.
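The difference between COUNT and SUM on a comparison can be checked on a toy table with Python's sqlite3. The rows are made up; SQLite has no LEFT(), so substr() stands in for it, and, like MariaDB, SQLite evaluates a comparison to 1 or 0 (or NULL if an operand is NULL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logging (log_namespace INTEGER, log_timestamp TEXT)")
# Made-up rows: three deletions in 2022 (namespaces 0, 2, 4), two in 2023 (0 and 1).
conn.executemany("INSERT INTO logging VALUES (?, ?)", [
    (0, "20220101000000"), (2, "20220301000000"), (4, "20220501000000"),
    (0, "20230101000000"), (1, "20230201000000"),
])

rows = conn.execute("""
    SELECT substr(log_timestamp, 1, 4) AS year,
           COUNT(log_namespace = 0) AS count_cmp,  -- counts every non-NULL 0 or 1
           SUM(log_namespace = 0) AS sum_cmp,      -- adds up the 1s only
           COUNT(CASE WHEN log_namespace = 0 THEN 1 END) AS count_case  -- NULL for non-matches
    FROM logging
    GROUP BY year
    ORDER BY year
""").fetchall()
print(rows)  # [('2022', 3, 1, 1), ('2023', 2, 1, 1)]
```

So COUNT(log_namespace = 0) returns the total number of rows in each group, while SUM(...) or the COUNT(CASE ...) form returns only the main-space deletions.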
Which is more efficient?
- AND NOT ( lt_title = "ABC" ) AND NOT ( lt_title = "XYZ")
- AND lt_title NOT IN ( "ABC", "XYZ")
- AND lt_title <> "ABC" AND lt_title <> "XYZ"
Agree that none is ideal.
First, the query engine builds a query plan; then it executes the query according to that plan. You can check whether the query plan is the same for all variants (e.g., using the Toolforge SQL Optimizer). If it is, there is no difference.
I prefer NOT IN.
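To see that the three forms select the same rows, here is a toy check with Python's sqlite3. The linktarget table and its rows are made up, string literals use standard single quotes, and SQLite's EXPLAIN QUERY PLAN output is only an analogue of what MariaDB's EXPLAIN (or the Toolforge SQL Optimizer) would show on the replicas:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE linktarget (lt_id INTEGER PRIMARY KEY, lt_title TEXT)")
conn.executemany("INSERT INTO linktarget (lt_title) VALUES (?)",
                 [("ABC",), ("DEF",), ("XYZ",), ("GHI",)])

# The three WHERE conditions from the question.
variants = {
    "not_eq": "NOT (lt_title = 'ABC') AND NOT (lt_title = 'XYZ')",
    "not_in": "lt_title NOT IN ('ABC', 'XYZ')",
    "neq": "lt_title <> 'ABC' AND lt_title <> 'XYZ'",
}

results = {}
for name, cond in variants.items():
    sql = f"SELECT lt_title FROM linktarget WHERE {cond} ORDER BY lt_id"
    results[name] = [r[0] for r in conn.execute(sql)]
    # EXPLAIN QUERY PLAN shows how SQLite would execute each variant.
    print(name, conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall())

print(results["not_in"])  # ['DEF', 'GHI']
```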
How do I query tables from two different databases (wikidatawiki_p and commonswiki_p) in the same query?
I tried
- SELECT * FROM `commonswiki_p`.`page` LIMIT 1
- USE DATABASE commonswiki_p
- USE commonswiki_p;
to override what's specified in the GUI.
It hasn't been possible since around 2021. See Topic:W6tzj276xib56phf.
They are completely separate DB servers, you cannot make queries across multiple servers in the same query.
Apparently it was possible (see the example in the topic referenced by Matej), but then the feature was removed.
I found an easier solution, as the gap between Wikidata and Commons is only partial: one table on Commons is updated (wbc_entity_usage), but the other (page_props) is not: quarry:query/86040
"Apparently it was possible " Yes, until the infrastructure ran into scaling problems.
Are there any measures in place to keep the databases in sync? The gap mentioned above is minor in percentage terms (maybe 0.1%), but in absolute numbers, 4,600 is a lot.
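Since cross-server joins are no longer possible, one workaround is to run one query per wiki, download both results, and compare the key columns locally; the same approach puts a number on a sync gap like the one above. A minimal sketch, assuming each result has already been reduced to a list of comparable keys such as Wikidata item IDs; the function name and sample IDs are hypothetical:

```python
def compare_result_sets(left: list, right: list) -> dict:
    """Compare two result sets by key and report what each one is missing."""
    left_keys, right_keys = set(left), set(right)
    return {
        "only_left": sorted(left_keys - right_keys),
        "only_right": sorted(right_keys - left_keys),
        "in_both": len(left_keys & right_keys),
    }


# Hypothetical item IDs from a Wikidata-side and a Commons-side query.
wikidata_side = ["Q1", "Q2", "Q3"]
commons_side = ["Q1", "Q3", "Q4"]
print(compare_result_sets(wikidata_side, commons_side))
# {'only_left': ['Q2'], 'only_right': ['Q4'], 'in_both': 2}
```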
Is there a way to find Commons files that
- use P625 SDC property
- do not transclude c:Module:Coordinates.
It seems we might need the "Pages with maps" category again.
See also: c:Commons:Bots/Work_requests#Add_missing_Template:Location
The enwiki database has had replication lag (replag) for an entire week now. It should hopefully be fixed in the next week or so.