User Details
- User Since
- Oct 8 2014, 5:48 PM (497 w, 2 d)
- Availability
- Available
- IRC Nick
- Milimetric
- LDAP User
- Milimetric
- MediaWiki User
- Milimetric (WMF) [ Global Accounts ]
Yesterday
It would be cool to do a quick spike into Scalar and the customization we'd need there. I'm abstaining as a voter here: I like all the options just fine, and I have bad aesthetics when it comes to reading docs because I just start hacking and see what happens :)
+1 for Option 2. For what it's worth, when we initially put up the endpoint docs on wikitech we were just doing so while we waited for a better end user experience than the swagger UI afforded us. I especially like the integration with wikitech described in option 2 (the discovery pages that would lead wiki users to the docs)
Tue, Apr 16
+1, SSR is kind of a pain if done in fancier ways, but done this way you get a lot for free and it even helps reduce code. As a bonus the user gets a great experience.
Mon, Apr 15
I've broken this down into subtasks but I'm keeping it as something between an epic and an actual task. It's coordinating and has all the acceptance criteria, it was just too big. So I'll leave the other two subtasks on the boards while I'm on vacation and put this in paused. This can be resumed whenever you'd like to continue work on coordinating and deployment.
Thu, Apr 11
Approved, welcome back Andy :)
Approved
Thu, Apr 4
Wed, Apr 3
Mon, Apr 1
Fri, Mar 29
I found a candidate bug. The script used to ask for the year and month, and after the change it asks for the day. generate_druid_unique_devices_per_domain_daily_aggregated_monthly.hql seems to have been adapted to give the correct result but, evidence to the contrary, the druid output seems to contain only one day. Running now to prove or disprove.
Merge request 582 seems to have changed how we do this monthly druid segment aggregation, so the answer must be around here. Again I checked the new source table, now Iceberg (wmf_readership.unique_devices_per_project_family_daily) and again that seems to have data for all of January, for example.
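A minimal sketch of the kind of coverage check described here, assuming the dates present in an output (a Druid segment or a table partition listing) have already been pulled into a Python list. The function name `missing_days` and the sample data are hypothetical and not part of the refinery code; how the dates are fetched is left out on purpose.

```python
import calendar
from datetime import date


def missing_days(year, month, days_present):
    """Return the days of the given month that are absent from days_present."""
    # monthrange returns (weekday of the first day, number of days in the month)
    _, last_day = calendar.monthrange(year, month)
    expected = {date(year, month, d) for d in range(1, last_day + 1)}
    return sorted(expected - set(days_present))


# Hypothetical case: an output that only contains a single day of January
only_one_day = [date(2024, 1, 1)]
gaps = missing_days(2024, 1, only_one_day)
print(f"{len(gaps)} days missing, first gap: {gaps[0] if gaps else None}")
```

An empty result would mean the month is fully covered; a long list of gaps would match the "only one day" symptom.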
Looked at this a bit today.
Mon, Mar 25
@VirginiaPoundstone: Looks like Giuseppe patched varnish to send more requestctls, so maybe that completely or partially solves the problem. I'd have to look through the data to see. I'm going to do a good job focusing and only do that if you put it in the sprint :) (should take no more than an hour, but it's probably not like a few seconds if I want to be more thorough)
Fri, Mar 22
I made the puppet change but I need an SRE to merge. This is indeed not well documented; we should talk about a better way to maintain this interface that so many people use.
Approved!
Mar 20 2024
deleted from meta
I believe this dataset that's already being published is strictly better and in my opinion should replace the current active editors by country data: https://analytics.wikimedia.org/published/datasets/geoeditors_weekly/ (also the monthly version)
Ah, thanks @will for finding T358793: Decommission AQS 1.0, @brouberol and others can go ahead and take AQS 1 offline and follow through with decommissioning. Take note of what Eric said there, the servers themselves are still useful, just AQS 1 is going away.
I'm working to find the relevant tickets, but AQS 1 should be sunset and I think it's ok to take it offline for now and follow through with the rest of the process. I've just been absent for a couple months and might be missing some nuance.
Mar 17 2024
Feb 5 2024
Jan 9 2024
Jan 8 2024
@VirginiaPoundstone this issue came up again (thanks very much to @xcollazo who remembered this task). I support option b) in Xabriel's plan above, and I think this should be triaged with high importance as a production issue. This table is used by lots of people and it seems to me it'll keep failing. If the folks looking into it don't remember this, it's a lot of time wasted.
Quick mention of this other task where some of the work took place: T353296. Relevant to this, the gerrit change https://gerrit.wikimedia.org/r/c/analytics/refinery/+/982899 included updates to the following pipelines/datasets:
Jan 4 2024
TL;DR: the data pipeline up to AQS seems fine. My guess is we're not filtering properly to exclude redirects in AQS 2; the timeline corresponds with the reported problem. Sorry for the inconvenience, working on a fix.
@Mayakp.wiki the patch to watch is: https://gerrit.wikimedia.org/r/c/operations/puppet/+/981352/. This has not yet been merged and deployed. When it is, you'll start seeing the changes in x_analytics.
Datahub allows you to add descriptions at sub-field level. We should at some point get to consensus about where we want all this description stuff to live. We talked about:
Dec 22 2023
Dec 12 2023
Quick recap for anyone looking to implement lineage. First, a note regarding lineage as part of centralized configuration. I think this would be very useful, and I'm in no way suggesting that we slow down on the work that @JAllemandou and @lbowmaker are leading on that front. The reality is that a centralized config may take a few more months to get implemented. In the meantime, we could instrument lineage in the airflow DAGs in a few minutes per DAG. Done in a standard way, this would be very easy to migrate to centralized config. In addition, as we implement this we may find exceptions and edge cases that would inform the centralized config. If anyone disagrees with anything here, you are very welcome to; please don't take this as a "decision". Just a thought. If we agree with this and there's some slow-down to migrate back to the centralized config, I hereby promise that I'll do it myself on all DAGs.
The following is a quick rundown of what I would think about if something goes wrong, and how I would check.
Dec 11 2023
A full list of current use cases could only be compiled by reaching out to researchers who download this dataset. Limited to what we know, current use cases are roughly:
MediaWiki History is described in detail in the following places:
The algorithm is explained at length starting here.
A shortened and updated list of Changes and Known Problems.
wmf_raw.mediawiki_pagelinks and wmf_raw.mediawiki_page_props are available with snapshot 2023-11
Dec 8 2023
I agree, @stjn, hopefully that's not as hyper-urgent and maybe @VirginiaPoundstone + @lbowmaker can triage.
Dec 7 2023
I'm really sorry this didn't get through the pipeline sooner, someone only told me about the issue last week. Had I known sooner I would have made the fix sooner. We are going to bring this up in our retro.
Dec 6 2023
The above patches do what I suggested in a comment on the talk page: https://meta.wikimedia.org/wiki/Talk:Requests_for_comment/Hiding_the_number_of_Russian/Belorussian/Kazakh_contributors_on_the_statistics_map which is to gray out the countries currently on the protection list and explain that the data is hidden. If and when the country list changes, we should update this or make it more reactive to the data itself.
Sqooping from the production replicas would mean applying the same sanitization rules on our side. I see the filter here is:
This is the varnish code (VCL) that does analytics-y things to create and update the X-analytics header. Adding stuff here would prevent us from having to change varnishkafka. Or maybe I misunderstood the whole thing, which is always possible in Varnish land :)
Dec 5 2023
This sounds like it would work... but I do want to point out a potential maintenance issue:
Nov 30 2023
Nov 29 2023
Nov 28 2023
I would like to emphatically support Timo in T169027#9362252 here. And just to re-state what I think is the most critical part of the argument:
Merged and deployed right now, used to fix another instance of the webrequest duplicate map key failures. Note for future selves: it would be good to figure out where these are still coming from.
Nov 27 2023
Besides the great discussion above, I just want to point out some related things.
Nov 20 2023
@SGupta-WMF may I please have permissions to the doc too? Will asked me to review.
Nov 16 2023
+1 for leaving writing to Hive tables alone (and erring towards correctness and jobs failing and hopefully comments that we can find)
+1 to instead focusing on the Iceberg migration
My apologies for the late review, +1 to Scott's point of resolving this and making it public.
Nov 14 2023
Nov 13 2023
Nov 9 2023
Since the dumps for enwiki and ukwikinews are both complete now, I looked at the snapshot hosts 101[0123]. I see that the code that seems to be failing in the stack trace has been updated to -wmf.4 (the stack traces are from -wmf.2 and -wmf.3 respectively). So this seems like it was fixed by someone else, deployed, and the snapshot hosts resumed their work.
Full output from email: