Milimetric (Dan Andreescu)
Staff Engineer (Data Engineering)

User Details

User Since
Oct 8 2014, 5:48 PM (497 w, 2 d)
Availability
Available
IRC Nick
Milimetric
LDAP User
Milimetric
MediaWiki User
Milimetric (WMF)

Recent Activity

Yesterday

Milimetric added a comment to T361889: Decision: OpenAPI spec viewer for AQS.

It would be cool to do a quick spike into Scalar and the customization we'd need there. I abstain as a voter here: I like all the options just fine, and I have bad aesthetics when it comes to reading docs because I just start hacking and see what happens :)

Thu, Apr 18, 7:48 AM · Data Products (Data Products Sprint 12), Tech-Docs-Team, Documentation, AQS2.0
Milimetric added a comment to T361887: Decision: AQS user documentation approach.

+1 for Option 2. For what it's worth, when we initially put up the endpoint docs on wikitech we were just doing so while we waited for a better end user experience than the swagger UI afforded us. I especially like the integration with wikitech described in option 2 (the discovery pages that would lead wiki users to the docs)

Thu, Apr 18, 7:45 AM · Data Products (Data Products Sprint 12), Tech-Docs-Team, Documentation, AQS2.0

Tue, Apr 16

Milimetric added a comment to T362268: Design the technical architecture for MPIC.

+1, SSR is kind of a pain if done in fancier ways, but in this way you get a lot for free and it helps even reduce code. As a bonus the user gets a great experience.

Tue, Apr 16, 11:14 AM · Data Products (Data Products Sprint 12), Metrics Platform Backlog

Mon, Apr 15

Milimetric moved T362551: Commons Impact AQS: endpoints with unit tests from Sprint Backlog to Code Review / Tech Input on the Data Products (Data Products Sprint 12) board.
Mon, Apr 15, 4:17 PM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics
Milimetric updated the task description for T362552: Commons Impact AQS: integration tests and deployment for endpoints.
Mon, Apr 15, 4:17 PM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics
Milimetric updated the task description for T362551: Commons Impact AQS: endpoints with unit tests.
Mon, Apr 15, 4:17 PM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics
Milimetric changed the point value for T358718: [Commons Impact Metrics] Create a new AQS service with all the endpoints from 21 to 5.
Mon, Apr 15, 4:16 PM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics
Milimetric moved T358718: [Commons Impact Metrics] Create a new AQS service with all the endpoints from Code Review / Tech Input to Paused on the Data Products (Data Products Sprint 12) board.

I've broken this down into subtasks but I'm keeping it as something between an epic and an actual task. It's coordinating and has all the acceptance criteria, it was just too big. So I'll leave the other two subtasks on the boards while I'm on vacation and put this in paused. This can be resumed whenever you'd like to continue work on coordinating and deployment.

Mon, Apr 15, 4:16 PM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics
Milimetric created T362552: Commons Impact AQS: integration tests and deployment for endpoints.
Mon, Apr 15, 4:15 PM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics
Milimetric created T362551: Commons Impact AQS: endpoints with unit tests.
Mon, Apr 15, 4:14 PM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics
Milimetric moved T358718: [Commons Impact Metrics] Create a new AQS service with all the endpoints from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 12) board.
Mon, Apr 15, 4:04 PM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics

Thu, Apr 11

Sj awarded T249419: RFC: Render data visualizations on the server a Love token.
Thu, Apr 11, 1:33 PM · Wikimedia-Performance-recommendation, JavaScript, MediaWiki-extensions-Graph, covid-19, TechCom-RFC
Milimetric added a comment to T361742: Requesting access to shell access to analytics client servers for AndyRussG.

Approved, welcome back Andy :)

Thu, Apr 11, 10:45 AM · Patch-For-Review, SRE, SRE-Access-Requests
Milimetric added a comment to T362113: Requesting access to analytics-privatedata-users for Steph Toyofuku.

Approved

Thu, Apr 11, 10:43 AM · Patch-For-Review, SRE, SRE-Access-Requests

Thu, Apr 4

Milimetric set the point value for T361669: Implement Category Metrics Snapshot API to 3.
Thu, Apr 4, 11:12 AM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics
Milimetric set the point value for T356748: Adding a AQS 2.0 endpoint guide to 2.
Thu, Apr 4, 11:12 AM · Data Products, AQS2.0
Milimetric set the point value for T361668: Go project and solution setup to 8.
Thu, Apr 4, 11:12 AM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics
Milimetric changed the point value for T358718: [Commons Impact Metrics] Create a new AQS service with all the endpoints from 34 to 21.
Thu, Apr 4, 11:11 AM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics
Milimetric moved T354823: [PHP] Remove dispatch method from Code Review / Tech Input to To Deploy on the Data Products (Data Products Sprint 11) board.
Thu, Apr 4, 11:09 AM · Data Products (Data Products Sprint 11), Patch-For-Review, Metrics Platform Backlog, good first task

Wed, Apr 3

Milimetric moved T358699: [Commons Impact Metrics] Create Airflow job that generates the datasets in Iceberg from Sprint Backlog to In Process on the Data Products (Data Products Sprint 11) board.
Wed, Apr 3, 9:43 PM · Data Products (Data Products Sprint 12), Patch-For-Review, Commons-Impact-Metrics
Milimetric claimed T358699: [Commons Impact Metrics] Create Airflow job that generates the datasets in Iceberg.
Wed, Apr 3, 9:43 PM · Data Products (Data Products Sprint 12), Patch-For-Review, Commons-Impact-Metrics

Mon, Apr 1

Milimetric moved T360501: Update ReadMe Doc for Tests Framework from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 11) board.
Mon, Apr 1, 4:14 PM · Data Products (Data Products Sprint 12), AQS2.0
Milimetric moved T360735: [User Story] Build the backend service for MPIC from Code Review / Tech Input to Done on the Data Products (Data Products Sprint 11) board.
Mon, Apr 1, 4:09 PM · Metrics Platform Backlog, Data Products (Data Products Sprint 11)

Fri, Mar 29

Milimetric added a comment to T361242: Unique devices tables have missing or incorrect data for January and February 2024.

I found a candidate bug. The script used to ask for the year and month, and after the change it asks for the day. generate_druid_unique_devices_per_domain_daily_aggregated_monthly.hql seems to have been adapted to give the correct result but, evidence to the contrary, the druid output seems to cover only one day. Running now to prove or disprove.

Fri, Mar 29, 6:32 PM · Data-Engineering, Movement-Insights, Patch-For-Review, Data-Platform
Milimetric added a comment to T361242: Unique devices tables have missing or incorrect data for January and February 2024.
NOTE: one key finding here is that DataHub is not kept in sync with these data migrations. If we don't address this, DataHub will become more of a source of confusion than clarity.
Fri, Mar 29, 6:06 PM · Data-Engineering, Movement-Insights, Patch-For-Review, Data-Platform
Milimetric added a comment to T361242: Unique devices tables have missing or incorrect data for January and February 2024.

Merge request 582 seems to have changed how we do this monthly druid segment aggregation, so the answer must be around here. Again I checked the new source table, now Iceberg (wmf_readership.unique_devices_per_project_family_daily) and again that seems to have data for all of January, for example.

Fri, Mar 29, 6:04 PM · Data-Engineering, Movement-Insights, Patch-For-Review, Data-Platform
Milimetric added a comment to T361242: Unique devices tables have missing or incorrect data for January and February 2024.

Looked at this a bit today.

Fri, Mar 29, 5:53 PM · Data-Engineering, Movement-Insights, Patch-For-Review, Data-Platform

Mon, Mar 25

Milimetric added a comment to T342577: Data Quality - requestctl not getting set.

@VirginiaPoundstone: Looks like Giuseppe patched varnish to send more requestctls, so maybe that completely or partially solves the problem. I'd have to look through the data to see. I'm going to do a good job focusing and only do that if you put it in the sprint :) (should take no more than an hour, but it's probably not like a few seconds if I want to be more thorough)

Mon, Mar 25, 5:14 PM · Data Products (Data Products Sprint 13), SRE, Traffic
Milimetric moved T358718: [Commons Impact Metrics] Create a new AQS service with all the endpoints from Sprint Backlog to In Process on the Data Products (Data Products Sprint 11) board.
Mon, Mar 25, 4:07 PM · Data Products (Data Products Sprint 12), Commons-Impact-Metrics
Milimetric created T360914: Update Dashiki Cloud Instances.
Mon, Mar 25, 3:59 PM · Data Products (Data Products Sprint 13)

Fri, Mar 22

Milimetric moved T356444: NEW BUG REPORT Wikipedia clickstream datasets link on Dumps "Other" page should point to HTML readme from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 11) board.

I made the puppet change but I need an SRE to merge. This is not well documented indeed, we should talk about a better way to maintain this interface that so many people use.

Fri, Mar 22, 4:21 PM · Patch-For-Review, Data Products (Data Products Sprint 11), Data-Platform
Milimetric added a comment to T359561: Add user fabfur to analytics-privatedata-users.

Approved!

Fri, Mar 22, 11:26 AM · Patch-For-Review, Data-Platform-SRE (2024.03.25 - 2024.04.14), SRE, SRE-Access-Requests

Mar 20 2024

Milimetric added a comment to T347970: [L] MachineVision: archive and remove all events and event schemas.

deleted from meta

Mar 20 2024, 4:39 PM · Patch-For-Review, Structured-Data-Backlog (Current Work), MachineVision
Milimetric added a comment to T360073: Wikistats "Active Editors by Country" does not follow definition for active editors.

I believe this dataset that's already being published is strictly better and in my opinion should replace the current active editors by country data: https://analytics.wikimedia.org/published/datasets/geoeditors_weekly/ (also the monthly version)

Mar 20 2024, 3:34 PM · Data Products, Data-Engineering, Movement-Insights, Data-Platform
Milimetric updated subscribers of T360522: aqs endpoint health alerting about mismatched check.

Ah, thanks @will for finding T358793: Decommission AQS 1.0, @brouberol and others can go ahead and take AQS 1 offline and follow through with decommissioning. Take note of what Eric said there, the servers themselves are still useful, just AQS 1 is going away.

Mar 20 2024, 2:23 PM · Patch-For-Review, Data-Platform-SRE (2024.03.04 - 2024.03.24), Data-Engineering
Milimetric added a comment to T360522: aqs endpoint health alerting about mismatched check.

I'm working to find the relevant tickets, but AQS 1 should be sunset and I think it's ok to take it offline for now and follow through with the rest of the process. I've just been absent for a couple months and might be missing some nuance.

Mar 20 2024, 2:21 PM · Patch-For-Review, Data-Platform-SRE (2024.03.04 - 2024.03.24), Data-Engineering

Mar 17 2024

rokejulianlockhart awarded T249419: RFC: Render data visualizations on the server a Like token.
Mar 17 2024, 7:02 PM · Wikimedia-Performance-recommendation, JavaScript, MediaWiki-extensions-Graph, covid-19, TechCom-RFC

Feb 5 2024

Mayakp.wiki awarded T333223: Adding user_is_temp to the user table a Barnstar token.
Feb 5 2024, 7:50 PM · MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MW-1.41-notes (1.41.0-wmf.10; 2023-05-23), Anti-Harassment, Data-Persistence, Data-Engineering, Temporary accounts

Jan 9 2024

Milimetric set the point value for T353296: Netherlands appears twice as "The Netherlands" or "Netherlands" in country coded data to 3.
Jan 9 2024, 6:14 PM · Movement-Insights, Data Products (Data Products Sprint 05), Data-Platform
Milimetric set the point value for T352793: MediaWiki History Plan: Maintenance Plan to 2.
Jan 9 2024, 6:13 PM · Data Products (Data Products Sprint 05)
Milimetric set the point value for T352790: MediaWiki History Plan: use cases and potential work to 2.
Jan 9 2024, 6:13 PM · Data Products (Data Products Sprint 05)
Milimetric assigned T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow to mforns.
Jan 9 2024, 1:20 PM · Data-Engineering (Q4 2024 April 1st - June 30th), Data Products, Movement-Metrics, Movement-Insights
Milimetric moved T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow from Sprint Backlog to In Process on the Data Products (Data Products Sprint 05) board.
Jan 9 2024, 1:20 PM · Data-Engineering (Q4 2024 April 1st - June 30th), Data Products, Movement-Metrics, Movement-Insights
Milimetric added a project to T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow: Data Products (Data Products Sprint 05).
Jan 9 2024, 1:19 PM · Data-Engineering (Q4 2024 April 1st - June 30th), Data Products, Movement-Metrics, Movement-Insights

Jan 8 2024

Milimetric updated subscribers of T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow.

@VirginiaPoundstone this issue came up again (thanks very much to @xcollazo who remembered this task). I support option b) in Xabriel's plan above, and I think this should be triaged with high importance as a production issue. This table is used by lots of people and it seems to me it'll keep failing. If the folks looking into it don't remember this, it's a lot of time wasted.

Jan 8 2024, 9:02 PM · Data-Engineering (Q4 2024 April 1st - June 30th), Data Products, Movement-Metrics, Movement-Insights
Milimetric added a comment to T353956: Traffic anomaly detection triggers alerts because of a MaxMind Country rename.

Quick mention of this other task where some of the work took place: T353296. Relevant to this, the gerrit change https://gerrit.wikimedia.org/r/c/analytics/refinery/+/982899 included updates to the following pipelines/datasets:

Jan 8 2024, 5:39 PM · Data Products (Data Products Sprint 05)

Jan 4 2024

Milimetric added a comment to T354074: Wikistats - incorrect number of content articles for Latvian Wikipedia.

TL;DR: the data pipeline up to AQS seems fine. My guess is we're not filtering properly to exclude redirects in AQS 2; the timeline corresponds with the reported problem. Sorry for the inconvenience, working on a fix.

Jan 4 2024, 9:12 PM · Data Products (Data Products Sprint 07), Data-Engineering, Analytics, Data-Engineering-Wikistats
Milimetric added a comment to T346463: Identify and label prefetch proxy data in our traffic.

@Mayakp.wiki the patch to watch is: https://gerrit.wikimedia.org/r/c/operations/puppet/+/981352/. This has not yet been merged and deployed. When it is, you'll start seeing the changes in x_analytics.

Jan 4 2024, 2:57 PM · Traffic, Movement-Insights, Data-Engineering
Milimetric added a comment to T307040: Propagate field descriptions from event schemas to Hive event tables.

Datahub allows you to add descriptions at sub-field level. We should at some point get to consensus about where we want all this description stuff to live. We talked about:

Jan 4 2024, 2:46 PM · Patch-For-Review, Product-Analytics, Data-Engineering

Dec 22 2023

xcollazo awarded T352793: MediaWiki History Plan: Maintenance Plan a Pterodactyl token.
Dec 22 2023, 7:28 PM · Data Products (Data Products Sprint 05)

Dec 12 2023

Milimetric updated subscribers of T312566: Emit lineage information about Airflow jobs to DataHub.

Quick recap for anyone looking to implement lineage. First, a note regarding lineage as part of centralized configuration. I think this would be very useful, and I'm in no way suggesting that we slow down on the work that @JAllemandou and @lbowmaker are leading on that front. The reality is that a centralized config may take a few more months to get implemented. In the meantime, we could instrument lineage in the airflow DAGs in a few minutes per DAG. Done in a standard way, this would be very easy to migrate to centralized config. In addition, as we implement this we may find exceptions and edge cases that would inform the centralized config. If anyone disagrees with anything here, you are very welcome, please don't take this as a "decision". Just a thought. If we agree with this and there's some slow-down to migrate back to the centralized config, I hereby promise that I'll do it myself on all DAGs.

Dec 12 2023, 8:06 PM · Data-Engineering, Data-Catalog
Milimetric added a comment to T351117: Move analytics log from Varnish to HAProxy.

Hi @Milimetric, sorry for the late reply. I'll try to answer your question, but consider that we're still investigating all the pros and cons of this "migration", and for sure we'll share our thoughts and our action plan before moving on with this...

Dec 12 2023, 4:17 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Milimetric added a comment to T352793: MediaWiki History Plan: Maintenance Plan.

The following is a quick rundown of what I would think about if something goes wrong, and how I would check.

Dec 12 2023, 3:56 PM · Data Products (Data Products Sprint 05)

Dec 11 2023

Milimetric moved T352790: MediaWiki History Plan: use cases and potential work from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 05) board.

A full list of current use cases could only be compiled by reaching out to researchers who download this dataset. Limited to what we know, current use cases are roughly:

Dec 11 2023, 9:21 PM · Data Products (Data Products Sprint 05)
Milimetric added a comment to T352790: MediaWiki History Plan: use cases and potential work.

MediaWiki History is described in detail in the following places:

Dec 11 2023, 9:00 PM · Data Products (Data Products Sprint 05)
Milimetric moved T352793: MediaWiki History Plan: Maintenance Plan from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 05) board.
Dec 11 2023, 8:59 PM · Data Products (Data Products Sprint 05)
Milimetric added a comment to T352793: MediaWiki History Plan: Maintenance Plan.

The algorithm is explained at length starting here.

Dec 11 2023, 8:59 PM · Data Products (Data Products Sprint 05)
Milimetric added a comment to T352793: MediaWiki History Plan: Maintenance Plan.

A shortened and updated list of Changes and Known Problems.

Dec 11 2023, 8:56 PM · Data Products (Data Products Sprint 05)
Milimetric added a comment to T352793: MediaWiki History Plan: Maintenance Plan.

MediaWiki History is described in detail in the following places:

Dec 11 2023, 8:32 PM · Data Products (Data Products Sprint 05)
Milimetric claimed T352790: MediaWiki History Plan: use cases and potential work.
Dec 11 2023, 8:01 PM · Data Products (Data Products Sprint 05)
Milimetric claimed T352793: MediaWiki History Plan: Maintenance Plan.
Dec 11 2023, 8:01 PM · Data Products (Data Products Sprint 05)
Milimetric moved T352790: MediaWiki History Plan: use cases and potential work from Sprint Backlog to In Process on the Data Products (Data Products Sprint 05) board.
Dec 11 2023, 8:01 PM · Data Products (Data Products Sprint 05)
Milimetric moved T352793: MediaWiki History Plan: Maintenance Plan from Sprint Backlog to In Process on the Data Products (Data Products Sprint 05) board.
Dec 11 2023, 8:01 PM · Data Products (Data Products Sprint 05)
Milimetric added a comment to T353134: Search dag image_suggestions_weekly failed waiting for analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2023-11-27.

wmf_raw.mediawiki_pagelinks and wmf_raw.mediawiki_page_props are available with snapshot 2023-11

Dec 11 2023, 3:00 PM · Discovery-Search (Current work), Data-Engineering, Structured-Data-Backlog, Image-Suggestions, CirrusSearch

Dec 8 2023

Milimetric updated subscribers of T333716: "Active editors by country" doesn't display numbers for Belarus, Kazakhstan, Russia.

I agree, @stjn, hopefully that's not as hyper-urgent and maybe @VirginiaPoundstone + @lbowmaker can triage.

Dec 8 2023, 7:10 PM · Russian-Sites, Data-Engineering, Data-Engineering-Wikistats

Dec 7 2023

Milimetric added a comment to T339318: Indicate that some country data are unavailable on Wikistats.

I'm really sorry this didn't get through the pipeline sooner, someone only told me about the issue last week. Had I known sooner I would have made the fix sooner. We are going to bring this up in our retro.

Dec 7 2023, 2:49 PM · Trust-and-Safety, Russian-Sites, Data-Engineering, Data-Engineering-Wikistats
Milimetric added a comment to T333716: "Active editors by country" doesn't display numbers for Belarus, Kazakhstan, Russia.

@Milimetric: this is great, but I think it should also be indicated under the map that some countries do not have any results, so people can see this more easily. For example, page view stats have this at the bottom: Those countries with less than 100 views are not reported and are blank in the map. Seems like the absence of data for privacy reasons is good to report there as well. Can you also add that?

Dec 7 2023, 2:47 PM · Russian-Sites, Data-Engineering, Data-Engineering-Wikistats

Dec 6 2023

Milimetric added a comment to T333716: "Active editors by country" doesn't display numbers for Belarus, Kazakhstan, Russia.

The above patches do what I suggested in a comment on the talk page: https://meta.wikimedia.org/wiki/Talk:Requests_for_comment/Hiding_the_number_of_Russian/Belorussian/Kazakh_contributors_on_the_statistics_map which is to gray out the countries currently on the protection list and explain that the data is hidden. If and when the country list changes, we should update this or make it more reactive to the data itself.

Dec 6 2023, 10:05 PM · Russian-Sites, Data-Engineering, Data-Engineering-Wikistats
Ladsgroup awarded T249419: RFC: Render data visualizations on the server a Love token.
Dec 6 2023, 7:45 PM · Wikimedia-Performance-recommendation, JavaScript, MediaWiki-extensions-Graph, covid-19, TechCom-RFC
Milimetric updated subscribers of T352879: Update the sqoop configuration for mediawiki to obtain linktarget from the production replicas, instead of wikireplicas.

Sqooping from the production replicas would mean applying the same sanitization rules on our side. I see the filter here is:

Dec 6 2023, 4:31 PM · Data-Platform-SRE, Data-Engineering
Milimetric added a comment to T346463: Identify and label prefetch proxy data in our traffic.

This is the varnish code (VCL) that does analytics-y things to create and update the X-analytics header. Adding stuff here would prevent us from having to change varnishkafka. Or maybe I misunderstood the whole thing, which is always possible in Varnish land :)

Dec 6 2023, 12:10 PM · Traffic, Movement-Insights, Data-Engineering

Dec 5 2023

Milimetric updated subscribers of T352650: Migrate current-generation dumps to run from our containerized images.

This sounds like it would work... but I do want to point out a potential maintenance issue:

Dec 5 2023, 5:07 PM · MW-on-K8s, Dumps-Generation, Release-Engineering-Team, serviceops
Milimetric created T352793: MediaWiki History Plan: Maintenance Plan.
Dec 5 2023, 4:54 PM · Data Products (Data Products Sprint 05)
Milimetric created T352790: MediaWiki History Plan: use cases and potential work.
Dec 5 2023, 4:49 PM · Data Products (Data Products Sprint 05)
Milimetric renamed T352787: [Sprint 05 GOAL] MediaWiki History Knowledge Hub from [User Story] <title> to [User Story] MediaWiki History Plan.
Dec 5 2023, 4:43 PM · Data Products (Data Products Sprint 05)
Milimetric created T352787: [Sprint 05 GOAL] MediaWiki History Knowledge Hub.
Dec 5 2023, 4:43 PM · Data Products (Data Products Sprint 05)

Nov 30 2023

Milimetric added a comment to T351909: Duplicate keys in x_analytics header corrupt some wmf_raw.webrequest rows and break refinement of wmf.webrequest.

Is it possible to have the monitoring log some information about the rows such that we can figure out where they're coming from?

Nov 30 2023, 4:36 PM · Data Products (Data Products Sprint 04), Data-Engineering

Nov 29 2023

Milimetric moved T351229: [Spike] Onboard Dan to AQS 2.0 via Knowledge Gaps Endpoint from In Process to Sign Off on the Data Products (Data Products Sprint 04) board.
Nov 29 2023, 6:37 PM · Data Products (Data Products Sprint 04)
Milimetric updated the task description for T351229: [Spike] Onboard Dan to AQS 2.0 via Knowledge Gaps Endpoint.
Nov 29 2023, 6:37 PM · Data Products (Data Products Sprint 04)

Nov 28 2023

Milimetric added a comment to T169027: Provide iframe sandboxing for rich-media extensions (defense in depth).

I would like to emphatically support Timo in T169027#9362252 here. And just to re-state what I think is the most critical part of the argument:

Nov 28 2023, 7:23 PM · MW-1.42-notes (1.42.0-wmf.22; 2024-03-12), Patch-For-Review, Security, Technical-Debt, Commons, MediaWiki-File-management, Multimedia
Milimetric moved T351909: Duplicate keys in x_analytics header corrupt some wmf_raw.webrequest rows and break refinement of wmf.webrequest from In Process to Done on the Data Products (Data Products Sprint 04) board.

merged and deployed right now, used to fix another instance of the webrequest duplicate map key failures. Note for future selves: it would be good to figure out where these are coming from still.

Nov 28 2023, 6:30 PM · Data Products (Data Products Sprint 04), Data-Engineering

Nov 27 2023

Milimetric reassigned T347953: Spike : AQS 2.0 Versioning Options from Milimetric to SGupta-WMF.
Nov 27 2023, 5:12 PM · Data Products (Data Products Sprint 04), AQS2.0
Milimetric moved T347953: Spike : AQS 2.0 Versioning Options from Code Review / Tech Input to Sign Off on the Data Products (Data Products Sprint 04) board.
Nov 27 2023, 5:12 PM · Data Products (Data Products Sprint 04), AQS2.0
Milimetric added a comment to T351117: Move analytics log from Varnish to HAProxy.

Besides the great discussion above, I just want to point out some related things.

Nov 27 2023, 5:07 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Nov 20 2023

Milimetric added a comment to T347953: Spike : AQS 2.0 Versioning Options.

@SGupta-WMF may I please have permissions to the doc too? Will asked me to review

Nov 20 2023, 5:33 PM · Data Products (Data Products Sprint 04), AQS2.0
Milimetric moved T347998: Commons Impact Metrics - Implement prototype from Done to Sign Off on the Data Products (Data Products Sprint 04) board.
Nov 20 2023, 5:12 PM · Data Products (Data Products Sprint 04)
Milimetric moved T347998: Commons Impact Metrics - Implement prototype from In Process to Done on the Data Products (Data Products Sprint 04) board.
Nov 20 2023, 5:12 PM · Data Products (Data Products Sprint 04)
Milimetric set the point value for T351195: WikimediaEvents: Remove partial migration of *UIActions instrument to 3.
Nov 20 2023, 5:10 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Technical-Debt, Data Products (Data Products Sprint 04), MediaWiki-extensions-WikimediaEvents, good first task
Milimetric moved T348571: Create tasks for remaining Dumps work with Product Manager from Paused to Sprint Backlog on the Data Products (Data Products Sprint 04) board.
Nov 20 2023, 5:07 PM · Data Products

Nov 16 2023

Milimetric added a comment to T351388: Add a spark global config for better file commit strategy.

+1 for leaving writing to Hive tables alone (and erring towards correctness and jobs failing and hopefully comments that we can find)
+1 to instead focusing on the Iceberg migration

Nov 16 2023, 9:58 PM · Data-Engineering (Sprint 5), Data-Platform-SRE
Milimetric added a comment to T342487: [Event Platform] Actor performing suppression revealed publicly.

My apologies for the late review, +1 to Scott's point of resolving this and making it public.

Nov 16 2023, 5:31 PM · Data-Engineering (Sprint 6), MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), SecTeam-Processed, Privacy Engineering, Event-Platform, Vuln-Infoleak, Security
Milimetric moved T345874: XMLDumps broken on deployment-mwmaint02 due to Jade Extension related content from Backlog to Done on the Dumps-Generation board.
Nov 16 2023, 3:31 PM · Dumps-Generation, MediaWiki-ContentHandler, Beta-Cluster-Infrastructure

Nov 14 2023

Milimetric claimed T351229: [Spike] Onboard Dan to AQS 2.0 via Knowledge Gaps Endpoint.
Nov 14 2023, 3:18 PM · Data Products (Data Products Sprint 04)
Milimetric created T351229: [Spike] Onboard Dan to AQS 2.0 via Knowledge Gaps Endpoint.
Nov 14 2023, 3:18 PM · Data Products (Data Products Sprint 04)

Nov 13 2023

Milimetric moved T349416: Synthesize results for Product Analytics' review of Metrics Platform event types from Sign Off to Done on the Data Products (Data Products (Sprint 03)) board.
Nov 13 2023, 5:06 PM · Data Products (Data Products (Sprint 03))
Milimetric moved T348731: Follow up on remaining requests to pageviews endpoints from Sign Off to Done on the Data Products (Data Products (Sprint 03)) board.
Nov 13 2023, 5:06 PM · RESTBase Sunsetting, Wikifeeds, Content-Transform-Team-WIP, Data Products (Data Products (Sprint 03))

Nov 9 2023

Milimetric edited P53125 category tree parsing.
Nov 9 2023, 10:22 PM
Milimetric moved T350898: Failure on enwiki and ukwikinews from Active to Done on the Dumps-Generation board.

Since the dumps for enwiki and ukwikinews are both complete now, I looked at the snapshot hosts 101[0123]. I see that the code that seems to be failing in the stack trace has been updated to -wmf.4 (the stack traces are from -wmf.2 and -wmf.3 respectively). So this seems like it was fixed by someone else, deployed, and the snapshot hosts resumed their work.

Nov 9 2023, 7:00 PM · Dumps-Generation
Milimetric added a comment to T350898: Failure on enwiki and ukwikinews.

Full output from email:

Nov 9 2023, 5:16 PM · Dumps-Generation
Milimetric moved T350898: Failure on enwiki and ukwikinews from Backlog to Active on the Dumps-Generation board.
Nov 9 2023, 5:14 PM · Dumps-Generation