
Catalan pronunciations

Hi, just a note to be careful when adding Catalan pronunciations. For example, you added a pronunciation of ê to esquetx, which is wrong (it should be é) and unlikely in any case, since ê generally only occurs with inheritances and some old borrowings, and esquetx is a recent borrowing from English. I have documented the sources of pronunciation in the documentation to {{ca-IPA}}; in particular, only trust the DCVB for Balearic pronunciations and don't trust cawikt at all. Benwing2 (talk) 02:34, 28 January 2024 (UTC)

@Benwing2: Ok, I thought cawikt was fairly reliable. Btw, thanks for your great work on the Catalan corner! Jberkel 10:42, 28 January 2024 (UTC)

Statistics

Hi Jberkel, do you want to do another update of the statistics? Your last one already dates back to 1 July. Yes, I know it takes a lot of time and computing power, but I think we would all simply like to know again. :) Steinbach (talk) 17:18, 22 February 2024 (UTC)

@Steinbach Hello, I would like to do it regularly, but there are still data problems with the HTML dumps: phab:T305407. The last reasonably complete data is from last July. The WMF people are working on it, but somehow it is taking forever; I keep asking about it :( Jberkel 17:42, 22 February 2024 (UTC)

HTML Dump

Hi, I saw your posts complaining about the lack of HTML dumps as I had the same issue. I ended up creating my own HTML dump using the API to rapidly download millions of entries. I used the 20240220 XML dump as a base so that the two dumps would include exactly the same revisions. Note that the same wikitext can produce different HTML code at different points in time, so I can't guarantee that the page looks exactly as it did at the time of the XML dump.

  • Pages included: non-redirects in namespaces 0 (main) and 118 (reconstruction)
  • Number of lines: 7,952,575
  • Time generated: February 20, 2024, 7:49:52 PM to February 22, 2024, 1:16:18 AM (EST)
  • Uncompressed size: 112,213,194,308 bytes
  • Compressed size: 5,482,140,342 bytes

Would you be interested in the code or the dump itself?

Ioaxxere (talk) 20:05, 22 February 2024 (UTC)

@Ioaxxere Lol, I'm close to starting a project myself, given the glacial progress on the WMF side. Yes, I'm interested. How did you get the HTML, and how long does it take? Is it the Parsoid-rendered version which is used in the HTML dumps? If you want, we can join forces and run it as a community project. Jberkel 09:44, 23 February 2024 (UTC)

The script works by grabbing HTML data using a revision ID. For example: https://en.wiktionary.org/w/api.php?action=parse&oldid=65853771&format=json. I'm not sure what parser is used but it seems to correspond with "view page source" in my browser. Here is the code:

Then I verified the output with this code:

Which produced:

These correspond with pages in the XML dump that have recently been deleted.

I don't have the time/resources to generate these on a regular basis, but you're welcome to adapt this code for your purposes!

Ioaxxere (talk) 19:56, 23 February 2024 (UTC)

Oh god, I just realized that adding &parsoid=true to the API query gives *far* better data. Time to rerun... Ioaxxere (talk) 20:09, 23 February 2024 (UTC)
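
For anyone following along, the request described above can be sketched roughly like this. This is not the original script, just a minimal illustration of fetching HTML by revision ID with action=parse and, optionally, the parsoid=true parameter mentioned here. The function names and the User-Agent string are made up for the example, and it assumes the default JSON layout, where the rendered HTML sits under parse -> text -> "*" (assumed to hold with parsoid=true as well).

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wiktionary.org/w/api.php"


def build_parse_url(rev_id, parsoid=False):
    """Build the action=parse URL for one revision ID (hypothetical helper)."""
    params = {"action": "parse", "oldid": rev_id, "format": "json"}
    if parsoid:
        # Parameter taken from the comment above; requests Parsoid output.
        params["parsoid"] = "true"
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_html(payload):
    """Pull the rendered HTML out of a decoded API response."""
    if "error" in payload:
        # E.g. the revision was deleted after the XML dump was taken.
        raise LookupError(payload["error"].get("info", "API error"))
    # Assumption: default JSON format, HTML under parse -> text -> "*".
    return payload["parse"]["text"]["*"]


def fetch_revision_html(rev_id, parsoid=False,
                        user_agent="html-dump-sketch/0.1 (hypothetical)"):
    """Fetch one revision's HTML over the network (not rate-limited here)."""
    request = urllib.request.Request(
        build_parse_url(rev_id, parsoid),
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request) as response:
        return extract_html(json.load(response))
```

A full run would loop over the revision IDs listed in the XML dump and call fetch_revision_html for each, ideally with throttling and retries, which this sketch leaves out.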
Cool, thanks! We could run it on WMF infrastructure. Great to see that 50 lines of Python yield better results than the WMF's buzzword soup of Kafka, DAGs and what have you… How long does it take to do a full run? Jberkel 15:20, 26 February 2024 (UTC)
nm, you already had it in your post, almost 2 days… :) Jberkel 15:57, 26 February 2024 (UTC)
Even if the WMF some day manages to produce useful dumps again, we'll still need wiki-specific namespaces such as Reconstruction, so it'll be useful to have some way of generating them ourselves. Jberkel 15:58, 26 February 2024 (UTC)

ScribuntoUnit vs. UnitTests

I just discovered there are two unit testing frameworks here, Module:UnitTests used by everyone but you, and Module:ScribuntoUnit used by you. The former is older than the latter, so I'm not sure why you imported the latter from Wikipedia, but I think we should consolidate. Can you think about converting your unit tests to use Module:UnitTests? Benwing2 (talk) 20:34, 10 March 2024 (UTC)

Hi, just wondering if you got my msg. Can you at least clarify why you imported and started using Module:ScribuntoUnit in preference to our own module? BTW I just discovered a third unit test framework, Module:QFQ/UnitTests, used only on Module:mnw-translit. Benwing2 (talk) 07:43, 14 March 2024 (UTC)
Hi @Benwing2, sorry, I had a short Wiktionary hiatus. It's been a long time (~10 years), but I think when I first looked at Module:UnitTests it was a spaghetti mess and didn't have the features I wanted. That's probably no longer the case, and I agree it's better to standardize on one framework. Jberkel 09:27, 15 March 2024 (UTC)

catalogue raisonné

Wwoww, Jberkel, you're fast. Wanted to cite the same Guardian passage here, and it was already there ... MistaPPPP (talk) 12:55, 19 March 2024 (UTC)

Apologies

I need to apologise to you as well: my simple edit to my archaic paragraph about certain 'etymologies that discredit Wiktionary' completely disrupted the section, including your edits. There should really be a mechanism in place to stop this from happening, since any innocent editor could well make a similar mistake, and if it were not detected quickly, as both Surjection and I detected this one, it could cause linguistic mayhem! Regards, Andrew Andrew H. Gray 11:40, 29 March 2024 (UTC)

On ass...

What Doyle said was about this:

https://en.m.wiktionary.org/wiki/arse#English

Here, ass is another way of spelling arse (as in dumb). Lunatone3000 (talk) 22:24, 4 April 2024 (UTC)