Help talk:Ranking

Latest comment: 4 months ago by Vojtěch Dostál in topic Marking inactive social media accounts

Use of preferred rank

The "preferred" rank should not be used to distinguish between two sourced values of the same rank merely because one is more recent than the other. A distinction by date between different data sets should be made with a date parameter such as "date of publication": we should avoid manual work where automatic selection is possible. Date selection, like language selection, can easily be done by code when the appropriate data is available. Code cannot evaluate sources, so ranking should be kept solely for assessing sources for their relevance and reliability. Snipre (talk) 07:57, 21 December 2013 (UTC)
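The date-based selection described above can be sketched as follows. This is a minimal illustration only: the dictionary layout and the `published` key are hypothetical simplifications, not the real Wikibase JSON format.

```python
from datetime import date

# Hypothetical, simplified statements (not the real Wikibase data model).
statements = [
    {"value": 1000, "published": date(2010, 5, 1)},
    {"value": 1200, "published": date(2013, 3, 15)},
    {"value": 1100, "published": None},  # no date available
]


def most_recent(stmts):
    """Return the statement with the latest publication date, or None."""
    # Statements without a date cannot take part in an automatic selection.
    dated = [s for s in stmts if s["published"] is not None]
    if not dated:
        return None
    return max(dated, key=lambda s: s["published"])


latest = most_recent(statements)
```

Note that an undated statement is simply skipped here, which is one reason a purely date-based selection may not cover every case.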

RfC to validate this page

As far as I know, the meaning of "deprecated" in English is far from "unsourced", as the help page currently states, so to avoid edit warring I suggest we do things the right way and start an RfC. TomT0m (talk) 13:41, 14 December 2013 (UTC)

This page is brand new; let's give it a chance to evolve a bit before considering an RfC. I think it's more a matter of informing contributors, rather than reconciling opposing views on expected usage. LaddΩ chat ;) 14:02, 14 December 2013 (UTC)
An RfC is just a request for comment to the community, that's all. I think it's the least we must do before adopting a new policy on the usage of such an important feature, as it may influence the query engine results, and before twisting the use planned by the developers (deprecated means "was OK in the past but is not anymore"); otherwise Wikidata won't be understandable at all. TomT0m (talk) 20:49, 14 December 2013 (UTC)
@TomT0m: You are right: "deprecated" is not "unsourced", but how do you want to assess a piece of data without any information about its source? Deprecation is an evaluation of the source, so without a source no evaluation is possible, and since the list offers no option for "unsourced", only "deprecated" can be used. Snipre (talk) 07:43, 21 December 2013 (UTC)
Snipre, I do not agree. The policy of the project is that there are exceptions to the general sourcing policy, for example, hence not every statement is supposed to be sourced. I think requiring that only a properly sourced statement can be ranked preferred is more than enough. Plus, in practice it will take ages to source everything, and our sourcing policy regarding imports, for example, is not totally stable. TomT0m (talk) 11:27, 21 December 2013 (UTC)

Updates as part of documentation overhaul

Hi all,

I recently made some edits to Help:Ranking as part of a larger sitewide documentation overhaul (more info on this here).

Changes include the following:

  • eliminated more technical language/jargon about queries; also indicated that queries have not yet been implemented
  • proofread and edited some of the language
  • added examples for normal ranks
  • added 'how to' section with screenshots

Updates were recently made to this page with the reason in the edit summary given as "erasing general misconceptions" about ranks. I have respected these changes, but still have some remaining questions, specifically about the relationship between ranks and sources:

  • can or should preferred and deprecated ranks be added to statements that don't have any sources?
  • what is the status of statements that have the imported from Wikimedia project (P143) property?
  • should there be info on this page included for bots specifically?

FYI, I also removed a sentence about it being possible to have "multiple preferred statements when there is no consensus." I am not quite sure why this would be the case, or why such statements wouldn't just be assigned normal ranks (especially given that they would still be retrieved with queries so long as no other preferred ranks exist).

Please let me know if you have any concerns about these changes or suggestions on further improving the documentation.

Thanks. -Thepwnco (talk) 23:30, 2 July 2014 (UTC)

  • can or should preferred and deprecated ranks be added to statements that don't have any sources? No.
  • what is the status of statements that have the imported from Wikimedia project (P143) property? IMO, deprecated or at least no rank.
  • should there be info on this page included for bots specifically? Bots follow the same rules as humans. Snipre (talk) 13:45, 3 July 2014 (UTC)
Different point of view:
  • can or should preferred and deprecated ranks be added to statements that don't have any sources? Adding ranks to statements without a source should not be restricted. That would miss the concept of ranks and references. Ranks are not an evaluation of a statement's sources nor of their existence.
  • what is the status of statements that have the imported from Wikimedia project (P143) property? I am not sure that I understand "status" here. There is and should be no restriction for any statements to have assigned specific ranks. Basically, the intention of ranks is to only mark most up-to-date statements or statements that have a consensus on the one hand and outdated or erroneous values on the other hand. There is no connection to the type or quality of references.
The sentence "ranks indicate the quality or reliability of the source of information" is wrong and, please excuse me, totally misses the point of ranks. Indicating quality or reliability with ranks overloads the concept of ranks with additional meaning (please consult the definition of ranks in the Wikibase Data Model primer). If there is any community consensus on overloading the meaning of ranks, please point me to it and I will apologize.
  • should there be info on this page included for bots specifically? Not necessary in my opinion.
Another observation so far: The opening sentence is a bit shaky. "Discussed elsewhere" - where, and why is that important on the help page? Apart from that, yes, ranks are for differentiation, but differentiation in what way? It is just the introduction, but still, it sounds a bit too generic. Random knowledge donator (talk) 15:23, 3 July 2014 (UTC)
@Snipre, Random knowledge donator: thanks both for chiming in. It seems like we have a disagreement here on the relationship between sources and ranks. I myself am a bit confused about the idea of sources having no influence at all on ranks (except, perhaps, for cases when a statement is considered common knowledge), especially as it seems that the deprecated rank was specifically designed so that statements which are incorrect but sourced can still exist in Wikidata (with no negative effect on how queries are handled). To me, saying that ranks have nothing to do with the quality or reliability of the data misses the point, which is that when Wikibase is queried, the statements which are returned are not selected in a completely arbitrary fashion. There are motivations behind why a statement is designated as preferred, deprecated, and so on and this is often very much a factor of its sources. Furthermore, it seems to me as though sources are an important accountability and quality assurance mechanism for ranks which could otherwise be manipulated by users as a form of edit warring.
The definition of ranks in the Wikibase Data Model primer also doesn't seem completely accurate. For instance, it recommends applying several preferred ranks to statements with multiple values, even when the values are neutral, for example the various children of a person. Would it not be better—and less work—to just have multiple normal ranks (the default) in cases such as these?
-Thepwnco (talk) 16:20, 3 July 2014 (UTC)
The last point I can answer. We initially assumed only preferred statements would be indexed for querying. So you would need to mark everything you want to find as preferred. We have since changed that. Now normal statements will be returned in search if no preferred ones are available. --Lydia Pintscher (WMDE) (talk) 16:59, 3 July 2014 (UTC)
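The fallback behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Wikibase implementation: the dictionaries and the `best_rank_statements` helper are hypothetical, and the exclusion of deprecated statements is assumed from the help page's description of default behaviour.

```python
def best_rank_statements(stmts):
    """Return preferred statements if any exist, otherwise normal ones.

    Deprecated statements are never returned by this sketch.
    """
    preferred = [s for s in stmts if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s for s in stmts if s["rank"] == "normal"]


# Hypothetical example: the children of a person, all at the default rank.
children = [
    {"value": "child A", "rank": "normal"},
    {"value": "child B", "rank": "normal"},
]

# Hypothetical example: one value marked preferred, one deprecated.
countries = [
    {"value": "Poland", "rank": "preferred"},
    {"value": "German Empire", "rank": "normal"},
    {"value": "Atlantis", "rank": "deprecated"},
]

all_children = [s["value"] for s in best_rank_statements(children)]
current_country = [s["value"] for s in best_rank_statements(countries)]
```

Under this model, marking all children as preferred would yield the same result as leaving them all at normal rank.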
A statement without a source should definitely have a lower rank than one with a source. Also, imported from Wikimedia project (P143) is used for data imported from Wikipedia, and as Wikipedia is not a source, the previous comment applies. The most correct approach is to avoid assigning a rank at all, because there is no way to judge the rank of something without a source: no time, no author, ...
In my opinion the rank concept is not well used: to specify the most recent statement we should use the time properties used in the reference definition (date of publication, point in time, ...). Rank should be used to distinguish between the quality of the sources, in order to offer data users an easy way to select a good value without having to evaluate the different values when several possibilities exist.
To solve the problem of preferred and normal, preferred should be used only when several values are available and no temporal distinction can be made using time properties. I prefer no ranking to deprecated; otherwise we have to assume that every statement which is older than another one is deprecated. Snipre (talk) 17:55, 3 July 2014 (UTC)
Please do not conflate the existence of a source with ranks. A statement can be perfectly fine without a source and should not be marked deprecated just because it has no source. If the re-user of our data wants to give more weight to statements with sources they can very easily do that. If we mix those two concepts too much we very quickly run into trouble. --Lydia Pintscher (WMDE) (talk) 21:55, 3 July 2014 (UTC)
@Lydia Pintscher (WMDE): So rank should be used only when several values for the same statements are available. Snipre (talk) 08:16, 8 July 2014 (UTC)
@Snipre, Random knowledge donator, Lydia Pintscher (WMDE): just an update - I've made a few changes to this page to clarify the relationship between ranks and sources -- the inclusion of sources is now just a recommendation rather than a requirement (see the section on preferred statements). I've also updated Help:Sources so it is clear that imported from Wikimedia project (P143) is not an acceptable source for statements. -Thepwnco (talk) 17:38, 7 July 2014 (UTC)
In my opinion, one problem is, once again, the terminology. "Rank" implies hierarchy or even rating, although the concept of Wikidata ranks is not really supposed to express a hierarchy. It is, more or less, a three-state switch, and I think the metaphor of "weight" I had introduced was not too far off. It is tempting (and logical) to express reference quality with something called "ranks". However, that is not the original intention. The basic concept of ranks, as I perceive it, is just to provide convenience and improve performance regarding queries. Apart from that, it would be possible to get along without ranks quite nicely, since everything has an order in the first place. The concept of creating another layer of order by putting ranks on top of the basic order is hard to grasp. Mixing ranks with the existence or quality of references makes the matter even more complex, since ranks receive a meaning in addition to the one given by the original purpose, and the two meanings may even contradict each other. Consequently, I am appealing for any relation to references to be removed from ranks, even recommendations. The basic concept is abstract enough to be hard to understand and should not be overloaded with an additional one.
By the way, I like the "How-to" section! :) Random knowledge donator (talk) 09:03, 8 July 2014 (UTC)

Qualifier reason for deprecated rank (P2241)

It might be worth mentioning the new qualifier. --- Jura 06:37, 13 November 2015 (UTC)

How to tag disputed facts?

Is there a way to label a property value as disputed or controversial, so that it won't be used at places that display normal and preferred values? I'm thinking of cases like John Barrymore's birth date, where scholars can't decide between the two known dates. Could we have a type of rank to mark such cases, so that the end user is warned not to take the property at face value? Diego Moya (talk) 12:09, 19 June 2016 (UTC)

With the statement disputed by (P1310) qualifier. TomT0m / talk page 13:59, 19 June 2016 (UTC)
I think "normal rank" for both is fine. Obviously neither is preferred, but certainly neither is deprecated. --Izno (talk) 21:23, 19 June 2016 (UTC)

Ranking combination?

In a statement with multiple values, what is the difference in choosing between Preferred over Normal, and Normal over Deprecated? Both values have reliable sources, but I find it confusing which rank combination is "safer/better" to use for the most recent Population (P1082) for a town. Should I:

  1. leave the older population as Normal and set the new as Preferred
  2. set the old one as Deprecated and leave the new one as Normal?

Sanglahi86 (talk) 11:28, 8 October 2016 (UTC)

@Sanglahi86: Side note: a statement has only one main value. You're referring to several statements with the same property; maybe you meant a property with multiple values. The answer is solution 1. The deprecated rank is only for statements that are now believed to be wrong, such as mistakes. For example, a population claim of, say, 1000 inhabitants, sourced by source X for the date 1900 and ranked deprecated, means that source X claims there were 1000 inhabitants in 1900 but we now know this number is wrong. The reason for deprecation property can be added to such claims to give more details about the mistake that was made at the time. Another example is someone who was believed to be dead but was later found alive. TomT0m / talk page 11:58, 8 October 2016 (UTC)
Thank you very much for the info. Sanglahi86 (talk) 05:25, 9 October 2016 (UTC)

help:deprecation <- translation admin

I have added a link and text to Help:Deprecation. Would a translation admin please review the edit, confirm or amend it, and submit it to have the translation updated? Thanks. -- billinghurst sDrewth 00:21, 11 March 2018 (UTC)

Rank and sparql

Hello. Is there a way in SPARQL to select the most accurate information (if there is a preferred-ranked value, show only it, instead of other values)? Something like "filter most recent and accurate values"? For instance, if an element's country was Poland, then Lithuania, then the USSR, and now Poland, how do we show the current value, which is Poland?

--Bouzinac (talk) 10:45, 30 July 2018 (UTC)
By default SPARQL will show only the value(s) with the highest rank. The help page describes this as follows: "For templates and queries, per default preferred statement(s) for a property will be used if they exist, otherwise normal statement(s) will be used." In your case the rank of the value Poland should be preferred. For instance, Gdańsk (Q1792) should give in SPARQL the country (P17) value Poland (Q36), as this has the preferred rank. Other values like German Empire (Q43287) will not be shown. To select those values as well, you should work with statement nodes as described in b:SPARQL/WIKIDATA Qualifiers, References and Ranks. HenkvD (talk) 21:52, 30 July 2018 (UTC)
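As a rough sketch of the two approaches mentioned above, the Python helpers below build the corresponding SPARQL query strings. The helper names are hypothetical; the `wdt:`, `p:`, `ps:` and `wikibase:rank` terms are the prefixes predefined by the Wikidata Query Service. Nothing is sent over the network here; the strings would have to be pasted into the query service or passed to a SPARQL client of your choice.

```python
def truthy_query(item, prop):
    """Query using wdt:, which returns only the best-ranked value(s)."""
    return f"SELECT ?value WHERE {{ wd:{item} wdt:{prop} ?value . }}"


def all_statements_query(item, prop):
    """Query using statement nodes, returning every value with its rank."""
    return (
        "SELECT ?value ?rank WHERE {\n"
        f"  wd:{item} p:{prop} ?statement .\n"
        f"  ?statement ps:{prop} ?value ;\n"
        "             wikibase:rank ?rank .\n"
        "}"
    )


# Gdańsk (Q1792), country (P17)
print(truthy_query("Q1792", "P17"))
print(all_statements_query("Q1792", "P17"))
```

The second query also returns deprecated statements, so a consumer would typically filter on `?rank` (for example `wikibase:DeprecatedRank`) if those are not wanted.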
For multiple values of the same property (say population (P1082), patronage (P3872), etc.), is there a way for a bot to automatically up-rank the latest year's value and make sure older years are at normal rank? Bouzinac (talk) 09:53, 21 December 2018 (UTC)
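The ranking logic such a bot would need might look like the sketch below. It only computes the new ranks on simplified, hypothetical dictionaries (the `point_in_time` key stands in for a date qualifier); actually writing the ranks back to Wikidata would require a bot framework or the API, which is not shown here.

```python
def uprank_latest(stmts):
    """Return copies of stmts with the latest-dated one preferred, older ones normal.

    Deprecated statements and statements without a date are left untouched.
    """
    def eligible(s):
        return s["rank"] != "deprecated" and s.get("point_in_time") is not None

    dates = [s["point_in_time"] for s in stmts if eligible(s)]
    if not dates:
        return [dict(s) for s in stmts]
    latest = max(dates)

    result = []
    for s in stmts:
        new = dict(s)  # copy, so the input list is not modified
        if eligible(s):
            new["rank"] = "preferred" if s["point_in_time"] == latest else "normal"
        result.append(new)
    return result


# Hypothetical population statements for one town.
population = [
    {"value": 5000, "point_in_time": 2010, "rank": "preferred"},
    {"value": 5600, "point_in_time": 2020, "rank": "normal"},
    {"value": 9999, "point_in_time": 2015, "rank": "deprecated"},
]

updated = uprank_latest(population)
```

Leaving deprecated statements alone is a design choice in this sketch, on the assumption that they were marked as erroneous for a reason unrelated to their date.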

Link from the rank interface?

Perhaps this page should be linked to from the box for selecting ranks. --Yair rand (talk) 22:07, 24 October 2018 (UTC)

Changing the rank with the API

Can someone answer here please? --Jobu0101 (talk) 09:27, 29 February 2020 (UTC)

Change the word “malpreferata”

There is a conflict between the name of the rank “malpreferata” and its description “valoro estas malĝusta sed ofte kredata” ("the value is incorrect but often believed").

“Preferi”, according to ReVo, means “elekti ion kiel pli bonan aŭ akceptindan” ("to choose something as better or more acceptable") [1], so “malpreferi” means (in my view) "not to choose something because it is worse or less acceptable" and, for data, "not to choose something because it is imprecise or out of date"; yet the description says "INCORRECT but often believed".

The Polish version of Wikidata uses the term “ranga nieaktualna” (outdated rank); the English one uses “deprecated” (to be avoided); so perhaps the name of this rank should be changed to “erara” (erroneous) or “evitinda” (to be avoided).

Here is an explanation of what I want to use ranks for:

I add the statement language of work or name (P407) to various computer programs, but how do I mark that the program/work was officially published in some language, but that language is no longer maintained (by the community, speaking of open-source software), so the translation is incomplete?

1. Should I change the rank of the unmaintained language to “malpreferata” (deprecated) and add reason for deprecated rank (P2241)=incompleteness (Q26162470)?

2. Or change the rank of all the good languages to “preferata” (preferred) and leave the incomplete ones at the rank “normala” (normal)?

EDIT: I have probably found the imprecise phrases on translatewiki.net: [2] [3]

Kastanoto (talk) 20:26, 19 August 2020 (UTC)

For translations that are no longer maintained and for releases that are not the latest, the reason should perhaps be "ne plu vera" ("no longer true") or "pasinte" ("in the past"). --AVRS (talk) 22:22, 19 August 2020 (UTC)

Editing session during Data Quality Days 2021

--- Jura 09:48, 8 September 2021 (UTC)

Marking inactive social media accounts

A lot of folks have been deactivating their Twitter/X accounts over the last year or so. I've seen different ways to express this in Wikidata. For example, deprecation is sometimes used to indicate accounts that were once active but no longer are (see this query, for example: https://w.wiki/82X2).

Help:Deprecation/List of Reasons for Deprecation lists several reasons that seem to apply here, such as "dormant account". But this seems to be in conflict with the policy to not use deprecation for statements that were once correct.

I personally like deprecation, because the red color makes it instantly visible that this is not the account you want to follow. It's also easy to query. Furthermore, end dates cannot always be ascertained -- an account may have been deleted at some point in the past, but there may be no source available that tells you when.

Should we:

  • continue to use deprecation for this? If so, I think the policy needs to be made clearer to be more consistent with actual practice, or these edits risk being disputed or reverted
  • use only a qualifier like "has characteristic: abandoned account" with no change to the statement's rank? If so, I think items like Q66107668 ("abandoned account") are confusing -- how can this be a valid reason for deprecation, as the item's description suggests, if the statement was once correct?
  • do something else?

Eloquence (talk) 00:33, 26 November 2023 (UTC)

Use of deprecated rank is generally not consistent, and that is valid for your scenario as well. However, ranks can be considered as mere visibility controllers which by themselves do not convey any additional semantic meaning to a claim. Via reason for deprecated rank (P2241) and reason for preferred rank (P7452) qualifiers you can provide an editorial background for the use of ranks though. If you want to discourage users from using a particular claim (here: social media handle), there needs to be another claim for the same property with a higher rank. This can well be a no value claim with preferred rank for example. The "red color" for deprecated claims is not that relevant since the web UI is basically a tool for Wikidata editors only, not a place where data users access data (this is very different at Wikipedia).
Apart from that, your assessment of the situation is pretty much correct, but as you can see it is difficult to infer clear direction. --MisterSynergy (talk) 00:56, 26 November 2023 (UTC)
@Eloquence You could use has characteristic (P1552) - inactive (Q29415492), like this: Q2680952#P2847. RVA2869 (talk) 13:55, 26 November 2023 (UTC)
@MisterSynergy, RVA2869:
Thank you for the responses! Right now, hundreds of notable accounts are migrating from Twitter/X to platforms like BlueSky and Mastodon, and we have no consistent way of querying that information. While I like the "has characteristic: inactive" way of expressing an account status, I don't like that there are so many ways to say the same thing (e.g., Q66107668, Q11381163). I know, I know -- probably a common issue in many areas of Wikidata. :-)
I think the best thing we can do here is probably to draft a specific guideline for how to manage account migrations and inactive identifiers on other platforms. That way, whatever we settle on, we can at least point people towards that and say "this is the preferred way of doing it". Does that make sense, or does such a guideline already exist beyond what's stated on this page?--Eloquence (talk) 07:03, 27 November 2023 (UTC)
I lean towards deprecation and a reason. But I am open to other solutions too, as I find it important to show the status of deactivated accounts and finding a solution is almost more important than which one. However, I would prefer solutions that are clear and easy to query for, and the deprecation really fits that. Ainali (talk) 22:55, 1 December 2023 (UTC)

(Un-indent)

So here's a rough guideline draft that I think we could add to Help:Ranking for now, maybe with a section redirect from Help:Deprecating accounts.

Deprecating accounts

When a person or organization makes a decision to no longer use an account on an external system, such as a social media platform, it is appropriate to mark such statements as "deprecated", regardless of whether such accounts were active in the past. This ensures that active accounts can be quickly identified and queried by rank. Please keep the following considerations in mind:

  • Always indicate a reason for deprecated rank.
    • For accounts that still exist but where the operating entity has made it clear that they are no longer in use, use abandoned account as the reason. References are especially important here since it may not be self-evident that an account was explicitly abandoned (e.g., there might be a blog post that offers an explanation).
    • For accounts that still exist but that have been inactive for a long period of time without any explicit explanation (suggested: >6 months), use dormant account as the reason.
    • For accounts that have been fully deactivated (i.e. deleted), use deactivated account as the reason.
  • Where possible, please indicate an end time.

Thoughts?--Eloquence (talk) 00:07, 2 December 2023 (UTC)
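To make the draft's decision rules concrete, here is a minimal sketch of how the three reasons could be chosen. It reflects the draft as proposed, not an adopted policy; the function name, its parameters and the 183-day threshold (an approximation of the suggested six months) are assumptions made for illustration.

```python
DORMANT_AFTER_DAYS = 183  # assumed approximation of the suggested ">6 months"


def deprecation_reason(exists, explicitly_abandoned, days_inactive):
    """Pick a reason for deprecated rank under the draft guideline.

    Returns None when the account would still count as active,
    in which case the draft would not call for deprecation.
    """
    if not exists:
        return "deactivated account"  # account was deleted
    if explicitly_abandoned:
        return "abandoned account"    # owner stated it is no longer in use
    if days_inactive > DORMANT_AFTER_DAYS:
        return "dormant account"      # long silence, no explanation
    return None


print(deprecation_reason(exists=True, explicitly_abandoned=False, days_inactive=400))
```

The order of the checks matters: a deleted account is reported as deactivated even if the owner had also announced that they were leaving.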

I think this is great. Perhaps another bullet point would be useful too:
And perhaps the introduction to the list could include a reservation like "Always indicate a reason for deprecated rank. The list below holds the most common ones (others may exist in rare cases):" Ainali (talk) 08:36, 2 December 2023 (UTC)
I think rules should be simple and have no exceptions. When we have a rule that deprecation is not used for statements which were once correct, I am in favor of being consistent everywhere. Therefore, I don't think deprecation is a good approach here and I instead support the solution using end time (P582) or has characteristic (P1552) as outlined above. I stand ready to help with standardization of our approaches, so that querying by qualifier is simple. Vojtěch Dostál (talk) 10:44, 2 December 2023 (UTC)
@Vojtěch Dostál:
I think if we were to try to standardize on usage of deprecation for things that were never true, that would be a very large effort well beyond the scope of just account migrations. We are very very far from consistency with what's stated in Help:Ranking as of today, from what I can tell.
In Help:Deprecation/List of Reasons for Deprecation we currently enumerate a ton of reasons that are inconsistent with the idea that a statement was never true: items that have been merged, wikis that have migrated, accounts that have been abandoned, accounts that are dormant, accounts that are compromised, accounts that have been suspended, statements that are obsolete (it even says "value was applicable in the past, but no longer is"), "subject of website changed", "former quality" (again, it explicitly says "denoting the value is no longer true"), "subject heading replaced by name authority", "this wiki has been disabled", "excommunicated", .... and so on and so forth.
As an editorial note, I think that's partially because this idea of "was never true" is very counterintuitive. It's not how the term "deprecation" is used in many other contexts -- e.g., in a technology context, we use "deprecation" to note that a feature that was once available should no longer be used. I understand the appeal of a was never true meaning, but I think we can make that clear by just using the "reasons for deprecated rank" that already exist -- some implicitly make it clear that a statement was never true, while others make it clear that it may have been true in the past.
In other words, when we have such a massive inconsistency with the stated guidelines, we can either embark on a vast standardization effort across Wikidata to track down all those uses of deprecation that contradict the "was never true" usage, or we could simply acknowledge that deprecation is used for both, and ask that people always specify a cause for deprecation, making it clear why it is being used.
We could update the main language in the wiki page to this effect as well, acknowledging current usage while putting some guardrails around it. That way we don't carve out any special rules for account migrations. Would that fit your "consistent everywhere" and "simple rules"? In my view it would only be describing actual practice, which has drifted very very far from what's stated in the guidelines.-Eloquence (talk) 22:10, 2 December 2023 (UTC)
@Eloquence: Help:Deprecation/List of Reasons for Deprecation is only an automatic table listing qualifiers which are used somewhere; it's not an authoritative help page. As for the stated "massive inconsistency", I don't think this reflects reality in Wikidata. My experience is that 99% of deprecations are correctly applied to statements which were never true. Vojtěch Dostál (talk) 09:24, 3 December 2023 (UTC)