Data retention guidelines
This document helps explain how we fulfill this commitment, by describing our guidelines for data retention, system design, and ongoing auditing and maintenance. These guidelines are meant to be a living document — they will be updated over time to reflect current retention practices.
To what data do these guidelines apply?
How long do we retain non-public data?
Unless otherwise indicated, we retain the following types of data for no more than the following periods of time:
For the purposes of this table, "user account" means username, user ID, or IP address; "reader" means visitor to a Wikimedia project.
How long do we retain public data?
Wikimedia hosts Wikipedia and the associated projects as part of our mission to collect, document, and freely distribute the sum of human knowledge to the world. Accordingly, when you make a contribution to any Wikimedia Site, including on user or discussion pages, you are creating a permanent, public record of every piece of content added, removed, or altered by you. The page history will show when your contribution or deletion was made, as well as your username (if you are signed in) or your IP address (if you are not signed in). We may use your public contributions, either aggregated with the public contributions of others or individually, to create new features or data-related products for you, or to learn more about how the Wikimedia Sites are used. If you mistakenly included your personal information in a contribution to a Wikimedia Site and you would like to have it removed, please consult the community’s oversight policy. Keep in mind that the transparency and integrity of our sites’ revision histories are essential to our mission, and the Foundation supports our community’s right to reject oversight requests in order to protect the projects.
If you choose to register for an account with the Wikimedia projects, you will be asked to select a username. Usernames are retained until the user requests that the account be renamed or goes through the community courtesy vanishing process.
For the purposes of these guidelines:
Some examples of "public information" would include:
- (a) your IP address, if you edit without logging in;
- (b) your gender, if it is disclosed under your user profile;
- (c) any personal information you disclose publicly on the Wikimedia Sites, such as your real name or age.
Some examples of types of information that are considered to be "nonpublic information" include:
- (a) your IP address, if you edit while logged in; and
- (b) your email address, if you provided one to us during account registration (but didn’t post it publicly).
Data is "de-identified" when it has been aggregated or otherwise retained in a manner such that it can no longer be used to identify the user.
Data is "aggregated" when the data associated with a specific user has been combined with data from others to show general trends or values without identifying specific users.
An example of how data can be aggregated includes:
Using ranges rather than specific numbers, such as recording that there are "between 1 and 10 editors in language X in country Y" rather than recording that there are 4 editors.
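The range-based aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `to_range`, the bucket width of 10, and the sample data are hypothetical and are not taken from any Wikimedia system.

```python
def to_range(count, width=10):
    """Return a coarse range label (e.g. "1-10") instead of an exact count.

    Hypothetical helper: the bucket width of 10 mirrors the
    "between 1 and 10 editors" example and is an assumption.
    """
    if count <= 0:
        return "0"
    # Buckets are 1-10, 11-20, 21-30, ... for width=10.
    lower = ((count - 1) // width) * width + 1
    upper = lower + width - 1
    return f"{lower}-{upper}"


# Illustrative (made-up) exact counts keyed by (language, country).
exact_counts = {
    ("language X", "country Y"): 4,
    ("language X", "country Z"): 23,
}

# Only the range labels are kept; the exact numbers are discarded.
ranged_counts = {key: to_range(n) for key, n in exact_counts.items()}
```

Here `to_range(4)` gives "1-10", so the stored value no longer reveals that there are exactly 4 editors.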
Exceptions to these guidelines
If we make exceptions to these guidelines, we will notify the community by describing the exception on this page.
- Data may be retained in system backups for longer periods of time, not to exceed 5 years.
- When we conduct a survey or other research, we will provide you with a privacy statement specifying the term of retention for information (including personal information) collected through your participation in such research. In certain cases, information may be retained indefinitely for educational, development, or other related purposes, unless otherwise indicated in the relevant privacy statement. Such information may be retained in raw, aggregated, or de-identified form until we receive a request from the participant to delete the information.
- Research related to COVID-19: The Wikimedia Foundation Research team is conducting research regarding COVID-19 and its impact on Wikipedia. Retaining de-identified readership data from COVID-19 related articles will enable us to better understand how to prioritize content creation, to understand what happens to readership when there's a "shock to the system", and to empower the research community to answer such questions. By "COVID-19 related articles", we mean articles that link to the COVID-19, SARS-CoV-2 and 2019–20 COVID-19 pandemic Wikidata items. For comparison purposes, we will retain data from a small number of articles unrelated to COVID-19 as well. In order to collect sufficient data, and obtain a picture of readership as time passes, we will be retaining this de-identified data beyond the 90-day retention limit, for a period of one year, ending on March 1, 2021. (Note that this includes a one-month extension due to staffing changes, in order to allow for the project’s completion.) For technical details about the sampling and de-identification process, please see the project page on GitHub.
- Editing research: There is a short-term extension applying to data collected as part of experimental features to improve replying on talk pages. In order to collect and analyze sufficient data, this data must be kept beyond the standard 90-day period. The retained data will be deleted, aggregated, or de-identified within 180 days.
- KaiOS Wikipedia App: The Inuka Team at Wikimedia is researching whether providing content recommendations on the app homepage leads to increased engagement. In order to collect and analyze sufficient data, some data will be kept beyond the standard 90-day period. The retained data will be deleted, aggregated, or de-identified within 270 days. This retained data does not include any IP addresses or information about which articles are being read.
- investigate and defend ourselves against legal threats or actions;
- help protect against vandalism and abuse, fight harassment of other users, and generally try to minimize disruptive behavior on the Wikimedia Sites;
- prevent imminent and serious bodily harm or death to a person, or to protect our organization, employees, contractors, users, or the public; or
- detect, prevent, or otherwise assess and address potential spam, malware, fraud, abuse, unlawful activity, and security or technical concerns.
Audits and improvements
The Foundation is committed to continuous evaluation and improvement of these guidelines, and to periodic audits in order to identify such improvements. As we make changes to existing and new systems, we will update these guidelines to reflect our changing practices.
Design of new systems
- inclusion of these data retention guidelines as requirements during the design process;
- legal consultation during the design and development process; and
- inclusion of privacy considerations in the code review process.
Ongoing handling of new information
Despite our best efforts in designing and deploying new systems, we may occasionally record personal information in a way that does not comply with these guidelines. When we discover such an oversight, we will promptly comply with the guidelines by deleting, aggregating, or de-identifying the information as appropriate.
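As a rough illustration of how such an oversight might be detected, the following Python sketch flags records held longer than a retention period so that they can be deleted, aggregated, or de-identified. Everything in it is an assumption made for the example: the record layout, the function names, and the use of a single 90-day period (the standard period mentioned in the exceptions above). Actual retention periods vary by data type.

```python
from datetime import datetime, timedelta, timezone

# Assumption: one 90-day period for all records in this example.
RETENTION = timedelta(days=90)


def is_overdue(collected_at, now, retention=RETENTION):
    """True if the record has been held for longer than the retention period."""
    return now - collected_at > retention


def overdue_records(records, now, retention=RETENTION):
    """Return the records that should be deleted, aggregated, or de-identified."""
    return [r for r in records if is_overdue(r["collected_at"], now, retention)]


# Hypothetical records; the "id" and "collected_at" fields are illustrative.
now = datetime(2021, 8, 3, tzinfo=timezone.utc)
records = [
    {"id": 1, "collected_at": datetime(2021, 7, 1, tzinfo=timezone.utc)},
    {"id": 2, "collected_at": datetime(2021, 1, 1, tzinfo=timezone.utc)},
]
flagged = overdue_records(records, now)
```

A record held for exactly 90 days is not flagged, matching the "no more than" wording used for retention periods.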
If you think that these guidelines have potentially been breached, or if you have questions or comments about compliance with the guidelines, please contact us at privacy
Last edited on 3 August 2021, at 21:44
Content is available under CC BY-SA 3.0 unless otherwise noted.