Talk:Data retention guidelines: Difference between revisions

Skalman (talk | contribs)
Design of new systems: just interested on an overview level
Skalman (talk | contribs)
How long do we retain non-public data?: should mention list of read articles
** ''"Non-personal information associated with a user account: Collected from user: Indefinitely"'' The given examples seem okay, since they're ''almost'' already public data (first edit, when a user has verified email, and whether the user edits through mobile are public data). But the category itself seems broad, and that's particularly bad since the data is kept indefinitely. E.g. the list of read articles is not public, but could be covered by this category.
**: That's a fair point. It's hard to draw a line and nail down what ''almost public'' data means, but the goal of this section and the list of examples we provided is to characterize this category of data as much as possible, without providing an exhaustive list (which we can't do, as Michelle notes below). The bottom line is that we want to commit to retaining indefinitely only the kind of data about individual users that we would be comfortable sharing publicly. What makes this data subject to different terms than metadata collected and published when saving an edit is that it's ''passively collected'' and not explicitly released under Wikimedia's terms of use. So, in short: [[Schema:ServerSideAccountCreation|user X registered an account on a mobile device]] or [https://en.wikipedia.org/w/index.php?namespace=&tagfilter=visualeditor&title=Special%3ARecentChanges user X edited a page via Visual Editor] or [[Schema:Echo|user X was thanked by user Y for an edit s/he made]] could all be considered examples of ''almost public data'', as they don't disclose anything that falls within the definition of PII. The "list of articles read by user X" definitely does: we can't and we won't retain or release this data, unless the user intentionally decides to do so. Maybe the best way to frame this distinction is to say that deciding whether ''almost public data'' could be publicly released is not a question settled on legal grounds but on whether it's appropriate and desirable (if needed, a decision could be made via a community consultation or an RFC). Michelle, is that an appropriate distinction? Hope this helps clarify what we're trying to do here; any suggestions to improve the language and terminology are welcome. [[User:DarTar|DarTar]] ([[User talk:DarTar|talk]]) 01:58, 31 January 2014 (UTC)
**:: [[User:DarTar|DarTar]]: A list of read articles is ''not'' explicitly listed as "personal information", nor is it explicitly covered in the "How long do we retain non-public data?" table. I realize that the reason for this might be that it's simply not saved and thus not relevant, but I'd like to see it mentioned somewhere what WMF considers a list of read articles to be. If you wish, you could add a note like "currently not kept at all", but things may change, and this seems like a basic piece of information. [[User:Skalman|//Shell]] 06:54, 11 February 2014 (UTC)
** ''"Non-personal information associated with a user account: Optionally provided by a user: Logs of terms entered into the site's search box"'' I realize that "optional" here means that not every WM site visitor must search, but since it's a key part of any wiki it doesn't feel like I "optionally provided" it - I ''must'' do it to see the article I'm interested in (ignoring other search engines). No biggie, but feels a bit weird.
**:I see your point here, Shell. We weren't sure how to best phrase the differentiation between information collected from the user and information provided by the user. We're open to suggestions though if you or anyone else has one. [[User:Mpaulson (WMF)|Mpaulson (WMF)]] ([[User talk:Mpaulson (WMF)|talk]]) 01:17, 10 January 2014 (UTC)