Wikipedia:Contributor copyright investigations/Darius Dhlomo

Quick overview

This is an investigation of the contributions of a long-term, very prolific editor (User:Darius Dhlomo, or "DD") who is currently blocked indefinitely. In approximately 4 years and 164,000 edits, DD created almost 10,000 articles and made non-minor contributions to over 13,000 more. Unfortunately, a large number of his contributions appear to contain text copied from other sources. All 23,000+ of these articles are now under investigation, with Uncle G's major work 'bot about to blank the articles DD created, for manual review by the editing community. A follow-on bot operation is under discussion to revert each of the other 13,000 articles to the revision just prior to DD's first nontrivial addition to that article.

The articles are almost entirely about athletes and sporting events, loaded with names, event results and the like. Quite a few, like Rishat Shafikov, have essentially no text and (by Wikipedia's understanding) no copyright problems, since they contain only factual data. However, any nontrivial amount of continuous text ever inserted by DD into any article is considered suspect at this point. For more information, see the "scale of the problem" section of the ANI discussion. You can also participate at that page in more general discussion about how to deal with the incident.

Note: we don't have any particular reason to think that DD infringed copyrights on purpose. He instead seemed to have a poor understanding of WP's copyright policy and what constitutes unacceptable copying.

This list, created by VernoWhitney, includes the 23,000+ articles which need review. Uncle G's major work 'bot is going to blank the first subset, placing this notice on them, whilst retaining the categories and interwiki links (and ensuring that the blanked pages do not show up on short page lists). See here for more on that and here for the 'bot discussion.

Instructions for editors on how to help are linked to from the notice on the blanked article and are what editors will find if they come across a blanked article.

Questions asked over and over again

Can't we just ignore the violations? This is good content.
Naturally it's good content. It's good content taken from someone else's work. It is being used without that person's explicit permission; and we don't have permission to pass it along to others as if it were free content, originally written by an actual Wikipedia editor. Remember our fundamental principles. We're here to make free content. This is a free encyclopaedia, not just an encyclopaedia.
Can't we just deal with the violating articles?
No. We don't know which ones they are. We've found no mechanical means for determining which articles to even look at. Automated processes such as CorenSearchBot caught only a very few of the articles the first time around. Darius Dhlomo's method of copying and pasting sentences and paragraphs in different orders, and interchanging proper nouns and pronouns, defeated them. That is why humans have to review every article.

Note: User:Boissière has been experimenting with some approaches for coarse mechanical separation of articles that contain sizeable blocks of text from those that don't,[1] which may make spotting large vios easier, but human review will still be needed.
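The note does not describe how that separation works. Purely as an illustration of what a coarse mechanical filter could look like, and not as Boissière's actual approach, the Python sketch below flags wikitext containing a long run of continuous prose. The function name, the word threshold, and the list of markup prefixes are all assumptions made for this example.

```python
import re

# Hypothetical threshold: a "sizeable block" is a paragraph holding at least
# this many words of running prose. The value is an assumption for this sketch.
MIN_PROSE_WORDS = 40

# Lines starting with these characters are treated as list, table, heading,
# or template markup rather than prose (a rough guess, not a wikitext parser).
MARKUP_PREFIXES = ("*", "#", "|", "!", "=", "{", "}")


def has_sizeable_prose(wikitext: str, min_words: int = MIN_PROSE_WORDS) -> bool:
    """Return True if any paragraph looks like a long run of continuous text."""
    for paragraph in re.split(r"\n\s*\n", wikitext):
        prose_lines = [
            line
            for line in paragraph.splitlines()
            if line.strip() and not line.lstrip().startswith(MARKUP_PREFIXES)
        ]
        if len(" ".join(prose_lines).split()) >= min_words:
            return True
    return False
```

A filter like this can only sort articles into "probably data-only" and "contains prose"; it says nothing about whether the prose was copied, which is why human review is still required.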

Can't we just leave the articles alone until they are reviewed?
No. We cannot continue having Wikipedia publish the articles indefinitely until we get around to reviewing them at some undefined point in the future. First, we have a legal obligation to actually take reasonable steps once we know that something like this has occurred. Second, we have an obligation to re-users of Wikipedia content, such as people who print articles into books, to only publish free content that they can re-use, and not have them end up fixing copyright violations into print.
Can't we just use our normal Wikipedia:Copyright problems process for this?
We are. If a human reviewing an article finds that it has been copied from elsewhere by Darius Dhlomo, the article will be sent to that board. But we cannot just send 23,000 articles through that process and expect some lone administrator to review them. The idea here is to distribute the review work to the Wikipedia editorship in general. After all, it's not just administrators that can check for copyright problems.
What about those articles that have been dealt with by the normal editing process?
Again, we don't know which ones they are until a human being comes along, reviews the edit history, and checks that none of the infringing content (if any) added to the article remains, even in disguised, edited, form. (Derivative works are also not permitted.)
Isn't this copyright paranoia?
To quote Angr, as supported by Cary Bass: There is no such thing here. We don't delete non-free content because we're afraid of getting sued. We delete non-free content because it's non-free. If you're about to point to avoid copyright paranoia, you should really have actually read it first. That page is where that statement comes from. Avoid accusing others of copyright paranoia is worth reading here, too.
Surely most people don't want this?
Actually, there were a fair number of people who pressed for more draconian measures than this. They wanted all of the tens of thousands of articles deleted and restarted cleanly from scratch. As more volunteers arrived to evaluate the situation and to offer their help, this measure was decided upon as a more moderate solution. Several of the people who have since expressed an opinion that this is going too far have later acknowledged that they were basing that opinion on an underestimate of the scale of the problem.
Why aren't we just leaving this up to administrators?
Because, simply and frankly put, that's lazy, unfair, and unworkable. This is the whole community's problem. Things like this endanger the entire project that we are working upon. It's everyone's problem if we want to have an encyclopaedia around to work upon in the first place. Moreover, not having administrator privileges does not mean that one lacks the ability to check an article for copyright violations. Indeed, we rely upon non-administrators doing that on a daily basis. It's a part of our normal process. Furthermore, we simply don't have the administrator workforce to scale to this kind of task. The usual CCI investigation involves the dedicated work of just a handful of people. This task would take years to address. It took years for one editor to create, after all.

List of articles to clean up

  • Page 1 - Articles created by Darius Dhlomo 1 through 1,000  Done
  • Page 2 - Articles created by Darius Dhlomo 1,001 through 2,000  Done
  • Page 3 - Articles created by Darius Dhlomo 2,001 through 3,000  Done
  • Page 4 - Articles created by Darius Dhlomo 3,001 through 4,000  Done
  • Page 5 - Articles created by Darius Dhlomo 4,001 through 5,000  Done
  • Page 6 - Articles created by Darius Dhlomo 5,001 through 6,000  Done
  • Page 7 - Articles created by Darius Dhlomo 6,001 through 7,000  Done
  • Page 8 - Articles created by Darius Dhlomo 7,001 through 8,000  Done
  • Page 9 - Articles created by Darius Dhlomo 8,001 through 9,000  Done
  • Page 10 - Articles created by Darius Dhlomo 9,001 through 9,657  Done
  • Page 11 - Articles with non-minor contributions by DD 1 through 1,000  Done
  • Page 12 - Articles with non-minor contributions by DD 1,001 through 2,000  Done
  • Page 13 - Articles with non-minor contributions by DD 2,001 through 3,000  Done
  • Page 14 - Articles with non-minor contributions by DD 3,001 through 4,000  Done
  • Page 15 - Articles with non-minor contributions by DD 4,001 through 5,000  Done
  • Page 16 - Articles with non-minor contributions by DD 5,001 through 6,000  Done
  • Page 17 - Articles with non-minor contributions by DD 6,001 through 7,000  Done
  • Page 18 - Articles with non-minor contributions by DD 7,001 through 8,000  Done
  • Page 19 - Articles with non-minor contributions by DD 8,001 through 9,000  Done
  • Page 20 - Articles with non-minor contributions by DD 9,001 through 10,000  Done
  • Page 21 - Articles with non-minor contributions by DD 10,001 through 11,000  Done
  • Page 22 - Articles with non-minor contributions by DD 11,001 through 12,000  Done
  • Page 23 - Articles with non-minor contributions by DD 12,001 through 13,000  Done
  • Page 24 - Articles with non-minor contributions by DD 13,001 through 13,542  Done

Cleanup instructions

All contributors with no history of copyright problems are welcome to contribute to the cleanup.

If contributors have been shown to have a history of extensive copyright violation, it may be assumed without further evidence that all of their major contributions are copyright violations, and they may be removed indiscriminately in accordance with Wikipedia:Copyright violations. However, to avoid collateral damage, efforts should be made when possible to verify infringement before removal.

When every section is completed, please alter the listing for this CCI at Wikipedia:CCI#Open_investigations to include the tag "completed=yes". This will alert a clerk that the listing needs to be archived.

  • {{CCI-open|Contributor name|Day Month Year|completed=yes}}

Finding articles to examine

If you're interested primarily in articles from a specific category (e.g. Volleyball players), you might try a category intersection tool like CATSCAN to locate articles in your category of interest that have been blanked by the bot. Articles blanked by the bot are in Category:Articles tagged for CCI copyright problems, so you'll want to intersect that category with your category.
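As a rough illustration of what such an intersection computes (this is not how CATSCAN itself is implemented, and the article titles are placeholders), the Python sketch below intersects two sets of titles. It assumes you have already obtained the member lists of both categories by some other means; the function name is hypothetical.

```python
def intersect_categories(blanked: set[str], topic: set[str]) -> list[str]:
    """Return, in sorted order, the titles that appear in both categories."""
    return sorted(blanked & topic)


# Placeholder data standing in for real category member lists.
blanked_by_bot = {"Example athlete A", "Example athlete B", "Example event C"}
volleyball_players = {"Example athlete B", "Example athlete D"}

to_review = intersect_categories(blanked_by_bot, volleyball_players)
```

Here `to_review` holds only the titles that are both blanked by the bot and in the category of interest.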

Text

  • Examine the article or the diffs linked in the subpages of this CCI.
  • If the contributor has added creative content, either evaluate it carefully for copyright concerns or remove it.
  • If you remove text presumptively, place {{subst:CCI|name=Darius Dhlomo}} on the article's talk page.
  • If you specifically locate infringement and remove it (or revert to a previous clean version), place {{subst:cclean}} on the article's talk page. The url parameter may optionally be used to indicate the source.
  • If there is insufficient creative content on the page for it to survive the removal of the text, or if the text is impossible to extricate from subsequent improvements, replace it with {{subst:copyvio}}, linking to the investigation subpage in the url parameter. List the article as instructed at the copyright problems board, but you do not need to notify the contributor. Your note on the CCI investigation page serves that purpose.
  • To tag an article created by the contributor for presumptive deletion, place {{subst:copyvio|url=see talk}} on the article's face and {{subst:CCId|name=Darius Dhlomo}} on the article's talk page. List the article as instructed at the copyright problems board, but you do not need to notify the contributor.
  • After examining an article:
    • Replace the diffs after the colon on the listing with an indication of whether a problem was found:
      • If a problem was found, add {{y}} ("Yes, there's a problem with this one."). If the article is blanked and may be deleted, please indicate as much after the {{y}}.
      • If a problem was not found, add {{n}} ("No, there's no problem with this one.").
    • Follow with your username and the time to indicate to others that the article has been evaluated and appropriately addressed. This is automatically generated by four tildes (~~~~).
  • If a section is complete, consider collapsing it by placing {{collapse top}} and {{collapse bottom}} beneath the section header and after the final listing.
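The annotation steps above can be sketched as a small helper that builds the wikitext to put on a listing line. The function name, the dictionary, and its keys are hypothetical and exist only for this example; the template strings are the ones given in the instructions.

```python
# Talk-page templates named in the instructions, keyed by hypothetical labels.
TALK_PAGE_TAGS = {
    "presumptive_removal": "{{subst:CCI|name=Darius Dhlomo}}",
    "infringement_removed": "{{subst:cclean}}",
    "presumptive_deletion": "{{subst:CCId|name=Darius Dhlomo}}",
}


def listing_mark(problem_found: bool, note: str = "") -> str:
    """Build the text that replaces the diffs on a CCI listing line.

    The four tildes are left literal; the wiki software expands them into
    the reviewer's username and a timestamp when the edit is saved.
    """
    parts = ["{{y}}" if problem_found else "{{n}}"]
    if note:
        parts.append(note)
    parts.append("~~~~")
    return " ".join(parts)
```

For example, `listing_mark(True, "blanked, may be deleted")` yields `{{y}} blanked, may be deleted ~~~~`, and `listing_mark(False)` yields `{{n}} ~~~~`.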