Topic on Talk:Wikimedia Technology/Annual Plans/FY2019/CDP3: Knowledge Integrity

Re: Contributors, tool developers and partner organizations can understand accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.

5
Kerry Raymond (talkcontribs)

I would like it if the CopyPatrol had tools that understood about CC sources (it would be nice if the people did too) and other random editors who think they can spot copyright violation.

I work with a lot of material released under CC-BY, and I get tired of the accusations of copyvio (including speedy deletion of my new articles) despite my clear attribution within the article that it is drawing from an identified CC-BY source. I follow the rules but others don't even seem to understand those rules.

The first common misunderstanding is that an assertion of copyright over a webpage does override a CC notice. If an organisation is to license its content under CC-BY, then it must be the copyright holder to do so. There is no incompatibility between a copyright notice and a CC license; it's to be expected. It is not a case of "find a copyright statement and declare a copyvio", but look for the CC license.

In particular, our tools need to look beyond the individual webpage which does not contain a CC-BY statement, but to the copyright notice on the bottom of the page links to a copyright statement which CC-BYs all (or most) pages within the website. As an example, suppose I were to copy content (appropriately attributed) from this webpage onto Wikipedia:

https://www.business.qld.gov.au/industries/farms-fishing-forestry/agriculture/land-management/health-pests-weeds-diseases/weeds-diseases/woww

As you can see, there is nothing on this page that says it is CC-BY, but the link at the bottom of the page labelled Copyright links to this page:

https://www.qld.gov.au/legal/copyright

which says that the whole website (except where explicitly excluded) is CC-BY-4.0.

We also need easier ways to attribute suitably licensed CC-material, particularly when there are large CC-BY websites (such as the one above - the primary portal for the Queensland Government). I don't mind doing a "big" attribution statement when I created an article largely from a CC-BY source, e.g,

https://en.wikipedia.org/wiki/Junction_Park_State_School

but if I just want a few sentences from that source, and then a few sentences from another CC-BY source etc, then the attribution starts to become very heavyweight (could end up with more bytes of attribution than the text taken from those sources!) and indeed such "end of article" attribution cannot indicate which portions of the article were derived from it. I think it would be much better if the cite templates took a parameter to signal the licensing of a source where this is relevant and displayed the little "CC-BY" icon (or whichever license icon is applicable). Alternatively we need better attribution templates. Currently I have to write my own attribution templates as in the case above, or manually write an attribution as in:

https://en.wikipedia.org/wiki/New_Mapoon,_Queensland

because the existing templates are inadequate. Another piece of the problem is that some people believe you must use one of the existing templates to avoid the accusation of copyright violation.

CC-BY material is a great way to get a lot of content onto Wikipedia. In the example above, we were using such content to fill knowledge gaps on Wikipedia about Indigenous Australian communities (several articles were expanded using that Qld Govt website). Let's encourage the use of CC-BY content instead of the current practice of making it as unpleasant and as difficult as possible.

Now, above I am discussing CC-BY content. But if the content is CC0 or PD, then I don't have to make attribution at all. But again we have no clear policies and proceses in place about how to do this. I often write biographies based on obituaries that appeared in the pre-1955 Australian newspapers (which are both out-of-copyright in Australia and extensively digitised), e.g. I might want to write a biography based on this newspaper article:

https://trove.nla.gov.au/newspaper/article/53048297

Do I need to do anything more than cite it? Is there anywhere I should be asserting {{PD-Australia}}? Again, if the cite templates included the capability for me to add the {{PD-Australia}} to an appropriate field, there would be a clear assertion by me that this is public domain material and hence not a copyvio. At the moment, I left at the untender mercies of any random editor who decides to call it a copyright violation.

Ocaasi (WMF) (talkcontribs)

Hi Kerry,

Again it's a hard problem to distinguish license metadata on web pages, most of which is unstructured and not machine-readable. It's also hard to know that a CC statement is accurate, also because most people don't use a machine-readable format.

Speaking from my experience more as an editor than Foundation Staff, I believe it is customary to just link to the CC content in the edit summary of the addition. As far as citation template fields go, you'd likely want to propose this at https://en.wikipedia.org/wiki/Help_talk:Citation_Style_1 which is the main citation template for ENWP.

Cheers, Jake

Kerry Raymond (talkcontribs)

I have previously asked at the citation template page without success. They don't see it as a matter for the reader.

Kerry Raymond (talkcontribs)

I think linking in the edit summary is insufficient for attribution, particularly when the license specifies a desired attribution. I don't think we encourage people to license on CC-BY if we don't make a genuine effort to attribute.

Ocaasi (WMF) (talkcontribs)

Kerry, I follow the guidance here: https://en.wikipedia.org/wiki/Wikipedia:Copying_within_Wikipedia.

"Copying content from another page within Wikipedia requires supplementary attribution to indicate it. At minimum, this means providing an edit summary at the destination page – that is, the page into which the material is copied – stating that content was copied, together with a link to the source (copied-from) page, e.g., copied content from [[page name]]; see that page's history for attribution. It is good practice, especially if copying is extensive, to make a note in an edit summary at the source page as well. Content reusers should also consider leaving notes at the talk pages of both source and destination....

The Wikimedia Foundation's Terms of Use are clear that attribution will be supplied: in any of the following fashions: a) through a hyperlink (where possible) or URL to the article or articles you contributed to, b) through a hyperlink (where possible) or URL to an alternative, stable online copy which is freely accessible, which conforms with the license, and which provides credit to the authors in a manner equivalent to the credit given on this website, or c) through a list of all authors. (Any list of authors may be filtered to exclude very small or irrelevant contributions.)...

Attribution can be provided in any of the fashions detailed in the Terms of Use (listed above), although methods (a) and (c) — i.e., through a hyperlink (where possible) or URL to the article or articles you contributed to; or through a list of all authors — are the most practical for transferring text from one Wikipedia page to another. Both methods have strengths and weaknesses, but either satisfies the licensing requirements if properly done.

Reply to "Re: Contributors, tool developers and partner organizations can understand accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories."