Wikidata:Requests for comment/Cleaning up the ontology of anonymous
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Clear support for changing the model how we handle anonymous. I'll update Wikidata:WikiProject_Visual_arts/Item_structure#Use_of_creator_(P170)_in_uncertain_cases. Multichill (talk) 11:10, 30 May 2021 (UTC)
An editor has requested the community to provide input on "Cleaning up the ontology of anonymous" via the Requests for comment (RFC) process. This is the discussion page regarding the issue. If you have an opinion regarding this issue, feel free to comment below. Thank you!
THIS RFC IS CLOSED. Please do NOT vote nor add comments.
Currently, anonymous (Q4233718) instance of (P31) human (Q5). This means that if someone runs a query that checks whether two books are by the same author, Wikidata will answer that they are by the same author whenever the author is set in both cases as anonymous. If unknown value were used instead of anonymous (Q4233718), this problem wouldn't appear. To the extent that it's desired to distinguish unknown (Q24238356), untraceable copyright owner (Q60711924) and anonymous (Q4233718), object of statement has role (P3831) can be used as a qualifier to distinguish those.
Given that the discussion on the talk page of anonymous (Q4233718) doesn't progress, I think it's a good idea to solve the issue here through an RfC. ChristianKl ❪✉❫ 11:53, 9 November 2020 (UTC)
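The same-author problem described above can be illustrated with a minimal Python sketch. It is not Wikidata's actual data model: the `UnknownValue` class, the `same_author` helper and the book labels are hypothetical names introduced only for illustration. The sketch assumes that each unknown value is a separate marker per statement, while an item ID such as Q4233718 always compares equal to itself.

```python
from dataclasses import dataclass, field
from itertools import count

ANONYMOUS = "Q4233718"  # item ID taken from the discussion above

_markers = count(1)


@dataclass(eq=False)  # eq=False: instances compare by identity, not by content
class UnknownValue:
    """Hypothetical stand-in for an 'unknown value': one marker per statement."""
    marker: int = field(default_factory=lambda: next(_markers))


def same_author(a, b):
    """Naive check: two works share an author if their author values are equal."""
    return a == b


# Old model: both books point at the single shared 'anonymous' item.
old_model = {"Book A": ANONYMOUS, "Book B": ANONYMOUS}
# Proposed model: each statement carries its own unknown value.
new_model = {"Book A": UnknownValue(), "Book B": UnknownValue()}

print(same_author(old_model["Book A"], old_model["Book B"]))  # True
print(same_author(new_model["Book A"], new_model["Book B"]))  # False
```

In this sketch `False` only means "not known to be the same author"; it does not assert that the two authors are different people.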
Discussion
- There seem to be several questions: what to use for creator (P170) (notably by paintings)? what statements to have on anonymous (Q4233718)? how to query works by anonymous (Q4233718)? A change to one of these obviously breaks the other two for those who had been using them over the last 8 years. --- Jura 12:16, 9 November 2020 (UTC)
WikiProject sum of all paintings has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. --- Jura 12:16, 9 November 2020 (UTC)
- I think the third question isn't "how to query" but what we mean by "instance of". Do we mean the thing that's usually meant when the concept is used outside of Wikidata, or something that works differently in a few exceptions, where people who assume that "instance of" means what it appears to mean run into problems when an exception is in the data they are looking at? ChristianKl ❪✉❫ 13:16, 9 November 2020 (UTC)
- My opinions on this: (1) anonymous definitely should not be an instance of human; it is an abstract concept, not a specific person. In cases where there is a specific individual or group that goes by the name "Anonymous" then that should be a separate item specific to that context. (2) For the value of an author, creator, etc. statement, 'unknown value' seems the right choice, not an "anonymous" item. If the author name was specifically stated to be "Anonymous" (or equivalent in some other language) then use that string value in an object named as (P1932) qualifier. ArthurPSmith (talk) 17:46, 9 November 2020 (UTC)
- I agree with ArthurPSmith above. I understand that this is different from the documented best practice for creative works, which is based on art historical practice. The long-standing Wikidata:WikiProject Visual arts/Item structure best practice is "creator (P170) ... maker of this creative work or other object (where no more specific property exists). Paintings with unknown painters, use "anonymous" (Q4233718)". I think it is most important to try to achieve consensus on this one way or the other, and then change existing items to match the new consensus. - PKM (talk) 21:39, 9 November 2020 (UTC)
- My concern with using "unknown value" is that changing the snaktype can be a difficult learning curve for users (especially when dealing with large uploads) and departs from curatorial practices where "anonymous" is clearly understood. I also think sometimes users might want to query for texts that are anonymously written or attributed, so it helps to have anonymous as a statement value. Rather than reinvent the wheel, can't we just filter anonymous from queries for creators? Valeriummaximum (talk) 11:52, 11 November 2020 (UTC)
- We could use an edit filter to warn users when they add anonymous (Q4233718) as a statement value. I wrote a filter that should be able to detect incorrect usage of anonymous (Q4233718), and tells editors the correct way to add it. (it would be helpful if an admin could verify and add that edit filter and set it to Log, so there is a list of edits that this filter would warn on if later set to Warn). We could also use a bot to automatically change anonymous (Q4233718) to unknown value. --SixTwoEight (talk) 16:04, 11 November 2020 (UTC)
- @Valeriummaximum: Do you not run into the question of finding paintings by the same person, where a single "anonymous" value would lead to incorrect conclusions? But if this is all handled properly in the context of artworks and standard curatorial practice, then I wouldn't be opposed to using for example the current anonymous (Q4233718) restricted to that context (i.e. use only for artworks where the meaning of "anonymous" is well understood). ArthurPSmith (talk) 18:21, 11 November 2020 (UTC)
- @ArthurPSmith, SixTwoEight: I totally agree with you all that it is a weird ontology. When we say a painting was created by "anonymous" and another painting was created by "anonymous", we end up with a curious result that "anonymous" created thousands of paintings and other works of art throughout all of human history... but it seems like an issue that queriers in the visual arts community already know how to deal with. If someone was querying for prolific painters in the 16th century, they would know to filter items created by an anonymous creator; alternatively, if they wanted to find examples of unattributed 16th century painters, the anonymous value would be useful (people who work on the topic of forgery and attribution would, I think, want to find a list of paintings that have anonymous creators). It's an ontology people in the visual arts group already use. I fear that changing the snaktype, while strictly more accurate, might present difficulties in the long run (I think especially of institutions that want to upload large amounts of data; changing snaktypes with large uploads is not always easy with existing tools). I think changing anonymous (Q4233718) to be a subclass of human is a good fix. Valeriummaximum (talk) 12:38, 12 November 2020 (UTC)
How about subclass of (P279): human (Q5)? NMaia (talk) 23:07, 9 November 2020 (UTC)
- Using subclass of (P279) seems fine. ChristianKl ❪✉❫ 00:27, 10 November 2020 (UTC)
- I might think of a reason why this doesn't work, but at first thought yes subclass of (P279) seems ok. ArthurPSmith (talk) 18:36, 10 November 2020 (UTC)
I have written a query to find properties that use anonymous to identify people (it excludes some legitimate usage of anonymous (Q4233718), such as for named after). In total, 25 properties currently have anonymous (Q4233718) as a statement value. I've already cleaned up a few properties with just a few uses of anonymous (Q4233718). -- SixTwoEight (talk) 15:40, 10 November 2020 (UTC)
It has been a while since I last brought it up. In the meantime we got structured data on Commons and here we made much more extensive use of unknown value. I would love to change the way we use anonymous (Q4233718). I would say creator (P170) -> unknown value and use object of statement has role (P3831) -> anonymous (Q4233718) and keep all the other qualifiers the same. So the query would change from "?item wdt:P170 wd:Q4233718" to "?item p:P170/pq:P3831 wd:Q4233718". We can just have a bot do these changes and do clean up on a regular basis. Ping project doesn't work for large projects so I'll leave a message on the talk page. Multichill (talk) 20:49, 11 November 2020 (UTC)
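The query change described above could be captured as follows. This is a sketch only: the two triple patterns are the ones quoted in the comment, while the surrounding `SELECT` wrapper, the `LIMIT` and the Python names (`OLD_PATTERN`, `NEW_PATTERN`, `works_by_anonymous_query`) are assumptions added for illustration. It relies on the `wd:`, `wdt:`, `p:` and `pq:` prefixes that the Wikidata Query Service predefines, and it only builds the query text without sending it anywhere.

```python
# Pattern used so far: creator (P170) points directly at the anonymous item.
OLD_PATTERN = "?item wdt:P170 wd:Q4233718"
# Proposed pattern: the creator statement carries the qualifier
# object of statement has role (P3831) = anonymous (Q4233718).
NEW_PATTERN = "?item p:P170/pq:P3831 wd:Q4233718"


def works_by_anonymous_query(new_model: bool, limit: int = 100) -> str:
    """Build a SPARQL query string for works with an anonymous creator."""
    pattern = NEW_PATTERN if new_model else OLD_PATTERN
    return f"SELECT ?item WHERE {{ {pattern} . }} LIMIT {limit}"


print(works_by_anonymous_query(new_model=False))
print(works_by_anonymous_query(new_model=True))
```

Keeping both patterns side by side may help during a migration, when some statements have already been converted by a bot and others have not.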
- I personally like this approach. I think we need to keep "anonymous" in the statement. I'm just not sure if object of statement has role (P3831) is an intuitive qualifier to use. I am also still a little worried about using "unknown value" because I have problems with this on quickstatements--I tried to run an upload on the sandbox
qid,P170,qal3831 / Q4115189,somevalue,Q4233718
and it would not run the qualifier (though I see user:SixTwoEight could do it, so there's clearly some issue in my commands, but I do worry for this reason that using somevalue increases the entry barrier for institutions to upload their data). Valeriummaximum (talk) 13:02, 12 November 2020 (UTC)
- What you entered should work, but there's a bug in QuickStatements that causes it not to be able to add qualifiers or remove statements with unknown value or no value. The reason it worked for me was because I was testing a bugfix that I wrote that fixes the issue. (That's also why those edits are tagged with "[TEST, do not approve] QS3", since it's my local QuickStatements OAuth app.) --SixTwoEight (talk) 13:15, 12 November 2020 (UTC)
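For bulk uploads, the header and data row quoted above could be generated rather than typed by hand. The following is a minimal sketch using only Python's standard `csv` module; the function name `quickstatements_csv` is hypothetical, and whether QuickStatements accepts the resulting qualifier on a `somevalue` statement depends on the bug discussed in this thread.

```python
import csv
import io


def quickstatements_csv(rows):
    """Return CSV text with the header quoted in the comment above."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # qid = item to edit, P170 = creator, qal3831 = qualifier P3831
    writer.writerow(["qid", "P170", "qal3831"])
    writer.writerows(rows)
    return buffer.getvalue()


# Sandbox item, unknown value as creator, role 'anonymous' as qualifier.
print(quickstatements_csv([("Q4115189", "somevalue", "Q4233718")]))
```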
- When a tool is broken, the tool should be fixed. That shouldn't be a reason to block a change to the data model. In the meantime you can use placeholder for "somevalue" (Q53569537). Multichill (talk) 11:57, 14 November 2020 (UTC)
- @Multichill: Completely agree--but I'm just cautious about an editorial guideline that turns something very simple ('item--has creator--anonymous') into something very complex ('item--has creator--somevalue, object has role--anonymous') which will not be intuitive to editors or queriers or reflect existing standards. I agree that bugs in tools shouldn't be constraints on data models but maintaining the accessibility of WD should be a priority. Valeriummaximum (talk) 14:06, 14 November 2020 (UTC)
- If there's a bug in QuickStatements that makes it hard to enter this data, then that bug should likely be solved before making a change like the one we are contemplating here.
- As far as accessibility goes, having exceptions to general principles might increase accessibility for a particular use-case but decreases it for others when people have to learn that there's one example where P31 doesn't mean what it means everywhere else. ChristianKl ❪✉❫ 20:56, 19 November 2020 (UTC)
- @Magnus Manske, Valeriummaximum, ChristianKl: I've written a patch to fix this.
- Clone https://github.com/magnusmanske/quickstatements, do
curl 'https://www.wikidata.org/wiki/User:SixTwoEight/QS_fix?action=raw' > 628fix.patch && git am < 628fix.patch
and you'll be running the fixed version (git am will add it as a commit to the current branch). --SixTwoEight (talk) 23:15, 19 November 2020 (UTC)
- @SixTwoEight: did you also do a pull request? I don't see it at https://github.com/magnusmanske/quickstatements/pulls . If not, can you please do a pull request? Multichill (talk) 11:29, 21 November 2020 (UTC)
- @Multichill: Done: https://github.com/magnusmanske/quickstatements/pull/8 --SixTwoEight (talk) 22:26, 22 November 2020 (UTC)
- @SixTwoEight: When I click on that link I see a 404. Is it now merged? What's the current state? ChristianKl ❪✉❫ 00:22, 10 December 2020 (UTC)
- @ChristianKl: It appears GitHub's spam detection has (incorrectly) assumed I was a spambot creating that pull request, so has silently removed it. Manually committing the change from the command line should still work though. --SixTwoEight (talk) 01:56, 10 December 2020 (UTC)
- I don't understand why people insist that anonymous is not human. For painters, anonymous always refers to at least one human, and sometimes more than one human. Jane023 (talk) 10:09, 30 November 2020 (UTC)
- @Jane023: It’s analogous to the difference between « human » and « a distinct human ». The same goes for a class like « boat » and an instance of that class like « Titanic ». Usually the meaning of a property like « author » is that the value of that property is the « distinct human » that is the author. If anonymous is to be treated like this, then « anonymous » is the distinct human and he created, well, a lot of stuff. Imagine you want to know which author has created the most works in Wikidata. There is a chance that this author is « anonymous » in a naive query. Of course it’s not exactly fair to the other ones, as we intend the property « author » to refer to a distinct human, and not a value like « some human who did not give their name ». A usual author item is intended to denote a distinct human! Whereas anonymous is not; basically it means « some human ». So we’ll have to add a special treatment to the query, an exception, to handle that case.
- The thing is that to encode the fact that it is « someone, but we don’t know whom », Wikidata has had a dedicated mechanism from the start: the special value unknown value. It has to be dealt with in queries as well if it’s encountered, but it’s a standard mechanism.
- To take the « boat analogy », it’s as if we added a property for travels to indicate that in a journey someone took a boat, but we don’t know which one was used
- Which is not very useful because we know that the « took the boat » property has boats as values … and specific boats like « Titanic »
- Whereas a more idiomatic encoding in Wikidata would be
- There could be subtleties, like if we knew whether the boat was a sail boat or a cargo ship or whatever, which could be dealt with by qualifiers of the statement, the same way as if we know the author signed « John Doe » or « anonymous » but we don’t know who it is we could use
- We would still have the information « John Doe » but no problem dealing with it in the query and no risk of confusing John Doe with a real prolific author. author TomT0m / talk page 11:10, 30 November 2020 (UTC)
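The "most prolific author" point above can be made concrete with a small Python sketch. The data is invented for illustration: the work labels and the `author-x` / `author-y` identifiers are hypothetical, and only Q4233718 comes from the discussion. It shows how a naive count crowns the shared anonymous item and why such a query needs an explicit exception.

```python
from collections import Counter

ANONYMOUS = "Q4233718"  # shared 'anonymous' item from the discussion

# Hypothetical (work, author value) statements.
statements = [
    ("Work 1", ANONYMOUS),
    ("Work 2", ANONYMOUS),
    ("Work 3", ANONYMOUS),
    ("Work 4", "author-x"),
    ("Work 5", "author-x"),
    ("Work 6", "author-y"),
]


def most_prolific(stmts, exclude=()):
    """Return the author value with the most works, skipping excluded values."""
    counts = Counter(author for _, author in stmts if author not in exclude)
    return counts.most_common(1)[0][0]


print(most_prolific(statements))                       # Q4233718
print(most_prolific(statements, exclude={ANONYMOUS}))  # author-x
```

The `exclude` argument is the special treatment the comment refers to: every consumer of the data has to know about the one item that must be filtered out.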
- That being said, I guess I don't understand the risk involved with attributing millions of works to the highly prolific Mr. Anonymous when this has been artistic convention for centuries. There must be other ways of dealing with this without interfering with the Qid for anonymous as creator or author. For data enthusiasts and art enthusiasts alike, please come up with some other method because I think the proposal is terrible. I see no reason to go against established practice by requiring some complex item+qualifier for the common usage of anonymous. Jane023 (talk) 11:41, 30 November 2020 (UTC)
- @Jane023: unknown value is not an item. It’s a special value dedicated to this kind of use case on Wikidata. The goal is not to go against the well-established practice of using an « anonymous » value in databases. The use of an « anonymous » value in common databases is, on the contrary, very often a workaround for the non-availability of a special value like unknown value. We should also note that of course common authority databases do not deal with anonymous authors with a common « anonymous » item: they have several anonymous entries. For example the BNF in France: https://data.bnf.fr/fr/search?term=anonyme with one of them https://data.bnf.fr/fr/17821400/anonyme/ : note that this anonymous has its own id: 17821400. This would mean that to do the same on Wikidata it would have its own item; there would be plenty of anonymous items if we followed this well-established bibliographical data practice, which is not what you’d want, I guess.
- PS: to access those special values, click on the rectangle icons next to the ranking one, on the left of the item name text zone when editing a statement, if you don’t know how. author TomT0m / talk page 12:27, 30 November 2020 (UTC)
- @Jane023: Linked data isn't centuries old. The practice for thousands of papers is to record authority information on paper and not computers. If you like that practice it makes sense to continue with the paper. Recording information in linked data means that computers can interact in new ways with data and you can run queries instead of doing certain research by hand. Humans who interact with a piece of paper on which "author:anonymous" is written are not going to think that they are dealing with a human who's named anonymous. A human who reads the information is able to make the distinction that the word anonymous does not denote a human. When you tell a computer that anonymous is a human, as Wikidata currently does, the computer is going to trust that anonymous is a human, which is silly. ChristianKl ❪✉❫ 12:38, 30 November 2020 (UTC)
- It doesn't matter whether you use old fashioned card catalogues or modern-day linked data: if you link the data to unknown or to anonymous you will still have the same result: Many works will not be ascribed to the appropriate creator/author. I suppose you will want to do the same with private collection and I will object to that for the same reason. Jane023 (talk) 14:47, 30 November 2020 (UTC)
- @Jane023: Nothing is « linked to » unknown. Technically there are as many « unknown » as there are statements with unknown values in the SPARQL queries. For example see the values « ?unknown_author_id » of this query which searches for « author » statements with « unknown value ». The first result in my try is « Arbatel » with unknown author id « t7395251 ». « t7395251 » is different from any other unknown value in Wikidata; it’s unique and never reused. This is exactly analogous to how the BNF has several ids for unknown authors (see for example the url of my previous message, which has an anonyme with number 17821400 and other « anonyme » with different numbers). The only difference is that there could be several works with the same anonymous author, but we could do the same in Wikidata by creating an item for this author and indicating we don’t know its name. I think this is where we don’t understand each other. The situation is very different from having all the statements linked to a unique anonymous item as value. author TomT0m / talk page 15:21, 30 November 2020 (UTC)
- I don't quite follow but I think what you want is "author name string" as a placeholder for cases where the author is known but doesn't have an item. That is not the same as anonymous. Jane023 (talk) 17:10, 30 November 2020 (UTC)
- Indeed, if you think that, we clearly don’t understand each other at all :) What I want is to use the Wikidata special value unknown value as a value for author (P50) statements. I must repeat, because this is very important, that this special value is not an item. I can explain again how to set it if needed: when editing a statement, look for the special value icon. For example the statement Q626927#P50 uses unknown value and you can’t click on its value to go to an item. In the end we don’t have to create zillions of unknown items, nor do we have to link zillions of works to a unique « anonymous » item.
- Even if you don’t understand everything, please trust us that it makes a real difference technically in the way we query the data. author TomT0m / talk page 17:52, 30 November 2020 (UTC)
- @Jane023: We already have people who regularly add statements to anonymous (Q4233718) about having items in certain collections. It seems that people on Wikidata already agree that those statements are not useful and remove them. Having 1000 statements on anonymous (Q4233718) about collections that have anonymous creators is a bad idea. I do think we should continue not having those statements.
- It's in the nature of what most people mean when they say human that if you are a human that authored book A and you are also a human that authored book B, then books A and B are authored by the same person.
- It's the nature of unknown authorship that if two books have unknown authorship they don't necessarily have the same author. The Wikidata data model has unknown value to mark such unknowns. A programmer doesn't need to know about individual items to handle unknown value, as it's a core feature of the data model of Wikibase. ChristianKl ❪✉❫ 16:25, 30 November 2020 (UTC)
- I still don't see the added value of changing the use of anonymous to unknown. It seems just to be a case of either one person doing a million things, or a million things with unique unknown items. Both are equally undesirable, but one is according to convention and the other is not. Jane023 (talk) 17:13, 30 November 2020 (UTC)
- « It seems just to be a case of either one person doing a million things, or a million things with unique unknown items. » It must be repeated that using unknown value means neither of these two. You can see each unknown value as its own virtual item that does not have to be created. As there are many of these virtual, uncreated items, it’s not one person who did one million things as in the anonymous (Q59755918) case. author TomT0m / talk page 17:52, 30 November 2020 (UTC)
- You seem to be stuck on the word link. Think of it this way: with all anonymous works linked to Mr. Anonymous, there is lots of overlap per period and place among the objects. So for any given period or place, some of those works are in fact by the same creator. In your proposed vision, there will be no overlap at all among creators of such objects. Jane023 (talk) 08:17, 1 December 2020 (UTC)
- @Jane023: Not really true, conceptually speaking. In the « anonymous item » model, it’s as if everything overlapped, which is clearly wrong. In the « unknown » model … unknown values are neither assumed to be the same nor really assumed to be different (we can’t make the so-called unique name assumption). It’s just that if we don’t add other data we can’t assume two unknowns are the same. In other words « unknown » is just perfect to express … well, « we don’t know », whereas with « anonymous item » we just try to say « ok, we don’t know, but let’s just act exactly as if we knew it was a unique author who wrote everything », which is … well, way more twisted.
- Now if you want to query the overlaps between the « unknown » authors’ works to find candidate groups of works that we could have reasons to think are from the same author, nothing stops you from doing so in the « unknown » model. And if there is a good source that makes this hypothesis solid, we could even create an item and promote the « unknown values » to an item.
- And, anyway, any Wikidatan should be aware of unknowns because they can pop up virtually everywhere on Wikidata, on any statement with any property … So … why make things more complicated than they should be and import another way to model that we don’t know who wrote something? A Wikidatan is perfectly entitled to use unknown for a work whose author we don’t know. So basically with « anonymous item » we just add a second way to do this, which is not that good and is redundant. This adds unneeded complexity, both conceptual and technical, through the handling of special cases. author TomT0m / talk page 09:44, 1 December 2020 (UTC)
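The idea of looking for overlaps among works with unknown authors can be sketched in a few lines of Python. Everything here is hypothetical: the work labels, the places and centuries, and the `candidate_groups` helper are invented for illustration, and grouping by place and century is only one possible heuristic, not a method proposed in this discussion.

```python
from collections import defaultdict

# Hypothetical records: (work label, place, century).
# Every work is assumed to have an unknown value as its author.
works = [
    ("Work 1", "Place X", 16),
    ("Work 2", "Place X", 16),
    ("Work 3", "Place Y", 17),
]


def candidate_groups(records):
    """Group works by (place, century); keep groups with more than one work."""
    groups = defaultdict(list)
    for label, place, century in records:
        groups[(place, century)].append(label)
    return {key: labels for key, labels in groups.items() if len(labels) > 1}


print(candidate_groups(works))  # {('Place X', 16): ['Work 1', 'Work 2']}
```

A group found this way is only a candidate for closer study; promoting it to a shared author item would still need a source, as the comment above notes.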
- Well we have the concept of "notname" which is a handy shortcut for painters of a period and place whose works have been grouped together and documented as such. I have a problem with doing this for artists where it has not explicitly been agreed by professional art historians. So I guess my main problem is with your willingness to overthrow the documented "anonymous model" in use currently by an overwhelming number of parties in the art world, in favour of a much less specific Wikidata "unknown model" that will cause an explosion of possibilities at the work level and does not appear to serve any purpose other than to confuse newbies in the Wikidata user interface. Jane023 (talk) 10:34, 1 December 2020 (UTC)
- @Jane023: after reading the first sentences of en:Notname, this seems exactly analogous to the procedure « create an item for each notname in Wikidata ». « is an invented name given to an artist whose identity has been lost ». This item is a person,
- (even
- …
- Replace all unknown values you can relate to this notname in the relevant works and you are done. And this is exactly what I tried to explain earlier. So I don’t know what the problem would be. This is very different from linking all unknowns to a unique « anonymous » item, in the sense that each notname is linked to an identified unique artist … As far as I understand, anonymous is not a notname, unless there are several anonymous and a way to disambiguate. author TomT0m / talk page 10:49, 1 December 2020 (UTC)
- I am glad you agree. This is precisely why using unknown would not be suitable for anonymous either. Jane023 (talk) 12:03, 1 December 2020 (UTC)
- @Jane023: I just explained exactly the opposite. The « notname » model is perfectly OK to use together with the unknown value one and not incompatible at all: if there is almost nothing known about this anonymous, especially if we can’t find any other work likely to be tied to him, just use unknown value as the author (P50) statement of the work. It’s not worth creating an item. If we know some things, and other works by the same author, create an item for this author with its notname as a label, and use this item in the author (P50) statements where relevant. « This is precisely why using unknown would not be suitable for anonymous either. » I’m guessing you imply that we know something about the author, so we can’t say it’s unknown? I must then object that this is not really correct. Let’s say that the little we know about it just does not really make it worth the effort of creating an item for it. If all we know about him is that he wrote the work, hence that he lived at the time the work was written, and we could not find any other plausible work by the same author … then why bother creating an item? unknown value is in the end just a placeholder for something we can’t really identify or that we know we don’t have much information about. But in the end we know there is a human who could have their own item if we knew more. author TomT0m / talk page 12:21, 1 December 2020 (UTC)
- I realize you might want to refer to the case when an author published under a pseudonym, or anonymously/without signing. Then he published anonymously. Using unknown value is perfectly OK in this case with author (P50) because it means we know there is a human but we don’t know whom. If the author is later revealed we could build another statement with preferred rank with the right item. But it’s still true that the text is not signed or is anonymous … then this information is still available if we still have two author statements, actually … and we can add the real signature with other statements that exist on Wikidata. author TomT0m / talk page 12:33, 1 December 2020 (UTC)
- Nope. I am referring to the case where art historians have indicated attribution and whether it is notname or anonymous it is established practise. Jane023 (talk) 13:31, 1 December 2020 (UTC)
- @Jane023: I don’t quite understand why a simple rule « when art historians say anonymous, put « unknown value » on Wikidata; when it’s a notname, create an item for it » is incompatible with those practices and with what art historians say. Quite the opposite, I think this is exactly compatible … author TomT0m / talk page 13:42, 1 December 2020 (UTC)
- Your simple rule assumes that the person inputting the information understands that it's OK to say this artwork was created by a possibly non-human entity such as an alien, a magical process, or some AI painting creation process. The simple fact of the matter is that most people adding paintings to Wikidata are using metadata from other sources where this Wikidata unknown translation of the anonymous concept simply doesn't exist. Jane023 (talk) 13:49, 1 December 2020 (UTC)
- Does not seem like a big issue to me, it can always be changed afterward, pretty straightforward to detect and just a question of … telling people. We’re all able to learn. author TomT0m / talk page 14:16, 1 December 2020 (UTC)
- Fine. Your interpretation of "cleaning up the ontology of anonymous" is ramming through replacement of the term "anonymous" with another term "unknown" just because it's easy to do technically. Thanks for spelling that out. Jane023 (talk) 10:08, 2 December 2020 (UTC)
- @Jane023: I give up :( It’s not at all that. Once again, unknown value is everything but a term. And no, my argumentation is not at all reducible to « it’s easy to do ». But if I did not make that clear at this point it’s pointless trying to find other words or ways to explain; I’ll just repeat myself. author TomT0m / talk page 16:47, 10 December 2020 (UTC)
- Fine. Your interpretation of "cleaning up the ontology of anonymous is ramming through replacement of the term "anonymous" with another term "unknown" juts because it's easy to do technically. Thanks for spelling that out. Jane023 (talk) 10:08, 2 December 2020 (UTC)
- Does not seem like a big issue to me, it can always been changed afterward, pretty straightforward to detect and just a question of … telling people. We’re all able to learn. author TomT0m / talk page14:16, 1 December 2020 (UTC)
- Your simple rule assumes that that the person inputting the information understands that it's OK to say this artwork was created by a possibly non-human entity such as an alien, a magical process, or some AI painting creation process. The simple fact of the matter is that most people adding paintings to Wikidata are using metadata from other sources where this Wikidata unknown translation of the anonymous concept simply doesn't exist. Jane023 (talk) 13:49, 1 December 2020 (UTC)
- @Jane023: I don’t quite understand why a simple rule « when art historian say anonymous put « unknown value » on Wikidata when it’s a notname create an item for this » » is incompatible with those practices and with what art historian says. Quite the opposite I think this is exactly compatible … author TomT0m / talk page13:42, 1 December 2020 (UTC)
- Nope. I am referring to the case where art historians have indicated attribution and whether it is notname or anonymous it is established practise. Jane023 (talk) 13:31, 1 December 2020 (UTC)
- @Jane023: I just explained exactly the opposite. The « notname » model is perfectly OK to use together with the unknown valueHelp one and not incompatible at all : if there is almost nothing known about this anonymous, especially if we can’t find any other work likely to be tight to him, just use unknown valueHelp as the author (P50)
- I am glad you agree. This is precisely why using unknown would not be suitable for anonymous either. Jane023 (talk) 12:03, 1 December 2020 (UTC)
- @Jane023: after reading the first sentences of en:Notname, this seems exactly analogous to the procedure of creating an item for each notname in Wikidata. « is an invented name given to an artist whose identity has been lost » . This item is a person,
- Well we have the concept of "notname" which is a handy shortcut for painters of a period and place whose works have been grouped together and documented as such. I have a problem with doing this for artists where it has not explicitly been agreed by professional art historians. So I guess my main problem is with your willingness to overthrow the documented "anonymous model" in use currently by an overwhelming number of parties in the art world, to a much less specific Wikidata "unknown model" that will cause an explosion of possibilities at the work level and does not appear to serve any purpose other than to confuse newbies in the Wikidata user interface. Jane023 (talk) 10:34, 1 December 2020 (UTC)
- You seem to be stuck on the word link. Think of it this way: with all anonymous works linked to Mr. Anonymous, there is lots of overlap per period and place among the objects. So for any given period or place, some of those works are in fact by the same creator. In your proposed vision, there will be no overlap at all among creators of such objects. Jane023 (talk) 08:17, 1 December 2020 (UTC)
- « It seems just to be a case of either one person doing a million things, or a million things with unique unknown items. » It must be repeated that using unknown value means neither of these two. You can see each unknown value as its own virtual item that does not have to be created. As there are many of these virtual, uncreated items, it’s not a person who did one million things as in the anonymous (Q59755918) case. author TomT0m / talk page 17:52, 30 November 2020 (UTC)
- I still don't see the added value of changing the use of anonymous to unknown. It seems just to be a case of either one person doing a million things, or a million things with unique unknown items. Both are equally undesirable, but one is according to convention and the other is not. Jane023 (talk) 17:13, 30 November 2020 (UTC)
- @Jane023: Nothing is « linked to » unknown. Technically there are as many « unknown » as there are statements with unknown values in the SPARQL queries. For example see the values « ?unknown_author_id » of this query, which searches for « author » statements with « unknown value ». The first result in my try is « Arbatel » with unknown author id « t7395251 ». « t7395251 » is different from any other unknown value in Wikidata; it’s unique and never reused. This is exactly analogous to how the BNF has several ids for unknown authors (see for example the url of my previous message, which has an « anonyme » with number 17821400, and other « anonyme » with different numbers). The only difference is that there could be several works with the same anonymous author, but we could do the same in Wikidata by creating an item for this author and indicating we don’t know its name. I think this is where we don’t understand each other. The situation is very different from having all the statements linked to a unique anonymous item as value. author TomT0m / talk page 15:21, 30 November 2020 (UTC)
- It doesn't matter whether you use old fashioned card catalogues or modern-day linked data: if you link the data to unknown or to anonymous you will still have the same result: Many works will not be ascribed to the appropriate creator/author. I suppose you will want to do the same with private collection and I will object to that for the same reason. Jane023 (talk) 14:47, 30 November 2020 (UTC)
- That being said, I guess I don't understand the risk involved with attributing millions of works to the highly prolific Mr. Anonymous when this has been artistic convention for centuries. There must be other ways of dealing with this without interfering with the Qid for anonymous as creator or author. For data enthusiasts and art enthusiasts alike, please come up with some other method because I think the proposal is terrible. I see no reason to go against established practice to include some complex item+qualifier for common usage of anonymous. Jane023 (talk) 11:41, 30 November 2020 (UTC)
- When I think of the term "ontology of anonymous" in the context of artists, since my area of interest is 17th-century paintings, I tend to think of all the ways I have used anonymous in the past and some of the questions I have today that are unsolved on Wikidata. First of all, most paintings from the 17th-century that still survive today have been attributed in the past to multiple people, including anonymous. Often the "anonymous" used in older texts (so from the 17th century onwards) do refer to period artists, and sometimes (from the 18th century onwards) they refer to later good faith copyists, often genteel women or paid copyists, who made copies for use in personal chapels or decoration of new buildings. Given this background, some later misattributions were done naively, and some were bad faith attributions to hide the painting in plain sight (e.g. during post-WWII sales of stolen Jewish art). In each case where it is known and documented, I add all attributions and use "preferred" for the one in use most recently, and "deprecated" for ones rejected. For what are most commonly called "doubtful" in catalogs, I will use what the holding institution says, and otherwise anonymous with the possible qualifiers "after a work by", "manner of", or "workshop of". These big three are important in different ways. The first one, "after a work by" implies there is an earlier work of the same subject and I will look it up and add it with "based on" and a link back with "derivative work". The second, "manner of", implies an artist who was influenced by the target artist's work (e.g. John Constable's clouds were influenced by Jacob van Ruisdael). These people are much harder to trace and track over time, but with the size of Wikidata's dataset I do believe we should be better able to do this and I am still looking for ways to capture this information. 
The third, "workshop of" is fairly specific and sometimes the hand of the master is included in this phrase specifically for some central part of the painting. When it comes to artist workshops, we are pretty good at gathering the documented members over time as artists move from one location to another. I think we should create "studio" items that contain documented pupils or assistants that we can use instead of anonymous. The risk is that there is nobody else doing this, because they don't have the data, so it would be an innovation. My view is the painting is either by the studio or not, and given the dating, it should be linked to the people in the studio in the period of that date. I would be willing to work on this if there was consensus. Jane023 (talk) 10:08, 2 December 2020 (UTC)
- @Jane023: can you point to a few examples where we currently have multiple assertions of authorship on Wikidata that you think are well-modeled? ChristianKl ❪✉❫ 13:28, 2 December 2020 (UTC)
- You can check out the list of copies Wikidata:WikiProject sum of all paintings/Copies where some paintings have been copied multiple times, both in their time and afterwards. I don't keep a list of specific cases, but you could run a query against attributions on the Rembrandt or Titian paintings I guess. Nearly every older painter has specific examples that need special treatment. Religious works are especially tricky because churches are terrible at keeping records and after the Protestant reformation a lot of art got moved around or sold. Jane023 (talk) 14:16, 2 December 2020 (UTC)
- I support the idea of using "unknown value" for items with unknown creator. From the ontological perspective, it is incoherent to claim that all of these items have the same creator. If it is established that some set of items has the same creator, then we can reify them individually. It has been my experience that resolving ontological issues by permitting false claims always leads to regret. Bovlb (talk) 22:42, 9 December 2020 (UTC)
- @Jane023, Bovlb, TomT0m, Jura1, ArthurPSmith:@NMaia, Multichill, Valeriummaximum, SixTwoEight: Given the above exchange I created a proposal so we can vote more directly on support/opposition. ChristianKl ❪✉❫ 14:35, 10 December 2020 (UTC)
- There can be a subtle difference between anonymous and unknown author, to which Jane023 may also be referring. Sometimes the author is just unknown, sometimes it is deliberately hidden. --Infovarius (talk) 21:28, 10 December 2020 (UTC)
- There is more than a "subtle" difference between anonymous and unknown. I believe this whole page is misnamed, since there is now a proposal to change anonymous to unknown and there is no ontology to be found here. If you consider the case of depicted anonymous people, such as person depicted in Mona Lisa (Q11879536), who is currently human, what is the number of anonymous people that may be included in an item for it to remain human? Because I personally like to count humans in various lists. I noticed that I can no longer count female portrait sitters since woman (Q467) and man (Q8441) were demoted to subclasses of human. This list Wikidata:WikiProject_Women/Portraits_of_Women_1550-1559 needs two statements to indicate I am looking for women to be returned and I find this counterintuitive. If this proposal is done I will be considering items for "Rembrandt follower" and "School of Rembrandt" but I want to make sure these remain human. If not, I will also consider using Q5 with various time period and male and female qualifiers in the depicts (P180) statements for creators and portrait sitters. Jane023 (talk) 08:16, 21 December 2020 (UTC)
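TomT0m's point in the thread above, that every « unknown value » is its own node rather than one shared item, can be illustrated with a small sketch. The SPARQL text uses the `wikibase:isSomeValue` filter of the Wikidata Query Service; the helper names, the sample rows and the second node id are hypothetical, and the assumption that unknown values come back either as blank nodes or as `.well-known/genid` IRIs is stated in the comments. Nothing here contacts the query service.

```python
# Sketch only: builds a query string and inspects SPARQL JSON bindings offline.
# Assumption: the query service returns an "unknown value" either as a blank
# node ("type": "bnode") or as an IRI under the prefix below.
GENID_PREFIX = "http://www.wikidata.org/.well-known/genid/"


def unknown_author_query(property_id="P50", limit=10):
    """Return SPARQL text listing statements whose value is 'unknown value'."""
    return (
        "SELECT ?work ?unknown_author_id WHERE {\n"
        f"  ?work wdt:{property_id} ?unknown_author_id .\n"
        "  FILTER(wikibase:isSomeValue(?unknown_author_id))\n"
        "}\n"
        f"LIMIT {limit}"
    )


def is_unknown_value(binding):
    """True if one SPARQL JSON binding looks like an 'unknown value' node."""
    if binding.get("type") == "bnode":
        return True
    return (
        binding.get("type") == "uri"
        and binding.get("value", "").startswith(GENID_PREFIX)
    )


def distinct_unknown_authors(rows):
    """Count distinct unknown-value nodes among the result rows."""
    return len(
        {
            row["unknown_author_id"]["value"]
            for row in rows
            if is_unknown_value(row["unknown_author_id"])
        }
    )


# Hypothetical result rows: work URIs are placeholders, "t7395251" is the id
# quoted in the discussion, "t0000002" is invented for illustration.
SAMPLE_ROWS = [
    {
        "work": {"type": "uri", "value": "http://example.org/work-1"},
        "unknown_author_id": {"type": "bnode", "value": "t7395251"},
    },
    {
        "work": {"type": "uri", "value": "http://example.org/work-2"},
        "unknown_author_id": {"type": "bnode", "value": "t0000002"},
    },
]

if __name__ == "__main__":
    print(unknown_author_query())
    print(distinct_unknown_authors(SAMPLE_ROWS))
```

With two works and two unknown-value statements the count of distinct nodes equals the number of works, whereas two works pointing at anonymous (Q4233718) would share a single value.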
Remove instance of (P31) human (Q5) from anonymous (Q4233718).
We will refer here to the following properties as human-centric-properties: author (P50), lyricist (P676), composer (P86), librettist (P87), publisher (P123), creator (P170), designed by (P287), donated by (P1028), architect (P84), legislated by (P467), illustrator (P110), translator (P655), owned by (P127), main subject (P921), depicts (P180), performer (P175), collection (P195), reviewed by (P4032) and collection creator (P6241).
The human-centric-properties should only be used in ways that don't violate value-type constraint (Q21510865). This means that classes like unknown (Q24238356), anonymous (Q4233718), untraceable copyright owner (Q60711924), unrecorded creator (Q102245112), unknown Bohemian (Q97393802), senior administration official (Q7450647), notname (Q1747829) and anonymous master (Q474968) should not be used with the above human-centric-properties.
Existing values should be replaced by bot from the above human-centric-properties. It's important that the bot keeps qualifiers and references intact. unknown (Q24238356) should be replaced by unknown value without any additional qualifiers. anonymous (Q4233718) should be replaced by unknown value with the qualifier object of statement has role (P3831) anonymous (Q4233718). The other items should be replaced similarly to anonymous (Q4233718).
"To mark unknown status when importing a dataset from an authority, Wikidata uses unknown (Q24238356) as a default practice rather than making distinct items to communicate a designation of unknown status from a particular authority." should be removed from https://www.wikidata.org/wiki/Help:Statements#Unknown_or_no_values
The implementation of this proposal should be put on hold until QuickStatements supports adding unknown value.
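The replacement step of the proposal can be sketched as a pure transformation on a statement in the Wikibase JSON layout. This is an illustration under assumptions, not the bot that was actually run: the function names are hypothetical, the statement is assumed to carry the `id` field inside its entity value, and writing the result back to Wikidata is left out entirely.

```python
import copy

ANONYMOUS = "Q4233718"      # anonymous
UNKNOWN = "Q24238356"       # unknown
ROLE_PROPERTY = "P3831"     # object of statement has role


def item_snak(property_id, item_id):
    """Build a value snak pointing at an item (Wikibase JSON layout)."""
    return {
        "snaktype": "value",
        "property": property_id,
        "datatype": "wikibase-item",
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {
                "entity-type": "item",
                "numeric-id": int(item_id[1:]),
                "id": item_id,
            },
        },
    }


def replace_with_unknown_value(statement, placeholder_id=ANONYMOUS, add_role=True):
    """Return a copy whose main value is 'unknown value' (snaktype somevalue).

    Existing qualifiers and references are kept. Statements whose main value
    is not `placeholder_id` are returned unchanged.
    """
    main = statement["mainsnak"]
    value = main.get("datavalue", {}).get("value")
    if main.get("snaktype") != "value" or not isinstance(value, dict):
        return statement
    if value.get("id") != placeholder_id:
        return statement

    new = copy.deepcopy(statement)
    new["mainsnak"] = {
        "snaktype": "somevalue",
        "property": main["property"],
        "datatype": main.get("datatype", "wikibase-item"),
    }
    if add_role:
        # Record the former item as a role qualifier, as the proposal asks
        # for anonymous (Q4233718) and similar items.
        qualifiers = new.setdefault("qualifiers", {})
        qualifiers.setdefault(ROLE_PROPERTY, []).append(
            item_snak(ROLE_PROPERTY, placeholder_id)
        )
        order = new.setdefault("qualifiers-order", [])
        if ROLE_PROPERTY not in order:
            order.append(ROLE_PROPERTY)
    return new


# Hypothetical input: creator (P170) = anonymous, with one placeholder reference.
EXAMPLE = {
    "mainsnak": item_snak("P170", ANONYMOUS),
    "type": "statement",
    "rank": "normal",
    "references": [{"snaks": {}, "snaks-order": []}],
}
```

For unknown (Q24238356) the same function would be called with `placeholder_id=UNKNOWN` and `add_role=False`, since the proposal asks for no additional qualifier in that case.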
Support
Support As proposer. ChristianKl ❪✉❫ 15:26, 10 December 2020 (UTC)
Support with the kind request to first replace usage before updating anonymous (Q4233718) because otherwise we'll have a ton of constraint violations. Multichill (talk) 17:40, 10 December 2020 (UTC)
Support and I would also be willing to help automatically convert usage of anonymous (Q4233718) to unknown value. --SixTwoEight (talk) 18:29, 10 December 2020 (UTC)
Support with the caveat that I do not understand the distinction we are drawing between "unknown" and "anonymous". We need to clarify that if our usage is going to be coherent and consistent going forward. Bovlb (talk) 18:42, 10 December 2020 (UTC)
Oppose I believe it is desired to have anonymous remain human. I believe there is more overlap in anonymous works indicating a smaller set of humans than unique humans, so reducing the number of humans to one is in my mind more accurate than introducing millions of possible humans. Jane023 (talk) 18:13, 17 December 2020 (UTC)
Support I support changing anonymous to be class of human rather than instance; I think using the statement 'has creator' 'anonymous' is fine, but I have no objection to changing it to unknown on the following conditions: a bot updates all existing statements; documentation is provided in a clear public way to users so that they understand that this is the standard; documentation is provided as to how to query 'unknown value'; tools are repaired so that contributors can easily upload statements with 'unknown value' rather than designing their own single-purpose bots to change the snaktype. Valeriummaximum (talk) 18:16, 19 December 2020 (UTC)
Strong support obviously. The problem is not human (Q5) but instance of (P31), anonymous is obviously not an instance. Cheers, VIGNERON (talk) 14:05, 20 December 2020 (UTC)
Support especially now that Quickstatements seems to have been fixed for this issue? ArthurPSmith (talk) 18:01, 22 December 2020 (UTC)
Support seems quite obvious, as said. Anonymous is not an instance of human. --DarwinAhoy! 22:02, 23 December 2020 (UTC)
Opposition
Discussion
- Could we please clarify the (intended) difference between "unknown" and "anonymous" and why it is important to represent them differently (via a qualifier on the latter)? Thanks, Bovlb (talk) 16:05, 10 December 2020 (UTC)
- @Bovlb: Hannolans wrote 'Anonymous and unknown are legal concepts for copyright and orphan works determination' over at the talk page of anonymous (Q4233718). It therefore seems desirable that distinctions which are of interest to some in the source metadata community stay in our data in Wikidata and don't get removed. ChristianKl ❪✉❫ 16:36, 10 December 2020 (UTC)
- Thanks. @Hannolans: Do you happen to have a citation for that? Thanks, Bovlb (talk) 17:17, 10 December 2020 (UTC)
- To claim that a work is 'anonymous', research should have been done into whether the publication was really anonymous and whether the authorship remained undisclosed later during the copyright term. On Commons (https://commons.wikimedia.org/wiki/Commons:Anonymous_works) we use 'anonymous' and 'unknown author'. In the US you can register your work as 'anonymous' in the copyright register (https://www.copyright.gov/comp3/docs/compendium.pdf 615 Anonymous and Pseudonymous Works). In Germany there also exists a Register of Anonymous and Pseudonymous Works where the real name of the author can be submitted for registration of an anonymous work. So 'anonymous' is more or less a pseudonym: there was a known creator, but not disclosed, while 'unknown author' means that the museum couldn't determine the creator, but the work could have been published or disclosed with an author name unknown to the museum, or anonymous.
- The US copyright office mentions an example on page 79: "Joseph Cline is the author of a literary work titled Prime Color. Cline’s name did not appear on the first edition of the work. Instead, the first edition stated that the work was written “By Anonymous.” The U.S. Copyright Office will register the first edition as an anonymous work if the applicant identifies the author as “Anonymous” and/or checks the Anonymous box. In the alternative, the Office would accept an application that names Joseph Cline as the author (regardless of whether the Anonymous box has or has not been checked)." Not sure how we should state this in Wikidata but would be great if we do this correctly --Hannolans (talk) 11:50, 11 December 2020 (UTC)
- @Hannolans: If anonymous means that at a given point in time a museum did research and couldn't find an author, it would make sense to use point in time (P585) or something similar to tag when the research was done. ChristianKl ❪✉❫ 12:01, 11 December 2020 (UTC)
- Timestamp for attribution (Q230768) would be great. I would like to see some wikidata items of different situations and timeline of anonymous/unknown author works and see if we can map them (for example Diary of an Oxygen Thief (Q5272044), Go ask Alice (Q2533572)). Note we have also anonymous master (Q474968) and unknown (Q24238356). --Hannolans (talk) 12:22, 11 December 2020 (UTC)
- @ChristianKl: I think you mixed up subject has role (P2868) and object of statement has role (P3831). Should be unknown value with the qualifier object of statement has role (P3831) anonymous (Q4233718). I did an example edit. Multichill (talk) 17:26, 10 December 2020 (UTC)
- @Multichill: Yes, I corrected it. ChristianKl ❪✉❫ 17:28, 10 December 2020 (UTC)
Note: Quickstatements now supports unknown value correctly. --SixTwoEight (talk) 15:39, 22 December 2020 (UTC)