Part of the problem is that it's difficult to trust any of these documents. It seems that in this latest dump there are some that are blatantly forged.
Even if the rest are genuine, this calls their veracity into question.
A lot of what's going on in that tweet you cited is hard to decipher without stepping back and taking stock of the context.
For instance, I don't know what to make of the significance of a cursor being visible in a document that is dumped (it might be a big deal, or it might not be at all) and I don't know how motivated the party is here to use that to prove a particular argument, or what it should or shouldn't prove.
In this case, you apparently have to be already bought in with the idea that BBC and Le Monde have a history of using fabricated info from the U.N., and that a reporter is untrustworthy for reasons explained at length in an entirely different twitter thread that you'd have to read to understand. After being that bought in, the fact that BBC, Le Monde, and/or this reporter are involved is treated as evidence in and of itself that the new info is unreliable. You have to agree with all that to even start in the middle and begin going through this analysis.
To state it plainly, whenever I see something like this that starts so deep in the middle of unexplained context, I treat it like an "indicator" (in the sense of an economic indicator) against the argument, because it feels like I'm being asked to skip critical steps.
"I don't believe it because I'm missing context" is a weak argument to make. The tweets read like the author isn't a native English speaker. The traditional vs. simplified characters accusation should be relatively easy to confirm if you want to put in at least a little effort. Looking up characters as someone who isn't familiar with them at all might be cumbersome but is absolutely possible.
But then again you wonder why such blatant mistakes would be made in the first place, if this was done by someone at least halfway professional.
I genuinely don't understand where the mistake is, even in your paraphrased version, which is intended to caricature. Yes, I really do think that missing context is a good reason to refrain from believing something, and you should too. I find it bizarre that that is disputed.
I actually went into a fair amount of detail about what specifically was contextually inadequate here, all of which you didn't engage with.
It's hard to understand what the tweet is suggesting unless you read the entire thread, one related twitter thread, and chase down an implied history of allegedly questionable BBC reportage the UN is complicit in, and join the author in making assumptions about what it all means. I stated all of this already.
And in addition to all the previous stuff I said, it's not obvious that a difference between traditional and simplified characters proves what you're asked to believe it proves, that it must have come from Taiwan. I'm assuming there's another thread somewhere that goes into detail about how Taiwan uses different characters in other documents, which is the basis for believing different characters here prove that it's a forgery?
It's not about the effort involved in comparing the characters, it's about the underlying logic for the argument, which is assumed to have been proven elsewhere but not referenced.
Again "I don't believe it because I'm missing context" is a perfectly legitimate way to engage with something that's missing context. You appear to have abandoned that point to instead emphasize that there is indeed sufficient context (so I guess having enough context does matter to you, after all.)
I don't know if you're confusing me for somebody else, but I didn't link to any tweets. And as I've now said twice, which has been ignored twice, the thread alludes to numerous unmade arguments about the reliability of the BBC, the UN, Le Monde, and a reporter as background to motivate the inference that their reporting is unreliable.
And it makes an assumption about what is proved or not proved by traditional vs simplified characters (different = Taiwan), and that underlying assumption isn't backed up with an argument, and there's no reason to agree with that assumption without further context. A reader is supposed to already agree that that's how it works or else go scrolling through twitter timelines and searches to find where that argument is made.
Perhaps when you ignore all of this in your reply, and remind me that it's "directly started talking about [sic]" traditional vs. simplified characters, I can repeat this all again and hope the fourth time is the charm?
> I don't know if you're confusing me for somebody else, but I didn't link to any tweets.
sigh... Correction: the link to the tweet from the guy you replied to, which you seemed to be referring to.
> And as I've now said twice, which has been ignored twice, the thread alludes to numerous unmade arguments about the reliability of the BBC, the UN, Le Monde, and a reporter as background to motivate the inference that their reporting is unreliable.
I clicked the link again. I don't see any of that. It starts with claims about 1) a cursor (which I ignored) and 2) the issue with the characters, which is further elaborated on in a couple follow-up tweets. Then your comment mentions all these news outlets and I don't understand how that connects to the claims about the characters, or makes them taken out of context. It looks like a pretty stand-alone claim/issue that should be something to quickly do research on if you care about the topic, nothing taken wildly out of context.
> I can repeat this all again and hope the fourth time is the charm?
1) software renders not Unicode characters but font glyphs
2) which font glyphs are chosen depends on many factors like installed fonts, OS, language/region settings, and so on
3) people author (and read) characters by how they look on their systems, what codepoints are used is not on anyone's mind
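To make the three points above concrete, here is a minimal Python sketch of what a document actually stores: code points, not glyphs. The helper name `describe` is made up for illustration, and the 国/國 pair is my own choice of a familiar simplified/traditional pair, not something taken from the leaked files; 置 is the character discussed elsewhere in this thread.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for each character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# What you see on screen depends on fonts and locale settings;
# what is stored is only the code point listed here.
for row in describe("置国國"):
    print(row)
```

Nothing in this output says which glyph a given machine will draw for a code point. That part is decided by the font and the language settings, which is why looking at a rendering alone can mislead.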
A differently configured system can uncover incorrect codepoint choices or rendering differences across machines, exactly what happened with the author of that tweet (supposedly living in Europe and not having the same old Windows machine as ones used in CCP apparat).
In fact, this happens all the time and is a routine headache for anyone building CJK sites viewed from different countries in the region (for example, I see some traditional Japanese characters, instead of their simplified Chinese versions, on http://cs.mfa.gov.cn/wgrlh/. Is there a hidden meaning? Is the site fake?). When it comes to MS Word and IME in old Windows versions, things are even wilder. I doubt the tweeter didn't know this, most likely it's a stall tactic.
That happens if you have no language hints, or the wrong one, e.g. posting in Simplified Chinese on a Taiwanese website. If this was written in something like MSWord by CCP officials, it should have the proper language hint, and so render properly on any OS newer than XP.
Setting aside all other assumptions you make about the soundness of their setup overall, consistency of their input methods, newness of their inventory, etc., do you actually believe they would have the fonts with traditional glyphs in them installed and used at all? What for? Remember, this applies to the system as a whole. A character would be shown as simplified by the system even during input.
Again, I tried it and I got different results in different software (even on a Mac), with Pages in particular showing only simplified characters and straight layout (in contrast to Quick Look, which is what the tweeter must have used). Do you seriously think CCP officials have a fleet of Macs to check document appearance in case they are leaked and/or scan documents for "enemy" Unicode points? If not, how would they even know what code points are there, if all they ever see is simplified?
One needs to look at vocabulary, word choices and such. That is something that could actually point to fakery. Nothing like that was claimed yet, of course.
Because of Han unification, you can tell the font renderer which language context you're in and how you want things to be rendered. MSWord shows you the language in the status bar at the bottom, which is not only used for spell checking.
In html, you can add the lang attribute to a tag to tell the browser what language the contained glyphs belong to.
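As a rough sketch of that, the Python snippet below wraps the same code point in `span` elements with different `lang` values. The helper name `tagged_span` is hypothetical. Whether the three spans actually look different depends on the browser and on which fonts are installed, so treat this as an illustration of the markup, not a guarantee about rendering.

```python
from html import escape

def tagged_span(text, lang):
    """Wrap text in a <span> carrying a lang attribute (a BCP 47 language tag)."""
    return f'<span lang="{escape(lang, quote=True)}">{escape(text)}</span>'

CHAR = "\u7f6e"  # 置, the same code point in every span below

for lang in ("zh-Hans", "zh-Hant", "ja"):
    print(tagged_span(CHAR, lang))
```

Pasting the printed lines into an HTML file and opening it in a browser is a quick way to see whether your own setup picks different glyphs per language tag.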
> do you actually believe they would have the fonts with traditional glyphs in them installed and used at all? What for?
Because ever since Vista, these come pre-installed regardless of your locale.
> Do you seriously think CCP officials have a fleet of Macs to check document appearance in case they are leaked and/or scan documents for "enemy" Unicode points?
I don't believe anything in particular, just adding technical context. The documents could also have been leaked through Taiwan or Hong Kong and then mangled there resulting in this.
> In html, you can add the lang attribute to a tag to tell the browser what language the contained glyphs belong to.
Well, you can visit probably any official Chinese government department website right now and see traditional Japanese characters instead of their simplified equivalents, if your machine happens to be configured that way. (Or at least the first one I stumbled across was like that, I pasted the link somewhere in another comment. And I most certainly have Chinese fonts installed; in fact I see only simplified characters when I open the tweeted document in Pages.)
So they clearly do not make that effort even with documents actually crafted with foreign readers in mind. Presumably things can't be expected to be better if we are discussing secret documents intended for internal CCP consumption.
Right, and a premise of the tweet cited here is that different character sets mean you should just freely assume it's fabricated by Taiwan. It doesn't make that argument (at least not anywhere in the cited thread), it just presents an examination of characters with that as an underlying assumption.
"Western lie debunkers" will absolutely jump at any chance to say this is a fake, but that particular take is pathetic and indicative of problems with CJK literacy.
Unicode points and font glyphs are not the same thing, leading to situations where one Unicode character can be rendered as a different one (but similar) depending on OS and setup* -- and people enter and read characters by how they look, not by their Unicode points.
So the document can easily end up with 置's Unicode entity in the source without anyone finding out, even the person who entered it, if it always renders as a simplified version (without the left-bottom vertical line). And it will always render as a simplified version, because everyone involved is obviously using a simplified setup.
(If you have a Big Sur set up the same way as mine, you can observe for yourself by opening the same doc, such as the "Response Plan and Procedure for Escape and Disturbance Prevention During Class Times", in Quick Look and Pages and looking at the end of text following the first Arabic numeral "1" on the first page. Quick Look will show you a traditional/Japanese character at the end, while Pages will have a much better layout and consistently show simplified characters.)
The sad thing is that this initial stalling tactic is effective. Some will be swayed by his simple tweets and not have patience for the subsequent "debunking of the debunk" let alone their own research. This takes away the initial impact of the release.
* Software chooses a different glyph, the font provides a different glyph than required by Unicode standard, and so on. Example: https://stackoverflow.com/questions/54212157/. There was an in-depth article on CJK posted on HN some time ago, can't remember what it was called.
TL;DR yes, documents authored by CCP officials can easily have traditional Unicode points in them, because it is completely routine for software to be set up in a way that always renders those in a simplified way.
That's interesting, I didn't realize that IMEs would silently offer you choices in different sets/styles than the one preferred by the locale, and that OS fonts could actually hide the difference.
If you're saying that an innocent error is what happened, you'd expect to see these weird traditional-in-simplified-context characters appear across all sections of the documents, and not clustered together in a single paragraph (since that would be evidence that a single paragraph has been written by a different author than the rest of the document).
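For anyone who wants to check the clustering claim on extracted text, here is a minimal Python sketch. It uses "not encodable in GB2312" as a crude stand-in for "not a standard simplified character". That proxy is my assumption, not an established test: rare but legitimate characters also fall outside GB2312, and a character that differs only in glyph while sharing one code point will not show up at all. The function names and the sample text are made up for illustration.

```python
def outside_gb2312(text):
    """Return the characters in text that the GB2312 codec cannot encode."""
    found = []
    for ch in text:
        try:
            ch.encode("gb2312")
        except UnicodeEncodeError:
            found.append(ch)
    return found

def per_paragraph(document):
    """Map paragraph index -> characters outside GB2312 (blank lines skipped)."""
    paragraphs = [p for p in document.split("\n") if p.strip()]
    return {i: outside_gb2312(p) for i, p in enumerate(paragraphs)}

# Made-up sample: only the middle paragraph contains a traditional form.
sample = "中国\n中國\n中国"
print(per_paragraph(sample))
```

If the flagged characters pile up in one paragraph, that matches the clustering described above; if they are spread evenly, it looks more like an input-method or copy-paste habit. Either way it only tells you where the code points are, not who put them there.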
I believe if they can make it into a document in any number of ways (copy paste, input method, etc.) and no one would be able to tell, their existence alone is not an indicator.
That different authors could have written/rewritten/edited different parts of a document at different points in time is natural; what are the reasons to think otherwise?
https://twitter.com/Cinqscories/status/1529035490032340993