But if you analyze better-known games, or games we know have sold well, the numbers seem to make sense. Try comparing it to other new releases.
Over longer periods of time and when dealing with larger numbers/chances, these figures can actually be very accurate. Unfortunately, when dealing with smaller numbers/chances they can be wildly inaccurate.
Imagine a group of 100 people. I think that some of them may have a rare illness (in fact, there are 3 of them, but I don’t know this). The illness is very difficult and expensive to test for and so I can only test 7 of the 100 people. These seven people are my sample group.
The chance of me randomly selecting none of the sick people in my sample group is around 80%, and this result would suggest that there were 0 sick people in the full group. This is of course wrong, as there are actually 3.
The chance of me finding 1 sick person is around 18%, and this result would suggest that there were 14 sick people in the full group (1 in 7 scaled up to 100). We again have an inaccurate result, this time one that massively overestimates the number of sick people.
Selecting 2 (about 1.2% -> 29 sick people) or 3 (well under 1% -> 43 sick people) of the sick people in my sample group provides even more ridiculous results. However the chances play out, our result is wrong, and the most likely outcome (80% or so) is that we underestimate the actual number.
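If you want to check those figures yourself, here's a rough Python sketch of the example. It assumes the 7 people are picked completely at random without replacement (so the hypergeometric distribution applies), and the function names are just ones I made up for illustration.

```python
from math import comb

POPULATION = 100  # people in the full group
SICK = 3          # true number of sick people (unknown to the tester)
SAMPLE = 7        # people we can afford to test

def prob_sick_in_sample(k, population=POPULATION, sick=SICK, sample=SAMPLE):
    """Hypergeometric chance of exactly k sick people landing in the sample."""
    return comb(sick, k) * comb(population - sick, sample - k) / comb(population, sample)

def extrapolate(k, population=POPULATION, sample=SAMPLE):
    """Naive estimate: scale the sample proportion up to the full group."""
    return round(k / sample * population)

# Print the chance of each outcome and the estimate it would lead to
for k in range(SICK + 1):
    print(f"{k} found: chance {prob_sick_in_sample(k):.2%}, estimate {extrapolate(k)} sick")
```

Note that none of the possible estimates is 3: with a sample of 7, the naive scale-up can only ever land on 0, 14, 29 or 43.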
Using a larger sample group certainly helps to improve the reliability of these results, but when dealing with low probabilities, these types of calculations will never be reliable (unless perhaps your sample group covers 50% or so of the total population).
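To get a feel for how much a bigger sample helps in the 100-person example, here's a small sketch that works out the chance of missing every sick person at a few sample sizes. Same assumption as before (a truly random sample without replacement), and the sample sizes of 20 and 50 are just ones I picked for comparison.

```python
from math import comb

def prob_miss_all(sample, population=100, sick=3):
    """Chance that a random sample contains none of the sick people."""
    return comb(population - sick, sample) / comb(population, sample)

# Compare the original sample of 7 with two larger, arbitrarily chosen sizes
for sample in (7, 20, 50):
    print(f"sample of {sample}: {prob_miss_all(sample):.0%} chance of finding nobody")
```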
Game Stat are dealing with a sample group that represents a similar proportion of the full group to the one in the above example (7/100 vs 7m/94m) and, if Shenmue 3 sold 100,000 copies or less on PS4, a much, much lower probability (around 0.1% vs the 3% above).
As I said in my last post, aside from the methodology being seriously flawed, the figures available on similar sites cast some real doubt on the reliability of the data.
The chances of PSN Trophy Leader finding more Shenmue 3 players in their user base of fewer than 600,000 than Game Stat found in a sample of 7,000,000 are vanishingly low. Of course, the PSN Trophy Leader stats could be wrong, but that is very unlikely in and of itself, as we can literally see the full list of users and the PSN usernames of all of the players tracked as having played Shenmue 3. Their data is also backed up by similar data from two other trophy sites.
Edit: It’s also worth noting that there was a short period last December where Sony accidentally gave out the exact number of people who had obtained each trophy through their my PS life app. These figures, along with the percentage of users who had obtained each trophy (which has always been available), could be used to calculate a very precise figure for the total number of registered players of any given game.
Game Stat was built around that system, and so figures for a lot of older games (released pre-December 2018) are pretty on the nose, whereas games released after are estimated using their current sampling system and are at the mercy of its many flaws.
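For anyone wondering how the exact-count calculation from my edit works, it's just the trophy's earner count divided by its earn percentage. Here's a minimal sketch. The function name and the numbers are made up for illustration (they aren't real Shenmue 3 figures), and I'm not claiming this is exactly how Game Stat implemented it.

```python
def estimate_total_players(earners, percent_earned):
    """Total players implied by a trophy's exact earner count and its earn rate (%)."""
    if percent_earned <= 0:
        raise ValueError("percentage must be positive")
    return round(earners / (percent_earned / 100))

# Made-up example: 12,500 people earned a trophy that 25% of players have
print(estimate_total_players(12_500, 25.0))
```

The result is only as exact as the published percentage, so a heavily rounded percentage would blur it a little.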