I think you have dramatically overstated the malleability of statistical inference. Further, given that you have made such a bold assertion, you owe us an example or two. I would emphasize the opposite point: given a specified experimental setup, specified data, and a specified question, all statisticians will arrive at the same inference. However, once any of those strictures is removed, we may well arrive at different conclusions for a host of reasons. That is not as earth-shattering as it seems.
We may end up with different conclusions in response to different questions.
For example, consider survey data arrayed in a 2x2 contingency table in which 45 respondents are Democratic women, 90 are Democrats in total, and 120 are women in total.
If asked what is the conditional probability that a Democrat is a Woman, all statisticians would say
prob(W|D) = 45/90 = 50%
However, if asked what is the conditional probability that a Woman is a Democrat, they would say
prob(D|W) = 45/120 = 37.5%
Both conclusions are valid based on the data. The derivation of different conclusions from the same data here is not based on different assumptions, but on the different questions one is trying to answer.
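The arithmetic can be sketched in a few lines of Python, using the counts stated above (45 Democratic women, 90 Democrats, 120 women):

```python
# Counts from the 2x2 contingency table described in the text
women_democrats = 45
democrats = 90   # total Democrats (row marginal)
women = 120      # total women (column marginal)

# Two different questions, two different denominators
p_w_given_d = women_democrats / democrats  # prob(W|D) = 45/90 = 0.50
p_d_given_w = women_democrats / women      # prob(D|W) = 45/120 = 0.375

print(f"prob(W|D) = {p_w_given_d:.1%}")
print(f"prob(D|W) = {p_d_given_w:.1%}")
```

The numerator is identical; only the conditioning population, and hence the denominator, changes.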
Similarly, in a standard hypothesis-testing scenario, we could arrive at different conclusions from the same data depending on the significance level we set. This is the probability of rejecting a null hypothesis, H0, when it is true, i.e., the Type I error rate, denoted alpha. If the p-value, the probability under H0 of an outcome at least as extreme as the one observed, is less than alpha, we reject H0 in favor of the alternative, HA. If my significance level is 1.0% (99.0% confidence) and yours is 5.0% (95.0% confidence), we could arrive at different conclusions.
For example, if we assume a coin is fair and conduct an experiment of 20 independent tosses, we can use the binomial distribution to compute the probability of getting 5 heads or fewer as 2.07%. If the alternative hypothesis is that the coin is unfair, with an underlying heads probability less than 50%, then at my 1% level I would not reject the hypothesis that the coin is fair, but at your 5% level you would.
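A short Python sketch of that one-sided binomial calculation, using only the standard library:

```python
from math import comb

n, k = 20, 5  # 20 independent tosses, 5 heads observed

# One-sided p-value under H0 (fair coin): P(X <= 5) with p = 0.5
p_value = sum(comb(n, i) for i in range(k + 1)) / 2**n
print(f"p-value: {p_value:.4f}")  # ~0.0207

for alpha in (0.01, 0.05):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha = {alpha:.0%}: {decision}")
```

The same p-value of about 2.07% falls between the two thresholds, so the 5% tester rejects while the 1% tester does not.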
My experience is that the most frequent abuses occur when someone is trying to infer an underlying cause for a statistical difference between two populations. Suppose I build a predictive model of salary that uses several non-controversial "explanatory" variables to predict salary differences for people of type X versus those of type Y. If I then add a type X versus type Y categorical variable and it ends up having a non-significant coefficient, I might say the data do not require a type X versus type Y variable to explain salary differences. You might use a different model, or simply observe that there are salary differences between type X and type Y, and conclude "the statistics say type X versus type Y bias exists". The difference in conclusions here is driven by the goal of arriving at a desired conclusion. Note that one of the conclusions is technically outside the realm of statistics: it ascribes causation to a correlation, when the first rule is "correlation does not imply causation".
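A minimal sketch of the modeling step described above, using entirely synthetic data (the variable names, sample size, and coefficients are all hypothetical, invented for illustration). Salary is generated from experience alone, so the group dummy's coefficient should come out non-significant:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: salary depends only on experience, not on group
experience = rng.uniform(0, 20, n)           # non-controversial explanatory variable
group = rng.integers(0, 2, n).astype(float)  # type X (0) vs type Y (1) dummy
salary = 40_000 + 2_000 * experience + rng.normal(0, 5_000, n)

# OLS fit with intercept, experience, and the group dummy
X = np.column_stack([np.ones(n), experience, group])
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)

# t-statistic for the group coefficient
resid = salary - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)
t_group = beta[2] / np.sqrt(cov[2, 2])
print(f"group coefficient: {beta[2]:.1f}, t-statistic: {t_group:.2f}")
```

A small |t| here says only that the data do not require the group variable once experience is controlled for; it does not, by itself, prove or disprove an underlying bias, which is exactly the interpretive gap the paragraph above describes.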
I could go on, but the point is that your statement is valid only within a set of circumstances that you ought to delineate more carefully.