
A/B testing favors B over A



1 hour ago, mmerrill99 said:

Of course it depends on what the test is being used for - if the objective is other than testing whether a difference is heard then it's a perfect stealth weapon.

 

It was a fake test before "fake news" became a popular meme.

 

Assuming this bias towards the second test is true, it's a flaw only if you do a small number of A/B tests. It in no way invalidates blind testing as a methodology. 

 

I usually switch between components for days while trying to evaluate differences. I am very well aware that my attention shifts, often "revealing" differences that are not there. That's why I always try to confirm what I hear with repeated tests. I try to focus on something very specific in a familiar soundtrack, played for a very short time (less than a minute) before switching. I have a few favorite recordings I know very well, and I use specific portions of those recordings to test for different sound qualities. I do this blind if I can set up such a test (and I always try to do it this way, if at all possible).

1 hour ago, mmerrill99 said:

And I'm saying: what's the point of this "randomizing"? As I said, if we have a source of error that is likely masking any small differences, what's the point of randomizing it? It only hides the effect. We would be better off being aware of the effect in the results and discounting them accordingly.

What is "likely" being masked, and how did you determine this? If the "internal validity" of an experiment does not allow very small differences to be detected, that has nothing to do with the experiment's ability to detect significant differences.

The typical situation is that an attempt to reject the null hypothesis either shows a difference or it doesn't; if it doesn't, and you know the resolving ability of the experiment, then you know the difference, if any, is smaller than the resolving ability of the experiment.
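The null-hypothesis arithmetic here is simple enough to sketch. A minimal Python example (the 12-of-16 score is invented for illustration, not a figure from this thread): an exact binomial test of whether an ABX score could plausibly be produced by guessing.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """P(at least `correct` successes in `trials` fair coin flips),
    i.e. how likely this score is if the listener is purely guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Illustrative numbers: 12 correct out of 16 trials.
p = abx_p_value(12, 16)   # about 0.038, below the usual 0.05 threshold
print(f"p = {p:.4f}")
```

If the score clears the threshold, guessing is rejected; if it doesn't, all you can conclude is that any difference is below what this particular test can resolve.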

Custom room treatments for headphone users.

23 minutes ago, jabbr said:

....

The typical situation is that an attempt to reject the null hypothesis either shows a difference or it doesn't; if it doesn't, and you know the resolving ability of the experiment, then you know the difference, if any, is smaller than the resolving ability of the experiment.

Well, isn't that the point? The resolving ability of the particular experiment (each home-administered experiment is different) is unknown, as there is often no control and no calibration to show this resolving ability. I'm talking about the usual blind tests called for on audio forums, not laboratory-organised blind tests.

4 minutes ago, mmerrill99 said:

Well, isn't that the point? The resolving ability of the particular experiment (each home-administered experiment is different) is unknown, as there is often no control and no calibration to show this resolving ability. I'm talking about the usual blind tests called for on audio forums, not laboratory-organised blind tests.

 

There's a huge difference between a sighted, long-term A/B comparison with one or two attempts to switch components, and a blind A/B test repeated a sufficient number of times to achieve statistical significance. While both can contain biases and other flaws, the blind test controls for many more variables and is much more objective and reproducible by others.
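"Sufficient number of times" can be estimated in advance. A sketch, assuming a hypothetical listener who picks correctly 70% of the time (an invented effect size, not a figure from this thread): it finds the lowest passing score at a given significance level, then the probability that such a listener actually reaches it.

```python
from math import comb

def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def abx_power(trials: int, p_true: float, alpha: float = 0.05) -> float:
    """Chance that a listener who is right with probability p_true per trial
    scores high enough for pure guessing to be rejected at level alpha."""
    # Lowest score whose guessing-tail probability is at or below alpha.
    passing = next(k for k in range(trials + 1) if binom_tail(trials, k, 0.5) <= alpha)
    return binom_tail(trials, passing, p_true)

# More trials: the same modest ability is far more likely to show up.
low, high = abx_power(16, 0.7), abx_power(40, 0.7)
```

With too few trials, even a real but modest ability usually fails to reach significance, which is exactly the false-negative risk discussed later in the thread.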

4 minutes ago, pkane2001 said:

 

There's a huge difference between a sighted, long-term A/B comparison with one or two attempts to switch components, and a blind A/B test repeated a sufficient number of times to achieve statistical significance. While both can contain biases and other flaws, the blind test controls for many more variables and is much more objective and reproducible by others.

A test which has consistent and inherent flaws is, by definition, "reproducible by others".

That doesn't make it objective.

By that definition, sighted testing is objective, as it too is reproducible by others.

14 minutes ago, mmerrill99 said:

Well, isn't that the point? The resolving ability of the particular experiment (each home-administered experiment is different) is unknown, as there is often no control and no calibration to show this resolving ability. I'm talking about the usual blind tests called for on audio forums, not laboratory-organised blind tests.

 

I was careful to discuss the need for randomization in the setting of multiple subjects. If the test isn't calibrated to start with, there is no way to determine its validity. As I said, the fact that randomization doesn't solve every problem doesn't mean it doesn't solve any problems.

 

At home I do my own pseudo-blinded, casual listening impressions that I wouldn't describe as a formal experiment ;) ... doing ABX correctly is work.
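The randomization being discussed is easy to apply even at home. A minimal sketch (the function name and the even-trials restriction are my own choices): a balanced, shuffled schedule of which component plays as X in each trial, so no fixed pattern can cue the listener.

```python
import random

def abx_schedule(n_trials, seed=None):
    """Balanced random order: each component is X in half the trials,
    but the sequence itself is shuffled so it can't be anticipated."""
    if n_trials % 2:
        raise ValueError("use an even number of trials to keep the schedule balanced")
    schedule = ["A", "B"] * (n_trials // 2)
    random.Random(seed).shuffle(schedule)
    return schedule

trials = abx_schedule(16, seed=1)
```

Passing a seed makes the schedule reproducible, so a second person can verify the session afterwards.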

3 minutes ago, mmerrill99 said:

Eliminating some biases does not make it "objective" - if you said it eliminated all biases, you would be able to make that claim, but I would ask you to prove it!

 

My claim, as quoted below, is that a blind test is significantly more objective:

 

9 minutes ago, pkane2001 said:

blind test controls for many more variables and is much more objective and reproducible by others.

 

5 minutes ago, mmerrill99 said:

The differences perceived in normal listening are likely being masked.

The likelihood greatly depends on the calibration done - e.g., was the volume carefully calibrated? If reasonable calibration is done, then reasonable differences are not likely to be masked.

 

What this highlights, however, is the real need to correlate measurements with impressions - in the simplest case, the volume needs to be measured.
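Measuring the volume takes only a few lines. A sketch operating on plain lists of samples (helper names are mine, not from any library): it reports RMS level in decibels and the linear gain needed to bring one clip to the other's level.

```python
import math

def rms_db(samples):
    """RMS level of a block of samples, in decibels (relative)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def matching_gain(reference, other):
    """Linear gain to apply to `other` so its RMS level matches `reference`."""
    return 10 ** ((rms_db(reference) - rms_db(other)) / 20)

# A clip at half the amplitude needs a gain of 2.0 (about +6 dB).
gain = matching_gain([0.5, -0.5] * 100, [0.25, -0.25] * 100)
```

A plain RMS match is the crudest possible calibration; perceptual loudness weighting would be better, but even this rules out the easiest confound.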

4 minutes ago, Jud said:

Regarding loudness, we are *really* good at detecting and remembering it, but really bad (as I mentioned before) at remembering other acoustic qualities. So very often, when we think we are comparing two musical passages as a whole, I think it is very possible that what we are in fact doing is comparing the loudness of the end of passage A with the beginning of passage B. We like louder (thus the loudness wars). I think if the end of music sample A is softer than the beginning of music sample B, that alone might easily account for a preference for B over A.

Good point among a series of good points
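The splice comparison Jud describes can be checked numerically. A sketch along those lines (the 4410-sample window is an arbitrary choice, roughly 0.1 s at 44.1 kHz): a positive result means sample B opens louder than sample A closes.

```python
import math

def segment_rms_db(samples):
    """RMS level of a segment, in decibels (relative)."""
    return 10 * math.log10(sum(s * s for s in samples) / len(samples))

def boundary_gap_db(clip_a, clip_b, n=4410):
    """dB difference between the start of clip B and the end of clip A -
    the two stretches a listener actually hears back to back."""
    return segment_rms_db(clip_b[:n]) - segment_rms_db(clip_a[-n:])

# B opening at twice A's closing amplitude reads as roughly a +6 dB jump.
gap = boundary_gap_db([0.25] * 9000, [0.5] * 9000)
```

If the gap is more than a fraction of a dB, a "B sounds better" verdict may just be the loudness preference at work.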

2 minutes ago, Jud said:

- The "less than a minute" time frame isn't nearly short enough.  Scientific research shows echoic memory for everything except loudness lasts maybe 4-10 seconds.

Agreed. I mentioned less than one minute to highlight the difference with days and weeks recommended by those who believe in long term evaluation :) 

3 minutes ago, Jud said:

The "less than a minute" time frame isn't nearly short enough.  Scientific research shows echoic memory for everything except loudness lasts maybe 4-10 seconds.

 

Yes, but... when I compare two sounds - be they components, instruments, or recordings - I "feature extract"; that is, I commit my impressions to memory, be it the smoothness of a string in a certain octave, the extension of the bass, or the position of instruments on the soundstage. These features can be remembered for longer than 10 seconds.

7 minutes ago, mmerrill99 said:

@pkane2001, you never answered this - have you encountered this situation?

 

While I've encountered (many times) differences in sighted tests that I could swear were very obvious, I frequently could not make the same distinction in a blind test. To me, this indicates a failure of the sighted test to control for subjective variables, not a failure of the blind test to discover differences.

 

5 minutes ago, mmerrill99 said:

Sighted listening is biased towards false positives (hearing differences); blind testing is biased towards false negatives (not hearing differences).

 

The outcomes of the two are very different, so my analogy was flawed - one may leave you forever chasing what sounds better; the other may leave you not hearing what is better.

 

Which one is this hobby mainly about?

 

Merrill,

Thanks, I think that is the best simple, 10,000-foot analysis of this whole fervor over audiophile testing methods I've yet seen, cutting through the bias, obscuring details, and bullshit! It should be copied to a pinned thread to guide all of us.

 

And if one isn't in this hobby to chase better sound, then what the hell are they doing it for? Wasting money, exercising their oscilloscopes, or Online Armored Combat?

 
