
A/B testing favors B over A



This is an example of study bias. Very much like transistor bias, in that a static effect pushes the voltage/impression in one direction.

 

The study designer needs to correct for it. One way is to randomize; another is to run multiple tests of each amp combo with the order mixed, etc.

 

Yep that's why it would be hard work to do a real study ;) and why results are questioned -- the real problem is when investigators with an agenda use bias to "prove" something 

 

The ultimate answer is that "repeatability" under different circumstances must not be ignored.

Custom room treatments for headphone users.

Link to comment
8 hours ago, jabbr said:

This is an example of study bias. Very much like transistor bias

 

You seem like a real smart guy, but every now and then (from a limited sample) I wonder if your brain snapped over to an alternate dimension  :o

 

I find comparing human perception and reporting behavior, to a simple solid state device, not in the ballpark at all !  Perhaps you could find another example with a little more nuance ?

 

The rest of your post is fine, and please don't take offense :)

 

 

Link to comment
37 minutes ago, Ralf11 said:

 

True, but it's not clear that they had an agenda.  They could just be clueless.

 

I think that's unnecessarily insulting to Jason and his local audio society.  The primary purpose of the gathering was to compare two amps, not to evaluate A/B testing strategies.  Normally in such situations the A/B order is randomized.  Jason decided to do an interesting experiment to see the extent to which a fixed (non-random) order would bias the results.  By reversing the A and B ordering between the two groups of listeners, he achieved an overall result that was not biased, yet still was able to demonstrate the favoring of B over A.

 

Also, please bear in mind that this is a group of audiophiles gathering to have fun, not to conduct scientific research, and that each session was limited to about 3 hours.
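
As a rough illustration of why reversing the order between the two groups works, here is a minimal Python sketch. The additive model, the function names, and the numbers (a 15-point pull toward whichever amp is played second, and no real preference between the amps) are assumptions for illustration only, not figures from Jason's sessions.

```python
def prob_prefers_x(true_pref_x, order_bias, x_played_second):
    """Chance a listener votes for amp X under a simple additive order bias (assumed model)."""
    shift = order_bias if x_played_second else -order_bias
    return min(1.0, max(0.0, true_pref_x + shift))


def counterbalanced(true_pref_x=0.5, order_bias=0.15):
    """Two equal-sized groups hear the same two amps in opposite orders."""
    group1_x = prob_prefers_x(true_pref_x, order_bias, x_played_second=False)  # X then Y
    group2_x = prob_prefers_x(true_pref_x, order_bias, x_played_second=True)   # Y then X
    overall_x = (group1_x + group2_x) / 2           # the order bias cancels here
    second_slot = ((1 - group1_x) + group2_x) / 2   # but it still shows up here
    return overall_x, second_slot


overall_x, second_slot = counterbalanced()
print(f"votes for amp X overall: {overall_x:.2f}")
print(f"votes for whichever amp played second: {second_slot:.2f}")
```

Under these assumed numbers the overall vote for amp X stays at 50%, while the second-played amp still collects about 65% of the votes, which is the pattern described above: an unbiased overall result that still exposes the favoring of B over A.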

HQPlayer (on 3.8 GHz 8-core i7 iMac 2020) > NAA (on 2012 Mac Mini i7) > RME ADI-2 v2 > Benchmark AHB-2 > Thiel 3.7

Link to comment
12 minutes ago, Daudio said:

I find comparing human perception and reporting behavior, to a simple solid state device, not in the ballpark at all !  Perhaps you could find another example with a little more nuance ?

 

"bias" in study bias and transistor bias are polysemes.

 

Study bias is a systematic weight placed toward one outcome of the study (an error)

Transistor bias is a constant voltage applied to the collector and added to the base such that the transistor has improved amplification.

 

In both polysemes, there is a "push" in one direction.


Link to comment
6 minutes ago, jabbr said:

"bias" in study bias and transistor bias are polysemes

 

Ok, I see that, but it seems like just a word game to me, as opposed to the vast gulf in complexity between a human being and a transistor. That's where my reaction came from.

 

And I learned a new word today !

 

Link to comment
2 minutes ago, jabbr said:

"bias" in study bias and transistor bias are polysemes.

 

Study bias is a systematic weight placed toward one outcome of the study (an error)

Transistor bias is a constant voltage applied to the collector and added to the base such that the transistor has improved amplification.

 

In both polysemes, there is a "push" in one direction.

A bias is a constant offset applied to a variable signal. Doesn't matter if the signal is a voltage or a statistical preference.

 

Your description of bias in a BJT circuit isn't quite right, however. Here bias is a constant current added to the base/emitter junction to bring the transistor's operating point into the linear region. That is, of course, beside the point, which I agree with.

Link to comment
3 minutes ago, Daudio said:

 

Ok, I see that, but it seems like just a word game to me, as opposed to the vast gulf in complexity between a human being and a transistor. That's where my reaction came from.

 

And I learned a new word today !

 

 

Assume that I'm writing tongue-in-cheek, and leapt on the opportunity to keep this on-topic -- we are talking about amplifiers, right ;)

 


Link to comment

OK,  so let's say the order is randomized - so what - we still get the second listening being preferred & therefore subtle audible differences being masked (or more correctly, the one paid more attention to during listening). As attention is not a fixed element in listening, how are such listening tests going to deal with this variable?

Link to comment
1 hour ago, mmerrill99 said:

OK,  so let's say the order is randomized - so what - we still get the second listening being preferred & therefore subtle audible differences being masked (or more correctly, the one paid more attention to during listening). As attention is not a fixed element in listening, how are such listening tests going to deal with this variable?

The purpose of randomization is to reduce/eliminate systematic errors, assuming sufficient sample size. (The preference for first vs second would cancel, as roughly equal numbers of Amp A and Amp B would be listened to first vs second.) ... but you need to have enough different people listening
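
To make the cancellation concrete, here is a small Monte Carlo sketch in Python. Everything in it is an assumption for illustration: the additive order-bias model, the `simulate` function, and the numbers (no true preference, a 15-point pull toward the second-played amp).

```python
import random


def simulate(n_listeners, true_pref_x=0.5, order_bias=0.15, randomize=True, seed=0):
    """Return the fraction of simulated listeners who vote for amp X."""
    rng = random.Random(seed)
    votes_x = 0
    for _ in range(n_listeners):
        # Randomized: a coin flip decides the order. Fixed: X is always played second.
        x_second = (rng.random() < 0.5) if randomize else True
        p_x = true_pref_x + (order_bias if x_second else -order_bias)
        if rng.random() < p_x:
            votes_x += 1
    return votes_x / n_listeners


for n in (10, 100, 10_000):
    fixed = simulate(n, randomize=False)
    mixed = simulate(n, randomize=True)
    print(f"n={n:>6}  fixed order: {fixed:.2f}  randomized order: {mixed:.2f}")
```

With a fixed order the estimate settles near the biased value however many listeners you add; with a randomized order it settles near the true 50/50 split, but only once the sample is large enough for the noise to average out.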


Link to comment
1 minute ago, jabbr said:

The purpose of randomization is to reduce/eliminate systematic errors, assuming sufficient sample size. (The preference for first vs second would cancel, as roughly equal numbers of Amp A and Amp B would be listened to first vs second.) ... but you need to have enough different people listening

Or enough rounds of comparisons.

Link to comment

I have no interest in the madness of trying to decide whether A is better than B, or B is better than A. If at least one of A or B appears not to inject significant flaws in the sound then that unit will get the nod from me ...

Link to comment
2 hours ago, mmerrill99 said:

I understand this but my point is that if there is always a bias towards preferring the sample which we pay more attention to, then this will make discriminating of subtle differences between A & B impossible

 

Let's explain a different way - if the second sample was always played at 1dB higher, it doesn't matter if we randomize the order of the samples - this very factor will probably mask real differences

And I'm not saying that randomizing order is the only way to remove measurement bias. You are discussing something else: IIRC, "internal validity", or the ability of a study to measure "signal" in the presence of confounding variables ("error").
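
A rough Python sketch of that distinction, under a model that is purely an assumption here: some fraction of listeners (`captured_fraction`, a hypothetical parameter) simply vote for whichever amp was played second, and the rest vote on the real difference. Randomizing the order keeps the estimate unbiased, but the real difference gets diluted, so more listeners are needed to see it.

```python
import math


def observed_pref_x(true_pref_x, captured_fraction):
    """Expected vote share for amp X with randomized order (assumed mixture model)."""
    # Listeners "captured" by the second slot split 50/50 once the order is randomized.
    return captured_fraction * 0.5 + (1 - captured_fraction) * true_pref_x


def listeners_needed(observed, z=1.96):
    """Rough number of listeners needed to tell `observed` apart from a 50/50 split."""
    delta = abs(observed - 0.5)
    if delta == 0:
        return math.inf
    # Normal approximation: the standard error of a proportion near 0.5 is 0.5 / sqrt(n).
    return math.ceil((z * 0.5 / delta) ** 2)


for captured in (0.0, 0.5, 0.8):
    obs = observed_pref_x(0.6, captured)
    print(f"captured={captured:.1f}  observed={obs:.2f}  listeners needed={listeners_needed(obs)}")
```

In this sketch the order effect never flips the answer, it just shrinks the signal, which is why both points can hold at once: randomization removes the bias, and the masking still makes subtle differences harder to detect.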


Link to comment
45 minutes ago, Lebouwsky said:

Short comparisons have a pitfall in this hobby. And I did a lot of them, because there was a time when my system contained 19 tubes divided over 5 different types. I was a tube roller, a very expensive hobby by the way.

 

Even though the tube rolling is more or less in the past (only 1 pair in the preamp), I learned a valuable lesson. The only way to judge a component is to settle down, listen to it for a couple of weeks and trust your feeling. If it feels right (which takes time) it is right. No 3 hour A/B session can do that.

I think the same effect is occurring with break in.  It takes time to trust your feeling.  For it to feel right.  However, in neither case do I think a change in sound is really behind reaching this point of feeling comfortable.

And always keep in mind: cognitive biases, like seeing optical illusions, are a sign of a normally functioning brain. We all have them, it’s nothing to be ashamed of, but it is something that affects our objective evaluation of reality.

Link to comment
12 hours ago, mmerrill99 said:

There are probably many factors at play but one significant one is attention - when we focus attention we hear details that can have escaped our notice - essentially we hear differently as hearing is not a passive experience that happens to us, we are actively engaged in it, creating what we hear.

 

This makes sense to me... I realised that I unconsciously tend not to focus my attention on the same aspects of sound/performance when I do A/B comparisons, which is why I don't find them very effective.

My attention doesn't wander as much when I am comparing measurements. B|

"Science draws the wave, poetry fills it with water" Teixeira de Pascoaes

 

HQPlayer Desktop / Mac mini → Intona 7054 → RME ADI-2 DAC FS (DSD256)

Link to comment
