Wednesday, July 04, 2007

Rating 35/40mm rangefinder lenses - a trial of a statistical approach

The useful Leica M-mount group on Flickr is a great resource for those looking to pick a new rangefinder lens. You can look at the tags on various images and, in theory, see what 'look' different lenses have in terms of contrast, bokeh and so on. However, it is sometimes difficult to draw definitive conclusions. There are several reasons for this, including:

1. The people who upload images probably upload only their very best work, so the impression of image quality may be skewed upwards.

2. Certain photographers upload proportionately more images than others, and this depends on the lenses they own. If a photographer who dominates one lens pool is particularly good (or bad), this will give a false impression of the lens's image quality in general.

3. The quality of the images uploaded is not necessarily a good guide to the quality of the original image taken with the lens. Images from film cameras may be badly scanned. Also, people do not always calibrate their monitors or scanners, so the image they see when they upload and the image you see on a different monitor are not necessarily the same.

4. With film (especially black and white) the processing and the skills of the developer play an important role in the look of the image. Many photographers do not mention the processing they used, or even the type of film (although omitting the film is less common).

I use statistics a lot in my regular job, so I was curious to see what a statistical analysis of different lenses using images and data from this Flickr group would look like. For example, how much difference does it make to one's impression of a lens if a pool is dominated by a small number of prolific photographers rather than a general mix? Does it give a false impression? Indeed, can any lenses really be differentiated reasonably objectively at all using this resource?

As an initial experiment I created a database of some 35mm and 40mm lenses that I am interested in. This is not a complete listing yet, as I haven't decided whether to pursue this line of enquiry - it is quite labour-intensive. However, my initial results may be of interest and should at least start a discussion.

At present I have gone through the monochrome images on the Flickr group made using the following lenses:

Cosina Voigtlander -
Nokton 40mm single coated
Nokton 40mm multi-coated
Nokton 35mm/1.2 aspherical
Colour Skopar 35mm/2.5

Leica -
40mm Summicron-C
35mm Summilux
35mm Summilux aspherical
35mm Summicron version IV

Zeiss -
35mm Biogon

I went through all the monochrome images in the respective lens pools and recorded the following data:

Username of member
Film used
Developer used
Film speed
Digital RF used (R-D1 or M8)

Then I appended my own subjective evaluation of each image (a sketch of how these records might be kept follows the scale):
1 = very poor (no images actually received this score)
2 = less than optimal
3 = fine
4 = very nice
5 = knockout
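(For anyone who wants to keep similar records, a flat file is enough. A minimal sketch in Python; the file name, field names and example row are just my illustration, not a fixed schema:)

```
import csv

# One row per image: who took it, what it was taken with and on,
# and my subjective score (halves allowed, e.g. 3.5).
FIELDS = ["photographer", "lens", "film", "developer",
          "film_speed", "digital_body", "score"]

with open("mmount_scores.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerow({
        "photographer": "example_user",          # Flickr username
        "lens": "Nokton 35mm/1.2 aspherical",
        "film": "Tri-X",                         # blank for digital images
        "developer": "D-76",
        "film_speed": 400,
        "digital_body": "",                      # "R-D1" or "M8" if digital
        "score": 3.5,
    })
```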

To keep the data balanced, I stopped recording data for a lens after around 90 images had been included; some lenses have far more images than others, and this might introduce bias.

If I was unsure between two scores I gave half marks (so if I was hovering between a 3 and a 4 I gave 3.5). I didn't dwell very long over the evaluation. Things I took into account were the overall look, bokeh, contrast and detail. I also recorded whether there was a noticeable 'glow' to the image (a highly subjective term) and whether there was any flaring (but I have not used this data yet). Of course, this is where subjectivity comes in. However, using statistics allows us to see whether there are any real differences, to evaluate variation, and to test which factors besides the lens itself may contribute to a subjective evaluation.

The table below shows the average scores for each lens, the number of images in the database so far and the standard deviation for each lens (a measure of how much the scores varied - a higher S.D. means more variation).

The overall average scores


Lens                          Average score   Number in database   S.D.
Nokton 40mm single coated          3.34               28           0.53
Nokton 40mm multi-coated           3.22               77           0.54
Nokton 35mm/1.2 aspherical         3.76               55           0.67
Colour Skopar 35mm/2.5             3.13               90           0.49
Leica 40mm Summicron-C             3.52               72           0.79
35mm Summilux                      3.22               46           0.53
35mm Summilux aspherical           3.64               59           0.63
35mm Summicron version IV          3.23               89           0.51
35mm Biogon                        3.13               26           0.54
Total                              3.35              542           0.63
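
(These summary figures are easy to reproduce once the data are in a flat file. A sketch using Python's pandas library, with the illustrative file layout from above:)

```
import pandas as pd

df = pd.read_csv("mmount_scores.csv")

# Average score, image count and standard deviation per lens,
# plus an overall row at the bottom.
summary = df.groupby("lens")["score"].agg(["mean", "count", "std"])
summary.loc["Total"] = [df["score"].mean(), len(df), df["score"].std()]
print(summary.round(2))
```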


Several things become apparent from this table.

First, real preferences can be distinguished. There are significant differences in the scores (in a statistical sense). If differences in the image quality produced by different lenses could not be discerned in the group, the average scores would all have been similar (as would the S.D. figures).

Second, none of the lenses was actually bad. All the lenses scored more than 3 on average, which is 'fine' on my scale, so picking any of these lenses would give some satisfaction.

Third, there was a fair amount of variation, and more variation in the scores of some lenses than others. For example, the 40mm Summicron-C and the Nokton 35mm/1.2 had the highest standard deviations (above the overall S.D. of 0.63). That means there was more variation in the quality of these images (to my eyes) than in those produced with the other lenses. The least variation was with the CV Colour Skopar. There could be various reasons for this. The Summicron-C is a relatively old lens and might give more varied results, or the Skopar may have shown less variation because many of its images were taken by one person of very similar subjects. We just don't know, but it does show that the available data captures differences in perception.

Fourth, there is probably not enough data to evaluate some of the lenses accurately. For example, the 35mm Biogon scored rather low at 3.13, but there were only 26 images in the pool, so this score is unreliable; more images are needed.

Fifth, there do appear to be some lenses that I would prefer over others. For example, the Nokton 35mm/1.2 aspherical scored highest, the 40mm Summicron-C did surprisingly well, and, perhaps less surprisingly, so did the 35mm Summilux ASPH. All of these were above the sample average of 3.35.

But there is a problem with image data of this kind. As mentioned above, some of the lens pools in the group are dominated by certain photographers. It would be better if there were a good mix of users uploading images, but this is not always the case, and the group is fairly new and still developing. This problem means that the cases in the database are not truly independent. That is, in a statistical sense it is not entirely valid to compare average scores without taking other factors into account (such as the photographer's skill, or the developer and the film). If a great photographer (or a photographer whose images you particularly like) uploads 90% of the images for a particular lens, it may give a biased view of what another user could achieve with the same materials. This issue can be addressed with statistical techniques.

For example, the Summilux ASPH had 59 monochrome images from 6 photographers in the pool when I collected the data. One of these photographers contributed 34 of the images (more than half). In statistical terms we can apply a 'control' for this photographer and recalculate the score of the lens, assuming that the score due to this photographer's ability can be separated from the score due to the quality of the lens alone. When we do this the score drops from 3.64 to 3.36, which is only just above the sample average. So there does indeed appear to be an effect caused by one photographer dominating a pool.
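
(One simple way to apply such a control is to fit an indicator for the dominant photographer, which amounts to reading the adjusted lens mean off everyone else's images. A rough sketch in Python, using the illustrative file and column names from earlier - this shows the idea rather than my exact calculation:)

```
import pandas as pd

df = pd.read_csv("mmount_scores.csv")
lux = df[df["lens"] == "35mm Summilux aspherical"]

# Controlling for the most prolific contributor with an indicator
# variable is equivalent to averaging over everyone else's images.
dominant = lux["photographer"].value_counts().idxmax()
raw = lux["score"].mean()
adjusted = lux.loc[lux["photographer"] != dominant, "score"].mean()
print(f"raw {raw:.2f}, adjusted {adjusted:.2f}")
```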

However, this example is not strictly accurate either. Some of this photographer's images should be included along with everyone else's, or we should control for every single photographer in the database (which is impossible to do in a simple way). The reality is probably somewhere in between: the Summilux ASPH's real score lies somewhere between 3.36 and 3.64, and the raw figure sits at the biased, upper end of that range.

Help is at hand. There are several statistical procedures that can accommodate this issue. I chose to use a random effects model, which takes into account the fact that the images are not really independent of one another (i.e. they were not all taken by different, randomly drawn photographers). When this model is applied to the data, the adjusted scores for the lenses come out as follows:

Lens                          Adjusted score
Nokton 40mm single coated          3.29
Nokton 40mm multi-coated           3.32
Nokton 35mm/1.2 aspherical         3.84
Colour Skopar 35mm/2.5             3.27
Leica 40mm Summicron-C             3.70
35mm Summilux                      3.17
35mm Summilux aspherical           3.58
35mm Summicron version IV          3.32
35mm Biogon                        3.16
Adjusted total                     3.42
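
(For the statistically minded: a random-intercept model of this kind can be fitted in a few lines in Python with, for instance, the statsmodels library. This is a sketch of the general approach using the illustrative data layout from earlier, not the exact software or specification behind the figures above:)

```
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("mmount_scores.csv")

# Fixed effect for each lens, random intercept per photographer, so a
# prolific photographer's general level is separated from the lens effect.
model = smf.mixedlm("score ~ C(lens)", df, groups=df["photographer"])
result = model.fit()
print(result.summary())
```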


Now the results are slightly different. The 40mm Summicron-C looks even better relative to the overall average, while the Summilux ASPH is still very good, but not as good as the unadjusted figures suggested. The Nokton 35mm/1.2 ASPH also looks like a real bargain.

Conclusions

This rather long-winded approach has demonstrated some key things. First, the Flickr M-Mount group is well worth using and can help differentiate the image quality of lenses, depending on the look one requires, so long as the viewer bears in mind how easily they can be swayed by the sample of contributing photographers in each lens pool. Second, as the number of contributions grows and more photographers join, the quality and reliability of the resource will only improve.

One remaining problem is the influence of the film/developer combination. As I went through the images I became aware that this also had a discernible influence on my scores. I particularly noted the effect of Agfa APX 100 and 400 on my preferences (films I have never used and which, as far as I know, are no longer available). The next stage of this analysis will be to incorporate film and developer data into the statistical models to see whether it makes a significant difference to the lens scores; in model terms this simply means adding extra covariates, as sketched below. It would be great if users of M-Mount started to include the film, speed and processing data in the tags when they upload an image. Most photographers note the film used, but many omit the speed and developer. Another option will be to incorporate the type of image, since I have different criteria when evaluating a portrait than when evaluating a landscape, say. This too could be added to the database.
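(Continuing the earlier statsmodels sketch, and again with illustrative file and column names, the extension might look something like this:)

```
import pandas as pd
import statsmodels.formula.api as smf

# Keep only film images with both film and developer recorded.
df = pd.read_csv("mmount_scores.csv").dropna(subset=["film", "developer"])

# Film and developer enter as further fixed effects; image type could
# be handled the same way once it is recorded.
model = smf.mixedlm("score ~ C(lens) + C(film) + C(developer)",
                    df, groups=df["photographer"])
print(model.fit().summary())
```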

I will also probably complete my database of 35mms and re-analyse the data, and then move onto other lenses if time permits.

For me at least, the bargain Summicron-C looks like my next purchase based on this analysis - a lens I would not have considered seriously before doing this. And remember that this analysis is based on my preferences; someone else may well come to different conclusions.

2 Comments:

Blogger honus said...

Mark,

Excellent work. This was the type of evaluation we had in mind when we formed the group. Glad you found the Group helpful.

Cheers,

Robert (aka Honus)

10:09 AM  
Anonymous Anonymous said...

Thanks for writing this.

2:20 PM  
