Sigma Fp-L Rumor: Let's Talk About Sensor Tech

**Update: Obviously now we can safely say this rumor was entirely false, but this is still worth a read for a number of reasons**

The rumor mill is churning – as it does every day – with specifications of the upcoming Sigma Fp-L. The release of this successor model itself is almost certain, but these sensor specs are not. However, I wanted to talk a little bit about the sensor and its design and why it’s very exciting technology – and also how it’s based on a 30-year-old innovation.

*I need to note beforehand that this is a complicated topic, especially given how little information we actually know to be true – combined with technology that has almost certainly changed over a few decades. Not everything here is guaranteed to match the resulting product. I will also be using the terms “pixel” and “photosite” interchangeably when discussing the sensor design; both refer to the sensor pixel, which, due to interpolation, is different from the pixels in the final file.

The sensor – named IMX513BQR and made by Sony (shocker) – has a 60.75MP effective resolution. It’s rumored to be the replacement for Foveon X3 – presumably because Foveon, while amazing in many ways, has a significantly limited shooting envelope: it doesn’t shoot video, and it’s unusable for shooting color above ISO 400, maybe 800 depending on your threshold of acceptability. I’m not entirely convinced this is a Foveon replacement, but I’ll get to that later.

This new sensor, with a CFA layout dubbed “Sony ClearVid,” supposedly changes things up by rejecting the traditional square, side-by-side arrangement of pixels, instead opting to rotate the pixels 45 degrees resulting in a diamond-like orientation.

Why? Because by rotating the pixels, you’re now able to pack more together. The actual pixel size remains the same, but the horizontal and vertical pitches are reduced. The below diagram shows this concept. The design also increases vertical and horizontal resolution (at the expense of diagonal resolution), which is a benefit because a majority of the world exists in horizontal and vertical planes (thank gravity for that).

 

Rumored design of the Sigma Fp-L sensor. Source: DIY Photography

 

Why did I say “supposedly changes things up”?

Because the rumors are fairly vague and not at all detailed, and I’m not sure whether they’re using the term pixel when they’re actually talking about the photodiodes of the CFA.

At any rate, this layout is theoretically great for a number of reasons: 1) you have the aforementioned superior vertical and horizontal resolution and 2) for a given pixel size you can pack more pixels into the same area. Essentially, for a given pixel size, you’re able to obtain greater horizontal and vertical resolution while retaining the larger pixels of what would be a lower-resolution sensor with the normal orientation.
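To put a number on the pitch reduction, here is a minimal Python sketch. It assumes perfectly square photosites on an ideal grid and borrows the rumored 3.76-micron pitch purely as an example value; the function names are my own illustration, not anything from Sony or Sigma. It only measures the spacing between rows and columns of photosite centers, which shrinks by a factor of √2 once the grid is rotated 45 degrees.

```python
import math

def rotated_lattice(pitch_um, n):
    """Centers of an n x n square lattice of photosites rotated by 45 degrees."""
    step = pitch_um / math.sqrt(2)
    # Rotating lattice point (a, b) by 45 degrees gives step * (a - b, a + b).
    return [((a - b) * step, (a + b) * step) for a in range(n) for b in range(n)]

def line_spacing(points, axis):
    """Smallest gap between distinct coordinates along one axis (0 = x, 1 = y)."""
    coords = sorted({round(p[axis], 6) for p in points})
    return coords[1] - coords[0]

pitch = 3.76  # microns, example value taken from the rumor
points = rotated_lattice(pitch, 4)

print("unrotated row/column spacing:", pitch)
print("rotated row spacing:", round(line_spacing(points, 1), 3))
print("rotated column spacing:", round(line_spacing(points, 0), 3))
print("ratio:", round(line_spacing(points, 1) / pitch, 4))  # 1 / sqrt(2), about 0.7071
```

The photosites themselves stay the same size; only the distance between successive horizontal and vertical sampling lines changes.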

By retaining the same pixel size, you also retain the same full-well capacity while simultaneously obtaining a higher resolution – something not possible in traditional orientation. The result is that each photosite is able to collect more light before saturating (larger pixels = more light per pixel, think of them like buckets collecting rain); this means you have a greater SNR. In theory, that should mean better dynamic range compared to the essentially identical resolution sensor found in the a7R IV (if it is indeed a 61MP sensor), but at the least it means superior low-light/noise performance assuming the CFA itself allows the same amount of light to pass through. Which, as I’ll note later, it might not.
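The bucket analogy can be made concrete with two textbook relationships: shot-noise-limited SNR grows with the square root of the electrons collected, and dynamic range in stops is roughly log2 of full-well capacity divided by read noise. The electron counts below are made-up illustrative numbers, not measurements of any real sensor.

```python
import math

def shot_noise_snr(electrons):
    """Shot-noise-limited SNR: signal / sqrt(signal) = sqrt(signal)."""
    return math.sqrt(electrons)

def dynamic_range_stops(full_well_e, read_noise_e):
    """Approximate dynamic range in stops (EV)."""
    return math.log2(full_well_e / read_noise_e)

# Hypothetical values for illustration only.
small_well, large_well, read_noise = 25_000, 50_000, 3.0

for name, well in (("smaller pixel", small_well), ("larger pixel", large_well)):
    snr = round(shot_noise_snr(well), 1)
    stops = round(dynamic_range_stops(well, read_noise), 2)
    print(name, "SNR at saturation:", snr, "dynamic range (stops):", stops)
```

Under these assumptions, doubling the full-well capacity at the same read noise buys exactly one extra stop.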

The confusion over its actual resolution comes from whether the photosites themselves are rotated 45 degrees or whether they’re oriented as usual but with the CFA photodiodes rotated. Given the rumors of a 30MP green channel binned output as well as a 120MP mode, plus the effective 61MP resolution, my guess is the pixels are indeed rotated. Since the rumor specifies a 3.76-micron pixel pitch – which is 61MP in a traditional layout, but when rotated would allow the packing of more pixels – I think this is where the “effective” resolution comes from.
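The “3.76 microns is 61MP” figure is easy to check. This sketch assumes a nominal 36 x 24 mm full-frame area (real sensors are usually a hair smaller) and a traditional, unrotated layout; the function name is just for illustration.

```python
def pixel_grid(width_mm, height_mm, pitch_um):
    """Whole pixels that fit on a sensor of the given size at the given pitch."""
    cols = int(width_mm * 1000 / pitch_um)
    rows = int(height_mm * 1000 / pitch_um)
    return cols, rows, cols * rows / 1e6

# Assumed nominal full-frame dimensions; 3.76 microns is the rumored pitch.
cols, rows, megapixels = pixel_grid(36.0, 24.0, 3.76)
print(cols, "x", rows, "=", round(megapixels, 1), "MP")
```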

This is not unlike the love-child of SuperCCD/EXR (we’ll get to it) and Foveon technology. The 30MP output should be equivalent, in theory, to a 30MP Foveon sensor, which Sigma has always claimed is equivalent to the output of a Bayer-filter sensor with twice the spatial resolution – except this time, there actually are that many pixels and the Foveon-like output is half the spatial resolution, not double. It’s sort of a reverse Foveon, where each photodiode is sampling some light from each color channel, but not with the layered photodetectors for each spatial location as in a Foveon design.

Some History

Sony introduced their “ClearVid” (aka “Q67”) CFA in the mid-2000s, first used in some kind of camcorder. The design was later used, most notably, in the Sony CineAlta F65 cinema camera (still a current model), which had a 20MP sensor capable of outputting 8K resolution. The answer to “how?” – since 20MP isn’t nearly sufficient for a direct 8K readout – is exactly the same as this new 61MP sensor outputting 120MP. However, it should be noted that the sensor pixels themselves were not rotated in the original ClearVid design, only the photodiodes of the CFA.

But Sony didn’t invent this unusual though undeniably exceptional technology. They rode on the coattails of someone else…

Before Fujifilm was using Sony sensors in their own bodies, they were putting their own sensors in someone else’s bodies*. These sensors – dubbed “Super CCD” – varied from model to model over the years; in the S1 Pro, the photosites of the 3.1MP sensor took the form of a honeycomb tessellation, oriented diagonally. This produced the same result: higher vertical/horizontal resolution. (I actually have an S1 Pro, which I am working on an article about – it’s an interesting experience to use, to say the least.)

*The S1 Pro, S3 Pro, and S5 Pro were based on the Nikon N80 (aka F80) 35mm body, modified with an extended base to house six AA batteries; essentially like an integrated vertical grip, but without the vertical controls.

 

The Fujifilm S3 Pro, which I had for a brief time.

 

A third generation of SuperCCD matrices (there were seven or eight generations over time) appeared in 2003, but in two different designs. The first, dubbed SuperCCD HR (high res), had pixels placed adjacently in the 45-degree orientation previously described, though since they were octagonal (unlike Sony’s ClearVid) the layout is a bit different. In order to produce a traditional image file – which is in horizontal/vertical planes (rows and columns) – the camera would have to interpolate, merging the two adjacent photosites. The recorded file can’t exist in the same zig-zag pattern, and on a rectangular grid only half of the positions are filled. So you end up with half the spatial resolution that grid implies – which still works out to twice as much. This is what happens when you output the 120-megapixel pattern from the sensor: each line is read out, and there are indeed twice as many vertical and horizontal lines. Normally a 2x linear increase would result in 4x the spatial resolution, but due to half of the positions being empty, you end up with 2x the spatial resolution.
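The “half of the positions are filled” idea is easier to see on a toy grid. The sketch below is not Fuji’s or Sony’s actual processing; it assumes a simple checkerboard of real samples on a rectangular output grid and fills each empty position with the plain average of its filled neighbors, which is about the crudest interpolation possible. All names are hypothetical.

```python
def quincunx_mask(rows, cols):
    """True where a real photosite sits on the rectangular output grid."""
    return [[(r + c) % 2 == 0 for c in range(cols)] for r in range(rows)]

def fill_gaps(samples, mask):
    """Fill empty positions with the mean of their filled up/down/left/right neighbors."""
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in samples]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                continue  # real sample, keep as-is
            neighbors = [
                samples[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            out[r][c] = sum(neighbors) / len(neighbors)
    return out

grid_mask = quincunx_mask(4, 6)
filled = sum(cell for row in grid_mask for cell in row)
print("output positions:", 4 * 6, "real photosites:", filled)

# Toy data: real samples carry a value, empty positions start at zero.
toy = [[float(r + c) if grid_mask[r][c] else 0.0 for c in range(6)] for r in range(4)]
for row in fill_gaps(toy, grid_mask):
    print(row)
```

Scaled up, the same bookkeeping means an output grid of roughly 120 million positions is backed by roughly 60 million real photosites.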

 

Fujifilm SuperCCD variations (some of them, anyway)

 

The other third gen design, called SuperCCD SR (“Super dynamic range”), integrated two photodiodes per photosite – one of them larger than the other. The result is increased dynamic range since one photodiode (the larger one) is more “sensitive” and can operate at a lower charge, while the other smaller site is, by its nature, less sensitive. The idea is to combat blooming and the tendency of CCD sensors to easily clip highlights – and if you’re able to do that, you’ve increased your SNR (dynamic range). This isn’t unlike the design of dual gain output (DGO) sensors used in cameras like the Canon C300 Mark III, Canon C70, and Arri Alexa’s ALEV sensors, which read out each photosite at two different amplifications – one at normal exposure, the other at lower amplification to gather what would normally be clipped data. These signals are then fed into the ADC and blended together.
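What follows is a simplified sketch of the general idea, not Fuji’s actual algorithm: use the sensitive (large) photodiode until it clips, then fall back to the less sensitive (small) one, scaled up by the sensitivity ratio. The 12-bit clip level, the 16x ratio, and the function name are all assumptions chosen for illustration.

```python
import math

def merge_dual(large, small, ratio, clip=4095):
    """Combine two readings of the same photosite into one extended-range value.

    large: reading from the more sensitive photodiode
    small: reading from the less sensitive photodiode
    ratio: how many times more sensitive the large one is (hypothetical)
    clip:  level at which the large photodiode saturates (hypothetical 12-bit)
    """
    if large < clip:
        return float(large)       # shadows and midtones: trust the sensitive diode
    return float(small) * ratio   # highlights: recover from the less sensitive diode

ratio = 16  # assumed sensitivity ratio
print(merge_dual(1000, 62, ratio))   # large diode not clipped, used directly
print(merge_dual(4095, 400, ratio))  # large diode clipped, small diode scaled up
print("extra highlight headroom in stops:", math.log2(ratio))
```

A real implementation would blend the two readings smoothly near the clip point rather than switching abruptly, but the headroom arithmetic is the same.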

This wasn’t just a gimmick or a theoretical benefit. The S3 Pro, introduced at the beginning of 2004, was measured by DxOMark with a dynamic range result of 13.5 EV. Let me repeat: 13.5 stops of dynamic range on an APS-C sensor in 2004.

To put that in perspective: it wasn’t until four years later, in 2008, that its score was bested (by the Nikon D3X at 13.7 EV). In 2011, it was still the fifth-highest-scoring camera for dynamic range. One spot above it, at 13.6 EV? The 80MP, $42,000 Phase One IQ180 medium format camera introduced in 2011. Every other medium format camera on the market was behind an APS-C camera from 2004. It should be noted that other cameras, mainly full-frame or APS-H, had superior color depth and low-light performance, but none could equal the Fuji in dynamic range. In 2005, the closest competitor was the Nikon D200 at 11.5 EV – a full two stops behind.

So, the dynamic range claims are not empty platitudes, they are borne out by testing. In fact, these cameras (the S3 and S5 Pro) were very popular with wedding photographers for this very reason (and their use of Nikon’s F-mount).

Let’s Talk About Other Filter Arrays

Again, Fujifilm. In 2007 or 2008, Fuji announced a new CFA – the EXR – whose goal was to take the advantages of the SuperCCD HR and SR technologies and combine them. As mentioned, the HR (high res) used photosites at 45-degrees and then interpolated these adjacent pixels into a double-resolution file. When combined with the SR (super dynamic range) tech, the camera could also use the dual read-out channels that record two separate exposures which are then merged – the entire result is a sensor with greater dynamic range that can be interpolated to a higher resolution. The combination is in many ways akin to a hypothetical Foveon/dual gain output hybrid.

Originally, the EXR technology was in a number of cameras (mostly small sensor compacts) using CCD technology. In 2011, Fuji debuted EXR CMOS – I believe first in the 2/3-inch sensor X10, which was a new line that would go on to include the X20, X30, and X70. The X10 and X20 cameras were basically the original X-Pro, except with a fixed zoom lens – a very good one too (I still own an X20) with a fast f/2.0-2.8 aperture. They had an optical viewfinder like the X-Pro, though no switchable EVF. The X30 later moved to EVF only, which was unfortunate, and abandoned the EXR technology in favor of… X-Trans. The X30 marked the end of SuperCCD and the beginning of Fuji’s continued use of CMOS technology.

A Rare Medium Format Camera

An interesting note in the (again rather vague) rumor is that this is the same CFA used in Phase One’s IQ3 100MP Trichromatic back. The underlying sensor is certainly not the same – the resolution isn’t sufficient for a 61MP FF cut-out. Unfortunately, Phase One is incredibly reticent when it comes to revealing the actual design of the Trichromatic’s CFA.

 
 

This is about as much as Phase One has revealed, which isn’t much. All we really know is that it uses a stronger array to filter out the unwanted signals, thereby reducing false color, fringing, UV light pollution, etc. However, the wavelengths that the sensor picks up at each photosite are reduced, which is why the camera has a low base ISO of 35 vs. the regular IQ3’s ISO 50. It’s not a fabricated or extended ISO 35, however; the CFA cuts out more light, but it also cuts undesired signal. Since the pixel wells don’t receive that signal, it can be tuned to a true ISO 35. At any rate, it’s wholly indeterminable whether this is actually the same CFA or not; graphics like the one above are as much information as we have, and they still don’t answer the question.
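For a sense of scale, the gap between ISO 35 and ISO 50 is about half a stop. The sketch below is just the standard log2 conversion between ISO values; it says nothing about how Phase One actually arrived at its rating.

```python
import math

def stops_between(iso_low, iso_high):
    """Difference between two ISO values, expressed in stops."""
    return math.log2(iso_high / iso_low)

print("ISO 35 vs ISO 50:", round(stops_between(35, 50), 2), "stops")
print("ISO 50 vs ISO 100:", stops_between(50, 100), "stops")
```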

A Cinema Camera

As mentioned before, Sony’s ClearVid technology is used in the Sony F65 cinema camera, which can produce an 8K video file despite its 20MP sensor. It can also bin down to 4K.

 

Sony F65 Layout

 

Conventional Bayer arrays are arranged in a 2x2 RGGB grid – that is, four pixels: two green, one red, and one blue. The F65 ClearVid sensor, however, is arranged in the “zig-zag” manner previously described, with CFA photodiodes at 45-degree angles. So, instead of one pixel sampling only one color and using neighboring pixels to guess the color, each pixel can sample full green plus some red and some blue which overlap neighboring pixels. The above image shows this design and you can see what I previously mentioned – due to the zig-zag pattern, you can output an image with 2x the spatial resolution by sampling from twice as many horizontal and vertical lines, but only half of those positions are filled. You can see this empty space between each alternating photosite in the below photo. Note how half of the crosshairs are over nothing, yet there are still more photosites packed into the same total area. This is how it can produce 8K resolution with a 20MP sensor.
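A quick sanity check on the 8K claim. I am assuming an 8K frame of 8192 x 4320 positions (the exact output dimensions are not given here) with half of those positions backed by a real photosite, as in the zig-zag layout described above.

```python
def photosites_needed(width, height, fill_fraction=0.5):
    """Real photosites required when only a fraction of output positions are filled."""
    return int(width * height * fill_fraction)

# Assumed 8K frame size; the half-filled fraction follows the zig-zag description.
frame_positions = 8192 * 4320
needed = photosites_needed(8192, 4320)

print("output positions:", frame_positions)
print("photosites needed:", needed)
print("fits within a 20MP sensor:", needed < 20_000_000)
```

Under those assumptions the frame needs about 17.7 million real photosites, which is comfortably inside a 20MP sensor.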

And since each pixel receives full green information – at a 4:1 ratio to red/blue – it can do a clean green channel output with half the spatial resolution (30MP in the case of the alleged Fp-L sensor). The below image – from a previously rumored 48MP “Foveon” sensor (Sony IMX311AQK) – displays a 24MP green channel image. It also shows how each pixel can sample R, G, and B light at varying ratios, which is not unlike what Foveon does. Foveon is in quotes because Foveon sensors are designed with stacked layers in the silicon, with each pixel receiving all visible light – though depending on if it’s red, green, or blue, it only penetrates the layers to a certain depth.

 
 

If this technology is so great, why hasn’t Sony used it yet?

A fair question. One issue is that the design works very well in a CCD (charge-coupled device) sensor via binning and combining charges from neighboring pixels. When implemented in a CFA, you can’t combine the charges from pixels of alternate colors. Foveon X3 arrays were invented largely to address this issue – the goal was to build on the inherent advantages of CMOS sensors but also address the problems caused by conventional Bayer pattern arrays. These advantages are numerous, from being able to capture all available light in a single spatial sampling site, to reducing color cross-talk and eliminating aliasing. The latter was a particular issue in the early days of CMOS technology, due to their lower resolution. The solution was to add an OLPF (optical low pass filter), but that’s a non-ideal solution as it results in a resolution reduction below the theoretical Nyquist limit of the sensor. Fuji addressed this with X-Trans, which eliminated aliasing without the need for an OLPF, but also had wildly varying results in the demosaicing process.

But as resolution has risen, AA filters have begun to disappear – this is why you almost never see them in a camera over 24MP – the higher the resolution, the more lines available to distinguish between different signals when sampling. Similarly, a higher resolution sensor requires a sharper, more aberration-free lens to induce aliasing (softer lenses produce an effect similar to an AA filter, plus additional aberrations).
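The Nyquist limit makes this concrete: a sensor can resolve at most one line pair per two pixels, so the limit depends only on pixel pitch. The sketch below assumes an ideal sensor and compares the rumored 3.76-micron pitch against 6.0 microns, which is my stand-in for a 24MP full-frame sensor (6000 pixels across 36 mm).

```python
def nyquist_lp_per_mm(pitch_um):
    """Nyquist limit in line pairs per millimeter for a given pixel pitch in microns."""
    return 1000.0 / (2.0 * pitch_um)

for label, pitch in (("3.76 micron pitch", 3.76), ("6.0 micron pitch", 6.0)):
    print(label, "->", round(nyquist_lp_per_mm(pitch), 1), "lp/mm")
```

The finer the pitch, the more detail a lens has to deliver before anything lands above the limit and aliases, which is the same point made above about sharper lenses.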

There’s also the matter that significant gains have been made in CMOS technology – particularly low-light and resolution – since Sony’s ClearVid / Fuji’s EXR technology was last used. In fact, Fuji released very few EXR CMOS cameras (e.g. the X10/20) before jumping ship over to X-Trans. We’re now at a point where we use a 61MP sensor and pixel bin down to 30MP for clean color as well as interpolate up to 120MP. And it’s the 30MP mode that is truly appealing – which is something that prior ClearVid/SuperCCD/EXR sensors did not provide, almost certainly because the sensors were natively 12MP at best (most were 3 or 6MP) and binning would have resulted in a 6MP file. Not exactly appealing at the height of the megapixel race.

Is this a replacement for Foveon?

I don’t think so. What I believe is going on (and this is purely my opinion) is that Sigma’s advancements with Foveon have stalled for the time being – they’ve admitted that they were working on a new full-frame Foveon sensor, but have taken that idea off the table at the moment because it just isn’t working.

Part of this, I believe, is their new foray into video-centric cameras a la the Sigma Fp. And Foveon simply isn’t capable of keeping up with other sensors when it comes to video, more than anything else. Low-light and video are the Achilles’ heel of (current) Foveon technology. This wouldn’t be an issue with this sensor, especially since low-light capabilities shouldn’t be affected, though if it is indeed the same CFA as in the Trichromatic, it will be less sensitive once it filters out unwanted signals. But the sensitivity reduction is still a far cry from Foveon.

I think a new Foveon sensor is still in the works, but it just simply isn’t there yet. I see this as a bridge between traditional Bayer sensors and Foveon until such time that the new Foveon tech is capable of whatever Sigma’s goal is.

I don’t think Foveon is dead. I hope it isn’t. But I’m also excited to see if this sensor – or a similar one – comes to fruition. I do wonder if there will be issues, at least to begin with, in the demosaicing process – not unlike those X-Trans suffered from for many years before Adobe, most notably, was able to produce results comparable to Iridient or Capture One. Sigma made great strides with the introduction of an optional DNG file output in their Quattro and Quattro H bodies, allowing for a no-nonsense, direct-to-Photoshop (or Lightroom) workflow. In that case, I think the demosaicing process took place in-camera, as opposed to Sigma Photo Pro.

Either way, I’m interested to see what this technology does for us in 2021. My S1 Pro is now over two decades old and, to put it mildly, things have changed since then. My prediction is that this has very promising potential, especially with the 30MP output – which in itself is more resolution than many cameras on the market. ACR’s new “Enhance” feature (which I have yet to try) has made me begin to question how necessary 60MP+ files really are anymore. I do wish more 24MP or under sensors were made without low-pass filters, though.

At any rate, technology still presses forward as always, and this is yet one more exciting example.