Depending on their readout circuitry, CMOS image sensors can be classified into two categories: Passive Pixel Sensors (PPS) and Active Pixel Sensors (APS).
- Passive-pixel sensors (PPS) were the first CMOS image sensor devices. In a passive pixel sensor architecture, the photodiode is left floating for a certain amount of time, called the integration time, during which an electric charge is generated across the photodiode. At the end of the integration time, this charge is carried off the sensor and amplified. This setup requires just one transistor, which makes the PPS design small and easy to implement. The photodiode can take up more space relative to the readout circuitry, so the fill factor of PPS designs is larger than that of other designs (higher quantum efficiency). On the other hand, the downside of PPS designs is that they are slow to read out, lack scalability and provide a low signal-to-noise ratio (SNR) – resulting in higher noise. Here is an example of a passive pixel sensor circuit.
- Active pixel sensors (APS) apply a slightly different approach to convert light into an electric signal. The charge accumulated in the photodiode is read out by sensing the voltage drop across the photodiode with a so-called source-follower transistor. APS designs also feature additional circuitry to perform on-pixel amplification and noise cancellation. It is this active circuitry that gives the active-pixel device its name. The APS design resolves some issues of the PPS design: APS designs are fast to read out and provide a higher signal-to-noise ratio (SNR), resulting in lower noise levels. However, this design has some drawbacks of its own. Due to its active and more complex readout circuitry, APS designs usually require three or four transistors, reducing space for the photosensitive area (or alternatively increasing total pixel size) and therefore providing a lower fill factor than PPS designs. It should be noted, however, that technology has advanced so greatly over the past years that pixel size and fill factor can no longer be considered a real problem. APS designs solve the speed and scalability issues of the PPS design. They generally provide very economic power consumption and require less specialized manufacturing facilities. Unlike CCD sensors, CMOS-type APS sensors can combine the functions of image sensing and processing within the same integrated circuit. These advantages have made APS designs the dominant type of image sensor in today's applications, including digital cameras, digital radiography, military applications, and others. Here is an example of a traditional three-transistor (3T) APS design.
Basically, each pixel needs a reset controller (RST) and a reset voltage source (V-RST), a voltage source for amplification (V-DD), a row select controller (SEL) and a column detector (COL). It detects light as follows:
- The device is left in darkness so the photodiode is not illuminated and only a negligible dark current can flow.
- The reset controller (RST) is turned on and off to set the voltage at the upper end of the photodiode to V-RST, which means no signal – the pixel has been reset.
- The photodiode is exposed to light.
- I. If no light is shining on the pixel, the source-follower transistor (M-SF) stays turned on.
- II. If light is detected, part of the charge accumulated at the upper end of the photodiode leaks to ground (photocurrent), making the voltage decrease. Depending on the intensity of the incident light, the voltage at the upper end of the photodiode differs, and so does the resistance of M-SF.
- The row select controller (SEL) is turned on and the current on the column detector (COL) is measured. A different resistance of M-SF gives a different current, which indicates a different light power.
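The readout steps above can be sketched in code. This is a simplified behavioral model, not a circuit simulation; the reset voltage, the discharge constant and the intensity values are illustrative assumptions:

```python
# Behavioral sketch of a 3T active-pixel readout cycle.
# V_RST, FULL_SCALE and the discharge constant are illustrative values.

V_RST = 3.3          # reset voltage applied to the photodiode node (volts)
FULL_SCALE = 3.3     # voltage swing corresponding to full saturation

def integrate(v_node, light_intensity, t_int, k=0.5):
    """Discharge the floating photodiode node in proportion to light."""
    drop = k * light_intensity * t_int   # charge leaked by the photocurrent
    return max(v_node - drop, 0.0)       # the node cannot fall below ground

def read_pixel(light_intensity, t_int=1.0):
    v = V_RST                                 # step 1: RST pulse resets the node
    v = integrate(v, light_intensity, t_int)  # step 2: exposure lowers the voltage
    # step 3: SEL enables the source follower; the column sees a signal
    # proportional to how far the node fell from V_RST.
    return (V_RST - v) / FULL_SCALE

print(read_pixel(0.0))    # dark pixel: no voltage drop, no signal
print(read_pixel(2.0))    # brighter light: larger signal
print(read_pixel(10.0))   # very bright light: node fully discharged
```

Note how a fully discharged node caps the signal at 1.0 – this is the saturation behavior discussed later in this chapter.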
As the photodiode accumulates charge carriers as long as it is exposed to light, one requirement for 3T APS designs is to have the exposure controlled by a separate shutter. This would not be necessary if all pixels could be read out at the same time; in reality, however, the readout process takes some time, so the first line of pixels is addressed first while the lower lines are addressed later. This type of readout process is called a rolling shutter because the select controllers are activated sequentially from the top line to the bottom line in a progressive, rolling fashion.
Conversely, if the pixel itself is to have the ability to freeze signals until the selector line is activated, regardless of the readout speed, an additional transistor is required. This requirement of an electronic shutter has led to the four-transistor (4T) APS design. The 4T APS design differs from the 3T APS design in that it has a global shutter controlled by a sample-and-hold transistor (M-SAH). While the 4T APS design is a further improvement of the traditional design and offers more functionality, the additional SAH transistor again increases pixel size and reduces the fill factor, which makes 4T designs more vulnerable to noise. Here is an example of a four-transistor (4T) APS design.
A photodiode only responds to certain wavelengths, depending on its semiconductor materials. Its sensitivity can be changed by using different substrates and different dopant materials to modify its photoelectric properties. In optical communication (optical fiber cables), the photodiodes used for signal detection often rely on a binary compound like Indium Phosphide (InP) as the substrate material instead of Silicon (Si). Indium Phosphide can be doped in the same way as Silicon; the only difference lies in the dopant materials. InP substrate doped with Zinc (Zn) results in p-type InP, while InP substrate doped with Sulfur (S) or Tin (Sn) results in n-type InP. For the active photon absorption layer, various materials can be used, depending on the wavelengths that need to be detected. Here is a list of some commonly used detector materials, their spectral response and other characteristics:
| Material | Symbol | Characteristics | Spectral response | Typical use |
|----------|--------|-----------------|-------------------|-------------|
| Silicon | Si | low dark current, high speed | good sensitivity between roughly 400 and 1000 nm | used to detect visible light |
| Germanium | Ge | high dark current, slow speed | good sensitivity between roughly 900 and 1600 nm | used to detect infrared light |
| Indium Gallium Arsenide Phosphide | InGaAsP | low dark current, high speed | good sensitivity between roughly 1000 and 1400 nm | mainly used for infrared detectors, expensive production |
| Indium Gallium Arsenide | InGaAs | low dark current, high speed | good sensitivity between roughly 900 and 1700 nm | mainly used for infrared detectors, expensive production |
As described, the different semiconductor materials do not all react to the same photon wavelengths but respond only within a certain range of photon energies (wavelengths). Silicon is well suited to respond to the wavelengths of the visible light spectrum. However, not every photon will contribute to the photocurrent; the ratio of the generated photocurrent to the incident light power is expressed in amperes per watt (A/W). If every single incident photon generated a charge carrier, the quantum efficiency would be 100% (or 1). In reality, the spectral responsivity of a given material depends on the exact wavelength of the incident light and can be expressed in a spectral responsivity graph as shown below. The wavelength dependence may also be expressed as a quantum efficiency, the ratio of the number of photogenerated carriers (electron-hole pairs) to incident photons.
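The relation between quantum efficiency and responsivity can be put into numbers using the standard conversion R = QE · q · λ / (h · c); the 80% quantum efficiency in the example below is an assumed illustrative value, not a datasheet figure:

```python
# Converting quantum efficiency to spectral responsivity (A/W).
# Physical constants are exact SI values; the QE is an assumed example.

Q = 1.602176634e-19   # elementary charge (C)
H = 6.62607015e-34    # Planck constant (J*s)
C = 2.99792458e8      # speed of light (m/s)

def responsivity(qe, wavelength_nm):
    """Responsivity in A/W for a given quantum efficiency and wavelength."""
    wavelength_m = wavelength_nm * 1e-9
    return qe * Q * wavelength_m / (H * C)

# Silicon around 650 nm with an assumed 80% quantum efficiency:
print(round(responsivity(0.8, 650), 3))
```

The linear factor λ explains why responsivity graphs rise toward longer wavelengths before dropping off at the band-gap limit.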
Compare this to the spectral responsivity of Indium Gallium Arsenide.
One problem of photodiodes is that they are unable to detect the color of photons. Each photon with an energy larger than 1.11 electron volts (this energy level is determined by the band gap of silicon) has enough energy to produce an electron-hole pair in the intrinsic layer. This means that all photons of the visible light spectrum have sufficient energy to make the photodiode react to them. The generated electrons, however, cannot be distinguished from each other. Without an additional design feature, a camera sensor would not be able to detect color and could only create monochrome images, even if the most dazzling colors had illuminated the sensor.
The design feature that allows photodiodes to detect color is a color filter applied in front of the diode. A color filter is only transparent for certain wavelengths, while photons with other wavelengths are reflected by the filter. Camera sensors typically possess color filters of the base colors red, green and blue because all other colors can be created later by additive color mixing. In addition, each color filter provides a certain tolerance so that not only photons of one precise wavelength (such as blue with exactly 460 nm) pass through but also near-blue photons. Still, photons that have passed the color filter only produce a current (with its intensity depending on the brightness of the available light) with no color information included. The crucial point is that the image processor knows which color the photons must have had when exciting the photodiode, because the color filters are placed in a fixed pattern.
So initially, fine color shades are discarded: a photon with an aquamarine color will eventually pass a blue color filter and will therefore simply be registered as a fully blue particle of light. However, considering that a photon with an aquamarine color has a wavelength somewhere in between blue and green, such a photon has the same chance to pass a green color filter and be registered as a fully green particle. Imagine a million aquamarine photons shining on an image sensor: approximately half of them will be registered in green pixels while the other half will be registered in blue pixels. The precise wavelength determines the distribution of photons to green and blue pixels. Imagine light of a turquoise color shining on a sensor, where the photons' wavelength corresponds to 60% green content and 40% blue. This very distinct color will result in a distribution of electrons with a ratio of 60 (green) to 40 (blue). The same happens with most other colors: whenever inbound photons have a particular wavelength, there will be a specific distribution (for example 70 red to 30 green for orange). The post-processing of the image – called interpolation – will then calculate which precise color shade results from these individual ratios. There are exceptions, however: not every color can be described by a single wavelength. Pink does not have one wavelength but rather is a combination of red and violet light. Only subjective perception subliminally combines these two different wavelengths into pink.
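The distribution argument can be illustrated with a toy simulation, where each photon passes either the green or the blue filter with a probability given by its spectral position. The 60/40 split and the one-million photon count are the illustrative figures from the text:

```python
# Toy simulation of the turquoise example: each photon registers as either
# green or blue with a probability given by its spectral "green content".
# The probability and photon count are illustrative, not physical data.
import random

random.seed(42)

def register(n_photons, p_green):
    """Count how many photons land in green vs. blue pixels."""
    green = sum(1 for _ in range(n_photons) if random.random() < p_green)
    return green, n_photons - green

green, blue = register(1_000_000, 0.6)   # turquoise: ~60% green content
print(green / (green + blue))            # the measured ratio recovers ~0.6
```

The interpolation step later inverts this logic: from the observed green/blue ratio it infers the original shade.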
A digital camera sensor always uses a given arrangement of red, green and blue color filters depending on the type of color filter array (CFA). The most common color filter array used in today's digital cameras is the Bayer color filter, as seen below. This type of color filter arrangement consists of 50% green filters, while the remaining half is split into 25% red and 25% blue. The reason for this uneven distribution of colors is the nature of the human eye, which is more sensitive to green wavelengths. This means that green contributes more to the perception of brightness and contrast than the other colors. The design of the Bayer color filter takes this fact into account and helps to produce more natural results.
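Here is a sketch of the Bayer arrangement, assuming the common RGGB tiling (real sensors may start the pattern on a different corner, e.g. GRBG or BGGR):

```python
# Generating the RGGB Bayer color filter array for a small sensor.
# This models the filter layout only, not the photodiodes beneath.

def bayer_pattern(rows, cols):
    """Return the color filter letter for each pixel in an RGGB mosaic."""
    tile = [['R', 'G'], ['G', 'B']]   # 2x2 tile repeated across the array
    return [[tile[r % 2][c % 2] for c in range(cols)] for r in range(rows)]

cfa = bayer_pattern(4, 4)
for row in cfa:
    print(' '.join(row))

# Half the filters are green, a quarter red, a quarter blue:
flat = [f for row in cfa for f in row]
print(flat.count('G') / len(flat))   # 0.5
```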
The color perception principle has one significant disadvantage. As a color filter is only transparent to a small spectrum near its base color, roughly two thirds of the incident photons cannot pass the individual filter and are reflected. Any color information carried by these reflected photons is simply discarded and cannot be perceived by the pixel. Looking at an individual pixel with a green color filter, the resulting photocurrent allows conclusions about the intensity of green light; however, there is no information on whether red or blue light was also shining on this pixel and at which intensity. The neighboring pixel with a red color filter, in turn, provides information on the intensity of red light but none on green and blue light. The blue pixel lacks information on green and red light. The camera's image processing unit uses this fact to calculate missing color information from the color intensities of the adjacent pixels. This calculation presumes that photons of any color would also have illuminated the pixels in between. The following examples will clarify the interpolation process: the figure below shows a small segment of a camera sensor where pixel no. 3 acquires information on green and blue color.
The interpolation has thus calculated a green value (G3) for the middle pixel. Assume that this middle pixel has a red color filter applied, so that red information is already available without interpolation. This means that a blue color value is still missing. Information on the intensity of blue light can be obtained by interpolation from the blue pixels diagonally adjacent to the middle one (the ones without numbers).
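A single step of this kind of interpolation can be sketched as follows, assuming a pixel under a red filter whose four edge neighbors carry green filters (as in the RGGB Bayer layout); the raw values are made up for illustration:

```python
# One bilinear interpolation step: estimate the missing green value at a
# non-green pixel by averaging its four green edge neighbors.
# The raw sensor values below are illustrative numbers.

def interp_green(raw, r, c):
    """Average the four green neighbors of a non-green pixel at (r, c)."""
    neighbors = [raw[r-1][c], raw[r+1][c], raw[r][c-1], raw[r][c+1]]
    return sum(neighbors) / len(neighbors)

# 3x3 patch; the middle pixel sits under a red filter, its edge
# neighbors under green filters, the corners under blue filters:
raw = [
    [10, 80, 12],
    [78, 200, 82],
    [11, 84, 13],
]
print(interp_green(raw, 1, 1))   # estimated green value at the red pixel
```

The missing blue value would be estimated the same way from the four diagonal (corner) neighbors.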
As the final image is created step by step from three mosaics in the basic colors, the interpolation process is also referred to as demosaicing. There are numerous interpolation methods including more complex calculations to obtain missing color information. More complex calculations however require more computing effort and therefore faster camera CPUs. The figure below shows an unprocessed image with sensor raw data and the resulting image after demosaicing.
In digital photography, image noise describes an interfering signal similar to noise in radio technology. Noise becomes noticeable in the form of spots of increased intensity in areas that would normally be illuminated uniformly. Just like noise in audio signals, it is an undesired phenomenon in photography and can degrade image quality significantly.
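Noise is commonly quantified relative to the signal it disturbs. Here is a minimal sketch of the signal-to-noise ratio in decibels; the electron counts are illustrative assumptions, not measured values:

```python
# Signal-to-noise ratio in decibels for a pixel, using illustrative
# electron counts for the signal and the noise floor.
import math

def snr_db(signal_electrons, noise_electrons):
    """SNR in dB: 20 * log10(signal / noise)."""
    return 20 * math.log10(signal_electrons / noise_electrons)

print(round(snr_db(10_000, 10), 1))   # strong signal, low noise: high SNR
print(round(snr_db(50, 10), 1))       # weak signal: noise becomes visible
```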
Noise generally affects the entire picture. For a particular pixel, noise adds to the signal caused by photons and pretends to be light where the light intensity would normally be lower. Even though noise is unavoidable, its relation to the light signal can be so small that it will not be experienced as unpleasant. The signal-to-noise ratio (SNR) is a universal way of comparing the relative amounts of signal and noise. An image with a high SNR will show very little visible noise, while a low SNR results in clearly visible noise interference. There are various reasons for noise to occur and factors that influence the noise performance of a camera sensor. This paragraph describes the causes and forms of the most common types of noise as well as sophisticated methods of noise reduction. These are the most common types of noise:
- Dark Noise: This type of noise is always present, even in totally dark surroundings where no photons illuminate the sensor. The reason for dark noise is essentially the supply voltage that is necessary to drive the sensor and to allow signal detection. In reality, the reverse bias voltage applied to the photodiodes does not perfectly prevent a current from flowing but sometimes allows a tiny ‘phantom current’ to leak at random. Leaked electrons cannot be distinguished from those excited by light and also contribute to the readout signal. In addition, despite silicon being predominantly responsive to visible light, a photodiode can also be excited by thermal radiation. Thermal radiation is simply the effect of heat, and the warmer an image sensor gets, the more dark noise is produced. Thermal radiation can come from the surroundings (e.g. using the camera on a sunny day) but can also be induced by the operating circuits (e.g. the reset of a photodiode causes thermal radiation and therefore thermal noise). For exposures under five minutes, dark noise is typically negligible, but it becomes prevalent in night photography where long exposures need to be made. The image below is an example of dark noise.
- Fixed Pattern Noise: Fixed pattern noise (FPN) describes a particular noise pattern on digital camera sensors where individual pixels tend to give brighter or darker intensities deviating from the general dark noise level. Another difference between FPN and dark noise is that FPN does not occur randomly but is characterized by the same pattern of ‘hot’ (brighter) and ‘cold’ (darker) pixels occurring in images taken under the same illumination. The reason for the existence of FPN lies in tiny differences in the individual responsivity of the photodiodes, which may be caused by slight production variations concerning the precision of the pixel size or photodiode material. Again, this type of noise is more prevalent in pictures taken with longer exposures. The image below is an example of fixed pattern noise.
- Banding Noise: Banding Noise is often caused by sensor readout, downstream amplification, analog-to-digital-conversion and high-frequency components. For this reason, banding noise is also referred to as readout noise or bias noise.
- sensor readout: Primarily, banding noise is caused by the readout electronics including the transistors involved in that readout process. Just like photodiodes, the transistors included in the individual pixels used to read out signals can also be affected by slight production variations due to their tiny sizes. Imperfections in the base silicon or in the template and etching can all affect the response of transistors. As such, each pixel in a sensor will not necessarily behave like all the rest when read out, producing differences.
- downstream amplification: Some readout designs also involve an additional downstream amplifier that engages under certain circumstances, in addition to the on-pixel amplifiers. Banding noise introduced within the sensor die itself will be exacerbated by any downstream amplifier. These amplifiers usually kick in at really high ISO settings, such as 6400 and higher, which is why relatively clean output at ISO 1600 and 3200 suddenly becomes much worse at even higher settings. Deviations between column amplifiers can also result in vertical stripes of higher light intensity running across the image (see below).
- analog-to-digital-conversion: Another source of banding is the analog-to-digital converter. Some digital cameras like the Canon EOS 7D use ‘split parallel readout’ where four readout channels are directed to one DIGIC 4 image processor and another four channels are directed to another DIGIC 4 processor in an interleaved fashion. This can induce vertical banding due to different responses of the DIGIC image processors, each of which contains four analog-to-digital-converter units. As even bands are sent to one DIGIC’s analog-to-digital-converter units and odd bands are sent to the other DIGIC’s, perfectly identical processing is unlikely, and slight differences manifest as vertical bands.
- frequency: The final factor contributing to banding noise is the use of high frequencies. High-frequency logic has a tendency to be noisy. Digital cameras with both high resolution and fast frame rates such as the Canon EOS 7D have to process large quantities of pixel signals in a limited amount of time. At eight frames per second and with a resolution of 18 million pixels, the total number of pixels processed per second must be at least 144,000,000. Even with eight analog-to-digital-converter units available in the Canon EOS 7D, each unit must process 18 million pixels every second. That requires a higher frequency than slower frame rates, which can introduce additional noise. The image below is an example of banding noise.
- Photon Noise: Photon noise, also referred to as Poisson noise or shot noise, is a fundamental property of all light sources. A light source with uniform brightness does not emit photons in a uniform way but rather pulsates, emitting photon groups of varying populations. Also, the timing of new light pulses is not predictable by the observer. The question of how many photons can be expected during a given timespan can only be answered by probability theory: the probability follows the Poisson distribution. Photon noise is independent of other noise sources and constitutes the dominant source of image noise in bright-light situations. The image below is a greatly exaggerated simulation of photon noise, as this type of noise can hardly be perceived in practice.
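The Poisson character of photon noise can be demonstrated with a small simulation: for Poisson-distributed counts, the variance equals the mean, so the noise grows only with the square root of the signal. The mean photon count and sample size below are arbitrary illustrative choices:

```python
# Simulating photon (shot) noise: the photon count in a fixed interval is
# Poisson-distributed, so its variance equals its mean.
import math
import random

random.seed(1)

def poisson(mean):
    """Sample a Poisson-distributed photon count (Knuth's algorithm)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

counts = [poisson(100) for _ in range(10_000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(mean, var)   # both close to 100: variance equals mean for Poisson light
```

Because the relative fluctuation is sqrt(mean)/mean, doubling the captured light reduces relative shot noise by a factor of sqrt(2).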
As nearly all digital cameras are affected by noise, those cameras often include components to either reduce noise or to refine pictures instantly after a photo has been taken. However, even if the camera does not perform automatic noise reduction, digital images can still be improved later by suitable computer software. Here is a description of some common noise reduction techniques:
- Dark Noise Reduction: As dark noise is mainly produced by thermal radiation, some camera sensor designs include cooling elements behind the array to actively decrease the sensor’s temperature. Also, dark noise can usually be reduced by processing software.
- Fixed Pattern Noise Reduction: The reduction of fixed pattern noise usually involves a technique known as ‘dark frame subtraction’. In many digital cameras, this type of noise reduction is performed automatically for exposures longer than one second. Dark frame subtraction describes a dual exposure of the sensor. The first exposure is with the shutter open so that the scene will be recorded in the usual way. After the sensor has been read out, another exposure of the same length is made with the shutter closed to capture the noise pattern produced by the sensor in pitch-dark. This noise pattern which includes hot pixels is mathematically subtracted from the first picture, leaving it virtually free of fixed pattern noise.
- Banding Noise Reduction: As described above, banding noise has several causes, and therefore many factors can be considered to reduce it. Most of these factors can only be influenced by the manufacturer, while one factor can indeed be controlled by the photographer.
- As the downstream amplification multiplies both the signal and the noise, lowering the amplification by using lower ISO numbers is the most effective action to reduce nearly all types of noise.
- Another way banding noise can be reduced is to convert the analog signal to digital earlier, preferably on the camera sensor itself. Analog signals tend to accumulate noise the further they travel through readout electronics and amplifiers, whereas digital signals can be transferred without a loss of quality.
- An increase in the number of analog-to-digital-converter units improves parallelism, reducing the speed each unit must operate at and therefore allowing lower frequency components to be used. Reducing the frequency of processing units contributes to lower the camera’s vulnerability to noise.
- Improved manufacturing techniques as well as better silicon wafers can be used to normalize the response curve for each transistor or logic unit, allowing them to produce cleaner results, even at higher frequencies.
- Photon Shot Noise Reduction: The only way to reduce the effect of photon noise is to capture more signal, which requires longer exposures; the number of photons captured in a single shot is limited by the sensor’s saturation level. On the other hand, longer exposures in dark surroundings will provoke other noise types such as dark noise. Since the effect of photon noise is usually small, it can typically be ignored.
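The dark frame subtraction described above can be sketched as follows; the frames and the hot-pixel offset are made-up values for illustration:

```python
# Dark frame subtraction sketch: the second, shutter-closed exposure
# captures the fixed pattern, which is subtracted pixel by pixel.
# All pixel values here are illustrative, not real sensor data.

def subtract_dark_frame(light_frame, dark_frame):
    """Subtract the dark frame from the exposed frame, clamping at zero."""
    return [
        [max(l - d, 0) for l, d in zip(light_row, dark_row)]
        for light_row, dark_row in zip(light_frame, dark_frame)
    ]

scene      = [[50, 52], [51, 50]]   # true scene signal
hot_pixels = [[0, 30], [0, 0]]      # fixed pattern (one hot pixel)
exposure   = [[s + h for s, h in zip(sr, hr)]
              for sr, hr in zip(scene, hot_pixels)]

print(subtract_dark_frame(exposure, hot_pixels))   # hot-pixel offset removed
```

In a real camera the dark frame is a second exposure of the same length, so its fixed pattern matches the one embedded in the light frame.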
As described above, the design of the photodiode allows incident light to generate a photocurrent that can be converted to a voltage for read-out. Although there is an almost linear relationship between a photodiode’s irradiance and the generated photocurrent, there is a threshold that photocurrent cannot exceed regardless of the photon energy available. The highest possible current defines the saturation of a photodiode. The photocurrent becomes saturated when all photogenerated charge carriers (free electrons and holes) are extracted from the semiconductor.
This physical property is the reason why smaller pixels – often a result of high sensor resolutions – only provide a relatively low saturation level. To achieve higher saturation levels, it is preferable to have a reasonably lower resolution with an increased pixel size instead. In CCD sensors, where incident photons are converted to electric charge, each sensor element can store a maximum amount of charge known as the full well capacity. While modern camera sensors are designed to dissipate excess charge above the full well capacity, for very bright parts of the scene, excess charge from a saturated pixel can spill over to adjacent regions. This artifact, known as blooming, can lead to saturation in pixels that would not otherwise be saturated.
The image below shows the relationship between illumination energy and resulting photocurrent. The saturation point, also known as clipping, is clearly visible at the breaking point of the photocurrent function.
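The clipping behavior can be sketched as a simple transfer function; the full-well capacity and the quantum efficiency used here are assumed example values:

```python
# Photocurrent vs. illumination with a saturation level: linear response
# up to the full-well capacity, flat above it. Both parameters are
# illustrative assumptions.

FULL_WELL = 40_000   # maximum charge a pixel can hold (electrons)

def collected_charge(photons, qe=0.5):
    """Electrons collected for a given photon count, clipped at saturation."""
    return min(int(photons * qe), FULL_WELL)

for photons in (10_000, 80_000, 200_000, 500_000):
    print(photons, collected_charge(photons))
# Beyond 80_000 photons the output no longer grows: the pixel has clipped.
```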
Reaching saturation will result in a fully white pixel. In addition, a saturated pixel will not be able to detect any more photons as it is already operating at its limit. There is a high probability that saturated pixels have actually overflowed, and therefore they contain less information about the scene than other pixels. For that reason, it is generally recommended to choose the exposure setting so that the brightest region of interest falls just below the saturation point. On the other hand, underexposure leads to higher relative noise. Finding the perfect exposure requires balancing these contrary goals. The relationship between noise and saturation defines the dynamic range of the sensor and determines the range of irradiances that can be captured acceptably in a single exposure.
Definition: The dynamic range (also known as contrast) of an image refers to the ratio of the largest brightness value to the smallest brightness value. In other words, a scene has a high dynamic range if it contains both very light areas and very dark areas at the same time. A scene may also exhibit great brightness with minimal dynamic range because there are no dark areas, e.g. looking directly into the sun.
The dynamic range is a useful indicator to describe the possible range of measurement where a camera sensor or any other detector can be used. The dynamic range therefore relates to the ability of a camera to record simultaneously very dark scenes alongside bright situations. The dynamic range of a camera sensor is typically defined as the full-well capacity divided by the noise level, both indicated in electrons. The equation is therefore DR = Pmax / Pmin.
Dynamic ranges are usually specified by the logarithmic unit dB (decibels). The dB value expresses the factor by which the greatest brightness value is greater than the lowest brightness value. The ratio of two brightness values, l1 and l2, can be converted into a value DR in dB with the following equation.
DR = 20 × log10(l1 / l2) dB
As described, the maximum signal (Pmax) is limited by the photodiode’s saturation level. By contrast, the minimum signal (Pmin) is limited by noise, which superimposes low signals so that any lower intensities of light cannot be distinguished from the noise level. This relation is illustrated in the figure below.
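Putting the definition into code, here is a sketch that computes the dynamic range from a full-well capacity and a read noise floor; the electron counts are assumed example values:

```python
# Dynamic range in dB from full-well capacity (Pmax) and read noise (Pmin),
# following DR = 20 * log10(Pmax / Pmin). The electron counts are
# illustrative assumptions, not specifications of a real sensor.
import math

def dynamic_range_db(full_well_electrons, read_noise_electrons):
    return 20 * math.log10(full_well_electrons / read_noise_electrons)

# e.g. an assumed 40,000 e- full well with 10 e- read noise:
print(round(dynamic_range_db(40_000, 10), 1))   # about 72 dB
```

Halving the read noise adds about 6 dB of dynamic range, which is why noise reduction matters as much as a deep full well.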
Using some exact numbers as an example, it becomes clear what the information on a sensor’s dynamic range really means:
[Table: Full-Well Capacity (electrons) vs. Read Noise (electrons)]
Dynamic Range Improvements
Here is an excerpt from a research article that describes the development of a 110-decibel CMOS image sensor. The description gives an idea of the approaches taken to increase the dynamic range.
In this paper, a new visible image sensor with 110 dB intrascene dynamic range is reported. The sensor captures four linear response images with different sensitivities simultaneously at 60 frames per second (fps). A real time fusion and dynamic range compression (DRC) algorithm, which is implemented by an FPGA, is also presented. This algorithm can generate a high dynamic range image from the four images of different sensitivity, compresses the composite image’s dynamic range to match that of a normal 8-bit display unit, and displays it at video rate (60 fps). Finally, an accurate measurement method in determining the sensor’s dynamic range is also presented.
[A High Dynamic Range CMOS APS Image Sensor, Photobit Technology Corp. Pasadena, CA 91101]
An analog-to-digital-converter (ADC) is a unit implemented within the readout circuit and plays an important role in the processing of digital images. The photocurrent or voltage registered by a sensor’s photodiode is an analog signal. In analog technology, a signal is registered in the form of a wave whose intensity can be expressed by an amplitude. The camera sensor itself produces a linear sequence of different amplitudes when being read out. However, to form an entire picture from this linear sequence of analog signals, computational processing is required, which can only operate on digital values.
An analog-to-digital-converter measures the amplitudes of the analog signals and translates them into discrete values, more precisely into binary numbers that can be used for digital signal processing. For this reason, the ADC is quintessential for the process of digital image formation as it prepares the signal to be readable by the digital signal processor (DSP). The figure below shows the functional principle of an ADC.
Regardless of it’s internal architecture, an ADC operates by comparing the unknown input signal to a known reference signal. The digitized output of the ADC is the ratio of the input signal to the reference signal times the full-scale reading of the ADC. One important fact is that an ADC cannot translate every possible amplitude into a binary number. It is rather limited by it’s bitrate that defines how precise the translation can possibly be. Therefore, the ADC’s precision can also be compared to a resolution. It is assumed, for example, that an ADC can only convert analog signals into one bit signals. Such an ADC would classify the analog signals as either pitch black  or fully white  with no further distinction inbetween. An ADC with two bit capacity would categorize them into four (2^2) groups: pitch black , fully white , and two levels inbetween [01 and 10]. Most consumer digital cameras use 8 bit ADCs, allowing up to 256 (2^8) distinct values for the brightness of a single pixel. Professional DSLR cameras use 10-12 bit ADCs, allowing up to 4096 (2^12) distinct values. These cameras typically provide the option to save the 10 or 12 bits of data per pixel in a RAW file because the commonly used JPEG files only include 8 bits of data per channel.
The minimum bit depth of an analog-to-digital-converter is determined by the sensor’s dynamic range. If the dynamic range of the sensor is, for instance, 60 dB, which corresponds to a range of 1000:1, the ADC should provide a resolution of at least 10 bits (2^10 = 1,024 discrete levels) in order to avoid loss of information. Although a 10-bit ADC is a perfect choice in theory and any higher bit depth will not generate additional tonal information, in practice it can make sense to overspecify the ADC to 12 bits to allow for some margin of error on the ADC. It is also useful to have extra bits available to minimize posterization when applying the tonal curve to the linear data.
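Here is a sketch of the uniform quantization an ADC performs; real converter architectures differ, and this only demonstrates the resolution arithmetic (the 3.3 V reference is an assumed value):

```python
# Uniform quantization sketch: mapping an analog amplitude in [0, v_ref]
# to an n-bit digital code. The reference voltage is illustrative.

def quantize(v_in, v_ref, bits):
    """Return the digital code for an input voltage."""
    levels = 2 ** bits
    code = int(v_in / v_ref * levels)
    return min(code, levels - 1)   # clamp full-scale inputs to the top code

print(quantize(0.0, 3.3, 8))    # darkest level -> code 0
print(quantize(1.65, 3.3, 8))   # half scale -> code 128
print(quantize(3.3, 3.3, 12))   # full scale clamps to 4095
```

With 8 bits there are 256 possible codes; with 12 bits, 4096 – each extra bit doubles the number of distinguishable brightness levels.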
Some camera manufacturers use this fact to suggest that the digital camera captures images with a dynamic range defined by the analog-to-digital-converter. From the above it is easy to understand that this is only true if the sensor itself has sufficient dynamic range. The tonal range and dynamic range can never be larger than the dynamic range of the sensor.
This chapter was designed to give technically interested readers a general understanding of the physics behind a camera sensor and other electronic components related to digital image formation. If you are interested in some more facts on digital cameras, check out some other chapters!