Do grayscale images take less space?

lu.sagebl.eu

60 points by surprisetalk a year ago

In JPEG*

The title should reflect that. Obviously this is misleading otherwise, there are a million ways of representing grayscale in data.

Sure, you can argue just picking a color channel and making it the grayscale source is not gonna work as expected, but what if we're talking about the data off a pan sensor vs an RGB array? What about a pan sensor created by taking an RGB sensor and removing the filter so it's producing an image with 3x the resolution but in pan?

This is a complicated question.

kadoban a year ago

> In JPEG*
This should end up being true for any presentation image format (the one you're going to actually show people, not necessarily the one you might do editing work on) optimized for human vision as well.
It's more a property of our eyes than a quirk of JPEG. So I don't find it misleading personally.
- AnarchismIsCool a year ago
  
  I'd argue compressed file formats make up a lot less of the broader problem space than most people realize. If you're doing anything high performance (where the difference matters) but not immediately consumer facing you're spending a lot of time working with lossless/uncompressed imagery in ram/disk. Stuff like MIPI->CUDA for classification, this question can go a number of different ways very quickly.
  Keep in mind, at least in my experience, if you're even remotely thinking about doing anything with grayscale and this question comes up you're probably doing something...interesting (read: multi band remote sensing shenanigans).
dTal a year ago

> What about a pan sensor created by taking an RGB sensor and removing the filter so it's producing an image with 3x the resolution but in pan?
This got me thinking about how subtle and slippery the concept of "resolution" is.
The relationship is not strictly 3x; an ideally demosaiced Bayer filtered image still contains some information at the original capture resolution, because R, G, and B are correlated in nontrivial ways. Nyquist is a nice and simple result for the special case where nothing can be predicted about the original function, measurements are assumed to be perfect, and the criterion for reconstruction is "exact" - but in the general case of "recovering information, more is better" even a simple interpolation procedure with tame priors like "luminance exists" might perform so much better than the naive approach that we'd be silly not to use it.
Even the Nyquist frequency itself is less fundamental than it first appears; a less well known result is that a function can also be perfectly recovered if sampled at half the Nyquist frequency, if the slope is captured as well as the value. In other words you can losslessly convert a 2MP image into a 1M value+slope data matrix and back. What resolution is it? Depends on your point of view.
- krapht a year ago
  
  The same idea is used when doing complex IQ sampling of radio frequency data. I don't know if it's as obscure as you say, it's just not commonly considered when dealing with plain cameras. It's a bigger deal when working in the RF domain, or SAR imagery...
generalizations a year ago

Yup. "Do boolean matrices take up less space than fp32 matrices?"
- financltravsty a year ago
  
  Depends on compression ;)
  - generalizations a year ago
    
    Heh. Takes a scifi-level compression algorithm to make 32 bits comparable to 1 bit...or you'd need to get really lucky with your sparse matrix ;)
    
    kbelder a year ago
    
    But the other way around, it's all too common that a 1 bit array is stored equivalently to a 32 bit array.
    
    generalizations a year ago
    
    Yes, indeed. If a larger data type is used to represent the information, it will most certainly take up more space.
GuB-42 a year ago

This would be the case for any lossy format with decent compression. We are much less sensitive to color than we are to brightness so we can compress color much more than we can compress brightness for the same perceived quality.
Not only that but color and brightness are often correlated, and advanced compression techniques use that property, see "chroma from luma" in AV1.
Without compression, indeed, it can be 3x, or even 4x (for alignment).

dynm a year ago

This post jumps into using "4:4:4" / "4:2:2" / "4:2:0" notation without seeming to explain what it means. Would be very helpful to add a short explanation!

genpfault a year ago

> "4:4:4" / "4:2:2" / "4:2:0" notation
https://en.wikipedia.org/wiki/Chroma_subsampling
lifthrasiir a year ago

And that notation is not really intuitive even when you do understand what it means. For example the initial 4 is technically a denominator of other two numbers, but it is not intended to be normalized! A consistent notation would use the reciprocal instead, say, "1-1" for 4:4:4, "2-2" for 4:2:2 and "1-∞" or "1-x" for 4:4:0.

vander_elst a year ago

I don't understand the point of the article, it seems to me that the point is that non compressed grayscale images don't take a third of the space of a compressed image? If we look at the last example the fist image 4:4:4 is 150KB the one in gray is 50KB (it seems exactly 1/3), the compressed images in-between take less space, but also the quality is much worse, I would say as expected. What am I missing?

morsch a year ago

I appreciate the article, but isn't the example image of a grey bird on cobblestone looking on a dark pool of water an odd choice to illustrate color subsampling?

immibis a year ago

The Y channel of the 4:4:4 "sub"sampled image looks sharper than the other comparison images. Look at the ground next to the bird's feet. I perceive it as sharper in the 4:4:4 image even though it's just about a single colour and the luma was supposedly not subsampled.

wmil a year ago

Yeah the "without visual impact" bit is a lie. People working in image / video compression are in denial about the noticeable quality drop with 4:2:0. 4:2:0 is extremely noticeable if your computer monitor gets into that mode.
There are some major failure cases involving alternating single pixel width red / black lines. That caused issues with DVDs because the format also used interlaced frames, meaning things like red tail lights at night could end up looking awful.
Luckily AFAIK they've given up on interlacing.
I really think with modern tech they'd be better off trying to compress 4:4:4 instead of immediately throwing out a lot of colour information.

arh68 a year ago

I remember white/black-dithering (at even twice the dpi) being a lot smaller than the greyscale jpeg, when I was scanning documents. Greyscale was a bit smaller than full color, but dithered was a big step up (er, down).

kccqzy a year ago

When I scanned a lot of documents about ten years ago, I empirically found that JBIG2 is so much better than anything else including JPEG. That's specially optimized for scanning text documents.
crazygringo a year ago

For text documents with line drawings? For sure, absolutely -- especially when it's JBIG compression like fax machines use (or even Group 4 before that).
If it's a full scale photograph full of grayscales that's been dithered, then it's a harder comparison to make. Because obviously a ton of useful detail is lost in the dithering, in a way that isn't with lines and letterforms. So the comparison would really have to be with a terrible, blocky JPEG super-compressed file. In theory the JPEG should win.
gwbas1c a year ago

What were you storing? If the image was predominately black and white, this would make sense.
omoikane a year ago

I was honestly expecting this to be an article about grayscale versus dithered black and white, where the answer to "do grayscale images take less space" could go either way. Instead, it's an article about grayscale versus sRGB.

xxr a year ago

Somewhat related, a few years ago I noticed that an image with gaussian blur applied to it takes up significantly less space in JPEG than does the original image. I suppose it works out because the blur diffuses information throughout a particular area such that each JPEG block has less detail to encode internally?

alephxyz a year ago

A gaussian blur is essentially a low-pass filter so all the high frequency components are gone. The resulting image can be represented and stored using fewer DCT coefficients.
This page has a few visual examples at the bottom: https://www.cs.unm.edu/~brayer/vision/fourier.html
pornel a year ago

JPEG's compression is very similar to blurring.
It converts the image into frequency domain, and reduces precision of the frequency data, with a special case when the data rounds down to zero.
However, whole-image blur crosses the 8x8 block boundaries, so it isn't perfectly aligned with the blur that JPEG uses. Lowering quality setting in JPEG (or using custom quantisation tables) will be more effective.
There's also a fundamental information-theoretic foundation for this — lower frequencies carry less information.
tsukurimashou a year ago

blurring does average which compression does too

bawolff a year ago

Is it just the subsampling?

Naively i would assume that even if you have the unnessary the C_b C_r channels in a greyscale image, they are going to compress really well since they have very little information in a greyscale image.

plorkyeran a year ago

Storing photos and sufficiently photo-like images as YCbCr rather than RBG does indeed tend to make them more compressible even without subsampling.
wmil a year ago

JPEG / JFIF isn't really smart enough to take advantage of that. It's a file format from 1992, CPU speed was limited.
- bawolff a year ago
  
  JPEG does entropy encoding.

jdefr89 a year ago

I wonder if grayscale images are more efficient for compression to some degree. Too lazy to actually test or even explain why I am wondering...

Dwedit a year ago

Yes. No CB or CR channels.

fedeb95 a year ago

this may be true for screen, but what about printing?

atoav a year ago

Does a mono sound take more space than a stereo sound file?

Yeah, half. Provided you don't store a mono sound in a stereo audiofile.

Same thing for grayscale images. If your white is a #FFFFFF instead of a #FF your picture is a color file that coincidentally displays a grayscale image.

simonblack a year ago

Roughly a third.

valine a year ago

We don’t store color data in full precision usually. People aren’t sensitive to all colors equally, the less sensitive the eye is to a particular color the more efficiently you can store it.
You can also typically discard high frequency data in your color channels. So not only are you not storing full precision data, you also don’t need to store roughly a third of the frequency bins. You can think of this like cropping the image but in the frequency domain. The data saving are identical to cropping the image but visually imperceptible.
- shagie a year ago
  This shows up in CCD sensors where you have two green pixels, one red, and one blue in a 2x2 square.
  You can see the sensitivity as part of the ppmtopgm program.
  ppmtopgm (https://netpbm.sourceforge.net/doc/ppmtopgm.html)
  ppmtopgm reads a PPM as input and produces a PGM as output. The output is a "black and white" rendering of the original image, as in a black and white photograph. The quantization formula ppmtopgm uses is y = .299 r + .587 g + .114 b.
  Note the coefficient for blue is 0.114 and green is 0.587.
  This also has a spoof on Late Lament ( https://youtu.be/VNC54BKv3mc?t=109 ) that gives me a chuckle...
  Cold-hearted orb that rules the night Removes the colors from our sight Red is gray, and yellow white But we decide which is right And which is a quantization error.
  - pajko a year ago
    
    There are more formulas to be used, depending on the source material: https://en.wikipedia.org/wiki/Grayscale#Luma_coding_in_video...