
How can a photo be more pixels than bytes?


Kerbface


Example: I have a 16 megapixel camera. Most of my photos are anywhere from 2-5MB (JPEG). So to take the more extreme example, say a 2MB photo. That works out to 8 pixels per byte, or one pixel per bit. And a pixel is more than 1 bit of information, right? A pixel needs information not only on its location but on its colour. How can they possibly fit this into a file this size? Maybe they could simplify the info by taking all pixels with the same colour and referencing them, and maybe they could simplify the coordinates by listing the pixels (their colours) in rows from left to right (so you only need to store the resolution and a program would work the positions out from there), but even with all that, I still can't see any fathomable way you can get to a bit per pixel or less, because even a list of 1s and 0s with no other purpose takes 1 bit per item. But I'm no programmer...


I'm not a programmer either, but...

JPEG is a compressed image format.

If you were to look at something like a bitmap, on the other hand, you'd see a correlation between image size and pixel count.

Data compression is not something I can pretend to understand.


Are you saving the photo in a compressed file format? You can make large images fit into smaller files by compressing them and, in some cases, losing some of the information.

I know JPEG is supposed to be a compressed format, but I also know that at the quality I get off the camera it is still 16MP, and the pixels aren't even "compressed" in the way lower-quality JPEGs are, with the weird lines when you look close. But even so, my question remains: HOW does something compress (even if it looks worse) so that the number of pixels can exceed the number of bytes, or perhaps even bits?


JPEG is not a lossless format. And yes, it's something like taking pixels and grouping them, but it's a lot more complicated than that. The final size of the picture therefore varies with the complexity of the picture. If you had a picture of randomly generated pixels, its final file size would be quite large. On the other hand, a solid-colour JPEG will be very tiny. In an uncompressed format such as BMP, the final file size is determined by the properties of the picture (colour depth and resolution) regardless of its contents.

Edited by Kilmeister
Repeat information: I'm slow at responding during compiles between coding.

JPEG compression quality can be adjusted; bigger files produce higher-quality images. You can usually get 10:1 compression for natural images (i.e. images of real-life scenes) without any perceptible drop in quality. The compression uses specialized frequency transforms and a good deal of knowledge about the human eye's ability to perceive changes in hue and intensity, both to discard information that isn't detectable and, as you said, to bunch pixels together.
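To put rough numbers on that (assuming 24-bit colour, i.e. 3 bytes per pixel, which is typical for camera JPEGs):

```python
pixels = 16_000_000            # a 16MP image
raw = pixels * 3               # 3 bytes per pixel at 24-bit colour
print(raw / 1_000_000)         # 48.0 -> about 48MB uncompressed
print(raw / 10 / 1_000_000)    # 4.8  -> about 4.8MB at a 10:1 ratio
```

A 10:1 ratio lands right in the 2-5MB range you're seeing from a 16MP sensor.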

Look at it this way. Suppose I had a 16MP image that measured 4000x4000 pixels. If all of the pixels in the left 2000 columns of the image are white and all of the pixels in the right 2000 columns are black, I could describe the image thusly:

# Left 2000 columns white, right 2000 black -- the full 4000x4000 image:
image = [[255 if column < 2000 else 0 for column in range(4000)]
         for row in range(4000)]

The above statement can be represented by a computer with only a few bits and is a complete description of the original 16MP image. The JPEG format does something similar, with a general format for describing patterns of pixels in the image that repeat.
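In the same spirit, here's a minimal sketch of run-length encoding in Python, a much simpler scheme than JPEG's but built on the same insight that repetition is cheap to describe:

```python
def rle_encode(row):
    """Collapse runs of identical values into [value, count] pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1         # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return runs

row = [255] * 2000 + [0] * 2000      # one row of the image above
print(rle_encode(row))               # [[255, 2000], [0, 2000]]
```

Four numbers now describe 4000 pixels; repeat for each row (or, smarter, for the whole grid) and the saving is enormous.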


The main trick to most modern compression algorithms is Huffman codes. The technical details are a bit involved, but the basic principles are very easy to understand.

Suppose I want to send the message "abacadab". There are four different symbols, so I can easily do this with 2 bits per symbol. Say 'a' = 00, 'b' = 01, 'c' = 10, 'd' = 11. The entire message is then encoded as "00 01 00 10 00 11 00 01", 16 bits total. But this is inefficient. Instead, let me use the frequencies of the symbols to optimize the encoding. Now 'a' = 0, 'b' = 10, 'c' = 110, and 'd' = 111, and the entire message is encoded as "0 10 0 110 0 111 0 10". I used spaces to separate the symbols, but note that this isn't necessary. As you start reading the message, only one symbol starts with 0, so you immediately know it's 'a'. That is followed by a 1, which leaves some options, but the next bit is another 0 and only one symbol starts with 10. So you can read the string even though each symbol is encoded with a different number of bits. Also note that the total is now only 14 bits.

This isn't a huge saving, but still, I managed to store 16 bits of data with 14 bits of code without losing any information even with such a simple message. Naturally, I also need to inform the decoding program which symbols use which codes, but this can be done in a very compact manner, so as long as the message is sufficiently long, you save overall.
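Here is that exact prefix code as a runnable Python sketch (just the table above hard-coded; a full Huffman implementation would also build the code from the symbol frequencies):

```python
code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

message = "abacadab"
encoded = ''.join(code[s] for s in message)
print(encoded, len(encoded))        # 01001100111010 14

# No separators needed: no code is a prefix of another, so we can
# read bits until they match exactly one table entry.
decode = {bits: symbol for symbol, bits in code.items()}
decoded, buf = [], ''
for bit in encoded:
    buf += bit
    if buf in decode:
        decoded.append(decode[buf])
        buf = ''
print(''.join(decoded))             # abacadab
```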

From here on compression algorithms diverge, but the general idea is still to group similar things together so as to exploit the frequency of certain elements. PNG images use the Deflate algorithm, which looks for repeating patterns; it's a generalization of the concept that Mr Shifty mentioned. The overall result is a significant reduction in image size without any loss of information. Other compression algorithms are lossy. JPEG, for example, converts image data into the frequency domain. There, you can replace the image you have with one that looks almost identical to the human eye but has more repeating elements, allowing for more efficient Huffman coding, among other things. What this means is that you can often get a much smaller image without any apparent quality loss.

And that is ultimately the answer to your question. By storing information about the patterns, rather than the raw data, you can make the file much smaller. And even with lossy compression, this can be done in such a way that to the human eye the image looks identical to the original.
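You can watch the "repetition compresses well" principle in action with Python's built-in zlib module, which implements Deflate, the same algorithm PNG uses:

```python
import os
import zlib

flat = bytes([128]) * 4096        # 4KB of one "solid colour" value
noise = os.urandom(4096)          # 4KB of random bytes: no patterns at all

print(len(zlib.compress(flat)))   # a few dozen bytes
print(len(zlib.compress(noise)))  # about 4096 or slightly more
```

The solid block collapses to almost nothing, while the random block can't be compressed at all; a photograph sits somewhere in between.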

Edited by K^2

If a JPEG has been saved at a reasonable compression level, the artifacts are almost always invisible to the naked eye. A white pixel in a dark area might be 1% or 2% too bright, and unless you had the uncompressed image to compare against you would have no way of knowing; shadings of tone and brightness look perfectly natural.

I saw a demonstration where a camera was set to save the image as an uncompressed bitmap, then an image-editing tool similar to Paintshop was used to save the image in JPEG format. Seen side by side, you couldn't tell the difference, though one sharp-eyed member of the audience spotted that a single bright pixel in a dark field was slightly duller in the JPEG image. Then the lecturer used a program to subtract one image from the other and display the differences as shades of white on black: black areas were absolutely exact, while brighter areas represented errors, the brighter the greater the error. There were a lot of non-black pixels! Interestingly, they tended to be clustered near sharp boundaries. However, the JPEG image had been saved at a fairly low compression level (about 10:1), so there were no bright white areas; none of the pixels were severely wrong.
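If you want to reproduce that demonstration yourself, here is a sketch using the Pillow imaging library (assumed installed; the file names are made up for illustration):

```python
from PIL import Image, ImageChops

original = Image.open("photo_uncompressed.bmp").convert("RGB")
compressed = Image.open("photo_saved.jpg").convert("RGB")

# Per-pixel absolute difference: black = identical,
# brighter = bigger compression error.
diff = ImageChops.difference(original, compressed)
diff.save("jpeg_error_map.png")
```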

If you look at the subsection of the Wikipedia article I linked to, you will see the opposite of what I described: the darker areas show greater degrees of error.


I know JPEG is supposed to be a compressed format, but I also know that at the quality I get off the camera it is still 16MP, and the pixels aren't even "compressed" in the way lower-quality JPEGs are, with the weird lines when you look close. But even so, my question remains: HOW does something compress (even if it looks worse) so that the number of pixels can exceed the number of bytes, or perhaps even bits?

JPEG compression doesn't work by reducing the number of pixels; it works by storing the information for a group of pixels as a "trend" rather than storing their exact values. Imagine a smooth gradient at the edge of an object: it's bright yellow at pixel 1, pixel 2 is slightly dimmer yellow, pixel 3 slightly dimmer still, pixel 4 dimmer again, and so on all the way to pixel 15, which is dark. This pattern is the sort of thing you get when an object is curving, and it happens a LOT in photographs. What JPEG does is look for cases where, instead of storing individual pixel colour values, you can just store the start and end colours of a smooth transition and how fast the gradient from one to the other is. That can be coded in fewer bytes than storing all 15 pixel values across the 15-pixel gradient. This produces a picture that is VERY similar to the original, but not necessarily exactly the same. The quality factor (e.g. 80% quality) determines how closely the original values have to fit the approximation curve in order to be allowed to be replaced by the curve's equation.
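A toy version of that idea in Python (not the actual JPEG math, which works on frequency coefficients rather than literal ramps):

```python
start, end, n = 200, 20, 15   # bright at pixel 1, dark at pixel 15

# Three numbers stand in for fifteen: rebuild the ramp on demand.
ramp = [round(start + (end - start) * i / (n - 1)) for i in range(n)]
print(ramp)   # [200, 187, 174, ...] down to 20: smooth, like the original
```

If the real pixels deviate from this ramp by less than the quality setting allows, the encoder can keep the three numbers and throw the fifteen away.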


The quality factor is not that strictly defined. The quality setting mostly affects how the DCT (frequency-domain) representation of the 8x8 pixel blocks is truncated. But again, that's getting rather technical. I spent several sleepless nights going through manuals back when I was writing my own JPEG decoder.
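For the curious, here is a rough sketch of that truncation using NumPy and SciPy (assumed available); a real encoder uses a full 8x8 quantization table scaled by the quality setting, not the single step size here:

```python
import numpy as np
from scipy.fft import dctn, idctn

# A smooth 8x8 block, like a gentle gradient in a photo.
block = np.tile(np.linspace(50.0, 200.0, 8), (8, 1))

coeffs = dctn(block, norm='ortho')          # into the frequency domain
step = 20.0                                 # stand-in for the quality knob
quantized = np.round(coeffs / step) * step  # most coefficients snap to zero

approx = idctn(quantized, norm='ortho')     # back to pixel values
print(np.max(np.abs(approx - block)))       # small worst-case error
```

Zeroed-out coefficients cost almost nothing to store, which is where the size saving comes from.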


Interestingly, they tended to be clustered near sharp boundaries.

This is because high-frequency components (which produce sharp edges) are what gets filtered out, since your eye can't see small changes in high-frequency regions. Another way of thinking about it: say your original image has 3 adjacent regions with intensity values 11, 12, and 200 (near black, slightly lighter black, and near white). Now imagine that the JPEG compression alters the values to 9, 12, and 198. The first and third regions have each changed by -2. Your eye will immediately see the difference in the first region, but won't see any change in the third, even though they changed by the same amount. The sharp transition sort of saturates your eye's ability to detect intensity variations.


Example: I have a 16 megapixel camera. Most of my photos are anywhere from 2-5MB (JPEG). So to take the more extreme example, say a 2MB photo. That works out to 8 pixels per byte, or one pixel per bit. And a pixel is more than 1 bit of information, right? A pixel needs information not only on its location but on its colour. How can they possibly fit this into a file this size?

In addition to the compression inherent in the JPEG format, position information isn't stored with each pixel. Picture size and resolution are stored in the header, and the heavy lifting of figuring out where each pixel goes is left to the decoder.
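A tiny sketch of that bookkeeping (the numbers are made up): pixels arrive in row-major order, so the width from the header is all the decoder needs to turn a stream index into coordinates.

```python
width = 4000                     # read once from the file header

def position(index):
    """Recover (row, column) of the index-th pixel in the stream."""
    return divmod(index, width)

print(position(8_123_456))       # (2030, 3456)
```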

For reference, my 18MP Canon T2i produces RAW files (essentially a direct, unedited dump of the sensor data) that average around 24-26MB. Higher-end cameras, even with the same sensor size, produce still larger RAW files.

http://en.wikipedia.org/wiki/RAW_file

