Hello, Mikaeri here.
Let's go ahead and talk about this topic, since I understand that there's considerable confusion over what makes an acceptable vs. unacceptable identical image to upload, whether that be a duplicate, revision, etc…
I'll try to make this as simple as possible by going over the guidelines outlined in the Parent and Child Relationships wikipage about parenting identical images, and giving reasons as to why they're ordered as they are.
But before that, we should really outline some things.
Images come in a large variety of filetypes, but the ones we're going to talk about mainly are JPG and PNG. You won't need to care about other file formats like GIF or BMP as they're fairly rare, especially in the case of the latter (which you should skip uploading altogether or convert if you really have to).
JPG is a lossy compression algorithm and raster file format for image files. Its strength is in providing an accurate representation of an image rich with color information in an acceptable storage size; this includes things like photographs and drawings.
Most images encoded in JPG use the YCbCr colorspace (often loosely called YUV); they don't strictly encode bit values per pixel like PNG might. Instead, the image is sampled and converted into that colorspace from its raw encoding. That's done through a mathematical transformation where values are stored as coefficients of a luma component (Y') and two chroma components (Cb, Cr).
If you want to read more about this, feel free to visit this question on StackExchange. But all you need to know is that it is not the original image; it's just an accurate representation of it, which varies depending on % compression quality, chroma subsampling and other export settings.
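To make that transformation a bit more concrete, here's a minimal sketch of the per-pixel conversion using the full-range equations from the JFIF convention. The function names are my own for illustration, and a real encoder does a lot more after this step (subsampling, DCT, quantization), so treat it as a sketch and not as what any particular editor does.

```python
def _clamp8(value):
    """Round to the nearest integer and clamp into the 8-bit range 0..255."""
    return min(255, max(0, round(value)))


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit Y'CbCr (JFIF full-range equations)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (_clamp8(y), _clamp8(cb), _clamp8(cr))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one 8-bit Y'CbCr pixel back to 8-bit RGB."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return (_clamp8(r), _clamp8(g), _clamp8(b))


# Pure red does not survive the round trip exactly once values are rounded to 8 bits.
red_round_trip = ycbcr_to_rgb(*rgb_to_ycbcr(255, 0, 0))
```

Even before any actual compression happens, rounding the converted values back to 8 bits can nudge a pixel by a step or so, which is one small reason a JPG is never the original image.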
JPG does not support alpha transparency – this is something important I'll mention later (as it has to do with a very specific site).
Regarding metadata, most JPG files embed EXIF metadata. This isn't really important for drawings though. But oftentimes it is embedded nonetheless by various image editors.
The main reason why JPG is a preferred file format for images is that it's web-friendly. The takeaway is that you can easily share a JPG image online within reason; even at the highest compression quality you're still only ever going to see small dimples of image variance compared to a PNG export or the raw project file itself (PSD, KRA, GIMP, etc).
PNG is a raster file format that uses lossless compression. As PNG is lossless, images in PNG files are bit-for-bit accurate representations of the original image (as long as, well, they haven't been downscaled by the artist purposefully).
PNG supports a variety of different colorspaces primarily using the archetypal RGB color model, the two of which you'll be seeing most are RGB24 and RGBA32. RGB24 supports 8 bits per channel (hence the 24), but does not support alpha transparency. RGBA32, however, does (as an extra 8 bits is reserved for the alpha channel per pixel).
The conversion from the RGB24 color space to the RGBA32 color space is lossless as all that happens is the addition of a pretty much empty alpha layer to the whole image. The opposite is not true, however; a conversion from RGBA32 to RGB24 is lossy as all alpha information is lost. This is sort of misleading though, since an RGBA32 image that doesn't encode any meaningful alpha information doesn't necessarily lose anything in its conversion to RGB24.
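Here's a small sketch of that asymmetry, with pixels as plain tuples in a list. The helper names are made up for this example; it only shows the idea and doesn't read real image files.

```python
def rgb_to_rgba(pixels):
    """Add a fully opaque alpha value to every pixel; the color values are untouched."""
    return [(r, g, b, 255) for (r, g, b) in pixels]


def rgba_to_rgb(pixels):
    """Drop the alpha value from every pixel."""
    return [(r, g, b) for (r, g, b, _alpha) in pixels]


def has_meaningful_alpha(pixels):
    """True if any pixel is not fully opaque, i.e. dropping alpha would lose something."""
    return any(alpha != 255 for (_r, _g, _b, alpha) in pixels)


opaque = rgb_to_rgba([(10, 20, 30), (200, 100, 50)])
translucent = [(10, 20, 30, 255), (200, 100, 50, 128)]
```

Going from `opaque` back to RGB24 gives you exactly the pixels you started with, while doing the same to `translucent` throws away the half-transparent pixel's alpha for good.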
PNG doesn't necessarily support EXIF metadata; instead, blocks can be assigned with information, analogous to a key-value table. This explains the rather huge variance in image metadata from file to file, as many editors will insert basically their own format of metadata into such a field. You can read more about how they're assigned here, but rest assured you don't need to know all that (unless, of course, you're curious).
In unusual circumstances such metadata can be fairly cumbersome, but most PNG metadata fits into no more than 1-2 KB of storage, unless an IEND chunk happens to be corrupted.
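If you're curious what those blocks (chunks) look like on disk, here's a minimal sketch that walks a PNG's chunks and pulls out the plain `tEXt` key-value pairs using only Python's standard library. The function names are my own, and it deliberately ignores compressed (`zTXt`) and international (`iTXt`) text chunks, so it's a starting point and not a full metadata reader.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, payload):
    """Build one PNG chunk: length, type, payload, CRC (helper for building demo data)."""
    crc = zlib.crc32(chunk_type + payload)
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def iter_chunks(data):
    """Yield (type, payload) for every chunk in a PNG byte string, checking each CRC."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(chunk_type + payload) != crc:
            raise ValueError("corrupted chunk: " + chunk_type.decode("ascii"))
        yield chunk_type.decode("ascii"), payload
        pos += 12 + length
        if chunk_type == b"IEND":
            break


def text_metadata(data):
    """Return {keyword: text} for the uncompressed tEXt chunks in a PNG."""
    result = {}
    for chunk_type, payload in iter_chunks(data):
        if chunk_type == "tEXt":
            keyword, _sep, text = payload.partition(b"\x00")
            result[keyword.decode("latin-1")] = text.decode("latin-1")
    return result
```

To try it on a real file, something like `text_metadata(open("image.png", "rb").read())` should do (the filename is just a placeholder).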
But, as a result of this "freedom" of metadata, PNG images can encode a LOT of info. Here's an example. This is metadata extracted from id 2845594:
Talk about insane!
There's a variety of stuff that matters and stuff that doesn't, depending on who you talk to. One of the things someone might care about is the ICC color profile, which basically describes how an image's colors should be interpreted by a color input or output device. Other things someone might care about are the software used, the encoder used, or even the thumbnail embedded in the image.
But to put it shortly, this information is non-trivial in many circumstances, and the presence of metadata usually signifies that the image has not been modified or stripped in any manner outside of what the artist has provided.
We won't talk about PNG interlacing (although that also increases filesize), since most images come exported non-interlaced by default. It's an option rarely seen.
Filesize does not accurately describe image quality. This is where I'll describe why that is.
Remember what I said about JPG being lossy? Well, that counts double if you re-encode a JPG by way of transforming an image or simply resaving it in your editor. This is known as lossy-lossy compression (or generation loss).
When you save a JPG that uses JPG information, you legitimize the JPG artifacts in that image. Your editor simply doesn't know how to re-encode a bit-by-bit representation of the image you've saved because all it sees is the image. Doing this means you re-encode even the lossy artifacts already present in the original image, in addition to introducing more artifacts.
And not to mention when you do that, you're saving a new image entirely. It may even be a larger filesize than the original image you were working with. And it's not as if you can do much to make it smaller; JPG compression only works best when you're working with an image that's lossless to begin with.
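Here's a toy model of why resaving hurts. It is NOT real JPG encoding: it just rounds a handful of made-up numbers to multiples of a "step", which stands in for the quantization a JPG encoder applies. The step sizes (10 and 7) and the sample values are arbitrary assumptions picked for illustration.

```python
def quantize(values, step):
    """Round each value to the nearest multiple of `step` (a stand-in for lossy quantization)."""
    return [round(v / step) * step for v in values]


def total_error(reference, candidate):
    """Sum of absolute differences between two equally long lists."""
    return sum(abs(a - b) for a, b in zip(reference, candidate))


original = [3, 14, 27, 41, 58, 66, 73, 94]

first_save = quantize(original, 10)    # the JPG the artist posted
second_save = quantize(first_save, 7)  # someone resaves that JPG with other settings
direct_save = quantize(original, 7)    # the same settings applied to the lossless original

error_after_resave = total_error(original, second_save)
error_direct = total_error(original, direct_save)
```

In this toy run the resaved copy ends up further from the original than a single save with the very same settings would, because the second pass had to work from already-rounded values.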
A JPG image isn't automatically better just because it has a higher filesize. There are way too many factors to account for that. Yes, it could mean that such an image has undergone less sampling (perhaps a higher quality compression or better chroma subsampling options), but it says nothing about the original image; we can only observe the image we have.
Just as a small note, Twitter always resamples JPG images uploaded to the service, as do other social networks like Pawoo and Weibo. If I recall correctly, it's something like 85% compression with 4:2:0 chroma subsampling, but this comes from a comment I read some time ago by fireattack.
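In case "4:2:0 chroma subsampling" sounds like gibberish: the color (chroma) planes are stored at half resolution in both directions, so every 2x2 block of pixels shares one chroma value. Below is a rough sketch that averages each block and then stretches it back out. Real encoders may use different filters, and the function names are just for this example.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (list of rows with even width and height)."""
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]


def upsample_420(small):
    """Stretch a subsampled plane back to full size by repeating each value in a 2x2 block."""
    full = []
    for row in small:
        doubled = [value for value in row for _ in range(2)]
        full.append(doubled)
        full.append(list(doubled))
    return full


chroma = [[100, 120], [140, 160]]
restored = upsample_420(subsample_420(chroma))
```

The four distinct values in `chroma` come back as a single repeated average, which is exactly the kind of detail that gets thrown away on sharp color edges.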
Artists can be naive at times. Sometimes they may not have the original project file anymore and still want to revise an image, in which case they'll go straight to editing the JPG file they have online because it's all they have at hand. As an uploader, you're going to have to keep an eye out for such cases.
There are cases where a file is simply smaller because it's been stripped of all its metadata. But aside from that, if the image is still the same compared to its metadata-present alternative, then it shouldn't matter.
This is where the statement especially rings true. Bigger doesn't mean better, especially not in the case of PNGs.
Remember, PNG is lossless. Most PNGs that are uploaded to pixiv, deviantart, etc. are bit-by-bit accurate representations of the original image that the artist wanted to provide. But sometimes, they may provide images that are even better compressed while still representing the same image.
In that case, PNGs with smaller filesizes are better than their larger counterparts, assuming that the metadata is still present. Of course, you could always copy it over if it isn't but…
Since examples do a much better job at explaining this, let's take a look at some of these karutamo images:
See how these images from his official portfolio are better compressed than both the images he uploaded to Twitter and Pixiv? And they still have the metadata present too. Take a look:
Celsys Studio Tool is the same as CLIP STUDIO PAINT, indicating that that was the software used.
But what about cases where an image is better compressed on Twitter than on Pixiv or another site? Well, interestingly enough, this has happened before. Take a look at these two images:
At first you might believe that the first is obviously better. It is, but just in one respect – it preserves metadata. Twitter never preserves metadata.
However, in this case, the Twitter version is actually compressed better. Let's take a look at these filesizes:
That's a difference of 62,916 bytes from the pixiv image! Strange, isn't it? And when you do a raw diff with those two images, you will find that they represent the exact same image. The size difference is somewhat trivial, but it's a decrease nonetheless.
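If it seems odd that two files of different sizes can hold the exact same image, here's a small sketch using `zlib` (PNG's pixel data is stored as a DEFLATE stream, which is what `zlib` produces). The "pixels" are made-up bytes and not a real image, and the helper names are my own; the point is only that the compression effort changes the size and never the data that comes back out.

```python
import zlib


def deflate_sizes(raw, levels=(0, 6, 9)):
    """Compress the same bytes at several zlib levels and report each output size."""
    return {level: len(zlib.compress(raw, level)) for level in levels}


def same_data(stream_a, stream_b):
    """True if two compressed streams decompress to identical bytes."""
    return zlib.decompress(stream_a) == zlib.decompress(stream_b)


# Stand-in for raw pixel data: a flat color repeated many times.
raw_pixels = bytes([200, 180, 160] * 4096)

stored = zlib.compress(raw_pixels, 0)  # level 0: no compression at all
best = zlib.compress(raw_pixels, 9)    # level 9: maximum effort

sizes = deflate_sizes(raw_pixels)
```

Here `stored` is far bigger than `best`, yet `same_data(stored, best)` holds, which is the same situation as the pixiv and Twitter pair above.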
Cases like these are indicative of when uploading a duplicate is actually preferred because of better lossless PNG compression. You could even "frankenstein" an ideal copy for archival by simply copying the metadata from id 2712109 into id 2844376.
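For the curious, here's roughly what that "frankenstein" step could look like for your own archival copy. It's a sketch under a few assumptions: it only carries over textual chunks (`tEXt`, `zTXt`, `iTXt`), drops them in just before `IEND`, and leaves things like ICC profiles alone since those have stricter placement rules. All function names are made up for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TEXT_TYPES = (b"tEXt", b"zTXt", b"iTXt")


def build_chunk(chunk_type, payload):
    """Serialize one PNG chunk: length, type, payload, CRC."""
    crc = zlib.crc32(chunk_type + payload)
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def split_chunks(data):
    """Return a list of (type, payload) pairs for a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    chunks, pos = [], 8
    while pos < len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        chunks.append((chunk_type, data[pos + 8:pos + 8 + length]))
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
        if chunk_type == b"IEND":
            break
    return chunks


def copy_text_chunks(source_png, target_png):
    """Return target_png with source_png's textual chunks inserted just before IEND."""
    extra = [chunk for chunk in split_chunks(source_png) if chunk[0] in TEXT_TYPES]
    parts = [PNG_SIGNATURE]
    for chunk_type, payload in split_chunks(target_png):
        if chunk_type == b"IEND":
            parts.extend(build_chunk(t, p) for t, p in extra)
        parts.append(build_chunk(chunk_type, payload))
    return b"".join(parts)
```

The image data chunks of the target are written back untouched, so the pixels stay exactly as they were; only the metadata is added.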
The same can be said of another identical pair of images, which I'll let you explore if you need further proof:
In most cases, though, pixiv will keep the original compression whereas Twitter will often introduce a round of bad compression through the use of some pngcrush-like software at upload time. There are plenty of examples of this, but if you need one:
Reminder to self: find more SFW examples…
Finally! With all the explanations from above, you should have enough information as to what, how and why these are listed in the order they are. Let's go through them.
Note that if you're going to be uploading "duplicates", they should all be from a legitimate source by the artist or publisher. These criteria do not apply to third-party edits, and that includes optimizations.
Some of the tools you might need, to help understand stuff:
Twitter will always recompress an image uploaded to the service… except in certain cases!
All Twitter PNGs are RGBA32. If you upload a PNG with alpha transparency information, it will retain that information; it will just do the classic routine of introducing a round of bad compression to the image and stripping out the metadata as usual… which is a mini-bundle of fun, but whatever, right?
However, take care that this does not apply to PNGs with RGB24 information. Instead, those are converted and sampled to JPG as per the usual.
This is all (hopefully) explained in howto:twitter also, so in case you forget you can refer to there.
Pawoo is pretty much the same as Twitter except Pawoo retains image metadata. It also carries image dimension maxima; images can only be 1280 x 1280 at their largest.
One thing to note about Pawoo, however, is that Pawoo samples at a considerably lower rate than Twitter does (which makes uploading from Pawoo worth it). They still sample uploaded images, but it's much harder to tell the difference when the images are the same dimensions.
If you have any questions or comments about this guide, bump the topic and we can have a discussion about it. I'm seeking to make this guide better and more user-friendly, so all comments are welcome.
And if you feel like your question is too specific or you feel it's too inappropriate to make public, feel free to message me on Danbooru. My handle is Mikaeri (user #470449).