It has become a habit of mine to scan important documents to preserve them better. One type of document that regularly goes into my file storage is our residence's monthly fee statement. The setup is pretty simple: every month we receive an invoice in our mailbox, pay it, and receive a printed receipt. When I get home, I scan the invoice and receipt into a 2-page PDF file: the first page is the invoice and the second page is the receipt. This file is then stored together with my other important files.
Then the pandemic came, and the process changed. The invoice is still delivered physically to our mailboxes, but payments are now made online. I scan the invoice with my own scanner, while the receipt arrives as a scanned image via email. Because of this, the invoice and the receipt differ in scan quality.
Now that the invoice and receipt come from two different sources, I use ImageMagick to combine the files into a single PDF that matches the previous PDFs in my file store.
However, this produces a really disproportionate PDF: the first page is really small and the second page is really big. For record-keeping purposes this does not really matter, since I can just zoom in or out when reading the PDFs. Still, it does not sit well with me; I want the new PDFs to look as tidy as the previous ones.
The first thing I noticed is that the images have different dimensions. The invoice images are around 2500px wide, while the receipt is around 3100px wide. With this information, I tried scaling down the receipt image using ImageMagick.
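The 80% factor used in the resize commands later in this post follows from these two widths. A minimal sketch of that arithmetic, using the approximate widths above; `resize_pct` is a hypothetical helper for illustration, not an ImageMagick command:

```shell
#!/bin/sh
# Hypothetical helper: the percentage by which to resize an image that is
# $2 pixels wide so that it ends up roughly $1 pixels wide.
resize_pct() {
  awk -v target="$1" -v current="$2" 'BEGIN { printf "%.1f%%\n", target / current * 100 }'
}

resize_pct 2500 3100   # prints 80.6%
```

Rounding down to 80% gives a 2480px-wide receipt, slightly narrower than the invoice. Alternatively, `-resize` also accepts a target width such as `2500x`, which keeps the aspect ratio.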
Even with this, the PDF still looked weird. For some reason, the pages were still disproportionate, as if no scaling had been done! I even tried tinkering with different page sizes and scaling options, but nothing seemed to work.
Then I realized: maybe the image densities are different? So I tried ImageMagick's identify command to see the image density.
Unfortunately, the default output of the identify command does not show the image density.
Because of this, we need to format the output to show the image density:
identify -format "%x x %y" receipt.jpg
With the identify -format flag, I can see information that is not shown by default. The image density is shown using the %x and %y specifiers in the format string: %x shows the horizontal image density, and %y shows the vertical density.
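To compare both scans at a glance, a few more format escapes can go into the same string: %f (file name) and %w and %h (width and height in pixels), alongside %x and %y. A small sketch, assuming ImageMagick's identify is on the PATH; the function name `show_scan_info` is hypothetical. Depending on the ImageMagick version, %x and %y may also print the unit after the number.

```shell
#!/bin/sh
# Hypothetical helper: print name, pixel size and density of each image given.
# Assumes ImageMagick's identify is installed and on the PATH.
show_scan_info() {
  # %f = file name, %w x %h = size in pixels, %x x %y = horizontal x vertical density
  identify -format "%f: %w x %h px, density %x x %y\n" "$@"
}

# Example usage with the file names from this post:
# show_scan_info invoice.jpg receipt.jpg
```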
Using this command, I saw that the invoice has a 300×300 density and the receipt has 72×72. With this information, I knew that I needed to match the image densities as well as scale the images to match. To do this, I used the -density flag in the convert command:
convert receipt.jpg -resize 80% -density 300 receipt_scaled.jpg
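Why did resizing alone not help? ImageMagick's PDF writer sizes each page from the image's pixel dimensions and its density, so the physical page width is roughly the pixel width divided by the density in pixels per inch. Working that through with the approximate numbers from this post (`page_inches` is a hypothetical helper, and 2500px is a rounded value):

```shell
#!/bin/sh
# Hypothetical helper: page width in inches = pixel width / density (pixels per inch).
page_inches() {
  awk -v px="$1" -v dpi="$2" 'BEGIN { printf "%.2f\n", px / dpi }'
}

page_inches 2500 300   # invoice:                        8.33
page_inches 3100 72    # receipt as received:            43.06
page_inches 2480 72    # receipt after -resize 80% only: 34.44
page_inches 2480 300   # after -resize 80% -density 300: 8.27
```

So resizing to 80% only shrank the receipt page from about 43 to about 34 inches wide, while setting the density to 300 brings it down to roughly the invoice's 8.3 inches.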
Now that the invoice and receipt scans have matching densities and almost matching widths, I can convert them to a PDF file:
convert invoice.jpg receipt_scaled.jpg result.pdf
The resulting PDF now looks just like the ones I made when I scanned both the invoice and the receipt physically, and it no longer needs zooming in or out to read!
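Since this happens every month, the steps could be wrapped into one command. The sketch below is not part of the workflow described here: it reads the invoice's width and density with identify, resizes the receipt to that exact width (the `Nx` geometry keeps the aspect ratio) instead of a fixed 80%, copies the density, and builds the PDF. The function name `merge_scans` and the `_scaled` suffix are hypothetical, and it assumes both scans state their density in pixels per inch, as the 300 and 72 values above suggest.

```shell
#!/bin/sh
# Hypothetical helper: make $2 match the width and density of $1, then
# combine both into the PDF $3 (page 1 = $1, page 2 = the adjusted $2).
# Assumes ImageMagick's identify and convert are on the PATH and that both
# images use pixels-per-inch densities.
merge_scans() {
  ref=$1
  src=$2
  out=$3
  scaled="${src%.*}_scaled.jpg"

  # Width in pixels and horizontal density of the reference scan.
  ref_w=$(identify -format "%w" "$ref") || return 1
  ref_d=$(identify -format "%x" "$ref") || return 1
  ref_d=${ref_d%% *}   # drop a trailing unit name if this version prints one

  # Resize to the reference width (height follows the aspect ratio)
  # and give the result the same density as the reference.
  convert "$src" -resize "${ref_w}x" -density "$ref_d" "$scaled" || return 1
  convert "$ref" "$scaled" "$out"
}

# Example usage with the file names from this post:
# merge_scans invoice.jpg receipt.jpg result.pdf
```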
In summary...
I used ImageMagick's convert and identify commands to get the job done.
Initially, I simply combined 2 JPG files into a single PDF using the basic form of the convert command:
convert invoice.jpg receipt.jpg result.pdf
But the resulting PDF had a really small first page and a really large second page.
I tried scaling the second image to match the width of the first page using:
convert receipt.jpg -resize 80% receipt_scaled.jpg
Making a PDF from it still produced a disproportionate PDF.
I tried looking at the image density, since it might affect the resulting PDF.
identify -format "%x x %y" receipt.jpg
This showed that the images have different densities, so I changed the density of the receipt image to match:
convert receipt.jpg -density 300 -resize 80% receipt_scaled.jpg
Now that the images have the same density and almost the same width, I can make a PDF out of them, and it should look nice:
convert invoice.jpg receipt_scaled.jpg result.pdf
Indeed, the resulting PDF looks nice!