David learns how to create a pdf with python

The project

The project is to partly automate creating PDFs with the photos of my dog Moncho.

Each photo also comes with some text, like this:

Today's photo is a 3D model of a puppy that you can download from the asset store for the modest price of 5.99 pollitos (little chicks). Here you can see it on a test background. Limited-time offer. #fotoMoncho

This is a rough idea of the layout, made in Inkscape in 2 minutes.

pypdf

Let's try pypdf. I have 2 sources of information:

The docs: https://pypdf.readthedocs.io/en/stable/index.html

And this website, although since it is dated 2025 I suspect it may be AI generated:
https://realpython.com/creating-modifying-pdf/

Read an existing pdf

I created a PDF with Inkscape, so let's see if I can use it as a template.

I used the instructions from the docs. I didn't use pathlib, though, since the PDF was in the same folder as the script.

from pypdf import PdfReader


pdf_path = "prueba maquetación inkscape.pdf"

pdf_reader = PdfReader(pdf_path)

print(len(pdf_reader.pages)) #1

The PDF only has one page, so this prints 1. Not very exciting, but at least it reads the PDF.

Extract the text

Reference: https://realpython.com/creating-modifying-pdf/#extracting-text-from-a-page

Welp, this gets more complicated. The example is plain simple, but it didn't work for me.

The example extracts the text simple and neat, but all I got was what looked like a complex function printed instead of the text.

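The mistake was that I wasn't actually calling the method: I left out the parentheses at the end (on line 10 I wrote print(page.extract_text) instead of print(page.extract_text())), so Python printed the method itself instead of the text.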

from pypdf import PdfReader


pdf_path = "prueba maquetación inkscape.pdf"

pdf_reader = PdfReader(pdf_path)


for page in pdf_reader.pages:
    print(page.extract_text())

And the output:

06 jul 2025, 11:54
La foto de hoy es un modelo 3D de perrito que    
puedes descargar en la asset store por el módico 
precio de 5.99 pollitos. Aquí se puede ver en un 
fondo de prueba. Oferta disponible por tiempo    
limitado. #fotoMoncho
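For the automation part, the extracted text needs some cleanup, because extract_text() keeps the line breaks (and trailing spaces) of the Inkscape text box. Here is a minimal sketch in plain Python, assuming the layout always puts the date on the first line and the caption after it; split_post is a hypothetical helper.

```python
import re


def split_post(text: str) -> dict:
    """Split text extracted from the template into date, caption and hashtags.

    Assumes the first non-empty line is the date and the rest is the caption.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    date_line, body_lines = lines[0], lines[1:]
    # Join the wrapped lines of the text box back into one paragraph.
    caption = " ".join(body_lines)
    hashtags = re.findall(r"#\w+", caption)
    return {"date": date_line, "caption": caption, "hashtags": hashtags}


# Text copied from the output above (line breaks kept, trailing spaces dropped).
extracted = (
    "06 jul 2025, 11:54\n"
    "La foto de hoy es un modelo 3D de perrito que\n"
    "puedes descargar en la asset store por el módico\n"
    "precio de 5.99 pollitos. Aquí se puede ver en un\n"
    "fondo de prueba. Oferta disponible por tiempo\n"
    "limitado. #fotoMoncho"
)

post = split_post(extracted)
print(post["date"])      # 06 jul 2025, 11:54
print(post["hashtags"])  # ['#fotoMoncho']
```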

Extract image from a pdf

For this we also need to install Pillow, with the command pip install pypdf[image]

Reference: https://pypdf.readthedocs.io/en/stable/user/extract-images.html

With this snippet I can create a PNG with Moncho's image extracted from the PDF.

from pypdf import PdfReader

reader = PdfReader("prueba maquetación inkscape.pdf")

page = reader.pages[0]

for count, image_file_object in enumerate(page.images):
    with open(str(count) + image_file_object.name, "wb") as fp:
        fp.write(image_file_object.data)

The file is written to the same folder as the script, and its name is 0x9.png: the counter (0) followed by the name of the image inside the PDF.


Copying a pdf

We need to use PdfWriter for this, so first I'm going to try to just copy the existing PDF.

from pypdf import PdfReader, PdfWriter

# Read source file
reader = PdfReader("prueba maquetación inkscape.pdf")
page = reader.pages[0]

# Write the new file 
writer = PdfWriter()
writer.add_page(page)
writer.write("write.pdf")

It works! =D

Substituting the image

For this I need PIL (the Python Imaging Library, nowadays provided by the Pillow package). It seems to be installed already, since pypdf[image] pulled it in, but I need to add the import.

You need to call replace() on the image in the writer, not in the reader.

from pypdf import PdfReader, PdfWriter
from PIL import Image 

# Read source file
reader = PdfReader("prueba maquetación inkscape.pdf")
page = reader.pages[0]



# Write the new file 
writer = PdfWriter()
writer.add_page(page)

writer.pages[0].images[0].replace(Image.open("PruebaMonchoEntrada.jpg"), quality=100)

writer.write("write.pdf")

And it works: now I have a copy of the PDF with a different image!

Reference:

https://pypdf.readthedocs.io/en/stable/modules/PageObject.html#pypdf._page.PageObject.images

Substituting the text

It seems that extract_text() just… well, extracts the text, but doesn't let me modify it.

I also tried page.get_contents(), but it returns {}.

Buuuut we can try get_object() and look for the specific objects.

In this case we're looking for 2 text boxes, so I wrote this code:

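from pypdf import PdfReader

# Read source file
reader = PdfReader("prueba maquetación inkscape.pdf")
page = reader.pages[0]

# Print every entry of the page dictionary (/Type, /Resources, /Contents, ...)
for key in page:
    print(key, page[key])

print("----")

# Print the first objects of the file by object number
for x in range(0, 9):
    print(x, reader.get_object(x))

print("----")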

It seems like a tree.

I couldn't get at the specifics, and the documentation itself points somewhere else.

Borb

https://github.com/jorisschellekens/borb

It has really good documentation in borb-examples, pretty useful!

https://github.com/jorisschellekens/borb-examples

This would be the next step, later.