Steffen Ullrich

I think using gs should remove all active content (JavaScript) and embedded data (videos, Flash...). But I'm not sure whether using pdfwrite directly on the PDF really removes all active and embedded content. Thus I suggest that you first convert the PDF to PostScript using gs and then convert the PostScript back to PDF using gs with the pdfwrite backend. Since the PostScript format itself does not support active or embedded content, such content should not survive the conversion. I'm not sure whether this also helps against attacks via image formats, such as exploits of vulnerabilities in libjpeg, libpng or similar. In any case, gs itself should be run inside some kind of protected environment (e.g. a sandbox) so that such vulnerabilities cannot compromise the security system itself.
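
A minimal sketch of the PDF to PostScript to PDF round trip in Python, calling the gs command line. It assumes Ghostscript is on the PATH and recent enough to have the `ps2write` device. The CPU/memory limits and the timeout are arbitrary example values, and setting rlimits (POSIX only) merely caps runaway conversions; it is not a substitute for a real sandbox.

```python
import resource
import subprocess
from pathlib import Path


def gs_command(device: str, src: Path, dst: Path) -> list[str]:
    """Build a Ghostscript command line for one conversion step."""
    return [
        "gs",
        "-dSAFER",    # restrict file access from within the processed document
        "-dBATCH",    # exit after processing instead of waiting for input
        "-dNOPAUSE",  # do not pause between pages
        "-dQUIET",    # suppress informational messages
        f"-sDEVICE={device}",
        f"-sOutputFile={dst}",
        str(src),
    ]


def _limit_resources() -> None:
    # Example limits only (60 s CPU, 1 GiB address space). POSIX only.
    # This is NOT a sandbox; run the whole thing in a real one as well.
    resource.setrlimit(resource.RLIMIT_CPU, (60, 60))
    resource.setrlimit(resource.RLIMIT_AS, (1 << 30, 1 << 30))


def sanitize_pdf(src: Path, dst: Path, workdir: Path) -> None:
    """Convert PDF -> PostScript -> PDF so PDF-only features are dropped."""
    ps = workdir / "intermediate.ps"
    steps = (
        gs_command("ps2write", src, ps),  # PDF to PostScript
        gs_command("pdfwrite", ps, dst),  # PostScript back to PDF
    )
    for cmd in steps:
        subprocess.run(cmd, check=True, timeout=120, preexec_fn=_limit_resources)
```

For example, `sanitize_pdf(Path("upload.pdf"), Path("clean.pdf"), Path(tempfile.mkdtemp()))` would leave the intermediate PostScript file in a fresh temporary directory.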

Another option would be to convert the PDF to images and then create a new PDF from these images. This way you could also protect against exploits of vulnerabilities in image libraries, but at the cost of losing the ability to work with the PDF as text (e.g. search, copy...). If you want the additional protection but still need to handle the PDF as text, you could run some OCR software afterwards to reconstruct the text from the images.
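
A similar sketch of the rasterize-and-rebuild option. Assumptions: Ghostscript's `png16m` device at 150 dpi for rendering, the third-party Pillow library for assembling the images into a new PDF, and optionally the tesseract CLI (its `pdf` output config writes a PDF with a searchable text layer) for the OCR step. The helper names are hypothetical. Note that gs still parses the untrusted PDF here, so it should run in the same protected environment; what changes is that later viewers only ever see images freshly encoded by gs.

```python
import subprocess
from pathlib import Path


def rasterize_command(src: Path, outdir: Path, dpi: int = 150) -> list[str]:
    """Ghostscript command that renders every page of src to a PNG file."""
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE", "-dQUIET",
        "-sDEVICE=png16m",                            # 24-bit RGB PNG output
        f"-r{dpi}",                                   # resolution in dpi
        f"-sOutputFile={outdir / 'page-%03d.png'}",   # one file per page
        str(src),
    ]


def images_to_pdf(pages: list[Path], dst: Path, dpi: int = 150) -> None:
    """Assemble page images into a new, image-only PDF (requires Pillow)."""
    from PIL import Image  # imported lazily so the rest works without Pillow
    if not pages:
        raise ValueError("no page images to assemble")
    images = [Image.open(p).convert("RGB") for p in pages]
    first, rest = images[0], images[1:]
    first.save(dst, "PDF", save_all=True, append_images=rest,
               resolution=float(dpi))


def ocr_command(image: Path, outbase: Path) -> list[str]:
    """Optional OCR step: tesseract writes outbase.pdf with a text layer."""
    return ["tesseract", str(image), str(outbase), "pdf"]


def rasterize_pdf(src: Path, dst: Path, workdir: Path, dpi: int = 150) -> None:
    subprocess.run(rasterize_command(src, workdir, dpi), check=True, timeout=300)
    # Sort numerically so that page-1000 does not land before page-101.
    pages = sorted(workdir.glob("page-*.png"),
                   key=lambda p: int(p.stem.split("-")[1]))
    images_to_pdf(pages, dst, dpi)
```

Higher dpi values give more legible pages (and better OCR results) at the cost of larger output files.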
