PDF metadata
It is useful, though not essential, to record the license details and other document information in the PDF file as PDF metadata. This is particularly helpful because LibreOffice Impress (5.3.1.2) turns the title page into a bitmap (I guess to limit the amount of information that can be routinely scraped), so the text on that page cannot be extracted from the PDF.
The first method is simply to set the document properties in LibreOffice Impress or Microsoft PowerPoint. These should then be transferred across automatically to the PDF on export.
The second method is to set the PDF metadata from LaTeX using the hyperref package.
The third method is to use the exiftool command-line utility after the PDF is produced. There is a project site and a Wikipedia page on ExifTool. The utility runs on Windows, macOS, and Linux, but requires Perl on the latter two. I normally build the utility from a tar file downloaded from the project website. Phil Harvey, the maintainer of ExifTool, was extremely fast at fixing the two minor bugs I reported.
ExifTool can be used to simply interrogate a PDF file:
$ exiftool -duplicates -groupHeadings presentation.pdf
The -duplicates option allows duplicate tags to be extracted and the -groupHeadings option organizes the output by tag group, namely File, PDF, and XMP (if present). XMP stands for Extensible Metadata Platform, a standard originally developed by Adobe. The ExifTool documentation lists the PDF and XMP tags it supports.
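For scripted checks, the same interrogation can be driven from Python. The sketch below is one possible approach and not part of the workflow described here. It assumes exiftool is on the PATH and relies on its -json and -G options, which emit keys of the form Group:Tag. The function names group_tags and read_metadata are made up for illustration.

```python
import json
import subprocess
from collections import defaultdict


def group_tags(flat):
    """Split 'Group:Tag' keys into a nested {group: {tag: value}} dict."""
    grouped = defaultdict(dict)
    for key, value in flat.items():
        group, sep, tag = key.partition(":")
        if not sep:
            # Keys without a group prefix (e.g. SourceFile) go under "".
            group, tag = "", key
        grouped[group][tag] = value
    return dict(grouped)


def read_metadata(pdf_path):
    """Run exiftool on one file and return its tags grouped as above."""
    out = subprocess.run(
        ["exiftool", "-json", "-G", pdf_path],
        check=True, capture_output=True, text=True,
    ).stdout
    # exiftool -json prints a list with one object per input file.
    return group_tags(json.loads(out)[0])
```

Calling read_metadata("presentation.pdf") should then give a dictionary whose top-level keys correspond to the tag groups, which makes it easy to compare the PDF and XMP copies of a field.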
The same utility can be used to write or displace metadata. Caution: ExifTool does not truly delete PDF metadata. The old values are simply pushed into the background and remain recoverable from the file.
$ exiftool \
-author="Bill Frog <bill.frog@posteo.mars>" \
-title="PDF metadata" \
-subject="Methods to view and add metadata in PDF files" \
-copyright="This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License" \
-keywords="exiftool" \
-keywords="metadata" \
presentation.pdf
Note that ExifTool writes this information intelligently to the appropriate PDF and XMP tags, sometimes recording the same information in multiple places.
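If the metadata lives in a data structure rather than a hand-written command line, the argument list can be assembled programmatically. This is a minimal sketch under the assumption that exiftool is on the PATH. The helper names build_write_command and write_metadata are hypothetical, and the tag names are simply those used in the command above.

```python
import subprocess


def build_write_command(pdf_path, tags, keywords=()):
    """Return the exiftool argument list that writes the given tags."""
    args = ["exiftool"]
    for name, value in tags.items():
        args.append(f"-{name}={value}")
    # One -keywords= argument per keyword, as in the shell command above.
    for word in keywords:
        args.append(f"-keywords={word}")
    args.append(pdf_path)
    return args


def write_metadata(pdf_path, tags, keywords=()):
    """Run exiftool with the assembled arguments (needs exiftool installed)."""
    subprocess.run(build_write_command(pdf_path, tags, keywords), check=True)
```

Passing the arguments as a list avoids shell quoting problems with values such as the author string, which contains angle brackets.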
I normally add this command to a non-executable script, which I name presentation.cmd, and then run it as follows:
$ bash presentation.cmd
This has the advantage that the metadata in the script can be easily edited and even placed under git control.
The following bash alias can be helpful too:
alias myexiftool="exiftool -duplicates -groupHeadings"
Actually I do a little more than this. I have a Python script that generates a more sophisticated ExifTool script, which embeds git information in the subject field on the fly and refuses to run if the git working tree is dirty or no tag is current. Note too that ExifTool is great for processing JPG and PNG files and numerous other graphics and photographic file types.
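The git check described above might look roughly like the following. This is not the author's script, only a sketch of the idea. It assumes that "no tag is current" means HEAD does not sit exactly on a tag, and that git is on the PATH. The names GitStateError, subject_with_git, and current_subject are invented for illustration.

```python
import subprocess


class GitStateError(RuntimeError):
    """Raised when the repository is not in a state fit for stamping."""


def subject_with_git(subject, status_output, tag):
    """Append the tag to the subject, or refuse if the state is unfit."""
    if status_output.strip():
        raise GitStateError("working tree is dirty; commit or stash first")
    if not tag:
        raise GitStateError("HEAD is not at a tag")
    return f"{subject} (git tag {tag})"


def _git(*args):
    """Run a git command and return (exit code, stripped stdout)."""
    proc = subprocess.run(["git", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def current_subject(subject):
    """Query git in the current directory and build the subject string."""
    code, status = _git("status", "--porcelain")
    if code != 0:
        raise GitStateError("git status failed; not inside a repository?")
    # --exact-match makes git describe fail unless HEAD is exactly at a tag.
    code, tag = _git("describe", "--tags", "--exact-match")
    return subject_with_git(subject, status, tag if code == 0 else "")
```

The returned string can then be passed as the subject value when the ExifTool command is generated, so every PDF records the tagged revision it was built from.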