Validation check issues #14

Closed
rusllonrails opened this issue Feb 6, 2015 · 16 comments

@rusllonrails

Hey Guys,

I'm very happy using Prawn 👍

One small thing I ran into today: my generated PDF has some validation issues:

1.4.1 : Trailer Syntax error, The trailer dictionary doesn't contain ID
3.1.1 : Invalid Font definition, Some required fields are missing from the Font dictionary.
3.1.2 : Invalid Font definition, FontDescriptor is null or is a AFM Descriptor
7.1 : Error on MetaData, Missing Metadata Key in catalog

I'm using the latest version of Prawn:

gem 'rails', '4.1.8'
gem 'prawn', git: "git@github.com:prawnpdf/prawn.git"
gem 'pdf_validator'

In the Rails console:

# Generate the PDF file:
Prawn::Document.generate("metadata.pdf",
  :info => {
    :Title        => "My title",
    :Author       => "John Doe",
    :Subject      => "My Subject",
    :Keywords     => "test metadata ruby pdf dry",
    :Creator      => "ACME Soft App",
    :Producer     => "Prawn",
    :CreationDate => Time.now
  }) do

  text "This is a test of setting metadata properties via the info option."
  text "While the keys are arbitrary, the above example sets common attributes."
end

# Then validate the generated file with the "pdf_validator" gem (https://github.com/bitzesty/pdf_validator):
> path_to_pdf = "#{Rails.root}/metadata.pdf"
> res = PdfValidator.validate(path_to_pdf)
> res[:errors].each { |e| puts e }
1.4.1 : Trailer Syntax error, The trailer dictionary doesn't contain ID
3.1.1 : Invalid Font definition, Some required fields are missing from the Font dictionary.
3.1.2 : Invalid Font definition, FontDescriptor is null or is a AFM Descriptor
7.1 : Error on MetaData, Missing Metadata Key in catalog

I also uploaded the generated "metadata.pdf" file to http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx and got the following issues in the results:

Validating file "innovation_award_Dec_18_2014(1).pdf" for conformance level pdfa-1a
The file trailer dictionary must have an id key.
The key Metadata is required but missing.
The key MarkInfo is required but missing.
A device-specific color space (DeviceGray) without an appropriate output intent is used.
A device-specific color space (DeviceRGB) without an appropriate output intent is used.
The key F is required but missing. (2)
The value of the key SMask is an image but must be None. (2)
The value of the key CA is 0 but must be 1.0. (2)
The value of the key ca is 0 but must be 1.0. (2)
The font Helvetica-Bold must be embedded.
The font Helvetica-Oblique must be embedded.
The font Helvetica must be embedded.
The document does not conform to the requested standard.
The document contains device-specific color spaces.
The document contains fonts without embedded font programs or encoding information (CMAPs).
The document contains transparency.
The document contains hidden, invisible, non-viewable or non-printable annotations.
The document's meta data is either missing or inconsistent or corrupt.
The document doesn't provide appropriate logical structure information.
Done.

Maybe someone has run into the same issue and knows how to fix it.

Thanks for any help 🍻

@bousquet
Contributor

Some of these items (the document ID in the trailer and the ability to add metadata) are addressed in PRs #16 and #17. There's much more work to be done for validation under all of the different PDF specs, but these two PRs helped me get PDF/X-1a compatibility to meet my printer's minimum requirements.
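For the trailer ID part: the PDF specification describes /ID as an array of two byte strings and recommends deriving them by hashing (for example with MD5). The first is a permanent identifier; when a file is first written both entries are the same, and only a later revision gets a new second entry. A minimal sketch, assuming you choose your own seed that is unique per document (trailer_id is a hypothetical helper, not part of Prawn or pdf-core):

```ruby
require 'digest'

# Hypothetical helper: builds the two-element trailer /ID array.
# A freshly written file uses the same 16-byte digest for both entries;
# pass a revision_seed only when producing an updated revision.
def trailer_id(permanent_seed, revision_seed = nil)
  permanent = Digest::MD5.digest(permanent_seed) # 16 raw bytes
  changing  = revision_seed ? Digest::MD5.digest(revision_seed) : permanent
  [permanent, changing]
end

# Seed with something unique per document, e.g. a record ID plus a timestamp.
ids = trailer_id("invoice-42/#{Time.now.to_f}")
puts ids.map { |id| id.unpack1('H*') }.inspect # hex view of the raw bytes
```

In sled's example further down in this thread, the two strings are wrapped in PDF::Core::ByteString before being assigned to state.trailer[:ID].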

@rusllonrails
Author

👍

@pointlessone
Member

Validation errors are specific to the PDF/A profile. At the moment Prawn doesn't support PDF/A, and I personally don't plan to work on it any time soon. I'll be happy to help anyone who decides to contribute PDF/A support.

@timokleemann

Has anyone ever managed to generate a PDF/A-3 compliant PDF with Prawn?

I would be glad if someone could provide a gist or other resources on how to achieve this. Even my Copilot has been struggling with it so far.

@gettalong
Member

@timokleemann I'm not sure why you are asking this here because in https://github.com/orgs/prawnpdf/discussions/1231#discussioncomment-10982910 you mentioned that you have full ZUGFeRD compatibility which requires PDF/A-3.

@timokleemann

@gettalong Well observed! But I am using Ghostscript to convert the Prawn PDFs to the PDF/A standard. This is buggy, however, and I am not happy with it. I would love to create a PDF/A file from within Prawn, but I haven't come across anyone who has successfully done that.

@gettalong
Member

@timokleemann Ah, okay. Alas, for Prawn itself I can only offer you some guidance. You would need to embed the required PDF/A XMP metadata stream and an ICC color profile (probably sRGB), make sure that you only use embedded fonts, and take care of a few other things which Prawn probably already handles. It shouldn't be that much of a hassle, but someone has to do the work once. You could look at how HexaPDF does it.
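To illustrate the XMP part: the metadata stream is an XML packet, so values taken from the document information dictionary have to be XML-escaped before they go into it. A minimal sketch of the rdf:Description element, assuming the usual mapping of Info entries to XMP properties (Title to dc:title, Author to dc:creator, Subject to dc:description, Keywords to pdf:Keywords). pdfa_description and xml_escape are hypothetical helpers, and the result still has to sit inside the x:xmpmeta / rdf:RDF wrapper used in the examples in this thread:

```ruby
# Escapes the five XML special characters so metadata values cannot
# break the XMP packet (e.g. a title like "R&D <draft>").
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                '"' => '&quot;', "'" => '&apos;' }.freeze

def xml_escape(value)
  value.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: builds the rdf:Description carrying the Info-equivalent
# properties plus the PDF/A identification (part and conformance level).
def pdfa_description(title:, author:, subject:, keywords:, part: 3, conformance: 'B')
  <<~XML
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <dc:title><rdf:Alt><rdf:li xml:lang="x-default">#{xml_escape(title)}</rdf:li></rdf:Alt></dc:title>
      <dc:creator><rdf:Seq><rdf:li>#{xml_escape(author)}</rdf:li></rdf:Seq></dc:creator>
      <dc:description><rdf:Alt><rdf:li xml:lang="x-default">#{xml_escape(subject)}</rdf:li></rdf:Alt></dc:description>
      <pdf:Keywords>#{xml_escape(keywords)}</pdf:Keywords>
      <pdfaid:part>#{Integer(part)}</pdfaid:part>
      <pdfaid:conformance>#{xml_escape(conformance)}</pdfaid:conformance>
    </rdf:Description>
  XML
end

puts pdfa_description(title: 'R&D <draft>', author: 'John Doe',
                      subject: 'My Subject', keywords: 'test, metadata')
```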

@timokleemann

timokleemann commented Nov 2, 2024

Thanks @gettalong for the guidance. I think I managed to add the required metadata to my PDF using Prawn's info method. Using a tool called mdls I can verify that the metadata is now indeed present in the PDF.

My Copilot now suggests that I use the combine_pdf gem to add the XMP metadata to the file. But do I really need another gem here? Or is there a better way to achieve this?

@gettalong
Member

No, the info method just adds the standard meta information (the document information dictionary). What you need is a metadata stream with the correct PDF/A metadata. Even if mdls shows the metadata, it probably just shows the values from the info dictionary and not the metadata stream.

combine_pdf is not needed since you just need to attach files to the PDF and this can be done with Prawn itself.
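To see whether a file actually contains a metadata stream (rather than relying on mdls), you can look at its raw bytes. This is a crude heuristic, not a validator: it assumes the metadata stream is stored uncompressed and that pdfaid:part is written as an element, as in the examples in this thread. xmp_summary is a hypothetical helper:

```ruby
# Crude heuristic, not a validator: scans raw PDF bytes for an XMP packet and
# a PDF/A identification element. Finds nothing if the stream is compressed.
def xmp_summary(pdf_bytes)
  bytes = pdf_bytes.b
  {
    xmp_packet: bytes.include?('<x:xmpmeta'),
    pdfa_part: bytes[%r{<pdfaid:part>(\d)</pdfaid:part>}, 1]
  }
end

# Usage: xmp_summary(File.binread('metadata.pdf'))
# A file with only an Info dictionary yields { xmp_packet: false, pdfa_part: nil }.
```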

@timokleemann

timokleemann commented Nov 2, 2024

@gettalong, cool, so I can get by without another gem here.

This is a rough idea of my current code:

class DocumentPdf < Prawn::Document

  def initialize(document)
    @document = document
    super(
      :page_size  => "A4",
      :margin => [32.mm, 20.mm, 40.mm, 25.mm]
    )
    setup_colors
    setup_fonts
    setup_layout
    add_metadata
    add_output_intent
    add_xmp_metadata
  end

  private

  def add_metadata
    self.info[:Title] = @document.title || "Document"
    self.info[:Author] = @document.author || "Author"
    self.info[:Subject] = @document.subject || "Subject"
    self.info[:Keywords] = @document.keywords || "Keywords"
    self.info[:Creator] = "Prawn PDF"
    self.info[:Producer] = "Prawn PDF"
    self.info[:CreationDate] = Time.now
    self.info[:ModDate] = Time.now
  end

  def add_output_intent
    icc_profile_path = Rails.root.join("app", "assets", "icc_profiles", "sRGB.icc")
    output_intent = {
      S: :GTS_PDFA1,
      OutputConditionIdentifier: "sRGB",
      Info: "sRGB IEC61966-2.1",
      DestOutputProfile: IO.binread(icc_profile_path)
    }
    catalog.data[:OutputIntents] = [output_intent]
  end

  def add_xmp_metadata
    xmp_metadata = <<-XMP
    <x:xmpmeta xmlns:x="adobe:ns:meta/">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about=""
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:xmp="http://ns.adobe.com/xap/1.0/"
          xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
          xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
          <dc:title>
            <rdf:Alt>
              <rdf:li xml:lang="x-default">#{info[:Title]}</rdf:li>
            </rdf:Alt>
          </dc:title>
          <dc:creator>
            <rdf:Seq>
              <rdf:li>#{info[:Author]}</rdf:li>
            </rdf:Seq>
          </dc:creator>
          <dc:subject>
            <rdf:Bag>
              <rdf:li>#{info[:Subject]}</rdf:li>
            </rdf:Bag>
          </dc:subject>
          <dc:description>
            <rdf:Alt>
              <rdf:li xml:lang="x-default">#{info[:Keywords]}</rdf:li>
            </rdf:Alt>
          </dc:description>
          <xmp:CreatorTool>#{info[:Creator]}</xmp:CreatorTool>
          <xmp:CreateDate>#{info[:CreationDate].iso8601}</xmp:CreateDate>
          <xmp:ModifyDate>#{info[:ModDate].iso8601}</xmp:ModifyDate>
          <pdf:Producer>#{info[:Producer]}</pdf:Producer>
          <pdfaid:part>3</pdfaid:part>
          <pdfaid:conformance>B</pdfaid:conformance>
        </rdf:Description>
      </rdf:RDF>
    </x:xmpmeta>
    XMP

    metadata_stream = make_xmp_metadata_stream(xmp_metadata)
    object_id = state.store(metadata_stream)
    state.store.root.data[:Metadata] = PDF::Core::Reference.new(object_id)
  end

  def make_xmp_metadata_stream(xmp_metadata)
    PDF::Core::Stream.new({}, xmp_metadata)
  end

end

The problem is that I keep getting an undefined method "info" error no matter what I try.

What am I missing here?

@gettalong
Member

N.b. I haven't had a recent look into the Prawn internals but:

  • The metadata needs to be provided on document creation according to the manual. You can access it later via doc.state.store.info which is a PDF::Core::Reference.

  • #add_output_intent: The DestOutputProfile needs to be a stream object that follows the PDF spec according to sections 14.11.5 and 8.6.5.5. From what I see you are just adding it as a string.
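Section 8.6.5.5 also requires an /N entry (the number of colour components, 3 for an RGB profile such as sRGB) in the dictionary of an ICCBased stream. A minimal sketch that reads this count from the profile header instead of hard-coding it. icc_component_count is a hypothetical helper; the offsets follow the ICC.1 header layout (data colour space signature at bytes 16 to 19, 'acsp' file signature at bytes 36 to 39):

```ruby
# Hypothetical helper: returns the /N value for an ICCBased stream dictionary
# by reading the data colour space signature from the ICC profile header.
def icc_component_count(profile_bytes)
  bytes = profile_bytes.b
  unless bytes.bytesize >= 128 && bytes[36, 4] == 'acsp' # 128-byte header
    raise ArgumentError, 'not an ICC profile (missing acsp signature)'
  end

  case bytes[16, 4] # signature is space-padded to 4 bytes
  when 'GRAY' then 1
  when 'RGB ', 'Lab ' then 3
  when 'CMYK' then 4
  else raise ArgumentError, "unsupported colour space #{bytes[16, 4].inspect}"
  end
end
```

With the Prawn calls used in sled's example further down, the result would go into the dictionary passed to ref!, for instance ref!(N: icc_component_count(icc_data)).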

@timokleemann

Thanks, @gettalong.

Below is my updated code.

class DocumentPdf < Prawn::Document

  def initialize(document)
    @document = document
    super(
      :page_size  => @paper_size,
      :margin     => [32.mm, 20.mm, 40.mm, 25.mm],
      :info       => {
        :Title => "Document",
        :Author => "Author",
        :Subject => "Subject",
        :Keywords => "Keywords",
        :Creator => "Prawn PDF",
        :Producer => "Prawn PDF",
        :CreationDate => Time.now,
        :ModDate => Time.now
      }
    )
    setup_colors
    setup_fonts
    setup_layout
    add_output_intent
    add_xmp_metadata
  end

  private

  def add_output_intent
    icc_profile_path = Rails.root.join("app", "assets", "icc_profiles", "sRGB.icc")
    icc_profile_data = IO.binread(icc_profile_path)
    icc_profile_stream = PDF::Core::Stream.new(icc_profile_data)
    output_intent = {
      S: :GTS_PDFA1,
      OutputConditionIdentifier: "sRGB",
      Info: "sRGB IEC61966-2.1",
      DestOutputProfile: icc_profile_stream
    }
    root = state.store.root
    root.data[:OutputIntents] = [output_intent]
  end

  def add_xmp_metadata
    xmp_metadata = <<-XMP
    <x:xmpmeta xmlns:x="adobe:ns:meta/">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about=""
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:xmp="http://ns.adobe.com/xap/1.0/"
          xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
          xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
          <dc:title>
            <rdf:Alt>
              <rdf:li xml:lang="x-default">This is the Title</rdf:li>
            </rdf:Alt>
          </dc:title>
          <dc:creator>
            <rdf:Seq>
              <rdf:li>This is the Author</rdf:li>
            </rdf:Seq>
          </dc:creator>
          <dc:subject>
            <rdf:Bag>
              <rdf:li>This is the Subject</rdf:li>
            </rdf:Bag>
          </dc:subject>
          <dc:description>
            <rdf:Alt>
              <rdf:li xml:lang="x-default">These are the Keywords</rdf:li>
            </rdf:Alt>
          </dc:description>
          <xmp:CreatorTool>Creator</xmp:CreatorTool>
          <xmp:CreateDate>CreateDate</xmp:CreateDate>
          <xmp:ModifyDate>ModifyDate</xmp:ModifyDate>
          <pdf:Producer>Producer</pdf:Producer>
          <pdfaid:part>3</pdfaid:part>
          <pdfaid:conformance>B</pdfaid:conformance>
        </rdf:Description>
      </rdf:RDF>
    </x:xmpmeta>
    XMP

    metadata_stream = make_xmp_metadata_stream(xmp_metadata)
    metadata_object = ref!(metadata_stream)
    state.store.root.data[:Metadata] = metadata_object
  end

  def make_xmp_metadata_stream(xmp_metadata)
    PDF::Core::Stream.new(xmp_metadata)
  end

end

Unfortunately, I am having trouble referencing the metadata in my code via doc.state.store.info: I keep getting an undefined local variable or method "info" error. (That's why I hardcoded the values as "This is the Title" etc. for now.)

But, even worse, when I try to render the PDF using send_data(DocumentPdf.new(document).render) from my controller, I get this error:

PDF::Core::Errors::FailedObjectConversion
This object cannot be serialized to PDF (#<PDF::Core::Stream:0x00000000699498...

What am I missing here?
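One way to avoid reaching back into Prawn's internals for these values is to keep them in your own object and feed the same data to both the :info option and the XMP template. A hypothetical sketch (DocMetadata and to_info are illustrative names, not Prawn API):

```ruby
# Hypothetical value object: a single source of truth for the document
# metadata, so the Info dictionary and the XMP packet cannot drift apart.
DocMetadata = Struct.new(:title, :author, :subject, :keywords, :created_at,
                         keyword_init: true) do
  # Hash in the shape passed to Prawn's :info option (Info dictionary keys).
  def to_info
    {
      Title: title, Author: author, Subject: subject, Keywords: keywords,
      Creator: 'Prawn', Producer: 'Prawn',
      CreationDate: created_at, ModDate: created_at
    }
  end
end

meta = DocMetadata.new(title: 'Invoice 42', author: 'ACME', subject: 'Invoice',
                       keywords: 'invoice, 2024', created_at: Time.now)
info = meta.to_info
```

In the subclass you would assign such an object to an instance variable before calling super(..., info: @meta.to_info) and interpolate @meta.title and friends into the XMP heredoc.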

@pointlessone
Member

Generally you don't want to use Stream directly; it's for internal use only. Instead, create an empty dictionary (ref({})) and use its stream.
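In practice that means letting the document allocate an indirect object from a dictionary and appending the raw bytes to its stream, which is what sled's example below does with ref!({ N: 3 }) and <<; the library then writes the dictionary and stream, including details such as /Length. To show roughly what such an object looks like in the file, here is an illustrative, hand-rolled serializer. It is not pdf-core's code and handles only symbol (name) and numeric values, with no filters or escaping:

```ruby
# Illustrative only: formats a dictionary value as PDF syntax
# (symbols become names like /XML; other values are written as-is).
def pdf_value(v)
  v.is_a?(Symbol) ? "/#{v}" : v.to_s
end

# Illustrative only: serializes an indirect stream object by hand.
# /Length is the byte count of the data between "stream" and "endstream".
def pdf_stream_object(id, dict, data)
  data = data.b
  entries = dict.merge(Length: data.bytesize).map { |k, v| "/#{k} #{pdf_value(v)}" }.join(' ')
  header = "#{id} 0 obj\n<< #{entries} >>\nstream\n".b
  header + data + "\nendstream\nendobj\n".b
end

puts pdf_stream_object(7, { Type: :Metadata, Subtype: :XML }, '<x:xmpmeta/>')
```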

@sled

sled commented Nov 13, 2024

@timokleemann this works for me:

conformity.rb

require 'securerandom'

module Conformity
  BASE_DIR = File.expand_path(File.dirname(__FILE__))
  XMP_DATE_FORMAT = '%Y-%m-%dT%H:%M:%S%:z'

  def render(*args, **kwargs)
    add_output_intent
    add_trailer_data
    add_xmp_metadata
    super
  end

  def document
    @document ||= Prawn::Document.new({ info: info_data })
  end

  def create_date
    @create_date ||= Time.now
  end

  def modification_date
    @modification_date ||= Time.now
  end

  def info_data
    {}.tap do |data|
      # Fill the hash yielded by tap; it is what gets passed as Prawn's :info option
      data[:Title] = "A title"
      data[:Author] = "An author"
      data[:Subject] = "A subject"
      data[:Keywords] = "Some keywords"
      data[:Creator] = "Prawn"
      data[:Producer] = "Prawn"
      data[:CreationDate] = create_date
      data[:ModDate] = modification_date
    end
  end

  def add_trailer_data
    # See: https://stackoverflow.com/questions/20085899/what-is-the-id-field-in-a-pdf-file
    # The value of the ID entry is not a string but instead an array of two strings.
    # And the string values are not arbitrary but instead unique values recommended to be obtained by hashing.
    # Thus they especially must not be re-used for different documents

    # Added with https://github.com/prawnpdf/pdf-core/pull/16

    state.trailer[:ID] = [
      # not sure if ByteString is needed here...
      PDF::Core::ByteString.new("myDocument"),
      PDF::Core::ByteString.new(SecureRandom.uuid)
    ]
  end


  def add_output_intent
    icc_profile_path = File.join(BASE_DIR, 'colorprofiles', 'sRGB2014.icc')
    icc_profile_ref = ref!({ N: 3 })
    icc_profile_ref << IO.binread(icc_profile_path)

    output_intent = {
      S: :GTS_PDFA1,
      OutputConditionIdentifier: "sRGB2014.icc",
      Info: "sRGB2014.icc",
      DestOutputProfile: icc_profile_ref
    }
    root = state.store.root
    root.data[:OutputIntents] = [output_intent]
  end

  def add_xmp_metadata
    xmp_metadata = ref!(Type: :Metadata, Subtype: :XML)
    xmp_metadata << xmp
    root = state.store.root
    root.data[:Metadata] = xmp_metadata
  end

  def xmp
    # NOTE: `begin="w"` should be `begin="\u{FEFF}"` but this crashes VeraPDF....
    <<~XMP
      <?xpacket begin="w" id="#{SecureRandom.uuid.tr('-', '')}"?>
      <x:xmpmeta xmlns:x="adobe:ns:meta/">
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
          <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
            xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
            <dc:title>
              <rdf:Alt>
                <rdf:li xml:lang="x-default">A title</rdf:li>
              </rdf:Alt>
            </dc:title>
            <dc:creator>
              <rdf:Seq>
                <rdf:li>An author</rdf:li>
              </rdf:Seq>
            </dc:creator>
            <dc:subject>
              <rdf:Bag>
                <rdf:li>A subject</rdf:li>
              </rdf:Bag>
            </dc:subject>
            <dc:description>
              <rdf:Alt>
                <rdf:li xml:lang="x-default">Some keywords</rdf:li>
              </rdf:Alt>
            </dc:description>
            <xmp:CreatorTool>Prawn</xmp:CreatorTool>
            <xmp:CreateDate>#{create_date.strftime(XMP_DATE_FORMAT)}</xmp:CreateDate>
            <xmp:ModifyDate>#{modification_date.strftime(XMP_DATE_FORMAT)}</xmp:ModifyDate>
            <pdf:Producer>Prawn</pdf:Producer>
            <pdfaid:part>3</pdfaid:part>
            <pdfaid:conformance>B</pdfaid:conformance>
          </rdf:Description>
        </rdf:RDF>
      </x:xmpmeta>
      <?xpacket end="r"?>
    XMP
  end
end

pdf_document.rb

class PdfDocument
  include Prawn::View
  prepend Conformity

  ....
end

main.rb

#!/usr/bin/env ruby
File.open('output.pdf', 'wb') { |f| f.write(PdfDocument.new.render) }

To validate, I used:

verapdf --format html -f 3b output.pdf > report.html
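The example memoizes create_date and modification_date so that the Info dictionary and the XMP packet are built from the same Time values, which PDF/A validators may compare. The two places use different date syntaxes, though, as this sketch shows. pdf_info_date is a hypothetical helper; when you pass a Time in :info, Prawn formats the Info-dictionary date for you:

```ruby
XMP_DATE_FORMAT = '%Y-%m-%dT%H:%M:%S%:z'

# XMP dates use ISO 8601, e.g. 2024-11-13T10:30:00+01:00
def xmp_date(time)
  time.strftime(XMP_DATE_FORMAT)
end

# Info dictionary dates use the PDF syntax D:YYYYMMDDHHmmSSOHH'mm'
def pdf_info_date(time)
  offset = time.strftime('%z') # e.g. "+0100"
  time.strftime('D:%Y%m%d%H%M%S') + "#{offset[0, 3]}'#{offset[3, 2]}'"
end

t = Time.new(2024, 11, 13, 10, 30, 0, '+01:00')
puts xmp_date(t)      # 2024-11-13T10:30:00+01:00
puts pdf_info_date(t) # D:20241113103000+01'00'
```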

@sled

sled commented Nov 13, 2024

@timokleemann please see #62

To correct my example above, use:

  
  def add_xmp_metadata
    xmp_metadata = ref!(Type: :Metadata, Subtype: :XML)
    xmp_metadata << xmp.force_encoding('BINARY')
    # .... omitted
  end

  def xmp
    <<~XMP
      <?xpacket begin="\u{FEFF}" id="#{SecureRandom.uuid.tr('-', '')}"?>
      # .... omitted
    XMP
  end
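The begin attribute of the packet header is the byte order mark U+FEFF, which takes three bytes (EF BB BF) in UTF-8. force_encoding('BINARY') keeps those bytes but tags the string as binary, so Ruby will not raise Encoding::CompatibilityError when it is concatenated with other binary data. A minimal sketch of the wrapper on its own, using the fixed packet id W5M0MpCehiHzreSzNTczkc9d that the XMP specification defines (the example above uses a random id instead); wrap_xpacket is a hypothetical helper:

```ruby
# Fixed packet id value defined by the XMP specification.
XMP_PACKET_ID = 'W5M0MpCehiHzreSzNTczkc9d'

# Hypothetical helper: wraps an x:xmpmeta element in an XMP packet header and
# trailer and returns the result tagged as a binary (ASCII-8BIT) string.
def wrap_xpacket(xmpmeta)
  packet = +''
  packet << %(<?xpacket begin="\u{FEFF}" id="#{XMP_PACKET_ID}"?>\n)
  packet << xmpmeta.chomp << "\n"
  packet << %(<?xpacket end="r"?>\n)
  packet.b # same bytes, binary encoding
end

pkt = wrap_xpacket('<x:xmpmeta xmlns:x="adobe:ns:meta/"></x:xmpmeta>')
puts pkt.bytes.first(20).map { |b| format('%02X', b) }.join(' ')
```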

@timokleemann

Hi @sled , this looks good—thank you! There aren't many examples like this available online, so your post will be appreciated by many.
