pdf - iTextSharp: Convert PdfObject to PdfStream
I am attempting to pull font streams out of a PDF file. Legality is not an issue: the company has paid for the rights to display these documents in their original manner, and that requires a conversion, which in turn requires extracting the fonts.
Until now I had been using mutool, but it also extracts the images in the PDF, with no way to bypass them, and some of these documents contain tens of thousands of images. So I took answers from the web and have come up with the following solution:
I get all of the fonts from the font dictionary and attempt to convert them to PdfStreams (to apply FlateDecode and then write them to files) using the following code:
    PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject((PdfObject)citem.pobj);
    PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
    try
    {
        int xrefidx = ((PRIndirectReference)((PdfObject)citem.pobj)).Number;
        PdfObject pdfobj = (PdfObject)reader.GetPdfObject(xrefidx);
        PdfStream str = (PdfStream)(pdfobj);
        byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)str);
    }
    catch
    {
    }
But when I execute PdfStream str = (PdfStream)(pdfobj); I get the error below:
Unable to cast object of type 'iTextSharp.text.pdf.PdfDictionary' to type 'iTextSharp.text.pdf.PdfStream'.
Now, I know PdfDictionary derives from (extends) PdfObject, so I'm uncertain what I'm doing incorrectly here. Please help: I either need advice on patching this code or, if it is entirely incorrect, either code to extract the stream or a pointer to where such code can be found.
Thank you.
EDIT: revised code here:
    public static void GetStreams(PdfReader pdf)
    {
        int page_count = pdf.NumberOfPages;
        for (int i = 1; i <= page_count; i++)
        {
            PdfDictionary pg = pdf.GetPageN(i);
            PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
            PdfDictionary fobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.FONT));
            if (fobj != null)
            {
                foreach (PdfName name in fobj.Keys)
                {
                    PdfObject obj = fobj.Get(name);
                    if (obj.IsIndirect())
                    {
                        PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
                        PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
                        int xrefidx = ((PRIndirectReference)obj).Number;
                        PdfObject pdfobj = pdf.GetPdfObject(xrefidx);
                        if (pdfobj == null && pdfobj.IsStream())
                        {
                            PdfStream str = (PdfStream)(pdfobj);
                            byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)str);
                        }
                    }
                }
            }
        }
    }
However, I'm still receiving the same error, so I'm assuming this is an incorrect way of retrieving font streams. The fonts of the same document have been extracted with mutool, so I know the problem is on my side and not in the PDF.
There are at least two things wrong in your code:
- You cast the object to a stream without first checking that it is one. Perform a check like this before casting:

      if (pdfobj != null && pdfobj.IsStream()) {
          // cast to stream
      }

  The error message says you're trying to cast a dictionary to a stream, so I'm 99% sure the second part of that check returns false, whereas pdfobj.IsDictionary() returns true.
- You try to extract a stream obtained through PdfReader, but you cast the object to PdfStream instead of PRStream. PdfStream is the object you use when you create PDFs; PRStream is the object used when you inspect PDFs using PdfReader.

You should fix these problems first.
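Combining both fixes, a small helper might look like the sketch below. It assumes iTextSharp 5.x; StreamHelper and TryGetStreamBytes are hypothetical names, not part of the library. Note that it will still return null when you hand it a font dictionary, because a font dictionary is not a stream; the font program lives elsewhere, as the rest of this answer explains.

```csharp
using System;
using iTextSharp.text.pdf;

public static class StreamHelper
{
    // Hypothetical helper: returns the decoded bytes if obj (or the object it
    // refers to) is a stream read by PdfReader, otherwise null.
    public static byte[] TryGetStreamBytes(PdfObject obj)
    {
        // Resolve an indirect reference (PRIndirectReference) to the real object.
        // GetPdfObject returns null for null input and returns direct objects as-is.
        PdfObject resolved = PdfReader.GetPdfObject(obj);

        // Check before casting: a font dictionary is a PdfDictionary, not a stream.
        if (resolved == null || !resolved.IsStream())
            return null;

        // Streams obtained through PdfReader are PRStream instances,
        // not the PdfStream objects used when creating a PDF.
        PRStream stream = resolved as PRStream;
        if (stream == null)
            return null;

        // GetStreamBytes applies the stream's filters (e.g. FlateDecode);
        // GetStreamBytesRaw would return the still-encoded bytes.
        return PdfReader.GetStreamBytes(stream);
    }
}
```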
Now for the general question. If you read ISO 32000-1, you'll discover that a font is defined using a font dictionary. If the font is embedded (fully or partly), the font dictionary refers (through its font descriptor) to a stream. That stream can contain the full font information, but most of the time you'll only get a subset of the glyphs (because that's best practice when creating a PDF).
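Embedded font streams are typically stored with /Filter /FlateDecode, so the bytes returned by PdfReader.GetStreamBytesRaw are still compressed, whereas PdfReader.GetStreamBytes applies the filters for you. If you do want to decode raw bytes yourself, FlateDecode data uses the zlib format, which .NET 6 and later can inflate with System.IO.Compression.ZLibStream. This is a sketch that assumes no /DecodeParms predictor is set (predictors are not handled); FlateDecoder is a hypothetical name.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class FlateDecoder
{
    // Inflates bytes encoded with the PDF /FlateDecode filter.
    // FlateDecode data is zlib-formatted (RFC 1950 header around deflate data),
    // which is exactly what ZLibStream expects.
    // Assumption: no predictor from /DecodeParms needs to be undone afterwards.
    public static byte[] Decode(byte[] encoded)
    {
        using var input = new MemoryStream(encoded);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        zlib.CopyTo(output);
        return output.ToArray();
    }
}
```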
Take a look at the ListFontFiles example from the book "iText in Action" to get a first impression of how fonts are organized inside a PDF. You'll need to combine that example with ISO 32000-1 to find more info about the difference between /FontFile, /FontFile2 and /FontFile3.
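As a rough C# illustration of that structure (this is not the book's ListFontFiles code, just a sketch assuming iTextSharp 5.x), the loop below walks every indirect object, keeps the dictionaries whose /Type is /FontDescriptor, and writes the stream behind /FontFile, /FontFile2 or /FontFile3 to disk. FontFileLister, FindFontFileKey, the output file names and the file extensions are all hypothetical choices. Scanning the xref table instead of the page resources also catches fonts used only in form XObjects and avoids following Type0 fonts through /DescendantFonts.

```csharp
using System;
using System.IO;
using iTextSharp.text.pdf;

public static class FontFileLister
{
    // The three keys a font descriptor can use to point at an embedded font program.
    private static readonly PdfName[] FontFileKeys =
        { PdfName.FONTFILE, PdfName.FONTFILE2, PdfName.FONTFILE3 };

    // Returns the FontFile* key present in a font descriptor, or null if the font is not embedded.
    public static PdfName FindFontFileKey(PdfDictionary descriptor)
    {
        foreach (PdfName key in FontFileKeys)
        {
            if (descriptor.Contains(key))
                return key;
        }
        return null;
    }

    // Hypothetical extensions chosen for this sketch only.
    private static string ExtensionFor(PdfName key)
    {
        if (PdfName.FONTFILE2.Equals(key)) return ".ttf"; // TrueType program
        if (PdfName.FONTFILE3.Equals(key)) return ".cff"; // check /Subtype: Type1C, CIDFontType0C or OpenType
        return ".t1";                                      // Type 1 program
    }

    // Writes each embedded font program once and returns how many were written.
    public static int ExtractFontFiles(string pdfPath, string outputDir)
    {
        int written = 0;
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            // Object 0 is the head of the free list, so start at 1.
            for (int i = 1; i < reader.XrefSize; i++)
            {
                // Streams also derive from PdfDictionary; the /Type check does the real filtering
                // (assumes /Type is present, as ISO 32000-1 requires for font descriptors).
                PdfDictionary dict = reader.GetPdfObject(i) as PdfDictionary;
                if (dict == null || !PdfName.FONTDESCRIPTOR.Equals(dict.GetAsName(PdfName.TYPE)))
                    continue;

                PdfName key = FindFontFileKey(dict);
                if (key == null)
                    continue; // font not embedded

                // The FontFile* entry refers to a stream; resolve it and make sure it is a PRStream.
                PRStream fontFile = PdfReader.GetPdfObject(dict.Get(key)) as PRStream;
                if (fontFile == null)
                    continue;

                // GetStreamBytes decodes the stream's filters (typically FlateDecode).
                byte[] bytes = PdfReader.GetStreamBytes(fontFile);
                string path = Path.Combine(outputDir, "font_" + i + ExtensionFor(key));
                File.WriteAllBytes(path, bytes);
                written++;
            }
        }
        finally
        {
            reader.Close();
        }
        return written;
    }
}
```

For example, FontFileLister.ExtractFontFiles("input.pdf", "fonts") would write the font programs into an existing "fonts" folder and return how many it found; keep in mind that most of them will be subsets.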
I've written an example that replaces an unembedded font with a font file: EmbedFontPostFacto. That example also serves as an introduction explaining how difficult font replacement is.
Please go to http://tinyurl.com/iiacsch16 if you need the C# version of the book samples.