Search and Remove Text from a PDF using iTextSharp

In this post we are going to look at how to search for a specific piece of text in a PDF and visually remove it using the iTextSharp library.

Steps Involved :
1. Identify the position of the text in the PDF
2. Remove the text by applying a white patch at the identified location

Searching Text and getting the positions :
1. Define a text extraction strategy that records the position of each chunk of text
2. Read the PDF file using PdfReader
3. Extract the text of each page using PdfTextExtractor.GetTextFromPage and check whether each chunk contains the text we are looking for
4. If it matches, add the chunk's position to the list of matched positions

Here is the code :


    public class MyLocationTextExtractionStrategy : LocationTextExtractionStrategy
    {
        //Holds the bounding rectangle and text of each chunk
        public List<RectAndText> myPoints = new List<RectAndText>();

        //Automatically called for each chunk of text in the PDF
        public override void RenderText(TextRenderInfo renderInfo)
        {
            base.RenderText(renderInfo);

            //Get the bounding box for the chunk of text
            var bottomLeft = renderInfo.GetDescentLine().GetStartPoint();
            var topRight = renderInfo.GetAscentLine().GetEndPoint();

            //Create a rectangle from it
            var rect = new iTextSharp.text.Rectangle(
                                                    bottomLeft[Vector.I1],
                                                    bottomLeft[Vector.I2],
                                                    topRight[Vector.I1],
                                                    topRight[Vector.I2]
                                                    );

            //Add this to our main collection
            this.myPoints.Add(new RectAndText(rect, renderInfo.GetText()));
        }
    }
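
The strategy above stores RectAndText objects, but the post does not show that class. A minimal sketch of what it might look like is given below. The class body is an assumption, inferred only from how the code uses it (a constructor taking a rectangle and a string, plus the Rect and Text members read later in GetPosition).

```csharp
using iTextSharp.text;

// Assumed helper (not shown in the original post): pairs the bounding
// rectangle of a text chunk with the text of that chunk.
public class RectAndText
{
    public Rectangle Rect { get; private set; }
    public string Text { get; private set; }

    public RectAndText(Rectangle rect, string text)
    {
        Rect = rect;
        Text = text;
    }
}
```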

    public static List<TextPosition> GetPosition(string sourcePDFpath, string textToSearch)
    {
        List<TextPosition> positions = new List<TextPosition>();

        using (var r = new PdfReader(sourcePDFpath))
        {
            //Parse every page of the document
            for (int i = 1; i <= r.NumberOfPages; i++)
            {
                //Use a fresh strategy per page so chunks from earlier pages are not matched again
                var t = new MyLocationTextExtractionStrategy();
                PdfTextExtractor.GetTextFromPage(r, i, t);

                //Loop through each chunk found on this page
                foreach (var p in t.myPoints)
                {
                    if (p.Text.Contains(textToSearch))
                    {
                        //The height is padded by 3 so the patch fully covers the text
                        positions.Add(new TextPosition() { Page = i, X = p.Rect.Left, Y = p.Rect.Bottom, Width = p.Rect.Width, Height = p.Rect.Height + 3 });
                    }
                }
            }
        }
        return positions;
    }
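
GetPosition returns TextPosition objects, which the post also leaves undefined. The sketch below is one possible shape for that class. The property names come from the object initializer used above. The types are assumptions: float is chosen because iTextSharp's Rectangle exposes Left, Bottom, Width and Height as floats.

```csharp
// Assumed data holder (not shown in the original post): describes where
// a match was found. X and Y are the lower-left corner of the match in
// PDF user-space units, as reported by the extraction strategy.
public class TextPosition
{
    public int Page { get; set; }     // 1-based page number
    public float X { get; set; }      // left edge of the match
    public float Y { get; set; }      // bottom edge of the match
    public float Width { get; set; }
    public float Height { get; set; }
}
```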

Removing the Text by applying a White Patch :
1. From the previous code we have identified the positions. Now we need to use this information to add a white patch.
2. We will be using the PdfStamper class to add content over the existing page content.
3. The only caveat is that this procedure doesn't remove the text from the PDF. It only hides it visually from the user.

    //Note: filepath and outputPDFpath must point to different files
    public static void RemoveText(string filepath, string outputPDFpath, List<TextPosition> positions)
    {
        using (Stream inputPdfStream = new FileStream(filepath, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (Stream outputPdfStream = new FileStream(outputPDFpath, FileMode.Create, FileAccess.Write, FileShare.ReadWrite))
        {
            //Opens the unmodified PDF for reading
            var reader = new PdfReader(inputPdfStream);
            //Creates a stamper to put the white patches over the original pdf
            var stamper = new PdfStamper(reader, outputPdfStream) { FormFlattening = true, FreeTextFlattening = true };

            //Adds one white patch per matched position, on the page where it was found
            foreach (var position in positions)
            {
                //A bitmap needs at least 1 pixel in each dimension
                int width = Math.Max(1, (int)Math.Ceiling(position.Width));
                int height = Math.Max(1, (int)Math.Ceiling(position.Height));

                iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(new Bitmap(width, height), BaseColor.WHITE);
                image.SetAbsolutePosition(position.X, position.Y);
                stamper.GetOverContent(position.Page).AddImage(image, true);
            }

            //Writes the patched pdf to the output stream
            stamper.Close();
            reader.Close();
        }
    }
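
To illustrate the caveat from step 3, here is a small sketch that checks whether the searched text can still be extracted from a PDF. The class name PatchCheck and the method name StillContainsText are made up for this example. Because the white patch is only drawn on top of the page, this check would be expected to still find the text in the patched file.

```csharp
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper, not part of the original post.
public static class PatchCheck
{
    // Returns true if the text can still be extracted from any page.
    public static bool StillContainsText(string pdfPath, string textToSearch)
    {
        using (var reader = new PdfReader(pdfPath))
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Uses iTextSharp's default extraction strategy
                string pageText = PdfTextExtractor.GetTextFromPage(reader, page);
                if (pageText.Contains(textToSearch))
                    return true;
            }
        }
        return false;
    }
}
```

A typical run would call GetPosition on the source file, pass the result to RemoveText, and then call PatchCheck.StillContainsText on the output file. If the text must really be gone (for example for redaction), hiding it with a patch is not enough.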

Nirmal

Technical Geek - Gadget Enthusiast - Loves Programming C#,PHP & IOS - Blogger About
