When looking for reference material, I sometimes get a PDF file whose content is stored as images. As a result I cannot copy the text (lol... bad habit). One way to turn a scanned PDF into HTML is to use Google: when a PDF file appears in the search results, Google offers an option to view it as HTML. This method only works for PDF files that Google has indexed, and it only displays the first 20 pages.
There is another way: convert the PDF file into a set of images (.png, .jpg), scan those images with OCR, and output an HTML or text file.
Since I (the author) use the GNU/Linux operating system (Ubuntu), the method below only applies to GNU/Linux (for the Windows version, ask someone else!).
Let's get started...
First, install xpdf, imagemagick, and ocropus:
sudo apt-get install xpdf imagemagick ocropus
Use the author's script, linked from the original post, to do the conversion. Save it with the name "pdf2txt" (without the quotes) and make it executable:
chmod +x pdf2txt
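The script's code is only linked, so here is a minimal sketch of what a script like pdf2txt could do, under stated assumptions: ImageMagick's convert renders the pages (it needs Ghostscript to read PDFs), and Tesseract is swapped in for OCRopus as the OCR engine. It follows the same steps described above (PDF to page images, OCR on each image, HTML on standard output), so it works with the same "> file.html" redirection.

```shell
#!/bin/sh
# pdf2txt (sketch): scanned PDF -> page images -> OCR -> HTML on stdout.
# Assumptions: ImageMagick's `convert` can read PDFs (needs Ghostscript),
# and Tesseract (`tesseract`) stands in for OCRopus as the OCR engine.
set -eu

# Escape the characters that are special in HTML text (& must go first).
html_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

pdf2html() {
  pdf=$1
  tmp=$(mktemp -d)
  # Remove the temporary page images when the script exits.
  trap 'rm -rf "$tmp"' EXIT

  # 1. Render every page at 300 dpi; %04d keeps the files in page order.
  convert -density 300 "$pdf" "$tmp/page-%04d.png"

  echo "<html><body>"
  for img in "$tmp"/page-*.png; do
    # 2. OCR one page; tesseract writes its text to <base>.txt.
    #    Its messages are silenced so they do not end up in the HTML.
    tesseract "$img" "${img%.png}" >/dev/null 2>&1
    # 3. Emit the recognized text of this page as an escaped <pre> block.
    echo "<pre>"
    html_escape < "${img%.png}.txt"
    echo "</pre>"
  done
  echo "</body></html>"
}

if [ "$#" -ne 1 ]; then
  echo "usage: $0 file.pdf > file.html" >&2
else
  pdf2html "$1"
fi
```

To try the sketch, install Tesseract as well (sudo apt-get install tesseract-ocr). Using OCRopus instead means replacing only the tesseract line. On some Ubuntu releases ImageMagick's security policy (policy.xml) blocks PDF input; if convert refuses the file, that policy is the likely cause.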
How to use: suppose you want to convert "makalah.pdf" into an HTML file. Use the command:
./pdf2txt makalah.pdf > makalah.html
The command above converts "makalah.pdf" and writes the result to "makalah.html".
by: Alvin