Creating LaTeX PDF reports in Drupal

Generating print-ready documents from web applications can be a challenging task.

Anyone who has faced such a task has probably been tempted to write a print stylesheet and simply offer a cleaner version of the online page. It is a useful trick, but the result is not a printable document, just a page that is a little less unprintable. The same can be said of tools that generate a PDF from an HTML page, e.g., DOMPDF.

The problem is that these solutions do not produce documents that are actually ready for print; they simply produce a document that does not look terrible when printed. A print-ready document should be aware of page boundaries, should have page numbers, should carefully handle tables and other objects that may span more than one page, and so on.

The solution I like is to use LaTeX to render the PDF.

I implemented such a solution with Drutex for an internal system built on Drupal, and now I am doing it again for another system built on Drupal 7.

The Drutex module does not work well yet, so I decided to write a custom module instead. Of course, it would be nicer to help Drutex evolve, but I have a short deadline, and it is much easier to write a custom module for my own needs than a generic module meant to fit everyone's needs, as Drutex is.

What I did was basically create two custom modules: one for PDF generation and another for the functions that write the reports.

I like to keep the PDF generation (the part that calls pdflatex) in an independent module, simply because I may reuse it in another application. This module has a function that calls pdflatex, some template files, and a function that fixes some problems in converting plain text or HTML to LaTeX.
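
The text-fixing function itself is not shown in the post. As a rough illustration only, a minimal sketch of such a function could escape LaTeX's special characters and reduce simple HTML to plain text before escaping. The names latex_escape_text() and latex_from_html() are hypothetical, and the HTML handling assumes simple markup (paragraphs and line breaks); anything richer would need a real HTML-to-LaTeX converter.

```php
<?php
# Hypothetical helper: turns plain text into text that is safe to drop
# into a LaTeX document by escaping LaTeX's special characters.
function latex_escape_text($text) {
    # strtr() with an array does a single pass, so the backslashes and
    # braces it inserts are never escaped a second time.
    return strtr($text, array(
        "\\" => "\\textbackslash{}",
        "{"  => "\\{",
        "}"  => "\\}",
        "\$" => "\\\$",
        "&"  => "\\&",
        "#"  => "\\#",
        "%"  => "\\%",
        "_"  => "\\_",
        "^"  => "\\textasciicircum{}",
        "~"  => "\\textasciitilde{}",
    ));
}

# Hypothetical helper for simple HTML: paragraph ends become blank lines
# (a LaTeX paragraph break), <br> becomes a newline, the remaining tags are
# dropped and entities such as &amp; are decoded before escaping.
function latex_from_html($html) {
    $text = preg_replace(array('#</p>#i', '#<br\s*/?>#i'), array("\n\n", "\n"), $html);
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES, 'UTF-8');
    return latex_escape_text(trim($text));
}
```

For example, latex_escape_text('50% of R&D costs $100_000') returns 50\% of R\&D costs \$100\_000. The single-pass strtr() matters: a sequence of str_replace() calls could, depending on the order, re-escape the backslashes added by an earlier replacement.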

The other module, the one that writes the reports, consists basically of functions that write LaTeX code and pass it to the PDF generating module. Keep in mind that, inside double-quoted strings, PHP treats a backslash ("\") followed by certain characters as an escape sequence (\n, \t, \r, \f, \$ and others), so the backslash of a LaTeX command must be doubled: "\\textbf" instead of "\textbf". PHP also has single-quoted strings, in which only \\ and \' are special. For consistency, I always use double-quoted strings.
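
A small plain-PHP illustration of this pitfall (the variable names are made up for the example):

```php
<?php
# In double quotes "\t" is a tab and "\n" a newline, so LaTeX commands
# written with a single backslash are silently mangled.
$wrong = "\textbf{Total}";      # tab followed by "extbf{Total}"
$broken_page = "\newpage";      # newline followed by "ewpage"

# Doubling the backslash (or using single quotes) keeps the command intact.
$right = "\\textbf{Total}";     # \textbf{Total}
$also_right = '\textbf{Total}'; # in single quotes \t is just two characters

# A LaTeX line break (\\) therefore needs four backslashes in double quotes.
$line_break = "First line \\\\ Second line"; # First line \\ Second line

# "$" starts variable interpolation in double quotes, so math mode needs "\$".
$math = "\$x^2\$";              # $x^2$

echo $right, PHP_EOL, $line_break, PHP_EOL, $math, PHP_EOL;
```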

Below is the main function of the PDF generating module. It receives a name, the content and a template, creates a .tex file, runs pdflatex on it and delivers the resulting PDF to the user. It is not ready-to-use code yet, but it works for me and may be a good reference for anyone who needs to do something similar. Explanations are in the code comments. (Note that "segep_latex" is simply the name of my module.)


/*
 * Creates a PDF with pdflatex from the given content.
 * The content and its name must be safe:
 * the name will be used in the shell and the content will be used in LaTeX.
 */
function segep_latex_generate_pdf($name, $content, $template = "basic_report.tex") {
    # Folder that receives the files generated by pdflatex.
    # It is set here instead of at file level: Drupal includes module files from inside a
    # function, so top-level variables in a .module file are not globals and
    # "global $tex_directory" would be empty.
    $tex_directory = drupal_realpath('public://tex') . '/';
    # Folder that contains the template files (segep_latex is the name of my PDF generating module)
    $template_directory = drupal_get_path('module', 'segep_latex') . '/templates/';
    # Template files are complete LaTeX documents, written freely while working with LaTeX.
    # (A LaTeX editor such as Gummi helps.)
    # pdflatex needs a complete LaTeX document to work, so each template is a separate file
    # containing a keyword that is replaced by the content received.
    # The keyword here is CONTENTTOREPLACE.
    $template_content = file_get_contents($template_directory . $template);
    # Remove from $name any characters that may cause trouble in the file system.
    # The list of forbidden characters will surely be longer, but this is a start.
    # Replace every forbidden character with an underscore.
    $forbidden_chars = array("/", ".", " ", "?", ";", ":", ",", "|");
    $name = str_replace($forbidden_chars, "_", $name);
    # First step: create a .tex file with the content we want to render.
    # The web server user (e.g. Apache) needs write permission on this folder.
    $tex_file = fopen($tex_directory . $name . '.tex', "w");
    if ($tex_file) {
        # The file was created, so write to it.
        # First replace the template keyword with the content.
        # Do not filter LaTeX commands out of the content here: they should already have been
        # filtered, and any that are left are intentional.
        $file_content = str_replace('CONTENTTOREPLACE', $content, $template_content);
        # Now write the content to the file and close it, so that pdflatex reads the complete file.
        fwrite($tex_file, $file_content);
        fclose($tex_file);
        # Time to run pdflatex and create the PDF file.
        # It must run at least twice for the indexes to be built from the toc, lot and lof files.
        # In my case, due to the longtable package, more than two runs are needed for the tables
        # to be properly aligned, so it runs four times.
        # -interaction=nonstopmode keeps pdflatex from waiting for keyboard input on errors,
        # and escapeshellarg() protects the shell command.
        $pdflatex = "cd " . escapeshellarg($tex_directory) . " && pdflatex -interaction=nonstopmode " . escapeshellarg($name . ".tex");
        for ($run = 0; $run < 4; $run++) {
            shell_exec($pdflatex);
        }
        # The work is done. Let's deliver it.
        $path_to_the_file = 'public://tex/' . $name . '.pdf';
        if (file_exists($path_to_the_file)) {
            $http_headers = array(
                'Content-Type' => 'application/pdf',
                'Content-Disposition' => 'attachment; filename="' . $name . '.pdf"',
                'Content-Length' => filesize($path_to_the_file),
            );
            # I read somewhere that IE has trouble receiving files this way; these headers fix it.
            # They must be set before file_transfer(), which sends the file and ends the request.
            if (isset($_SERVER['HTTP_USER_AGENT']) && strpos($_SERVER['HTTP_USER_AGENT'], 'MSIE') !== FALSE) {
                $http_headers['Cache-Control'] = 'must-revalidate, post-check=0, pre-check=0';
                $http_headers['Pragma'] = 'public';
            } else {
                $http_headers['Pragma'] = 'no-cache';
            }
            file_transfer($path_to_the_file, $http_headers);
        } else {
            drupal_set_message('The pdf file could not be created','error');
        }
    }else{
        drupal_set_message('The tex file could not be created', 'error');
    }
}
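
The parts of the function that are easiest to get wrong (the file name, the shell command and the download headers) can be pulled out into plain PHP helpers and tested without Drupal or pdflatex. This is a sketch only: tex_safe_name(), pdflatex_command() and pdf_download_headers() are hypothetical names, and tex_safe_name() swaps the forbidden-character list for a whitelist, which is a different technique from the one used in the function.

```php
<?php
# Hypothetical helpers (not part of the original module), testable
# without Drupal or pdflatex.

# Whitelist approach: anything other than a letter, digit, underscore or
# hyphen becomes an underscore, so the name is safe as a file name.
function tex_safe_name($name) {
    return preg_replace('/[^A-Za-z0-9_-]/', '_', $name);
}

# One shell command that enters $directory and runs pdflatex $runs times;
# && makes each step run only if the previous one succeeded.
function pdflatex_command($directory, $name, $runs = 4) {
    $run = "pdflatex -interaction=nonstopmode " . escapeshellarg($name . ".tex");
    return "cd " . escapeshellarg($directory) . " && "
        . implode(" && ", array_fill(0, max(1, $runs), $run));
}

# Download headers, including the Internet Explorer workaround, built
# before anything is sent to the browser.
function pdf_download_headers($name, $size, $user_agent) {
    $headers = array(
        'Content-Type' => 'application/pdf',
        'Content-Disposition' => 'attachment; filename="' . $name . '.pdf"',
        'Content-Length' => $size,
    );
    if (strpos($user_agent, 'MSIE') !== FALSE) {
        $headers['Cache-Control'] = 'must-revalidate, post-check=0, pre-check=0';
        $headers['Pragma'] = 'public';
    } else {
        $headers['Pragma'] = 'no-cache';
    }
    return $headers;
}
```

For example, tex_safe_name('Report 2014/05: final?') returns Report_2014_05__final_. Inside segep_latex_generate_pdf() these could be used as shell_exec(pdflatex_command($tex_directory, $name)) and file_transfer($path_to_the_file, pdf_download_headers($name, filesize($path_to_the_file), $user_agent)). Note that escapeshellarg() quotes with double quotes on Windows, so the exact command string differs there.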

The report-writing module basically writes LaTeX code using data from Drupal. I like to use the function views_get_view_result() to retrieve Drupal content. This way I can use the power of the Views module to build reports and, most importantly, hand the task of building views to other members of the team while I focus on coding.
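
As one possible shape for such report-writing code, the sketch below turns rows taken from a view into a longtable, the package the function's comments mention for multi-page tables. It is a hypothetical helper: Views result rows are objects whose property names depend on each view's configuration, so it assumes the rows have already been reduced to arrays of strings that are already escaped for LaTeX.

```php
<?php
# Hypothetical helper: $headers is a list of column titles and $rows a list
# of arrays of cell strings, all assumed to be escaped for LaTeX already.
function rows_to_longtable(array $headers, array $rows) {
    # One left-aligned column per header, e.g. {ll} for two columns.
    $latex = "\\begin{longtable}{" . str_repeat("l", count($headers)) . "}\n";
    # Header row; \endhead makes longtable repeat it at the top of every page.
    $latex .= "\\hline\n" . implode(" & ", $headers) . " \\\\\n\\hline\n\\endhead\n";
    foreach ($rows as $row) {
        # Cells are separated by & and each row ends with \\.
        $latex .= implode(" & ", $row) . " \\\\\n";
    }
    $latex .= "\\hline\n\\end{longtable}\n";
    return $latex;
}

echo rows_to_longtable(array('Name', 'Total'), array(array('Ana', '3'), array('Bruno', '5')));
```

The template must load the package (\usepackage{longtable}) for this to compile, and the returned string can be concatenated with the rest of the report before being passed as $content to segep_latex_generate_pdf().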
