Why bother converting in the first place? In my case it’s because I have a large document that needs revisions, and I don’t want to have to fight with Word again to get it ready for printing. Using LaTeX separates the content from the format, so I can concentrate on the words and let LaTeX place the page breaks, line breaks, hyphenation, figures, tables, footnotes etc. in the best place when the document is compiled.

For technical writing, LaTeX is amazing, and it has by far the easiest method of entering mathematical equations. Just writing

$N_x = \frac{k \pi^2 D}{b^2}$

produces:

$latex N_x = \frac{k\pi^2D}{b^2}$.

And, a long document just looks so much better in LaTeX. Here’s an example from my .docx conversion:

[Gallery: example pages from the converted document]

Process

Going from Word to LaTeX is not a single step: there is currently no method to convert directly, and any conversion method will introduce errors. This is not least because a Word document is often poorly structured, but also because cross-references, citations, figures and equations don’t normally transfer well.

My preferred process is:

  1. Open the Word document in LibreOffice Writer.

  2. Remove any items that don’t transfer (for instance, all the drawings and equation boxes in my document floated to the top of the first page, each with no contents).

  3. Save as an .odt file.

  4. Open the odt in Abiword (with the LaTeX export plugin installed).

  5. Save as a TeX file. Or convert in one step with:

abiword --to=tex document.odt

  6. The output from Abiword will not be very clean, but running something like:
<code class="block">sed -i -e "/setlength{.oddsidemargin/ d" \
       -e "/setlength{.textwidth/ d" \
       -e "/begin{flushleft}/ d" \
       -e "/end{flushleft}/ d" \
       -e "/begin{flushright}/ d" \
       -e "/end{flushright}/ d" \
       -e "/begin{center}/ d" \
       -e "/end{center}/ d" \
       -e "/begin{spacing}/ d" \
       -e "/end{spacing}/ d" \
       -e 's/\\[lL]arge//g' \
       -e 's/{``}/``/g' \
       -e 's/{`}/`/g' document.tex</code>

[1]

should remove a lot of cruft, though it could leave some hanging curly brackets that will cause non-fatal errors when compiling.
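To see what the cleanup does before running it on a real file, it can help to try it on a tiny sample. The sketch below is only an illustration: the sample lines are invented to resemble Abiword output, only a subset of the expressions is used, and `-i` is dropped so nothing is modified in place. It assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical sample of Abiword-style output (invented for illustration).
sample=$(mktemp)
cat > "$sample" <<'EOF'
\setlength{\oddsidemargin}{0.25in}
\begin{flushleft}
{\large Introduction}
\end{flushleft}
{``}quoted text
EOF

# A subset of the expressions above, without -i, so the sample file is left
# untouched and the cleaned text is captured from standard output.
cleaned=$(sed -e "/setlength{.oddsidemargin/ d" \
              -e "/begin{flushleft}/ d" \
              -e "/end{flushleft}/ d" \
              -e 's/\\[lL]arge//g' \
              -e 's/{``}/``/g' "$sample")
rm -f "$sample"

printf '%s\n' "$cleaned"
```

With this sample, the margin and flushleft lines are deleted, `\large` is stripped (the braces around the heading survive), and the braced opening quotes are unwrapped. Leftover braces like these are the sort of thing worth checking by hand afterwards.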

I also had a lot of resized text in tables. To remove the size attributes while preserving the data, I used:

sed -r -e 's/\{\\(script|footnote)size ([^\}]*[\\%]*)\}/\2/g' document.tex
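As a quick check of that expression, here is a minimal sketch run against a single made-up table row (the row contents are an assumption, not taken from my document). The second capture group keeps the cell text, including a trailing `\%`, and the surrounding size group is dropped. It assumes GNU sed for the `-r` flag.

```shell
#!/bin/sh
# Hypothetical table row with resized cells (invented for illustration).
row='{\scriptsize 50\%} & {\footnotesize Total} \\'

# Same expression as above: group 2 captures the cell text plus any trailing
# backslashes or percent signs, and the whole braced group is replaced by it.
stripped=$(printf '%s\n' "$row" |
  sed -r -e 's/\{\\(script|footnote)size ([^\}]*[\\%]*)\}/\2/g')

printf '%s\n' "$stripped"
```

Note that without `-i` the command only prints the result, so redirect it to a new file (or add `-i`) once the output looks right.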

  7. Now transfer the missing images from the original document, and replace any missing equations. Be sure to do the cross-references too.

  8. Format the bibliography and add the citations.
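The two automatic conversions in the process above (Word to .odt, then .odt to TeX) can be chained in a small script. This is only a sketch under some assumptions: LibreOffice's `soffice` is on the PATH, the input file sits in the current directory, and the manual tidy-up in LibreOffice Writer is skipped, so empty drawing and equation boxes may still need removing afterwards. The names `run`, `docx_to_tex` and `DRY_RUN` are made up for this example. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Dry run by default: print commands instead of executing them.
# Run with DRY_RUN=0 to perform the conversion for real.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

docx_to_tex() {
  src=$1                 # e.g. document.docx, in the current directory
  base=${src%.docx}      # file name without the .docx extension
  # LibreOffice writes document.odt into the current directory.
  run soffice --headless --convert-to odt "$src"
  # Abiword (with the LaTeX export plugin) then writes the .tex file.
  run abiword --to=tex "$base.odt"
}

plan=$(docx_to_tex document.docx)
printf '%s\n' "$plan"
```

Keeping the dry run as the default makes it easy to check the file names before anything is overwritten.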

Exporting Images

If the document you are transferring already has images embedded from external sources, then adding them to the LaTeX document is fairly straightforward. But if you’ve got Word drawings, or Excel graphs, you’ll want to extract them from the original document.

I don’t know a way of doing this to all the figures at once, and I don’t like copy-and-paste as this gives raster (bitmap) images, but it is possible to go one-by-one and copy them out in a vector format.

  1. From Word, save/export the file as a pdf (this may require a Microsoft plugin).

  2. Import one page of the pdf into Inkscape.

  3. Remove all artifacts other than the diagram/graph.

  4. Resize the drawing canvas to match the drawing.

  5. Save the svg.

  6. Save a copy as pdf+LaTeX, that way any text is separate, and will be rendered as body text when the final document is processed.

Then add a figure with:

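<code class="block">% Preamble: \usepackage{import} provides \import;
% the .pdf_tex file also needs graphicx (and color for coloured text).
\begin{figure}
  \centering
  \def\svgwidth{\columnwidth}
  \import{images/}{file.pdf_tex}
  \caption{}
  \label{im:}
\end{figure}</code>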
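Once the svg is saved, the pdf+LaTeX export can also be done from the command line instead of the Save a Copy dialog. The sketch below assumes Inkscape 1.0 or later, where the flags are `--export-area-drawing`, `--export-latex` and `--export-filename`; older versions use different option names. The helper names `run` and `export_pdf_tex` and the path `images/file.svg` are placeholders for this example. It prints the command by default rather than running it.

```shell
#!/bin/sh
# Dry run by default: print the command instead of executing it.
# Run with DRY_RUN=0 to call Inkscape for real.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

export_pdf_tex() {
  svg=$1
  pdf=${svg%.svg}.pdf    # Inkscape writes the text layer alongside as .pdf_tex
  # --export-area-drawing crops the export to the drawing;
  # --export-latex splits the text out so LaTeX typesets it.
  run inkscape --export-area-drawing --export-latex \
      --export-filename="$pdf" "$svg"
}

plan=$(export_pdf_tex images/file.svg)
printf '%s\n' "$plan"
```

Because `--export-area-drawing` crops to the drawing, it covers much the same ground as resizing the canvas by hand, though it is still worth checking the result in a compiled document.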

Source: [1] charlietanksley.net/philtex/converting-to-latex/

Some of the best introductions to LaTeX, though, are the LaTeX book on wikibooks.org and The Not So Short Introduction to LaTeX 2e (Scribd).