Converting Word Documents to RTF, HTML, TXT, XML

In this tutorial you'll learn how to open and convert Word documents (.doc) to the popular formats RTF, HTML, TXT and XML. You will also find yourself learning the basics of operating with COM objects.

On Monday, April 10th 2006 at 09:00 AM
By Andrew Pociu (View Profile)
Download this project (Visual Studio 2005)

Opening and manipulating Microsoft Office Word documents (.doc) is fairly easy from the .NET Framework: you can open, edit and create Word documents with only a few lines of code. However, the .NET Framework has no classes of its own for the Word document format, so the solution is to reference Word's COM objects in your project. The downside is that the user running the application we're going to create in this tutorial needs to have Microsoft Word installed, preferably the same version the application was designed for.
The application in this tutorial was designed and tested with Microsoft Office version 11, that is, Microsoft Office Word 2003. It is likely to work with other recent versions, but it may need a few changes, especially to the Open() and SaveAs() calls, which probably differ between versions. So if the attached project doesn't work on your system and you don't have Microsoft Office 2003 installed, that's probably the cause.
Just to make things clear: it is possible to open, edit and save Word documents without Word installed, but that means writing your own .doc parser from scratch, unless you find a third-party component that does it. Building such a parser would take a whole team of experienced programmers, and a language such as C++ might prove more efficient for the job.

[Screenshot: the Convert Word form]

Start by creating a C# Windows application project. Add six buttons and one label. Name the buttons btnOpen, btnClose, btnToHtml, btnToRTF, btnToText and btnToXml, and name the label lblFilePath. Disable the four convert buttons and the close button (btnClose) by setting their Enabled property to false; we will enable them once the user chooses a file to convert. There are two more controls you need to add to the project from the Visual Studio Toolbox: an OpenFileDialog and a SaveFileDialog. Name them openDoc and saveDoc. We will use the first dialog (openDoc) to open the MS Word document we want to convert, so we want to restrict the user to choosing only Microsoft Word documents (.doc). To do that, change the Filter property of the OpenFileDialog to the following value:

Word Document|*.doc

This limits the dialog to showing Word documents, so that is all the user can pick. For more details on this object, please see the Using OpenFileDialog to open files tutorial.
As for the other dialog - saveDoc, we're not going to define a filter right now, because the file type to which we're going to save depends on what button the user clicks (To HTML, To RTF, etc.). We're going to define the filter when the user clicks the button, because at that time we know the extension.
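
The Filter values can also be assigned from code instead of the designer. The sketch below is only an illustration: FilterSketch and BuildFilter are hypothetical names that are not part of the tutorial project, and the helper does nothing more than produce strings in the "description|pattern" form shown above.

```csharp
using System;

// Hypothetical helper, not part of the tutorial project.
static class FilterSketch
{
    // Builds a dialog filter in the "description|pattern" form.
    // A leading dot on the extension is accepted and ignored.
    public static string BuildFilter(string description, string extension)
    {
        return description + "|*." + extension.TrimStart('.');
    }
}
```

With this in place, openDoc.Filter = FilterSketch.BuildFilter("Word Document", "doc"); assigns the same value as the one typed into the designer, and the save handlers could build their own filters the same way.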

Now let's do what is needed to open a Word document. Right-click the project name in Solution Explorer and choose Add Reference. Switch to the COM tab and scroll down until you find Microsoft Word 11.0 Object Library. If this item isn't listed, you probably don't have Microsoft Office installed, so unfortunately the tutorial ends here for you. If you see a different version of the object library, such as Microsoft Word 10.0 Object Library or Microsoft Word 9.0 Object Library, you have an older version of Office. Normally you should be able to adjust the code from this tutorial to match your Word version fairly easily.
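
Because the application depends on Word being present, it can be useful to check for it at run time before trying to start it. The following is a minimal sketch, assuming a Windows machine and assuming that a registered "Word.Application" ProgID is a good enough sign that Word is installed; WordCheckSketch and IsWordInstalled are hypothetical names that are not part of the tutorial project.

```csharp
using System;

// Hypothetical helper, not part of the tutorial project.
static class WordCheckSketch
{
    // Type.GetTypeFromProgID returns null when the ProgID is not registered,
    // which normally means Microsoft Word is not installed on this machine.
    public static bool IsWordInstalled()
    {
        return Type.GetTypeFromProgID("Word.Application") != null;
    }
}
```

A form could call WordCheckSketch.IsWordInstalled() when it loads and disable btnOpen if the result is false. The check says nothing about which Word version is installed.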

[Screenshot: the COM tab of the Add Reference dialog]

After you add the Word Object Library to your project, in Solution Explorer you will see some new items were added:

[Screenshot: the new references in Solution Explorer]

Now that we have Microsoft.Office.Core, VBIDE and Word added as references, we are ready to start coding. Switch to code view; the first thing we want to do is create three objects in the Form1 class, right above the constructor:

private Word.ApplicationClass WordApp;
private Word.Document WordDoc;
private object DocNoParam = Type.Missing;


The first object is the Word Application class, which we can access thanks to the COM reference we added earlier. We're going to use it to start the Microsoft Word engine, which will do the work of converting the document to the other formats. WordApp will also be the one opening the document; the document will then be stored in WordDoc, the second object we create.
The third object seems kind of odd: it holds Type.Missing. The functions we are going to call for opening and saving the document take a handful of parameters, but we only want to specify a few of them. For the parameters we have no values for, we're going to pass this missing object, as in "parameter is missing".
The reason for this small inconvenience is that the Word COM object model was meant to be used mainly from Visual Basic, where methods are not overloaded and the caller is allowed to skip optional parameters instead. In the version of C# used here we can't skip these parameters, so we have to pass a missing value for each of them, similar to specifying null.
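
To make the difference between a missing value and null more concrete, here is a small sketch that needs no reference to Word. MissingSketch and IsMissing are hypothetical names used only for this illustration; the only framework members involved are Type.Missing and System.Reflection.Missing.Value.

```csharp
using System;
using System.Reflection;

// Hypothetical helper, not part of the tutorial project.
static class MissingSketch
{
    // True when the argument is the "no value supplied" placeholder.
    // Type.Missing refers to the single Missing.Value instance, so a
    // reference comparison is enough.
    public static bool IsMissing(object value)
    {
        return object.ReferenceEquals(value, Type.Missing);
    }
}
```

IsMissing(Type.Missing) and IsMissing(Missing.Value) both return true, while IsMissing(null) returns false. In other words DocNoParam is a real object that stands for "nothing was passed", which is why it can be handed to the ref parameters of the Word methods.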

Now that we have these objects ready, we can open the Word document. To do that, double-click btnOpen to create its Click event handler. Use the following code:

private void btnOpen_Click(object sender, EventArgs e)
{
   // If the user chose a file to open
   if (this.openDoc.ShowDialog() == DialogResult.OK)
   {
      // Start Word only once a file has been chosen, and reuse the instance on later clicks
      if (WordApp == null)
      {
         // Create an instance of the Word Application
         WordApp = new Word.ApplicationClass();
         // We don't want to display the Microsoft Word window
         WordApp.Visible = false;
      }

      // Set the label to the new file path
      lblFilePath.Text = openDoc.FileName;
      // Enable the convert and close buttons, since now we have a document opened
      btnToHtml.Enabled = true;
      btnToRTF.Enabled = true;
      btnToText.Enabled = true;
      btnToXml.Enabled = true;
      btnClose.Enabled = true;

      // Create and set the objects we're going to pass to the Open() function
      object DocFileName = openDoc.FileName;
      object DocReadOnly = false;
      object DocVisible = true;

      // Open the document by passing the path
      WordDoc = WordApp.Documents.Open(ref DocFileName, ref DocNoParam, ref DocReadOnly, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocVisible, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam);
      WordDoc.Activate();
   }
}


The above code opens the Word document the user picked in the OpenFileDialog window, enables the convert and close buttons, and sets the label to the path of the file so that we remember which file is open.
As we discussed before, we pass a handful of values to the parameters of the Documents.Open method, but to most of them we pass a reference to DocNoParam, which contains Type.Missing, meaning plain and simple that we don't want to pass anything to that parameter. The Office COM object model was designed with the Visual Basic language in mind, which is why this line would be much shorter in Visual Basic: there we would only have to pass values to the parameters we are interested in.

Now that we have the Word document opened and can manipulate it as we want, let's accomplish the main task of our program and save the document in different formats. The first button is supposed to save to HTML, so double-click it to get to its Click event handler and use the following code:

private void btnToHtml_Click(object sender, EventArgs e)
{
   // Suggest a path for saving
   saveDoc.FileName = @"C:\Test Document.html";
   // The file extension to which we want to save
   saveDoc.Filter = "HTML Files|*.html";

   // If the user chose a path where to save the file
   if (this.saveDoc.ShowDialog() == DialogResult.OK)
   {
      // Set the save path object
      object SaveToPath = saveDoc.FileName;
      // Set the format type to HTML (wdFormatHTML)
      object SaveToFormat = Word.WdSaveFormat.wdFormatHTML;
      // Save the document to the specified path and format
      WordDoc.SaveAs(ref SaveToPath, ref SaveToFormat, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam);
   }
}


As you can see in the code above, when btnToHtml is clicked we prompt the user for a location to save the document in HTML format. The whole magic is in the object SaveToFormat = Word.WdSaveFormat.wdFormatHTML; line, where we specify the format we wish to use. In this case we specify wdFormatHTML to save the file as an HTML document. Upon clicking this button, the document is converted from its .doc format to HTML tags. Along with the HTML file, Word sometimes also creates a folder holding the pictures referenced in the HTML document.

For the remaining three buttons the code gets repetitive, with only a few changes to account for the different format and extension.

The C# code for converting to RTF:

private void btnToRTF_Click(object sender, EventArgs e)
{
   // Suggest a path for saving
   saveDoc.FileName = @"C:\Test Document.rtf";
   // The file extension to which we want to save
   saveDoc.Filter = "RTF Files|*.rtf";

   // If the user chose a path where to save the file
   if (this.saveDoc.ShowDialog() == DialogResult.OK)
   {
      // Set the save path object
      object SaveToPath = saveDoc.FileName;
      // Set the format type to RTF (wdFormatRTF)
      object SaveToFormat = Word.WdSaveFormat.wdFormatRTF;
      // Save the document to the specified path and format
      WordDoc.SaveAs(ref SaveToPath, ref SaveToFormat, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam);
   }
}


The C# code for converting to plain text:

private void btnToText_Click(object sender, EventArgs e)
{
   // Suggest a path for saving
   saveDoc.FileName = @"C:\Test Document.txt";
   // The file extension to which we want to save
   saveDoc.Filter = "Text Files|*.txt";

   // If the user chose a path where to save the file
   if (this.saveDoc.ShowDialog() == DialogResult.OK)
   {
      // Set the save path object
      object SaveToPath = saveDoc.FileName;
      // Set the format type to TXT (wdFormatText)
      object SaveToFormat = Word.WdSaveFormat.wdFormatText;
      // Save the document to the specified path and format
      WordDoc.SaveAs(ref SaveToPath, ref SaveToFormat, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam);
   }
}


The C# code for converting to XML:

private void btnToXml_Click(object sender, EventArgs e)
{
   // Suggest a path for saving
   saveDoc.FileName = @"C:\Test Document.xml";
   // The file extension to which we want to save
   saveDoc.Filter = "XML Files|*.xml";

   // If the user chose a path where to save the file
   if (this.saveDoc.ShowDialog() == DialogResult.OK)
   {
      // Set the save path object
      object SaveToPath = saveDoc.FileName;
      // Set the format type to XML (wdFormatXML)
      object SaveToFormat = Word.WdSaveFormat.wdFormatXML;
      // Save the document to the specified path and format
      WordDoc.SaveAs(ref SaveToPath, ref SaveToFormat, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam);
   }
}
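
Since the four convert handlers differ only in the suggested file name, the filter and the format constant, the long SaveAs() call could be moved into one shared helper. This is a sketch rather than part of the downloadable project: WordSaveSketch and its SaveAs method are hypothetical names, and the code assumes the Microsoft Word 11.0 Object Library reference (the Word namespace) used throughout this tutorial, where Document.SaveAs() takes 16 parameters.

```csharp
using System;

// Hypothetical helper, not part of the tutorial project.
// Requires the Microsoft Word 11.0 Object Library COM reference.
static class WordSaveSketch
{
    public static void SaveAs(Word.Document doc, string path, Word.WdSaveFormat format)
    {
        object missing = Type.Missing;
        object savePath = path;
        object saveFormat = format;

        // Only the file name and the format are supplied;
        // the other 14 parameters are passed as missing
        doc.SaveAs(ref savePath, ref saveFormat,
            ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing);
    }
}
```

Inside btnToRTF_Click, for example, the body of the if block would then shrink to WordSaveSketch.SaveAs(WordDoc, saveDoc.FileName, Word.WdSaveFormat.wdFormatRTF); and the other handlers would pass their own format constant.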


There's one last thing we need to do. Because the Word window is hidden, a document we open stays open in a background WinWord.exe process until we close it, so you'll want to press the close button before opening another document or closing the application. In the Click event handler of btnClose we tell Word to close the document without saving any changes:

private void btnClose_Click(object sender, EventArgs e)
{
   // Since we don't want to save changes to the original document
   object SaveChanges = false;
   // Close the document, save no changes
   WordDoc.Close(ref SaveChanges, ref DocNoParam, ref DocNoParam);
}
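
Closing the document does not necessarily end the Word process that was started for it; the hidden WinWord.exe may keep running until the application object is told to quit. The sketch below shows one way this could be handled, for example when the form is closing. WordCleanupSketch and CloseAndQuit are hypothetical names, and the code assumes the same Word 11.0 Object Library reference as the rest of the tutorial.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the tutorial project.
// Requires the Microsoft Word 11.0 Object Library COM reference.
static class WordCleanupSketch
{
    public static void CloseAndQuit(Word.Document doc, Word.ApplicationClass app)
    {
        object missing = Type.Missing;
        // We don't want to save changes to the original document
        object saveChanges = false;

        if (doc != null)
        {
            // Close the document, save no changes
            doc.Close(ref saveChanges, ref missing, ref missing);
        }

        if (app != null)
        {
            // Ask the hidden Word instance to exit
            app.Quit(ref saveChanges, ref missing, ref missing);
            // Release the COM wrapper so this process no longer holds on to Word
            Marshal.ReleaseComObject(app);
        }
    }
}
```

A caller would typically set its WordDoc and WordApp fields to null afterwards, so that a later click on the open button starts a fresh Word instance. If the document was already closed with btnClose, pass null for the document.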


Here is the entire application code in case you want to have an overall look:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

namespace OpenWord
{
   public partial class Form1 : Form
   {
      private Word.ApplicationClass WordApp;
      private Word.Document WordDoc;
      private object DocNoParam = Type.Missing;

      public Form1()
      {
         InitializeComponent();
      }

      private void btnOpen_Click(object sender, EventArgs e)
      {
         // If the user chose a file to open
         if (this.openDoc.ShowDialog() == DialogResult.OK)
         {
            // Start Word only once a file has been chosen, and reuse the instance on later clicks
            if (WordApp == null)
            {
               // Create an instance of the Word Application
               WordApp = new Word.ApplicationClass();
               // We don't want to display the Microsoft Word window
               WordApp.Visible = false;
            }

            // Set the label to the new file path
            lblFilePath.Text = openDoc.FileName;
            // Enable the convert and close buttons, since now we have a document opened
            btnToHtml.Enabled = true;
            btnToRTF.Enabled = true;
            btnToText.Enabled = true;
            btnToXml.Enabled = true;
            btnClose.Enabled = true;

            // Create and set the objects we're going to pass to the Open() function
            object DocFileName = openDoc.FileName;
            object DocReadOnly = false;
            object DocVisible = true;

            // Open the document by passing the path
            WordDoc = WordApp.Documents.Open(ref DocFileName, ref DocNoParam, ref DocReadOnly, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocVisible, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam);
            WordDoc.Activate();
         }
      }

      private void btnToHtml_Click(object sender, EventArgs e)
      {
         // Suggest a path for saving
         saveDoc.FileName = @"C:\Test Document.html";
         // The file extension to which we want to save
         saveDoc.Filter = "HTML Files|*.html";

         // If the user chose a path where to save the file
         if (this.saveDoc.ShowDialog() == DialogResult.OK)
         {
            // Set the save path object
            object SaveToPath = saveDoc.FileName;
            // Set the format type to HTML (wdFormatHTML)
            object SaveToFormat = Word.WdSaveFormat.wdFormatHTML;
            // Save the document to the specified path and format
            WordDoc.SaveAs(ref SaveToPath, ref SaveToFormat, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam);
         }
      }

      private void btnToRTF_Click(object sender, EventArgs e)
      {
         // Suggest a path for saving
         saveDoc.FileName = @"C:\Test Document.rtf";
         // The file extension to which we want to save
         saveDoc.Filter = "RTF Files|*.rtf";

         // If the user chose a path where to save the file
         if (this.saveDoc.ShowDialog() == DialogResult.OK)
         {
            // Set the save path object
            object SaveToPath = saveDoc.FileName;
            // Set the format type to RTF (wdFormatRTF)
            object SaveToFormat = Word.WdSaveFormat.wdFormatRTF;
            // Save the document to the specified path and format
            WordDoc.SaveAs(ref SaveToPath, ref SaveToFormat, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam);
         }
      }

      private void btnToText_Click(object sender, EventArgs e)
      {
         // Suggest a path for saving
         saveDoc.FileName = @"C:\Test Document.txt";
         // The file extension to which we want to save
         saveDoc.Filter = "Text Files|*.txt";

         // If the user chose a path where to save the file
         if (this.saveDoc.ShowDialog() == DialogResult.OK)
         {
            // Set the save path object
            object SaveToPath = saveDoc.FileName;
            // Set the format type to TXT (wdFormatText)
            object SaveToFormat = Word.WdSaveFormat.wdFormatText;
            // Save the document to the specified path and format
            WordDoc.SaveAs(ref SaveToPath, ref SaveToFormat, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam);
         }
      }

      private void btnToXml_Click(object sender, EventArgs e)
      {
         // Suggest a path for saving
         saveDoc.FileName = @"C:\Test Document.xml";
         // The file extension to which we want to save
         saveDoc.Filter = "XML Files|*.xml";

         // If the user chose a path where to save the file
         if (this.saveDoc.ShowDialog() == DialogResult.OK)
         {
            // Set the save path object
            object SaveToPath = saveDoc.FileName;
            // Set the format type to XML (wdFormatXML)
            object SaveToFormat = Word.WdSaveFormat.wdFormatXML;
            // Save the document to the specified path and format
            WordDoc.SaveAs(ref SaveToPath, ref SaveToFormat, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam);
         }
      }

      private void btnClose_Click(object sender, EventArgs e)
      {
         // Since we don't want to save changes to the original document
         object SaveChanges = false;
         // Close the document, save no changes
         WordDoc.Close(ref SaveChanges, ref DocNoParam, ref DocNoParam);
      }
   }
}

Current Comments
by veera prasad on Monday, July 24th 2006 at 07:12 AM

Hi..Andrei Pociu .
Gr8 work man, I got an errorr regarding this code.While im using Microsoft Word 10.0 Object Library. its working well coverstion of word to text.
But my task is Covert Word to XML so im using Microsoft Word 11.0 Object Library .. then im getting an error Object reference not set to an instance of an object. at codeline : WordDoc = WordApp.Documents.Open() ... can u pls tell what will i do to fix this one
Thanks & Regards,
Veera Prasad

by veera prasad on Monday, July 24th 2006 at 07:15 AM

im using vs.net1.1 , Ms-office 2003.

by veera prasad on Tuesday, July 25th 2006 at 07:28 AM

Hi Andrei Pociu..

sorry yar,
Problem is from side..

Code is excelent .. Thanks for sharing such a Valuable Resouce..

Thanks alottt..

by Kiran on Thursday, June 12th 2008 at 02:19 AM

Hi Andrei

I appreciate your work and the way you have approached.

It's really helpful for a person who has even zero knowledge in C# programming.

But unfortunately when i tried this i am unable to see any diffferneces when its get converted to .html and .xml and more over it is throwing some errors.

I could be definitely wrong with my side of coding , but just in case can u let me know whatz the problem like where i am wrong.

Thank you

by prasad on Wednesday, October 15th 2008 at 11:25 AM

Nice presentation.

by thanks on Thursday, November 27th 2008 at 08:03 PM

that i need

by hamid on Thursday, April 11th 2013 at 04:53 PM

How I can convert doc file into pdf file?
