Converting Word Documents to RTF, HTML, TXT, XML
In this tutorial you'll learn how to open and convert Word documents (.doc) to the popular formats RTF, HTML, TXT and XML. You will also find yourself learning the basics of operating with COM objects.
On Monday, April 10th 2006 at 09:00 AM
By Andrew Pociu
Download this project (Visual Studio 2005)
Opening and manipulating Microsoft Office Word documents (.doc) from a .NET application is fairly easy: you can open, edit and create Word documents with only a few lines of code. However, since the .NET Framework has no classes for the Word document format, the solution is to reference COM objects in your project. The downside is that the user running the application we're going to create in this tutorial will need to have Microsoft Word installed, preferably the same version that the application was designed for.
The application in this tutorial was designed and tested with Microsoft Office version 11, more exactly Microsoft Office Word 2003. On other recent versions the application is likely to work, but it may require a few changes, especially to the Open() and SaveAs() calls, whose parameter lists differ between versions. Therefore, if the attached project doesn't work on your system and you don't have Microsoft Office 2003 installed, that's probably the cause.
Just to make things clear: there is a way to open, edit and save Word documents without requiring Word to be installed. However, it means writing your own .doc parser from scratch, a task that would require an entire team of experienced programmers and for which a language such as C++ might prove more efficient, unless you find a third-party component that does it for you.
Start by creating a C# Windows application project. Add a total of six buttons and one label. Name the buttons btnOpen, btnClose, btnToHtml, btnToRTF, btnToText and btnToXml, and the label lblFilePath. Disable the four convert buttons and the close button (btnClose) by setting their Enabled property to false. We will enable them once the user chooses a file to convert.
There are two more controls you need to add to the project via the Visual Studio Toolbox: an OpenFileDialog and a SaveFileDialog. Name them openDoc and saveDoc. We will use the first dialog (openDoc) to open the MS Word document that we want to convert, so we want to restrict the user to choosing only a Microsoft Word document (.doc). To do that, change the Filter property of the OpenFileDialog to the following value:
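The exact string from the original page is not preserved here. A value of the following shape restricts the dialog to .doc files; the description before the | separator is an illustrative assumption, while the pattern after it is what actually filters the list:

```csharp
// Can be typed into the Filter property in the Properties window,
// or set in code, e.g. in the Form1 constructor after InitializeComponent().
// The description text is illustrative; only the "*.doc" pattern matters.
openDoc.Filter = "Microsoft Word Document (*.doc)|*.doc";
```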
This ensures that the user will only be able to select a Word document. For more details on this object, please see the Using OpenFileDialog to open files tutorial.
As for the other dialog - saveDoc, we're not going to define a filter right now, because the file type to which we're going to save depends on what button the user clicks (To HTML, To RTF, etc.). We're going to define the filter when the user clicks the button, because at that time we know the extension.
Now let's do what is needed to open a Word document. Right-click the project name in Solution Explorer and choose Add Reference. Switch to the COM tab and scroll down until you find Microsoft Word 11.0 Object Library. If you don't have this item listed, you probably don't have Microsoft Office installed, so unfortunately the tutorial ends here for you. If you see a different version of the object library, such as Microsoft Word 10.0 Object Library or Microsoft Word 9.0 Object Library, it means you have an older version of Office. Normally you should be able to adjust the code from this tutorial to match your Word version without much trouble.
After you add the Word Object Library to your project, in Solution Explorer you will see some new items were added:
Now that we have Microsoft.Office.Core, VBIDE and Word added as references, we are ready to start coding. Switch to code view. The first thing we want to do is create three objects in the Form1 class, right above the constructor:
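The original listing is not reproduced here, so the following is a minimal sketch of those three declarations. It assumes the Word 11.0 Object Library reference exposes its types under the Word namespace, as the text describes; the surrounding class is the default Form1 generated by Visual Studio.

```csharp
using System;
using System.Windows.Forms;

public partial class Form1 : Form
{
    // The Word engine, available through the COM reference.
    Word.ApplicationClass WordApp;

    // The document that WordApp opens for us.
    Word.Document WordDoc;

    // Stand-in for every optional parameter we do not want to specify.
    object DocNoParam = Type.Missing;

    public Form1()
    {
        InitializeComponent();
    }
}
```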
The first object is the Word Application Class, which we can access thanks to the COM reference we added earlier. We're going to use this to start the Microsoft Word engine, which will do the work of converting the document to the other formats. WordApp will also be the one opening the document; the document will then be stored inside WordDoc, which is the second object we create.
The third object seems kind of odd: it's an object of the type Missing. The functions we are going to call for opening and saving the document take a large number of parameters, but we'll only want to specify a few of them. For the other parameters that we don't have any values to pass to, we're going to pass this missing object, as in "parameter is missing".
The reason for this small inconvenience is that the COM object was designed mainly with Visual Basic in mind. Instead of offering overloaded methods, it exposes single methods with many optional parameters, and Visual Basic allows the caller to skip the ones that aren't needed. In C#, as used in this tutorial with Visual Studio 2005, we can't skip these parameters, so we have to pass a missing value for each of them, similar to specifying null.
Now that we have these objects ready, we can open the Word document. To do that, double-click btnOpen to create its Click event handler. Use the following code:
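The original handler is not reproduced here. The sketch below shows one way it could look, assuming the Word 11.0 library, where Documents.Open takes 16 by-reference parameters (other Word versions take a different number). The source mentions passing a handful of real values, but which ones is not recoverable, so this sketch sets only the file name and leaves the rest as DocNoParam.

```csharp
private void btnOpen_Click(object sender, EventArgs e)
{
    if (openDoc.ShowDialog() == DialogResult.OK)
    {
        // Start the Word engine through the COM reference.
        WordApp = new Word.ApplicationClass();

        // COM parameters are passed by reference as object.
        object fileName = openDoc.FileName;

        // 16 parameters in the Word 11.0 library; all but the
        // file name are left as "missing".
        WordDoc = WordApp.Documents.Open(
            ref fileName,   ref DocNoParam, ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam, ref DocNoParam, ref DocNoParam);

        // A document is open now, so the convert and close buttons make sense.
        btnToHtml.Enabled = true;
        btnToRTF.Enabled = true;
        btnToText.Enabled = true;
        btnToXml.Enabled = true;
        btnClose.Enabled = true;

        // Remember which file is currently opened.
        lblFilePath.Text = openDoc.FileName;
    }
}
```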
The above code opens the Word document specified by the user in the OpenFileDialog window, enables the convert and close buttons and sets the label to the path of the file just so that we remember which file is opened.
As we discussed before, we pass a handful of values to the parameters of the Documents.Open method, but to most of them we pass a reference to DocNoParam, which contains Type.Missing, meaning plain and simple that we don't want to pass anything to that parameter. Because the Office COM object was designed with Visual Basic in mind, the same line in Visual Basic would be much shorter: there we would only have to pass values to the parameters we are interested in.
Now that we have the Word document opened and can manipulate it as we want, let's accomplish the main task of our program and save this document in different formats. The first button is supposed to save to HTML, so double-click it to get to the Click event handler and use the following code:
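The original handler is not reproduced here; the following is a sketch under the same assumptions as before (Word 11.0 library, where Document.SaveAs takes 16 by-reference parameters). The filter text is illustrative, while SaveToFormat and wdFormatHTML are the names the article itself uses.

```csharp
private void btnToHtml_Click(object sender, EventArgs e)
{
    // We know the target extension now, so the filter is set here.
    saveDoc.Filter = "HTML Document (*.html)|*.html";

    if (saveDoc.ShowDialog() == DialogResult.OK)
    {
        object fileName = saveDoc.FileName;

        // The format to convert to.
        object SaveToFormat = Word.WdSaveFormat.wdFormatHTML;

        // File name and format are specified; the other 14 parameters
        // are left as "missing".
        WordDoc.SaveAs(
            ref fileName,   ref SaveToFormat, ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam,   ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam,   ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam,   ref DocNoParam, ref DocNoParam);
    }
}
```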
As you can see in the code above, when btnToHtml is clicked we prompt the user to save the document in the HTML format. The whole magic is in the object SaveToFormat = Word.WdSaveFormat.wdFormatHTML; line, where we specify the format we wish to use. In this case we specify wdFormatHTML to save the file as an HTML document. Upon clicking this button, the document will be converted from its native .doc format to HTML markup. Along with the HTML file, sometimes a folder is also created that holds the pictures referenced in the HTML document.
For the remaining three buttons the code gets repetitive, with only a few changes to account for the different extension and format.
The C# code for converting to RTF:
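A sketch of the RTF handler, following the HTML pattern described above. It assumes the Word 11.0 library's 16-parameter SaveAs; the filter text is illustrative.

```csharp
private void btnToRTF_Click(object sender, EventArgs e)
{
    saveDoc.Filter = "Rich Text Format (*.rtf)|*.rtf";

    if (saveDoc.ShowDialog() == DialogResult.OK)
    {
        object fileName = saveDoc.FileName;
        object SaveToFormat = Word.WdSaveFormat.wdFormatRTF;

        WordDoc.SaveAs(
            ref fileName,   ref SaveToFormat, ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam,   ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam,   ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam,   ref DocNoParam, ref DocNoParam);
    }
}
```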
The C# code for converting to plain text:
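A sketch of the plain text handler under the same assumptions (Word 11.0 library, illustrative filter text). Only the filter and the WdSaveFormat value differ from the HTML case.

```csharp
private void btnToText_Click(object sender, EventArgs e)
{
    saveDoc.Filter = "Text Document (*.txt)|*.txt";

    if (saveDoc.ShowDialog() == DialogResult.OK)
    {
        object fileName = saveDoc.FileName;
        object SaveToFormat = Word.WdSaveFormat.wdFormatText;

        WordDoc.SaveAs(
            ref fileName,   ref SaveToFormat, ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam,   ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam,   ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam,   ref DocNoParam, ref DocNoParam);
    }
}
```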
The C# code for converting to XML:
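A sketch of the XML handler under the same assumptions. Note that wdFormatXML is, to my knowledge, available in the Word 11.0 library but not in older ones, so this handler may not compile against an earlier Word version.

```csharp
private void btnToXml_Click(object sender, EventArgs e)
{
    saveDoc.Filter = "XML Document (*.xml)|*.xml";

    if (saveDoc.ShowDialog() == DialogResult.OK)
    {
        object fileName = saveDoc.FileName;
        object SaveToFormat = Word.WdSaveFormat.wdFormatXML;

        WordDoc.SaveAs(
            ref fileName,   ref SaveToFormat, ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam,   ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam,   ref DocNoParam, ref DocNoParam,
            ref DocNoParam, ref DocNoParam,   ref DocNoParam, ref DocNoParam);
    }
}
```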
There's one last thing we need to do. Unless we close each document after we open it, instances of WinWord.exe will remain in memory, so you'll want to press the close button before opening another document or closing the application. In the Click event handler of btnClose we tell Word to close the document without saving any changes:
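The original handler is not reproduced here. The sketch below closes the document without saving, as described. Quitting the Word application and resetting the buttons and label are assumptions added so that WinWord.exe actually exits and the form returns to its initial state; the original may have handled this differently.

```csharp
private void btnClose_Click(object sender, EventArgs e)
{
    // Do not save any changes made to the document.
    object saveChanges = Word.WdSaveOptions.wdDoNotSaveChanges;

    // Close the document; the two remaining parameters are left as "missing".
    WordDoc.Close(ref saveChanges, ref DocNoParam, ref DocNoParam);

    // Assumption: also quit the Word instance started in btnOpen_Click,
    // otherwise its WinWord.exe process keeps running.
    WordApp.Quit(ref saveChanges, ref DocNoParam, ref DocNoParam);

    // Assumption: return the form to its initial state.
    btnToHtml.Enabled = false;
    btnToRTF.Enabled = false;
    btnToText.Enabled = false;
    btnToXml.Enabled = false;
    btnClose.Enabled = false;
    lblFilePath.Text = "";
}
```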
Here is the entire application code in case you want to have an overall look: