IFilters are used by search engines and other applications, such as the SharePoint server, to read the content of files and documents. For each document type you need a matching IFilter installed on your system.
The missing part is how to write IFilters easily. Of course there is some documentation from Microsoft, but it's heavy C stuff.
By 'accident' I found the solution in a huge project by Stephen Toub, who had already implemented all the needed stuff :-)
Based on his code I implemented an IFilter Template. You just have to implement two functions that report the content of your document, that's all. Look at the sample code for an IFilter that reads TXT files to see how simple it is.
The IFilters based on the IFilter Template work fine, pass all the tests provided by Microsoft, and can be used with SharePoint, MS Desktop Search, and even the new Windows Vista Search.
But there is one big issue with IFilters based on the IFilter Template: they cannot be used by .NET applications. Oops.
The case is tricky.
In .NET you can call real (unmanaged) COM objects; the framework automatically creates an RCW (runtime callable wrapper) for them. That's how reading the content of files is done; see the source code provided above.
With .NET you can also create COM objects. For these the framework automatically creates a CCW (COM callable wrapper) that handles the transition from unmanaged code to managed code. Let's call these COM objects managed COM objects.
What happens if a .NET application calls a managed COM object? In this case both wrappers, the RCW and the CCW, should be created and handle the calls. But reality shows that the framework takes a shortcut and simply makes a call from managed code to managed code. By skipping the COM interface, the .NET Framework enforces an exact match between the managed types. This fails with an invalid cast error, because you cannot guarantee that the writer and the consumer of an IFilter use exactly the same signatures for the interface definition.
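The type mismatch can be reproduced in miniature without any COM activation. The following is only a sketch under assumptions: `IFilterLike` is a hypothetical, heavily simplified stand-in for the real IFilter interface, the GUID is an arbitrary placeholder, and the two namespaces merely simulate the separate assemblies of the filter writer and the filter consumer. It shows that two interface declarations sharing the same GUID and the same shape are still different types for the CLR, so a purely managed cast between them is rejected.

```csharp
using System;
using System.Runtime.InteropServices;

namespace WriterAssembly
{
    // The filter author's own declaration of the interface (hypothetical, simplified).
    [ComVisible(true)]
    [Guid("11111111-2222-3333-4444-555555555555")] // placeholder GUID, not the real IID
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    public interface IFilterLike
    {
        string GetText();
    }

    // A managed "COM object" implementing the author's declaration.
    public class TxtFilter : IFilterLike
    {
        public string GetText() { return "hello"; }
    }
}

namespace ConsumerAssembly
{
    // The consumer's declaration: same GUID, same shape, but a distinct CLR type.
    [ComVisible(true)]
    [Guid("11111111-2222-3333-4444-555555555555")]
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    public interface IFilterLike
    {
        string GetText();
    }
}

public static class CastDemo
{
    // Returns true if the managed-to-managed cast is rejected.
    public static bool ConsumerCastFails()
    {
        object filter = new WriterAssembly.TxtFilter();
        try
        {
            // No COM QueryInterface happens here, so the matching GUID is never
            // consulted; the CLR only checks the managed type identity.
            ((ConsumerAssembly.IFilterLike)filter).GetText();
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ConsumerCastFails() ? "cast rejected" : "cast succeeded");
    }
}
```

With an unmanaged caller in between, the cast would go through COM and be resolved by GUID. In the managed-to-managed case only the type identity counts, which matches the behaviour described above.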
I worked hard to find a solution, but I have not been successful so far. I guess in the future many developers will face the same problem in different areas and hopefully find a solution :-))
The answer from Microsoft helpdesk wasn't very helpful either: "It looks like our product group does not support calling the IFilter interfaces via .Net due to performance and dependency reasons. Given they do not support this for the current product, it's safe to say it's not supported for the currently shipping / legacy products either. In general managed IFilter is not a supported scenario. Loading CLR is very costly to indexing performance and different IFilters could take dependency on different versions of .Net Framework. We currently have no plan to support it."