Moving Documents in SharePoint 2010 without assigning new ECM Document IDs

9. February 2012

Using the new ECM Document ID service in SharePoint 2010 poses a challenge when it comes to moving documents in code, for example in a custom workflow that routes documents from one location to another. By default, moving documents with custom code triggers the Document ID provider to assign new IDs to the documents in their new location. This happens whether you are using the default provider or a custom Document ID provider.

Fortunately, there is a way to programmatically add a document with a preset Document ID. All you need to do is add the document with the two properties _dlc_DocId and _dlc_DocIdPersistId, as shown in the code example below.

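using System.Collections;   // Hashtable
using System.IO;            // Stream
using Microsoft.SharePoint; // SPFile, SPFolder
private SPFile AddFile(SPFolder folder, string filename, Stream content, string docId)
{
    var properties = new Hashtable();
    properties["_dlc_DocId"] = docId;            // the existing Document ID to keep
    properties["_dlc_DocIdPersistId"] = "True";  // tell the provider not to generate a new ID
    return folder.Files.Add(filename, content, properties);
}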

When _dlc_DocIdPersistId is "True", the Document ID provider assigns the ID supplied in the _dlc_DocId property rather than generating a new one.
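The snippet above covers only the "add" half of a move. A minimal sketch of a complete move could read the ID from the source item, add the file to the target folder with the same two properties, and only then delete the original. It assumes the source file lives in a document library where the Document ID feature is active. DocIdMover and MoveFilePreservingId are hypothetical names, and only the file content and the Document ID are carried over; other metadata and version history are not copied in this sketch.

```csharp
using System.Collections;   // Hashtable
using System.IO;            // Stream
using Microsoft.SharePoint; // SPFile, SPFolder

public static class DocIdMover
{
    // Hypothetical helper: moves a file to another folder while keeping its
    // existing Document ID. Assumes the file belongs to a document library.
    // Only content and Document ID are copied; other metadata is not.
    public static SPFile MoveFilePreservingId(SPFile source, SPFolder destination)
    {
        // Read the ID the provider assigned in the original location.
        string docId = source.Item["_dlc_DocId"] as string;

        var properties = new Hashtable();
        if (!string.IsNullOrEmpty(docId))
        {
            properties["_dlc_DocId"] = docId;
            properties["_dlc_DocIdPersistId"] = "True"; // keep docId instead of generating a new one
        }

        SPFile copy;
        using (Stream content = source.OpenBinaryStream())
        {
            copy = destination.Files.Add(source.Name, content, properties);
        }

        // Delete the original only after the copy has been created.
        source.Delete();
        return copy;
    }
}
```

From a custom workflow you might call it as DocIdMover.MoveFilePreservingId(file, web.GetFolder("Archive")), where "Archive" is a placeholder for your target library URL. Adding the copy before deleting the source means a failed add leaves the original document in place.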

The Document ID service in SharePoint is pretty nice and useful, but it does have some caveats that you should be aware of before you decide to employ it in your organization. I am preparing another post on this topic, so stay tuned!

SharePoint 2010, ECM

SharePoint 2010 Search: The Good News and the Bad News

29. January 2010

Microsoft SharePoint Server 2010 is just around the corner now, and it will once again raise the bar in the SharePoint Enterprise Search space. I first saw the new SP2010 search experience at the SharePoint 2009 conference in Vegas last year and was overall quite pleased with what I saw there. It looks like the search team in Redmond has really listened to the community and to customers and has addressed many of the annoying pain points present in the SharePoint Server 2007 search experience.

However, working with SP2010 Beta 2 for a while now has revealed some pain points and annoyances that remain in the product. Some of them I can understand from a technical standpoint, while others just make me wonder how they could miss them again. Don't get me wrong here: I'm still a big fan of SharePoint Search and just want this part of the product to be even closer to perfect. Seriously, enterprise search is an increasingly vital part of most SharePoint deployments.

Anyway, I am dedicating this post to sharing the good news and the bad news I have learned so far from working with SharePoint Server 2010 Search. Please note that FAST Search Server 2010 for SharePoint is not included in my evaluation here; technically and financially, it is a whole other ball game.

The Good News

  • Improved relevance. More parameters are included in the score calculation. One cool new parameter is the click-through rate on search results, also known as popularity ranking. Other parameters include URL fuzzy matching, social tags, inferred metadata, detected language and implicit phrase matching.
  • Enhanced query syntax. Enables power users to build advanced queries using Boolean operators such as AND, OR and NOT. There is also support for the range operators <, >, <= and >= for searching numeric or date ranges.
  • Wildcard search. It is now possible to search for partial words using the wildcard character *. For example, search for: Micro* author:bill*
  • Enhanced multilingual support. Improved language detection from document text and better word breakers for more languages, with better handling of compound words.
  • Phonetic and nickname search. Useful in people search to match similar names with different spellings, e.g. a search for Chris also returns people named Kris or Christopher. I really like this new feature, as it makes people search much more useful.
  • Faceted search, aka refiners. Presents users with a list of relevant suggestions for refining the search results by document type, site, author, modified date, tags or any other managed property available in the index. The refiners, as Microsoft likes to call them, offer a very simple and intuitive way to filter results by metadata.
  • Query suggestions. Presents a list of relevant search terms as you type; these are known as pre-query suggestions. There are also post-query suggestions, which are simply a Web part listing related queries. The suggestions are based on past queries from other users.
  • Improved "Did you mean?" suggestions. Support for more languages.
  • View in browser. A link in the search results for viewing Office documents with full fidelity directly in the browser. Requires Office Web Apps 2010 to be installed on the server. This feature is very useful to users who do not have the Office clients installed or who do not always want to download the entire document.
  • Open Web Parts. The OOB search Web parts are no longer sealed! Consequently, it will be much easier for developers to build their own custom search Web parts simply by extending the built-in ones. With SharePoint 2007 you would have to build your own from scratch, which is a very daunting task.
  • New Connector Framework. This is the next evolutionary step of the Business Data Catalog introduced in SharePoint Server 2007. Indexing external content has become a lot easier thanks to much better tool support in the form of the new SharePoint Designer 2010. Hooking up SharePoint to index content from a database is now a no-brainer: point SPD at your database to automatically reverse-engineer a BDC model, then deploy that model to the indexer via the admin UI. To index more complex and dynamic repositories, developers can now also build custom connectors in managed code (the old C++ protocol handler API is still supported). Other great improvements over the old BDC are the ability to index document attachments and item security (ACLs).
  • Improved admin dashboard. Offers a few improvements to the search admin dashboard introduced in SharePoint 2007 with the infrastructure update.
  • New health analysis tool. Can generate reports useful for performance monitoring, capacity planning and troubleshooting.
  • PowerShell scripting. Enables administrators to automate virtually all search administration tasks by using Windows PowerShell 2.0 scripts.
  • New and improved deployment architecture. The search system has been componentized much more for improved performance, scalability and availability. With enough servers, SharePoint Search now scales to about 100 million documents while maintaining fresh indexes and sub-second query latency. The most welcome improvement is without doubt the support for multiple stateless crawlers (aka indexers) on the same content source. Another biggie is support for partial indexes, i.e. splitting a large index across multiple query servers.
  • Better support for indexing case sensitive repositories.
  • Improved Search Analytics. SP2010 more or less includes the same type of search analytics reports as we know from SharePoint Server 2007. But they have received some nice improvements like nicer graphics and the ability to view data in any date range. Also, it is now possible to create custom reports thanks to a new and documented Data Warehouse.
  • Desktop search integration in Windows 7.

The Bad News

  • The same old advanced search Web part. It looks like it was brought over from SharePoint 2007 as is; it still does not offer a good parametric search experience with property value drop-downs and the like. Users will still need to know and type possible metadata values. However, with the introduction of faceted search, this shortcoming is not as severe as it was for SharePoint Server 2007 when it came out. Furthermore, the enhanced query syntax will make it a hell of a lot easier for developers to create their own advanced search Web part that helps users construct complex queries.
  • Sub-optimal navigation experience. If search is great, users will adopt it as a navigation tool. But in an SP2010 search center there is no way to navigate from a document search result to the document library where the document lives. It can also be hard for users to navigate back from the search center to the site they initiated the search from. These issues can be fixed with a little customization, but it would have been nice to see a better user experience out of the box.
  • Inconsistent search UI. The problem I am referring to here is that searching the "This site" scope from the search box does not take the user to a search center; instead it takes her to the SP2010 Foundation search page in the _layouts folder. All other search scopes take her to the full search center. In other words, you still have two different search interfaces in SharePoint. My recommendation is to turn off the "This site" scope, as results can be refined by site in the search center anyway.
  • No Visual Best Bets. I have heard Microsoft presenters get all excited about the Visual Best Bet feature available with FAST Search. But it is really nothing special, just a Best Bet with an image! Seriously, this feature should also be available with the built-in search engine. More generally, the Best Bets feature does not seem to have received any improvements since SharePoint 2007 whatsoever.
  • Changes to managed properties still require a full crawl. Managed properties are still there and are administered exactly the same way as in SharePoint 2007. That is good, but there is still one major annoyance that I had hoped MS would find a solution for: adding a new managed property or changing an existing one requires a new full crawl. This should not be necessary, as the metadata is already indexed and searchable via a crawled property.
  • No push based indexing. It is not possible to notify the search index about immediate/important changes to content. The index will still have to wait for the crawler to stop by and pick up the changes for the index.
  • Incomplete indexing of system metadata. The indexer does not pick up all system metadata on documents. Forget about finding crawled properties for document information like CheckOutStatus, CheckedOutBy and CheckedOutDate. Then there is the ContentTypeId, which is indexed, but apparently only for Office documents in the new 2007 format. A properly indexed ContentTypeId would make hierarchical searching on content types possible.
  • No document preview with hit highlighting. Unfortunately, this feature is only available for FAST Search, and even there it does not look very convincing except for PowerPoint documents.
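The advanced search bullet above notes that the enhanced query syntax makes it easier to build your own advanced search Web part. A minimal sketch of the query-building part of such a Web part follows, assuming the form fields are already mapped to managed property names. KqlBuilder and Build are hypothetical names: free text and property restrictions are joined with an upper-case AND, and multi-word values are quoted as phrases. It sticks to .NET 3.5 APIs, since SharePoint 2010 code targets .NET 3.5.

```csharp
using System.Collections.Generic;

// Hypothetical helper for a custom advanced search Web part: turns form input
// (free text plus property/value pairs) into a single keyword query string.
public static class KqlBuilder
{
    public static string Build(string freeText, IEnumerable<KeyValuePair<string, string>> restrictions)
    {
        var parts = new List<string>();
        if (freeText != null && freeText.Trim().Length > 0)
            parts.Add(freeText.Trim());

        foreach (var r in restrictions)
        {
            string value = (r.Value ?? "").Trim();
            if (value.Length == 0) continue; // skip empty form fields

            // Quote multi-word values so they are treated as a phrase.
            if (value.Contains(" ")) value = "\"" + value + "\"";
            parts.Add(r.Key + ":" + value);
        }

        // Boolean operators are written in upper case in keyword query syntax.
        return string.Join(" AND ", parts.ToArray());
    }
}
```

For example, Build("Micro*", new[] { new KeyValuePair<string, string>("author", "bill*") }) returns Micro* AND author:bill*, the wildcard example from the list above. The resulting string could then be used as query text, for instance via the QueryText property of a KeywordQuery object.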

Fortunately, the list of bad news is shorter than it was for SharePoint Server 2007 back then. Let us see whether we identify more good and bad news as we get to know SharePoint 2010 Search better. I would love to hear from you out there: have you found other good or bad news on the topic?

SharePoint Search

#SPC09 Day 3 – SharePoint 2010 is BIG

22. October 2009

We are now at the end of day 3 at the SharePoint 2009 conference at Mandalay Bay in Las Vegas. Tomorrow is the last day of the conference, which adjourns shortly after noon. After three days of sessions loaded with details on SharePoint 2010 it has already become clear that this release is a major leap forward from SharePoint 2007. Microsoft has managed to improve the product across the board, improving existing features as well as adding many new interesting features. SharePoint 2007 already looks old now! The new version looks like a much more complete and robust platform with no obvious shortcomings.

Specifically, SharePoint 2010 delivers significant improvements to the following areas of the product:

  • Scalability and Reliability
  • Web User Interface
  • Web Content Management
  • Document Management
  • Records Management
  • Workflow
  • Offline Content
  • Metadata, Taxonomies and Folksonomies
  • Enterprise Search
  • Business Intelligence
  • Usage Analytics
  • Business Data Catalog, now known as Business Connectivity Services (BCS)
  • My Sites, Wikis and Blogs
  • Backup/Restore
  • Hosting
  • Upgrade tools
  • Developer tools
  • APIs and Web Services
  • Administration experience
  • PowerShell Scripting

Follow the blog posts listed on Planet SharePoint to learn what others are saying about all the new stuff.

Today, I attended more sessions on Enterprise Search and learned a few new things in addition to what I picked up in the many search sessions yesterday. The most interesting search session today was the one on the new Connector Framework in the SharePoint Server 2010 search engine. This new framework makes it easier to connect the search engine crawler to back-end Line-Of-Business systems. It is easier because MOSS 2007 required developers to write complex protocol handlers, whereas in SP2010 it is just a matter of firing up SharePoint Designer and configuring an external content type, which is also a new concept. The connection in turn works through the improved BCS framework. The connector framework also supports crawling with custom .NET code. With the new frameworks and much better tools, it is for instance a breeze to index data residing in an external SQL table, and it is done without writing any code! On a side note, the BCS framework can also surface the data as if it were a SharePoint list with full add, edit and delete capabilities.

The built-in connectors for indexing SharePoint, Web sites and file shares are still implemented as regular protocol handlers. Interestingly, the SharePoint connector has received significant improvements in terms of performance and stability. The Microsoft presenter (Sid Shah) mentioned an example of a large installation where MOSS 2007 needed three weeks to complete a full crawl. Upgrading to SP2010 Search brought this down to four days.

SharePoint 2010

#SPC09 Day 2 – An Intense Day on Enterprise Search

21. October 2009

Today was a big search day at the SharePoint 2009 conference in Las Vegas: five sessions packed with information on new and improved features in SharePoint 2010 Enterprise Search. I attended all the sessions but one and have learned a great deal about the new search experience since my last post on Sunday.

My general impression of search from the sessions is very good. It seems like Microsoft has really done a good job of listening to customers, partners and the community and has fixed some major pain points identified with SharePoint 2007 Search. On top of that, we are also going to see some new and very useful features. But before I start enumerating features, it is helpful to know that Microsoft will offer three enterprise search products:

  • Search Server 2010 Express. Free enterprise search engine with certain restrictions. I don't know the details yet, but my guess is that it will be limited to single-server deployments and will lack a few features like Business Connectivity Services (BCS) and People Search.
  • SharePoint Server 2010. Ships with the full version of the Enterprise Search engine including BCS and People Search. Suitable for customers without extreme scalability and extensibility requirements. Scales to approx. 100 million documents.
  • FAST Search Server 2010 for SharePoint. For customers with extreme scalability, functionality and extensibility requirements. Scales to billions of documents and thousands of concurrent queries.

The following list of new and improved features applies to the first two products:

End-User Improvements

  • Improved relevance. More parameters included in score calculation. An important new parameter is click-through rate on search results also known as popularity ranking, i.e. popular results tend to bubble upwards. Also support for customizing the relevance algorithm.
  • Enhanced query syntax. Support for Boolean operators like AND, OR and NOT. Also support for the range operators <, >, <= and >= for searching numeric or date ranges.
  • Wildcard search. Now possible to search for partial words using the wildcard character *. E.g. Micro* author:bill*
  • Phonetic and nickname search. Useful in people search to match similar names with different spellings. E.g. queries for people named “Jacob” also return people named “Jakob”.
  • Faceted search with support for refining search results on any managed property configured to support refinement. Offers a very intuitive way to filter results on metadata – will make advanced search less relevant to most users.
  • Query suggestions as you type. Mined from query log.
  • Improved did you mean. Support for more languages.
  • View in browser. From the search results, users can now view Microsoft Office documents directly in the Web browser without having the Office client installed. Requires the new Office Web Apps on the server.
  • Desktop search integration.

Administration Improvements

  • Improved admin dashboard.
  • Common admin experience across all three search products.
  • PowerShell. All administration features are scriptable via PowerShell.
  • Search Reporting. Extensible search analytics reporting.

Scalability and Reliability Improvements

  • New search engine architecture. Provides better scale-out and scale-up options.
  • Multiple crawlers (formerly known as index servers) can now work in parallel building an index. This turns crawling into a redundant role and enables greater crawl speeds. A Microsoft presenter reported 175 docs/sec with four crawl servers.
  • Partial indexes. The full index is no longer required to reside on each query server. It can now be split over multiple query servers for greater query performance. Can also be mirrored for increased availability.

Extensibility Improvements

  • Open Web parts. All the search Web parts are now public and open for inheritance, making it possible to add more functionality with a few lines of code.
  • Common UI framework across all search products including FAST.
  • Common APIs across all search products including FAST.
  • New connector framework for indexing LOB systems. Essentially a managed code replacement for protocol handlers. C/C++ protocol handlers are, however, still supported.

Other Improvements

  • Improved Wordbreaker with support for more languages.

I cannot say much about the improvements to FAST Search Server 2010 for SharePoint, other than that it can do everything the standard SharePoint 2010 search engine can do, plus a lot more.

SharePoint 2010

In Vegas and Ready for the SharePoint Conference 2009

19. October 2009

The time has come that many other SharePointers and I have been anxiously looking forward to: the SharePoint 2009 conference in Las Vegas, where we will learn about all the new goodies SharePoint 2010 has to offer. I am here together with about 230 fellow Danes! Altogether, the conference will accommodate about 7000 attendees! This is a very impressive figure that shows just how much momentum SharePoint has gained.

This morning (after a tasty but unhealthy breakfast buffet at the Luxor) I went to register for the conference and to get the SWAG. I have to admit that I was a little disappointed that it did not include a cool new SharePoint 2010 bag or a t-shirt with the new SharePoint logo on it. The best item in the SWAG is a 250-page book with technical details about all the improvements in SharePoint 2010. I have studied it and will share some of its content here in a moment.

There is a lot of new stuff in SharePoint 2010, too much for me to cover it all here. Hence I will stick to my favorite topic, namely SharePoint Search. My plan is to cover the news on Search, starting today with the stuff I have just learned from the book. The next four days offer a good deal of sessions on SharePoint Search, and I will publish technical details from these on a daily basis. The conference offers about 230 different sessions across all SP2010 topics, 13 of which are specific to Enterprise Search.

New and Improved Features in SharePoint Server 2010 Search

Here is the news I got so far from studying the book handed out to all conference attendees.

  • Improved user experience.
  • Improved relevance ranking.
  • Faceted search. The search user experience now includes this concept, which in MOSS 2007 was available via the free faceted search Web part on CodePlex.
  • Improved people search with support for Wildcard search and phonetic name search. Faceted search is also available here.
  • Improved Search based on Social Behavior. Support for social tagging and rating of content, which will influence relevance. Another cool thing is that SharePoint tracks the Click-through rate on results to detect popular results. This information is in turn used to adapt and improve the relevance score from user behavior.
  • Re-architected Search. SP2010 includes a new query architecture and a new crawling architecture supporting greater redundancy and improvements to scaling up and out. The biggest change is the introduction of index partitions, allowing a huge index to be spread across multiple query servers. Hence, SP2010 now scales to about 100 million documents compared to 50 million in SharePoint 2007.
  • Improved development experience. Improved APIs and tools allowing developers to extend search and build applications on it.

This list is by no means the complete list of improvements to SP2010 search; we will surely learn more during the next four days. Also, please note that the news above applies to the standard search engine that ships with SharePoint 2010, not to the FAST Enterprise Search engine. The book does not say much about FAST, other than that there will be a new product named FAST Search for SharePoint. It will provide a more conversational and visual user experience, including document previews and a more advanced faceted search component capable of delivering deep refinements with exact counts.

Sessions that I Plan to Attend

  • Monday 19/10: Enterprise Search Overview
  • Tuesday 20/10: SharePoint Server 2010 Search: Capabilities Deep Dive, FAST Search for SharePoint: Capabilities Deep Dive, Deploying FAST Search for SharePoint, Search Relevance and Relevance Tuning
  • Wednesday 21/10: A Tour of Great Enterprise Search Applications, Overview of Content Acquisition for Search in SharePoint 2010, Solving Information Chaos: Advanced Content Processing with FAST Search for SharePoint, Social Search in SharePoint 2010
  • Thursday 22/10: Customizing Search in SharePoint: Building Great Sites with Search

SharePoint 2010