Tuesday, July 19, 2011
Daniel Antion has an interesting and well-thought-out article called “Can Records Change” at the Association for Information and Image Management. His question concerns what we do about changes in data about a document, or metadata. I’m thrilled that he brought up this topic, because it’s one I’m passionate about. Let’s think about some reasons this information changes, and maybe we can shed some light on his question:
- The underlying document changed. This is probably one of the most common reasons for metadata changing; people make changes to documents all the time. The contents may have been modified; the subject could have been modified; authors added; review information changed and so on.
- Linked information changed. This is less common, and many document management systems don’t handle it correctly or at all. Consider a situation where we link to a Person record on our line of business system. We may store some of the fields from that record in the document management system, such as Surname or City; things that may make it easier to find the document down the line. So we capture an Application form for a “Ms. Jones”, but 6 months later we find out that she’s got married and her new name is “Mrs. Smith”. Do we leave the original record data as it is? Curse ourselves for storing Line of Business data in our DM system? Change the data, accepting that a search for “Ms. Jones” now won’t find a document that plainly says “Ms. Jones” on it?
- Information captured incorrectly. Depressingly common; we obviously want the correct information. However, our auditors and lawyers will possibly also want the original metadata; especially if processing or business decisions were made using that information.
- Extra information added. Our processing workflow might well add metadata to the document; storing information about the processing steps undertaken; approvals gained; signatures affixed and so on. This doesn’t change the original document or metadata but must be accessible as well.
- Our metadata schema changes. This is also depressingly common, where we change what fields can/must be captured against a document type. Much as we all like to think we can plan perfectly, and much as our clients love to believe they understand their requirements fully, the truth is different. Think about a scenario where we’ve been in operation for 3 months when the client comes in and tells us that they need a “Category” field added to the document type. Great; we can add it, but what about the existing documents that don’t have it? Does this mean that we have to add it as an optional field? In too many systems the answer is yes. Now, a couple of months later they change their mind. “Get rid of it”, the client commands. What happens to the documents captured with the data? If we restored the field sometime in the future would their data have been lost? Again, too many systems have “yes” as the answer to that question.
Okay, so now that we’ve had a look at some of the reasons that a document’s metadata can change, we can see some requirements coming out. Our hypothetical metadata system must keep a version history, and must keep it in such a way that previous versions’ data is still accessible in searches. Needless to say, audit information about who, what, when and why must be stored against each metadata change. The system must be flexible to schema changes, allowing fields to be added later, even if mandatory, as well as allowing them to be removed and even restored.
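To make those requirements a little more concrete, here is a minimal sketch of an append-only metadata history in Python. The class names, field names and sample values are all invented for illustration; this is one possible shape for such a store, not a description of how any particular product does it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class MetadataVersion:
    """One snapshot of a document's metadata plus its audit details."""
    fields: dict
    changed_by: str   # who
    reason: str       # why
    changed_at: datetime  # when


@dataclass
class DocumentMetadata:
    """Append-only history: an edit adds a version, nothing is overwritten."""
    versions: list = field(default_factory=list)

    def update(self, changes, changed_by, reason, changed_at=None):
        # Start from the latest snapshot and apply only the changed fields.
        current = dict(self.versions[-1].fields) if self.versions else {}
        current.update(changes)
        self.versions.append(MetadataVersion(
            fields=current,
            changed_by=changed_by,
            reason=reason,
            changed_at=changed_at or datetime.now(timezone.utc),
        ))

    def current(self):
        return self.versions[-1].fields

    def matches(self, field_name, value, include_history=True):
        """True if the latest (or, optionally, any earlier) version has this value."""
        candidates = self.versions if include_history else self.versions[-1:]
        # .get() tolerates versions captured before a field existed in the schema.
        return any(v.fields.get(field_name) == value for v in candidates)


doc = DocumentMetadata()
doc.update({"Surname": "Jones", "City": "Durban"}, "capture_clerk", "initial capture")
doc.update({"Surname": "Smith"}, "records_admin", "customer married")
```

With this shape, `doc.matches("Surname", "Jones")` still finds the document after the rename, while `doc.current()` reports the corrected surname. Because each version is just a dictionary, a field added to the schema later is simply absent from older versions rather than forcing a migration.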
Additionally, when we keep a version history, we must also consider whether we want a bitemporal system: a system which stores not only what did happen, but also what should have happened. For example, we only updated “Ms. Jones” to “Mrs. Smith” yesterday, but she sent us the documentation 2 months ago and we should have done it then. A bitemporal system caters for such a situation, allowing you to see both the “Operational Truth” of how events actually occurred and the “Business Truth” of how events were supposed to happen.
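A rough sketch of the bitemporal idea, using the surname example. Each value carries two dates: when it became true for the business and when the system recorded it. The `Fact` type, the `as_of` helper and the specific dates are assumptions made up for this illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """A field value with two time axes."""
    value: str
    valid_from: date   # when it became true in the business world
    recorded_on: date  # when the system learned about it


def as_of(facts, valid_on, known_on):
    """Value that applied on `valid_on`, using only what was recorded by `known_on`."""
    visible = [f for f in facts
               if f.recorded_on <= known_on and f.valid_from <= valid_on]
    if not visible:
        return None
    # The latest business date wins; a later recording breaks ties (a correction).
    return max(visible, key=lambda f: (f.valid_from, f.recorded_on)).value


surname = [
    Fact("Jones", valid_from=date(2011, 1, 10), recorded_on=date(2011, 1, 10)),
    # Captured on 18 July, but effective from when the paperwork arrived in May.
    Fact("Smith", valid_from=date(2011, 5, 18), recorded_on=date(2011, 7, 18)),
]

# Operational truth: what the system believed on 1 June.
operational = as_of(surname, valid_on=date(2011, 6, 1), known_on=date(2011, 6, 1))
# Business truth: what we now know was correct for 1 June.
business = as_of(surname, valid_on=date(2011, 6, 1), known_on=date(2011, 7, 19))
```

Here `operational` is "Jones" and `business` is "Smith": the same question about the same day gives two answers, depending on whether you ask what the system knew then or what it knows now.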
As you can see, what seems like a simple topic of changing information becomes complicated very quickly. It’s important that your document management system handle these complexities in an intuitive manner. Almost every system I’ve ever seen falls over when it comes to metadata. The most usual reason is that most systems are designed around their underlying database, and that database doesn’t handle one or more of the scenarios I’ve outlined above. For example, a relational database like SQL Server can’t cater for schema changes correctly without a great deal of work that frankly isn’t worth the effort. Other systems use a more hierarchical store which handles the schema changes nicely, but they struggle with efficient bitemporal access and, most importantly, tend to have rotten performance.
Do you know of other systems that can efficiently handle all of the above reasons for metadata changing? What about scenarios I’ve left out?
Want to change your metadata reliably, accurately and quickly? Signate 2010 handles all of the above scenarios well due to its unique and innovative design.
Wednesday, July 06, 2011
In AIIM, Laurence Hart makes a number of comments about the Search vs Folders debate.
Claim 1: People are used to folders, Rebuttal: They are used to search as well
Not a good enough reason to stick with folders. By that logic we shouldn’t use search engines to access the Web, we should organise it into folders instead, just like Yahoo used to do. You see, people are also used to search engines, they use them every day of the week. In fact, more and more people are using search based idioms rather than folder based idioms to access their systems. Look at the search box built into the Windows 7 menu, and every Windows Explorer window.
Claim 2: Search Engines fail, Rebuttal: So does everything else
He claims that we should have folders as a fallback position in case the search engine doesn’t work. Well, if your technology is based around an unreliable bolt-on search engine (looks meaningfully at SharePoint), then yes, this is a valid concern. If your entire system is designed around search, then the search engine is the core and any folder-based view would be the bolt-on, and thus would be the one more likely to fail. All systems can fail from time to time, but that is not a good reason to not use an entire class of technology.
“Cars sometimes break down, so we should all use horses.”
This is a classic example of the logical fallacy of the excluded middle.
Claim 3: Folders help you organise, Rebuttal: Why manually organise?
I’m actually not sure what point he is trying to make here exactly. He goes into taxonomies, and how folders help users create a “well-executed taxonomy”, and how creating a taxonomy without folders sacrifices performance and simplicity. Not one person I’ve ever spoken to about their requirements from a document management system has ever mentioned the word taxonomy. Not one. Ever.
I will not deny that a proper taxonomy is easier to do with folders than without. I will even admit that should a system have a bolt-on taxonomy system this will likely be less performant and simple than a system designed around taxonomies. What I deny is the need for taxonomies at all. Search and metadata are all that is required to search billions of documents, and they require zero extra effort.
He then admits that these taxonomies change, and systems must be put in place to manage these transitions. I’ve never once had to redesign search, it’s search for goodness sake; and if your data changes, just reindex it! Want to add a metadata field? Reindex. No manual effort, let the computer do it for you.
Claim 4: Not using folders cripples systems, Rebuttal: Only if the developers were idiots
This claim boggles my mind. Let me quote: “One of the problems that you get when you don’t use folders is that you can cripple most systems. While few systems claim a limit to the number of documents that can reside in one location, there is a practical limit”. I’m pretty sure that what he’s talking about here is the well-known reality that operating systems struggle when directories fill up with more than a few thousand files.
He seems to be conflating the experience of the system from the outside (i.e. the user sees no folders), with the implementation details of the inside (i.e. does that mean the system stores every document in one huge directory). This is utter rubbish. Signate, as an example, creates an internal directory structure which documents are routed to in a balanced fashion, ensuring that no directory winds up with too many documents. This structure is internal to the system and is a performance and management implementation detail. It is not exposed outside the system at all.
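One common way to get this kind of balancing is to derive the storage directory from a hash of the document identifier. The sketch below shows that general technique only; it is an assumption for illustration and not a description of Signate's actual internal layout. The function name, the `store` root and the two-level scheme are all made up.

```python
import hashlib
from pathlib import PurePosixPath


def storage_path(document_id, root="store", levels=2, width=2):
    """Map a document id to a nested directory chosen by its hash."""
    digest = hashlib.sha256(document_id.encode("utf-8")).hexdigest()
    # Take `levels` slices of `width` hex characters as directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, document_id)
```

With two levels of two hex characters there are 256 x 256 = 65,536 leaf directories, and because the hash spreads identifiers roughly evenly, no single directory fills up faster than the others. Adding a level multiplies the number of directories by 256 again. None of this is visible to the user, who still just searches.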
In fact this argument of his is an excellent example of why folder based systems don’t work as well as search based ones. While Signate automatically balances files across a directory structure designed to allow billions of documents per node, no such balancing can be applied when humans are involved. Every folder-based system I’ve ever seen winds up with a “dump” location, sometimes more than one, where documents which don’t fit the taxonomy neatly are placed. This can swiftly grow to thousands of documents, resulting in the very problem that Laurence claims search based systems suffer from. Sure, if the taxonomy was perfect, this would not arise; and this is also a sign that the taxonomy may need to change, resulting in a great deal of manual work. In a search-based system with balancing, this situation never arises. This is not just a punt for Signate; I’ve never seen a search-based document storage without balancing, and I struggle to comprehend that anyone would ever conceive of designing such a system.
He then claims that “You can swear that nobody will ever browse to [the internal storage] location, but unless you remove that capability, someone will do it”. Well, of course we remove it! I consider it a massive security breach if people are able to access the internal document location of the system without passing through the interface to the system. Does he allow users to access his internal company databases directly? Of course not.
Claim 5: Search Engines can’t read your mind reliably, Rebuttal: nothing can
Neither can folders. Search engines help you find what you’re looking for; folders let you know where you’re looking. Which would you rather have? We don’t need perfect reliability; you can refine your search terms based on the results you see. Too many results? Add search terms. Too few? Remove some. Make some more approximate, tighten up others. Signate allows an enormous range of searching options, including approximate search where words similar to the specified word are found.
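As a rough illustration of what approximate search means, the sketch below uses `difflib.get_close_matches` from Python's standard library to find indexed words that resemble a mistyped term. This is a generic similarity technique chosen for the example, not Signate's algorithm; the word list and the cutoff value are assumptions.

```python
import difflib


def approximate_matches(term, vocabulary, cutoff=0.75):
    """Return indexed words that are similar to `term`, best match first."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=5, cutoff=cutoff)


indexed_words = ["invoice", "invoices", "involve", "insurance", "voice"]
hits = approximate_matches("invoce", indexed_words)
```

The mistyped "invoce" ranks "invoice" first, while clearly unrelated words such as "insurance" fall below the cutoff. Raising the cutoff tightens the search and lowering it makes it more approximate, which is the refine-as-you-go loop described above.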
Conclusion
Clearly, I’m biased. I’m so convinced of the value of search-based document management that we created one ourselves. Laurence is a specialist in Documentum, a prominent folder-based document management system. So, we’re both biased. But read his article, read mine, and then ask yourself which approach:
- Will get the benefits of document management into my users’ hands faster?
- Will result in the lowest ongoing administration whilst delivering excellent results?
- Will adapt to my changing business needs?
Folder-based systems are great for rigorously defining the information content your organisation needs; and if you’re working in a top-down company that has an IT department that can easily define a data dictionary for your entire business and enforce its consistent usage, then I’d strongly suggest you look at systems that support such an approach. If, however, you work in the remaining 99% of businesses where change is constant, time is precious, and flexibility and turnaround are more important than rigor, then look at systems that support that approach.
Thursday, February 17, 2011
The main means of accessing documents in document management systems is via folders. This makes sense because it’s what people are used to. Before they get a document management system they normally arrange their shared documents in a shared location, organized via folders. They’re intuitive, hierarchical and familiar; and thus people tend to look for document systems which are focused around folders as well. This makes the migration to the new system easier as well.
THIS IS A MISTAKE!
“Why are you switching to a document management system at all?”
If your file share is perfect for you why spend a lot of money on a system which is just going to replicate it? You’re trading a cheap, convenient, and reliable system for one that is much dearer, requires retraining and is all too often less reliable. If you need to share your documents in a folder structure across the Internet, use Dropbox, it’s a fantastic service and it’s very reasonable. If your company just needs a glorified (and expensive) folder share, I don’t want to con you out of your money and add zero value. With Signate, we want to add real value, and we don’t believe that is done by a web-based share.
Case Study in failure
At a large financial company where I consult on some electronic accounting issues, they had a massive shared drive where all documents were kept. They spent millions of rands implementing a company-wide Content Management System (CMS); money thrown at the software, hardware and numerous consultants involved. Most CMS systems make it easy to arrange your documents in folders, and so they reorganized the layout, designed it better and set it all up. It was decreed that all new projects were to use the new system. They did, but only as a file storage medium. The more advanced features like wikis, monitoring, calendar, workflows and so on went almost unused. It was also decided that the existing file share was too large to migrate, so it coexists side by side with the CMS, and there is often confusion about where to find documents, and which version is the “current” one: the file share version or the CMS one. Net result: significant capital and operational spend, massively increased storage requirements (due to duplication), confusion, and little or no improvement. They have also been unable to get the CMS search working, which is a critical failure in my opinion (as we will see later).
Is this the fault of the CMS? Not at all. This particular CMS is a very powerful tool in the right hands. Its configurability allows it to really shine when well-implemented. Unfortunately it is all too rarely implemented well, and doing so usually requires hordes of very expensive consultants.
Failure of Design

The underlying problem is that the CMS, along with all too many document management systems cater to people’s first instincts: the desire to keep things the same as they were. Let’s cast our minds back to 1998. The Web was a growing phenomenon, and the most popular portal was Yahoo!. Their web site was built around a directory, a folder structure exactly like that in your shared drive, except it consisted of links to web pages. People submitted their pages to Yahoo! and it would be placed in a category.
They had search, of a sort, but the focus was clearly on the directory structure; that was how you ensured that you found what you were looking for. You would browse through folders, hunting for the right category. Sometimes the categories were arranged somewhat haphazardly, so it could take a while to find the right one. However the task of maintaining this directory grew larger and larger, and the directory fell further and further behind.
I remember that my primary source of new pages started to be from friends’ emails rather than finding them in directories. All the while Google was making search their primary focus. We all know how that story played out; search became the dominant means of finding pages in the web. Why does search trump directories? For a few simple reasons:
- A directory imposes the directory organiser’s priorities on the consumer – If the organiser arranges things in a way that the consumer finds counter-intuitive it can be difficult or impossible for the consumer to find content that is present.
- A directory requires constant work to ensure relevance – Entries (or documents) can become stale or corrupted, newer locations may become popular causing duplication with work occurring in both locations.
- Search puts the consumer’s priorities first – You type what you’re looking for and the search engine finds it. What could be simpler than that? There is no organiser other than the content, so you don’t have to put up with odd filing hierarchies.
- Search ensures relevant content is found immediately – No hunting through folders and opening documents; the best matching results are returned first.
- Search allows for powerful search terms – You can use advanced features such as ranges for dates and numbers, exact matching, wildcards and so on very quickly and easily.
- Directories are categorised by perception, search by reality – When we decide to place a document under the “Technical Specifications” folder we’re doing so based upon our idea of what that document contains. Normally this would be done by the content author; so they’re generally pretty accurate, but there might be a better location or the categoriser may be mistaken in their assessment. Search categorises documents based on their content.
- Directories are static – Related to the above, documents change, and your system must cater for that. A directory structure tends not to change, even when it should. People are used to accessing a particular document in a particular place, and if you move the document they won’t find it at all. You’ll go from 100% accuracy to 0% in one swift go. With a search system, by contrast, the document will move up and down in the search results for a particular set of search terms as its content changes.
- Directories take effort – You need policies and procedures and people who monitor them and control them. All of this is not productive work.
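To illustrate the point about search following content rather than filing decisions, here is a deliberately naive ranking sketch: documents are scored by how often the query words appear in them, so a document climbs the results when its content changes. Real search engines use far better scoring; the function names and sample documents are invented for this example.

```python
import re
from collections import Counter


def score(query, text):
    """Count how often the query's words occur in the text."""
    words = Counter(re.findall(r"\w+", text.lower()))
    return sum(words[term] for term in re.findall(r"\w+", query.lower()))


def rank(query, documents):
    """Return document names ordered best first, dropping non-matches."""
    scored = [(score(query, text), name) for name, text in documents.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for points, name in ordered if points > 0]


docs = {
    "spec.txt": "Technical specification for the scanner interface",
    "minutes.txt": "Meeting minutes: budget and leave schedule",
}
before = rank("scanner specification", docs)

# The minutes are edited to cover the scanner specification in detail.
docs["minutes.txt"] += ". Scanner specification review: scanner specification approved"
after = rank("scanner specification", docs)
```

Before the edit only `spec.txt` is returned. After it, `minutes.txt` appears and ranks first, without anyone having to move it to a different folder.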
The Road Ahead
The future of document management lies in search. In my many years in the Document Management field, across industries as diverse as logistics, healthcare, insurance, financial, travel and many others I have seen finding documents again and again become the pain point for project after project. This is why we created Signate: as a response to the appalling inefficiencies of products spanning from the cheapest of the cheap to high-end enterprise servers. Signate puts search front and center, and whilst we are ahead of the game right now, I am under no illusions as to how long that advantage will last. Search is such a compelling feature that all document management systems will have to become search-centred or they’ll fail.
The question you have to ask yourself is where you need your company to be. Do you want an easy transition to document management but very little added value, or are you willing to learn a new way of finding your documents? It isn’t even that new if you’re used to searching the Web.
Search Quality
So now that I’ve made my case for search over folders in document management systems, let’s look at the quality of that search. Have a look at the screenshot to the left. It’s from another document management system’s search screen and exemplifies pretty much everything I dislike about search in the document management space.
Each field you can search on is listed, each with its own box. Worse yet there are drop downs for “Contains”, “Exact Match” and so on. Whilst I hate the dropdowns for the date fields, at least they actually have a date range search as opposed to forcing you to pick one date at a time. But now, what if I knew that the document was created after a certain date, and the revision I was looking for was before another one? How would I enter that? Would I be able to leave the To date range empty for “creation date”, and the From date range empty for “revision date”? Possibly; it’s not clear. How would I search for where the author is either “Ray Bradbury” OR “Orson Scott Card”? I know it’s one, but am not sure which one. I’d probably have to do two searches.
Now consider the search screen on the right. This strangely enough is much clearer as to what you need to do, which is counter-intuitive if you think about it. You’d expect that the screen which spells everything out explicitly would be the easiest and most compelling, but it’s just not. An empty search box invites, a complex search screen repels. How would you search for both authors as above? Well, I’d type: “Ray Bradbury” OR “Orson Scott Card”
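To show how little machinery a single search box needs for a query like that, here is a small sketch that splits a query on the keyword OR and matches the quoted phrases against every metadata field at once. The query syntax handling, function names and sample library are assumptions for illustration, not Signate's query language.

```python
import re


def parse_or_query(query):
    """Split a query on the keyword OR into a list of lowercase phrases."""
    # Accept typographic quotes as well as plain ones.
    normalised = query.replace("\u201c", '"').replace("\u201d", '"')
    alternatives = re.split(r"\s+OR\s+", normalised.strip())
    return [alt.strip().strip('"').lower() for alt in alternatives if alt.strip()]


def search(query, documents):
    """Return ids of documents whose metadata mentions any alternative."""
    phrases = parse_or_query(query)
    results = []
    for doc_id, fields in documents.items():
        # Search across all fields at once rather than one box per field.
        haystack = " ".join(str(value) for value in fields.values()).lower()
        if any(phrase in haystack for phrase in phrases):
            results.append(doc_id)
    return results


library = {
    1: {"title": "Fahrenheit 451", "author": "Ray Bradbury"},
    2: {"title": "Ender's Game", "author": "Orson Scott Card"},
    3: {"title": "Dune", "author": "Frank Herbert"},
}
found = search('"Ray Bradbury" OR "Orson Scott Card"', library)
```

One query returns documents 1 and 2, where the form-based screen would have needed two separate searches.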
The normal reason document management companies use search forms like the left hand one is because their search form is a thin wrapper over their underlying database. This limits them, as databases are designed to be very specific, and cannot search across fields easily. Not only that, if they don’t get the search form exactly right, it is possible for a user to run updates and malicious scripts on the database. The database that’s storing all your document data. With Signate we use a completely separate search engine, which not only is designed to search and search well, but also cannot affect your underlying data. Oh yeah, and it’s fast. Blazingly fast. Much faster than a complex query run against a database. Plus it can easily scale up to billions of documents, which database-driven searches struggle with.
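For anyone stuck with a database-backed search form, the risk of user input being run as SQL is normally handled with bound parameters. The sketch below uses Python's built-in `sqlite3` module; the table, column names and hostile input are invented for the example, and it says nothing about how any particular product is built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, author TEXT)")
conn.executemany("INSERT INTO documents (author) VALUES (?)",
                 [("Ray Bradbury",), ("Orson Scott Card",)])


def find_by_author(author):
    """Bound parameter: the input is treated as data, never as SQL."""
    rows = conn.execute("SELECT id FROM documents WHERE author = ?", (author,))
    return [row[0] for row in rows]


# Input that would be dangerous if pasted straight into the SQL text.
hostile = "x'; DROP TABLE documents; --"
safe_result = find_by_author(hostile)
remaining = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
```

The hostile input simply matches no author, and both rows are still in the table afterwards. A separate, read-only search index avoids the question altogether, since there is nothing for a query to update.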
Conclusion
If you need a document management system, please, please, please choose one that puts search in the forefront. Ensure that before you buy you really kick the tires on the search system; that it’s quick and easy to use. It should not take your staff longer to find an internal document than to find a web page via Google. If it does, then you have a suboptimal system. A swift and powerful document management system should pay you dividends across the board. You should have happier and more productive staff, faster processes, easy findability for your documents, and vastly improved turnaround times.
Use our Document Costs Calculator to work out the amount you’re probably wasting right now on your document costs, and thus the amount you can save every year. That financial company I discussed earlier? Our calculator shows that a well-designed and well-run document management system should be saving them between 12 and 70 million rands a year, and between 180,000 and 490,000 staff hours annually. These figures are based on research, calculations and figures from Gartner, Cap Ventures and the Arbeitsgemeinschaft für wirtschaftliche Verwaltung.
Plug in your company’s figures and see the impact a good document management system could be having on your bottom line and client satisfaction. Plus it’s good for the environment too.

Sean Hederman is Director of Palantir (Pty) Ltd, and Software Architect for the Signate Document Management System. He also writes the popular programming blog Codingsanity.
Wednesday, November 24, 2010
We've just added the "Send Email Activity" to Signate. This allows you to easily drop templated emails into your document process.

This powerful, and much asked for feature comes standard and free with all Signate editions. If you already have a Signate license, you will automatically get this feature on the next major upgrade as per your support agreement.
The ease of use of this activity is incredible. Drop it on to the Workflow Designer, set the properties, and edit the Body as indicated in the screenshot to the left.
Even dedicated workflow tools such as K2 have difficult procedures to accomplish this same task, which even a novice can deal with in seconds in Signate.
Sunday, October 24, 2010
This is just a quick announcement of our new Approval Workflow functionality which will be released soon. The workflow designer now adds the Approval Workflow task:

As you can see from the above there are two legs, Approve and Reject, and document capture only happens in the Approve leg. You can add more legs and/or rename the "standard" legs. The approval activity has an Allocated To property, like Capture Document which specifies which group the Approval will be routed to. When an approval step is opened by a user, the screen below is displayed:

Each leg which was added in the design appears in the Approval selection box. Using the example above, clicking on Submit Decision would route this document to Capture Document for capturing of document metadata.
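Conceptually, an approval step with named legs is a small routing table from decision to next activity. The sketch below models that idea in Python; the class, the group name and the extra "Refer" leg are hypothetical and are not Signate's workflow API.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalActivity:
    """An approval step: who decides, and where each decision leads."""
    allocated_to: str
    legs: dict = field(default_factory=dict)  # decision name -> next activity

    def submit_decision(self, decision):
        if decision not in self.legs:
            raise ValueError(f"Unknown decision: {decision!r}")
        return self.legs[decision]


approval = ApprovalActivity(
    allocated_to="Managers",
    legs={"Approve": "Capture Document", "Reject": "End"},
)
# Legs can be added or renamed at design time.
approval.legs["Refer"] = "Second Approval"
next_step = approval.submit_decision("Approve")
```

Submitting "Approve" routes the document to "Capture Document", matching the example above, and any leg added at design time becomes another option in the selection box.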
Wednesday, September 15, 2010
We are releasing a Service Pack 1 which contains some defect fixes, performance and stability upgrades, increased workflow management tools, as well as the following new features:
- Document Conversion - Convert an Acquired document to another document type, such as PDF.
- Upload in Signate Online - Signate Online will now come with support for adding and capturing a document, allowing capture from your Mac or from another continent.
- Versioning via Signate Online - Another feature to assist our Mac users, you will be able to view version history, check out and check in via Signate Online.
- Import AutoCatalog - When you import a document, Signate will now provide the capturer with detailed information about where and when it was imported, as well as its original creation date, to allow them to make more effective decisions.
- Email from Search - Email your search results from Signate Online to your clients, a great timesaver.
Note that all clients with Bronze or better subscriptions will get this update for free. Additionally, as per our standard policy, any purchases made after this announcement will get this Service Pack bundled for free, no matter what subscription.
Service Pack 1 is scheduled for release on September 20th for customers in our Beta Program, and we expect final release on the 19th October.
Looking for a Document Management System? Signate 2010 is powerful, secure and easy to use.
Monday, August 23, 2010
A question that is often not asked is why. Why did we decide to write a document management system? There are plenty of international products playing in this space, and plenty of local products too. Why go into a market with heavy competition?
I personally have worked in document management and workflow for many years now, working with a broad spectrum of products and companies. What I've found is that the document management market is not so wrapped up as people would have you believe. There are some very, very popular products that are quite simply incapable of doing the job that is normally required.
These products are difficult to use, do not scale well, require armies of consultants and developers, and are inefficient and slow. They force you to focus on their way of doing things, instead of being flexible enough to fit properly into your business the way you want.
On the local side, quality, support and feature set are a problem. Products are usually just slapped together by using purchased toolsets from overseas, meaning that the local suppliers cannot effectively support the software. If your scanner doesn't work, they'll throw up their hands and log the query with the international vendor, who may take weeks or longer to resolve the issue.
I know users don't really care about software design, but it's a big point for me. Many of these supposed "document management systems" are basically just a wrapper over a file share, allowing the systems to be easily compromised. There is a focus on slapping in as many features as quickly as possible in the hope that some will stick.
At Palantir we firmly believe that technology can be simple and easy to use, as well as being powerful. We saw way too little of that with document management, and felt very strongly that there is a need for that simplicity and power; a need that is not being met currently. That's why we wrote Signate.
In order to ensure that we had a deep understanding of the technologies, we created our own libraries and toolsets. As a bonus this freed us of the usual licensing constraints; which is why Signate client licenses are so flexible; most products require you to buy separate scanning licenses, we don't. It also allows us to more effectively control the user experience; ensuring that our vaunted ease of use is not compromised by technology choices.
So, the simple answer as to why we wrote a document management product?
Because we wanted to; because we felt we could do a better job; because we felt customers were not being well served by the choices in the market up until now; but mostly because it was fun.
Wednesday, July 21, 2010
Significant parts of Signate are Windows applications, such as scanning, high-speed capture and the workflow design and monitoring tools. Windows applications have some significant advantages over Web applications:
- Can take full advantage of the computer's processing power
- Can access hardware such as scanners
- Can easily perform multiple tasks at once
- Can easily work on multiple tasks at once
However they come with two big disadvantages:
- They require installation
- They require configuration
For Signate we have done our best to address these disadvantages. For starters, the installation process for the Signate client is simple and quick (see the video). Since it is so simple, it is also easy to deploy automatically.
Secondly, we make use of Zero-Configuration. This means that the Signate client will automatically find the Signate server machine and set up communication with it, without your IT staff having to be involved. All policy is controlled by the Signate server, so you don't need to roll out changes to each client PC when the policy is modified.
If you change the location of the server, all the client PCs will automatically pick up the new location. Additionally, if you run multiple Signate servers and the nearest one becomes unavailable, the client PCs will switch to the next nearest one for the interim.
All this ensures that a Signate client installation gives you the advantages of a Windows application with the ease of administration of a Web application.
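The failover behaviour described above boils down to "use the nearest server that answers". A minimal sketch of that selection rule follows; the function, the host names and the availability check are all hypothetical, and the discovery mechanism itself is not shown.

```python
def choose_server(servers, is_available):
    """Pick the first reachable server; `servers` is ordered nearest first."""
    for server in servers:
        if is_available(server):
            return server
    raise ConnectionError("No server is reachable")


# Hypothetical hosts, nearest first, with the local one currently down.
status = {"signate-local": False, "signate-remote": True}
active = choose_server(["signate-local", "signate-remote"], lambda s: status[s])
```

Here the client ends up on `signate-remote`. Re-running the same check later brings it back to the nearer server once that server is available again.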
Tuesday, July 20, 2010
Many vendors focus on providing as many options when scanning as possible. Pages and pages of options must be perused by the scan operator before scanning can commence. This makes setting up the scanner tricky, usually requiring specialist knowledge. This is great if you're the vendor, not so great if you're the client.
With Signate SimpleScan we went the opposite direction, working hard to reduce the number of options required before you can kick off a scan. So, we slashed the options down to:
- Scanner - Which scanner to scan from
- Resolution - The image detail required
- Color Depth - Black & White, Grayscale, or Color
That's it. Those three are all we figure are needed to get the scan you want. Oh, and we default Resolution and Color Depth to values we find generally work best.
So, I hear you ask, what if you want to control the number of pages to scan? Well, that's easy, just put the pages you want to scan into the hopper. If you want to scan a single page, put it in the hopper on its own.
Hmmm, okay, but what about duplex pages? Well, Signate defaults to duplex scanning if the scanner supports it. If it finds that the back of the first page is blank, it switches to single sided scanning. So, if you want to scan both sides, just make sure that the first page has content front and back.
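The duplex rule is simple enough to write down directly. The sketch below expresses it in Python, along with one naive way of deciding that a page is blank; the function names, the greyscale pixel representation and the thresholds are assumptions for illustration, not the actual SimpleScan code.

```python
def is_blank(pixels, ink_threshold=128, max_ink_ratio=0.005):
    """Treat a greyscale page (rows of 0-255 values) as blank if it has almost no dark pixels."""
    total = sum(len(row) for row in pixels)
    if total == 0:
        return True
    ink = sum(1 for row in pixels for value in row if value < ink_threshold)
    return ink / total <= max_ink_ratio


def scan_mode(supports_duplex, first_page_back_is_blank):
    """Default to duplex; fall back to single sided if the first back is blank."""
    if supports_duplex and not first_page_back_is_blank:
        return "duplex"
    return "simplex"


white_page = [[255] * 100 for _ in range(10)]
mode = scan_mode(supports_duplex=True, first_page_back_is_blank=is_blank(white_page))
```

With a blank back on the first page the result is "simplex"; put content on both sides of the first page and the same scanner stays in "duplex".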
This simplicity, and the high performance operation we squeeze out of scanning, is only possible because our scanning software is written by us. Most other vendors buy this software from other companies; forcing them to adapt to the way the software works. It also means that they struggle to support the software, not having the necessary skills. Finally, it forces them to license the scanning software separately, and at a very high cost.
Signate does not distinguish between "normal" licenses and scan licenses. This gives you a great deal of flexibility, allowing you to assign more people to scanning when heavy loads of documents arrive, without affecting your licensing costs, and without requiring the rigmarole of getting new licenses vetted through the scan software vendor.
Welcome to the official Signate blog!
We've created this blog to begin a conversation on the Signate product. We hope to get feedback from you, as well as explain our thinking on where we've been and where we're going.