This post originated from an RSS feed registered with Ruby Buzz.
Original Post: Its the data stupid!
Feed Title: cfis
Feed URL: http://cfis.savagexi.com/articles.rss
Feed Description: Charlie's Blog
Let's say you've been tasked with integrating several applications in your organization.
A quick Google later and you're overwhelmed with different opinions about
the right technology to use. What programming language, what platform,
what messaging infrastructure, what database (or no database), what hardware
- the list goes on and on. Just as quickly, you'll find plenty of war
stories about unsupportive management, the political difficulties
of getting different parts of an organization to work together, turmoil caused
by reorganizations, difficulties with outsourcing - and on and
on.
Yet try to find information about how to model your problem domain. By that I
don't mean which UML tool is best, or why XML is superior to XYZ. I mean:
what information do your systems capture about the real world, and what hidden
assumptions do they use to manipulate that information?
The reason you won't find much information is that the problem is extraordinarily
hard. Computers are digital - they divide the world into sharp distinctions that
don't exist. The real world is analog. For example, can you tell me the
difference between a stream, brook, run, creek, river and water course? Of course
you can't - they all blend into each other. Words in a language are fuzzy, ambiguous
representations of things in the world (or maybe not, do dragons exist?). They
mean whatever a group of people have decided they mean.
Your definition of a creek is undoubtedly different than mine.
One of the wonders of human intelligence is that we are
able to sort through all this fuzziness and can usually communicate with each
other. Computers are not so fortunate. Trying to share information between different
applications is fraught with error.
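To make the point concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the two applications, their names, and the width cutoffs are hypothetical, and none of them is an official definition of a creek or a stream. It only shows how two programs can draw different sharp lines through the same continuum.

```python
# Hypothetical cutoffs (upper bound in meters, label). Each application draws
# its own sharp lines through what is really a continuum of watercourse sizes.
MAPPING_APP_CUTOFFS = [(1.0, "brook"), (5.0, "creek"), (20.0, "stream")]
FISHERIES_APP_CUTOFFS = [(3.0, "creek"), (10.0, "stream")]


def classify(width_m, cutoffs):
    """Return the first label whose upper bound exceeds the width, else 'river'."""
    for upper_bound, label in cutoffs:
        if width_m < upper_bound:
            return label
    return "river"


width = 4.0  # the same physical watercourse, measured once
mapping_label = classify(width, MAPPING_APP_CUTOFFS)
fisheries_label = classify(width, FISHERIES_APP_CUTOFFS)

# The two applications disagree about what the same thing is called.
print(mapping_label, fisheries_label)
```

Neither application is wrong by its own rules, which is exactly why exchanging the label "creek" between them is risky.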
Let's take an example from the book Data
and Reality, which, as I've written before,
is by far the best book I've read on the subject of data modeling (go buy
a copy now before it goes out of print again!). Let's say you want to share
employee data between two different applications. Sounds easy, doesn't it?
But then let's start asking some questions:
Do employees include contractors?
Do employees include part-time workers?
Do employees include retired workers?
Do employees include workers on leave?
Do employees include workers serving in the military?
Do employees include workers who have just signed a contract but have not
shown up to work yet?
And on and on. The answers will be different depending on what department of
the organization you ask. An employee on leave may exist according to the benefits
department but not the payroll department. Or how about a married couple working for
the same company - is the wife a dependent or an employee (and, of course,
vice versa)?
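Here is a rough sketch of that disagreement in Python. The `Person` record, the status values, and the two department rules are all hypothetical, invented for illustration. The only part taken from the example above is that benefits counts someone on leave while payroll does not; the rule for retirees is an extra assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    status: str  # hypothetical values: "active", "on_leave", "retired", "contractor"


def payroll_is_employee(person):
    # Assumed rule: payroll only counts people it is currently paying a salary.
    return person.status == "active"


def benefits_is_employee(person):
    # Assumed rule: benefits also counts people on leave and retirees.
    return person.status in {"active", "on_leave", "retired"}


people = [
    Person("Ann", "active"),
    Person("Bob", "on_leave"),
    Person("Cho", "retired"),
    Person("Dee", "contractor"),
]

payroll_employees = {p.name for p in people if payroll_is_employee(p)}
benefits_employees = {p.name for p in people if benefits_is_employee(p)}

# People who are an "employee" in one application but not in the other.
disputed = payroll_employees ^ benefits_employees
print(sorted(disputed))
```

Both applications store a field called "employee", yet a naive sync between them would silently add or drop everyone in the disputed set.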
No matter what you do, you won't get this right. Every application includes
hidden assumptions about what its data means and how it is processed. Those assumptions
inevitably vary between different applications.
In the next post we'll look at a real-world example.