A collaboration on distributed data models and their implementations

I am currently working with Jared Rosoff and Bob Chatham at Wave Syndicate. We are building a tool that needs to operate at internet scale, so I have been thinking a lot about distributed architecture and implementation. Constructing a tool for this scale of use is new to me. I had some experience with this while working at Tazz Networks, but I am still very much a novice.

On the drive home today I was thinking about
1. data models and how to use emerging distributed data tools such as Amazon’s SimpleDB and Google’s BigTable and its imitators,
2. the contending variables at play when designing a C++ data model for use with ObjectStore,
3. David Hay’s great data modeling book Data Model Patterns: Conventions of Thought, and
4. a comment made long ago in a forgotten blog posting by a PHP programmer who needed good data models and not data user interfaces.

It would be extremely useful if the data modeling community would create data models and implementations that are factored, designed, and tested for each of the distributed data tools. For example, a commentary data model (for use with blogs, wikis, discussion groups, issue trackers, etc.) using SimpleDB would be quite different from one using SQL, and different again from one using BigTable. Any of the models might need more than one schema implementation per distributed data tool, as each implementation trades off consistency, latency, concurrency, etc. differently. I wish I had the skills now to lead something like this. It would be a great boon to both current application development and, perhaps more so, developer education. If you know of anything happening along these lines, please drop me an email.
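
To make the contrast a little more concrete, here is a minimal sketch in Python of one comment record shaped two ways: as normalized rows for a SQL schema, and as a single flat item of string attributes of the kind an attribute store such as SimpleDB holds. Every name in it (Comment, to_relational_rows, to_attribute_item, the table and attribute names) is hypothetical, and it calls no real database API. It only illustrates how the same model can take different shapes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Comment:
    """Hypothetical logical model of one comment on a post."""
    comment_id: int
    post_id: int
    author: str
    body: str
    tags: List[str] = field(default_factory=list)


def to_relational_rows(c: Comment) -> Dict[str, List[Tuple]]:
    """Normalized shape: one row in `comment`, one row per tag in `comment_tag`."""
    return {
        "comment": [(c.comment_id, c.post_id, c.author, c.body)],
        "comment_tag": [(c.comment_id, t) for t in c.tags],
    }


def to_attribute_item(c: Comment) -> Tuple[str, Dict[str, List[str]]]:
    """Denormalized shape: one named item whose attributes are lists of strings.

    Assumption: the store compares values as strings and has no joins, so
    numbers are zero-padded and tags live on the item as a multi-valued
    attribute instead of in a separate table.
    """
    name = f"comment-{c.comment_id:010d}"
    attrs = {
        "post_id": [f"{c.post_id:010d}"],
        "author": [c.author],
        "body": [c.body],
        "tag": sorted(c.tags),
    }
    return name, attrs


example = Comment(7, 42, "alice", "Nice post.", ["modeling", "blog"])
rows = to_relational_rows(example)
item_name, item_attrs = to_attribute_item(example)
```

The relational shape keeps tags in their own table and relies on a join to reassemble a comment. The attribute shape gives up that normalization so that a single read returns everything. Which trade is right depends on the consistency and latency needs mentioned above.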

1 comment:

Karen Lopez - Data Architect said...

Interesting blog entry.

Data Architects do have the tools and processes to produce such a wide variety of data models. That's why we make a distinction between conceptual, logical, and physical data models.

It is often the case, though, that architects are asked to create only physical models, which is why most data models appear to be focused only on SQL physical implementations.

If we had the resources to complete conceptual, logical, and physical models, then we could establish business requirements separately from physical implementations.

I recommend the three-volume universal data model (pattern) books by Len Silverston, BTW.