
Data Silos are Killing Data Flow!

Situation Analysis

Data Silos are bad. Even worse, they are growing exponentially!

It’s no secret that Big Data is a conventional stool that consists of three legs: data volume, velocity, and variety. Unfortunately, data volume and velocity receive an ever larger share of time, attention, and energy at the expense of variety.

Today, we have a NoSQL (“Not Only SQL Relational Tables” DBMS World View) vs SQL (“SQL Relational Tables Only” World View) debate raging across DBMS vendors in either camp. Both camps have a simple value proposition: I can process more data at lower cost, with regard to data volume and velocity challenges. Thus, if you want to get some dead silence in this noisy realm, simply introduce the issue of heterogeneously shaped and disparately located data. Basically, how does one reference data across database management systems (SQL or NoSQL)?

What is a Data Silo? 

Technology that’s inherently constructed with an internalized view of data, in relation to how it’s represented, accessed, and manipulated. Data Silo vectors include:

  • Myopic views of structured data representation, e.g., the notion that “unstructured data” exists, when that’s a subjective view held by those who don’t understand or care about loosely coupling data, data representation notations, and data serialization formats
  • Query languages tightly coupled to a specific overreaching view of data representation (e.g., SQL imposition in a world where nobody sees entity relationships in Tabular form, when thinking)
  • Use of Literals as opposed to References (e.g., Hyperlinks) for identifying (naming) entities.
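The last vector, Literals versus References, can be illustrated with a minimal sketch. The two record sets, the person's name, and the example.org URI are hypothetical; they only stand in for data held in two separate systems.

```python
# Two hypothetical silos describing the same person.
# Here each silo identifies the person with a literal (a name string).
crm_by_name = {"Jane Doe": {"email": "jane@example.org"}}
hr_by_name = {"Doe, Jane": {"department": "Research"}}

# The same records keyed by a reference (an HTTP URI) instead.
PERSON = "http://example.org/people/jane#this"  # hypothetical identifier
crm_by_uri = {PERSON: {"email": "jane@example.org"}}
hr_by_uri = {PERSON: {"department": "Research"}}


def merge(left, right):
    """Combine attribute dicts for every key present in both sources."""
    shared = left.keys() & right.keys()
    return {key: {**left[key], **right[key]} for key in shared}


merged_by_name = merge(crm_by_name, hr_by_name)  # no shared key, nothing joins
merged_by_uri = merge(crm_by_uri, hr_by_uri)     # shared URI, records join
```

With literals, each system's spelling of the name becomes a private key; with a shared hyperlink, the two descriptions line up without any prior agreement on formatting.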

What’s the problem with Data Silos?

They impede the pursuit of data-driven agility. Ironically, we tout (with fervor) the imminence of a data-driven Internet of Things, a Web of Things, Big Data, and the like, while completely overlooking the inevitable impact of heterogeneously shaped data on this fine-grained mesh of machines, data, and people. Data Silos are also extremely expensive, in every sense of the word.

How do we address the Data Silo problem?

Simply step back and look at the World Wide Web (Web) abstraction over the Internet. Basically, if we were to go back 26 years (prior to its emergence), using today’s dominant thinking about data matters, the meme of the day would be “Big Documents” and a race amongst vendors to provide the fastest “Big Documents” processing system. The fact that document content format varies wouldn’t matter since vendors would simply pursue the misguided notion that the fastest document management system wins and all alternatives die — en route to a single document content format that serves all purposes.

Luckily for all of us, the Web emerged instead. It provided fundamental infrastructure, via sound architecture, for document content creation, sharing, and integration. It delivered this virtue without confining the world to a specific content format, courtesy of “content type negotiation”, which is baked into its core.

What worked for what would have been “Big Documents” will also work for “Big Data” using the very same infrastructure of the Web, thanks to the underlying dexterity of its core architectural components (URIs and HTTP).
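As a rough illustration of content type negotiation, here is a simplified sketch of how a server might pick one of several representations of the same resource from a client's HTTP Accept header. The function names and the list of available formats are assumptions made for this example, and the matching is deliberately minimal: it honors q-values and `*/*` but ignores partial wildcards such as `text/*`.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so ties keep the order the client sent
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(accept_header, available):
    """Return the first available media type the client accepts, or None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]
        if media_type in available:
            return media_type
    return None


# Hypothetical representations a server holds for one URI.
AVAILABLE = ["text/html", "text/turtle", "application/ld+json"]
```

A browser sending `text/html` and a data client sending `text/turtle` can then dereference the same URI and each receive the format it asked for.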

Core Web Architecture: Hyperlinks (URIs + HTTP)

If we can identify documents using hyperlinks, we can do the same for other entity types (people, places, music, and other things that make up our experiential existence). Likewise, if we can use hyperlinks to signal the fact that one document is related to another, we can apply the very same approach to identifying how a variety of entities are related, and even describe the very nature of different entity relationship types.

The Web fundamentally demonstrates the power of Data as the new Electricity conducted via hyperlinks. Thus, in this noisy world of DBMS technology (SQL or NoSQL) and its “Big Data” meme, we must pay attention to the role hyperlinks should be playing with regard to data representation. For instance, to what degree (if any) are hyperlinks used to identify entity relationship components (i.e., an entity, its attribute names, and associated attribute values) or the subject, predicate, and object aspects of a sentence (i.e., parts of speech)? Ignoring this fundamental step is a recipe for data silo explosion, and that’s exactly what’s happening today.
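To make the subject, predicate, object idea concrete, here is a small sketch that holds statements as three-part tuples and writes them out as N-Triples-style lines. The post URI is hypothetical and the `Literal` marker class is an assumption of this example; only the rdf:, rdfs:, and schema: namespace URIs are real. Escaping is limited to backslashes and double quotes, so this is not a complete serializer.

```python
# Hypothetical identifier for a post; the vocabulary URIs below are real.
POST = "http://example.org/blog/data-silos#this"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SCHEMA_BLOGPOSTING = "http://schema.org/BlogPosting"


class Literal(str):
    """Marks a value that is plain text rather than a hyperlink."""


# Each statement is (subject, predicate, object): entity, attribute, value.
statements = [
    (POST, RDF_TYPE, SCHEMA_BLOGPOSTING),
    (POST, RDFS_LABEL, Literal("Data Silos are Killing Data Flow!")),
]


def to_ntriples(triples):
    """Render triples as lines: hyperlinks in <>, literals in quotes."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        else:
            rendered = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return lines
```

Because the entity, the attribute name, and (where possible) the attribute value are all hyperlinks, any system that can follow a link can look up what each part of the sentence means.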

Data De-Silo-Fication Example

Be it within the confines of an enterprise intranet or the public echelons of the World Wide Web, the data-silo-induced data-flow-inertia problems remain the same, i.e., we need to increase data flow across data silos, using methods that go beyond Tables, Forms, and Graphics (pretty silos). Basically, we need to add hyperlink enhanced sentences to the mix, using an approach I call nanotation.

Nanotation is simply about the ability to create controlled natural language sentences in any medium that accepts plain text. That’s it. In my specific case, I prefer to use Turtle Notation due to its closeness to controlled English and the visibility it brings to relationship type semantics. It also doesn’t hurt that one of its creators also invented what we know as the Web.

Here’s a simple Nanotation example that basically creates a webby structured data island right within this post.

{
  <> a schema:BlogPosting .
  <> rdfs:label "Data Silos are Killing Data Flow!" .
  <> rdfs:comment """
      Simple sentences that systematically encode
      information [data in some context] in reusable form.
      """ .
}

In the example above, <> identifies this post using a relative HTTP URI which surmounts the fact that an actual document location on the Web doesn’t exist for my content until I save and publish this post. Anyway, once published, I will use the comments section associated with this post to showcase the effects of hyperlink enhanced data representation. 
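The way an empty relative reference resolves can be sketched with Python's standard `urllib.parse.urljoin`. The published URL below is hypothetical; it stands in for whatever location the post receives once it is saved.

```python
from urllib.parse import urljoin

# Hypothetical location the post receives once it is published.
published_url = "http://example.org/blog/data-silos-are-killing-data-flow/"

# An empty relative reference (<> in Turtle) resolves to the base document.
self_reference = urljoin(published_url, "")

# A fragment-only reference names something described within that document.
fragment_reference = urljoin(published_url, "#this")
```

The statements can therefore be written before the document has an address, and they pick up the right identifier wherever the text ends up being served from.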

Here are some additional links that showcase the effect of nanotation-style digital sentences or statements as an effective vehicle for alleviating current and future challenges posed by data silos:

Conclusion

Database management system performance and scalability are not the most important aspects of the Big Data meme. They are simply one aspect of said meme. Data variety, privacy, and security are also extremely important issues that cannot be ignored during the process of product design, development, acquisition, and deployment.

The issue of data de-silo-fication shouldn’t be the topic used to invoke silence in a noisy space. It should actually be the issue around which the most noise swirls 🙂

