

Cloud Computing in the Intelligence Community

Originally published April 1, 2013

In this article, I will not bore you with Gartner’s definition of cloud, or NIST’s, or DISA’s – not that those aren’t thoughtfully crafted. It’s just that they only make sense if you share Gartner’s, NIST’s, or DISA’s repertoire of definitions, meanings, and analogies, which few actual human beings do. Besides, we’re past that now. We all know what cloud is. And yet, as you sit in meetings with leaders, managers, vendors, and developers, the perception of what cloud computing is, and brings to the Intelligence Community, is all over the map. It seems there’s a kind of schizophrenia to this cloud business – particularly in the Intelligence Community. And that’s because there are two very different kinds of clouds being talked about in this domain: the utility computing cloud and the compute and storage cloud.

The Utility Cloud

The utility computing cloud is typically represented as a layered stack, with hardware at the bottom, then infrastructure, platform, and software at the top. The idea is that some provider offers hardware, infrastructure, platform, or software capacity to consumers over the network. To me as a consumer, the value proposition is one of outsourcing. The burden of maintenance is gone. I pay only for what I use. The capacity is always there to meet my changing demands. I can rely on skilled professionals to handle a variety of issues that are outside my core domain – better that they handle it than me. I don’t care how it works, so long as it does. I’ve got other problems to solve.

As a provider, the value proposition is all about efficient management of a pool of resources, which is achieved through multi-tenancy and elasticity. Server virtualization plays an important role here. It’s like operating a bank. With enough customers, I can manage total cash on hand in a way that looks to any one customer like all his money (and more) is readily available. What possible downsides could there be to this? Well, I’ve never been a utility provider, so I have no special insight, but I’m thinking the Long Island Power Authority after Hurricane Sandy makes a pretty good analogy.

On the consumer side, there are a few things to keep in mind. First, be very careful about what you outsource. You really better be sure about what is, and is not, in your core domain. At Mission Focus (MF) we have a saying that goes, “What you don’t have in your own hands, you don’t have.” If you outsource something that is part of your core domain, it will bite you – hard.

The second point to make on the consumer side is that, as always, the consumer must be smart. No, you won’t have to rack and stack the servers yourself, configure the network, install the OS, RAID the drives, set up NTP, or do any of the hands-on work that the provider takes over. But you must know how to specify it. In particular, whatever layer of the utility computing stack you’re operating on, you must be able to specify that layer, and probably the one below it, in fantastic detail. By the way, this is always the hardest part, and you’ll have to work it out with a remote provider. You’re still going to need your IT guy.
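To make that concrete, here is a minimal sketch of the kind of specification a consumer still has to own even after outsourcing the hands-on work. The field names and values are hypothetical, not any particular provider’s API; the point is only that someone on your side has to be able to write them down and check them.

# Illustrative only: a hypothetical infrastructure specification a consumer
# might hand to an IaaS provider. The field names are assumptions, not any
# particular provider's API.
infrastructure_spec = {
    "compute": {
        "node_count": 12,
        "cpu_cores_per_node": 8,
        "memory_gb_per_node": 64,
    },
    "storage": {
        "raid_level": "RAID10",      # you still decide the RAID layout
        "usable_tb_per_node": 4,
    },
    "network": {
        "bandwidth_gbps": 10,
        "ntp_servers": ["ntp1.example.gov", "ntp2.example.gov"],
        "vlan_isolation": True,
    },
    "os": {
        "distribution": "RHEL",
        "version": "6.4",            # contemporary with this 2013 article
        "hardening_profile": "DISA-STIG",
    },
}

def validate_spec(spec):
    """Minimal sanity check: the provider does the racking, but the
    consumer still owns the correctness of the specification."""
    assert spec["compute"]["node_count"] > 0
    assert spec["storage"]["raid_level"] in {"RAID1", "RAID5", "RAID6", "RAID10"}
    assert spec["network"]["ntp_servers"], "time sync must still be specified"
    return True

if __name__ == "__main__":
    print("spec ok:", validate_spec(infrastructure_spec))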

A final point worth making is, alas, data security. The issue isn’t so much who can do a better job of it – you or your provider. It’s that when bad things happen, how are they going to be handled? Will you be heading into the data-center or into court?

A couple of years ago, at the Government session of the National Association of Broadcasters show in Las Vegas, people were mostly talking about this kind of cloud - the “service cloud.” It seemed that any vendor offering some kind of service over the network (the favorite being transcoding) was suddenly in the cloud business. This perspective is what leads some people to say that cloud is no different from SOA. This is an over-simplification that a consumer might make, but not a cloud service provider. It is the case, however, that cloud, just like SOA, does not absolve you from understanding the problem you’re trying to solve and having a well-defined architecture for doing it.

That said, the value proposition of the utility computing cloud applies to the Intelligence Community just as it does to any other large organization. I would argue, however, that the Intelligence Community should be not just a consumer of utility computing, but also a provider of it – that both are within the Intelligence Community’s core domain.

The Compute & Storage Cloud

The other kind of cloud often talked about in the Intelligence Community is what I call the compute and storage cloud. It is not undifferentiated capacity for rent; it is a specific capability formed of a cluster of commodity servers, each contributing direct-attached storage, memory, CPU, and network bandwidth. The software running on that cluster turns all those spinning disks into a distributed data-store, and all the CPUs into a massively parallel computational engine with tremendous aggregate bandwidth.

The use of commodity servers is significant. Commodity does not mean junk. It means the servers with the best performance-price ratio, not necessarily the best reliability. But that’s okay because the software infrastructure is designed to accommodate occasional server failures as normal. And that’s a good thing: since the population of servers within a given cluster doesn’t have to be homogeneous, as servers naturally die you can replace them with today’s commodity servers, which are a Moore’s-law step better than yesterday’s. The same fault-tolerance mechanisms mean that growing the cluster is also easy. So hardware refresh and cluster growth are built-in, continuously ongoing processes. The key insight is that by marrying economy with technology, we are able to achieve tremendous scale in data storage and performance in data processing. Despite what up-selling salesmen say, achieving serious scale in the real world necessitates using commodity components.

The scale of data we’re talking about is in the many petabytes. Facebook has a 100 PB Hadoop cluster and is pushing beyond that. At this scale, to search or process data (i.e., to make it useful) we have to change our thinking. First, the data are simply too big to move. So instead of moving the data to the processing, we move the processing to the data. That’s the idea behind Hadoop map-reduce. Second, since it is impossible for a distributed computer system to simultaneously provide consistency (all users get the same query results), availability (all operations eventually succeed), and partition tolerance (it still works if the network between nodes drops messages) – the CAP theorem – we compromise instead of getting the perfect atomicity, consistency, isolation, and durability that relational databases give us. That’s what HDFS and NoSQL data-stores do. How that compromise is made is largely what distinguishes the various NoSQL technologies.
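To make the idea of moving processing to the data concrete, here is a minimal word-count sketch in the style of Hadoop Streaming, where the mapper and reducer are plain programs reading standard input. It is an illustration of the map-reduce pattern only, not code from any Intelligence Community system; on a real cluster each mapper would run on a node that already holds its block of the input.

#!/usr/bin/env python
# Minimal sketch of the map-reduce idea in the style of Hadoop Streaming:
# each mapper runs against the data split local to its node (moving
# processing to the data), and the framework sorts and shuffles keys
# between the map and reduce phases. Word count is the illustrative job.
import sys
from itertools import groupby

def mapper(lines):
    """Emit (word, 1) for every word in the local input split."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Sum the counts for each word; input arrives grouped by key."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Local, single-process stand-in for what the cluster does in parallel:
    # map over the input, sort by key (the shuffle), then reduce.
    mapped = sorted(mapper(sys.stdin))
    for word, total in reducer(mapped):
        print("%s\t%d" % (word, total))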

Often the compromise is around consistency - giving up perfect for eventual consistency. Twitter is an example of this. When you send a tweet, it looks to you like that tweet got posted (more or less) right away, even though your followers may not see it until several seconds later. If this sort of compromise bothers you, understand that in an Ultra-Large Scale System data is not a pure crystalline form. It’s a noisy, entropic, ambiguous mass that makes its own gravity and weather. You’re not even going to notice the difference between perfect and eventual consistency.
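The sketch below is a toy illustration of that trade-off, and it assumes nothing about Twitter’s actual architecture: a write is acknowledged by one replica right away and propagates to the others asynchronously, so a read against a different replica may briefly return stale data before the replicas converge.

# Toy illustration of eventual consistency (not any real system's design):
# a write lands on one replica immediately and replicates to the others
# asynchronously, so readers hitting a different replica may briefly see
# stale data until the replicas converge.
import random

class EventuallyConsistentStore(object):
    def __init__(self, replica_count=3):
        self.replicas = [dict() for _ in range(replica_count)]
        self.pending = []          # replication messages not yet applied

    def write(self, key, value):
        self.replicas[0][key] = value              # acknowledged right away
        for r in range(1, len(self.replicas)):
            self.pending.append((r, key, value))   # replicate later

    def read(self, key):
        replica = random.choice(self.replicas)     # any replica may answer
        return replica.get(key)

    def tick(self):
        """Deliver one queued replication message (anti-entropy step)."""
        if self.pending:
            r, key, value = self.pending.pop(0)
            self.replicas[r][key] = value

if __name__ == "__main__":
    store = EventuallyConsistentStore()
    store.write("tweet:1", "hello world")
    print(store.read("tweet:1"))   # may be None or "hello world" right now
    while store.pending:
        store.tick()
    print(store.read("tweet:1"))   # always "hello world" once converged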

So what could possibly be the down-side to the compute and storage cloud? Well, I’ve heard people claim some pretty serious shortcomings, so let me go through a few of them. One is that the compute and storage cloud is only good for batch processing, not interactive transactions or real-time processing. Well, clearly we’re not going to use map-reduce in the cockpit of a fighter jet. The architectural quality attributes of a real-time system are fundamentally different from those of a compute and storage cloud. So that’s true. But a compute and storage cloud can serve perfectly well as the back-end for a reasonably fast interactive user-facing system. Like Facebook, for example.

Now I don’t go to many conferences, but I’ve been to several where I’ve heard speakers claim that Hadoop (meaning the compute and storage cloud) will not work for video processing. While it may not be immediately obvious how to apply the map-reduce framework to parallel-process a single video, it is straightforward to use map-reduce to simultaneously process 1,000 videos on a 1,000-node cluster. In other words, even if we can’t parallelize below the file level, we can always parallelize at the file level. That said, there are many kinds of processing that may be applied to video below the file level. We have packets, frames, and channels on which to apply enhancement, mining, extraction, even tracking. It just takes a little imagination. In our own work for the National Geospatial Intelligence Agency (NGA), we apply map-reduce processing to video to do a variety of things like exploiting the meta-data stream, extracting security markings, extracting the collection footprint, and so on.
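Here is a rough sketch of what file-level parallelism looks like: each map task takes one whole video and processes it independently. The extract_metadata function is a hypothetical stand-in for real exploitation code, and the multiprocessing pool merely stands in for the cluster; this is not the pipeline built for NGA.

# Sketch of file-level parallelism: rather than splitting a single video
# across map tasks, each map task takes one whole video file and processes
# it independently. extract_metadata() is a hypothetical stand-in for real
# exploitation code (metadata parsing, marking extraction, footprint
# computation); it is not the pipeline described in this article.
import json
import os

def extract_metadata(path):
    """Hypothetical per-file processing: here we just report the file size."""
    return {"path": path, "bytes": os.path.getsize(path)}

def map_one_video(path):
    """One map task == one video file."""
    return path, extract_metadata(path)

def run_job(video_paths, pool_size=4):
    # multiprocessing stands in for the cluster: in Hadoop each map task
    # would run on a node already holding that video's blocks.
    from multiprocessing import Pool
    with Pool(pool_size) as pool:
        return dict(pool.map(map_one_video, video_paths))

if __name__ == "__main__":
    import sys
    results = run_job(sys.argv[1:])
    print(json.dumps(results, indent=2))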

Another claim I’ve heard is that the compute and storage cloud is not good for geospatial data. There’s no issue storing geospatial data itself, so this criticism is aimed mainly at geospatial indexing. While you’re not likely to get geospatial indexing capability out of the box from your NoSQL implementation, it is certainly possible to do. In fact, pretty much all the compute and storage implementations around the Intelligence Community today have done it to some level. The geospatial indexing scheme that we implemented for NGA’s cloud is particularly good. It supports arbitrary geospatial queries over a balanced distributed index in an incredibly efficient way, using the power of the entire cluster.
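For readers wondering how geospatial indexing can be layered onto a sorted key-value store at all, one common approach (offered here purely as an illustration, not as the scheme described above) is to interleave the bits of latitude and longitude into a Z-order key, so that points near each other on the ground tend to be near each other in key space and a bounding-box query becomes a handful of contiguous range scans.

# One common way to get geospatial indexing out of a sorted key-value store
# (Accumulo, HBase, etc.): interleave latitude/longitude bits into a Z-order
# (Morton) key, so points that are near each other on the ground tend to be
# near each other in key space and can be fetched with range scans. This is
# a generic sketch, not the specific scheme described in this article.

def z_order_key(lat, lon, bits=16):
    """Encode (lat, lon) as an interleaved integer key with `bits` bits per axis."""
    # Normalize each coordinate to an integer in [0, 2^bits).
    lat_n = int((lat + 90.0) / 180.0 * ((1 << bits) - 1))
    lon_n = int((lon + 180.0) / 360.0 * ((1 << bits) - 1))
    key = 0
    for i in range(bits):
        key |= ((lat_n >> i) & 1) << (2 * i)        # even bit positions: latitude
        key |= ((lon_n >> i) & 1) << (2 * i + 1)    # odd bit positions: longitude
    return key

if __name__ == "__main__":
    # Nearby points produce nearby keys, so a bounding-box query becomes a
    # small number of contiguous key-range scans across the cluster.
    for lat, lon in [(38.8895, -77.0352), (38.8899, -77.0360), (51.5007, -0.1246)]:
        print((lat, lon), "->", format(z_order_key(lat, lon), "032b"))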

Put simply, the value proposition of the compute and storage cloud to the Intelligence Community is big data. It gives us the infrastructure we need to productively manage and use big data. And by big data, I don’t just mean a lot of data of one kind (like WAMI, wide area motion imagery), or a lot of data of a few kinds (like SIGINT, signals intelligence), I mean both tremendous volume and tremendous diversity of data. I mean all of it – everything.

A Research Instrument for Big Data

The question as to why big data is such a big deal to the Intelligence Community is actually a fair one that almost no one asks. And that’s because we instinctively hold the idea that big data gives us the broadest possible foundation on which to conduct the analysis of information for the purpose of answering questions and developing understanding. Still, some people say cloud does not solve the “data problem.” It just makes it bigger. This is true in the sense that if the thinking fails to evolve with the technology, no real progress can be made. Data at scale is not just more data; it is profoundly different, just as a baseball is profoundly different from the individual atoms from which it is formed. If you insist on applying quantum thinking to the classical domain, you’re just never going to understand projectile motion, much less the game of baseball.

From this perspective, the value of the compute and storage cloud to the Intelligence Community goes well beyond just providing a compute and storage infrastructure: it gives us the necessary foundation for building a research instrument for big data, one that will not only allow us to explore this new domain, but also allow us to learn new ways of thinking about it. I believe there’s new science here. That big data, as I have defined it, is an entirely new substance – not animal, vegetable, mineral, not quarks and gluons, not pure energy, not simply information, something else. A flattening of humanity, a manifestation of humanness. Something of us. Deep and mysterious.

In order to develop that science and begin to understand that substance, the next step after collecting and persisting it is to figure out how to represent and organize it. The choices we make determine the kinds of questions we can explore. It is these choices that most clearly distinguish the various big data efforts around the Intelligence Community – whether the objective is a tool aimed at solving a particular problem or a research instrument aimed at exploring this new domain. Obviously the Intelligence Community is not the National Science Foundation, so it is naturally the case that all the big data efforts in the Intelligence Community are mission focused. As are we. But the Institute for Modern Intelligence (IMI) side of our personality aims for much, much more. The IMI is building a research instrument.

That’s not to say that IMI and MF are doing two different things. We’re not. We’re doing one thing: building a semiotic compute and storage cloud for NGA that solves real problems for NGA today, and also serves as a research instrument for NGA and the Intelligence Community beyond. Using compute and storage cloud technology underneath, we represent big data in a manner that lifts its meaning to the highest possible level, applying wave upon wave of processing to continually lift that level higher, cultivating disparate data into a coherent web of information. It’s a new way of representing data, information, and knowledge that has deep roots in semiotics. We believe this work will not only make NGA the most productive and efficient organization in the Intelligence Community, it will open a door to new science.

It is difficult to overstate the value of cloud computing to the Intelligence Community. The two kinds of clouds, the utility computing cloud and the compute and storage cloud, are effecting a profound transformation of the Intelligence Community’s data infrastructure that is cracking the “data problem” by unifying our data and processing assets and operationalizing them in new ways. These technologies have brought us to a threshold beyond which new science waits to be discovered. The era of Modern Intelligence (the science, practice, and governance of Intelligence at Ultra-Large Scale) has begun.

About Mission Focus & The Institute for Modern Intelligence

Mission Focus is an agile development shop that takes domain design and development as seriously as system design and development. We work mostly in the Intelligence arena with DoD and IC customers in close partnership with the Institute for Modern Intelligence. Our core domain is the storage, processing, and utilization of data in the context of immense scale and diversity. We are experts in cloud compute and storage technology and invented the Sign Representation Framework which underpins a game-changing approach to data unification. We pride ourselves on our disciplined engineering practices and distinguish ourselves by our ability to continually learn and innovate. The work we do is meaningful and intentional and is wrapped with our integrity. We are driven to think harder and work better than the rest because we believe the code we are writing will change the world.

The Institute for Modern Intelligence is a 501(c)(3) non-profit organization dedicated to developing the science, practice, and governance of Intelligence at Ultra-Large Scale.

Copyright © 2013 Institute for Modern Intelligence.


  • Mark Eick
    Mark Andrew Eick is Founder and Chief Executive at Mission Focus. He is a technologist dedicated to crafting scientific innovation into practical solutions for the Intelligence Community (IC). With degrees in computer science and philosophy, he started his career in 1995 at Ford Motor Company developing systems to manage the development and printing of service manuals. The system he developed fundamentally changed the production process, improving efficiency and quality and earning him multiple technology awards. In 2003 Mr. Eick became CTO at SSS Research, a DoD contracting startup focused on bringing new visualization techniques to thin-client geospatial applications. His innovations optimizing the processing and transmission of geospatial data received multiple software patent awards and were deployed in an operational system which directly contributed to numerous successful missions. In 2008, Mr. Eick started his own company, Mission Focus, focusing on the “Big Data” problems inherent in the IC domain. As CEO he now orchestrates technology and operations at a much larger scale, employing cloud technologies to address the storage and processing of diverse information at tremendous scale. In addition, in 2009 Mr. Eick also co-founded the Institute for Modern Intelligence, a 501(c)(3) non-profit corporation whose mission is to develop the science, practice, and governance of Intelligence at Ultra-Large Scale.


  • Suzanne Yoakum-Stover, Ph.D.
    Suzanne Yoakum-Stover is a data scientist dedicated to developing Modern Intelligence - the science, practice, and governance of intelligence at Ultra-Large Scale (ULS). Since earning her Ph.D. in physics from Stony Brook University, she has contributed, both as a research scientist and as an educator, to a range of technical fields including AMO physics, artificial intelligence, medical imaging, computer graphics and simulation, statistical and natural language processing, data management, and knowledge representation. A problem solver driven by a thirst for insight, the thread that unifies all of her work is an enduring penchant for software engineering.

    Dr. Yoakum-Stover currently serves as the Executive Director of the Institute for Modern Intelligence (IMI) and as Chief Scientist at Mission Focus where she leads a team developing a practical, ULS systems solution for data storage, exploration, cultivation, and exploitation that can accommodate the diversity of current and future intelligence data, semantics, and perspectives in one unified Data-Space without information loss or distortion. Through her innovation, which provides a unified interface to all data, she strives to enable a diverse ecosystem of processing to be put into production together over the Data-Space and upon that foundation, execute a broad ULS systems research agenda with specific application to intelligence. She, the IMI, and the team at Mission Focus aspire to transform the “data problem” into a sublime intelligence asset for the defense of our nation and thereby give Modern Intelligence its birthplace.



 
