
Data Catalog Helps San Diego Become a Smart City

Originally published August 9, 2017

This article is an interview conducted by Ron Powell, industry analyst and expert with the BeyeNETWORK and the Business Analytics Collaborative, with Maksim Pecherskiy, the chief data officer in the Performance and Analytics Department for the City of San Diego.

In November of 2014, you were chartered with implementing the city’s new open data initiative. Can you tell us what that means for the city of San Diego?

Maksim Pecherskiy: When Mayor Faulconer was elected to office, he really wanted to double down on transparency, efficiency and economic development. As his team was evaluating and building the open data policy, they saw some of the benefits of open data that other cities were getting. They structured San Diego's open data policy with New York's as a template. San Diego has a lot of different datasets. We have 35 departments. We have 11,000 people who work for us, and we're a city of 1.3 million people. We obviously have a lot of data that can be built upon – for residents to build applications, and also for the government to be more transparent. If you're a resident and you want to know what construction is happening in your neighborhood, you can learn that. And if you're a software developer and you want to integrate street paving information, for example, into your application, you can do that as well.
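As a rough sketch of the developer use case Pecherskiy describes, here is how an application might filter a street-paving dataset by neighborhood. Everything here is hypothetical – the inline sample rows and the column names (`street`, `neighborhood`, `last_paved`, `condition_index`) are illustrative assumptions, not the actual schema published on the city's open data portal.

```python
import csv
import io

# Hypothetical sample of a street-paving dataset; the real open data
# portal's column names and values may differ.
SAMPLE = """street,neighborhood,last_paved,condition_index
Park Blvd,North Park,2016-05-12,72
Adams Ave,Normal Heights,2014-09-30,55
30th St,North Park,2015-03-18,63
"""

def paving_in_neighborhood(csv_text, neighborhood):
    """Return paving records for one neighborhood, most recently paved first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    matches = [r for r in rows if r["neighborhood"] == neighborhood]
    # ISO dates sort correctly as strings, so newest-first is a reverse sort.
    return sorted(matches, key=lambda r: r["last_paved"], reverse=True)

for rec in paving_in_neighborhood(SAMPLE, "North Park"):
    print(rec["street"], rec["last_paved"], rec["condition_index"])
```

In a real application the CSV text would come from an HTTP request to the city's published dataset rather than an inline string.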

How were you able to structure a team for the open data initiative?

Maksim Pecherskiy: Initially it was just me. The first part of the open data policy was to do an inventory of what data the city has. As you can imagine, we have a variety of different data sources, so it was a very large task. When it came time for me to hire my second person, I tried to find somebody I could put in front of a director, somebody who could run a meeting. I also focused on data visualization skills. When you are building out a data team, somebody who can handle the infrastructure side and work with data is obviously the first need – in this case, that was me because my background is in software development. The next key component is a person with strong visualization experience. The reason is that you're going to lose buy-in very quickly if all you're doing is talking about software, numbers, structuring datasets, and so on. But people understand charts, and they can make decisions off of dashboards. There is a lot of value in strong visualization.

The next person we're bringing on is a data science coordinator. I'm very excited about it. This person has experience in JavaScript and in data science, and has also done machine learning and some predictive analytics.
What I am excited about is that once we have this person on board, we’ll be a full team. Obviously, we can always grow as we continue to take on additional tasks. But between somebody that can do the data science, somebody that can do the infrastructure and somebody that can do the visualization, I’m really excited about what we can get done.

Recently you spoke at Strata on a panel called “Beyond the Numbers: Expanding the Size of Your Analytic Discovery Team.” What are the major insights you presented on that panel?

Maksim Pecherskiy: I mentioned the data inventory that we had to do. It was really a survey. We worked with 65 people across 35 departments, and we asked them what data sources they use, what data is in those sources, and had them tell us about the metadata regarding the data. Obviously, it's already out of date because things change so fast. New IT systems are implemented. Database schemas are changed. It's continuously evolving. We got many false positives – for example, when you ask some people for data, they think that a PDF report is data or that a web page is data. That's an education piece. The consensus when we were done with the inventory/survey was that we didn't want to do it again. This caused us to start looking into data catalogs.

We do have a catalog in the city now that we recently deployed from Alation. The project I talked about at Strata was our first end-to-end project, and I talked about how we use the data catalog to mark up and document what data is in the database. We wanted to show our residents a map of streets showing when they were last paved, when they were scheduled to be repaved, and what the overall condition index is. There is a truck that drives around every city street and records its condition index.

Before we had a data catalog, we looked at this database and started working with the team. We realized that there were 319 tables, and we weren't sure how the team was able to run the report they run. There was no consensus on what certain fields meant – for example, what is a repair mile versus a pave mile versus a center-line mile? With the use of the data catalog, we were able to bring that database in and have solid documentation on what the fields are, so we won't have to go through that high-friction exercise again. And we're obviously able to structure the data correctly to release it as open data – which is coming shortly – and then as a map.

That’s great. You also talked about releasing dashboards to the city residents so that they can go online and find any information they need about the streets. 

Maksim Pecherskiy: Exactly. That is the end product, and that is available at streets.sandiego.gov.

Another big area today is IoT with sensors being utilized to create smart cities. Can you highlight San Diego’s efforts in making this a reality?

Maksim Pecherskiy: I believe we are the first city to have individually metered street lights, which means that every street light has a meter on it and reports its own energy usage. On top of that, as we're continuing to replace our street lights, we have a pilot program – that hopefully will expand – with several of these lights carrying a variety of sensors. Some of those could be visual sensors, some could be audio sensors, and some could be temperature, humidity, and air quality sensors. And it's extensible, so you could add additional sensors through an interface.

All of that data flows back to a central platform where we can then do processing such as computer vision, counting bikes, or counting traffic. I'm pretty excited about that. It's a very forward-looking effort. We are really building a base and platform for the city to become a truly pluggable smart city.

The other cool thing we have relates to our parking meters. All of our parking meters are smart parking meters in that they continuously transmit information back to a central source. We have really granular data on when a car entered a space, when it left, and how long it remained. We are not tracking the "who" because there is no way to track that. But what that allows us to do is have a better conversation about whether a bike lane influences parking in a certain area, or whether business profitability is correlated with parking meter turnover in certain areas. There are a lot of really interesting things we can do with that.
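The turnover analysis Pecherskiy mentions can be sketched from exactly the fields he names – when a car entered a space, when it left, and how long it stayed. This is a minimal illustration with made-up meter IDs, timestamps, and field layout; the city's actual feed is surely structured differently.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (meter_id, entered, left) records standing in for the
# city's real parking-meter event feed.
events = [
    ("M-101", "2017-06-01 08:05", "2017-06-01 09:50"),
    ("M-101", "2017-06-01 10:02", "2017-06-01 10:31"),
    ("M-202", "2017-06-01 08:40", "2017-06-01 12:10"),
]

def turnover_stats(events):
    """Per meter: (number of parking sessions, mean stay in minutes)."""
    fmt = "%Y-%m-%d %H:%M"
    stays = defaultdict(list)
    for meter, entered, left in events:
        delta = datetime.strptime(left, fmt) - datetime.strptime(entered, fmt)
        stays[meter].append(delta.total_seconds() / 60)
    return {m: (len(v), sum(v) / len(v)) for m, v in stays.items()}

print(turnover_stats(events))
```

Joining per-meter turnover like this against business-activity data by location is the kind of correlation question the interview describes.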

I would say we’re already a pretty smart city. We obviously have a long way to go, and I’m excited about some of the things that we’ll be able to do.

From a technology perspective, where do you keep all that sensor data? That must be a lot of data.

Maksim Pecherskiy: From a technology perspective, we've had a pilot going on with GE, and we're using their Predix platform to store all that information. We are then able to expose APIs out of that for our own use – and hopefully, eventually, the public's.

You mentioned the use of data catalogs. Are data catalogs being used in the IoT initiative as well?

Maksim Pecherskiy: Most of our data is still about things – it's not from things. The nice thing about data catalogs is they don't really care. Data is data. Data has metadata. Data has certain columns. Data has certain fields. It has certain semantics about it. My vision for the future is to have an analyst in the City of San Diego be able to go in, open Alation, and ask a question such as, "How can I predict whether adding a bike lane will increase or decrease business activity in a certain area?" And I want that person to be able to find that data, understand that data, and be able to make an analysis without having to call me.

Maksim, it's been a pleasure talking with you today to learn about the City of San Diego.

  • Ron Powell
    Ron is an independent analyst, consultant and editorial expert with extensive knowledge and experience in business intelligence, big data, analytics and data warehousing. Currently president of Powell Interactive Media, which specializes in consulting and podcast services, he is also executive producer of The World Transformed Fast Forward series. In 2004, Ron founded the BeyeNETWORK, which was acquired by TechTarget in 2010. Prior to founding the BeyeNETWORK, Ron was cofounder, publisher and editorial director of DM Review (now Information Management). He maintains an expert channel and blog on the BeyeNETWORK and may be contacted by email at rpowell@powellinteractivemedia.com.

