Data Scientist Interview: Michal Migurski of Stamen

Michal Migurski is the Director of Technology and partner at Stamen, a San Francisco design studio specializing in data visualization and mapmaking. He has been building for the web since 1995 and earned his degree in Cognitive Science from UC Berkeley. Below, Michal talks about how Stamen makes beautiful interactive maps out of geographic data (like the one above of the Bay Area), the past and future of visualization, and whether he considers himself a data scientist.

 

Metamarkets: What is your day-to-day work like?

Michal Migurski: We do a lot of work with geography and maps. That includes both base mapping, meaning contextual maps to show data on top of, and data maps that show thematic layers of information sitting on top of those contextual maps. When we’re working with those datasets, my day typically involves cutting and slicing through very large datasets using a number of open source geography tools: dealing with data input and output, translation, rendering, and visualization.

Then the other part of that is where the data comes from. That’s more the business development and client relations side of things. Stamen is a client services design firm. Unlike, for example, Pete [Skomoroch] at LinkedIn, we don’t have a giant set of data that’s internal to our company. We deal with whatever the clients bring to us.

In a lot of cases, there’s a whole business development process involved in figuring out what data clients have, how to convert it into something we can use, how to square their expectations of what they think they have with the reality, and then turning all of that into something that’s workable and publishable on the web.

Metamarkets: What are some examples of clients that you’ve worked with or datasets you’ve dealt with?

Michal: A few months ago we worked with Zillow, the real estate company. We helped them produce a nationwide map of city-, state-, county-, and zip-code-level information about housing foreclosures. In that case, they were able to come up with a dataset linked to those geographies. Our job was to connect that to Census information for the whole US and then produce an interactive browser and rendering environment for it all, in addition to the UI design around the application. It was about cutting through that dataset and figuring out what was interesting there, what could be highlighted, and which zip codes and areas deserved special attention because of a higher-than-normal foreclosure rate.
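The join-and-flag step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Stamen’s or Zillow’s actual pipeline: the zip-code dictionaries, the names `ZipStats`, `join_by_zip`, and `flag_high_rates`, and the rule that “higher than normal” means more than twice the pooled overall rate are all hypothetical, and the housing-unit counts merely stand in for whatever Census table a real project would use.

```python
from dataclasses import dataclass


@dataclass
class ZipStats:
    zip_code: str
    foreclosures: int
    housing_units: int  # hypothetical stand-in for a Census housing-unit count

    @property
    def rate(self) -> float:
        """Foreclosures per housing unit."""
        return self.foreclosures / self.housing_units


def join_by_zip(foreclosures: dict[str, int], housing_units: dict[str, int]) -> list[ZipStats]:
    """Inner join on zip code; zips missing from either side (or with zero units) are dropped."""
    shared = sorted(foreclosures.keys() & housing_units.keys())
    return [
        ZipStats(z, foreclosures[z], housing_units[z])
        for z in shared
        if housing_units[z] > 0
    ]


def flag_high_rates(rows: list[ZipStats], factor: float = 2.0) -> list[str]:
    """Return zips whose rate exceeds `factor` times the pooled overall rate (an assumed threshold)."""
    total_units = sum(r.housing_units for r in rows)
    if total_units == 0:
        return []
    overall = sum(r.foreclosures for r in rows) / total_units
    return [r.zip_code for r in rows if r.rate > factor * overall]


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    foreclosures = {"94110": 12, "94607": 90, "95112": 30, "10001": 5}
    housing_units = {"94110": 30000, "94607": 20000, "95112": 25000}
    rows = join_by_zip(foreclosures, housing_units)
    print(flag_high_rates(rows))
```

Pooling the totals before dividing weights each zip by its housing stock, so a handful of foreclosures in a tiny zip code doesn’t dominate the baseline; a real analysis would also have to reconcile postal zip codes with Census geographies, which this sketch ignores.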

Metamarkets: What is your own background like and how did you come to Stamen?

Michal: I’ve been here the second longest of anybody in the company. I first joined Stamen in 2003, when it was just Eric Rodenbeck, our founder. As for my background, I come out of UC Berkeley. I did an undergraduate degree in cognitive science with a minor in computer science, and at the time I was always interested in graphic and visual design as a discipline. After I left school, I did some basic web design in the waning days of the dot-com boom. After working on a number of projects and learning all the technologies involved, I teamed up with Eric, who had been running Stamen for about two years at that point. What I brought to the studio was a focus on technology and data that let us go out and look for datasets and do in-browser visualizations: much more live, Internet-focused work than we had been able to do before.

Metamarkets: Building on that, how have the possibilities for visualization, and interactive visualization in particular, expanded in the time you’ve been working in this field?

Michal: Oh, it’s been huge. I don’t think “visualization” was a buzzword of any kind when Eric and I first started working together. He worked at Wired Magazine for a long time as a print designer. Then for a while he was at a company called Quokka Sports, which was seminal in the live event visualization space. They did visual interpretations of live sports. Not football or basketball, but things like mountain climbing, motorcycle races, and sailing races: weird sports with rabid fanbases. They would do things like visualizing the heart rates of different competitors, or showing multiweek climbs on mountains. The time granularity was very different from that of regular sports, which meant it needed a different form of narrative and storytelling.

When he and I started working together, everyone looked at that and said, “Well, that’s a form of graphic design.” As more data became available online, and especially as the Web 2.0 thing happened around 2005 or so, you started seeing people’s normal day-to-day interactions getting turned into information.

One of our early clients, back in 2006, was Digg.com, one of the first Web 2.0 social websites to experiment with opening up its database completely and live. Flickr released its API, and then Digg released its own. We helped them get their heads around what it meant to publish data, and also what it meant to publish live views of that data changing in real time.

Metamarkets: Do you have any prediction for how the possibilities for visualization are going to continue to develop?

Michal: Honestly, I think visualization is going to become more and more normal and expected as part of everyday dealing with information. The way we understand the word, “visualization” often just means the next logical step in showing information. It’s really a future-focused word: things that used to be called visualization become normal and day-to-day, and aren’t considered special anymore. Think about scatter plots, pie charts, colored heat maps, that kind of thing. All of that was incredibly cutting edge a decade ago, and as the data and tools have become more available, they’ve become features of other things.

A big part of what’s most critical in our work these days is chasing liveness: working with live data wherever possible. That’s getting really interesting with Twitter’s streaming API and the things Facebook is doing.

Metamarkets: The majority of the visualization that you do is interactive?

Michal: I’d say so, yeah. We definitely do pieces that aren’t, but our hearts are probably most closely in the ones that are interactive. Especially live data, something that changes every time you look at it.

Metamarkets: Do you consider yourself a data scientist?

Michal: I don’t know that I do. I’ve heard the term used in two different ways. By one of them, I would consider myself an example: somebody who has to deal with data, experiments with data, operates on data, and turns it into something else. But I think the way the term has come to be used much more over the past few years is in connection with a fairly specific tool chain and approach to data, one that focuses on things like Hadoop for processing large datasets and different statistical methods for dealing with large, connected corpora of documents or people or whatever.

Basically, what Pete does at LinkedIn is very much at the heart of what a data scientist is about. It’s the same with Hilary Mason at bitly: working at a company that produces a massive corpus of data, and having somebody whose only job is to look at it really closely and understand what’s in there.

Metamarkets: So the toolset that you draw on is a little different?

Michal: Yeah, it’s the tools we draw on. One way of looking at it is that a regular scientist deals with data in the same way; they just aren’t data scientists. They’re astronomy scientists, or frog scientists, or whatever, but they’re using a lot of the same tools. Data science, it seems to me, is about people whose area of study is the data itself, rather than something the data is about.

Metamarkets: Is the profession that’s sprung up around data science genuinely new, or is it more of a rebranding?

Michal: I definitely think it’s new. I also think it’s transitional. It’s a pattern I keep seeing in different disciplines. Take interactive mapmaking on the web: for a long time, it wasn’t something a lot of people knew how to do or bothered to do. Then companies like ours, more independent firms, started working with interactive map data online, perhaps using the Google Maps API, perhaps using other things. It was a really specialist, novel kind of thing. Now you’re starting to see the pendulum swing in the other direction. Subject matter experts and community members in worlds like journalism are starting to figure that stuff out for themselves.

Through conferences like ONA [Online News Association] and other journalism-focused conferences, that work is being brought back into the fold. I suspect something similar will happen with data science. The people who helped create the data in the first place, which at a company like bitly might be the operations or development folks, will start to pull those tools back and use them themselves, rather than farming the work out to a data scientist.

I can totally imagine that the first group to make Hadoop-scale datasets accessible through an interface like Excel, whether it’s Microsoft or Google or somebody else, will get all of the marketing, sales, and research people and companies that currently work in Excel working with gigantic datasets. Those people will be data scientists, too.

Metamarkets: What are some people or projects or organizations within the visualization realm that you admire?

Michal: I have a lot of respect for MapBox, a company in DC that has been doing an excellent job over the past few years of encouraging a desktop publishing revolution in web mapping. They’re essentially tool builders who are making things like simple web maps easy and accessible to people, in a way that I think Google hasn’t quite managed. There is also a professor, Mark Hansen, who’s currently at Columbia. He is probably one of my favorite statisticians, partly because he is a practicing artist and a statistician at the same time. A lot of his work rides the line between art practice and science practice.

Mark is well known for Listening Post, a project he did with collaborator Ben Rubin: a data collection machine and installation that lives in a museum and presents live fragments of conversation pulled from Internet chat rooms and forums. It’s just a beautiful piece to be in the same room with. But at the same time, he does a lot of heavy analysis of things like Twitter streams for the New York Times. He’s just fantastic.

Metamarkets: You don’t necessarily consider yourself a data scientist, but do you consider yourself an artist?

Michal: Yeah, in a certain way. As a collective, Stamen has had many pieces in museums. We have participated in two shows at MoMA in New York, and we are currently in a biennial exhibition in San Jose called Zero1. But a lot of the work we produce for those shows tends to be almost a gallery-ready version of work we do outside the art world.

For example, the piece that we currently have up at the Zero1 Festival in San Jose is an investigation into the private transportation networks of Silicon Valley. We are looking at the way that Google and Apple and all these other companies have these private bus lines for their employees and how those are changing where people live and where they work in the Valley. We hired bicycle messengers and people with stopwatches to chase these buses around and mark down who was getting off and who was getting on. It doesn’t look like art, until you put it up on a gallery wall.

Metamarkets: What’s a current project that you’re excited about?

Michal: We are currently in the final stages of planning a conference for the OpenStreetMap project. It’s in Portland in about three weeks, and I’m helping to co-plan it with other foundation board members and community members. That’s probably the most exciting thing I’m working on right now, largely because OpenStreetMap is such a core dataset for the way we work and for a lot of the projects we produce. It’s really exciting to get more than 50 speakers together in a big space and just talk through all the different issues: collecting data, scrubbing it, making art out of it, dealing with it, and helping the community.