Suppose you want feedback on an idea, so you invite people to provide it. Ultimately, you receive 1.1 million pieces of feedback. What do you do? What. Do. You. Do?!

This is exactly what happened when the FCC requested public comment on its proposed net neutrality regulations. Absent Keanu Reeves, this can create a real challenge. The Sunlight Foundation analyzed over 800,000 digital comments submitted to the commission, using text mining with the gensim Python library.

The above graph shows feedback grouped by topic. Below are some key findings:

  • We estimate that less than 1 percent of comments were clearly opposed to net neutrality.
  • At least 60 percent of comments submitted were form letters written by organized campaigns (484,692 comments); while these make up the majority of comments, this is actually a lower percentage than is common for high-volume regulatory dockets.
  • At least 200 comments came from law firms, on behalf of themselves or their clients.

The data visualization groups the comments by topic, a common technique in text mining. Words or phrases that appear together are often associated with the same topic. For instance, a writing campaign by Avaaz is easily visible in one of the groups.
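To make the "words that appear together" idea concrete, here is a minimal sketch in plain Python. It is not the Sunlight Foundation's gensim pipeline, and it is far cruder than a real topic model: it only counts how many comments contain each pair of words. The comments, the stopword list and the function name `cooccurrence_counts` are all made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy comments; the real docket text is not reproduced here.
comments = [
    "no fast lane for the wealthy",
    "reclassify isps under title ii",
    "a fast lane lets the wealthy pay to play",
    "use title ii authority to reclassify",
]

# Tiny hand-picked stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "for", "the", "to", "under", "no", "use", "lets"}


def cooccurrence_counts(docs):
    """Count how many documents contain each unordered pair of words."""
    pairs = Counter()
    for doc in docs:
        # Use each word once per document; sort so every pair has one canonical order.
        words = sorted(set(doc.split()) - STOPWORDS)
        pairs.update(combinations(words, 2))
    return pairs


pair_counts = cooccurrence_counts(comments)

# Pairs that show up together in at least two comments hint at a shared topic.
frequent_pairs = sorted(pair for pair, n in pair_counts.items() if n >= 2)
print(frequent_pairs)
```

In this toy data the frequent pairs fall into two clusters, one around "fast", "lane" and "wealthy" and one around "title", "ii" and "reclassify". That is roughly the kind of grouping the visualization shows, at a vastly larger scale.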

At any point, you can view the specific comments associated with each group. This is useful for mapping the analytics back to the raw data. At times, the raw comments make it clear where a grouping is misaligned.

Beyond the D3 visualization, further analysis was performed on the text:

  • Around two-thirds of commenters objected to the idea of paid priority for Internet traffic, or division of Internet traffic into separate speed tiers. This topic was discussed in many independent comments, as well as form letter campaigns organized by the Nation, Battle for the Net, CREDO Action, Daily Kos and Free Press. Common keywords in this group included “slow/fast lane,” “pay to play,” “wealthy,” “divide” and “Netflix.”
  • About the same number of comments, including submissions from form letter campaigns organized by the Nation, Badass Digest, CREDO Action, Daily Kos and Free Press, asked the FCC to reclassify ISPs as common carriers under the 1934 Communications Act. Common keywords in these comments included “common carrier,” “(re)classify,” “authority” and “Title II” (a part of the act that might grant the FCC this authority). A smaller portion of commenters advocated a regulatory strategy with a similar effect but a different legal basis, relying on section 706 of the 1996 Telecommunications Act.
  • Several form letters either from the Daily Kos or of unknown provenance (combined with non-form letters) advocated treating broadband providers like a public utility. About 15 percent of comments discussed this topic.

These findings aren’t immediately visible in the visualization, which reveals some of its shortfalls.

The work itself is available on GitHub.

UK Temperature Chart is an interactive graph showing the distribution and trend of the UK’s climate. The chart is deceptively simple in its execution. Scrolling down “builds” the graph, which contains several points that highlight the highs and lows of historical temperatures.

A full-sized animation is also available here.

On the downside, some experiments with a smartphone were not as enjoyable as I had hoped. While scrolling holds promise for a touch-based interface, it was a little slow.

Nevertheless, the result is a beautiful graph that is enjoyable to explore.

Which ZIP codes rank in the highest percentiles for both college-educated labor force and income? The Washington Post’s dive into Super ZIPs helps you explore areas by their percentile in income and college-educated labor force.
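For readers who want to see how a percentile-based ranking like this might work, here is a small sketch in Python. The ZIP codes and figures are invented, and averaging the two percentile ranks is my own assumption for illustration; the Post's exact scoring method is not described here.

```python
def percentile_ranks(values):
    """Return, for each value, the percent of values less than or equal to it."""
    n = len(values)
    return [100.0 * sum(v <= x for v in values) / n for x in values]


# Hypothetical ZIP codes with made-up figures, for illustration only.
zips = ["00001", "00002", "00003", "00004"]
median_income = [40_000, 120_000, 65_000, 90_000]
college_share = [0.20, 0.70, 0.55, 0.35]

income_pct = percentile_ranks(median_income)
college_pct = percentile_ranks(college_share)

# Assumed combination rule: average the two percentile ranks.
scores = {z: (i + c) / 2 for z, i, c in zip(zips, income_pct, college_pct)}
top_zip = max(scores, key=scores.get)

for z in zips:
    print(z, scores[z])
```

Note that "00003" in this toy data behaves like a college town: it ranks higher on education than on income, so its combined score lands in the middle.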

As expected, population centers contain a high concentration of Super ZIPs, with the Washington, D.C., area having the most. Indeed, many Super ZIPs lie outside the city limits.

The correlation between income and education is strong and nearly universal. However, it’s worthwhile to explore the two separately, since many college towns (high in educated labor) may provide good, but not great, incomes.

The Open Knowledge Foundation’s Open Data Index attempts to score the degree to which a country has opened several key datasets. In the United States, this is promulgated through the data.gov site. The index measures ten categories (ranging from transportation timetables to legislation) and nine criteria within each, ranging from whether the data exists to its timeliness.

The visualization resembles DNA sequencing. A simple color scheme shows whether each criterion was met, with an option to click for more information. The entire index, which covers 60 countries, is quickly scannable.

Other indexes, such as the democracy index, are usually displayed as a global choropleth, which seems to marginalize smaller geographic areas or reinforce regional groupings. The OKFN visualization gives the same space and opportunity to each country, instead of tying it to its geography.

Ah, 1880. An era so magical it was called gilded. To be rich was magical; to not be was soul-crushing. The United States was also emerging from the Civil War and Reconstruction, with the roots of federalism starting to take hold. As part of that, the nationwide Census and statistical maps were a budding and remarkable practice.

It was also the year of the Garfield and Hancock election. Garfield handily won the electoral college, but with the narrowest of popular vote margins: only 1,898 votes out of 8.9 million. Two hundred days into his presidency, Garfield died from an assassin’s bullet, despite revolutionary efforts that included a metal detector built by Alexander Graham Bell to locate the bullet and one of the first air conditioners ever built.

These maps are, according to Susan Schulten, the first modern electoral maps. This map introduces several techniques, now considered mundane: it provides granular electoral returns by county and shows the degree of victory. The technique is crucial for beginning to display, through data, the underlying cleavages in society at the time (the assassination, however, was not politically motivated). As a result, there is a fine gradient showing the slowly changing political positioning of the country from the North to the reconstructed South.

Illinois is an excellent example, highlighting the Republican leanings of the North compared to the Democratic favor of Southerners. The smaller graphs breaking down the electoral and popular votes show Garfield’s marginal wins within the most populous states, which drove a robust electoral college victory, against Hancock’s substantial wins in the smaller, southern states.

The series of small graphs is a good touch, putting the main electoral map in the context of the returns. As with other graphs of the era, the lithography and slight imperfections only add to the aesthetic quality.

Animated maps of tweets across the world, in regions, within urban areas, and within specific domains are appearing in the wild more frequently. Those maps usually emphasize dynamic visualizations, but even for fans of those animations, it is difficult to discern the important information, such as where and when the tweets occurred.

The Atlantic Cities noted an outstanding visualization by Giorgia Lupi as part of the UrbanSensing project. At first, this map appears like all the others, displaying the location of tweets across Milan.

However, the colors on the map correspond to intensity over time, instead of overall intensity. Assuming tweets exist, the intensity shows when an area becomes active. Some insights: tweets around the university peak during typical lecture hours; the business district peaks during, obviously, business hours; the soccer stadium peaks before the match, but not during it.

But those mid-day tweets represent only a portion of all tweets. In a simple bar graph, Giorgia also shows the distribution of tweets by 3-hour segments. Almost 40% of a day’s tweets come after work and during the late-night hours.
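The 3-hour bar graph is easy to reproduce on your own data. Below is a minimal Python sketch, assuming you have tweet timestamps as strings; the timestamps and the helper name `three_hour_bin` are hypothetical and are not data or code from the UrbanSensing project.

```python
from collections import Counter
from datetime import datetime

# Hypothetical tweet timestamps (local time), for illustration only.
timestamps = [
    "2013-05-01 09:15", "2013-05-01 10:40", "2013-05-01 13:05",
    "2013-05-01 18:30", "2013-05-01 19:45", "2013-05-01 20:10",
    "2013-05-01 22:20", "2013-05-01 23:55",
]


def three_hour_bin(ts):
    """Map a timestamp string to the starting hour of its 3-hour segment."""
    hour = datetime.strptime(ts, "%Y-%m-%d %H:%M").hour
    return (hour // 3) * 3


counts = Counter(three_hour_bin(ts) for ts in timestamps)
total = sum(counts.values())

# Share of all tweets falling in each segment, in chronological order.
shares = {start: counts[start] / total for start in sorted(counts)}
for start, share in shares.items():
    print(f"{start:02d}:00-{start + 3:02d}:00  {share:.1%}")

# The segment with the most tweets, i.e. when this area is most active.
peak_start = max(counts, key=counts.get)
```

Running the same binning per neighborhood, and keeping only `peak_start`, would give a rough version of the "when does this area become active" coloring on the map.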

Using 45-degree angles is a great way to add perspective to scatterplots. For a different context, Arnold Kling recommended a similar approach for school performance.

datarep:

Public perceptions of US death rates for selected causes: Comparison of public opinion survey estimates versus empirical data shows how people overestimate rare causes of death

The need to compare cities seems to be innate, from best sports teams to the highest quality of life. Visually, the comparisons can be tough. ESRI seems to be willing to take a stab by releasing the Urban Observatory in an attempt to compare urban areas. The visualization shows cities—16 to choose from—side-by-side at the same altitude. You can choose from a list of 16 data points to display on the map.

Efforts like this always need to be evaluated in two parts: the technical success and the practical. The former refers to the technical feat of standardizing data, creating side-by-side maps, and building the visualization. The latter refers to the usefulness of the information.

The technical success is interesting. The UI is intuitive and the display is engaging. The synchronous zoom is fun.

The usefulness of the display is debatable. It’s quick and relatively easy to see the landscape of, for instance, parks. But it is more difficult to draw any answer from the maps. Simple questions, such as “who has more park space,” are left unanswered. More nuanced questions are even harder to answer.

ESRI may continue to invest in this prototype by allowing more exploration of maps and data. 

Regression analysis is probably the most common form of parametric statistical analysis. Usually, regressions are visualized as slopes and intercepts, since these relate directly to the interpretation of the coefficients. Unfortunately, the focus on slope makes it hard to display several coefficients on the same plane.

At Code a la Mode, Konstantin Kashin shows how to graph coefficients with standard errors in R.

library(ggplot2)

# `treisman` is assumed to be a data frame with columns specification, coef, method, lb and ub;
# `specification` (used for the axis labels) is assumed to be a vector of labels in the workspace.
pd <- position_dodge(width = 0.2)  # offset the two methods so their points do not overlap
ggplot(treisman, aes(specification, coef, color = method)) +
  geom_point(aes(shape = method), size = 4, position = pd) +
  scale_color_manual(name = "Method", values = c("coral", "steelblue")) +
  scale_shape_manual(name = "Method", values = c(17, 19)) +
  theme_bw() +
  scale_x_continuous("Specification", breaks = seq_along(specification), labels = specification) +
  scale_y_continuous("Estimated effect of being a former British colony or the UK on TI98") +
  geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1, position = pd)  # bars span lb to ub

These graphs appear to be based on a standardized regression, in which the coefficients tend to fall in a similar numerical range. Unstandardized regressions, which seem to be more common, can be more problematic because their coefficients can span a much wider range of values. Nevertheless, this code provides a good starting point for unstandardized regressions.
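To see why standardized coefficients sit on a comparable scale, here is a small worked sketch in Python (standard library only). It uses two separate one-predictor regressions rather than a full multiple regression, and the data and variable names are made up. The standardized slope is computed as the raw slope times sd(x) / sd(y).

```python
from statistics import mean, stdev


def slope(x, y):
    """Ordinary least squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardized_slope(x, y):
    """Slope rescaled to standard-deviation units: b * sd(x) / sd(y)."""
    return slope(x, y) * stdev(x) / stdev(y)


# Hypothetical data: one outcome, two predictors measured on very different scales.
y = [1.0, 2.0, 2.0, 4.0, 5.0]
income_dollars = [10_000.0, 20_000.0, 30_000.0, 40_000.0, 50_000.0]
years_schooling = [8.0, 10.0, 12.0, 12.0, 16.0]

for name, x in [("income", income_dollars), ("schooling", years_schooling)]:
    print(f"{name:10s} raw={slope(x, y):.4f}  standardized={standardized_slope(x, y):.2f}")
```

The raw slopes differ by several orders of magnitude simply because of the units, which would flatten one of them against the axis in a coefficient plot. The standardized versions land close together, so they can share a single y-axis.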

Always a good idea.

annkemery:

Dataviz doodling: I’ve sketched these findings 4 different ways and still despise every option… Back to the drawing board tomorrow!

P.S. If you work with charts, it’s always a good idea to keep a few markers next to your desk for easy sketches.