Data Surfer

Sites you need to see

This will be the last posting for Data Surfer.

I'll be writing similar entries for The Bee's new investigations blog, The Public Eye. In addition to the data-centered items seen here, you'll find a variety of postings by Bee reporters that support the "watchdog" mission of the paper. The purpose of The Public Eye is to "break news, as well as to follow up on investigations with tidbits, news breaks and behind-the-scenes descriptions of our news gathering process".

See you on the other blog.

Google recently released a cleaner version of its public transit maps database, which has grown to more than 450 cities worldwide.

Google Transit provides step-by-step directions for public transportation similar to what it does for travel by personal vehicle. You type in a starting address and destination, plus the date and time you want to leave. Google then tells you where and when to connect with buses and trains to reach your goal.

Northern California cities are pretty well represented in Google Transit. You'll find detailed maps for the Bay Area, Santa Cruz, Stockton, Santa Rosa, Redding and Rio Vista. The Sacramento region is covered, too, with mapping for Regional Transit, Roseville Transit, Unitrans and Yolobus.

It's spring break season. A good time for data guys to kick back and enjoy some of the terrific music available on the Web. You probably know about free streaming services like Pandora, Live365 and AccuRadio that let you listen to specific genres and customize your own channels of music. But I bet you don't know about Wolfgang's Vault, a site that specializes in recordings of live concerts from the past, mostly from the 60s and 70s.

This growing collection includes classic performances of top bands from a variety of genres. It's a real trip down memory lane. Here are just a few of the great featured musicians:

Rock: The Rolling Stones, Fleetwood Mac, Bruce Springsteen, The Grateful Dead 
Folk: Joan Baez, Bob Dylan, James Taylor, Gordon Lightfoot, Leonard Cohen
Country: Merle Haggard, The Oak Ridge Boys, Jimmy Webb, Dolly Parton
Jazz: Miles Davis, Count Basie, Dave Brubeck, Louis Armstrong, Oscar Peterson
Blues: Willie Dixon, John Mayall, Bonnie Raitt, Buddy Guy, Stevie Ray Vaughan
R&B: Booker T., Tina Turner, The Pointer Sisters, Ray Charles, Earth, Wind & Fire

The recordings are searchable by band, venue, genre and time period. In addition to free audio streaming of classic concerts, Wolfgang's Vault also sells downloads of the music as well as posters and photography. There's an iPhone app, too.

A new investigative journalism group debuted this week. It joins a growing number of online news efforts in the state that includes California Watch, Voice of San Diego and The Bay Citizen.

FairWarning is a nonprofit operation based in Sherman Oaks that specializes in health, safety and corporate conduct. Its "mission is to arm consumers and workers with valuable information, and to spotlight reckless business practices and lax oversight by government agencies." The site's first three investigations look at old GM pickup trucks that explode in crashes, gross undercounting and fudging of injury data at U.S. companies, and the growing number of accidents involving all-terrain vehicles.

This week C-SPAN, the non-profit cable TV service covering national government and politics, announced the expansion of its online video archive to include every program aired since 1987. The archive contains over 160,000 hours of programming from all three C-SPAN networks (C-SPAN monitoring the U.S. House, C-SPAN2 watching the Senate, and C-SPAN3 broadcasting live public affairs events).

Getting around this massive archive is fairly easy using a search interface that lets you narrow your retrieval by program title, person name, organization name, location, date, subject keyword, etc. If you don't have something specific in mind, you can browse the collection by "Most Recent," "Most Watched" and "Most Shared". Or rummage through "Memorable Moments from the Video Library" -- a list of historically-important recordings that includes Al Gore's 2000 election concession speech, Barack Obama's 2004 convention address, George W. Bush announcing the capture of Saddam Hussein, Dan Quayle's remarks on Murphy Brown, and Bill Clinton's "I did not have sex" assertion. 

As a general rule I don't review celebrity blogs and Twitter feeds in Data Surfer. But when it comes to Bill Gates -- one of the world's richest men, technology mogul and leading philanthropist -- well, you just have to take notice. Gates, of course, is co-founder and chairman of Microsoft. In 2006 he ceased day-to-day activities at Microsoft to devote more time to his global charity, the Bill & Melinda Gates Foundation.

Yesterday, Gates began posting to Twitter ("sharing cool things I'm learning through my foundation work and other interests..."). And today he launched The Gates Notes, a means to keep the public apprised of his activities and travels. It's also a place to share conversations with experts in fields that interest him most: energy; global health; education; agriculture; development; environment; foreign aid and technology. The site includes links to related materials Gates finds particularly helpful.

One of the great things about the Internet is its capacity to quickly start gathering essential information when a disaster hits. But because the Net is decentralized, it's likely such efforts will be fragmented and uncoordinated. 

That's what happened soon after the Haitian earthquake. Several web sites, including the Miami Herald, CNN and the New York Times, began collecting information on missing persons and the people trying to find them. That created isolated "silos" of data, which greatly complicated the process of reconnecting friends and families.

Fortunately, search giant Google stepped in to help coordinate the information flow by aggregating the data and establishing a single PersonFinder data-entry function for collecting the information. PersonFinder is portable and has been embedded in other web pages, such as Haitianquake.com and the U.S. State Department's earthquake page. So far, the Google database is tracking some 32,500 records.

"Covering the Decade in Magazine Covers" is the print media's answer to CNN's "The Decade in 7 Minutes". Prepared by the Magazine Publishers Association and the American Society of Magazine Editors, the former video is a two-minute survey of the Naughts as documented by 92 iconic images. You'll see many of the same events that defined the decade: terrorism, war, scandal, etc.

Here's a pop culture quiz: what do the following phrases have in common?

"I like Barbasol so well I shave all over."
"It's New! It's Lilt!"
"Are your teeth alluring, too?"

They're all magazine advertising slogans published in the 1950s. And they're all part of an online collection of over 7,000 U.S. and Canadian print ads housed at the Duke University Libraries. This image database covers five product areas (beauty and hygiene, radio, television, transportation and World War II propaganda) and spans 1911-1955. The ads feature many well-known brands that have existed for decades (like Ivory Soap, Crest, Greyhound Bus and Listerine), as well as those that have vanished from the marketplace (Burma-Shave, Wildroot, Braniff). The Duke collection is searchable by product name, company, general category and date range.

If printed magazine advertising seems a little old-fashioned to you, Duke also has an online collection of vintage television commercials dating from the 1950s to the 1980s. These are well-indexed and playable in your web browser.

The investigative journalism team California Watch officially debuted its web site over the weekend (though it has been publishing stories for a few months now). CW projects have been published in many California newspapers, including the recent analysis of the dubious transfer of campaign donations from county political party committees to individual candidates.

Of special interest to this blog's readers is California Watch's Data Center, a growing collection of important state-related databases the group acquired and made available online in an easy-to-search format. The initial datasets cover such things as federal stimulus grants, state wildfires, local unemployment, crime, swine flu and Census stats.

Related to Data Center is CW's Resources page, a good listing of external sources of data on state politics, education, health, public safety and the environment. This page also links to databases you can download and manipulate yourself. 

December 31, 2009
Happy New Year!

[At midnight the K St. Mall comes to life with 12,000 celebrating the 2009 new year. January 1, 2009 photograph by Bryan Patrick of the Sacramento Bee.]

A new study by the UC San Diego Global Information Industry Center attempts to calculate the total amount of information (digital and analog) that Americans gobble in a year. According to How Much Information? 2009 Report on American Consumers, U.S. households consumed 3.6 zettabytes of data last year. Most of that information came in the form of television and computer games. But it also includes activities like cell phone use, surfing the Internet, listening to the radio, and reading books, magazines, newspapers, etc. On average each American assimilated 33.8 gigabytes of information and 100,564 words every day.

So what the heck is a zettabyte? It's equivalent to 1 billion terabytes, or 1 million million gigabytes. (The typical PC hard drive holds about 100 gigabytes of information.)   
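For readers who like to check the math, here is a minimal sketch of those conversions. It assumes decimal units (a gigabyte as 1 billion bytes, and so on); the constant and function names are made up for this example.

```python
# Decimal (SI) byte units, assumed here: each named unit is a power of ten.
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21


def zettabytes_to(unit_bytes, zettabytes=1):
    """Convert a whole number of zettabytes into a smaller unit."""
    return zettabytes * ZETTABYTE // unit_bytes


terabytes_per_zb = zettabytes_to(TERABYTE)  # 1 billion
gigabytes_per_zb = zettabytes_to(GIGABYTE)  # 1 million million

# How many 100-gigabyte hard drives would the report's 3.6 zettabytes fill?
HARD_DRIVE_GB = 100  # the "typical PC hard drive" figure quoted above
drives_needed = 3.6 * gigabytes_per_zb / HARD_DRIVE_GB

print(f"{terabytes_per_zb:,} terabytes per zettabyte")
print(f"{gigabytes_per_zb:,} gigabytes per zettabyte")
print(f"about {drives_needed / 1e9:.0f} billion hard drives")
```

In other words, a year of American information consumption would fill tens of billions of typical hard drives.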

The Library of Congress and the National Endowment for the Humanities co-sponsor an effort to preserve electronically historic U.S. newspapers housed in the LOC. The National Digital Newspaper Program has scanned over 1 million pages and makes many of them accessible to the public through the "Chronicling America" web site. So far the online collection includes only newspapers from 15 states that were published between 1880 and 1922. (Titles after 1922 are generally protected by copyright.) The material is full-text searchable, so visitors to the site are able to retrieve pages by entering words and phrases into a search function.

Several California papers are among the digitized newspapers, including the San Francisco Morning Call, the Amador Ledger and the San Mateo Item. The only ones representing Sacramento are the Daily Record-Union (1875-1891) and Record-Union (1891-1903), predecessors of the modern Sacramento Union, which ended daily publication in 1994.

[Weinstock's advertisement appeared in the Nov. 24, 1887 edition of the Sacramento Daily Record-Union.]

 

Over the weekend the online Bee launched redesigns of our Data Center and Investigations pages. The former aggregates all the valuable internal and external data sources (databases, interactive maps and charts) the Bee offers in an easy-to-browse listing. The latter showcases current investigative journalism produced by Bee reporters.

Coinciding with this redesign is a new name for this blog. I-Tool Tips is now Data Surfer. We think the moniker better represents what the blog has become. (And it's sure easier to say!) The aim here is to spotlight the most relevant and credible data and research related to the news. As always, your comments and suggestions are most welcome.

Can you track a population's collective level of happiness over time like you do the unemployment or inflation rates? The folks at Facebook seem to think they can. They've come up with the Gross National Happiness index, a way to measure group feeling by analyzing the thousands of "updates" posted by Facebook users on any given day. An overall number is derived by counting the number of positive and negative emotion words found in the FB postings. Positive words include terms like "happy," "awesome," etc. Examples of negative words are: "sad," "tragic," etc.
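To make the word-counting idea concrete, here is a rough sketch. It is not Facebook's actual method: the word lists contain only the four examples quoted above, and the scoring formula (share of positive words minus share of negative words) is an assumption for illustration.

```python
import re

# Hypothetical word lists; only these four examples are named in the post.
POSITIVE_WORDS = {"happy", "awesome"}
NEGATIVE_WORDS = {"sad", "tragic"}


def count_emotion_words(updates):
    """Return (positive, negative) word counts across a list of status updates."""
    positive = negative = 0
    for update in updates:
        for word in re.findall(r"[a-z']+", update.lower()):
            if word in POSITIVE_WORDS:
                positive += 1
            elif word in NEGATIVE_WORDS:
                negative += 1
    return positive, negative


def happiness_score(updates):
    """Assumed scoring: share of positive words minus share of negative words."""
    positive, negative = count_emotion_words(updates)
    total = positive + negative
    if total == 0:
        return 0.0  # no emotion words found, so call the day neutral
    return (positive - negative) / total


# Invented sample updates for one "day" of postings.
sample_updates = [
    "So happy about the long weekend!",
    "Awesome game last night",
    "Sad news today",
]
score = happiness_score(sample_updates)
print(f"Happiness score: {score:+.2f}")
```

Computing a score like this for every day and plotting the results would give a happiness line that rises and falls over time.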


The result of all this textual analysis is a longitudinal GNH graph displaying the peaks and valleys of the U.S. state of mind (or at least of the population of FB users). You see the GNH peaking on holidays, as well as during big media events such as the Super Bowl and the inauguration of President Obama. The GNH can also nose dive -- supposedly correlated with sad events like the deaths of celebrities Michael Jackson and Heath Ledger.

On the Internet a mashup is the marriage of data and some sort of visualization, typically a map. DataMasher is a relatively new site that collects data produced by the federal government (much of it pulled from Data.gov) and makes it available as a downloadable spreadsheet or interactive map. Typically these data sets are broken down by state, so you can click on a state to see individual state data or look at state rankings in a table. DataMasher currently hosts 375 mashups, which run the gamut of topics: health, economy, environment, crime, transportation, etc. You can browse these by "the latest," "highest rated" and "most discussed". Right this moment, the most discussed mashup is "Hate Crimes vs Population" (California is 15th).

You can customize your own mashup on the site by choosing two data categories -- say total campaign contributions and population -- to generate your own map and table of campaign funding per capita. It's fun and fairly easy to do.
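If you'd rather see the arithmetic behind such a mashup, here is a small sketch that divides one state-level data set by another and ranks the results. The state names and figures are invented placeholders, not DataMasher data, and the function names are made up for this example.

```python
# Made-up figures for illustration only; not real DataMasher data.
campaign_contributions = {
    "State A": 50_000_000,
    "State B": 12_000_000,
    "State C": 30_000_000,
}
population = {
    "State A": 25_000_000,
    "State B": 3_000_000,
    "State C": 10_000_000,
}


def mash(numerator, denominator):
    """Divide one state-level data set by another, for states found in both."""
    return {
        state: numerator[state] / denominator[state]
        for state in numerator
        if state in denominator and denominator[state]
    }


def rank(values):
    """Return (state, value) pairs sorted from highest to lowest."""
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


per_capita = mash(campaign_contributions, population)
for position, (state, dollars) in enumerate(rank(per_capita), start=1):
    print(f"{position}. {state}: ${dollars:.2f} per person")
```

Note how the ranking changes once population is factored in: the state with the smallest total contributions comes out on top per person.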

September 9, 2009
Follow I-Tool Tips on Twitter

Seems like the whole world has gone crazy over Twitter. Entertainers, politicians, businesses and millions of regular people worldwide are posting to the microblogging service. Associated research tools have grown up around it. With the Twitter search engine you can browse the latest "tweets" by word or phrase. Or use Advanced Search to refine your query by people, places and dates. Twittervision visualizes tweets on a world map seconds after they're posted, so it's been used to monitor local reaction to big news events in real time.

A few months ago I-Tool Tips joined Twitter to supplement this blog with links to the latest data- and research-centered news. Posting under the moniker Sacbee Research, the feed points to new statistical releases, surveys and other information generated by universities, government agencies, think tanks, etc. Check out the latest postings in the lower-right corner of this blog page. Or follow us on Twitter at: http://twitter.com/Sacbee_Research.

Incidentally, Sacbee.com provides a complete listing of Bee news and staff Twitter feeds.

"Best of" lists are always suspect. Evaluation criteria are often fuzzy and so are the qualifications of the judges. Still, they can be fun to read and argue with.

Time Magazine just released its latest 50 Best Web Sites honors. It's a good mix of many of the most used, useful and entertaining online sites. You find the obvious giants of the Internet: Google, YouTube, Flickr, Facebook, Wikipedia, Skype, Amazon, Netflix. Then there are obscure but interesting items, like the music-streaming service Musicovery and the 3-D photo album, Photosynth. There are also sites that ought to interest this blog's readers:

* Wolfram-Alpha, a search engine that specializes in statistics and numbers. Plug in a place name and get demographic and geographic data on it.

* California Coastline, a photographic record of the entire 1,000-mile coast, including the lavish Malibu mansions of the stars.

* Popurls, which aggregates the most popular blog, news and opinion sites into one big Web page.

* ConsumerSearch, which organizes and summarizes the huge number of consumer product reviews available on the Net.

 

If you use Google Maps, you know that service provides satellite views of streets and buildings. The level of detail varies from place to place, but generally the closest images display homes as fuzzy stamp-size blobs. Microsoft Live Search, on the other hand, offers a bird's eye view of properties. Actually four views -- each looking at the house from the east, west, north and south. It's pretty creepy seeing one's home from four relatively close-in angles that look like what Superman would see -- if he hovered outside your house.

In fairness to Google, the search leader does have 360-degree, street-level views of buildings. Not all streets are included, but there's a growing number of residential areas included. To use this feature, go into Google Maps, find an address and then drag the little orange man icon (found atop the zoom control) onto that location. If Google has street views for that spot, the street will turn blue and shift to the 360-degree photo. If the street isn't blue, then you're out of luck. You manipulate the street view with controls that let you pan up and down, as well as zoom in and out. 

In Internet-speak, a "mashup" is a web application that integrates two or more kinds of information into a new, useful resource. Mashups often use interactive maps to pinpoint various content (data, text, images, even videos) associated with a specific location.

Newspapers have started to "geocode" their stories to help readers browse the news closest to their homes. The Bee, for example, maps articles referring to places in the region. You can zoom in for a close look at your neighborhood and you can set the time period from a minimum of one week to a maximum of six months. (CrimeMapper is another ongoing Bee service that geocodes reported crimes in the region.)

Washington Post's Time-Space takes the idea a step further by including a time scale in their interactive world map of news, photos, commentary and video. As you slide the time gizmo back and forth, the distribution of news content in various countries changes day-by-day, hour-by-hour. The site includes AP reports as well as Post articles and photos.

In this Internet age, the definition of "news" is being stretched to include types of information not produced by professional journalists. Things like blog entries, press releases, crime logs, home sales and foreclosures, restaurant inspections and reviews, building permits, amateur photos and videos, etc. EveryBlock attempts to aggregate and geocode a variety of content for "hyperlocal" browsing down to the neighborhood level. The web site thinks of itself as a "news feed" that can be viewed in an interactive map. EveryBlock currently covers 11 American cities -- the closest being San Francisco. You can search the content by address, ZIP or name of neighborhood.

Some of the coolest examples of map-mashups were developed by entrepreneur Dave Troy. Troy has married the live output of Twitter (short text postings), Flickr (photos) and YouTube (videos) with maps that continuously update. The resulting sites -- Twittervision, Flickrvision and Spinvision -- are dynamic maps that display the latest tweets, photos and videos produced anywhere in the world. And because of their immediacy, these three sites are sometimes quicker to report breaking news than professional media. Last year's Chinese earthquake, for example, was known to Twittervision watchers before anyone else outside the quake zone.

Although the Congress has pushed back the deadline for conversion to digital television broadcasting, hundreds of stations will shut down their analog signals on February 17 as initially planned. The FCC released a list of stations in each market that are dropping analog. Stations in our region are listed below.

City        Network         Callsign  Licensee
Ceres       N/A             KBSV      Bet-Nahrain, Inc.
Modesto     Univision       KUVS-TV   Kuvs License Partnership, G.P.
Sacramento  NBC             KCRA-TV   Hearst-Argyle Stations, Inc.
Sacramento  The CW Network  KMAX-TV   Sacramento Television Stations Inc.
Sacramento  N/A             KSPX      Paxson Sacramento License, Inc.
Sacramento  Fox             KTXL      Channel 40, Inc.
Sacramento  PBS             KVIE      Kvie, Inc.
Sacramento  ABC             KXTV      Kxtv, Inc.
Stockton    CBS             KOVR      Sacramento Television Stations, Inc.
Stockton    MNT             KQCA      Hearst-Argyle Stations, Inc.
Stockton    Telefutura      KTFK-TV   Telefutura Sacramento Llc

About 17.7 percent of Americans live in homes that receive television only through over-the-air broadcasts. The media research firm A.C. Nielsen has been tracking households that are still unprepared for the digital switchover. According to Nielsen, 5.8 million households (5.1 percent of all homes) are unprepared. Their stats are broken down by age, race and market. The Sacramento-Stockton-Modesto region has the 7th largest percentage of unprepared households (7.1 percent). Albuquerque is first (12.6 percent).

Incidentally, the FCC recently published its 13th annual report on competition in the delivery of video programming. The 208-page document is filled with interesting market data on cable, satellite, broadband and broadcast companies. Though conventional cable continues to dominate with nearly 70 percent of TV households, satellite and other delivery systems are steadily gaining market share.

December 31, 2008
Happy New Year!

Silver Creek Dance Hall at 4th and K Streets on New Years 1936. This image is part of the Faces and Places of Sacramento neighborhood history photograph project at the Sacramento Archives and Museum Collection Center. You can search for photos by keywords on their web site.

December 23, 2008
Numbers in the news

Journalists love statistics. They inject precision, clarity and reality into news stories. Or do they? Author Michael Blastland thinks numbers in the news -- especially really big numbers cited by politicians and the press -- often do more to confuse than enlighten. He observes in a recent interview on the NPR show On the Media:

"I think the problem with that number [$700 billion for the bailout] and a lot of the numbers that are hurtling around at the moment about the American economy is that they just have a heck of a lot of zeroes on the end, and probably most people's facility with numbers disappears as soon as they get to something bigger than their mortgage."

Blastland calls on journalists to explain statistics so they can be understood. For example, it's easier to relate to $700 billion if you express it as about $2,000 for every person in the country. Does that seem like a big burden for each citizen to carry? Blastland asks you to consider that obligation in comparison to the per capita size of the U.S. economy, which is $50,000.
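Here is a minimal sketch of that kind of translation. The $700 billion and $50,000 figures come from the passage above; the population of roughly 305 million is an assumption added for this example, and the names are made up.

```python
BAILOUT_DOLLARS = 700e9      # the $700 billion figure from the interview
US_POPULATION = 305e6        # assumption: roughly 305 million people in 2008
PER_CAPITA_ECONOMY = 50_000  # per-person size of the economy, as cited above


def per_person(total, people):
    """Spread a national total evenly across the population."""
    return total / people


bailout_per_person = per_person(BAILOUT_DOLLARS, US_POPULATION)
share_of_economy = bailout_per_person / PER_CAPITA_ECONOMY

print(f"Bailout per person: ${bailout_per_person:,.0f}")
print(f"Share of per-person economy: {share_of_economy:.1%}")
```

Under that assumed population the per-person figure lands a little above $2,000, in the same ballpark as Blastland's round number, and it works out to under 5 percent of each person's share of the economy.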

Putting numbers in context sounds like good advice for the media to follow. 

Michael Blastland is co-author of The Numbers Game: A Commonsense Guide to Understanding Numbers in the News, in Politics, and in Life. He also co-authored the commentary, "The worst junk stats of 2007," a Times of London piece on the most dubious numbers of the year.

The Google archiving juggernaut rolls on. Recently the giant Internet company announced the opening of its Life Magazine photo database. Now Google is adding vintage magazines to its growing Book Search collection. You search the full text of scanned articles for keywords, retrieve the journal and view the pages in a PDF-like reader. You can also use the "Advanced Search" option to narrow your research to specific magazine titles and publication dates.

So far the list of scanned magazines is limited to a few dozen diverse titles, including Ebony, New York Magazine, Baseball Digest, Prevention and Popular Science (which goes all the way back to 1872!). More are coming as Google completes agreements with publishers.

I tested the system with the search term "sacramento" and found an article in the Sept. 1922 issue of Popular Mechanics, entitled "United States to have another big ship canal". It describes in breathless detail the proposed 30-foot deep, 35-mile long canal connecting Sacramento to the Pacific Ocean by way of San Francisco Bay. The article notes the huge amount of mining and agricultural products shipped through Sacramento (illustrated by the photo below).

Thumbnail image for Sac Riverfront.jpg

The days following a presidential vote are the time for media watchers to critique election night coverage. TV and online news organizations laid out a veritable smorgasbord of cool technology for viewers: live blogging, email alerts, Twitter feeds, big board computer displays and even a Star Wars-type hologram. It was all pretty overwhelming.

As a data guy, I want to give kudos to the New York Times -- which may not have had its own Princess Leia -- but did compile a large number of national and state election results into an easy-to-use table and interactive map. (Both of these are still on their web site.) The NYT electoral college chart was simple and elegant. 

Presidential votes were displayed in five columns: states expected to be won easily or narrowly by one or the other candidate, plus battleground states. At any time during the night, you could see the current state vote tallies, as well as electoral vote projections by about a dozen news outlets. The NYT's interactive map was equally impressive. It allowed the user to zoom in on a state and see each county color-coded red or blue. Put your cursor on a county and up popped the current vote count and percent. That feature let you easily see how well a candidate was doing in the rural and urban areas of a given battleground state. Nice job.


About Data Surfer

It's all about information -- statistics, documents and data of all types that help us understand the world, make informed decisions and monitor government. It's about empowering citizens with tools and sources so they can conduct their own investigative research. This blog is a place to discuss information that's available on the Internet. What's relevant, useful, valid and accurate -- and what's not.

We know the Sacramento region is home to knowledgeable people who use online information in their respective fields. We want to hear from you. Please tell us what you think of the data we use in stories and post on The Bee's website. And share tips about online resources you think are valuable to this blog's readers. Post comments on this blog or contact Pete Basofin directly at pbasofin@sacbee.com.
