Metavid

Video archive of the US Congress

Help:FAQ

Use

How do I use Metavid?

There are a number of different ways to search, view, and engage with our project. A good place to start is our tutorials.

Searching

I searched for a Member of Congress and it says there's no media there, why not?

Well, no speeches or clips have been associated with this search string. However, that does not necessarily mean that they don't exist within the archive. There are a few reasons that could explain why they could not be found.

It's possible that this member of Congress has not spoken in full proceedings, especially if they've been recently elected or appointed. Many members seem to do their best work in committee and don't necessarily make much of a show on the floor.

Our person identification system uses an open source OCR application called Tesseract to read names from the overlays on the screen. Our hit rate is fairly high, especially considering that the software is designed for reading print media, but it isn't perfect. Some names inevitably get matched to the wrong person, or not matched at all.

You can also try to find this member by doing a 'spoken text' search (as opposed to a person search) for their name -- if they are participating in a debate or have their name attached to a bill, it will likely come up. If you find that their name is indeed matching to someone else's, let us know and we'll see if we can tune the system.
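
To illustrate why a name read by OCR can match the wrong person, or no one at all, here is a minimal Python sketch of fuzzy name matching. It is not Metavid's actual matching code: the roster, the `match_name` helper, and the 0.8 cutoff are assumptions made up for this example, and it relies only on `difflib` from the Python standard library.

```python
import difflib

# Hypothetical roster; a real system would use the full list of members.
ROSTER = ["Nancy Pelosi", "Ron Paul", "Rand Paul", "Harry Reid"]

def match_name(ocr_text, roster=ROSTER, cutoff=0.8):
    """Return the roster name closest to the OCR output, or None."""
    # Compare case-insensitively while remembering the original spelling.
    lowered = {name.lower(): name for name in roster}
    hits = difflib.get_close_matches(
        ocr_text.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None

# One misread character still matches; unreadable text matches nobody.
print(match_name("Nancy Pelosl"))
print(match_name("zzzzzz"))
```

A higher cutoff produces fewer wrong matches but more names that go unmatched, which is the trade-off described above.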

I searched for a bill and it says there's no media, why not?

We don't have an elegant way of determining which bill is being discussed on the floor on a minute-by-minute basis. Speeches are hand-tagged with speech data by Metavid volunteers. We will be adding as much information as we can to help you limit bill searches to days on which congressional action occurred. Until then, you might look at some congressional data sites (GovTrack, OpenCongress) to help you home in on what you're looking for.

If you do manage to find the bill, please consider tagging the relevant speeches for the next person to find. You can find a tutorial on speech annotation here.


Sources/Accuracy

Where does this video and metadata come from?

The video footage from the US Congress comes from a government-produced feed that is made available to the Press Gallery. Because this content is produced by government employees, it is in the public domain. Along with the audiovisual feed, closed captions are provided for the U.S. House by the National Captioning Institute and, in the Senate, by the Captioning Services Office.

At present, Metavid digitizes these audio, video, and text streams as they are rebroadcast live by C-SPAN. We use an open source OCR tool to read names from the screen to determine who is speaking.

We also scan and scrape a number of other sources (both governmental and nongovernmental public web sites) such as bill summaries from THOMAS; these sources are cited inline.

How accurate is the information?

The video and audio streams serve as an authoritative document of what was said during these proceedings. The text layer provided by the government in a closed caption stream often contains errors, as it is transcribed in real time. Additionally, because it's transcribed in real time, the time sync is variable, so when content is initially inserted it won't be perfectly in sync with the audio. Some errors also accumulate as the analog closed caption signal is converted to digital text.

For this reason and others, Metavid is a wiki. It is our hope that with a small investment of time from many people we can build a more accurate record.

Note: What is said on the floor can be quite different from what appears in the official congressional record. Representatives often invoke their right to "revise and extend" their remarks. This changes what is recorded for that day and makes it difficult to do searches for this "revised" content.

Why use a wiki? How can I trust what people have changed/edited?

Because a number of imperfect methods and technologies are involved in creating the text metadata, as listed above, wiki technology is a powerful tool not only to correct any errors or inconsistencies in this text record, but also to ensure that the history of changes is preserved. Furthermore, placing the text layer into a wiki enables users to tag, annotate, and extend what is said with categories, tags, and links to outside sources.

Most contributions to the wiki will be positive: correcting sync and mistakes in the text layer, adding links to outside sources of data that might corroborate or contradict what is said in a speech. Furthermore, adding categories to speeches and debates enriches the archive and makes content easier to find.


Video

What formats of video does Metavid provide?

We provide two resolutions of Theora video and Vorbis audio using the Ogg container format. Our "low" streaming quality is ~300 kbps at 320x240; our "HQ" quality is ~900 kbps at 512x384. We also provide a fallback Flash stream for users who do not have a quality Theora decoder. The Flash stream is ~300 kbps at 400x300 resolution. We also upload full broadcast quality MPEG-2 to archive.org, where it's transcoded to MPEG-1, MPEG-4, and other formats.
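
As a rough illustration of what these bitrates mean for download size, the Python sketch below estimates the size of one hour of each stream. It assumes the figures above are kilobits per second of total stream bandwidth, which the text does not state explicitly; the `megabytes_per_hour` helper is a name made up for this example.

```python
def megabytes_per_hour(kbit_per_s):
    """Approximate size in megabytes (10**6 bytes) of one hour of video."""
    bits = kbit_per_s * 1000 * 3600   # kilobits per second -> bits per hour
    return bits / 8 / 1_000_000       # bits -> bytes -> megabytes

# Bitrates as listed above, assumed to be in kilobits per second.
for label, rate in [("low", 300), ("HQ", 900), ("flash", 300)]:
    print(f"{label}: ~{megabytes_per_hour(rate):.0f} MB per hour")
```

Under that assumption, the "low" and Flash streams come to about 135 MB per hour and the "HQ" stream to about 405 MB per hour.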

How can I use Theora video in outside applications?

Most open source video editors support Ogg Theora natively. For example, both the Linux-only Cinelerra and the cross-platform Jahshaka support Ogg Theora in their base install.

The XiphQT libraries allow Theora to be played and exported from QuickTime-powered editors, including Final Cut, iMovie, and others for OS X. This plugin should also allow cross-platform support for Theora in the Processing programming environment.

For Windows platforms there are the Illiminable Ogg DirectShow filters, which should let you bring Ogg Theora into Adobe Premiere, Windows Movie Maker, etc.

How can I embed video I've found into my blog?

Embed links are available on each video. Click the options button, then choose embed or share to get the embed code.

Archive

What's in the archive?

Metavid's video archive consists of floor proceedings from the U.S. House and U.S. Senate. Because this footage is recorded by government employees, it is not subject to copyright protection. We began capturing Senate proceedings experimentally in November 2005. We began archiving the House and Senate more regularly in January 2006, and have been doing so since then with a handful of small gaps.


What's not in the archive?

Proprietary C-SPAN Content

C-SPAN occasionally cuts away from the gov't produced live feed when they feel the content is too boring for TV (yes, this threshold actually exists!). This happens during House (but not Senate) votes, and sometimes during long Senate quorum calls. During these, C-SPAN will often cut to their own programming, which is protected under copyright and therefore not within the scope of our archive.

Committees

We do not currently archive Senate or House committee hearings. Currently, what is broadcast on C-SPAN is not public domain (as the committee hearings that C-SPAN carries are recorded by their own cameramen). C-SPAN has promised to release their content under a liberalized policy, which many have interpreted to be CC-BY-NC.

Our archive is limited to public domain content. Open data activists such as Carl Malamud have posted public domain copies of committee hearings to archive.org. We look forward to being able to include these in the future.

I've noticed some issues with some of the earlier footage, what's up with that?

Quality from the first year or so varies, as we've struggled with a number of driver issues, dying capture cards, and other difficulties. These issues range from garbled audio in long proceedings to garbled caption text and caption text that drifts out of sync with the video. Many, but not all, of these problems can now be corrected by Metavid users.

Since then, we have developed solutions to most of these problems, and the various open source programs and libraries we use have grown as well, resulting in a much more stable and accurate archive.


I've noticed some House proceedings are split up into 6-7 different streams. Why is that?

As we noted above, C-SPAN often cuts away from the public domain video feed during House votes. If, in a given session, they debate a bill or amendment and then vote, then debate, then vote, then debate... we end up having to split the video up into smaller bits to remove the proprietary content. On days where there are fewer votes -- or if the votes are held consecutively -- there will be fewer video streams.
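
The splitting described above can be sketched in a few lines of Python. This is an illustration only, not Metavid's actual capture pipeline: the `public_segments` function and the example times are hypothetical, and the sketch assumes the cutaway intervals are already known and sorted by start time.

```python
def public_segments(duration, cutaways):
    """Return (start, end) spans of a recording that lie outside the cutaways.

    duration: length of the recording in seconds.
    cutaways: (start, end) pairs in seconds, sorted by start time.
    """
    segments = []
    cursor = 0
    for start, end in cutaways:
        if start > cursor:
            segments.append((cursor, start))  # footage before this cutaway
        cursor = max(cursor, end)
    if cursor < duration:
        segments.append((cursor, duration))   # footage after the last cutaway
    return segments

# A hypothetical two-hour session: debate, vote, debate, vote, debate.
print(public_segments(7200, [(1800, 2700), (5000, 5900)]))
# prints [(0, 1800), (2700, 5000), (5900, 7200)]
```

Two separated votes leave three streams. If the same votes were held back to back, they would form a single cutaway and leave only two.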

How can I help?

There are many tasks that end users can do to help us improve the accuracy and importance of these congressional videos. Take a look at Help:Participate for more information.

Project

What does METAVID stand for?

Metavid is not an acronym; rather, it is a combination of the words meta, referring to metadata, and vid, referring to video. That being said, in keeping with software project conventions, here are a few backronyms (feel free to add one):

  • Meta Enhanced Text Audio Video Interface for Democracy.
  • Democratic Interface for Video Audio & Text for Electronic Mediation (backwards)


Who is involved in the project? Where is it hosted?

Metavid's principal participants are Michael Dale and Abram Stern (aka Aphid), both graduates of the Digital Arts and New Media MFA program at UCSC. The project is hosted at the University of California, Santa Cruz. For more info, see the About page.

How is the project funded?

Metavid is currently funded by a one-year grant from the Sunlight Foundation. For more information, see the About page.


MetaVid is a non-profit project of UC Santa Cruz and the Sunlight Foundation. Learn more About MetaVid

The C-SPAN logo and other servicemarks that may be found in video content are the property of their respective trademark holders. None of these trademark holders are affiliated with Metavid