Following the liberalization of its copyright policy earlier this year, C-SPAN is now publishing a new index of its House and Senate floor proceedings: The C-SPAN Congressional Chronicle. According to C-SPAN, the video recordings are matched with the text of the Congressional Record as soon as the Record is available. The index only includes members who appeared on the floor to deliver or insert their remarks, and the text included is what the member submitted. Each appearance has a video link where users can watch and listen to the actual statements. This is great progress!
Update: see also the Sunlight post, and notice the link back in the list-by-day descriptions here on Metavid.
This is a big step, providing a slew of additional timed “metadata” (bill data, index to the Congressional Record) that we can use to enrich the Metavid archive. The C-SPAN site is using the Congressional Record, with archivists manually syncing up the record with the daily proceedings at per-speaker granularity.[1] The closed-caption-based search that Metavid uses lets people zero in on matching sections of video more quickly, but the official record is generally more accurate. Using both should greatly enhance the Metavid search functionality and may help illuminate the revision and extension of remarks that members of Congress are always talking about.
The video C-SPAN is providing doesn’t currently integrate well into the blogging conversation: there doesn’t appear to be any way to embed it into a blog post. While the footage quality is a big step up from the 120×160 used on the main C-SPAN site, there doesn’t appear to be any broadcast-resolution footage immediately available (unless you pay through the nose at their archive/store). It also seems C-SPAN is in the early stages of populating its content, as not all the video is available online yet. The metadata on Congressional Chronicle does not currently appear to be available in an easily [re]usable format. We’d like C-SPAN to make it directly available in XML, but if nothing else the data can be scraped from the current site and then secondarily made available in XML.
This is a very exciting step forward from C-SPAN. We hope this progress will continue with C-SPAN making the source mpeg2 files for all of its government coverage directly available, like the mpeg2s Metavid has been posting to archive.org. We also hope they expand the Congressional Chronicle archive to include all of the committee video and metadata. This would allow Metavid and other video projects to focus more on higher-level functionality such as tagging, collaborative video remixing, advanced search, representative/issue syndication, etc.
Each closed-caption segment is a few sentences long. When searching, keep in mind that a sentence may break mid-stream across two segments, so an exact phrase match can be missed when the phrase straddles that break. The Metavid search process currently uses MySQL FULLTEXT search in boolean mode, which gives you a few operators to control your queries. For example, searching for “iraq war” (in quotes) matches that exact phrase, while searching for +iraq -war finds only segments where iraq is mentioned and war is not. You can see the full documentation on boolean full-text searches in MySQL’s documentation.
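To make the operator behaviour a little more concrete, here is a rough Python sketch that imitates the matching rules described above on a handful of invented caption segments. It is not Metavid's code and it is not how MySQL works internally (MySQL also applies stopwords, minimum word lengths and relevance ranking, which are ignored here). The function names and the sample segments are made up for illustration. On the real system the equivalent would be a MySQL query of the form MATCH (...) AGAINST ('...' IN BOOLEAN MODE).

```python
import re

WORD = re.compile(r"[a-z0-9']+")

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return WORD.findall(text.lower())

def parse_query(query):
    """Split a query into required (+), excluded (-) and optional terms.

    A term is either a single word or a "quoted phrase".
    This is a simplified, hypothetical parser, not MySQL's own.
    """
    required, excluded, optional = [], [], []
    for op, phrase, word in re.findall(r'([+-]?)(?:"([^"]+)"|(\S+))', query):
        term = " ".join(tokenize(phrase or word))
        if not term:
            continue
        if op == "+":
            required.append(term)
        elif op == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

def segment_matches(segment, query):
    """Return True if one caption segment satisfies the query."""
    padded = " " + " ".join(tokenize(segment)) + " "

    def has(term):
        # Whole-word (or whole-phrase) match inside this one segment only.
        return f" {term} " in padded

    required, excluded, optional = parse_query(query)
    if any(has(t) for t in excluded):
        return False
    if not all(has(t) for t in required):
        return False
    if required:
        return True
    return any(has(t) for t in optional)

def search(segments, query):
    """Return the indexes of the segments that match the query."""
    return [i for i, seg in enumerate(segments) if segment_matches(seg, query)]

# Invented sample segments; note the phrase broken across segments 1 and 2.
segments = [
    "The senator spoke about the Iraq war funding bill.",
    "We must support our troops in Iraq",
    "war is never an easy choice.",
    "Reconstruction in Iraq continues.",
]

print(search(segments, '"iraq war"'))   # [0]
print(search(segments, "+iraq -war"))   # [1, 3]
```

Segments 1 and 2 together contain the words "Iraq war", but because each segment is matched on its own, the phrase search only finds segment 0. That is the broken-sentence case mentioned above.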
I have also updated the search page and front page to highlight popular queries that people have been making.
update: you can now also jump to any available day on the search page.
Metavid Wiki - Real Time Collaborative Semantic Audio/Video Metadata in MediaWiki. Metavid Wiki is the in-development second iteration of the Metavid archive project (metavid.org). Metavid Wiki builds on the MediaWiki code base, the semantic wiki extension and the first version of Metavid. It combines a structured temporal name space with the wiki page model to enable dynamic, versioned metadata for arbitrary stream segments. Congressional data, along with stream metadata, is used to build a rich query space for stream segment search. These segments can then be recombined into versioned sequences and referenced internally or externally.
Update: for audio podcasts of the event, check out Wikipedia Weekly.
The House and Senate are off this week for Independence Day observance; we’re using this break as an opportunity to break in our brand new admin interface, which makes the numerous behind-the-scenes tasks involved in transcoding and bringing the footage online much more streamlined. By the time Congress returns on the 9th, we expect to be caught up and able to get new footage online within a day or so of air, hopefully sooner.
Metavid has started to put all the original mpeg2 captures onto archive.org. (Previously we were just removing them because of space considerations.) Now, whenever you pull up a recent stream, you should have access to the mpeg2 original. For example, the Senate proceeding for June 4th now links to its associated stream on archive.org. Archive.org also makes the stream available in other formats such as Flash flv and mpeg4. To see a list of all the streams available on archive.org so far, you can check the U.S. Congress category.
Watch this space for more interesting collaboration with archive.org in the future:)
Let me be the first to call for more nursery rhymes on the floor of congress. I’ve been looking through ‘One Minute Speeches’ for gems, and found this one:
For those of you wondering where the clips of the new 110th congress are, well… The bad news is that we’ve had some trouble importing the metadata from outside sources and are entering it in by hand, a tedious process which has made it difficult to run the image_crawler scripts which find the ‘person id’ and other useful metadata. The good news is that we have been capturing, so we have the footage - it’s just waiting to be scanned. It will be up soon!