Categories are like tags: special links that group similar things together. An annotation layer describes a particular period of time in a Metavid stream, with a start point and an end point. When applied to annotation layers, categories create collections of video.
One quality nearly every category shares is incompleteness: because there is so much content (over 3000 hours) and only a few of us tagging it, many categories are missing important content. As you watch speeches on Metavid, spending the extra 5 seconds to categorize them will make them easier to find in the future. If there’s a category you find interesting or appealing, feel free to expand it yourself. If you find a speech and there isn’t a category that makes sense for it, start a new one!
Check out this tutorial for more information about tagging clips. You can find a list of current media categories here.
The US House, having just passed the stimulus package after hours of contentious debates and procedural motions, moved on to more serious business: congratulating the winners of the Super Bowl. Here’s a short clip of somebody off-camera airing their opinion. This somehow didn’t make it into the official record. The gem is @01:26.
At present, the #1 most discussed political clip on YouTube is a very short 16 second clip of Chuck Schumer claiming Americans don’t care about pork barrel spending.
The neat thing about Metavid is that because we archive the full day of proceedings, we can take that same 16 second clip, expand coverage on either side, and provide context to an otherwise self-encapsulated sound bite. Here we can see Chuck’s quote as part of a larger rhetorical flourish; he calls for the removal of the pork spending and highlights what he sees as the important elements of the bill. Here is the clip on Metavid:
The ability to dig deeper and investigate a given argument that is being presented is fundamental to understanding what is really taking place. This is why a citation framework for web video is so important for healthy deliberation. This way, the context (and contextualization) of a given source document can be investigated. Tools like Metavid open up this citation process for continued dialog in contrast to allowing the clip fragment to act as the final word.
Although it’s a bit late coming, I’m happy (and somewhat relieved :P) to announce that we’ve finally brought all the 2009 footage of the new 111th Congress online, from Jan 6 to the present. Now that our capture system is fully functional (see earlier post), we plan on having new proceedings in the Metavid system within 24 hrs. Anyway, we encourage you to dig through the last few weeks of footage. Some highlights include:
I’m happy to announce some improvements we’ve made in our capture architecture. We’ve spent the last couple of months implementing our new capture and transcode system. We have cut out some unnecessary complexity and updated or replaced many of our core components. For instance, we’ve switched our OCR over to Google’s Tesseract and are getting a considerably higher hit rate reading names off the screen. This makes it easier for you to find speeches by a particular person.
We’ve also overhauled our methods for capturing and encoding closed caption text. Our text metadata now holds sync with the video much better, even across those long 12hr debates. New streamlined workflows have saved us several hours per day in transcode time, and now require less manual intervention. This means we can bring content online faster and more reliably with less work. And naturally, we’re using free software every step along the way.
There is a bit more to do to clean up nagging bugs (the crucial ones have been squished) and to document & generalize what we’ve done so that other archival projects can take advantage of it.
Following the liberalization of their copyright policy earlier this year, C-SPAN is now publishing a new index of its House and Senate floor proceedings: The C-SPAN Congressional Chronicle. According to them, the video recordings are matched with the text of the Congressional Record as soon as the Record is available. It only includes members who appeared on the floor to deliver or insert their remarks. The text included is what the member submitted. Each appearance has a video link where users can watch and listen to the actual statements. This is great progress!
Update: see also the Sunlight post, and notice the link back in the list-by-day descriptions here on Metavid.
This is a big step, providing a slew of additional timed “metadata” (bill data, index to the Congressional Record) that we can use to enrich the Metavid archive. The C-SPAN site is using the Congressional Record, with archivists manually syncing up the record with the daily proceedings at per-speaker granularity.[1] The closed caption based search which Metavid uses allows people to zero in on matching sections of video quicker, but the official record is generally more accurate. Using both should greatly enhance the Metavid search functionality and may help illuminate the revision and extension of remarks that congresspeople are always talking about.
The video C-SPAN is providing doesn’t currently integrate well into the blogging conversation: there doesn’t appear to be any way to embed it into a blog post. While the footage quality is a big step up from the 120×160 used on the main C-SPAN site, there doesn’t appear to be any broadcast resolution footage immediately available (except if you pay through the nose at their archive/store). Also, it seems C-SPAN is in the early stages of populating their content, as not all the video is available online yet. The metadata on Congressional Chronicle does not currently appear to be made available in an easily [re]usable format. We’d like C-SPAN to make it directly available in XML, but if nothing else the data can be scraped from the current site and then secondarily made available in XML.
This is a very exciting step forward from C-SPAN. We hope this progress will continue with C-SPAN making all their government coverage source mpeg2 files directly available like the mpeg2s metavid has been posting to archive.org. And we hope they expand the Congressional Chronicle archive to include all of the committee video and metadata. This will allow Metavid and other video projects to focus more on high level functionality such as tagging, collaborative video remixing, advanced search, representative/issue syndication etc.
Each closed caption segment is a few sentences long. When searching, keep in mind that a sentence may break mid-stream across two segments, so an exact phrase match may be missed where the sentence is broken. The Metavid search process currently uses MySQL full-text search in boolean mode. This means that you have a few operators that you can use in search queries. For example, if you search for “iraq war” (in double quotes), that will match the exact phrase, and if you search for +iraq -war, that will find only segments where iraq is mentioned and war is not. You can see the full documentation on boolean full-text searches in MySQL’s documentation.
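For the curious, here is a rough sketch of how queries like these could be assembled and handed to MySQL. This is not the actual Metavid code: the table name `caption_segments`, its columns, and both helper functions are made up for illustration, and the `%s` placeholder assumes a Python MySQL driver that uses that parameter style. Only the `MATCH ... AGAINST (... IN BOOLEAN MODE)` form and the `+`, `-` and double-quote operators come from MySQL itself.

```python
def boolean_query(required=(), excluded=(), phrase=None):
    """Build a MySQL boolean-mode search string.

    "..." matches an exact phrase, +word requires a word,
    and -word excludes any segment that contains the word.
    """
    parts = []
    if phrase:
        # Drop embedded double quotes so the phrase stays one quoted unit.
        parts.append('"' + phrase.replace('"', ' ').strip() + '"')
    parts.extend('+' + word for word in required)
    parts.extend('-' + word for word in excluded)
    return ' '.join(parts)


def build_caption_search(search_string):
    """Return (sql, params) for a boolean full-text search.

    Hypothetical schema: caption_segments(stream_name, start_time,
    end_time, caption_text) with a FULLTEXT index on caption_text.
    """
    sql = (
        "SELECT stream_name, start_time, end_time, caption_text "
        "FROM caption_segments "
        "WHERE MATCH(caption_text) AGAINST (%s IN BOOLEAN MODE)"
    )
    # The search string is passed as a parameter, never pasted into the SQL.
    return sql, (search_string,)


if __name__ == "__main__":
    print(boolean_query(phrase="iraq war"))
    print(boolean_query(required=["iraq"], excluded=["war"]))
    print(build_caption_search(boolean_query(required=["iraq"], excluded=["war"])))
```

The returned pair would be passed to a driver's `cursor.execute(sql, params)`. Because each caption segment is matched on its own, a quoted phrase that straddles two segments will not be found; searching for the individual words with `+` is a looser fallback.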
I have also updated the search page and front page to highlight popular queries that people have been making.
update: you can now also jump to any available day on the search page.
Metavid Wiki - Real Time Collaborative Semantic Audio/Video Metadata in MediaWiki. Metavid Wiki is the in-development second iteration of the Metavid archive project (metavid.org). Metavid Wiki builds on the MediaWiki code base, the semantic wiki extension and the first version of Metavid. Metavid Wiki employs a structured temporal name space with the wiki page model to enable dynamic, versioned metadata for arbitrary stream segments. Congressional data along with stream metadata is used to build a rich query space for stream segment search. These segments can then be recombined into versioned sequences and referenced internally or externally.
update: for audio podcasts of the event check out wikipedia weekly
The House and Senate are off this week for Independence Day observance; we’re using this break as an opportunity to break in our brand new admin interface, which makes the numerous behind-the-scenes tasks involved in transcoding and bringing the footage online much more streamlined. By the time Congress returns on the 9th, we expect to be caught up and able to get new footage online within a day or so of air, hopefully sooner.