I recently mentioned that while I had used Reuters Calais to look for entities in Giorgio Vasari's "Lives of the Painters", I had something more interesting in the works, and here it is: BlogBigPicture. It lets you navigate a set of related blog entries based on the names, places, companies, movies, and other entities mentioned in those entries.
The default tab shows Hollywood gossip, but others have blogs and news about investing, English Premier League football, world business news, and U.S. politics. To get started on the Hollywood tab, click "Person" in the gray box on the right and then mouse over the names that appear. You'll see the titles of the entries mentioning that person highlighted in the main panel, where you can click those titles to read the entries. (Oh, that Amy Winehouse...) When I was doing the main work on this, Ashlee Simpson had just married what's-his-name, and the many entry titles that were highlighted when I moused over her name showed what a hot story the wedding was that week in the world of Hollywood gossip. Using the same technique to evaluate hot news in the business world isn't quite as much fun, but it is ultimately much more valuable.
The news categories that I chose are just samples. I picked investment blogs and world business news because Calais is tuned for that subject matter, Hollywood gossip because it was fun, U.S. politics because there's a lot going on now, and Premier League football because I wanted a sports category with international appeal that wasn't U.S.-centric.
BlogBigPicture is still pretty rough, and I have many ideas to improve it, but I decided that now that it works well enough for people to play with it, it was time to let them do so. Enjoy it, and let me know what you think!
Bob,
This is fantastic. I'm curious to know more about the architecture/implementation underlying it, if you have the chance. Are you regularly updating the feeds and running them through Calais and generating static Web content? Is there any dynamic discovery going on?
Lee
Sure, if by "regularly" you mean once or twice a day! It would be nice to have it happen more often (and of course, to let users choose their own RSS feeds and groups), but that's for the future.
The basic architecture is that the feeds get pulled down with feedparser, and after storing basic metadata about each feed and each entry in an RDFlib triplestore, each entry gets sent to Calais. The returned RDF gets stored in the triplestore, and then the interface is built from that.
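The flow described above can be sketched roughly as follows. This is only an illustration, not the actual BlogBigPicture code: it uses the Python standard library in place of feedparser and RDFlib (an XML parser for the feed, a plain set of tuples for the triplestore), and the function extract_entities with its KNOWN_ENTITIES table is a hypothetical stand-in for the call to Calais. The feed contents, URLs, and predicate names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; a real run would download RSS from the web.
RSS_SAMPLE = """<rss version="2.0"><channel>
<title>Example Gossip Feed</title>
<link>http://example.com/</link>
<item><title>Amy Winehouse spotted in Camden</title>
<link>http://example.com/1</link>
<description>Amy Winehouse was seen in London.</description></item>
<item><title>Wedding news</title>
<link>http://example.com/2</link>
<description>Ashlee Simpson got married this week.</description></item>
</channel></rss>"""

# Hypothetical lookup table standing in for the Calais web service.
KNOWN_ENTITIES = {
    "Amy Winehouse": "Person",
    "Ashlee Simpson": "Person",
    "London": "City",
}


def extract_entities(text):
    """Stand-in for Calais: return (name, type) pairs found in the text."""
    return [(name, kind) for name, kind in KNOWN_ENTITIES.items() if name in text]


def build_store(rss_xml):
    """Parse a feed and return a set of (subject, predicate, object) triples."""
    triples = set()
    channel = ET.fromstring(rss_xml).find("channel")
    feed_uri = channel.findtext("link")
    # Basic metadata about the feed itself.
    triples.add((feed_uri, "title", channel.findtext("title")))
    for item in channel.findall("item"):
        entry_uri = item.findtext("link")
        title = item.findtext("title")
        # Basic metadata about each entry.
        triples.add((entry_uri, "inFeed", feed_uri))
        triples.add((entry_uri, "title", title))
        # Send the entry text to the entity extractor and store what comes back.
        text = title + " " + (item.findtext("description") or "")
        for name, kind in extract_entities(text):
            triples.add((entry_uri, "mentions", name))
            triples.add((name, "type", kind))
    return triples


def titles_mentioning(triples, name):
    """Titles of the entries that mention an entity (the mouse-over lookup)."""
    entries = {s for s, p, o in triples if p == "mentions" and o == name}
    return sorted(o for s, p, o in triples if p == "title" and s in entries)


store = build_store(RSS_SAMPLE)
print(titles_mentioning(store, "Amy Winehouse"))
```

Once everything is in one store, the interface only needs lookups like titles_mentioning: pick an entity, find the entries that mention it, and highlight their titles.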
greeting(s) sentient entities operating in the semantic web conceptual space ;-)
as part of the ongoing construction of ./lpkb, we have arrived at a similar requirement, albeit thru' very different assumptions and development paths.
given Natural Language Text ./lpkb can parse it i.e. generate a readable facsimile (not everything...every word... but 'enuf to make a readable copy)
i have no special training in natural language extraction or similar high end approaches but my requirements are as follows:
in: text
do: classify the text, assigning each word a category
out: a structured archive of the parsing
(I would like this final stage to be in RDF-A)
(at one stage the parser did a round trip through jena for x-links)
(sparql queries below)
now lounging about all day "reading" is not what ./lpkb is for, but I like the simplicity of just assigning one of four types to each word and being able to follow along. those four categories are [a] [e] (actor/event) which are usually verbs and nouns. [o] any other symbol not a noun/verb and [x] for x-link (these are in an "un-debugged" semantic network which was mined from the open-mind corpus).
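A minimal sketch of the four-way tagging described above might look like this. It is not the ./lpkb parser: the three word lists are tiny hypothetical lexicons, and two readings of the description are assumptions, namely that [a] (actor) covers nouns and [e] (event) covers verbs, and that a word found in the semantic network is tagged [x] before the other checks are tried.

```python
import re

# Hypothetical mini-lexicons; a real parser would need far larger ones.
ACTORS = {"reader", "parser", "text", "word", "archive"}           # nouns -> [a]
EVENTS = {"read", "reads", "parse", "parses", "assign", "assigns"}  # verbs -> [e]
XLINKS = {"book"}  # words assumed to be present in the semantic network -> [x]


def classify(token):
    """Return one of the four tags 'a', 'e', 'o', 'x' for a single token."""
    word = token.lower()
    if word in XLINKS:   # assumption: x-links take priority over [a]/[e]
        return "x"
    if word in ACTORS:
        return "a"
    if word in EVENTS:
        return "e"
    return "o"           # any other symbol, punctuation included


def tag_text(text):
    """in: text, do: tag every token, out: a list of (token, tag) pairs."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [(token, classify(token)) for token in tokens]


for token, tag in tag_text("The parser reads the book."):
    print(f"[{tag}] {token}")
```

The list of (token, tag) pairs is the "structured archive" in its simplest form; serializing it to RDF would be a separate step.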
the reason I mention all of this is that it:
-- is a well-defined task
-- has existing (un-web-ified) code
-- concerns natural language
if i am off base here or off topic *please* say so.
for your reference:
http://csksoft.com/RDF/tump.rdf
http://sparql.org/sparql?query=PREFIX+lpkb%3A+++%3Chttp%3A%2F%2Fhomepage.eircom.net%2F~cornagill%2Flpkb%23%3E%0D%0A%0D%0ASELECT+%3Fbigword+%0D%0AFROM++++++%3Chttp%3A%2F%2Fcsksoft.com%2FRDF%2Ftump.rdf%3E%0D%0AWHERE+%0D%0A++{+%3Fbigword+lpkb%3AconceptuallyRelated+lpkb%3Aread+}&default-graph-uri=&stylesheet=%2Fxml-to-html.xsl
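The second link is hard to read because the SPARQL query is URL-encoded inside it. The short sketch below only decodes the link with the Python standard library so the query can be read; it does not contact sparql.org or fetch the RDF file. Decoded, the query selects every ?bigword that is lpkb:conceptuallyRelated to lpkb:read in tump.rdf.

```python
from urllib.parse import urlsplit, parse_qs

# The link from the comment above, split across lines only for readability.
url = (
    "http://sparql.org/sparql?query=PREFIX+lpkb%3A+++"
    "%3Chttp%3A%2F%2Fhomepage.eircom.net%2F~cornagill%2Flpkb%23%3E"
    "%0D%0A%0D%0ASELECT+%3Fbigword+"
    "%0D%0AFROM++++++%3Chttp%3A%2F%2Fcsksoft.com%2FRDF%2Ftump.rdf%3E"
    "%0D%0AWHERE+%0D%0A++{+%3Fbigword+lpkb%3AconceptuallyRelated+lpkb%3Aread+}"
    "&default-graph-uri=&stylesheet=%2Fxml-to-html.xsl"
)

# parse_qs turns '+' into spaces and undoes the %XX escapes.
params = parse_qs(urlsplit(url).query)
sparql = params["query"][0]

# Print the query with its original line breaks, trailing spaces removed.
for line in sparql.splitlines():
    print(line.rstrip())
```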
Positively inspiring! You've demonstrated how easy it can be to slice and dice a lot of info, and more importantly, to guide readers to more info!
I'm working on slicing/dicing/guiding around a WordPress MultiUser installation...this is a fantastic model for me to follow. Thanks much!
Patrick