FieldMarking

September 9, 2008

2008 Blogger Bioblitz Announced.

Filed under: bioblitz, biodiversity informatics, blogging, citizen science — Tags: — joel @ 3:43 pm

The 2008 blogger bioblitz is on for the week of Sept. 20 - Sept. 28. (Two weekends to work with!) Blindingly soon, yes, but what the heck.
A portal will be up next week with a data spreadsheet for download; instructions on conducting a blitz; and some basic browsing and querying capabilities.

Keen to put our semantic eco-blogging tools to use, the Spire project has volunteered to do this year’s data integration and analysis. If you want to share your observations, you will be able to contribute data in any of three ways: by uploading your data spreadsheet; by maintaining an online spreadsheet (via, e.g., Google Docs); or by using Spotter to automatically generate an RDF record for each taxon observed. If you choose one of the first two options, your data will be converted to RDF by rdf123. (Note: Spotter is currently broken on Firefox 3; we hope to fix this shortly. UPDATE: Fixed.)
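
For the curious, the spreadsheet-to-RDF step can be pictured roughly as follows. This is only a minimal sketch in Python, not the actual rdf123 conversion: the column names, the example.org namespaces, and the sample row are all hypothetical, and the output is plain N-Triples.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces, invented for this sketch.
OBS_BASE = "http://example.org/bioblitz/obs/"
VOCAB = "http://example.org/bioblitz/vocab#"


def literal(value):
    """Escape a string as an N-Triples plain literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def rows_to_ntriples(csv_text):
    """Turn each spreadsheet row into one RDF record, serialized as N-Triples."""
    lines = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        subject = f"<{OBS_BASE}{i}>"  # one subject per observation row
        for column, value in row.items():
            # Skip empty cells and any cells beyond the header columns.
            if column and value:
                predicate = f"<{VOCAB}{quote(column)}>"
                lines.append(f"{subject} {predicate} {literal(value)} .")
    return lines


# Hypothetical sample row: a great horned owl sighting.
sample = "taxon,date,latitude,longitude\nBubo virginianus,2008-09-20,39.25,-76.71\n"
for triple in rows_to_ntriples(sample):
    print(triple)
```

Each non-empty cell becomes one triple about that row's observation, so a row with four filled columns yields four triples sharing the same subject.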

Our goal (beyond encouraging people to explore their natural environment) is to integrate data we receive with background and contextual data (e.g. invasive species lists, food webs, etc.), put it on a map, and make it browsable. Our broader goal is to develop technology that transforms bioblitz and eco-blog data into a global human sensor-net.

If you plan on participating, please either leave a comment on this site or send me email, so that we can link to your blog from the portal.

Many thanks to all who get involved!

February 27, 2008

A fizzled launch, but luckily this isn’t NASA

Filed under: biodiversity informatics — cyndy @ 11:59 am

Encyclopedia of Life logo
Encyclopedia of Life tried to launch yesterday but was immediately crippled by unexpectedly large crowds of visitors. David Shorthouse writes in the EOL Blog (which does still work):

We’re too Popular!
David Shorthouse
February 26th, 2008

You may have noticed that the EOL site has been flaky at best since approximately 12 EST this afternoon. Although we are serving the site from a load balanced cluster of several machines, we are experiencing phenomenal loads.

I just churned through the web logs from web machines in this cluster and there were 5.8M hits in the span of 3 hours. Most of these happened within 1 hour. We were down (and continue to experience intermittent access) for a few hours, then flipped the machines back on. Since then, there were an additional 5.7M hits, totaling 11.5M hits since 9AM this morning and it is now 2:45PM here. Wow!

We are working hard to resolve the issue so stay tuned and please have patience! I’ll post updates here as the day progresses.

I haven’t gotten a chance to see the site yet. My sources told me a month ago that it was done and they were shock testing it. I’m sorry they didn’t have the network infrastructure to handle the massive reaction from the public. On the one hand, it is embarrassing to be caught unprepared like this. On the other hand, it is testimony to the public demand for this kind of information (although one wonders how many journalists it takes to crash a website).

On the positive side, I expect the kinks to be worked out in the next few days. Unlike a failed NASA mission, the show can and will go on. The data are all still there and lessons learned can be applied next time. However, EOL anticipates a total re-engineering and so should expect many more bumpy roads ahead. For example, imagine the possible problems when the site goes semantic and is dynamically drawing information from other sites which are not nearly as well funded (it isn’t clear to me how much of the current implementation is dynamic).

Rod Page admits he is intentionally hypercritical in his review. Much of what he calls for is already planned, though he is concerned about the team’s ability to deliver.

I think the first release of EOL should have, at a minimum, provided at least as much information that I can get from iSpecies and Wikipedia. Other projects, such as Freebase, have pre-populated their databases with content from Wikipedia and other sources. Why didn’t EOL? If the argument is that they want authenticated content, then this doesn’t wash. Their authenticated content is minimal, and waiting for authentication will, in my view, cripple EOL.

EOL’s web site has no mechanism for people to extract data (e.g., RSS feeds, microformats, links to RDF, etc.). It’s intended to be read by humans, not machines. This greatly diminishes its utility.

The real question is how much the issues I’ve raised are things which are easy to fix given time, or whether they reflect underlying problems with the way the project is conceived.

I would point out that yes, the EOL is intended for humans not machines. The original sources from which the data come ought to be machine readable in the first place in order for EOL to get the data. That will be a huge challenge in itself, and a place where EOL can help. EOL eventually will be generating RDF, which itself is not difficult if you know how you want it to look. And then data harvesters will have to sort out which source is the best when the same data appear in multiple places.
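
One simple way a harvester might settle such conflicts is a fixed ranking of sources. The Python sketch below is an illustration only, under assumptions made here: the source names, the ranking, and the record fields are hypothetical and not part of any EOL design.

```python
# Hypothetical trust ranking: a lower number means a preferred source.
SOURCE_RANK = {"original_database": 0, "aggregator": 1, "scraped_page": 2}


def pick_best(records):
    """Keep one record per (taxon, property), preferring the best-ranked source."""
    best = {}
    for rec in records:
        key = (rec["taxon"], rec["property"])
        # Unknown sources rank below every known one.
        rank = SOURCE_RANK.get(rec["source"], len(SOURCE_RANK))
        if key not in best or rank < best[key][0]:
            best[key] = (rank, rec)
    return [rec for _, rec in best.values()]


# Hypothetical sample: the same fact harvested from two places.
records = [
    {"taxon": "Mellivora capensis", "property": "common_name",
     "value": "honey badger", "source": "scraped_page"},
    {"taxon": "Mellivora capensis", "property": "common_name",
     "value": "honey badger", "source": "original_database"},
]
for rec in pick_best(records):
    print(rec["taxon"], rec["property"], rec["source"])
```

A static ranking is the crudest possible policy; a real harvester would probably also want to weigh how recent and how complete each record is.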

Carl Zimmer, who wrote the New York Times blurb, sounds much more optimistic in his blog entry.

I would not be surprised that the interests of communities within biology drive a lot of the growth of the encyclopedia. If the kinks are worked out, it could be a tool that a group of people interested in, say, orchids, could use to store and study their data. Seen that way, it wouldn’t have to hit all 1.8 million species pages to achieve something important.

I could not agree more. The challenge, as I’ve stated before, is engaging those communities and providing tools (perhaps more than just one option!) so that they can not only easily create and moderate the content, but get some payback from it themselves. They don’t need to be on board with the EOL directly, but they do need to produce content that plays nicely with EOL. Note that I have a vested interest. All of the projects I know have a hard time getting their communities on board, and they all have distinct aims and system architectures. We are all poised to see how we can funnel our efforts toward EOL without bankrupting ourselves. Can we use EOL to leverage success on our projects? It isn’t going to be easy, or cheap.

I do notice that the blog contains several observations I could get onto the semantic web by making SPOTs for them. For example, YouTube videos of honey badgers making tools in India, assisted by honeyguides, and allegedly causing problems in Basra, Iraq. Because these are very far removed from the original sources and have poor locality data, they are low quality observations. However, for demonstration purposes, they might be useful.

September 28, 2007

TDWG report

Filed under: biodiversity informatics, semantic web — cyndy @ 2:12 pm

For your reading pleasure, I’ve put an acronym- and technology-heavy post about the Taxonomic Databases Working Group meeting up at the Semantic Naturalist.

September 16, 2007

Announcing Spotter 1.0

On Friday we announced the release of our semantic blogging tool, Spotter. There are more details on our parent blog, eBiquity. This is the tool we’ve been using to add the little owl that links to RDF-formatted observation data on this blog. As long as you are using Firefox, you should be able to use it on any blog or any other web page where you can add links.

Please consider using Spotter and letting us know what you think. This is ongoing research and we need feedback to help improve our work.

September 13, 2007

Introducing The Semantic Naturalist

Semantic Naturalist logo -- dewy spider web
Our Spire colleague Allan Hollander of UC Davis’ Information Center for the Environment has launched a group blog, The Semantic Naturalist. Joel and I will join Allan there in tracking semantic technology developments related to biodiversity, so if that’s your interest, we hope to see you there. We’ll keep this blog for actual observations or general biodiversity informatics news.

In related news, as Tim Finin notes, Peter Wayner’s article in yesterday’s New York Times featured our work. I’ll let others critique the article, but will say that it is nice to be cited as a concrete example of semantic web technology. We may not yet be reaping great rewards but we are making an effort to use real world data as openly as possible.

So far we’ve seen only a modest bump in traffic here on FieldMarking. Welcome, if you’ve found us through the NYT article.

September 6, 2007

Latest Encyclopedia of Life press

Filed under: biodiversity informatics, technology — cyndy @ 9:26 pm

Encyclopedia of Life logo
The New York Times published an Op-ed piece by E.O. Wilson today.

… a new project in biology, an ambitious effort to create a vast new electronic database of known species, should make it possible to discover the remaining 90 percent of species in far less than 250 years, perhaps only one-tenth that time, a single human generation. On May 9 of this year, a consortium of institutions from Harvard and the Smithsonian to The Atlas of Living Australia began compiling The Encyclopedia of Life, which one day will provide single-portal access to all knowledge of living organisms.

Simultaneously, an interview with David “Paddy” Patterson was published in the journal Nature:

In February next year, hopefully, there will be a major release of the first edition of the EOL. The expectation is that within a ten-year period we will have relatively well-informed pages on all 1.8 million species.
. . .
Some of the features we’re developing will be rather like wikis or the social networking software out there. One of the things I would love to see develop early on is a ‘my schoolyard’ function in which kids can go outside with cell phones and take pictures of organisms and submit them to the EOL. There, the pictures are sent off to experts who verify identification. And when that is done, a little dot appears on Google Earth showing the presence of, say, a daffodil in someone’s backyard.

My listservs, ecoblogs, and ADW staff emails are buzzing.

Some of my colleagues have been skeptical. We’ve heard these grand plans before, weren’t consulted about the technical details, and had no idea if there were opportunities for us. Donat Agosti plainly states that it is “a secretive project.” As I mentioned in a previous post, this “new” project rests on the backs of many of us who have been toiling, underfunded, for years to get information online and easily available to the public.

This latest PR seems to me an offensive in pursuit of more funding — but for whom? Even if EOL wants to keep the organizing membership elite (Harvard, Smithsonian, etc.), I’d urge the EOL organizers to reach out more to the community whose research should be informing their efforts. For example, parts of the My Schoolyard concept have already been tested by the BioKIDS project. And of course Spire is working on an even more Web 2.0 approach with our semantic Spotter tools.

I’ve been told that TDWG (Taxonomic Databases Working Group) standards are going to be followed. But there’s quite a lot of flux now. Will taxon names get marked up with TaxonX or TaXMLit? With the species microformat? I’ve also heard that there is a semantic web component planned — but haven’t seen the plan for how to mesh with existing efforts like ours (ETHAN, etc.) and the nascent Biological Observation standard. Others, such as David Shorthouse, have provided helpful suggestions.

Personally, I am less concerned about which standards and technologies are chosen as long as there is some sensible web service allowing the exchange of information. I am even willing to concede that semantics-lite approaches may mostly work. It is more important to me that we can support each other in building high-quality, user-friendly sites. I say sites because there will always be a role for special-purpose sites. Wikipedia is great but it hasn’t put everyone else on the web out of business. A portal like EOL won’t either.

June 12, 2007

Google Docs is tipping

Filed under: bioblitz, biodiversity informatics, technology — cyndy @ 11:42 am

Google docs

This is only tangentially related to wildlife observation, but some of you may be interested. Google Docs, the free online service that allows collaboration and maintenance of spreadsheets and word-processed documents, has been around since last October. After a flurry of reviews and some half-hearted suggestions that we try to use it, it kind of faded from memory.

This week, however, three more or less unrelated projects I’m involved in have started to use it. One example is the Blogger BioBlitz spreadsheet. I did most of my heavy-duty manipulation using Microsoft Excel, which is a good thing because I don’t think Google Docs makes it as easy to grab a region and automatically fill in columns as Excel does. Rather than email my version to others for proofreading and whatever else needs to be done, I posted it to Google Docs and invited the other data junkies to work on it there too. I hope this will avoid problems with individual versions getting out of sync. Next I’ll invite all the bloggers who participated to take a look and let me know if they have changes. The best thing about it so far is the ability to see who’s been making what changes, and potentially even revert to a previous version.

Two other colleagues of mine have, for the first time, posted manuscripts online for us to collaborate on. So, for reasons unknown, suddenly Google Docs appears to be worth trying, suggesting to me that the application is tipping. Perhaps we just needed a few months of awkward attempts to solve our collaboration problems in other ways to convince us Google Docs might be better.

May 10, 2007

Encyclopedia of Life media push

Filed under: biodiversity informatics, citizen science — cyndy @ 2:03 pm

Yesterday I got scads of emails spreading the news that the big Encyclopedia of Life effort is launching, and of course the media are properly excited. The YouTube video doesn’t hurt. EOL aims to deliver rich pages on every species ever described (1.8 million and counting), and to do it with flash and panache.

EOL builds on the work of so many who have gone before — the list of potential sources for their mash-ups is impressive. But for the most part these efforts have never had the kind of financial resources and talent available to this project.

It is incredibly ambitious. Pulling together information on species we know well will be hard enough — negotiating vast scattered resources whose structure varies. Often the information conflicts and somebody has to figure out how to handle those conflicts. That’s part of why we haven’t yet got a global resource like this. To pull together information on species we barely know will be even harder, as much of it is in rare books or in hidden museum drawers. These pages will be the most valuable, as they will be new to the digital universe.

They say it will take ten years to complete, but if they do it right it should never be finished — it will change as our knowledge grows, as controversies arise and are resolved and as the world itself changes.

Partnerships among academia and industry and the general public will make the difference. Unlike Wikispecies, a similar Wikimedia effort, this project appears to have the scientific community behind it because it is the product of a consortium of respected scientific institutions. Industry brings the “entertainment” and interactive sides to this that will help draw the public in.

As a scientist, I’ll want to know whether you can really get DATA out of it — could I find what I need, download and conduct comparative analyses on it? Or would it be easier and safer to go to the original sources? Will it be clear where the scientific jury is still out?

As someone who has spent eight years working on web-based biodiversity projects, I want to know whether it will be straightforward for me to add my online databases to the mash-ups. Will financial resources be available to source projects to develop, maintain and integrate their information with the larger effort? EOL’s model of sustainability won’t work if everyone else’s funding dries up. Will they use community standards for data exchange or propose their own? Use traditional technologies or foster new ones?

For the public, the question is whether it will be easier to use Google or Flickr or Wikipedia, or any of the previous, local or topical efforts at species pages. And whether we’ll be prompted to engage with nature in ways other than browsing an attractive website.

The potential benefits, however, are enormous. Baba Dioum, the Senegalese poet, said, “In the end, we conserve only what we love. We will love only what we understand. We will understand only what we are taught.” Is EOL a quantum leap towards understanding the world’s biological diversity, or is it just hype?

I, for one, intend to help where I am able.
