The Artima Developer Community

Python Buzz Forum
RSS for bioinformatics

Andrew Dalke


Andrew Dalke is a consultant and software developer in computational chemistry and biology.
RSS for bioinformatics Posted: Sep 21, 2003 7:04 AM

This post originated from an RSS feed registered with Python Buzz by Andrew Dalke.
Original Post: RSS for bioinformatics
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
One of the reasons I wrote PyRSS2Gen was to experiment with RSS for data collection in bioinformatics. Last year I came across PubCrawler, which periodically searches PubMed and GenBank and emails you a summary of new matches to your searches. It's a nice idea, in part because managing that data yourself is error-prone.

The trend these days is to make that data available through RSS. With a good RSS client this should be better than email because it can accumulate all the entries over time, and the trackbacks would let you make comments about the hits, potentially sharing them with others. (This is all theory - I haven't used a high-end RSS client.)
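
To make "available through RSS" concrete, here is a minimal sketch of turning a list of search hits into an RSS 2.0 document. It uses only the Python standard library rather than PyRSS2Gen, and the function name build_rss and the hit dictionary keys ("title", "link", "summary", "found") are assumptions made up for this illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET


def build_rss(title, link, description, hits):
    """Return an RSS 2.0 document (as a string) with one <item> per hit.

    `hits` is a list of dicts with the hypothetical keys "title", "link",
    "summary" and "found" (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for hit in hits:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = hit["title"]
        ET.SubElement(item, "link").text = hit["link"]
        ET.SubElement(item, "description").text = hit["summary"]
        # A stable guid lets a reader recognize entries it has already seen.
        ET.SubElement(item, "guid").text = hit["link"]
        # RSS 2.0 dates use the RFC 822 format.
        ET.SubElement(item, "pubDate").text = format_datetime(hit["found"])
    # ElementTree escapes characters such as "&" and "<" in the text.
    return ET.tostring(rss, encoding="unicode")
```

Letting an XML library do the escaping, instead of pasting strings together, avoids one common cause of malformed feeds.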

During that time I also found a site which did some RSS feeds for PubMed searches, but I can't find it now. I did come across HubMed and my.PubMed which do have RSS feeds. (I tested both to find one of my papers; search for "dalke" with a refinement of "tcl". I found HubMed the easier of the two. It wasn't obvious how to refine a search in my.PubMed.)

In theory, a lot of searches could have RSS front-ends. What about a BLAST job run every week, where the RSS feed tells you about the new matches? What about an annotation system where you can comment on regions of a sequence and let others know about it? (DAS does some of that, but I would like it to integrate with other non-biology tools. I think it's close, and something to consider for DAS2.) And how do PIE's editing features fit into all this?
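
The "new matches" part of a weekly job comes down to remembering which hit identifiers earlier runs already reported. A minimal sketch, assuming the search step hands back a list of identifier strings; the function name new_hits and the JSON state file are assumptions for this illustration, not part of BLAST or any Bio* project.

```python
import json
from pathlib import Path


def new_hits(current_ids, state_path):
    """Return the IDs not seen in earlier runs, then record them as seen."""
    path = Path(state_path)
    seen = set(json.loads(path.read_text())) if path.exists() else set()
    # Keep the search's own ranking while dropping known or repeated IDs.
    fresh = []
    for hit_id in current_ids:
        if hit_id not in seen:
            seen.add(hit_id)
            fresh.append(hit_id)
    # Persist the full set so the next scheduled run can compare against it.
    path.write_text(json.dumps(sorted(seen)))
    return fresh
```

Each scheduled run would call new_hits with that week's identifiers and publish only the returned list as new feed items.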

There are a few prerequisites to doing this. The first is a way to automate PubMed, GenBank, BLAST, and other searches. Biopython, bioperl and the other Bio* projects all do this to some extent, though I think the EUtils code we contributed to Biopython is the most powerful. The second is support for RSS generation; not a hard task, but there are still a lot of incorrectly formatted RSS feeds, so we developed PyRSS2Gen.
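
As a rough idea of what automating a PubMed search involves, the sketch below builds a request URL for NCBI's ESearch service and pulls the record IDs out of its XML reply. It talks to the web service directly with the standard library and is not the Biopython EUtils interface; the helper names esearch_url and parse_ids are made up for this illustration, and the URL is the E-utilities address in use today.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address of the NCBI ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="pubmed", days=7, retmax=20):
    """Build an ESearch URL for records entered in the last `days` days."""
    params = {
        "db": db,
        "term": term,
        "reldate": days,     # restrict to the last N days
        "datetype": "edat",  # measured by the Entrez date
        "retmax": retmax,    # maximum number of IDs to return
    }
    return ESEARCH + "?" + urlencode(params)


def parse_ids(xml_text):
    """Extract the record IDs from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return [elem.text for elem in root.findall("./IdList/Id")]
```

Fetching the URL (for example with urllib.request.urlopen) and passing the response body to parse_ids gives the identifiers for that period; a real client should also follow NCBI's usage guidelines on request rates.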

The third is time and money, since there are too many interesting things to work on and too many bills to pay. And the last is access to end-users, which is essential to know that what we're doing makes sense in the real world.

All of our clients over the last few years have been chemists, not biologists. Chemists also do searches, but it's a bit different than in biology. There isn't nearly as much public data for chemistry as there is for biology. There are ACD and the other commercial databases, but very few are freely available, and I'm told that those databases are only a small fraction of the data locked up in the various pharmas and other chemistry companies. And outside of conferences it's rare for people to talk to people in other companies about their research. Even at conferences it's often highly vetted by the lawyers.

This means most of the data systems are local, with a larger diversity of servers. Any software must know how to integrate with Thor/Merlin, Isis, Unity, local Oracle schemas, and whatever else might be hanging around. Since relatively little new data comes into the company compared to the data generated in-house, it's often easier for one researcher to talk to the person doing the experiment instead of going through a computer system. Only in the large pharmas will you start approximating the problems that RSS and PubCrawler resolve.

This is another project we would enjoy working on, so if you are interested in funding us, let us know!


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use