Eran's blog

Google Personalized Search Launched

Posted by Sep Kamvar on Google Blog:

With the launch of Personalized Search, you can use that search history you’ve been building to get better results. You probably won’t notice much difference at first, but as your search history grows, your personalized results will gradually improve.

This whole concept of personalization has been a big part of our lives since some of the team was in grad school at Stanford. We shared an office, which happened to be the same one Sergey had used before, and we were pretty familiar with the research he and Larry had done. Related to their work, we thought building a scalable system for personalizing search results presented an interesting challenge. We’ve still got a long way to go, but we’re excited to release this first step…

Seems like I’ll have to let Google track my search history now, if only to see how well this new feature works.

PS. for further information:
Personalized Search Help
Google Groups announcement


Filed under: Search

citeRel comments

While I was away in the desert, Peter Janes dropped by and left a couple of comments:

All of the examples you provide have the pattern CITE A … /A /CITE; since you’re really talking about link types and A takes the REL attribute, why not just use it instead of adding the extra element?

We based the format on suggestions made by Tantek Celik in “The Elements of Meaningful XHTML” and in discussions following that talk. You can read more about that on Ryan’s early posts about the microformat.

CITE A stands for citing the linked source well enough so that a new Rel value for A tags is not necessary. This leads to using the CITE A compound as the basic format for citing a reference. To further enhance this format with specific types of citations we added the various Rel classes in citeRel.

According to Joe Clark […snip…] CITE is basically for titles and terms, although the HTML 4.01 spec seems to have relaxed/redefined that somewhat.

I believe CITE can be used to indicate the source of a quote as you can see in Tantek’s presentation mentioned above and in the XHTML 2.0 draft.

redundant “rel” prefix.

I agree that it is somewhat redundant and as I noted before I do plan to remove it once CITE tags have Rel attributes. Right now, though, I think it lends some explicit meaning to the class names. See another reason for using this prefix in my reply to your next comment.

as link types, the relationships should probably be suitable for both REL and REV.

Yes, I believe you are correct. In fact, the examples I cited should probably have used Rev instead of Rel. Based on the suggestions you made, we should now have two classes each for replies, updates and forwards. For example:

  • relReplyTo – the cited document is a reply to this one.
  • revReplyTo – this document is a reply to the cited one.

Filed under: MicroFormats

citeRel is the new citeVia

I started thinking about citeVia again after a conversation with Ashton the other day about tracking a conversation over many blogs (what some call ‘distributed conversation’). With citeVia, Ryan and I tried to expose the relation between two blog posts. We limited ourselves to just one type of relation but there are others: comments, replies and revisions are all possible types of related documents. We discussed this last night and we now want to expand citeVia.

Ideally we would use the rel attribute to signify the type of relation between the two documents but it seems that cite has no rel attribute under XHTML 1.0 (it does under 2.0). To stick to the rel idea we suggest renaming the microformat to citeRel and using the following class names:

  • relVia – a via (or hat-tip) link.
  • relReplyTo – in reply to the linked document. Can be used for comments on the same blog or with a new post, even on a different blog.
  • relUpdate – an update or revision of the linked document.

Use of citeRel
Under XHTML 1.0 use:
<cite class="CLASS_NAME"><a href="SOURCE_URI">source</a></cite>

Under XHTML 2.0 we might be able to use a simpler structure (and drop the rel prefix from the class names):
<cite rel="CLASS_NAME" cite="SOURCE_URI">source</cite>
Hopefully, this will be rendered in a similar manner to the previous example (with emphasis and a link).


  1. Via: <cite class="relVia"><a href="http://example.com/blog/post=17">Mr. Example</a></cite>
  2. In reply to <cite class="relReplyTo"><a href="http://Example.com/your-blog-annoys-me/">this post</a></cite>
    by <a href="http://theRyanKing.com/">Ryan King</a>.
  3. This is an update to my
    <cite class="relUpdate"><a href="http://Example.com/blog/?post=17">previous post</a></cite>
    on this topic where I claimed that:
    Cows can fly.
    Well, I now have proof!
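
To show how a tool might consume the CITE A pattern from the examples above, here is a minimal Python sketch. It is an illustration only, not part of the citeRel proposal: the class and function names are made up for this example, and it assumes any class starting with "rel" or "rev" on a cite element names a relation.

```python
from html.parser import HTMLParser


class CiteRelParser(HTMLParser):
    """Collect (class_name, href) pairs from <cite class="rel..."><a href="..."> markup.

    Hypothetical helper for illustration; the name is not part of citeRel.
    """

    def __init__(self):
        super().__init__()
        self.relations = []    # (class_name, href) tuples, in document order
        self._open_cites = []  # one list of rel*/rev* classes per open <cite>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "cite":
            classes = (attrs.get("class") or "").split()
            # Assumption: relation classes are the ones prefixed rel or rev.
            self._open_cites.append(
                [c for c in classes if c.startswith(("rel", "rev"))]
            )
        elif tag == "a" and self._open_cites and attrs.get("href"):
            # A link inside a <cite> inherits that cite's relation classes.
            for name in self._open_cites[-1]:
                self.relations.append((name, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "cite" and self._open_cites:
            self._open_cites.pop()


def extract_cite_rels(html):
    """Return every (relation class, cited URL) pair found in the HTML string."""
    parser = CiteRelParser()
    parser.feed(html)
    return parser.relations
```

Links that sit outside a cite element, such as the author link in the second example, are ignored, which is the point of wrapping only the cited source.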

We’re looking for more feedback on this issue so please, let us know what you think.

Filed under: MicroFormats

MicroFormat Adoption

Julian Bond has a good point about adoption of new standards and microformats:

Now there’s at least some work being done on creating standards and providing transports and displays for all of these. But the catch is not only are they missing implementations at the toolkit level, but they’re also missing applications that actually do something useful with them. But much much worse, for quite a few of them there’s no obvious immediate payback for any of the end user, community or a system or application owner. To take just one example of OpenReviews; why should I make the effort to write a review and post it on my website. And especially if there the associated systems don’t exist to pick them up, aggregate them and get the extra link love of people reading them elsewhere and clicking back to me.

via: Breadcrumbs

Filed under: MicroFormats

More on Object Mediated Social Networks

Jyri Engeström gave a talk at Reboot 7 (some coverage here) about object mediated social networks. The two summaries he mentions, along with the PowerPoint presentation (includes notes; a presentation without notes is like an RSS feed without full text posts), make a pretty good case for this model.

Blogs are a good example of how real social networks emerge through objects. Blog posts serve as objects through which people connect to one another. You can clearly see more and more social network tools added to blogs (blogrolls, XFN, etc.) because there is an actual need for those tools to represent the connections created by blog posts and comments. The social networking infrastructure in the blogosphere was created to satisfy an actual need, not the other way around as we see in most YASNs.

via: zengestrom

Filed under: The Net

RSSMangler, Grep, etc.

A couple of RSSMangler updates:

  1. RSSMangler now greps! You can use regexps to show only matching items (currently matching on title and description). The request structure for grep is http://hellonline.com/RSSMangler/public/?act=grep&src=RSS_FEED_URI&param=PATTERN
  2. As you may have noticed from the previous line, RSSMangler is now available under the hellonline.com domain. I’m still working on some config issues between Apache, Rails and FastCGI but things seem to work for now.
  3. Full source code is available on the trac page.
  4. A bug I’m still working on: Feeding RSSMangler an RSSMangler URL seems to sometimes result in an endless loop. Fun!
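
Since both the feed URI and the pattern travel inside a query string, they need to be URL-encoded. Here is a minimal Python sketch of building the grep request described in the first item. The helper names are hypothetical, and grep_items is only a local stand-in for the idea of matching on title and description; it is not RSSMangler's actual code, and whether the real service matches case-sensitively is an assumption.

```python
import re
from urllib.parse import urlencode

RSSMANGLER_BASE = "http://hellonline.com/RSSMangler/public/"


def grep_url(feed_uri, pattern):
    """Build a grep request URL; urlencode() escapes the feed URI and the regexp."""
    query = urlencode({"act": "grep", "src": feed_uri, "param": pattern})
    return RSSMANGLER_BASE + "?" + query


def grep_items(items, pattern):
    """Keep only items whose title or description matches the pattern.

    Illustrative stand-in: items are plain dicts, matching is case-sensitive.
    """
    regex = re.compile(pattern)
    return [
        item for item in items
        if regex.search(item.get("title", ""))
        or regex.search(item.get("description", ""))
    ]
```

For example, grep_url("http://example.com/feed.xml", "micro.*format") yields a URL that can be pasted straight into a feed reader.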

So there you have it, one more power tool.

Filed under: Projects, The Net

Web 2.0 Week

Ashton writes:

Microcontent MicroFormats Monday

Tag Tuesday

Wiki Wednesday

Thirsty Thursday

Folksonomy Friday

Also from Ashton, a list of people’s favorite flickr tags, random sampling from TagTuesday participants:

Filed under: General, The Net

Tag Tuesday

It’s Tag Tuesday, and we’re sitting in the shadow of the Bay Bridge after leaving Gordon Biersch for a less noisy venue (i.e. some stone steps on the Embarcadero). Kevin Marks from Technorati is talking about tagging, imagine that! The following is an attempt at summarizing what followed. It wasn’t easy catching everything that was going on, so give me a break, eh? I’ll try to note who said what; please drop me a note or a comment with any corrections or additions.

Kevin Marks:
Technorati tags are distributed, unlike other people’s (delicious, flickr, etc.). The advantage is that the user has better control.

Initial experiment: New Year’s resolution. Failed because people linked to pages they thought were cool instead of using actual resolutions. New version (using rel="tag") is more explicit.

Why tags in blogs fail: people get too clever. They make special database fields for tags, and Technorati cannot connect them to the actual blog post. Embed the tags in the HTML, keep it simple, and it all works.
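
To make the "embed in HTML" point concrete, here is a minimal Python sketch of how a crawler could pick up rel="tag" links from a post, taking the last path segment of the link as the tag name. The class name is made up for this example and this is not Technorati's code; it only shows why plain HTML is easy to index.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collect tag names from <a rel="tag" href=".../tagname"> links.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated values, e.g. rel="tag nofollow".
        rel_values = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if "tag" in rel_values and href:
            # The tag is the last segment of the URL path.
            last_segment = urlparse(href).path.rstrip("/").rsplit("/", 1)[-1]
            if last_segment:
                self.tags.append(unquote(last_segment))


def extract_tags(html):
    """Return the tag names found in an HTML fragment, in document order."""
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags
```

Nothing here needs access to the blog's database: the tags are recoverable from the published page alone.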

Matt Mullenweg: How many tags come from categories and how many explicit?
Kevin: Most Technorati tags still come from categories not from people actually tagging posts.

Next stepped up Stewart Butterfield (from flickr/Yahoo):
Sorting by date is not as good unless big events show up (when it becomes very timely). flickr is coming up with a smarter sorting algorithm to determine how interesting a photo is, trying to show “good” photos. (Note: there are people jogging next to us, very amusing.) It takes time to get a ranking this way. Timeliness vs. quality becomes an issue.

Tag spam is an issue. It gets easier to handle when you own the entire system (which gives you more info) than when you are someone like Yahoo or Technorati.
Kevin: we still get good information and can do quite a lot of filtering.

Kevin: Spammer meta-meta data might be useful. It gives you information on relations between tags. For example, a person who was spamming with real-estate information had enough posts to create a relation between the ‘San Jose’ and ‘San Francisco’ tags.

Both Technorati and flickr are planning a future for geo-tagging.
Flickr is planning a separate table for geo-tagging information. This enables special handling for geographic information, allowing them to calculate distances, etc.
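
As an illustration of the kind of distance calculation that dedicated latitude and longitude fields make possible, here is a minimal Python sketch using the standard haversine (great-circle) formula. This is my own example, not something flickr described; the function name and the spherical-Earth radius are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, treating the Earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Argument order matters: latitude first, then longitude.
san_francisco = (37.7749, -122.4194)
san_jose = (37.3382, -121.8863)
distance = haversine_km(*san_francisco, *san_jose)
```

With coordinates stored as numbers rather than free-form tag strings, a query like "photos within 10 km of here" becomes a matter of comparing such distances.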

Tantek: Geo tagging is hard. Using coordinates ended up in a “mirror” of the US superimposed over Mongolia (because people confuse longitude and latitude).
Stewart: This becomes an easier problem when you have the resources of Yahoo to solve the problem. It’s possible to use landmarks, roads, etc. to let people find the right location.

Ryan King: Technorati’s developers’ site has a page on geo-tagging (updated; thank you Tantek and Ryan for the URL). GPS info has little relevance as it doesn’t include scale and radius of interest. There’s an artificial precision to latitude and longitude information because most people use a simple system to get the coordinates of the city they’re in, which gives a location in the middle of the city.
We need a geo-tag that’s one click verifiable.

A microformat, hopefully, is coming.

Evo from Jet-Eye: do you expect that people will start using auto-tagging, thereby flooding the Folksonomy with popular information?
Stewart: There’s always tension between free form and auto-tagging. Flickr supports auto-tagging for photos sent by email (popular example is “cameraphone”). Basically, any info on a photo is better than what we have right now.

Kevin: Technorati has 1.2 million unique tags out of 40 million tags. But these numbers are influenced by Technorati’s use of blog categories.
Stewart: flickr has 0.5 million unique tags out of 40 million tags. Numbers get better if you look at tags that are repeated (Note: maybe because this eliminates spelling mistakes and personal Taxonomies?)

Stewart: A common mistake, repeated by flickr, was using space separated tags. Bay to breakers ended up creating “bay” and “breakers” tags. Comma separated is more familiar to most people and we can be even smarter than that!
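
The difference between the two separators is easy to see in a minimal Python sketch. The function names are hypothetical, and lowercasing the tags is my own assumption rather than anything flickr described.

```python
def split_on_spaces(raw):
    """Space separated input: every word becomes its own tag."""
    return raw.lower().split()


def split_on_commas(raw):
    """Comma separated input: multi-word tags survive as a single tag."""
    tags = []
    for part in raw.split(","):
        cleaned = " ".join(part.split()).lower()  # trim and collapse whitespace
        if cleaned:
            tags.append(cleaned)
    return tags


space_tags = split_on_spaces("Bay to breakers")
comma_tags = split_on_commas("Bay to breakers, San Francisco")
```

Splitting on spaces breaks the event name into separate words, while splitting on commas keeps “bay to breakers” together as one tag.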

Final note: There’s apparently been a 7.0 earthquake some 300 miles off the California coast and we should all go home or face the wrath of the ensuing tsunami. A small part of the group reconvenes at a private residence (names and addresses withheld to protect the innocent) where I promptly ignore everyone and work on this here blog post.

Update: Kevin has his presentation up.
Update2: Scott Beale posted more pictures.

Filed under: The Net

Tagging and Structure

As probably everybody knows by now, delicious has some new system tags that are automatically added by the system when you bookmark various types of media files. As you can see in the following quote, they’re organized in a nice little hierarchy:

The tag “system:media:audio” includes

  • *.mp3 tagged as “system:filetype:mp3”
  • *.wav tagged as “system:filetype:wav”

The tag “system:media:video” includes

  • *.mpg tagged as “system:filetype:mpg”
  • *.mpeg tagged as “system:filetype:mpeg”
  • *.avi tagged as “system:filetype:avi”
  • *.wmv tagged as “system:filetype:wmv”
  • *.mov tagged as “system:filetype:mov”

via: delicious blog

Already we can see the price of using taxonomies. Just look at the comments to the above blogpost and count the number of requests for new file types. It’s easy to talk about the higher cost of maintenance of Taxonomies vs. Folksonomies but it’s much more convincing when you see the request list growing on a daily basis.
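
The hierarchy quoted above boils down to a lookup table from file extension to media type. Here is a minimal Python sketch of that idea; it is my own illustration, not how delicious implements it, and the function and table names are made up.

```python
import posixpath
from urllib.parse import urlparse

# Extension -> media type, taken from the list quoted from the delicious blog.
FILETYPE_TO_MEDIA = {
    "mp3": "audio",
    "wav": "audio",
    "mpg": "video",
    "mpeg": "video",
    "avi": "video",
    "wmv": "video",
    "mov": "video",
}


def system_tags(url):
    """Return the system tags for a bookmarked URL, or [] for unknown file types."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    media = FILETYPE_TO_MEDIA.get(extension)
    if media is None:
        return []
    return ["system:media:" + media, "system:filetype:" + extension]
```

Every requested file type (say, .ogg) means someone has to add another entry to the table, which is exactly the maintenance cost a Taxonomy carries and a Folksonomy does not.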

So delicious is now using a hybrid Folksonomy + Taxonomy model, which is very interesting. It’s good to see that they’re not stuck on being an ideal tagging site based on pure Folksonomy and intent on banishing the evil that is Taxonomies. The right tool for the right job and in this case, a small and limited Taxonomy is just what the doctor ordered. I just hope they don’t let it spread too far or too wide into the existing system; we don’t need another Yahoo.

Filed under: The Net

Power Tools for the Web

[limbo@hellonline]$ cat /var/log/httpd/access_log | grep technorati | cut -d " " -f4,7,11 | sort | cut -f2,3 -d " "

The above line extracts page and referrer information from my access logs sorted by request time. It really isn’t very impressive by itself; I’m simply using it as an example of how useful small and simple tools can be when used together. Wouldn’t it be wonderful if we could do the same thing on the Web? Here’s a quick example:


This horribly complex URL (broken into 3 lines and un-urlencoded) uses Brian Suda’s hCal2iCal application to extract iCal information from Laughing Squid’s hCal enabled event page. The created iCal file is then sent to RSS Mangler where it’s converted to RSS. Why is this good? At the very least, it allows me to read about those events using Bloglines. But why stop with one feed? Pull in as many feeds as you’d like, plug ’em into an Events directory and there ya go! How about automatically creating a feed based on an OPML list of URLs? Does that sound like a good way to keep up with everyone on your blogroll? Automated stalking just doesn’t get simpler than that.

The “glue” that lets you do that on Unix is pipes; on the Web things seem to be more complex. XML is a good start, and XSLT helps quite a lot, especially with some XPath based filtering. The problem with these two is they’re much too complex to just use on the fly. I think that a set of small power tools like the RSS Mangler can evolve into something very useful. I don’t have a clear plan yet; I’m thinking smart but not too smart, general but not abstract, using RSS as the glue and microformats for the actual data.
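
To illustrate the "small tools glued together" idea, here is a minimal Python sketch that mirrors the shell pipeline at the top of this post with three tiny composable functions. The function names are mine, the sample log lines are invented, and grep here does plain substring matching rather than regexps.

```python
def grep(lines, needle):
    """Pass through only the lines that contain the needle (substring match)."""
    return (line for line in lines if needle in line)


def cut(lines, fields, delimiter=" "):
    """Keep the given 1-based fields of each line, like cut -d ' ' -f..."""
    for line in lines:
        parts = line.rstrip("\n").split(delimiter)
        yield delimiter.join(parts[i - 1] for i in fields if i <= len(parts))


def technorati_referrals(lines):
    """grep technorati | cut -f4,7,11 | sort | cut -f2,3"""
    matched = grep(lines, "technorati")
    keyed = cut(matched, (4, 7, 11))  # timestamp, page, referrer
    ordered = sorted(keyed)           # plain string sort, as sort(1) would do
    return list(cut(ordered, (2, 3)))


# Invented sample lines in Apache combined log format.
sample_log = [
    '1.2.3.4 - - [12/Jun/2005:10:05:00 -0700] "GET /b HTTP/1.1" 200 10 "http://technorati.com/tag/x" "UA"',
    '1.2.3.4 - - [12/Jun/2005:10:01:00 -0700] "GET /a HTTP/1.1" 200 10 "http://technorati.com/tag/y" "UA"',
    '5.6.7.8 - - [12/Jun/2005:10:03:00 -0700] "GET /c HTTP/1.1" 200 10 "http://google.com/" "UA"',
]
referrals = technorati_referrals(sample_log)
```

Each function does one small thing and knows nothing about the others; the value comes from chaining them, which is what I would like to be able to do with feeds.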

Filed under: Projects, The Net