January 2007
dbpedia.org - Using Wikipedia as a Web Database
by parmentierf & 7 others
dbpedia.org is a community effort to extract structured information from Wikipedia
and to make this information available on the Web. dbpedia allows you to ask
sophisticated queries against Wikipedia and to link other datasets on the Web
to Wikipedia data.
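As a concrete illustration of such a query, here is a minimal Python sketch against DBpedia's public SPARQL endpoint (http://dbpedia.org/sparql). Note the hedge: the dbo:/dbr: vocabulary and the JSON results format shown here belong to the modern endpoint and postdate this January 2007 bookmark; the query itself is an illustrative assumption, not taken from the bookmarked page.

    import json
    import urllib.parse
    import urllib.request

    ENDPOINT = "http://dbpedia.org/sparql"

    # A "sophisticated query against Wikipedia" of the kind the entry
    # describes: people whose recorded birth place is Berlin.
    query = """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?person ?name WHERE {
      ?person dbo:birthPlace dbr:Berlin ;
              rdfs:label ?name .
      FILTER (lang(?name) = "en")
    } LIMIT 5
    """

    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(f"{ENDPOINT}?{params}") as resp:
        results = json.load(resp)

    # Each binding maps the query variables to URIs/literals.
    for binding in results["results"]["bindings"]:
        print(binding["name"]["value"], "->", binding["person"]["value"])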
September 2006
Official Google Research Blog: All Our N-gram are Belong to You
by parmentierf & 1 other (via)
Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usually been estimated from training corpora containing at most a few billion words, we have been harnessing the vast power of Google's datacenters and distributed processing infrastructure to process larger and larger training corpora. We found that there's no data like more data, and scaled up the size of our data by one order of magnitude, and then another, and then one more - resulting in a training corpus of one trillion words from public Web pages.
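For readers unfamiliar with the models the post mentions, this is a minimal Python sketch of estimating a word n-gram model by maximum likelihood from a toy corpus: the same kind of counting Google describes scaling to a trillion words. The corpus and function names are illustrative, not Google's actual pipeline.

    from collections import Counter

    def ngram_counts(tokens, n):
        """Count all n-grams (as tuples) in a token sequence."""
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    corpus = "the cat sat on the mat the cat ate".split()

    bigrams = ngram_counts(corpus, 2)
    unigrams = ngram_counts(corpus, 1)

    # Maximum-likelihood estimate: P(w2 | w1) = count(w1 w2) / count(w1)
    def p_cond(w1, w2):
        return bigrams[(w1, w2)] / unigrams[(w1,)]

    print(p_cond("the", "cat"))  # "the cat" occurs 2 times, "the" 3 times: ~0.667

Scaling this counting step to web-sized corpora, rather than changing the model itself, is the point of the post: the estimates simply get better with more data.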
March 2005
start [WaCky]
by parmentierf (via)
The WaCky Project is a nascent effort (I always liked the expression nascent effort) by a group of linguists to build or gather tools to use the web as a linguistic corpus.
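To make the web-as-corpus idea concrete, here is a minimal Python sketch: fetch a public page, strip the markup, and tokenize the remaining text into corpus material. The URL and the crude tag-stripping regex are illustrative assumptions, not the WaCky project's actual tools.

    import re
    import urllib.request

    def page_tokens(url):
        """Download a page and return its word-like tokens."""
        html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
        text = re.sub(r"<[^>]+>", " ", html)               # crude tag removal
        return re.findall(r"[A-Za-z]+(?:'[a-z]+)?", text)  # word-like tokens

    tokens = page_tokens("https://example.com/")
    print(len(tokens), tokens[:10])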