{"id":2571,"date":"2009-08-19T19:18:35","date_gmt":"2009-08-20T00:18:35","guid":{"rendered":"http:\/\/www.theoreti.ca\/?p=2571"},"modified":"2009-08-20T23:21:53","modified_gmt":"2009-08-21T04:21:53","slug":"google-book-search-settlement","status":"publish","type":"post","link":"https:\/\/theoreti.ca\/?p=2571","title":{"rendered":"Google Book Search Settlement"},"content":{"rendered":"<p>The <a href=\"http:\/\/www.googlebooksettlement.com\/\">Google Book Search Settlement<\/a>, if approved by Judge Chin, may be a turning point in textual research. In principle, if the settlement goes through, then Google will release the full 7-10 million books for research (&#8220;non-consumptive&#8221;) use. Should we get even the 500,000 public domain books for research, we will have a historic corpus far larger than anything else. To quote the Greg Crane D-Lib article, &#8220;What can you do with a million books?&#8221; and &#8220;What effect will millions of books have on the textual disciplines?&#8221;<\/p>\n<p>There are understandably many <a href=\"http:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=1387582\">concerns<\/a> about the settlement, especially about the ownership of orphan works. The American Library Association has a <a href=\"http:\/\/wo.ala.org\/gbs\/\">web site on the settlement<\/a>, as do others. I think we also need to start talking about how to develop a research infrastructure to allow the millions of books to be used effectively. What would it look like? What could we do? Some ideas:<\/p>\n<ul>\n<li>To be usable only by researchers, there would have to be some sort of reasonable firewall.<\/li>\n<li>It would be nice if it were truly multilingual\/multicultural from the start. The books are, after all.<\/li>\n<li>It would be nice if there were a mechanism for researchers to correct the OCRed text where they see typos. 
Why couldn&#8217;t we clean up the plain text together?<\/li>\n<li>It would be nice if there were an open-architecture search engine scaled to handle the collection and usable by research tools.<\/li>\n<\/ul>\n<p><strong>Update<\/strong>: Matt pointed me to an article in the Wall Street Journal on <a href=\"http:\/\/online.wsj.com\/article\/SB125080725309147713.html\">Tech&#8217;s Bigs Put Google&#8217;s Books Deal In Crosshairs<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Google Book Search Settlement, if approved by Judge Chin, may be a turning point in textual research. In principle, if the settlement goes through, then Google will release the full 7-10 million books for research (&#8220;non-consumptive&#8221;) use. Should we get even the 500,000 public domain books for research, we will have a historic corpus far &hellip; <a href=\"https:\/\/theoreti.ca\/?p=2571\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Google Book Search Settlement<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,9,28,16],"tags":[],"class_list":["post-2571","post","type-post","status-publish","format-standard","hentry","category-internet-culture-and-technology","category-literature","category-online-publishing","category-text-analysis"],"_links":{"self":[{"href":"https:\/\/theoreti.ca\/index.php?rest_route=\/wp\/v2\/posts\/2571","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/theoreti.ca\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/theoreti.ca\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/theoreti.ca\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/theoreti.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2571"}],"version-history":[{"count":4,"href":"https:\/\/
theoreti.ca\/index.php?rest_route=\/wp\/v2\/posts\/2571\/revisions"}],"predecessor-version":[{"id":2583,"href":"https:\/\/theoreti.ca\/index.php?rest_route=\/wp\/v2\/posts\/2571\/revisions\/2583"}],"wp:attachment":[{"href":"https:\/\/theoreti.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2571"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/theoreti.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2571"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/theoreti.ca\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2571"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}