Notes on Hahn 2008
Commercial companies like Google do not seem as invested in long-term preservation as librarians. Librarians need to set the best practices.
The concern is with mass digitization programs run by Google, Yahoo and others.
Five areas of concern:
- pace
  - Is everything happening too fast to think through policy?
  - lists many projects that are happening too fast (need to find out the status of these projects)
  - foolish risk versus vision
  - general paradigm is full speed ahead and clean up any mess later
  - opt-out policy by Google: people have to tell Google they do not want to be included
- justification for digitizing
  - she asks why digitize the books? Who is using books for research anyway?
  - unusual collections are being digitized to give access to those around the world (p. 21)
  - Google's approach is to digitize everything: "the more of it, the better" (p. 21)
  - she wants us to stop and think, prioritize, and make sure we are meeting our long-term preservation goals
- trust and
  - many argue that digitization is the best form of insurance (p. 21)
  - she argues that current digitization is too slipshod and does not follow the standards and values of preservation
  - we need Google as a partner, but we need them to do a better job and to remember that they are following a corporate model
  - Quality: she asks, is mass digitization preservation? (pp. 21-22) But others argue that it isn't meant to be preservation, simply access
  - Google is not following good preservation values
  - Secrecy: Google is being secretive about what it is doing
  - Stability: could Google disappear or go bankrupt?
- leadership
  - a 2004 report from the Association of Research Libraries endorses digitization as a preservation method
  - she argues that information professionals must uphold standards and best practices (p. 23)
  - no hits for "preservation" and Google in search engines
Questions
- What is the current situation with Google's digitization project?
- Is there no concern with copyright issues? She states that Google and publishers agree that "intellectual property rights should be respected" (p. 20). However, the many lawsuits suggest people aren't convinced.
- Open access? Who will own what?
- How is the semantic web/linked data involved?
Maybe she is coming at this from the wrong angle: Google is not in the business of preservation, so why focus on that?
Responses
Piper, 2013
"We should not look to Google to do what libraries do best" (p. 23)
Instead, let's look at HathiTrust and the Digital Public Library of America, whose goals are "to preserve, digitally, the great and rare collections housed in libraries, museums and other cultural institutions" (p. 23) while also allowing free access.
Google's purpose: "to create a comprehensive, searchable, virtual card catalog of all books in all languages that helps users discover new books and publishers discover new readers" (p. 23)
Different goals!
HathiTrust was created in 2008 by 12 universities of the Committee on Institutional Cooperation and the University of California library consortium, and today has 60+ libraries. Libraries can join if they have content that contributes to the whole and/or if they are willing to pay. So far it's all academic libraries plus the New York Public Library, the Library of Congress and the Getty Research Institute (feels very exclusive, one way of keeping poor people from access). Some early documents scanned by Google are of poor quality, but HathiTrust is committed to rescanning. It contains at least 11 million volumes, with 31% of the works in the public domain, which can be viewed or downloaded by anyone.
DPLA launched in 2010, but there is some question about what a "public" library is.
Defines a public library:
"Public libraries are many things to many people, and a large percentage of these are social. Library as space has always been a critical function of a public library, as is its role within a community. Public libraries host numerous events, discussions, and training. In addition, they are gathering spots, nodes of connectivity, and (in some cases) refuges. They are arbiters of censorship and privacy." (p. 24).
The SF Public Library is a member of DPLA. DPLA does not really have stated goals.
My response
Both Rothenberg (1999) and Hahn (2008) raise interesting issues about what role the digital library can/will play in preserving records.
Questions I have
- What is the current situation with Google's digitization project? And what alternatives are out there?
- What are the concerns with copyright issues? She states that Google and publishers agree that "intellectual property rights should be respected" (p. 20). However, the many lawsuits suggest people aren't convinced.
- Open access? Who will own what?
- How is the semantic web/linked data involved?
- What is already lost?
Goethals, Oury, Pearson, Sierman, and Steinke (2015) discuss the work of the IIPC (International Internet Preservation Consortium) preservation working group, noting that there are still many challenges to digital preservation. They surveyed their 46 members (mostly libraries) and found that while most, 78%, have a preservation policy, 18% do not. Within the group that has a preservation policy, about 33% say that the policy has a specific web archiving focus. The article presents some very interesting information about the actual preservation strategies the libraries are using and their risk factors, raising the trust issue that Hart and Liu (2003) discuss in "Trust in the Preservation of Digital Information".
They also raise an additional interesting question: how should preserved materials be accessed and presented to the user?
Another interesting issue I came across when reading a New Yorker article on preserving web materials (Lepore, 2015) is the process of identifying web materials to be archived. Lepore notes that San Francisco has a Wayback Machine (sounds like something from a sci-fi flick), and the article provides real-life examples of the reasons for archiving. People can always erase web information, and it is difficult if not impossible to retrieve the original. Lepore notes that "BuzzFeed deleted more than four thousand of its staff writers' early posts, apparently because, as time passed, they looked stupider and stupider".
But that raises another issue: should we be archiving every page on the web every day? Every hour? Lepore (2015) notes that the average life of a web page is 100 days. The Internet Archive, the owner of the Wayback Machine, is apparently trying to archive the web using a combination of robots that crawl the internet making copies of the web pages they find and librarians who submit suggestions. The Wayback Machine, on average, takes a snapshot of each web page it locates about every 2 months, which raises the question: is that often enough?
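To get a rough feel for that question, here is a back-of-the-envelope sketch in Python using the two figures above (100-day average page life, a snapshot roughly every 60 days). It rests on assumptions that are mine, not Lepore's: page lifetimes are treated as exponentially distributed around the average, and the crawler is assumed to visit on a fixed schedule that is unrelated to when a page first appears. The function name is hypothetical. Under those assumptions, the chance that a page is captured at least once before it disappears works out to (m / p) * (1 - exp(-p / m)), where m is the mean lifetime and p is the crawl interval.

```python
import math


def capture_probability(mean_life_days: float, crawl_interval_days: float) -> float:
    """Estimated chance that a page is snapshotted at least once before it vanishes.

    Assumptions (illustrative only, not from the article):
    - page lifetimes follow an exponential distribution with the given mean
    - the crawler visits every `crawl_interval_days`, at a random offset
      relative to the moment the page first appears
    """
    m, p = mean_life_days, crawl_interval_days
    # A page that lives t days is caught with probability min(1, t / p);
    # averaging that over exponential lifetimes gives the closed form below.
    return (m / p) * (1.0 - math.exp(-p / m))


# Figures from the notes: 100-day average life, snapshot about every 60 days.
print(round(capture_probability(100, 60), 2))  # 0.75
```

So under these simplified assumptions roughly a quarter of pages would vanish without ever being captured, and even the pages that are captured could change many times between snapshots. Real crawl schedules vary a lot by site, so this is only a way of framing the question, not an estimate of what the Wayback Machine actually misses.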
Goethals, A., Oury, C., Pearson, D., Sierman, B., & Steinke, T. (2015). Facing the challenge of web archives preservation collaboratively: The role and work of the IIPC preservation working group. D-Lib Magazine, 21(5), 4.
Hart, P. E., & Liu, Z. (2003). Trust in the preservation of digital information. Communications of the ACM, 46(6), 93-97.
Hahn, T. B. (2008). Mass digitization: Implications for preserving the scholarly record. Library Resources & Technical Services, 52(1), 18.
My response 2
After a bit of research, I did find that the Authors Guild is taking the mass digitization case to the Supreme Court, though it isn't clear whether the Supreme Court will hear the case or not (O'Brien, 2015). Currently, it seems that mass digitization efforts tend to be given the benefit of the doubt in the courts (O'Brien, 2015). O'Brien mentions the HathiTrust Digital Library (HTDL) as the library alternative to Google Books; apparently it was built on the Google Books scanning project and currently holds 13.7 million volumes. Of these materials, almost 40% are available to the public. Another mass digitization project is the Internet Archive (IA), which started in 1996.