Essays & conversations about constitutional moments on the Net collected by the Berkman Center.

The Need for a "Knowledge Web" for Scholarship

Essay by Carolina Rossini, February 6, 2009


In many ways, our society is not taking full advantage of the generative possibilities the web offers. As James Boyle has noted, networks bring enormous transformative power to us as consumers – of shoes, music, hotel rooms, gambling – but, with few exceptions, the systems through which we practice science and education remain remarkably untransformed.

Resistance to network effects is not an innate feature of science or education. Indeed, both fields appear to be ideal settings for peer production and network effects: both are, in their own way, already peer-produced, significantly funded by public money, and considered to be in the public interest. The potential benefits of open networks for scholarship have been widely noted for years.

However, for all this potential to be realized, the web has to be seen as a content management system, a knowledge web, and not simply a vast forest of web pages and hyperlinks. The existing web is built on the idea of a fragile but massively scalable system. Thus, although the links we make often break (the familiar “404 Not Found”), the sheer number of links means that the web as a whole remains solid. The Net trades durability for explosive growth. Scholarship, in many ways, still lags behind in this revolution.

Because scholarly producers and users lack the tools and methodologies that would allow them to share their knowledge, to make billions of fragile links, to combine their work effectively, or, in many cases, to overcome copyright and contract barriers, the power of a sharing system is not fully realized. Neither is its potential for innovation.

A knowledge web needs to be capable of much more than linking and searching. It demands more power in each individual link and a different balance between fragility and durability, one that makes it possible to trace content “genealogy”: who had an idea first, and where? It also calls for new writing methodologies. How can we replicate the writing methodology we observe in Wikipedia to connect a wide diversity of scientific information and, in doing so, generate knowledge? E-writing is very different from traditional scholarly writing, and scholars need to adopt these new writing methodologies, along with emerging infrastructures such as the semantic web and features such as annotations.

We also need to consider scholars’ incentives in the context of the knowledge web. Scholars already share by publishing in the paper world. And many users simply want to share and to build the “commons”, independent of any single motivation, because they wish to fulfill the potential of the network. Thus, if we want to see the full potential of the knowledge web realized, new impact factors will also need to be developed*. The Open Access movement is contributing to this discussion; it recommends, for example, requiring links to scholars’ online publications as a new field in the path to tenure.

It is well known that the acquisition and production of scholarly knowledge is a cumulative process that depends on human, physical, and informational inputs. The informational input rests on scholars’ continuing ability to access, collect, and share data, primary scientific and technical literature, and know-how. If the techniques of a knowledge web are not put in place, however, we risk information overload and a diminished capacity for knowledge production.

As we strive toward the digital transformation of knowledge and the creation of a generative web for science and education, we will face some of the key questions that underlie the Publius Project: How will the knowledge web influence our conceptions of control over information, including who can produce, access, and distribute it? How can we create new spaces for scholarly work and knowledge production on the web while remaining cognizant of the need for new techniques, methodologies, and in some cases rules to govern that process?

One of the main vehicles for transferring scholarly knowledge, the peer-reviewed paper, represents the biggest failure to make the leap to the web. While journals have migrated to the web, they are merely digitized versions of paper; they are not “becoming digital”. PDF versions of “papers” are amorphous objects that favor cross-platform human readability but restrict machine uses such as text mining, semantic indexing, hyperlinking, and direct integration with databases.

As noted by John Wilbanks, “… the human-readable paper is the least valuable format of knowledge from a cyberinfrastructure perspective.” To be integrated into the generative knowledge web, “we have to understand an important conceptual transformation, that knowledge itself needs to be treated as something similar to software, something upon which computing happens and depends – and the implications of that transformation.”

*A group of scholars and foundations that support Open Access, such as the Open Society Institute, is working to develop these possible new metrics. Some initial results can be seen here: New metrics for research outputs: overview of the main issues

Carolina Rossini is a Fellow with the Cooperation Research Group at the Berkman Center. She also coordinates a project on policy for Open Educational Resources in Brazil with the Open Society Institute. Carolina holds positions at the Diplo Foundation, as a fellow in the Intellectual Property and Internet Governance Program, and at IQSensato, as a Research Associate in the Access to Knowledge and Innovation Program.

Comments


    • mbaer wrote:

      The vision of a grand web of interlinked knowledge that is up-to-date, verifiable, truthful, and whose scientific value is greater than the sum of its parts is no doubt compelling. However, while "the human-readable paper" may well be, to some, "the least valuable format of knowledge from a cyberinfrastructure perspective", I cannot help but appreciate those rare occasions when I read a paper or book that is coherent in itself and actually makes a scientific point of its own. All too often, we simply refer to others, and selectively so, in order to make arguments just for their own sake.

      Maybe, after all, the web is REALLY best for buying shoes, pirating music, booking hotel rooms, and gambling and such.

      Plus, as an aside, someone with a Windows box should be so kind as to create a version of the article referred to in the footnote that is readable for Linux guys, too, in the spirit of open access, so to speak.

    • This seems to go in the same direction as Tim Berners-Lee's recent TED talk on "Linked Data". It's hard to grasp how much could change, and how much could be accomplished, from all this.
