Improving Freenet's Performance

Friday, May 29, 2009

The Free Network project is the community that creates and maintains Freenet, free software that allows you to publish and obtain information on the Internet without fear of censorship by means of a decentralized, anonymous network. Since version 0.7, the software has had built-in support for downloading and uploading large files. These are long-term downloads, which persist between restarts of the node. This support has improved performance and usability, but it has also meant that when lots of downloads are going on at the same time, Freenet uses a lot of memory, takes a long time to complete the startup process, and crashes if you queue too many downloads. By storing the current progress of uploads and downloads in db4o.com's open source object database (a file on disk) rather than in memory, Freenet's memory usage can be greatly reduced, the end user doesn't need to worry about running out of memory, and we can have an unlimited number of uploads and tens of gigabytes of downloads.

To begin at the beginning, Freenet divides all files into 32 KB blocks (called CHKs), each of which is fetched and decrypted separately. On top of that we have a layer of redundancy, and various complexities surrounding putting files together and putting in-Freenet websites together; all of this makes up the client layer. Before the db4o branch, uploads were persistent, but downloads were restarted from scratch after every restart, pulling huge numbers of blocks from the datastore (on-disk cache). Worse, memory usage was rather large if you had any significant number of downloads on the queue.
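As a rough illustration of the block splitting described above, here is a minimal Python sketch. It is not Freenet's code: the function names are hypothetical, and it leaves out the encryption, key derivation, and redundancy layer entirely. It only shows a file being cut into 32 KB pieces and put back together.

```python
BLOCK_SIZE = 32 * 1024  # 32 KB, the block size given in the text


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def reassemble(blocks: list[bytes]) -> bytes:
    """Concatenate blocks, in order, back into the original data."""
    return b"".join(blocks)


# 100,000 zero bytes standing in for a file (hypothetical test data).
payload = bytes(100_000)
blocks = split_into_blocks(payload)

# Three full 32 KB blocks plus one short final block.
print(len(blocks), len(blocks[-1]))  # prints "4 1696"
```

Each block in such a scheme can be requested independently, which is why a large download turns into a large number of small fetches whose progress has to be tracked somewhere.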

The db4o project puts the client layer (persistent downloads and uploads) into a database (db4o). I had initially hoped that this would be a relatively quick project, which shows how much I knew about databases then! We decided to use db4o in a fairly low-level way, specifically to minimize memory usage. We had heard from testimonials that some embedded applications had done this, but unfortunately this is not really the way that db4o is usually used, which caused some complications. Overall, the project took one developer most of a year. The final diff was over 46K lines of code covering 320 files, and the work went well beyond its original remit, solving many long-standing problems in the process. Optimal performance required new architecture, including Bloom filters to identify blocks we are interested in, a queue of database jobs, major refactoring in many areas of the client layer, and a new system for handling temporary files.
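To show why a Bloom filter helps here, the following is a minimal Python sketch of the general technique, not Freenet's implementation. The class, the filter size, the number of hash functions, and the example keys are all assumptions made for illustration. The idea is that a small in-memory bit array can say "definitely not wanted" for most incoming blocks, so the on-disk database is consulted only for blocks that might belong to a queued download.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "certainly never added"; True means "possibly added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


# Hypothetical keys of blocks that queued downloads are waiting for.
wanted = BloomFilter()
for key in (b"block-key-1", b"block-key-2"):
    wanted.add(key)


def should_check_database(key: bytes) -> bool:
    """Only touch the on-disk database when the filter says 'maybe'."""
    return wanted.might_contain(key)
```

A "maybe" answer still has to be confirmed against the database, because the filter can report false positives; what it saves is the disk access for the many blocks that are certainly not wanted.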

The effort was well worth it. Our client layer overall has vastly improved and Freenet now:
  • starts up quickly
  • resumes work on downloads and uploads almost instantly on startup
  • can have an almost unlimited number of downloads and uploads
  • doesn't need the user to worry about or configure the maximum memory usage
  • doesn't go into limbo with constant 100% CPU usage desperately trying to scrounge a few more bytes
  • can insert DVD-sized files and huge websites (or git/hg repositories) on relatively low-end systems
  • uses fewer file handles

This project would not have happened without support from Google's Open Source Programs Office. It will be one of the most important changes in version 0.8 of Freenet when it is released later this year, and current work includes Bloom filter sharing, a new feature that should greatly improve performance both for popular and rare content. Google is also funding that project, so watch this space!

Web Storage Portability Layer: A Common API for Web Storage

Thursday, May 28, 2009

As discussed in our Google Code Blog post on HTML5 for Gmail Mobile, Google's new version of Gmail for iPhone and Android-powered devices uses the Web Storage Portability Layer (WSPL) to let the same database code run on browsers that provide either Gears or HTML5 structured storage facilities. The WSPL consists of a collection of classes that provide asynchronous transactional access to both Gears and HTML5 databases and can be found on Project Hosting on Google Code.

There are five basic classes:

google.wspl.Statement - A parametrizable SQL statement class

google.wspl.Transaction - Used to execute one or more Statements with ACID properties

google.wspl.ResultSet - Arrays of JavaScript hash objects, where the hash key is the table column name

google.wspl.Database - A connection to the backing database, also provides transaction support

google.wspl.DatabaseFactory - Creates the appropriate HTML5 or Gears database implementation


Also included in the distribution is a simple note-taking application with a persistent database cache built using the WSPL library. This application (along with Gmail mobile for iPhone and Android-powered devices) is an example of the cache pattern for building offline web applications. In the cache pattern, we insert a browser-local cache into the web application to break the synchronous link between user actions in the browser and server-generated responses. Instead, we have two data flows. In the first, entirely local to the device, contents flow from the cache to the UI while changes made by the user update the cache. In the second, the cache asynchronously forwards user changes to the web server and receives updates in response.

By using this architectural pattern, a web application can be made tolerant of a flaky (or even absent) network connection!
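The two data flows of the cache pattern can be sketched in a few lines of Python. This is a language-neutral illustration only: it does not use the WSPL API, and the `NoteCache` class, the `send` callback, and the note identifiers are all hypothetical. It also ignores conflict resolution, simply letting server updates overwrite local entries.

```python
from typing import Callable, Dict, List, Optional, Tuple

Change = Tuple[str, str]  # (note_id, text)


class NoteCache:
    """Local cache sitting between the UI and the server."""

    def __init__(self) -> None:
        self.notes: Dict[str, str] = {}
        self.pending: List[Change] = []  # changes not yet sent to the server

    # Flow 1: entirely local. The UI reads from and writes to the cache.
    def read(self, note_id: str) -> Optional[str]:
        return self.notes.get(note_id)

    def write(self, note_id: str, text: str) -> None:
        self.notes[note_id] = text
        self.pending.append((note_id, text))

    # Flow 2: the cache forwards changes and takes updates when it can.
    def sync(self, send: Callable[[List[Change]], Dict[str, str]]) -> bool:
        batch = list(self.pending)
        try:
            updates = send(batch)
        except ConnectionError:
            return False  # offline: keep the changes queued for later
        del self.pending[:len(batch)]  # drop only what was actually sent
        self.notes.update(updates)
        return True


# Hypothetical stand-ins for the network layer.
def offline(_batch: List[Change]) -> Dict[str, str]:
    raise ConnectionError("no network")


server_store: Dict[str, str] = {}


def online(batch: List[Change]) -> Dict[str, str]:
    server_store.update(dict(batch))
    return {"shared": "edited elsewhere"}


cache = NoteCache()
cache.write("n1", "buy milk")
first_attempt = cache.sync(offline)   # fails, but the UI is unaffected
still_readable = cache.read("n1")     # served from the local cache
second_attempt = cache.sync(online)   # succeeds once the network is back
```

The user-facing calls (`read` and `write`) never wait on the network, which is the property the pattern is after; only `sync` depends on connectivity, and a failed attempt leaves the queued changes in place.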

We'll be available at the Developer Sandbox at Google I/O to discuss the cache pattern, HTML5 development and the WSPL library. Check it out! If you have questions or comments, please visit our discussion list.

Support for Mercurial Now Available for All Projects Hosted on Google Code

You may recall that we recently asked for help from some early testers of Mercurial on Project Hosting on Google Code. As of today, all of our Project Hosting users can make use of this added functionality. For full details, check out the Google Code Blog. Better still, if you happen to be joining us at Google I/O, stop by the Mercurial on Big Table Tech Talk to learn more.