What did we learn?
Processing large amounts of web data
- No such thing as too much RAM
- Disks fill up too quickly
- 1 µsec / word = 1 Alpha day
>>> Moore’s Law might save us
Web is a distributed database we should exploit
- Need a lot of tools to process HTML, XML, CSS, JPGs, GIFs, URLs, ...
- Web computation = Unreliable computation
- Structured text (e.g. HTML, XML) is a very “different” class of data type
>>> Need some reusable higher-level abstractions and tools
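One example of such a reusable abstraction, for the "unreliable computation" point: a retry wrapper with exponential backoff. This is only a minimal sketch. The name `fetch_with_retry` and its parameters are hypothetical and not from the original work, and `fetch` stands for any caller-supplied function that downloads a URL and may raise an exception.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on any exception.

    `fetch` is a caller-supplied function (hypothetical here); `sleep` is
    injectable so the wrapper can be exercised without real waiting.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as exc:  # timeouts, dropped connections, bad responses
            last_error = exc
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, 4x, ... before the next try.
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error
```

Wrapping every remote access this way keeps the failure handling in one place instead of scattering it through each processing tool.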