Web Automation Challenges
The WWW is fundamentally unreliable
- Internet failures
- Site failures
- Unpredictable bandwidth and latency
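One common way to cope with transient network and site failures is to retry with exponential backoff. The following is a minimal sketch, not a prescribed method: the name call_with_retry and its parameters (attempts, base_delay, sleep) are hypothetical, and fetch stands for any zero-argument callable that performs the request.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fetch: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on a network-style error wait and try again.

    The wait doubles after each failure: base_delay, 2*base_delay, ...
    OSError covers ConnectionError, timeouts and urllib.error.URLError.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Passing sleep as a parameter is only a convenience for testing; in normal use the default time.sleep applies. Retrying helps with short outages only, so a persistent site failure still surfaces as an exception.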
Pages are often untyped
- Little or no (machine-understandable) semantics
- Difficult to reliably extract information
- Pages keep changing as sites evolve
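To illustrate why extraction from untyped pages is fragile, here is a small sketch that tries several layout-specific patterns in turn and reports failure explicitly. Everything in it is an assumption for illustration: the page layouts, the PRICE_PATTERNS list and the extract_price function are hypothetical and do not describe any real site.

```python
import re
from typing import Optional

# Hypothetical patterns for a price on an imaginary product page.
# Each one matches a different page layout; the newest layout comes first.
PRICE_PATTERNS = [
    re.compile(r'<span class="price">\s*\$([0-9]+\.[0-9]{2})\s*</span>'),
    re.compile(r'Price:\s*\$([0-9]+\.[0-9]{2})'),
]


def extract_price(html: str) -> Optional[float]:
    """Return the first price found, or None if no known layout matches."""
    for pattern in PRICE_PATTERNS:
        match = pattern.search(html)
        if match:
            return float(match.group(1))
    # None tells the caller the page has probably changed again.
    return None
```

Returning None rather than guessing lets the automation detect a layout change instead of silently producing wrong data. Each new layout still needs a new hand-written pattern, which is the maintenance cost the points above describe.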
Resource discovery problem
- Where are resources located?
- What are the data formats?
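For the data format question, a client often has little to go on beyond the Content-Type header and the first bytes of the response. The sketch below shows one possible heuristic under those assumptions; guess_format is a hypothetical helper, and the set of formats it recognizes is deliberately small.

```python
import json
from typing import Optional


def guess_format(content_type: Optional[str], body: bytes) -> str:
    """Guess 'json', 'xml', 'html' or 'unknown' for a fetched resource."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return "json"
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    if media_type == "text/html":
        return "html"

    # Header missing or unhelpful: fall back to sniffing the content.
    head = body.lstrip()[:64].lower()
    if head.startswith(b"<?xml"):
        return "xml"
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    try:
        json.loads(body.decode("utf-8"))
        return "json"
    except (UnicodeDecodeError, ValueError):
        return "unknown"
```

A result of "unknown" is a frequent outcome in practice, and knowing the format still says nothing about what the data means.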