BLOCK VII - Arab Open University

Computing and Networks

CONTENTS

Chapter 47 The World Network
Please download information about this TV programme from the Web site

Chapter 48 Operating Systems, Computer Architecture and Databases

Chapter 49 Persistent Objects, Streams and Files

Chapter 50 The Internet, Security and Network Computing

Chapter 51 Hack the Planet
Please download information about this TV programme from the Web site

Chapter 52 Software Technology
Please download this chapter from the Web site

Chapter 53 Code and Catastrophe
Please download information about this TV programme from the Web site

COMPUTING: AN OBJECT-ORIENTED APPROACH

BLOCK VII

Licensed for use by the Arab Open University

This edition produced 2003 for use by the Arab Open University

First published 1998, second edition 2000, third edition 2001, fourth edition 2002, fifth edition 2003

Copyright © 2003 The Open University

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, transmitted or utilized in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise.

Edited, designed and typeset by the Open University.

SUP 73299 8

1.1

CHAPTER 48

Operating Systems, Computer Architecture and Databases

Contents

Introduction 4

1 System software 4
1.1 Operating systems 5
1.2 Utilities 9

2 Computer architecture 9

3 Large computer systems 12
3.1 An airline’s requirements 12
3.2 Mainframe computers and their operating systems 14

4 Dealing with large amounts of data 15
4.1 Databases 15
4.2 Querying a database 17

Review 19

Solutions to Exercises 21

Solutions to Questions 23

Glossary 25

Index 27


CHAPTER 48 PAGE 3

Concepts revisited

Although this chapter does not rely heavily on other chapters of the course, you should be able to recall the following:

> what is meant by the term ‘software’ (Chapter 1);

> what a system is (Chapters 1 and 27);

> what applications are (Chapter 1);

> how a programming language relates to executable machine code (Chapter 17).

If you need a quick reminder about software, systems and applications, look at the printed text of Chapter 1 (pages 15–22). You may also find it helpful to glance back at Section A of Chapter 2 in the Course Book, particularly at Figure 2-7.

For a discussion of how a programming language relates to executable machine code, see the printed text of Chapter 17 (pages 5–9).

New concepts

The following key concepts are explored in this chapter.

> A computer system is, in effect, composed of several ‘layers’ of software built on top of hardware.

> The operating system can be envisaged as a software layer located directly next to the hardware. It serves as a mediating layer between the hardware and the application software.

> The architecture of a computer affects how fast the computer can operate and what operating systems it can run, as well as what devices can be attached to it.

> Large mainframe computer systems are required by organisations that need to store and process huge amounts of information quickly.

> Large amounts of information must be organised efficiently as computer data to enable the information to be processed quickly. A database is one way of doing this.

The first three of these concepts have been present implicitly in much of what you have already covered. The last two ideas have only been touched on peripherally, for instance in some of the TV programmes.

Planning your study

We expect this chapter to take about 4–5 hours’ study time, including time spent reading the Course Book. There are no formal computer-based activities associated with the chapter, but in order to identify the components of the computer system that have been discussed, you may want to find out more about your own computer and its operating system, perhaps by looking at the computer manual that came with it.

The parts of Section A in the Course Book that are about copyright and licensing are not required for the course, but as an informed consumer you should be familiar with them.

We advise you not to take the cover off your computer.


Introduction

So far in the course you have considered computing mainly in terms of software, with the emphasis being on an object-oriented approach. In order to have a rounded picture of computing in general, you need a greater appreciation of the hardware components that make up a computer and the ways in which these components relate to the system software – the software that makes it possible to run applications. Moreover, you need to gain a sense of the size and complexity of large computer systems and the tasks that they are expected to perform.

The aims of this chapter are, therefore, twofold:

> to extend your knowledge of the computer ‘inwards’ beyond the applications towards the system software and the hardware;

> to look ‘outwards’ towards the wider world of large computer systems and their uses.

The chapter has several themes and is largely structured round a number of passages which you will read in the Course Book. After several of the readings, we shall go on to expand or reflect on the material that you have just read, so you should regard this chapter and the related Course Book material as an integrated whole.

Note that in this chapter we shall take the software side of computing for granted and we shall not bother about whether the software is object-oriented.

1 System software

Until now you have been concerned mostly with software applications and systems that are relevant to ordinary users, assisting them with a great range of tasks, be it in business, banking, education or the domestic environment. In this chapter, by contrast, we are going to focus, among other things, on the fundamental software that supports the applications, namely the system software.

In the Course Book, system software is defined as ‘software that helps the computer carry out its basic operating tasks’. The important part of this definition is the phrase ‘basic operating tasks’. It implies that system software is at a very low level: in the present context, this means ‘very close to the machine’, in that it is intimately linked to the precise components and structure of the computer. So, system software is not concerned with abstractions like bank accounts or aircraft flight plans, but with locations in memory, registers in chips, and so on.

Although you read Chapter 2 (Software and Multimedia) of the Course Book when you studied Chapter 1 of the M206 printed text, you should now reread Section B of Chapter 2 of the book so as to obtain an overview of the various categories of system software.

Review Question 1 Name at least three different operating systems for (a) personal computers and (b) mainframe computers.

Review Question 2 What are the five basic functions of an operating system? Which of these functions does your personal computer perform?

Course Book

When reading the section Computer Programming Languages, recall the discussion of languages and compilers in the printed text for Chapter 17 of M206.



Review Question 3 If you installed a new printer for your computer, how would the operating system of the computer ‘recognise’ the new device and how would it enable the device to be used?

(Solutions to the questions begin on page 23.)

Because of the importance of operating systems and utilities in the context of the present discussion, we consider them further below. In particular, we expand on operating systems at some length as they play such a key role in computing; a number of the ideas may seem familiar as they have been touched on elsewhere in M206, sometimes in different guises.

1.1 Operating systems

As we indicated in Chapter 1, the jargon of computing can be vague and obscure. The different nuances between ‘file’ and ‘document’, or between ‘program’ and ‘system’ often seem little more than annoying obfuscations. However, when one is trying to be exact about ideas, nuances matter. And this is the case in describing the characteristics of system software, such as those of operating systems.

Let us start with a statement which we shall pick apart and, in the process, we shall have to define carefully terms that are frequently treated as synonyms. The aims of this exercise are to make you comfortable with the subtleties of the terminology and to give you a greater appreciation of some of the factors that have a bearing on operating systems. Take the statement:

An important thing to note about operating systems is that each one is designed to work only on certain specific kinds of machine configuration. For example, the operating systems Mac OS and UNIX can run on Motorola 680X0 chips and on Motorola PowerPC chips, whereas Microsoft’s Windows 95 does not run on either of these types of chip.

But what do we mean by a kind of machine configuration? Indeed, what do we mean by a machine configuration? Broadly, a machine configuration is the specification of a computer in terms of various key hardware attributes, including the make and model of the processor (or processors) or other chips in the computer, the capacity and speed of the memory and disks, and the type and speed of the bus that connects these components. Using the phrase ‘a kind of configuration’ allows us to generalise, so we can talk about, for instance, small 68000 systems – meaning configurations that are based on one of the family of Motorola 680X0 chips and, as indicated by the word ‘small’, have limited maximum memory (say, 512kB or 1MB).

We can move to a higher level of abstraction than that represented by a configuration, if we think in terms of the architecture of a computer. By ‘architecture’ we mean a design based on a family of computers that have compatible memory, disk and bus designs. Computer architectures are typically labelled, albeit imprecisely, by the processor on which they are based or by the operating system they support. For example, one might refer to architectures for Microsoft’s networking operating system Windows NT, because these architectures do not depend on a single family of processors, but are applicable to several sorts of processor made by different manufacturers. Thus, the Windows NT operating system runs on computers with either Intel Pentium processors or DEC Alpha processors; such computer systems have different architectures, but both are compatible with Windows NT. (Computer architecture is discussed in more detail in Section 2 of this chapter.)

The details of chip manufacturers and chips given here are not important.

A bus is the electronic circuitry that carries data between the hardware components in a computer system. You will read about it in Section 2.

Thanks to advances in designing software, even the rather general concept of computer architecture is too detailed for most practical purposes. Currently, the most abstract view we have of computer systems is in terms of ‘platforms’. A platform is a set of architectures based round a family of processors, such that software designed for a specific platform can run more or less unchanged on any computer system of that platform. Although this platform-based view is primarily useful when considering microcomputers, it can be used more widely.

One of the main platforms is the ‘PC’ platform (also known as the IBM PC platform). It covers all the computers based on Intel 8088, 8086, 80286, 80386 and 80486 processors, as well as those based on various Pentium processors. Until the early 1990s the favoured operating system for the PC platform was MS-DOS; later, MS-DOS was used with Windows 3.1 as the graphical user interface; now, the operating system is commonly Windows 95, Windows 98 or Windows NT. (The combination of Intel hardware and a Windows operating system has given rise to the term ‘a Wintel platform’.)

Our statement about operating systems should therefore be recast as follows:

An important thing to note about operating systems is that each one is designed to work only on certain specific platforms. For example, the operating systems Mac OS and UNIX can run on Motorola 680X0 platforms and on Motorola PowerPC platforms, whereas Microsoft’s Windows 95 does not run on either of these types of platform.

Generally, operating systems (and system software overall) are designed for one particular platform (or, indeed, architecture or computer) and, unlike Mac OS and UNIX, cannot be moved to another platform. This, of course, raises the question of portability. So, we shall digress at this point to consider some of the fundamental issues underlying the portability of software.

You may recall from Chapter 17 how software executes not as a high-level language source code but as a low-level machine code compiled for a specific computer architecture or platform; the specificity of the compilation is necessary because the different processors that are at the heart of the various architectures or platforms each use different instructions to accomplish the same tasks. In order to execute quickly, much system software (including operating systems) is written in either a very low-level language, or a high-level language which uses facilities that are low level. Therefore, much system software cannot be moved by simply recompiling the high-level language source code as described in Chapter 17 (see, especially, Figure 2 there), because the very things that the system software deals with (locations in memory, registers in chips, and so on) are particular to certain computers, architectures or platforms.

The basic ‘problem’ is that hardware is immutable: once printed circuit boards and chips have been manufactured and assembled together, you cannot change their range of behaviour. However, you can change software, which is malleable, although not trivially so. The solution lies, in part, in an important feature of computer systems, namely layering: a computer system can be regarded as being composed of layers of software and hardware. So, a layer of system software may be interposed between hardware components, between software and hardware components, and even between software components, in order to smooth out differences between the components by acting as a mediating layer. For example, a printer device driver provides a common protocol – a mediating layer – for applications (such as Notepad in Windows 95) that need to print.
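The idea of a mediating layer can be sketched in a few lines of Python. This is only an illustration, not how any real printer driver is written: the class names, the two printer types and the output formats are all hypothetical. What it shows is that the application calls one common protocol (`print_text`) and never deals with the differences between the devices.

```python
from abc import ABC, abstractmethod


class PrinterDriver(ABC):
    """The mediating layer: one common protocol for every printer."""

    @abstractmethod
    def print_text(self, text: str) -> str:
        """Return the device-specific data that would be sent to the printer."""


class DotMatrixDriver(PrinterDriver):
    # Hypothetical device that expects plain upper-case text.
    def print_text(self, text: str) -> str:
        return text.upper()


class LaserDriver(PrinterDriver):
    # Hypothetical device that expects text wrapped in page commands.
    def print_text(self, text: str) -> str:
        return "<page>" + text + "</page>"


def application_print(driver: PrinterDriver, document: str) -> str:
    """Application code: it knows the protocol, not the hardware."""
    return driver.print_text(document)


print(application_print(DotMatrixDriver(), "Hello"))
print(application_print(LaserDriver(), "Hello"))
```

Replacing one printer with another means supplying a different driver; the application function itself does not change.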

DEC stands for the Digital Equipment Corporation.

The Intel 8088 was used in the original IBM PC, hence the term ‘an IBM-compatible PC’.

In fact, the original IBM PCs ran a variant of MS-DOS called PC-DOS. The Course Book uses the term ‘DOS’ to mean either.

Typical processor instructions are discussed in Chapter 5, Section C of the Course Book, which you will read later in this chapter.

For historical reasons, a very low-level language is often called an assembly language or an assembler.

Recall the discussion in Chapter 17 about how software can be made portable by designing it in layers, sometimes including a virtual machine.



In general-purpose computers, such as personal computers, the operating system forms a mediating layer between the application software and the hardware, as shown in Figure 1. Because an operating system is usually designed to work with a particular hardware platform, the application writer does not have to worry about the characteristics of individual hardware components, which are many, but only about those of the operating system, which are relatively few. So, although you may have thought that only two layers were involved in executing software – the application software layer and the hardware layer – there are typically three layers: application software, the operating system, and the hardware.

application software

operating system

hardware

Figure 1 Layers in a computer system

Usually an operating system is so bound up with its platform that we consider the two to be a single layer. This means that for any compiled software to run on two different platforms, not only must the architectures be the same, but in most cases the operating systems must also be the same.

However, although an operating system working on different platforms may have the same name and may appear to be the same to both the user and the programmer, and although its utilities will have the same names and will work in the same way, you should not necessarily expect application software that works on one platform to work on another, because the operating system and/or hardware may not provide all the facilities needed by the application. For example, although the UNIX operating system runs on Sun Sparc platforms and also on DEC Alpha platforms, a particular application may not run on both.

There is also the matter of new versions of an operating system. In some instances a new version of an operating system may be able to run an application designed for an older version; this is known as downward compatibility. For example, Mac OS system 8 is downward compatible with Mac OS system 7.6. In other cases a new operating system may be quite incompatible with its predecessor. For example, although MS-DOS, Windows 3.1 (remember that Windows 3.1 is a graphical user interface for MS-DOS) and Windows NT can all run on the same PC (Wintel) platform, programs that run ‘under’ one of these operating systems will not necessarily run under the others because the operating systems deal with the same hardware in quite different ways.

In embedded systems, that is, computer systems dedicated to controlling non-computing hardware like car engines, washing machines or weapons, software is often permanently recorded in the hardware and works without an operating system or other system software.

When a program works with operating system X, we say it runs under X.

Designing systems in Smalltalk or in some other object-oriented languages usually means that they are multiplatform. This is often not the case for other languages.

Many applications are now designed to be multiplatform, that is, useable on a wide variety of computer systems. This causes us to expect more and more software to be portable from one computer system to another. The ideal situation for software companies that have written multiplatform software is to have a single set of source code for each product and to compile different versions for each platform (see Chapter 17). Unfortunately, companies are often committed to a single platform and may find it too expensive to redesign the software for other platforms. But other system software, which is introduced as a separate software layer, can come to the rescue of users who need to run applications that are only available for other platforms. For example, Windows 95 provides a type of software, called an emulator, that allows users to run applications designed for MS-DOS and Windows 3.1. An emulator is, in effect, a layer of software interposed between an application that is designed for a certain operating system, and the different (hence incompatible) operating system of the computer being used. The emulator typically translates commands made by the application to its anticipated operating system into commands to the actual operating system being used; thus an emulator is a translation utility (utilities were discussed in Chapter 2 of the Course Book). Similar software is available for Macintoshes, allowing users to run MS-DOS, Windows 3.1 or Windows 95 software.
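The translating role of an emulator can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the ‘guest’ call names (`DISPLAY`, `DELETE`) and the ‘host’ functions are invented, and a real emulator has to translate far more than two commands. The point is only that each command the application issues to the operating system it expects is looked up and passed on as a command to the operating system actually present.

```python
# Hypothetical system calls offered by the host operating system.
def host_write_text(text):
    return "HOST: " + text


def host_remove_file(name):
    return "HOST: removed " + name


# Translation table: the call name expected by the guest application
# mapped to the equivalent call on the host operating system.
TRANSLATION = {
    "DISPLAY": host_write_text,
    "DELETE": host_remove_file,
}


def emulate(guest_call, argument):
    """Translate one guest system call into a host system call."""
    if guest_call not in TRANSLATION:
        raise ValueError("unsupported system call: " + guest_call)
    return TRANSLATION[guest_call](argument)


print(emulate("DISPLAY", "hello"))
print(emulate("DELETE", "old.txt"))
```

A guest call with no entry in the table is rejected, which mirrors the way an application can fail under an emulator that does not provide every facility the application needs.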

Exercise 1 Files which are readable by one operating system are seldom directly readable by another unless the disk format is the same. How might software be used to allow, say, a Macintosh to read a ‘foreign’ disk, such as a PC disk?

(Solutions to the exercises begin on page 21.)

Another important consideration about operating systems is whether or not they offer true multi-user and multitasking facilities. Most personal computers are not multi-user systems that allow several people to work at the same computer at the same time. Windows 95 and Windows 98 can be configured to meet the individual requirements of a number of different users, but these users cannot work simultaneously. With regard to multitasking, few of the older operating systems for personal computers provide true multitasking facilities that allow programs to run in parallel. More commonly, the operating systems allow more than one program to start to run, but at any instant only one program is actually being executed. That program in effect takes over the processor, so the processor then devotes its processing effort to just one of the programs that have started. Typically, a window of the currently running program will be to the fore. If the user clicks on an element of another program’s user interface, the operating system will suspend the execution of the current program and give the processor over to the selected program, which had previously been suspended. For example, in Windows 3.1, if you start a word processor and then start LearningWorks, the word processor will be suspended and LearningWorks will become the currently running program.

By contrast, UNIX and Windows NT are true multitasking systems: several pieces of software – tasks – may be executing at the same time. This means that a single processor has its processing effort shared out among the many programs (or processes or tasks), so the processor is rarely idle and the programs complete as quickly as possible. It is the responsibility of a multitasking operating system to ensure that this arrangement works effectively, and also that the computer’s memory is shared out among the executing tasks. As you will see in Section 2, some architectures support multiple processors in the same machine, so the operating system can share the work among more than one processor, resulting in much faster performance.
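One simple way in which a single processor can be shared among several tasks is to give each task a short turn (a ‘time slice’) in rotation. The Python sketch below simulates this under stated assumptions: the task names, the amounts of work and the round-robin policy itself are chosen for illustration, and we make no claim that UNIX or Windows NT schedules tasks in exactly this way.

```python
from collections import deque


def round_robin(tasks, time_slice):
    """Share one processor among tasks; return the order of completion.

    tasks maps a task name to the number of time units it still needs.
    """
    queue = deque(tasks.items())
    finished = []
    while queue:
        name, remaining = queue.popleft()
        remaining -= time_slice              # the task runs for one slice
        if remaining > 0:
            queue.append((name, remaining))  # suspended; rejoins the queue
        else:
            finished.append(name)            # the task has completed
    return finished


order = round_robin(
    {"word processor": 3, "browser": 1, "compiler": 2}, time_slice=1
)
print(order)
```

Short tasks finish early even though they started after a long one, because no task is allowed to keep the processor until it has completed.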

Review Question 4 Users employ applications for various purposes. In what way is an operating system involved in the running of an application?

Commands by application software to an operating system are called system calls.

The answer to Review Question 2 refers to some multi-user issues. You may want to glance back at it.

You will learn more about processors in Section 2.

In this context the terms ‘program’, ‘task’ and ‘process’ are synonymous. Such pieces of software may be part of a large software system.



1.2 Utilities

The final comments on system software are to do with utilities.

> Utilities are often the means whereby users interact with an operating system. So much so, that certain utilities are assumed to be part of the operating system, as the Course Book indicates (hence the term ‘operating system utility’). For example, a utility that allows you to view files and folders (directories) might automatically be started by the operating system. However, such a utility could be replaced by another of the same usefulness but with a different user interface. You will see this happen more and more as operating system developers allow users to replace desktop metaphors with more Web-oriented ones.

> The Course Book points out that some utilities may be purchased separately to extend the functionality of the operating system. Often, these utilities are incorporated into later releases of the operating system. For instance, the facilities provided by an older utility program that makes PC disks readable by Macintoshes have been incorporated into later versions of Mac OS.

> A minor point to note is that it is easy to tell if a program is a utility or an application by considering what kind of data it acts on. If it only acts on the disks, files or printer (or on some other hardware device) without providing further output, then it is probably a utility, but if it processes data not related to the computer hardware itself, then it is probably an application. To a large extent, this is merely a matter of the acceptable use of jargon, and the distinction is not usually significant.

Exercise 2 Name at least one operating system utility on your computer other than the one that formats disks.

2 Computer architecture

As the main aim of this chapter is to broaden your knowledge of computing, we now turn our attention from software to hardware. In a radical departure from the rest of the course we examine the computer from the inside and see how the various pieces in it interact to implement the instructions given by the software. Let us illustrate this switch in emphasis by an example. Take the fragment of Smalltalk code, total := total + 1; so far, when you have evaluated such an expression in a LearningBook, you have only been concerned that total referred to a value that was one greater than the value it had before, and you have not been interested in how the computer added the two numbers together. Now we shall look at how the components of a computer’s architecture work together to achieve this kind of result (though we shall not specifically examine how Smalltalk primitives work).
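To give a flavour of what happens beneath an expression such as total := total + 1, the following Python sketch simulates a very small imaginary processor. The three instructions (`LOAD`, `ADD`, `STORE`) and the single register are assumptions made for illustration; they are not the instruction set of any real chip, nor the one shown in the Course Book.

```python
def run(program, memory):
    """Execute a list of (opcode, operand) instructions on a tiny machine."""
    accumulator = 0  # the single register of this imaginary processor
    for opcode, operand in program:
        if opcode == "LOAD":       # copy a value from memory into the register
            accumulator = memory[operand]
        elif opcode == "ADD":      # add a constant to the register
            accumulator += operand
        elif opcode == "STORE":    # copy the register back into memory
            memory[operand] = accumulator
        else:
            raise ValueError("unknown opcode: " + opcode)
    return memory


memory = {"total": 41}
program = [("LOAD", "total"), ("ADD", 1), ("STORE", "total")]
print(run(program, memory)["total"])  # 42
```

One line of high-level code has become three low-level steps: fetch the value from memory, do the arithmetic inside the processor, and write the result back.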

For a comprehensive overview of computer architecture, read Chapter 5 (Computer Architecture) of the Course Book, omitting the projects and Lab activities. Note that the details of the central processing unit instructions, and of instruction processing, given in the figures labelled ‘A simple microprocessor instruction set’ (Figure 5-19) and ‘Processing instructions’ (Figure 5-21) are not examinable.

Course Book



Review Question 5 Give at least one example (other than a computer) of (a) an analogue device and (b) a digital device.

Review Question 6 What does ROM stand for? What is its function in a computer?

Chapter 5 of the Course Book has explained the terms and concepts involved in computer architecture. We make a number of additional comments below on various aspects of the subject, using some of the headings from the Course Book.

Memory

> Physical memory, or RAM, has a very fast access time (about 60 nanoseconds and getting faster) compared with a hard disk (the time taken to retrieve data from a hard disk is about 10 milliseconds, over a hundred thousand times slower). So, in the execution of software, the more RAM a computer has, generally the faster it can process data and, hence, run applications.

> Virtual memory is one of a variety of means developed to overcome the problems that arise when a computer’s configuration is at variance with the requirements of the software. However, not only do applications using virtual memory operate more slowly than those using physical memory, as the Course Book indicates, but the use of virtual memory increases the wear and tear on the hard disk. (The hard disk is a mechanical device with rotating parts and is therefore prone to mechanical failure.)

> Software is usually designed and written to optimise the use of RAM and hard disk space. You have seen how virtual memory can be used to run an application that requires more RAM than is available on a particular machine.

> In the converse situation, when a computer has more RAM than required by an application, the application can run more quickly by using a so-called virtual disk, whereby some of the excess RAM is allocated to be treated as hard disk space by the operating system. By setting aside this excess RAM, very fast apparent hard disk performance is achieved.
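The figures quoted above can be checked with a short calculation. The sketch below also includes a deliberately simplified model of virtual memory, in which some fraction of memory accesses has to go to the hard disk; the model and the example fraction are our assumptions, not measurements of a real machine.

```python
RAM_ACCESS_SECONDS = 60e-9    # about 60 nanoseconds, as quoted in the text
DISK_ACCESS_SECONDS = 10e-3   # about 10 milliseconds, as quoted in the text

# How many times slower is the hard disk than RAM?
ratio = DISK_ACCESS_SECONDS / RAM_ACCESS_SECONDS
print(round(ratio))


def effective_access_time(disk_fraction):
    """Average access time when a fraction of accesses goes to disk.

    Simplified model: every access costs either one RAM access or
    one disk access, and nothing else is taken into account.
    """
    return ((1 - disk_fraction) * RAM_ACCESS_SECONDS
            + disk_fraction * DISK_ACCESS_SECONDS)


# Hypothetical case: one access in a thousand has to go to the disk.
slowdown = effective_access_time(0.001) / RAM_ACCESS_SECONDS
print(round(slowdown))
```

The ratio comes out at well over a hundred thousand, in line with the text, and in this model even a very small proportion of disk accesses makes the average access many times slower than RAM alone.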

Review Question 7 If a program required 10MB of memory but your computer had only 8MB of RAM, would you be able to run the program? If so, how?

Central processing unit (CPU)

> It should be emphasised that the central processing unit (which may be constructed from one or more microprocessors) is the heart of the computer, and the way it works dictates the rest of the computer’s architecture. Various features of the CPU (some of which you have read about in the Course Book) characterise the individual architecture and indicate what type of machine the computer is and which operating systems it is capable of running. The features concerned are: the processor’s manufacturer (for example, Intel or Motorola); the word size (8, 16, 32 or 64-bit); the instruction set (CISC or RISC); the method of processing (serial or parallel); the clock rate; and the cache.

The term ‘physical memory’ or ‘real memory’ is sometimes used for RAM, to contrast with ‘virtual memory’.



> The features listed above largely determine CPU performance and, hence, the speed at which software runs: for instance, how fast LearningWorks or a Web browser starts up. However, other elements of the architecture also affect the overall performance of a computer: factors such as the size of the RAM and the capacity of the bus (known as the bus width) can have a big effect on the speed of execution. The operating system also has a considerable influence. Consequently, the use of (one or more) high-performance processors alone does not guarantee good overall performance of a computer system.

> As the Course Book indicates, some computers have multiple processors in the CPU. Such computers use programming languages that are designed to work in parallel, so that different parts of the same program can run at the same time, possibly on different processors. The fastest computers in the world, usually called supercomputers, work in this way and use special programming languages or special compilers that are designed to take advantage of such ‘parallel architectures’.

Review Question 8 What are the two parts of a CPU instruction called?

Exercise 3 Find out if your computer is a CISC or RISC machine. Hint: You could look at the documentation that came with your computer or ask your supplier.

Input/output (I/O)

> The Course Book points out that expansion slots allow you to add extra peripheral devices to your computer. It is worth noting that these devices may include some that had not been invented or even imagined when your computer was manufactured.

Review Question 9 What is the relationship between an expansion bus and a data bus?

Review Question 10 Name at least three different functions that can be provided by expansion cards.

The boot process

> The Course Book has taken you through the boot process step by step, but do not lose sight of the basic fact that when the computer is switched on, the boot process starts up immediately and prepares the computer for use.

> The term ‘boot’ comes from the word ‘bootstrapping’, which refers to the old expression ‘to pick oneself up by one’s bootstraps’, meaning to get up off the ground by oneself. That is exactly what the computer does (metaphorically speaking) when you apply power to it: it activates the instructions stored in ROM, which perform some self-tests on the computer before loading the operating system from the hard disk; once loaded into RAM, the operating system takes over.

The boot process is sometimes referred to as ‘booting’ or ‘bootstrapping’.



3 Large computer systems

We now move from looking inside the relatively small personal computer to the industrial, scientific and military worlds and we consider how large machines are used to help large organisations to work efficiently. The personal computer is simply not big enough or powerful enough to meet the requirements of huge organisations. Yet, although the machines used in such organisations are much bigger than personal computers, the basic functions of the systems are similar; there are, however, some differences, as we shall see.

3.1 An airline’s requirements

We start by exploring the data storage needs of a large company, exemplified by an airline, and from there we go on to discuss the kind of computer system required by companies such as airlines, banks, supermarket chains and energy supply companies (like gas and electricity suppliers). The systems used all resemble one another and are similar to those used by the government (for example, for tax and benefits) and by organisations like the National Health Service.

When you read Chapter 4, Section A of the Course Book, you see an example of the data used to indicate a flight on an airline (American Airlines, AA) from Chicago (ORD) to Cedar Rapids, Iowa (CID), with the flight number 4199, leaving Chicago at 9:59 and arriving in Cedar Rapids at 11:09. The same flight, with the same flight number, goes one or more times a week. In fact, an airline may run several flights a day between any two cities: in this example, let us assume that American Airlines has four flights per day originating in Chicago and going to Cedar Rapids. The information needed about a particular flight is kept in the form of a computer record in a database; here a record means simply a number of pieces of related information aggregated into a whole. (Later in this chapter we shall consider databases in some detail.) Assume that each record of a flight consists of 20 characters. If ASCII characters are used and if each character requires one byte, the records of the four flights will need 4 × 20 bytes of storage space. Not much, you may think. But there are probably also four flights per day originating in Cedar Rapids and going to Chicago, so that means 2 × 4 × 20 bytes of storage space are needed. Moreover, American Airlines has flights between, for instance, Chicago and San Francisco, Chicago and New York, Chicago and Denver, and Chicago and Miami, as well as international flights between Chicago and London, Chicago and Toronto, Chicago and Mexico City, … and many, many more (it is one of the world’s biggest airlines). And, of course, there are return flights from San Francisco to Chicago, New York to Chicago, London to Chicago, … .
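The arithmetic for a single route can be checked with a short calculation. The following Python sketch is purely illustrative: the names are our own, and the figures (20 one-byte characters per record, four flights per day in each direction) are the assumptions made in the text.

```python
# Assumptions taken from the text: each flight record holds 20 ASCII
# characters at one byte each, and there are four flights per day in
# each direction between the two cities.
BYTES_PER_RECORD = 20
FLIGHTS_PER_DAY_EACH_WAY = 4


def route_storage_bytes(directions: int = 2) -> int:
    """Bytes needed for one day's flight records on a single route."""
    return directions * FLIGHTS_PER_DAY_EACH_WAY * BYTES_PER_RECORD


one_way = route_storage_bytes(directions=1)    # Chicago to Cedar Rapids only
both_ways = route_storage_bytes(directions=2)  # including the return flights
print(one_way, both_ways)
```

With these assumptions, one direction needs 4 × 20 = 80 bytes and both directions need 2 × 4 × 20 = 160 bytes. The total grows in proportion to the number of routes the airline flies.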

Already you can see that the information storage needs of the airline are getting large. But, we have not finished yet. Each flight is on a particular configuration of aircraft, with given numbers of seats in first class, business class and economy class. Each seat, on each flight, on each day, needs a record, because at some point (the airline hopes) that seat, on that day, on that flight, will be sold to someone. Once it is sold, the airline needs to keep track of the purchaser: name, address, telephone number(s), method of payment, flight number, date, seat assignment, number of people travelling, and so on. By now, we are talking about several million bytes of storage.



A large business like an airline has another ‘problem’. It has to provide online, near-instant access to all of these records. At any time of the day or night, an intending passenger could telephone and:

> change the date of travel, the flight or the seat;

> ask for a special bargain fare;

> request to be upgraded to a different class;

> book a non-standard inflight meal;

> arrange to stop off at an intermediate destination or go on a connecting flight to another destination;

> cancel altogether, or add more people to his/her party.

If you have dealt directly with an airline, either on checking in or over the telephone, you may have noticed that the process of bringing up the necessary information on the computer terminal, checking it and changing it as necessary, all happens within the space of a conversation. You are, at that time, one of perhaps 10,000 or more such ‘requests’ for data from an enormous database, all waiting to be processed. Airlines do not like to waste time, so they hope to provide information within, at most, 2 seconds for all 10,000 requests!

When a passenger checks in, the clerk uses information from the database to issue a boarding pass, to print baggage tags (usually bar-coded), and to confirm any special catering requirements. Behind the scenes, the baggage bar codes are used, in the first instance, to direct baggage along the conveyor belts to the loading bay for the flight. The bar codes also serve as a way of identifying wrongly directed baggage and sending it to the correct destination (though this does not always seem to work as it should).

As well as all that, the airline needs a whole series of schedules.

Exercise 4 Jot down what some of the schedules required by the airline might cover.

In addition to the passenger and flight information, the airline has to maintain:

> inventories of supplies, parts and other items;

> personnel records;

> a payroll involving complicated pay calculations for flight crew due to differential rates of pay for ‘coming on duty’, preparing for flight (including briefings), actual flying hours, stop-over time, and so on.

You can see that the data storage needs of an airline’s computer system are enormous, with the data having to be available instantly in response to requests that passengers, schedulers or others may make at random. Furthermore, the various kinds of data have to be linked to each other in many ways: the passenger to personal details (like address, telephone number and credit card information), the passenger to a seat on a flight at a certain time on a given day between specific locations, the passenger to his or her baggage, the baggage to a loading bay and the loading bay to a flight, the individual crew members to a flight, the route the flight takes to weather information, and so on. None of this can be accomplished by the kind of computer that you have been using in M206, no matter how powerful it is, how much memory it has, or how fast it is, at least for the foreseeable future. Instead, a mainframe computer is needed. However, with the exponential growth of processing power in PCs, many mainframe systems may eventually be replaced by networks of collaborating small, powerful computers.

Exercise 5 Take a minute or so to think, then write down other businesses/organisations which might have requirements for data processing and storage that are similar to those of an airline.

Exercise 6 List the major data processing and storage requirements that these kinds of businesses/organisations have.

3.2 Mainframe computers and their operating systems

A mainframe computer may be characterised as a physically large computer which is very powerful and can serve as the centre of a complex computing operation. It can deal with data-handling requirements like those described above. However, size and power may not be all that is required to fulfil the computing needs of a large organisation – the computer system may have to offer other features to meet various operational demands. For instance, in a business like an airline, which operates over many time zones, with requests for data arriving throughout the 24-hour period, it is essential that the information held by the computer is always available. No computer can run without being stopped occasionally for maintenance, even if it never breaks down. Thus, an airline, and businesses like it, need fault-tolerant systems: that is, systems which can suffer a fault, an outage, without service being interrupted for more than a fraction of a second. The approach usually adopted is to have at least two mainframe computers, identically configured, and two sets of disks with identical copies of the data on them, always available. In the event of one system failing or needing maintenance (expressed in jargon as being ‘down’ or ‘offline’), the operator brings the backup system ‘online’.

There are less obvious requirements that a mainframe and its operating system have to meet. In a large organisation, no one employee ‘owns’ the files or is responsible for managing file storage, so the system itself has to take over these functions. Many departments may use the same data or the same applications, and in most organisations there has to be a means of charging departments for the storage and use of the data, as well as for the running of the central computer. Consequently, there has to be a means of linking a department (in this context, a cost centre) to the use it makes of the company’s computing resources, and then charging it for an appropriate share of the overall cost of those resources.
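One simple way of charging cost centres is to split the overall cost in proportion to recorded usage. The Python sketch below illustrates that idea only; it does not describe how any particular mainframe accounting system works, and the function name, department names and figures are all hypothetical.

```python
def apportion_cost(total_cost: float, usage_by_centre: dict[str, float]) -> dict[str, float]:
    """Split total_cost among cost centres in proportion to their usage."""
    total_usage = sum(usage_by_centre.values())
    if total_usage == 0:
        raise ValueError("no usage has been recorded")
    return {
        centre: total_cost * used / total_usage
        for centre, used in usage_by_centre.items()
    }


# Hypothetical figures: processor hours used by three departments in a month.
usage = {"reservations": 600.0, "payroll": 300.0, "maintenance": 100.0}
charges = apportion_cost(50_000.0, usage)
print(charges)
```

Here the reservations department, having used 60% of the recorded processor time, is charged 60% of the total cost. A real scheme would probably also meter disk storage and other resources.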

Since a huge number of users will be running a huge number of applications on one computer system, the operating system has to ensure that the users and the applications do not impede one another. For example, if a user tied up a major system resource through an error, and this was not detected quickly, the outcome could be expensive and, perhaps, disastrous. More and more applications (and their users) would queue up to use a computing resource that never became available – an example of a condition called deadlock. Eventually the whole system would be frozen into a state of inactivity. It is the job of the operating system to detect deadlock, or conditions that could lead to it, and to prevent it from happening.
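A common textbook technique for detecting deadlock is to record which task is waiting for which other task and then look for a cycle in that ‘waits-for’ relation. The Python sketch below illustrates the technique under that assumption; it is not a description of what any particular mainframe operating system does, and the task names are hypothetical.

```python
from typing import Optional


def find_deadlock(waits_for: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one cycle of mutually waiting tasks, or None if there is none."""
    path: list[str] = []      # tasks on the current depth-first search path
    cleared: set[str] = set()  # tasks already shown not to lead to a cycle

    def visit(task: str) -> Optional[list[str]]:
        if task in path:
            # We have come back to a task we are still exploring: a cycle.
            return path[path.index(task):]
        if task in cleared:
            return None
        path.append(task)
        for other in waits_for.get(task, ()):
            cycle = visit(other)
            if cycle:
                return cycle
        path.pop()
        cleared.add(task)
        return None

    for task in waits_for:
        cycle = visit(task)
        if cycle:
            return cycle
    return None


# Hypothetical tasks: A waits for B, B waits for C and C waits for A.
deadlocked = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"A"}}
# The same tasks, except that C is not waiting for anything.
healthy = {"A": {"B"}, "B": {"C"}, "C": set(), "D": {"A"}}

print(find_deadlock(deadlocked))
print(find_deadlock(healthy))
```

In the first example tasks A, B and C wait for one another in a circle, so none can ever proceed, and task D queues behind them indefinitely. In the second example C can finish, after which B, A and D can proceed in turn.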



Earlier, in Review Question 2, we asked you what the five basic functions of an operating system are. The functions listed in the answer apply to operating systems in general, be they for personal computers or large mainframes. In mainframes, the tasks carried out by the operating system are complicated by the size and complexity of the computer system, so the operating system necessarily has to be more sophisticated.

Exercise 7 Suggest some of the factors that complicate the tasks carried out by the operating system of a very large computer system.

The operating system of a mainframe computer must be able to cope with all of these complicating factors. We shall not explain here how large operating systems accomplish this, as it is beyond the scope of M206. The points you should remember are that such operating systems exist, that they are in widespread use and have not been replaced by linked smaller systems, and that they must carry out complex tasks to satisfy complex requirements. Examples of systems of this kind are MVS (IBM’s proprietary large-scale operating system), VM (another IBM large-scale operating system) and TPF/II (also an IBM product, aimed at the requirements of transaction processing systems like those for online banking and airline booking).

4 Dealing with large amounts of data

The previous section highlighted the fact that large organisations have vast amounts of data that need to be ordered so that information can be retrieved quickly. In this section we look at how this problem is tackled in the context of the personal computer; however, much of what is written is equally applicable to mainframe systems. In particular, we look at how very large quantities of persistent data can be managed.

4.1 Databases

The word ‘database’ can mean different things to different people. In common parlance, it means a collection of information about a category of things. But, strictly speaking, a simple file used to store data in a computer is not a database, as you cannot manipulate the data. For example, a file that has been produced by a word processor or text editor and which contains information about, say, a CD collection is not a database – you cannot ask such a text file, ‘What are the serial numbers of all the CDs by Brian Hayes?’ On the other hand, a file that has been produced by software that structures the data so that operations can be performed on the various elements of the data is a database file. When several such files are consolidated into a unit, one has a database. So, a database does more than just store data – it can be used to explore relationships between various types of data.
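The difference between a plain text file and a structured database file can be seen by putting the CD question to a small database. The sketch below uses Python’s built-in sqlite3 module with an in-memory database; the table name, column names and the three CDs are invented for illustration and do not come from the Course Book.

```python
import sqlite3

# An in-memory database, so that nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cd (serial TEXT PRIMARY KEY, artist TEXT, title TEXT)")

# Hypothetical CD collection.
conn.executemany(
    "INSERT INTO cd VALUES (?, ?, ?)",
    [
        ("CD-001", "Brian Hayes", "First Album"),
        ("CD-002", "Another Artist", "Some Title"),
        ("CD-003", "Brian Hayes", "Second Album"),
    ],
)

# 'What are the serial numbers of all the CDs by Brian Hayes?'
rows = conn.execute(
    "SELECT serial FROM cd WHERE artist = ? ORDER BY serial",
    ("Brian Hayes",),
).fetchall()
serials = [serial for (serial,) in rows]
print(serials)
conn.close()
```

Because the data are held in named fields, the software can select on the artist field and return only the serial field. A word-processor file holding the same text offers no such operation.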

These issues are examined in this chapter mainly via the Course Book. In the book, you will encounter some new terms which may seem similar to terms that you have already come across; for example, you will meet the terms ‘entity’, ‘field’, ‘record’, ‘record type’ and ‘relationship’. After you have read the relevant part of the Course Book, we shall discuss briefly how these terms relate to the object-oriented treatment of computing used in M206.

You should now read Sections A–D of Chapter 14 in the Course Book.

Persistence of data will be discussed further in Chapter 49.




The terminology in the Course Book needs some comment, firstly because the usage is a little imprecise, and secondly because it is instructive to compare traditional database terms with object-oriented ones. In the Course Book reading, you met a new term, entity, referring to a person, place, thing or event about which you wish to store information. To be pedantic, each entity (for example, an individual person or employee) should more correctly be referred to as an occurrence of an entity type (for example, ‘an occurrence of a person type’ or ‘an occurrence of an employee type’). In other words, an entity type is an abstract or general description of an entity, just as a record type is defined in the Course Book as an abstract or general description of a record.

When a traditional database is being set up, the entities and entity types to be used are determined at the analysis and design stages, when the purpose of the database and the kinds of processing required are identified. The entities and the entity types are subsequently programmed, or implemented, as records and record types. (Database practitioners, just like object-oriented people, tend to leave out the word ‘type’ when it is implicit and hence they talk about relationships between entities. So be prepared to insert a ‘type’ depending on the context.)

We turn now to a brief comparison between traditional database terms and the object-oriented terms we have been using in the course. Things in the real world, their properties, and how they are related, are usually thought of, or modelled, in the traditional database world as entities, attributes and relationships, respectively, whereas in the object-oriented world, they are thought of, or modelled, as objects, attributes and associations. In the traditional database approach, the software representation of entities is in the form of records, while the attributes of entities are represented by fields in the records. But, in object-oriented software, ideas are represented much more directly: thus, objects are represented simply as objects, classes as classes, and so on.

By now you have probably recognised that traditional database and object-oriented ideas have a rough correspondence to each other, as can be seen in the following table:

| Traditional database modelling | Traditional database implementation | Object-oriented modelling and implementation |
|---|---|---|
| entity type | record type | class |
| entity | record | instance/object |
| attribute | field | attribute externally; instance variable internally |
| relationship | various possibilities depending on database type* | association |
| operation | procedure (in an application program) | message externally; method internally |

* For example, in a hierarchical database, a relationship is implemented as a link.
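The correspondence between the two sets of terms can be illustrated with a small class. The sketch below is written in Python rather than the language used in M206, so the vocabulary does not map exactly (Python speaks of calling a method rather than sending a message). The class and its attribute names are our own; the flight details are those of the airline example in Section 3.1.

```python
from dataclasses import dataclass


@dataclass
class Flight:              # class: corresponds to an entity type / record type
    airline: str           # instance variables: correspond to attributes / fields
    number: int
    origin: str
    destination: str

    def route(self) -> str:
        """A method: corresponds to an operation on the entity."""
        return f"{self.origin}-{self.destination}"


# An instance (object): corresponds to a single entity / record.
flight = Flight(airline="AA", number=4199, origin="ORD", destination="CID")
print(flight.route())
```

Note that the operation lives inside the class itself, whereas in a traditional database it would be a procedure in a separate application program.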

However, do not get bogged down in the terminology. The main point to remember is that the structuring of a database is the key to its success.

Review Question 11 What is the main difference between a numeric data type and a character data type? The Open University's main postcode is MK7 6AA. What data type does it represent?

By ‘a traditional database’ we mean a non-object-oriented database.

‘Modelled’ is used here, as in earlier chapters of M206, to mean the production of a representation of things in the real world.



Review Question 12 The correspondence that exists between record type and record should be familiar to you. Can you think of a similar correspondence that you met earlier in the course?

Review Question 13 In the Open University the relationship between student and course is many-to-many. Explain what this means.

4.2 Querying a database

You have read in the Course Book about the general principles involved in querying a database. In this section you will learn how to compose queries to retrieve information from a database. You will find that some of this is familiar, as the logical operators AND, OR and NOT are used. You met these earlier in the course in Chapter 16 when learning about conditional expressions. Moreover, you may have already composed queries to databases (perhaps without realising it) when using search engines to find information on the Web.

You should now read the Course Book Chapter 14 pages 645 to 650: Using Boolean Logic in Queries, but omit the Lab activity.

Exercise 8

(i) Use the printout of a small toy store database (shown overleaf) to determine the results of the following queries. Note the dates are in US format, that is, month/day/year.

(a) Quantity < 50

(b) Wholesale > 2.50 AND Wholesale < 4.00

(c) Ordered < ‘7/1/89’

(d) Brand = “Nature’s Kids” OR Brand = “Plastic Pets”

(e) Brand = “Galaxy Toys” AND Brand = “Flying Fun”

(f) (Brand = “Flying Fun” OR Brand = “Nature’s Kids”) AND Retail < 5.00

(ii) Using the fields in the toy store database, write out the query you would use to find:

(a) all the toys ordered between July 1 and July 7.

(b) any toys with a wholesale cost more than $4.99 that were not ordered by J. Mathers.

(c) all the Brown toys that retail for more than $10.00.

(d) any toys that were ordered by T. Livingston or A. Hayes before 7/1/89.

(e) any purple toys that retail for less than $5.00 or any green toys that the store has fewer than 10 in stock.




Toy store inventory

| Toy | Brand | Color | Ordered | Ordered by | Qty | Wholesale | Retail | Stock |
|---|---|---|---|---|---|---|---|---|
| Fortran Learner | Computer Fun | Yellow | 7/1/89 | A. Hayes | 50 | $3.99 | $5.39 | 32 |
| Rocket Racer | Flying Fun | Red/Blue | 7/14/89 | A. Hayes | 50 | $2.50 | $3.38 | 32 |
| Day-Glo Frisbees | Flying Fun | Lime/Pink | 7/1/89 | A. Hayes | 50 | $2.50 | $3.38 | 45 |
| Model Cessna | Flying Fun | Black | 7/1/89 | A. Hayes | 50 | $8.99 | $12.14 | 0 |
| Model Satellite | Flying Fun | Red/White | 6/25/89 | A. Hayes | 50 | $5.50 | $7.43 | 12 |
| Satellite Launcher | Flying Fun | Silver/Taupe | 6/24/89 | A. Hayes | 50 | $12.50 | $16.88 | 10 |
| Kaleidoscope | Galaxy Toys | Purple | 7/14/89 | T. Livingston | 50 | $3.99 | $5.39 | 10 |
| Warp 10 U.F.O. | Galaxy Toys | Silver/Blue | 7/1/89 | T. Livingston | 50 | $2.50 | $3.38 | 40 |
| Space Station | Galaxy Toys | N/A | 7/1/89 | T. Livingston | 25 | $12.50 | $16.88 | 50 |
| Orbitron | Galaxy Toys | Magenta | 6/23/89 | T. Livingston | 25 | $13.99 | $18.89 | 3 |
| Brontosaurus Bruce | Nature’s Kids | Olive | 7/14/89 | T. Livingston | 25 | $4.99 | $6.74 | 6 |
| Ferdie Frog | Nature’s Kids | Green | 7/1/89 | T. Livingston | 25 | $2.50 | $3.38 | 70 |
| Stegosaurus Sam | Nature’s Kids | Brown/Olive | 7/1/89 | T. Livingston | 25 | $2.50 | $3.38 | 84 |
| Agatha Alligator | Nature’s Kids | Green | 7/1/89 | T. Livingston | 25 | $4.99 | $6.74 | 54 |
| Eight-color Paintset | Non-Toxic Toys | N/A | 7/14/89 | J. Mathers | 25 | $2.99 | $4.04 | 75 |
| Wooden Tugboat | Non-Toxic Toys | Brown | 6/25/89 | J. Mathers | 50 | $6.50 | $8.78 | 23 |
| Wooden Train Set | Non-Toxic Toys | Brown | 6/25/89 | J. Mathers | 25 | $8.50 | $11.48 | 1 |
| Inflatable Crab | Plastic Pets | Red | 7/5/89 | J. Mathers | 50 | $4.99 | $6.74 | 1 |
| Inflatable Lobster | Plastic Pets | Pink | 7/5/89 | J. Mathers | 50 | $4.99 | $6.74 | 18 |
| Inflatable Snails | Plastic Pets | Green/Olive | 7/5/89 | J. Mathers | 50 | $4.99 | $6.74 | 20 |
| Plastic Penguin | Plastic Pets | Black/White | 7/5/89 | J. Mathers | 50 | $4.99 | $6.74 | 20 |
| Micro Mice | Plastic Pets | White | 7/5/89 | J. Mathers | 50 | $4.99 | $6.74 | 21 |
| Vampire Fangs | Plastic Pets | Pink | 7/5/89 | J. Mathers | 50 | $4.99 | $6.74 | 22 |
| Thinking Trees | Plastic Pets | Brown | 7/5/89 | J. Mathers | 50 | $4.99 | $6.74 | 13 |
| Model Mercedes | Rich Folks | Blue/Tan | 7/5/89 | J. Mathers | 50 | $4.99 | $6.74 | 4 |
| Model Ferrari | Rich Folks | Red/Blue | 7/5/89 | J. Mathers | 50 | $4.99 | $6.74 | 25 |
| Plastic Sushi | Rich Folks | Yellow/Green | 7/5/89 | J. Mathers | 50 | $4.99 | $6.74 | 33 |
| Expanding Expresso | Rich Folks | Brown | 7/5/89 | J. Mathers | 50 | $4.99 | $6.74 | 27 |
| Rock ‘n’ Roll Ron | Zap Toys | Magenta/Red | 7/5/89 | J. Mathers | 50 | $4.99 | $6.74 | 28 |
| Singing Slugs | Zap Toys | Green/Olive | 7/5/89 | J. Mathers | 50 | $4.99 | $6.74 | 2 |



Review

After studying this chapter you should understand the following ideas.

> A computer system can be envisaged as being composed of several layers of software built on top of hardware. In particular, layers of system software are used between software applications and the hardware.

> All general-purpose computers, both personal and mainframe, need a piece of system software called an operating system to manage and coordinate the use of the computer’s resources. The operating system, in effect, serves as a mediating layer between the hardware and the applications.

> The main functions of an operating system are: to control input and output, to allocate system resources, to manage storage, to detect equipment failure, and to maintain security.

> The basic types of hardware components of all computers are the same. The general design of a computer in terms of these components and how they are connected is referred to as the architecture of the computer. The three main components are: the memory (RAM), the central processing unit (CPU), and the input/output devices. Memory is used to store data and instructions; the CPU executes the instructions; and the input/output devices are used to communicate with the computer.

> A set of architectures based around a family of processors is referred to as a platform.

> Large organisations, such as banks and airlines, require considerable computing power to meet their needs. This requirement cannot be met by small personal computers, so large mainframe computers are necessary.

> The operating systems of mainframe computers perform the same basic functions as those of personal computers, but the additional complexities of dealing with large amounts of data and many simultaneous users mean that the operating systems of mainframes are much bigger and more complex than those of personal computers.

> A database can be used to organise and structure a large amount of data so that it can be easily searched, manipulated and updated. There are different types of database system that can be used to model the relationships between data, but relational databases are the most common.

> Relationships between entities in traditional database technology are similar to, but not the same as, associations between objects in object technology.

> Information can be retrieved from a database by using queries that specify the search criteria in terms of the relevant field name(s), the required condition(s) and, often, one or more logical operators (such as AND, OR and NOT).



Objectives

After studying this chapter, you should be able to do the following.

> Understand and use the terminology of computer architecture and database technology.

> Describe in approximate terms the architecture of a typical personal computer.

> Describe how layers of system software are used, especially in the form of an operating system.

> Summarise the data handling requirements of large organisations and indicate some of the features that mainframe computer systems must offer to meet these requirements.

> Outline the way that databases are structured and how they may be queried.



Solutions to Exercises

Exercise 1 The solution is to introduce a new layer of software to translate and read the foreign disk format. Because accessing a disk should be made as fast as possible, operating system designers are reluctant to complicate a design and hence slow down the system, but a layer of software can be introduced in the form of a translation utility that makes the disk of one operating system readable by another. In fact, these days, because many software companies offer their products for more than one platform, such translation utilities are commonplace, and the software and hardware are so fast that most users would be hard-pressed to tell that such a utility was operating. Furthermore, as networks connect many different sorts of platform, users require file formats to be interchangeable; in this case, a layer in the network software can be used to ensure interchangeability.

Exercise 2 There are potentially a large number of answers to this question, but the most common operating system utility that you use is the program that lets you view the list of files on your hard disk. Under Windows 3.1 this is called File Manager, under Windows 95 (or 98) it is Windows Explorer, and under Mac OS it is Finder. Other utilities include programs that allow you to copy files, delete files, rename files, query printer jobs, and so on.

Exercise 3 If you have an ordinary personal computer with an Intel processor (Pentium or 486), then you have a CISC or complex instruction set computer. If you have a PowerPC, Power Macintosh, Sun 4 or higher, or DEC Alpha, then you have a RISC or reduced instruction set computer.

Exercise 4 Among the many schedules that an airline needs to hold are those covering:

(a) the movement of aircraft, so the airline always knows where each ’plane is;

(b) aircraft servicing – when each ’plane is due for service, where it has to be for that service, and the availability of the requisite new or reconditioned parts at the right place for the service;

(c) flight plans for each flight, based on the current weather and on the short-term forecasts for the areas that the ’plane will fly through, with variables such as the weight of the aircraft, and wind direction and speed taken into account;

(d) flight crew duty rosters – each member of the flight crew has to have his/her hours scheduled within the number of hours per month and per shift allowed by regulations (these hours differ for different crew members, so this is a very tricky scheduling problem, as the airline does not want a pilot to ‘run out’ of flying hours in mid-flight and have to leave the cockpit);

(e) the ordering and loading of meals for each flight, taking into account special catering requirements;

(f) airfreight handling, tracking and invoicing.

Exercise 5 There are a number of businesses and organisations with similar data handling requirements to those of an airline: banks, building societies, insurance companies, supermarket chains, other major retailers, car rental firms, large charities, energy supply companies, water companies, local government, the Inland Revenue, the National Health Service, large employers in general (for keeping personnel and payroll records), and many others.

Disk formats are described in Chapter 4, Sections C and D of the Course Book.



Exercise 6 We have listed the following requirements:

(a) rapid, random access to data items in response to very large numbers of queries and changes;

(b) very fast response times to a large number of users;

(c) the ability to process all data in a batch, for example when airline schedules or pricing structures change;

(d) data input and output over a wide geographic area;

(e) large-scale, long-term data storage;

(f) high levels of security and reliability.

Exercise 7 We have listed the following complicating factors (you may have others):

(a) large numbers of simultaneous users running different trivial applications, as in an airline booking system;

(b) many attached input/output devices, including special input devices (such as sensors, as in the case of radar systems used for air traffic control) and special output devices (such as environmental installations like air-conditioning units);

(c) the need to ‘tie’ together the user, the input/output device that the user is employing, the program, and the data being processed, so that the correct data are processed by the correct program and sent to the correct user;

(d) the requirement to manage files that are ‘owned’ by the organisation and therefore are not the responsibility of any particular user;

(e) the long-term storage of large amounts of data;

(f) the provision of very high levels of security to prevent unauthorised access to sensitive information;

(g) the apportioning of costs amongst departments or cost centres.

Exercise 8 (i) (a) 8 (b) 3 (c) 5 (d) 11

(e) 0 (There is only one brand for each item.)

(f) 4. There are five items which satisfy the condition Brand = ‘Flying Fun’ and four items which satisfy the condition Brand = ‘Nature’s Kids’. As there is only one Brand for each item, these two sets are mutually exclusive, hence nine items satisfy the combined condition (Brand = ‘Flying Fun’ OR Brand = ‘Nature’s Kids’). How many of these nine items also satisfy the condition Retail < 5.00? The answer is four items.

(ii) (a) There are two possible answers depending on how you interpret the time interval specified by ‘between’.

Ordered ≥ 7/1/89 AND Ordered ≤ 7/7/89 (This would include ordering on 1 July and 7 July.)

Ordered > 7/1/89 AND Ordered < 7/7/89 (This would exclude ordering on 1 July and 7 July.)

(b) Wholesale > 4.99 AND (NOT (Ordered By = ‘J. Mathers’))

(c) Color = ‘Brown’ AND Retail > 10.00

(d) (Ordered By = ‘T. Livingston’ OR Ordered By = ‘A. Hayes’) AND Ordered < 7/1/89

(e) (Color = ‘Purple’ AND Retail < 5.00) OR (Color = ‘Green’ AND Stock < 10)
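The counts given for part (i) can be checked mechanically. The Python sketch below transcribes the toy store printout into a list of records and evaluates each query with the corresponding logical operators. The helper names (d, count and so on) are our own, and dates are assumed to be in 1989 in US month/day order, as stated in the exercise.

```python
from datetime import date


def d(month: int, day: int) -> date:
    """Dates in the printout are US-style month/day, all in 1989."""
    return date(1989, month, day)


HAYES, LIVINGSTON, MATHERS = "A. Hayes", "T. Livingston", "J. Mathers"
FIELDS = ("toy", "brand", "color", "ordered", "ordered_by",
          "quantity", "wholesale", "retail", "stock")
ROWS = [
    ("Fortran Learner", "Computer Fun", "Yellow", d(7, 1), HAYES, 50, 3.99, 5.39, 32),
    ("Rocket Racer", "Flying Fun", "Red/Blue", d(7, 14), HAYES, 50, 2.50, 3.38, 32),
    ("Day-Glo Frisbees", "Flying Fun", "Lime/Pink", d(7, 1), HAYES, 50, 2.50, 3.38, 45),
    ("Model Cessna", "Flying Fun", "Black", d(7, 1), HAYES, 50, 8.99, 12.14, 0),
    ("Model Satellite", "Flying Fun", "Red/White", d(6, 25), HAYES, 50, 5.50, 7.43, 12),
    ("Satellite Launcher", "Flying Fun", "Silver/Taupe", d(6, 24), HAYES, 50, 12.50, 16.88, 10),
    ("Kaleidoscope", "Galaxy Toys", "Purple", d(7, 14), LIVINGSTON, 50, 3.99, 5.39, 10),
    ("Warp 10 U.F.O.", "Galaxy Toys", "Silver/Blue", d(7, 1), LIVINGSTON, 50, 2.50, 3.38, 40),
    ("Space Station", "Galaxy Toys", "N/A", d(7, 1), LIVINGSTON, 25, 12.50, 16.88, 50),
    ("Orbitron", "Galaxy Toys", "Magenta", d(6, 23), LIVINGSTON, 25, 13.99, 18.89, 3),
    ("Brontosaurus Bruce", "Nature's Kids", "Olive", d(7, 14), LIVINGSTON, 25, 4.99, 6.74, 6),
    ("Ferdie Frog", "Nature's Kids", "Green", d(7, 1), LIVINGSTON, 25, 2.50, 3.38, 70),
    ("Stegosaurus Sam", "Nature's Kids", "Brown/Olive", d(7, 1), LIVINGSTON, 25, 2.50, 3.38, 84),
    ("Agatha Alligator", "Nature's Kids", "Green", d(7, 1), LIVINGSTON, 25, 4.99, 6.74, 54),
    ("Eight-color Paintset", "Non-Toxic Toys", "N/A", d(7, 14), MATHERS, 25, 2.99, 4.04, 75),
    ("Wooden Tugboat", "Non-Toxic Toys", "Brown", d(6, 25), MATHERS, 50, 6.50, 8.78, 23),
    ("Wooden Train Set", "Non-Toxic Toys", "Brown", d(6, 25), MATHERS, 25, 8.50, 11.48, 1),
]
# The remaining 13 rows share the same date, buyer, quantity and prices.
ROWS += [
    (toy, brand, color, d(7, 5), MATHERS, 50, 4.99, 6.74, stock)
    for toy, brand, color, stock in [
        ("Inflatable Crab", "Plastic Pets", "Red", 1),
        ("Inflatable Lobster", "Plastic Pets", "Pink", 18),
        ("Inflatable Snails", "Plastic Pets", "Green/Olive", 20),
        ("Plastic Penguin", "Plastic Pets", "Black/White", 20),
        ("Micro Mice", "Plastic Pets", "White", 21),
        ("Vampire Fangs", "Plastic Pets", "Pink", 22),
        ("Thinking Trees", "Plastic Pets", "Brown", 13),
        ("Model Mercedes", "Rich Folks", "Blue/Tan", 4),
        ("Model Ferrari", "Rich Folks", "Red/Blue", 25),
        ("Plastic Sushi", "Rich Folks", "Yellow/Green", 33),
        ("Expanding Expresso", "Rich Folks", "Brown", 27),
        ("Rock 'n' Roll Ron", "Zap Toys", "Magenta/Red", 28),
        ("Singing Slugs", "Zap Toys", "Green/Olive", 2),
    ]
]
records = [dict(zip(FIELDS, row)) for row in ROWS]


def count(condition) -> int:
    """Number of records for which the query condition is true."""
    return sum(1 for r in records if condition(r))


answers = {
    "a": count(lambda r: r["quantity"] < 50),
    "b": count(lambda r: r["wholesale"] > 2.50 and r["wholesale"] < 4.00),
    "c": count(lambda r: r["ordered"] < d(7, 1)),
    "d": count(lambda r: r["brand"] == "Nature's Kids" or r["brand"] == "Plastic Pets"),
    "e": count(lambda r: r["brand"] == "Galaxy Toys" and r["brand"] == "Flying Fun"),
    "f": count(lambda r: (r["brand"] == "Flying Fun" or r["brand"] == "Nature's Kids")
               and r["retail"] < 5.00),
}
print(answers)
```

Query (e) returns nothing because a single record cannot have two different values in its Brand field, which is why AND and OR must be chosen with care. The parentheses in query (f) matter for the same reason that they do in the written query: they make the OR apply before the AND.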



Solutions to Review Questions

Review Question 1 (a) Some operating systems that can be used for personal computers are MS-DOS, Windows 3.1 (actually an operating environment as it is a graphical user interface for MS-DOS), Windows 95, Windows 98, Windows NT, OS/2 and Mac OS. There are versions of the UNIX operating system available for personal computers, but UNIX is mainly used on minicomputers and mainframe computers.

(b) Mainframe operating systems include UNIX, VMS and MVS.

You may also know of some other operating systems.

Review Question 2 The five basic functions of an operating system are:

(a) to control basic input and output;

(b) to allocate system resources;

(c) to manage storage space;

(d) to detect equipment failure;

(e) to maintain security.

Your personal computer performs all of these functions to some degree, except possibly the security function. With regard to security, MS-DOS, Windows 3.1 and Mac OS, being single-user operating systems, allow whoever is using the computer to see all the files and run all the software without a user ID or password. (It may be possible to buy a special application program that will appear to lock your system or files so that they can only be accessed with a password, but this is not part of the operating system and can be circumvented by running the operating system without the program.) Windows 95 allows different users to set individual login profiles with a password, but all the users can still see all the files. However, Windows NT is a true multi-user operating system, with the operating system controlling who has access to sensitive files.

Review Question 3 The operating system of your computer would only recognise the device (and thus allow it to be used) if the computer had the appropriate device driver. Most device manufacturers supply device drivers with their products, and some operating systems like Mac OS or Windows 95 have a ‘plug-and-play’ facility whereby the operating system recognises new devices as they are added and then either prompts you to obtain the relevant device driver or finds it from among its own files.

Review Question 4 The operating system acts as a broker – a mediating layer – between the computer hardware and the application, managing the allocation of computer system resources to the application. Thus, when the application needs some computer resource, such as a printer to print a document or memory to save data, it will request the resource from the operating system, which then makes the allocation. The operating system is, therefore, the main layer of software that the application deals with.

Review Question 5 There are many possible answers to this question.

(a) Analogue devices display continuous data so the user can interpret intermediate values, as in a watch with hands, a thermometer where the value is read off from the height of a column of liquid, or any of the gauges on a car dashboard that use a needle.

(b) Digital devices display information as discrete numbers or digits, as exemplified by a digital watch. In a car the speedometer is usually an analogue device while the odometer is a digital device.



Review Question 6 ROM stands for read-only memory. It is provided by a set of chips that are able to store information permanently. Its function is to store a series of permanent instructions that enable the computer to start up.

Review Question 7 A program that requires more memory than the RAM provided by your computer can run by making use of virtual memory, where part of the computer’s hard disk simulates memory and so supplements the RAM. The additional 2MB of memory needed in this case would be appropriated as virtual memory.

Review Question 8 The two parts of a CPU instruction are the op code, which specifies the action to take, and the operand, which represents the data (or, more precisely, the RAM address of the data) that the op code acts upon.
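The split between op code and operand can be illustrated with a short sketch. It is written in Python rather than in the course language, and the 16-bit layout (4 bits of op code, 12 bits of operand) and the op code names are invented for the example; they do not describe any real processor.

```python
# Hypothetical 16-bit instruction word:
#   top 4 bits  = op code (the action to take)
#   low 12 bits = operand (the RAM address of the data)
OP_NAMES = {0x1: "LOAD", 0x2: "ADD", 0x3: "STORE"}  # invented op codes


def decode(instruction):
    """Split a 16-bit instruction word into (op code name, operand address)."""
    op_code = (instruction >> 12) & 0xF
    operand = instruction & 0x0FFF
    return OP_NAMES.get(op_code, "UNKNOWN"), operand


print(decode(0x2ABC))
```

Here the word 0x2ABC decodes to the invented op code ADD acting on the address 0xABC.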

Review Question 9 An expansion bus is the part of a data bus that carries data to and from peripheral devices.

Review Question 10 Expansion cards can add various functions to a computer system. For example:

(a) a graphics card allows a monitor to be connected to the computer (upgrading a graphics card may enable the computer to draw images faster and with more colours);

(b) a network card makes it possible to connect the computer to a local area network;

(c) a modem card provides an internal modem that can connect the computer to the telephone system;

(d) a sound card enables the computer to synthesise speech and music, and to play sound through speakers;

(e) a serial or parallel card can add an extra serial or parallel port, respectively, for connecting, say, a printer or a modem.

Review Question 11 A numeric data type refers to data that can be manipulated mathematically, whereas a character data type cannot be manipulated mathematically even though it may consist of digits (for instance, a telephone number). The postcode is an example of a character data type as it cannot be treated mathematically.
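The distinction can be seen in a small Python sketch (Python is used only for illustration; the telephone number, postcode and point values are made up). Treating a telephone number as a number silently loses its leading zero, and a postcode cannot be converted to a number at all.

```python
phone_as_text = "01908000000"          # made-up number, held as characters
phone_as_number = int(phone_as_text)   # forcing it into a numeric type

# The leading zero has gone, so the numeric form is no longer the same phone number.
print(phone_as_text, phone_as_number)

postcode = "AB1 2CD"                   # made-up postcode
try:
    int(postcode)
    postcode_is_numeric = True
except ValueError:
    postcode_is_numeric = False
print(postcode_is_numeric)

# Genuinely numeric data can sensibly be added up.
course_points = [30, 30]
print(sum(course_points))
```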

Review Question 12 The correspondence between class and instance is similar, in that ‘class’ is a template for ‘instance’, which is a particular occurrence of that class. There is, of course, a major difference, in that a class has a protocol, but a record type refers only to a general record format, hence there are no methods associated with a record type.

Review Question 13 It means that a course can be taken by any number of students, and also that a student can take any number of courses. In practice this is not strictly true as students are advised not to take courses that aggregate to more than 60 points in one year.
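One way to picture such a many-to-many relationship is as a set of (student, course) pairs. The Python sketch below is only an illustration; the student names and the second course code are invented.

```python
# Each pair records that one student takes one course (invented data).
enrolments = {
    ("Ann", "M206"),
    ("Ann", "X100"),
    ("Bob", "M206"),
}


def courses_of(student):
    """All courses taken by one student."""
    return sorted(course for s, course in enrolments if s == student)


def students_on(course):
    """All students taking one course."""
    return sorted(s for s, c in enrolments if c == course)


print(courses_of("Ann"))
print(students_on("M206"))
```

A student appears in as many pairs as they take courses, and a course appears in as many pairs as it has students, which is exactly what "many-to-many" means.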



Glossary

Several important terms have been defined in the passages in the Course Book that you have read while studying this chapter. These terms can be found in the Glossary of the Course Book and have not been included here unless an amplified definition has been considered helpful.

architecture See computer architecture.

bus The electronic circuitry that carries data between the hardware components in a computer system.

bus width Bus capacity.

central processing unit The main processor of the computer system. It is responsible for performing all the data processing instructions in the computer. It is composed of several discrete elements. These include a control unit to direct and coordinate processing and an arithmetic logic unit (ALU) to perform calculations.

CISC Complex instruction set computer.

computer architecture The design of a computer system, based on a family of computers that have compatible memory, disk and bus designs. In practical terms, the design of a computer system is represented in terms of the main hardware components and the connections between them.

configuration See machine configuration.

cost centre The part of an organisation that has budgetary responsibilities.

database A collection of information organised in a manner that makes it easy to manipulate the data.

deadlock A condition that can arise in a computer system when a set of incomplete transactions exists, with the completion of each transaction being dependent on another, or when there is a conflict between two applications that need access to the same resources, for example, the same files, printer, or other peripheral device, resulting in the applications waiting indefinitely for the resource.

downward compatibility The facility whereby old software can be run on a newer operating system.

embedded system A computer system dedicated to controlling some non-computing hardware, like a washing machine, a car engine or a missile.

emulator In effect, a layer of software interposed between an application designed for a particular operating system and the different (hence incompatible) operating system of the computer being used, so as to enable the application to run on the alien operating system. The emulator typically translates commands made by the application to its anticipated operating system into commands to the actual operating system.

fault-tolerant system A system that continues to be able to function when part of it fails.

layering An important feature of computer systems, whereby a computer system is regarded as being composed of layers of software built on top of hardware. Hence, a layer of system software can be interposed between hardware components, between software and hardware components, and even between software components, so as to smooth out differences between the components by introducing a mediating layer.

logical (or Boolean) operator Operators used to construct complex queries from suitable combinations of single conditions in Boolean algebra. Examples of logical operators are AND, OR and NOT.

machine configuration The specification of a computer in terms of various key hardware attributes, including the make and model of the processor (or processors) or other chips in the computer, the capacity and speed of the memory and disks, and the type and speed of the bus.

mainframe computer A physically large computer that is general-purpose, very powerful and usually is at the centre of a complex computing operation.

offline Jargon term to describe a computer (or any part of a computer system, such as a printer) that is disconnected from a network (either deliberately for, say, scheduled maintenance, or accidentally, perhaps as a result of network failure). An offline computer cannot receive inputs and/or provide outputs.

online Jargon term to describe a computer (or any part of a computer system) that is connected to a network and so is able to receive inputs and/or provide outputs.

operating system A set of programs which manage the resources of a computer, including controlling the input and output, allocating system resources, managing storage space, maintaining security, and detecting equipment failure.

outage The period when a computer system is unavailable, offline or ‘down’.

platform A set of architectures based round a family of processors, such that software designed for a particular platform can run more or less unchanged on any computer system of that platform.

portability The ability of some software to run on more than one platform without the need for re-design.

RISC Reduced instruction set computer.

supercomputer A computer with a very powerful processor. It is usually used to handle large mathematical problems involving many calculations, such as weather forecasting.

system call A command by application software to an operating system.

virtual disk A means of improving disk performance, whereby a computer system allocates some RAM (which has a much faster access time than a hard disk) to be ‘set aside’ to supplement the storage function of the hard disk; the computer system then benefits from very fast apparent disk performance.



Index

architecture (computer), 5
bootstrapping, 11
bus, 5, 11
bus width, 11
central processing unit, 9, 10
computer architecture, 9
computing requirements
    airline, 12
    large organisations, 12
configuration (machine), 5
cost centre, 14
database, 15
deadlock, 14
device driver, 23
downward compatibility, 7
embedded system, 7
emulator, 8
expansion slot, 11
fault-tolerant system, 14
input/output, 11
layering, 6
logical operator, 17
machine configuration, 5
mainframe computer, 14
memory, 10
multitasking, 8
multi-user facility, 8
operating system, 5
outage, 14
parallel architecture, 11
platform, 6
portability, 6
supercomputer, 11
system call, 8
virtual disk, 10
virtual memory, 24




Persistent Objects, Streams, and Files

Contents

1 Persistence and files 6
1.1 ASCII and text files 6
1.2 Binary files 8
1.3 LearningWorks Files 9

2 Text files and streams 12
2.1 File and stream classes 12
2.2 Writing to a stream 15
2.3 Reading from a stream 17
2.4 Accessor, iterating, and testing messages 23
2.5 Errors and closing streams and files 24
2.6 Summary of messages 26

3 Internal streams 26
3.1 WriteStream and ReadStream 26
3.2 Why internal streams? 28

4 Saving objects 29
4.1 Objects saved as expressions 29
4.2 BOSS 31
4.3 Filing out and filing in classes 32

Review 33

Solutions to Exercises 34

Solutions to Questions 36

Glossary 38

Appendix of protocols 39

Index 41


CHAPTER 49 PAGE 3

Concepts revisited

This chapter assumes you have studied all the chapters on Smalltalk programming up to now, particularly Chapters 20–24. It revisits and explores further the concepts involved in saving and creating various kinds of file. Before you started M206, you will have been used to the way operating systems organise information stored on disk as files; you will have also been familiar with the idea that to save your work (for example, a word-processed document) you must create or update a file. During the course you have also used files; you have read, altered and saved LearningBook files and perhaps created them too.

The chapter assumes you are aware of the difference between simple text files (these contain simple characters that have not been styled in any way) and more elaborate formats that contain, say, colour or graphical information or sound. So, for example, you are aware of the different types of file you can save from OpenWord and how these are different from those you can save from OpenDraw. Indeed, it is assumed that you have used a word processor outside the Smalltalk environment to write and save your TMAs to file.

If you are unsure about how much you know about files, you might find Chapter 4 of the Course Book helpful, particularly Section A.

You will also find some of the description of the way the LearningWorks environment uses files easier if you remember the discussion of compilation and virtual machines in Chapter 17.

Check your familiarity with these by doing the following revision questions.

Revision Question 1 How, and why, do you save a LearningBook?

Revision Question 2 How, and why, do you save an OpenWord document? What options are available? When would you use each of the options?

Revision Question 3 Give an example of what could be in a file containing information about yourself.


New concepts

In practical terms, the chapter is about how you make your work permanent by saving it on disk – that is, about making it persistent. However, it is concerned with the boundary between the world of objects in the Smalltalk environment and the non-object-oriented world of files in the environment which is your operating system. Consequently, you will address both the storage of objects using files in different ways, and how to process files that do not contain objects using object-oriented programming; that is, using Smalltalk. Hence you will use Smalltalk to create files of text (that humans can read) and files of objects. In either case you will encounter a new sort of object – ‘stream’ objects. To give you further insight into the way that LearningWorks and LearningBooks work, we also look at the files that make up the LearningWorks system itself.



Planning your study

Most of Chapter 49 is optional in the sense that we do not consider it essential that you study all the material. Importantly (with the exception noted below) Chapter 49 will not be assessed in either a TMA or in the examination.

We suggest that you study the following material which will help you in your practical work and may be assessed in either a TMA or in the examination:

> Practical 11 in Session 4 of Learning Book LB-49;

> Section 4.3 of this chapter.

You should then move on to Chapter 50, and only return to the bulk of Chapter 49 later in the year if time and inclination permit.

The selected material explains how classes can be saved as text files using a process called the ‘file out’ mechanism. In a similar vein, there is a ‘file in’ mechanism for reading a class into the Smalltalk environment.

If you do have time to study the whole of the chapter, we would expect the chapter and its associated computer-based activities to take about 9 hours’ study time.

Paper-based reading and exercises – 4 hours.

Computer-based study and practical exercises – 5 hours.

The practicals are grouped into four sessions. You are advised to work through the practical sessions when prompted by this text.

Introduction

One of the fundamental problems of computing, indeed of humanity, is storing and maintaining information. Before the advent of clay tablets, information was passed on by word of mouth and people had no way of passing unchanging information from one to another. Information tended to be changed by the transmission process; it was not persistent. It could not be stored at one time and retrieved unchanged at some later time: it was passed between people by speech, which could be misinterpreted, and was held in human memory, which can be faulty.

In most computers the information represented in the relatively fast random-access memory (RAM) persists only as long as electric power is available. For example, if someone pulls the mains lead out of your computer when you are writing in your word processor, you lose that information. This is because the document you are writing is represented in some fashion in the computer’s RAM, which needs mains power to keep it going. Consequently the information is perishable; remove the power and it will disappear. In this sense RAM is more volatile than human memory or clay tablets!

Furthermore, information is only accessible in RAM, and we can usually only make sense of it, courtesy of some piece of software; quit the application that organised the information in RAM and, in effect, it will be lost completely. Again, to use the word processor example: if you type a letter and quit without saving it you will not be able to regain what you typed by restarting the word processor. What was in RAM will have been lost.



The only reason for RAM to be volatile like this is that it is expensive to produce non-volatile RAM; that is, memory that does not need power to sustain it. Fortunately, computers have another form of memory, generally provided by hard disk, floppy disks, CD-ROMs and various types of digital tape. These non-volatile media enable you to keep a permanent record of your work. This kind of memory is called persistent memory. Of course, just as clay tablets may be broken, so hard disks can wear out or be damaged. So persistence is not permanence in the strict sense.

We mention in passing that what is stored on disk is not usually exactly what was in RAM; that is, the binary representation on disk does not exactly match the pattern of bits that was in RAM. For example, the representation of a word-processed letter in RAM may be very different from the representation on the hard disk.

In earlier chapters you have been dealing with objects that persist as well as those which were ephemeral. Every time you have started the LearningWorks environment, persistent objects – often ones created by the Course Team – have set up the environment so that you can use it. For example, the Launcher, which as you have seen persists throughout the course, also references another persistent object which is responsible for finding the LearningBooks stored as files on disk. Other objects, like LearningBooks themselves, you must arrange to save, and they include objects you have created, classes you have modified, pages you have written to, and so on.

During the course you have rarely had to consider how objects are stored in RAM (that is, what bits are needed) but you do know that they have unique identities. If you were to save an object to disk to make it persistent and later restore it, then you would either have to arrange that its identity was preserved or take the view that its original identity is not relevant when restoring it to a new identity. So the ‘thing’ that persists on the hard disk may not be the same thing that was in RAM and may not be sufficient to recreate the original object.
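The point about identity can be seen outside Smalltalk too. The sketch below uses Python and its standard pickle module purely as an illustration (it is not the mechanism used by LearningWorks, and the dictionary standing in for an object is invented). An object is turned into bytes of the kind that could be written to disk and then restored: the restored object has the same state, but it is a new object with a new identity.

```python
import pickle

# A stand-in for an object with some state (invented for the example).
original = {"name": "Frog", "position": 5}

# Turn the object into bytes that could be saved to disk, then restore it.
saved_bytes = pickle.dumps(original)
restored = pickle.loads(saved_bytes)

print(restored == original)   # True: the state has been preserved
print(restored is original)   # False: the restored object has a new identity
```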

In this chapter we look at two possible representations of objects, first as text representation and then as a binary representation. Text is readable by humans whereas binary is usually read by some software application. As you will see, the binary representation of objects results in a representation that is closer to the way an object is stored in RAM.
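As a rough illustration of the two kinds of representation, the Python sketch below stores the same small piece of state first as text and then as binary. The choice of the standard json and struct modules, and of a 4-byte little-endian integer, is an assumption made for the example; neither is the format used by Smalltalk.

```python
import json
import struct

position = 5

# Text representation: readable by humans as well as by software.
as_text = json.dumps({"position": position})
print(as_text)                      # {"position": 5}

# Binary representation: a 4-byte little-endian integer, closer to
# the way a number might be held in RAM.
as_binary = struct.pack("<i", position)
print(as_binary)                    # b'\x05\x00\x00\x00'

# Either representation can be turned back into the original value.
print(json.loads(as_text)["position"], struct.unpack("<i", as_binary)[0])
```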

Information is stored on a hard disk in files, and a large part of this chapter is devoted to their study. We shall have to address the issues involved in programming with files – how to create them, how to write them, how to read them. Special objects, in particular ‘stream’ objects, are needed to create these files in Smalltalk, and some of these are looked at in detail.

To summarise, this chapter looks at:

(a) saving characters and strings on disk as simple text files;

(b) saving objects as text files which can be read to create similar objects;

(c) saving LearningBooks and other objects as binary files that can readily become objects again.

The LearningWorks files themselves will be discussed as examples of text and binary files. You will then consider how objects, in general, can be saved to disk in a binary format.

Some word processors allow you to save documents in different formats, usually via options available in the save dialogue box. This is because even a simple letter that looks the same on different commercial products is stored differently on disk (and no doubt in RAM). You might like to experiment with your own word processor: save a document in an unusual format and try to read the file again.

You might make information persistent using databases. We do not discuss these in this chapter but consider them later in the course.



1 Persistence and files

The concept of information being recorded at one time and remaining available so that it can be used at a later time is called persistence. For relatively simple items like word-processed documents, this means that a representation of the document created in RAM must be stored in non-volatile memory, such as a hard disk or a CD-ROM.

In this section you will investigate files a little more deeply by first looking at the sorts of file you have encountered during the course so far. All information on a hard disk is represented in bits, the binary digits 0 and 1. So, it is certainly the case that, however else a file might be described, it can be described as ‘binary’. However, operating systems for microcomputers such as those used in personal computers perform at their best when they access either single or multiple bytes; the number depends on the particular computer. Therefore, files are thought of as a sequence of bytes and not as a sequence of bits. We begin by looking at the simplest type of file – the text or ASCII file.

1.1 ASCII and text files

In Chapter 22 we looked at Character objects and how they are stored. The representation of ASCII characters was discussed there and you will recall that ASCII codes 0–127 correspond to the alphabetic characters, the digit characters, punctuation characters and special characters like tab, space, carriage return, linefeed and delete. Codes 128–255 correspond to more ‘exotic’ characters such as accented letters. Because many operating systems are able to read and display files that contain these higher codes, we shall use the term plain text to mean text that contains only ASCII characters (that is, codes 0–127). So a plain text file will be taken to mean a file that contains only ASCII-encoded bytes. The term text will be used to mean text that contains characters with codes 0–255 and text file will be used for files containing such text.
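The distinction between plain text and text can be expressed as a simple test on the bytes of a file. The following sketch is in Python, used here only for illustration; the function name classify and the sample data are invented for the example.

```python
def classify(data: bytes) -> str:
    """Return 'plain text' if every byte is an ASCII code (0-127), else 'text'."""
    if all(byte < 128 for byte in data):
        return "plain text"
    return "text"


# Only ASCII codes, including the non-printing tab and carriage return.
print(classify(b"1.12\tDraft Calendar\r"))

# An accented letter has a code above 127 (233 in the Latin-1 encoding).
print(classify("caf\u00e9".encode("latin-1")))
```

The first call reports plain text and the second reports text, because the accented letter needs a code in the range 128–255.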

A very simple plain text file is the one which you may have received as part of an automatic response to an email request for information on the course. The 1997 version contained the plain text shown in Figure 1.

1.12 Draft Calendar

An outline of the week-by-week study pattern follows. Note that the blocks are there to help you pace your work and that there is a wide variety of material in each block.

Figure 1 Part of a plain text file

Note that the text contains non-printing characters – a tab before ‘Draft’ and carriage return characters at the end of each line, but that it does not contain any ‘exotic’ characters.

You can picture plain text files as in Figure 2, which depicts the beginning and end of a file containing the characters shown in Figure 1.

1 . 1 2 tab D r a f t sp C … b l o c k .

Figure 2 Bytes of a plain text file as ASCII characters

The same file interpreted as a binary file is given in Figure 3.


Chapter 4 of the Course Book provides extensive background to this subject, and you may find Chapter 9 of the Course Book interesting after studying persistence here.

We shall not be concerned with characters that require two bytes of storage.

Anyone who emails [email protected] receives an automatic reply containing most of the information sent to students by regional offices.



00110001 00101110 00110001 00110010 00001001 01000100
01110010 01100001 01100110 01110100 00100000 01000011
…
01100010 01101100 01101111 01100011 01101011 00101110

Figure 3 Bytes of a plain text file as binary
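The correspondence between Figures 2 and 3 can be checked with a few lines of code. This sketch uses Python purely for illustration; it takes the characters at the beginning of the file and prints the bit pattern of each byte.

```python
# The first twelve characters of the file: "1.12", a tab, "Draft", a space and "C".
data = b"1.12\tDraft C"

# Show each byte as eight binary digits.
bits = [format(byte, "08b") for byte in data]
print(" ".join(bits))

# Going the other way: the bit patterns give back the same characters.
recovered = bytes(int(pattern, 2) for pattern in bits)
print(recovered == data)
```

The patterns printed are the same as the first twelve entries in Figure 3.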

But plain text files need not only be used to hold ASCII characters which make sense in English. A file can be plain text but can nevertheless represent, say, a complex document, such as the files used for LearningBook practicals. When you use one of these files in a LearningBook, what you see is formatted text, including text that is coloured and text that is in bold face. These files also let you ‘jump around’ the document by clicking on links. Despite all this clever functionality the files themselves are plain text – they consist only of ASCII characters. Figure 4 is an example from LearningBook LB-15, Practical 6.

<BODY>
<H2>Practical 6: initialize, home and super</H2>
<P>Create an <TT>initialize</TT> method for <TT>Lefty</TT> which invokes <TT>initialize</TT> of <TT>Frog</TT> and then sets <TT>position</TT> to <TT>5</TT> using a <TT>home</TT> method for <TT>Lefty</TT> (which you should also create). Test your modified version of <TT>Lefty</TT> by creating some new instances and ensuring that they behave correctly.
<P><A HREF="c15s2d6.htm">Discussion 6</A>
</BODY>

Figure 4 Plain text HTML file

What you see when you use this file is shown in Figure 5.

Figure 5 Results of interpreting HTML

The content of this file follows the rules of HTML – HyperText Mark-up Language. This language is interpreted by Web browsers, which have the task of treating the characters in the way determined by the language rules. So, for example, the text <H2> corresponds to a command that tells a browser that the text that follows is a level 2 heading. The end of the heading is marked by </H2>. A Web browser (that is, an HTML viewer) knows how to handle such headings; for example, as you can see from Figure 5, this particular browser emboldens all the characters in the heading but does not change the size from the normal paragraph’s font size. The <P> text tells the HTML browser that what follows is a normal paragraph, and so on.

So while HTML files can be read by humans, interpreting the ASCII (plain text) characters they contain may not be quite so easy. However, HTML files provide a way to transport complex information to any machine that has an HTML-capable browser running on it. This is one of the ways in which complex documents can be delivered over the Web to machines of totally different hardware types running totally different operating systems.
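To give a feel for what ‘interpreting’ the characters involves, here is a minimal sketch that picks out the level 2 heading from a shortened version of the text in Figure 4. It is written in Python using the standard html.parser module, which is an assumption made for illustration only; it is not how LearningWorks or any particular browser is implemented, and the class name HeadingFinder is invented.

```python
from html.parser import HTMLParser


class HeadingFinder(HTMLParser):
    """Collect the text found between <H2> and </H2>."""

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":              # html.parser reports tag names in lower case
            self.in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.headings[-1] += data


# A shortened version of the file shown in Figure 4.
page = ("<BODY><H2>Practical 6: initialize, home and super</H2>"
        "<P>Create an <TT>initialize</TT> method.</BODY>")

finder = HeadingFinder()
finder.feed(page)
finder.close()
print(finder.headings)
```

Even this small task needs the software to read every character and keep track of which tags are currently open, which hints at the processing cost of richer documents.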

The use of plain text files to represent more complex information than conveyed by the characters themselves is an important tool for programmers. However, there is a cost; considerable processing of the text may be required, and consequently considerable time may be needed to interpret the text.

Later in this chapter you will use plain text files to hold information that can in effect represent objects.

1.2 Binary files

Plain text files can always be read by humans, albeit with some difficulty as in the case of the HTML above, but what about files that are not meant to be read ‘directly’ by humans? The bytes they contain will have patterns that we could attempt to interpret as text, but if we do so the result will be meaningless. For example, a graphics file used frequently on OU Web sites contains the OU’s logo, shown in Figure 6.

Figure 6

If you were to try to look at this file with a text editor or word processor you might see the information shown in Figure 7.

GIF89ao 2 ~ ΦΦΦΦ±'Ôˆ.Ô≠/flÓÙfl™7œÂÔœ¶?ø›Èø£HØ‘‰ØüPüÃflüõXè√Ÿèò`Äîh!ª‘p≥œ`™…`çxP¢ƒPâÄ@ôø@Üâ0ëπ0Çë à¥

Figure 7 Bytes of a binary file as ASCII characters

Except for the letters ‘GIF89’ at the beginning of the file, it clearly has a different internal structure from a text file and has contents not meant to be interpreted by humans.

We shall describe a file as a binary file if its content is not meant to be interpreted as text by software. More often than not binary files have some internal structure that allows the software that is to read and process them to work efficiently. Typically this means that binary files are larger but faster to process than text files.
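Software that reads binary files often relies on such internal structure. For instance, a GIF file begins with the six bytes GIF87a or GIF89a, which is why those letters are visible at the start of Figure 7. The Python sketch below, given only as an illustration, checks for that signature; the bytes that follow the signature in the sample are made up.

```python
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def looks_like_gif(data: bytes) -> bool:
    """Report whether the data starts with one of the GIF signatures."""
    return data[:6] in GIF_SIGNATURES


sample = b"GIF89a" + bytes([0x6F, 0x00, 0x32, 0x00])   # made-up bytes after the signature
print(looks_like_gif(sample))
print(looks_like_gif(b"<BODY><H2>Practical 6"))
```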

A LearningBook is an example of a binary file that can only be read by one particular piece of software (LearningWorks). Applications, like a word processor or a spreadsheet program, are also binary files.


Of course we can never ‘directly’ read a file. What we mean by ‘directly’ is that there is some general-purpose software that can show us the file as readable text.

In fact what you would see will depend on what characters the bits represent on the particular computer system that examines the file!

Take care if exploring these ideas by opening binary files using a text editor or word processor. The codes that are displayed may be used for special purposes by the software you are using and can even make your screen behave peculiarly. For example 00001100 (decimal 12) is the special ‘form feed’ character used by many applications to mark a new page.



Exercise 1 Using your experience as a computer user, categorise the following files as text or binary files (you can guess and do not have to check using your computer):

(a) executable applications (for example, a Web browser, a word processor);

(b) information files provided with applications, often called something like readme.txt;

(c) files used to configure Windows operating systems; that is, the files with extension .ini;

(d) bit map picture files such as those created by the LearningWorks Capture button, usually with name extension .bmp.


Finally, note that many operating systems have mechanisms that enable the type of a file to be recorded as part of the file name. A crude, but common, mechanism is to add an extension to the name of a file. For example, in the name invitation.doc the .doc part is the extension that indicates the type of the file; in this case on a PC it is likely that .doc means the file is a binary file that was produced by Microsoft Word and hence can be read by that application. However, you cannot guarantee what an extension means. The extension .doc was used on many operating systems before Microsoft Word became available, and anyway systems permit users to change file name extensions and so an extension such as .doc is no guarantee that the file is a Microsoft Word file. Despite this caveat, two extensions are almost universally agreed upon:

.exe means a binary executable file just about everywhere;

.txt means a text file (often plain text file) just about everywhere.

Others that have become commonly accepted because of the Web are:

.htm (or .html) for HTML plain text files;

.gif and .jpg (or .jpeg) for binary graphics files supported by most Web browsers.

The use of file extensions has become so widespread that even on operating systems that record the type of a file in a more sophisticated manner you are well-advised to use extensions if you need to be able to remember the type of a file just by looking at its name.
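A program can use the extension in the same way as a human reader does, as a hint about the type of a file. The sketch below is in Python, for illustration only; the table simply records the widely agreed extensions listed above, and the function name guess_type is invented. As the text warns, the result is only a guess, because nothing guarantees that the extension matches the contents.

```python
import os.path

# The widely agreed extensions mentioned in the text.
KNOWN_EXTENSIONS = {
    ".exe": "binary executable",
    ".txt": "text",
    ".htm": "HTML plain text",
    ".html": "HTML plain text",
    ".gif": "binary graphics",
    ".jpg": "binary graphics",
    ".jpeg": "binary graphics",
}


def guess_type(filename):
    """Guess a file's type from its name alone; the guess may be wrong."""
    extension = os.path.splitext(filename)[1].lower()
    return KNOWN_EXTENSIONS.get(extension, "unknown")


print(guess_type("README.TXT"))
print(guess_type("c15s2d6.htm"))
print(guess_type("invitation.doc"))
```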

Exercise 2 You have encountered a file that has a .lw extension on your system. What do you think this indicates about the file?

Review Question 1 What is a binary file and how does it differ from a text file?

1.3 LearningWorks Files

The way that M206 uses files for its Smalltalk environment is interesting because it exhibits a mixture of text files and binary files. So that you can understand how the environment you have been using is organised, in this section we discuss the way the various system files are used in the LearningWorks environment. We begin by looking at the directory structure used to organise the files.

There is more on file extensions in Chapter 4 of the Course Book.

Note that on some operating systems the names of files and folders (directories) are shown in capital letters.



The M206 LearningWorks environment consists of four main files together with a number of LearningBooks organised in subdirectories. The files are the image file (m206lw.im), the executable file (lwwin252.exe) that implements the virtual machine (sometimes called the engine) described in Chapter 17, a sources file (m206lw.sou) and a file to configure the system (course.dfn), see Figure 8 overleaf.

\M206
\LW
    lwwin252.exe – ‘engine’ software that implements virtual machine
    m206lw.im – ‘image’ containing all basic Smalltalk objects
    m206lw.sou – source code of all basic Smalltalk classes
    course.dfn – file that configures LearningWorks
\LBs – original and new
    \Original – course supplied LearningBooks
        lb01.lw – binary file for LB-01
        lb02.lw – binary file for LB-02
        …
    \Saved – student’s saved versions of course LearningBooks
        lb015.lw – binary file for saved version of LB-15
        lb023.lw – binary file for saved version of LB-23
    \Own – student’s LearningBooks created from scratch
\HTML – all HTML and graphics files for practicals
    \Lb-HTM01 – all .htm and .gif files needed for LB-01
    \Lb-HTM02 – all .htm and .gif files needed for LB-02
    …
\Docs – OpenDraw documents & text files created by LearningBooks
    \Opendraw – OpenDraw documents
    \Bitmaps – .bmp files saved using capture utility
    \STcode – .st files saved using later class browser

Figure 8 M206 LearningWorks directory structure

When you start the LearningWorks environment the file lwwin252.exe executes in conjunction with the image file (m206lw.im) to load the classes and objects the latter contains into RAM. The file course.dfn is then called upon and it causes the Launcher to start.

As you can see from this brief description, the image file contains the key classes and objects that are essential to the running of the LearningWorks system; it is an elaborate structure that allows the Smalltalk object-oriented environment to persist on disk.

The image file m206lw.im is a binary file which contains the bytecodes for each of the methods in the system. So you may well ask: how is it that when you use a class browser you can see the text corresponding to all the classes in the system? Well, if the method was not created by you, but is part of the image, then the text for the method is found in the sources file m206lw.sou. For reasons of efficiency (both speed of execution and economy of memory) the objects and their executable bytecodes are kept separate from the source code that defines classes and their methods.
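The separation of source text from executable bytecodes is not peculiar to Smalltalk. The following sketch is only an analogy in Python, which also compiles source text into bytecodes; it says nothing about the actual layout of the image or sources files. It shows that once the text of a method-like function has been compiled, the bytecodes are a separate sequence of bytes that can be executed without consulting the text again.

```python
# The source text of a small function, as you might see it in a browser.
source_text = "def double(x):\n    return x * 2\n"

# Compile the text, then run the compiled code to define the function.
code_object = compile(source_text, "<example>", "exec")
namespace = {}
exec(code_object, namespace)
double = namespace["double"]

# The function's bytecodes are plain bytes, distinct from the source text.
bytecodes = double.__code__.co_code
print(type(bytecodes).__name__, len(bytecodes) > 0)

# The function runs from its bytecodes; the text is no longer needed.
print(double(21))
```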

Thus you can see that in the case of the Smalltalk environment persistence is achieved by a mixture of text and binary files.

You may investigate these mechanisms by looking at the file course.dfn which contains an expression series containing messages to the class M206Launcher.

Recall that the text of a method body is translated into bytecodes after accepting the text of the method in a browser (described briefly in Chapter 17 and mentioned in Chapter 22). Although the method bytecodes are kept in the image, the text of the method body is not.

The other major files in the system are arranged in a number of subdirectories, the most important of which are those concerned with LearningBooks. The LearningBooks always end with the extension .lw (for example, lb49.lw for the LearningBook for this chapter). The LearningBooks contain classes and objects that are needed in a particular chapter of the course, and you can save your own classes and objects in them.

As an aside, you might be interested to know that many Smalltalk environments have a changes file which contains a record of changes made to an image, in particular the source code of new classes and methods. Because M206 never saves a new image, but saves all work in LearningBooks, there is no permanently saved (or persistent) changes file. One exists while you are working on a LearningBook but it is deleted when you close the LearningBook.

When you use the Launcher to open a LearningBook as supplied by the Course Team, it sends a message to an object which, in effect, goes to the Original subdirectory in the LBs directory and loads the object from the file of the corresponding name into the image. In this way the objects and classes the LearningBook contains are made part of the image and can be executed by the engine. So, for example, when you open LearningBook LB-13, the file Original\lb13.lw generated by the Course Team will be loaded (that is, the file lb13.lw in the folder Original). This brings into the image the classes Account, Frog, HoverFrog and Toad. Each LearningBook loads its own classes when opened and in fact removes them when closed. This allows different versions of the same class to be used on the course.

If you subsequently save the LearningBook, a new file will be created – Saved\lb13.lw. This will be marked in the Launcher by showing the LearningBook in bold. In addition, any methods you have changed or created will be stored, as will any instances and their state. So when you re-open the LearningBook, all will be as it was when you closed it.

Exercise 3 LearningBook LB-21 also contained a description of the classes Frog, HoverFrog and Toad in which they were subclasses of Amphibian rather than Object (as they were in LearningBook LB-13). Explain why the system does not confuse the two definitions.

Exercise 4 Given the above explanation, can you guess why it is not possible to have more than one LearningBook open at once?

Review Question 2 The left-hand column below contains the names of four files. The right-hand column contains five file types. Rearrange the second column to indicate which file type belongs with which file. Add a third column to indicate if the files are text or binary. Add a fourth column to say whether or not the files are executable (label those that are not executable as ‘documents’).

(a) a.im     LearningBook
(b) a.sou    engine
(c) a.lw     sources
(d) a.exe    image



2 Text files and streams

This section looks at how to access text files using the class Filename and how to write to these files using Stream objects which, as the name suggests, allow ‘streams’ of objects to be produced or consumed. This allows you to make text persistent. Although what we describe is applicable to text, all the examples will in fact use only plain text (ASCII codes 0–127).

2.1 File and stream classes

The Smalltalk system has to be able to communicate with the hardware in order to write files to or read files from disk. You need not concern yourself with great detail of how this is done, but we shall concentrate on two classes that you must know about in order that you can read and write files. They are the classes Filename and Stream.

The Filename class

For simplicity assume throughout that we are always working in the current directory. Unless you explicitly program to work in a particular directory, the current one will be used. The current directory is the directory in which the currently loaded image file is located; typically it is the folder LW. Therefore all the files mentioned below would be in that folder.

Suppose a disk contains, in the current directory, a text file called tmas.txt for holding TMA scores. Consider the following code:

|tmaFile|
tmaFile := Filename named: 'tmas.txt'

When evaluated this expression creates an object which represents the named file (here tmas.txt) that can subsequently be manipulated by referring to it with the variable tmaFile.

The class Filename is in fact an abstract class and so you do not actually create an instance of it. Although the instance creation message named: causes the appropriate class method to execute in Filename the result is not an instance of Filename. What happens is that Filename uses the information about what operating system the Smalltalk environment is running on and creates an instance of an appropriate concrete subclass; for example, FATFilename on certain personal computers running particular Microsoft operating systems. In this chapter we shall not usually refer to the concrete subclasses. Just as we did for strings we will describe Filename objects and their protocol and ignore the particular concrete classes. Therefore, when we describe an ‘instance of Filename’ we actually mean an instance of the appropriate subclass.
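One way to see this for yourself is to ask the new object for its class. The following is only a sketch: it assumes the standard class message and the Dialog warn: message used elsewhere in this chapter, and the class name displayed will depend on the operating system you are running.

```smalltalk
|tmaFile|
tmaFile := Filename named: 'tmas.txt'.
"Ask the object for its class. We would expect the name of a concrete
 subclass (for example FATFilename), not Filename itself."
Dialog warn: tmaFile class printString
```

Note that the file tmas.txt does not need to exist for this to work; the object only represents the name.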

Filename objects contain, most importantly, the name of the file to be accessed or created.

Stream objects are not only used with files; they can be used in many contexts in which collections are used. However, we use them mostly for manipulating files and thereby ensuring persistence of information from a Smalltalk environment.

The class Filename is similar to the classes String and Symbol in that they are all abstract classes, but you should always program with them rather than with their concrete subclasses which do the work.

In outline, the class Filename includes in its protocol the following messages:

• contentsOfEntireFile – answers with a string that is the contents of the file represented by the receiver;

• delete – deletes the file or directory represented by the receiver;

• directoryContents – answers with an array of strings that are the file names of the directory represented by the receiver;

• exists – answers with true if the file represented by the receiver exists;

• fileSize – answers with the number of bytes in the file represented by the receiver;

• makeDirectory – creates in the current directory a new directory represented by the receiver.

We shall illustrate the use of the last of these messages and leave you to try out the others. To create a directory called MyWork as a subdirectory of the current directory we would evaluate:

|dirName|
dirName := Filename named: 'MyWork'.
dirName makeDirectory

Note how the class Filename is used despite the fact that we are creating a directory. Evaluating this code when MyWork already exists will result in an exception.

Exercise 5 Write a Smalltalk expression series that obtains the contents of the file course.dfn and shows the contents in a dialogue box.

Exercise 6 Use the exists message to determine whether a file called tmas.txt is in the current directory. If it is, delete the file and display a dialogue box confirming the deletion. If the file does not exist, display a dialogue box to that effect.

The Stream classes

The message contentsOfEntireFile when sent to an instance of Filename enables the contents of the file to be recovered. Often you will not want the complete contents of a file but will want to process some part of it. Furthermore you have yet to see how a file can be created.

To get better control over reading files and to be able to create files, you need to study objects called streams. Pictorially the relationship between a stream and a file is as shown in Figure 9. The figure shows a situation where the file is to be read via the stream; it is positioned just before the first character of the file, that is, the position of the stream is 0. Messages are provided in the protocol that enable a byte to be retrieved or stored and to adjust the position of the stream.

What is happening in this collaboration is that the file provides the storage and the stream provides the mechanism to access information as bytes. The way a stream can operate depends crucially on whether the stream is for reading only, for writing only, for reading and writing, or for appending (that is, placing at the end). The way that Smalltalk allows these variations is by using objects of different stream classes.

Figure 9 Stream over a collection of objects (diagram labels: direction of stream; position in stream)

It may help you to visualise how streams and files collaborate if you think of the file as shelving in a warehouse and the stream as a small truck that moves along the shelving retrieving the items from the shelves or loading them onto the shelves.



The stream classes used for manipulating files are called external streams to distinguish them from internal streams that you will study later. An external stream object is used whenever sequences of bytes need to be sent to or from hardware devices like a hard disk.

The stream classes that we shall be concerned with are summarised in the partial hierarchy below. (We have omitted many subclasses.)

Stream
    ExternalStream
        ExternalReadStream
        ExternalWriteStream

Stream is an abstract class from which ExternalStream inherits (via several classes not shown). The concrete classes you need to know about are ExternalReadStream and ExternalWriteStream.

Creating a file

Here we look at the practicalities of creating files. Creating a file means writing to a stream connected to a file. Suppose you wanted to create a file named calendar.txt on your hard disk that contains the text in Figure 1. To do so is a four-stage process:

(a) a Filename object is created that references the name of the file which is to be created on the hard disk;

(b) an external stream is created for writing – that is, for sending strings of characters to the file – and is connected to the external disk file;

(c) the stream is processed;

(d) the stream is closed. (You will see what this means shortly.)

Here is the code corresponding to this process:

|infoFile outputStream|
infoFile := Filename named: 'calendar.txt'.
outputStream := infoFile writeStream.
outputStream nextPutAll: '1.12 Draft Calendar

An outline of the week-by-week study pattern follows. Note that the blocks are there to help you pace your work and that there is a wide variety of material in each block.'.
outputStream close

First, the Filename object is created and assigned to infoFile.

Judging solely by the name outputStream, the next line would appear to create an instance of WriteStream, which indeed it does. But the code should strike you as odd because there is apparently no instance creation message being sent to a Stream class. In fact, external streams and objects that connect them to files collaborate so closely that the Filename instance protocol includes messages for creating the stream and making the connection. Thus you never have to worry about creating stream instances directly for use with files; you send an appropriate message to the Filename instance and let it get on with all the hard work. Here the message is writeStream which you can deduce is a message for creating a stream for writing. At this point you would say the file is open and thereby ready for processing.

IMPORTANT

The Stream hierarchy is a very elaborate one, which you will discover if you use Stream printHierarchy.



There then follows the message nextPutAll: to outputStream. This message has a string argument (beginning ‘1.12 Draft Calendar …’) which is how bytes are written to the stream and hence to the file. We shall look in more detail at how to write to streams using messages like nextPutAll: in the next section.

Last, you will notice that the object referred to by outputStream has been sent the message close. All streams must eventually be closed and this operation breaks the connection between the stream and the disk file. Thus we say that a file is closed. Failure to close a file can have serious consequences for the file contents.

Review Question 3 What classes are involved in deleting a directory in the default directory?

2.2 Writing to a stream

This section looks in more detail at the messages used to write characters, and strings of characters, to an instance of ExternalWriteStream.

The class Stream contains an abstract protocol consisting of most messages a stream might use and these are inherited by ExternalWriteStream.

As outlined above, an ExternalWriteStream instance essentially consists of a position indicator, a character string and a connection to the file on disk. An instance variable position holds an integer value which indicates the current position for writing to the stream. The characters in an external stream are stored in a string referenced by an instance variable called collection. There are a number of other instance variables in an external stream, in particular one that handles the connection referred to above, but we shall not be concerned with the details. Messages are provided that allow characters beginning at the current position in the collection to be written while simultaneously moving on the position indicator. We investigate some of these messages below.

The message nextPutAll:

You have already seen an example of the use of this message. The message is used to place a string onto the stream starting at the next position after the one indicated by the stream’s position variable. Unlike the first example, the code below uses this message twice. Indeed, the message can be used as many times as required while the file remains open.

aFilename := Filename named: 'M206.txt'.
anExternalWriteStream := aFilename writeStream.
anExternalWriteStream
    nextPutAll: 'Computing:';
    nextPutAll: ' An Object-oriented Approach'

Note how we do not have to concern ourselves with position in these messages; that is all taken care of. The effect of the second nextPutAll: message is to concatenate its argument to the string already on the stream. In other words, the space before An is placed immediately after the colon that is already on the stream.

Most programming languages and operating systems use an area of memory called a buffer to store bytes on their way to and from a file; bytes left in the buffer must be cleared out, and the operating system must make sure it has recorded the size and location of the file. ‘Closing’ a file involves these operations.

IMPORTANT CONCEPT



We have left this file open so that we can add more to it below. Whenever a file is open, as here, it can only be accessed by the application that opened it. This means that another application such as a word processor cannot access it until we have closed it.

The messages nextPut: and cr

The message nextPut: takes a Character object as argument and puts it onto the stream at the next position after the one indicated by the position instance variable, in other words after all the characters that are on the stream already. In more detail, nextPut: adds the character to the string collection referenced by the stream’s collection instance variable. If it is the first character to be put on the stream, a string collection containing just that character is created. The expressions

anExternalWriteStream nextPut: $ .
anExternalWriteStream nextPut: $1.
anExternalWriteStream nextPut: $9.
anExternalWriteStream nextPut: $9.
anExternalWriteStream nextPut: $8

would add the string ' 1998' to the stream so that its content would now be 'Computing: An Object-oriented Approach 1998'.

Non-printing characters can be added to a stream but special messages are provided for those, like carriage return and tabs, that occur frequently. So while we could use the rather verbose nextPut: Character cr, we would instead append a carriage return by directly sending the message cr to anExternalWriteStream as follows:

anExternalWriteStream cr

Streams are normally assembled with a mixture of nextPut:, nextPutAll: and cr messages, with nextPut: being used to append characters one at a time.

The message next:put:

It is quite common to have sequences of the same character repeated in a text file. For instance, you might indicate divisions between sections of a long document by using sequences of hyphens -------------------. The message whose method header is:

next: anInteger put: aCharacter

writes anInteger number of aCharacters onto a stream. The following piece of code adds 43 hyphens to the stream anExternalWriteStream.

anExternalWriteStream next: 43 put: $-.
anExternalWriteStream close

As a result of the above processing the file M206.txt would contain the following:

Computing: An Object-oriented Approach 1998
-------------------------------------------

Because nextPut: takes a character as an argument, you can use it to write up to two bytes at a time. (You usually do not have to consider this.)

Note the first character in this sequence is a space character.



Two final messages that you may find useful in constructing streams are as follows:

• space – write a space character to the stream (this is equivalent to nextPut: with a space character as its argument, but has the advantage that the space character is referenced by name rather than by a character that is essentially invisible when printed in text as here);

• tab – write a tab character to the stream.
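As a small illustration of how these messages combine with nextPutAll: and cr, the sketch below writes two tab-separated lines to a file. The file name scores.txt and the values written are hypothetical, chosen only for the example.

```smalltalk
|scoresFile scoresStream|
scoresFile := Filename named: 'scores.txt'.    "hypothetical file name"
scoresStream := scoresFile writeStream.
"Each line holds a label, a tab character and a score, and ends with a
 carriage return."
scoresStream
    nextPutAll: 'TMA01'; tab; nextPutAll: '85'; cr;
    nextPutAll: 'TMA02'; tab; nextPutAll: '72'; cr.
scoresStream close
```

Using tab here rather than a run of spaces gives a single, unambiguous separator between the two items on each line.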

Review Question 4 What does the following expression series do?

filename := Filename named: 'info.txt'.
anExternalWriteStream := filename writeStream.
anExternalWriteStream
    nextPutAll: 'Information'; cr;
    next: ('Information' size) put: $-; cr;
    nextPutAll: 'When you arrive ';
    nextPutAll: 'please proceed to Reception.'; cr.
anExternalWriteStream close

2.3 Reading from a stream

The class ExternalReadStream is used for reading characters from a stream. A file that is opened for reading using an instance of ExternalReadStream cannot be written to. This section introduces messages used to read characters and strings from an ExternalReadStream.

When attempting to read from a file there are two things you must always bear in mind: does the file that you are attempting to read exist and, if it does, is it empty? It is quite easy to create a file which is empty by opening a file for writing and closing it immediately without appending anything to the stream. Attempting to process such an empty file can be problematic.
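The way an empty file comes about can be sketched as follows. The file name empty.txt is hypothetical; the messages writeStream, close and fileSize are those described earlier in this section.

```smalltalk
|emptyFile|
emptyFile := Filename named: 'empty.txt'.    "hypothetical file name"
"Open the file for writing and close it at once, writing nothing."
emptyFile writeStream close.
"We would expect the size reported to be 0 bytes."
Dialog warn: emptyFile fileSize printString
```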

Opening a file to read

The way to open a file for reading is analogous to that for writing. Essentially the word write gets replaced by read and so the strategy is:

(a) a Filename object is created that references the name of the file which is to be read from the hard disk;

(b) an external stream is created for reading and is connected to the external file;

(c) the stream is processed;

(d) the stream (and hence the file) is closed.

So, opening M206.txt for reading could be achieved by:

aFilename := Filename named: 'M206.txt'.
anExternalReadStream := aFilename readStream

You should be able to deduce from this that the message readStream has created an instance of ExternalReadStream which is then assigned to the variable anExternalReadStream. This stream is now ready for processing.

Remember you do not create Stream instances directly.



Before we do this it is worth remarking that, had the file M206.txt not existed in the current directory on disk, an exception would have resulted and processing would have stopped. In this section we shall assume that all the files we use do exist.
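Where that assumption cannot be made, one possible precaution is to send exists to the Filename object before opening the stream. This is a sketch only; the wording of the warning is our own.

```smalltalk
|aFilename anExternalReadStream|
aFilename := Filename named: 'M206.txt'.
aFilename exists
    ifTrue:
        [anExternalReadStream := aFilename readStream.
        "... process the stream here ..."
        anExternalReadStream close]
    ifFalse: [Dialog warn: 'M206.txt was not found']
```

The stream is created only in the ifTrue: block, so the message readStream is never sent when the file is missing.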

Processing files for reading is more complex than for writing because we have more control over where in the stream we wish to process. You will recall that all streams have an instance variable position which controls where in the stream processing takes place. When writing to streams we are not directly concerned with position but when reading them we often have to be concerned with where the reading position is. In other words there will be situations where we want fine control over where in the stream the reading is to be done. However, before we embark on that we shall study some messages which give us reasonable control over what is being read but which do not require us to consider position.

Messages for reading streams

In this section we shall look at some of the messages that enable streams to be read that do not involve us in directly thinking about the read position of the stream. As an example for this section we shall assume there is a file address.txt in the current directory whose content is:

The Open University*Walton Hall*Milton Keynes

Because addresses such as that of The Open University do not always conform to the format of ‘business name, street, town, county’, the method of representing the institution as a single item illustrated above is sometimes used instead. The asterisk is referred to as a separator. In choosing a separator you have to ensure that the separator character does not appear in data being represented.

One of the tasks we shall set ourselves is outputting this address on the screen in the format:

The Open University
Walton Hall
Milton Keynes

For each message that we study below we shall assume that the file has been opened using the code:

addressFile := Filename named: 'address.txt'.
addressReadStream := addressFile readStream

The message isEmpty

This message can be used to test if a stream, and hence the file to which it is connected, is empty. Its use is illustrated by:

(addressReadStream isEmpty)
    ifTrue: [Dialog warn: 'The file is empty']
    ifFalse: [Dialog warn: 'The file is not empty']

In the remainder of this section we shall assume that the file is closed and then re-opened before going on to study the next message. In this way we know that the stream is positioned at the start of the file in each section.



The message contents

The expression

addressString := addressReadStream contents

will result in addressString being the string 'The Open University*Walton Hall*Milton Keynes'. It is very easy to confuse the message contents with contentsOfEntireFile. Remember that the latter has a receiver which is an instance of Filename so could be sent to addressFile, whereas the former has a receiver that is an ExternalReadStream instance.

The message next

The next message answers with the next character in the stream; that is, the character at the position which is one after the position indicated by the position instance variable. Normally you need not worry about the position because this message can be used to read the stream character by character in sequence – that is, byte by byte. This message gives a little more control than the message contents in as much as it enables you to process individual characters.

The following code will read only the first character from a text file, and will assign it to the variable called firstChar.

firstChar := addressReadStream next

The message atEnd

Clearly, reading a single character is not very useful. Normally all characters are read in turn. This can be done by successively reading each character using next. But in so doing we need to know when we have come to the end of the stream. The Boolean message atEnd answers true if the end of the stream has been reached and false otherwise. So the whole of the file address.txt can be read character by character and displayed in the Display pane as follows.

[addressReadStream atEnd]
    whileFalse: [display showChar: addressReadStream next]

Here you will notice the use of the whileFalse: message. The Boolean condition block is repeatedly evaluated until, as a result of the sending of next in the second block, atEnd answers with true. This is a standard arrangement for file reading in many programming languages: check whether the end has been reached and if not continue to read until it has. Of course, if the file happened to be empty, this code would not display anything because the second block would not execute at all.

The message upTo:

The message upTo: aCharacter answers with the string up to but not including the first occurrence of aCharacter in the receiving stream. It then resets position so that aCharacter is effectively skipped. In the event that aCharacter is not in the receiver Stream then it answers with the whole of the receiver and resets position to be at the end of the stream. This is the message we need to use to present the address in the file address.txt with each component on a new line. Study the code below and then read the commentary that follows.

IMPORTANT

In the next section we shall look at position in detail and see the integer values it assumes.



aLine := addressReadStream upTo: $*.
display show: aLine; cr.
[aLine ~= '']
    whileTrue:
        [aLine := addressReadStream upTo: $*.
        display show: aLine; cr].
addressReadStream close

To explain the detail here we need to recall that the file stores the following:

The Open University*Walton Hall*Milton Keynes

As a result of the first assignment aLine is the string 'The Open University' and position is changed so that the next message to read will start with the character W. This means that only

Walton Hall*Milton Keynes

is left to be read.

The code then displays aLine followed by a carriage return. The Boolean condition block answers true, so the whileTrue: argument block is evaluated which results in aLine being the string 'Walton Hall'. This is displayed, followed by a carriage return. At this stage only

Milton Keynes

is left to be read. The Boolean condition block again answers true and so the whileTrue: argument block executes and, since * does not appear in the remainder of the stream, the remainder of the stream is assigned to aLine. So 'Milton Keynes' gets displayed.

Exercise 7 Explain what happens next.

Open LearningBook LB-49 and work through Session 1. When you have completed Session 1, return to this printed text at this point.

Resetting and accessing the stream position

This section covers instance methods used by ExternalReadStream to move to a different position in the stream, or to obtain the position. Before we start we need to be a little more specific than hitherto on precisely how position works and to give you a pictorial representation of it. We shall use the example immediately above to show the integers that position assumes as the code is executed.

Initially position is the integer 0 and reading will start from position 1, as shown in Figure 10.

Figure 10 (diagram of the stream over the contents of address.txt; labels: direction of stream; position in stream)

After the first upTo: $* message the stream can be represented as shown in Figure 11.

Remember the ~= binary selector means ‘not equals to’.

So effectively the * is skipped.

LearningBook LB-49 Session 1



Figure 11 (diagram of the stream after the first upTo: $* message; labels: direction of stream; position in stream)

At this point position references the integer 20 so that the next character read will be that at position 21, that is the character W. After the next upTo: $* message, position references the integer 32 so that the next character read will be from position 33 which corresponds to the letter M.

The class ExternalReadStream includes in its protocol get and set methods for position so that the position where the next read occurs can be adjusted. Great care must be taken when adjusting position and you need to remember that a read always takes place one byte further along the stream than the integer stored in position.

We shall illustrate its use on a file called address2.txt. This file will store many addresses but the maximum length of a single address will be 50 characters. Addresses that are shorter than 50 characters are filled out with space characters and those that are longer have to be abbreviated in some way. So examples are shown below in which we have used the symbol ∇ to denote a space character.

The∇Open∇University*Walton∇Hall*Milton∇Keynes∇∇∇∇∇
London∇Transport*55∇Broadway*London∇∇∇∇∇∇∇∇∇∇∇∇∇∇∇
National∇Express*4∇Vicarage∇Rd*Edgbaston*Birminghm

Hence, if 50 addresses like these were stored in a file, then that file would have a consistent format. There would be an address every 50 characters. A consistent file structure like this makes it easy to recover individual addresses from the file. We simply read the first 50 characters to get the first address, the second block of 50 characters to get the second address, and so on.

The message next:

The next: anInteger method allows us to do this: it reads the next anInteger elements of the receiver. Here is an example which reads the first two addresses.

addressesFile := Filename named: 'address2.txt'.
addressesReadStream := addressesFile readStream.
address1 := addressesReadStream next: 50.
address2 := addressesReadStream next: 50

Note that we have left the file open but unlike earlier examples we can now read at any position we like rather than from where we have just left off. Suppose we wanted the fifth address. Then we need to set position to 200 so that the read starts at position 201 and extends to 250.

addressesReadStream position: 200.
address5 := addressesReadStream next: 50
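The get method for position can be used in the same way to check where a read has left the stream. The sketch below assumes that address2.txt holds at least five addresses and that the get method is the unary message position; the variable name whereNow is our own.

```smalltalk
|addressesFile addressesReadStream address5 whereNow|
addressesFile := Filename named: 'address2.txt'.
addressesReadStream := addressesFile readStream.
addressesReadStream position: 200.
address5 := addressesReadStream next: 50.
"Reading 50 characters should have moved position on from 200 to 250."
whereNow := addressesReadStream position.
Dialog warn: whereNow printString.
addressesReadStream close
```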

The message reset

The following code allows a user to choose a numbered address to be retrieved from the file. Because we may not be entirely sure of where we are in the stream as a result of the previous activities, we have to reset the position variable to 0.



We could have done this using the message position: 0 but instead we have used the reset message of the ExternalReadStream protocol.

addressNumber := (Dialog request: 'Which address number?') asNumber.
addressesReadStream reset.
addressesReadStream position: 50 * (addressNumber - 1).
display show: (addressesReadStream next: 50)

Exercise 8 The message setToEnd resets position so that it is at the end of the stream. By using this message calculate how many addresses are stored in the file address2.txt and then use a timesRepeat: message to write each of them on a new line in the display.

The message skip:

The message skip: sets the new position value to be the old position plus the integer referenced by its argument. When the integer is positive, skip: moves you towards the end of the stream; when it is negative it moves you towards the beginning. The code below shows how it is possible to use skip: to recover the last address in the file.

addressesReadStream setToEnd.
addressesReadStream skip: -50.
lastaddress := addressesReadStream next: 50
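A negative argument is also a convenient way to step back over something that has just been read. As a sketch, assuming address2.txt holds at least one address, the following reads the first address twice; the variable names are our own.

```smalltalk
|addressesFile addressesReadStream firstRead secondRead|
addressesFile := Filename named: 'address2.txt'.
addressesReadStream := addressesFile readStream.
firstRead := addressesReadStream next: 50.
"Step back over the 50 characters just read."
addressesReadStream skip: -50.
secondRead := addressesReadStream next: 50.
(firstRead = secondRead)
    ifTrue: [Dialog warn: 'The same address was read twice'].
addressesReadStream close
```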

The message skipToAll:

It is somewhat unlikely that a user would request, say, the third address in the file. More likely they would have a business name and would want its address. The message skipToAll: aString is a pattern recognition message. It will find the first occurrence of aString in the stream and alter the position variable to indicate the start of that string. The expression series

addressesReadStream reset.
addressesReadStream skipToAll: 'National Express'.
theAddress := addressesReadStream next: 50.
addressesReadStream close

would retrieve the string 'National Express*4 Vicarage Rd*Edgbaston*Birminghm'. Finally we have closed the file.

Exercise 9 The Stream classes have a very rich protocol and we can only give you a brief overview here. We would encourage you to use the browser to explore them further. To give you some practice we ask you to investigate the message nextMatchFor: whose method header is:

nextMatchFor: anObject

(a) If you were trying to find this method in the browser, where would you look for it?

(b) Using the initial comment of the method,

nextMatchFor: anObject
    "Read the next element and answer whether it is equal to anObject."

describe what the method does.

(c) Using the solution to (b) write an expression series that would write out all the addresses in address2.txt whose business name begins with the letter A, that is, whose first character is the letter A. You may assume that the file is already open and connected to addressesReadStream as in the examples in the printed text.



Review Question 5 Write an expression series that displays only the last line of a text file.

2.4 Accessor, iterating, and testing messages

The final external stream message we shall consider is the iteration message do: aBlock. This evaluates the argument for each of the remaining characters after position. It works in a similar way to the do: message used by collection methods, but for external streams has the added power of only working for characters which come after the current position in the stream.

For instance, imagine a field biologist is recording statistics on a family of eight rabbits living on a hillside. He keeps a file of records of how many rabbits are out foraging for food at noon each day for a week. The file begins with the date followed by seven single-digit numbers, representing the number of foraging rabbits. For instance the file for the week beginning 5 November 1997, rab.txt, looks like this:

051119977167656

Notice that the file has a fixed format: the date always takes up eight characters (in this case 05111997), and the number of rabbits counted is always represented by one character. In this way the programmer who set up this file has taken account of the year 2000 problem by representing the year in four numeric characters (say 1988) rather than in just two (say 88). However, the programmer has not done so well with the representation of the number of rabbits: it is represented not by an integer, which might require several characters to represent it, but by one character. This will limit the number of rabbits to nine. Presumably, the programmer has reasoned that, for example, character 1 can easily be converted into an integer object referencing the literal 1 and has ignored the fact that a sequence of characters (that is, a string) can easily be translated into a number (in Smalltalk with the asNumber message).

The average number of rabbits foraging at noon for that week can be calculated using the following code:

rabbitFile := Filename named: 'rab.txt'.
rabbitStream := rabbitFile readStream.
rabbitStream skip: 8.
total := 0.
rabbitStream do: [:j | total := total + j digitValue].
average := total / 7.
rabbitStream close

The skip: message skips the read position past the characters representing the date in the stream. The do: message’s argument is the block of code needed to calculate the total number of rabbits seen on the hillside in the week. Notice that each j, a character object, has to be ‘converted’ into an integer before the running total of the number of rabbits can be calculated. We use digitValue for this, which answers the digit that the character represents (7 for $7); asInteger would instead answer the character's code (55 for $7).
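The rabbit calculation can be checked without creating rab.txt. The sketch below assumes an internal ReadStream (see Section 3) on a string holding the same fifteen characters, and uses digitValue to turn each character into the digit it denotes; the variable names are our own.

```smalltalk
| rabbitStream total average |
"The contents of rab.txt, held in a string instead of a file."
rabbitStream := ReadStream on: '051119977167656'.
"Move past the eight characters of the date."
rabbitStream skip: 8.
total := 0.
"do: visits only the characters that come after the current position."
rabbitStream do: [:each | total := total + each digitValue].
average := total / 7.
```

The seven counts add up to 38, so average references the fraction 38/7. Had skip: 8 been omitted, the digits of the date would have been added into the total as well.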

Many programmers of commercial systems over the last few decades have not been so careful, resulting in severe problems for systems expected to distinguish between dates like 1988 and 2088. However, a recent article in The Economist pointed out that the initial cause of the ‘2000 bug’ was not so much carelessness but the problem of limited space on 80-column punched cards. In that context the decision to truncate the date was a sensible one. This is not to condone later practice, however!



2.5 Errors and closing streams and files

In none of the descriptions of messages have we specifically mentioned error conditions. For example, what happens if you set position to an integer that is beyond the end of the stream, or you skip to a position that is beyond its end? We leave you to discover most of these things for yourself by using the browser but here we give a brief overview of how to avoid error situations, particularly those that can result in loss of information. There is nothing more frustrating than doing lots of work in a LearningBook only to make an error from which it is impossible to recover without closing down the system with subsequent loss of all your work. Saving your LearningBook frequently will minimise this risk.

A simple error to make is to try to open a write stream on a file that is already open. An exception results that informs you ‘only one write access path allowed’. This is relatively harmless but if you see this message you will know that the file you are trying to open is already open. What is not harmless is failing to close a stream and we discuss various scenarios below.

The most common failure when using a stream is not closing it after you have used it to create or manipulate a file. It is not a good idea to leave a stream open once you have completed its immediate processing even if you think you might write to it again later on. This is because you may cause an error in the meantime, in which case the file will still be open and information may be lost, either from the LearningBook or the file. For example, suppose you create a write stream on a file and then close, without saving, the LearningBook you are using. The file will remain open but you will have lost the variable that referenced the stream, and so you will have no means of closing the stream. The only way to force a closure would be to quit the LearningWorks environment completely and this would result in a loss of all the information that you had sent to the stream.

Failing to close streams can also result in unexpected outcomes. For example, imagine you create a file using a write stream and then wish to read the first character in the file using a read stream. Will the following code do this?

aFilename := Filename named: 'test.txt'.
aWriteStream := aFilename writeStream.
aWriteStream nextPutAll: 'hello'.

anotherFilename := Filename named: 'test.txt'.
aReadStream := anotherFilename readStream.
aReadStream position: 0.
aChar := aReadStream next

You will have recognised that a close was needed after the first three expressions. But if you had evaluated all seven expressions Smalltalk would have happily accepted them. You can have a read stream reading from this file, which already has a write stream to it. The only problem is this file would have nothing in it! So instead of reading the expected character $h, the last expressions would have read nil. It is the close message that tells a write stream to write its contents to the file on disk. Because this message had never been sent, nothing was written to disk.
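For comparison, here is a sketch of the sequence with the missing close added. It uses only messages already met in this chapter, and assumes that a file called test.txt may be created in the current directory.

```smalltalk
| aFilename aWriteStream aReadStream aChar |
aFilename := Filename named: 'test.txt'.
aWriteStream := aFilename writeStream.
aWriteStream nextPutAll: 'hello'.
"Closing the write stream is what sends its contents to the file on disk."
aWriteStream close.
aReadStream := aFilename readStream.
"A newly opened read stream starts at position 0, so next reads the first character."
aChar := aReadStream next.
aReadStream close
```

With the close in place the file contains 'hello' by the time the read stream is opened, and aChar references $h. Note that the read stream is closed as well, once it has served its purpose.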

Manipulating the position instance variable, using a skip message, can also easily result in errors. For example, an ‘EOF error’ exception will result if you try to set the position of a read stream to an integer beyond the end of the stream.

EOF stands for ‘End Of File’.



Conversely, if you skip back too far in a stream, towards its beginning, you will be notified of an exception ‘attempt to position to a negative number’. These exception messages are relatively informative but the system cannot always be so helpful. Consider the following code which attempts to use a character, rather than a string, as the argument for nextPutAll:.

aFilename := Filename named: 't.txt'.
aWriteStream := aFilename writeStream.
aWriteStream nextPutAll: $a.
aWriteStream close

This results in an exception ‘Message not understood do:’. Such is the variety of errors that we humans can make that it is impossible for the system always to pinpoint exactly what went wrong and thereby give us an informative error report.

When evaluating stream and file handling expressions in the workspace, you can easily recover from the consequences of an exception by sending the close message to the variable referencing the stream, thereby closing the file. Be cautious with streams: when in doubt, close them.

The situation is much more difficult when the code manipulating a stream is in a method. Typically, the stream will be referenced by a temporary variable or an instance variable that can only be accessed from within the method. When an exception occurs within the method the stream remains open and you are required to close it ‘by hand’. To do so you must use the debugger Messages page to find the source code containing the message close in the method and then evaluate it using the debugger Evaluate button. Simply cancelling the exception might leave an external stream open. As we have remarked above, that could lead to loss of information but in this instance things are worse because it could mean that you may not even be able to open the LearningBook subsequently. You will not often have to manipulate files within a method but if you do then it would be a wise precaution to back up the LearningBook in which you do the work (as a separate .st file outside the Saved subdirectory) prior to implementing the method.

Review Question 6 (a) Which two (main) instance variables have you studied in ExternalReadStream?

(b) Name the two concrete stream classes you have encountered in this chapter so far.

Review Question 7 Does a file created using an ExternalWriteStream contain a persistent string object on the hard disk?

Review Question 8 Does the class Stream have to contain the code to perform the nextPutAll: operation? Must it have a position instance variable?





2.6 Summary of messages

The following summarises the messages introduced in this section.

x contents – answers with a copy of the receiver’s collection.
x do: aBlock – evaluates the argument for each of the remaining characters after position.
x isEmpty – answers true if the stream’s string contains no characters, false otherwise.
x atEnd – answers true if the stream is positioned at its end, false if it is not.
x nextMatchFor: aCharacter – answers true or false according to whether the next character is equal to aCharacter.
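Two of these testing messages can be seen at work in the following sketch, which assumes an internal ReadStream (see Section 3) on the two-character string 'AB'; the variable names are our own.

```smalltalk
| aStream firstMatches secondMatches |
aStream := ReadStream on: 'AB'.
"The next character is $A, so this answers true; the stream moves on by one."
firstMatches := aStream nextMatchFor: $A.
"The next character is now $B, so this answers false; the stream still moves on by one."
secondMatches := aStream nextMatchFor: $A.
```

Both characters have now been read, so aStream atEnd answers true. Notice that nextMatchFor: advances the position whether or not there is a match.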

3 Internal streams

In this section we very briefly look at some internal streams, streams that do not use external files for storing objects, but use objects which are ‘internal’ to memory. In particular we look at the way that print-strings are generated, as this has relevance later to how we can write files containing textual descriptions of objects.

3.1 WriteStream and ReadStream

Internal stream classes include WriteStream and ReadStream which perform similar functions to ExternalWriteStream and ExternalReadStream. The hierarchy for both the main internal and external streams is shown below.

Stream
    ExternalStream
        ExternalReadStream
        ExternalWriteStream
    InternalStream
        ReadStream
        WriteStream

InternalStream inherits from Stream via several classes that we have not shown. Instances of class WriteStream are used throughout the LearningWorks system within methods that generate a string description of any object. Every Smalltalk object can respond to the message printString, which answers with a description of itself in the form of a string. Thus to obtain a string description of an object you obtain its print-string by executing something like:

aDescription := anObject printString

To compare the way internal streams are used with the way external streams are used, we shall have a look at how printString works. The original printString method is defined in Object:

printString
    "Answer a String whose characters are a description of the receiver."
    | aStream |
    aStream := WriteStream on: (String new: 16).
    self printOn: aStream.
    ^aStream contents

As before, we have not included all the subclasses of Stream in this diagram.

Remember that you do not actually have to print the print-string.

The String class method String new: 16 creates a string of size 16.



Suppose aFrog is an instance of Frog and that we evaluate

aFrog printString

The (Object) message printString creates a writable stream on a string (initially of size 16) using the selector on: which is inherited from the class protocol hierarchy of WriteStream. Note that here, unlike external streams, we not only have explicitly to create the stream itself by sending a class message to WriteStream, but we also have to specify what object the stream is to open on. Here we have explicitly chosen a string as the collection for the writable stream. For external streams, we worked via an instance of Filename which implicitly used a string.
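The way a writable stream is opened on a string and then filled can be tried directly in a workspace. The following is a minimal sketch using only the messages on:, nextPutAll:, printOn: and contents; the variable names and the text being assembled are our own.

```smalltalk
| aStream description |
"Create the stream explicitly, on a string that initially has room for 16 characters."
aStream := WriteStream on: (String new: 16).
aStream nextPutAll: 'position '.
"An integer appends its own textual form to the stream."
12 printOn: aStream.
"contents answers only the characters actually written."
description := aStream contents.
```

Here description references the string 'position 12', which has 11 characters: the initial size of 16 is only a starting allocation and does not appear in the result.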

The next expression sends the message printOn: aStream to the receiver aFrog. The methods of Frog are searched to see if there is a method called printOn:. There is, and so it is used in the evaluation. This is the crux of how objects are able to describe themselves. Each object must have printOn: in its protocol. A default version is inherited from Object, but each class requiring something special (as Frog and Toad do) must provide its own printOn: method that contains the details of how its instances are to be described. Such a method overrides the default version of printOn: defined in Object, which we shall examine shortly. What printOn: has to do is to assemble in the stream object the required description. Once printOn: has finished assembling the description in the stream, the final expression in printString answers with the contents of the stream.

So, what is the default behaviour of the method printOn: as it is implemented in Object? Examine the code and then read the commentary that follows.

printOn: aStream
    "Append to the argument aStream a sequence of characters that describes the receiver."
    | title |
    title := self class name.
    aStream nextPutAll: ((title at: 1) isVowel ifTrue: ['an '] ifFalse: ['a ']).
    aStream print: self class

The default description for any object is simply a string consisting of the class name preceded by the indefinite article. This is assembled by sending name to the receiver’s class. This answers with a symbol corresponding to the name of the class (think of aFrog, then aFrog class name is #Frog). The first character of the symbol is examined and this determines whether ‘a’ or ‘an’ is needed. The message nextPutAll: (familiar from external streams) is then used to add whichever is appropriate to the stream.

Now all that is needed is to append the class name. However, nextPutAll: must supply a string to aStream (because it is a stream on a string). One way of doing this is to write:

aStream nextPutAll: title asString

or even

aStream nextPutAll: self class printString

When creating an external stream we did not send a class message to one of the Stream classes – we let a Filename instance do the creation for us.

So, for example, we get 'a Converter', 'an Object', 'an Account'.



The latter is more subtle because it uses the printString message that relates to the class Frog (as opposed to the print-string appropriate for instances of Frog), and classes know how to respond to such requests: they answer with a string that is their name.

In fact, neither of these ways is how it is done. The implementors of Smalltalk instead chose to use a Stream message print:, which we shall not be studying, but which has the same overall effect.
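The default behaviour is easy to observe with an instance of Object itself, which has no printOn: other than the default one. The sketch below is only an illustration; the variable names are our own.

```smalltalk
| description startsWithVowel |
"Object does not override printOn:, so the default description is produced."
description := Object new printString.
"The same test that the default printOn: applies to the first character of the class name."
startsWithVowel := (Object name at: 1) isVowel.
```

Since the class name begins with a vowel, startsWithVowel references true and description references the string 'an Object'.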

You can now examine how printOn: was overridden in Frog by the Course Team. To be able to follow the code we remind you that if printString is sent to a newly-created instance aFrog, then the string produced would be

'An instance of class Frog (position 1, colour Green)'

The code for printOn: is:

printOn: aStream
    "Create a description of the receiver on aStream (an instance of Stream). Answer the receiver."
    aStream nextPutAll: 'An instance of class '.
    aStream nextPutAll: self class printString.
    aStream nextPutAll: ' (position '.
    self position printOn: aStream.
    aStream nextPutAll: ', colour '.
    self colour printOn: aStream.
    aStream nextPutAll: ')'

The second nextPutAll: message in this method has to assemble the name of the class. This it does by sending self (that is, aFrog) a message to answer with its class which is then sent the message printString. This of course results in a string that corresponds to the name of the class. The third line of the code appends the string ' (position ' to the stream so that at this stage we have assembled 'An instance of class Frog (position '.

Next we need to append the receiver frog’s position. The integer representing the position is given by self position and this, being an Integer, has a printOn: message by which it can be attached to a stream. Note how this printOn: is the one defined in the Integer class and not the one you are studying here. We now add more characters using nextPutAll: and then append the colour. This requires the use of printOn: which is the message appropriate for the textual representation of colours. Finally we place a bracket as a string.

Exercise 10 Assume there is a new subclass of Frog called DietingFrog for frogs that want to keep track of their weight. DietingFrog has an extra instance variable weight (and has the usual accessor messages). Write a suitable version of printOn: for the new class.

3.2 Why internal streams?

One of the reasons streams are used in Smalltalk is that they are often more flexible than collections (for example, arrays are normally of fixed size and changing their size is complicated). Being able to use the protocol for (internal) streams with collections turns out to be a very powerful programming facility.



A bonus is that stream methods such as nextPut: make use of primitives and so are very fast – much faster than similar collection operations.

So why not subclass the collection classes to give them stream behaviour? The main reason is that a collection, conceptually, is simply several objects which are grouped together. When you think of a collection in a general manner, you do not think of it as streaming away to a text file or a printer. The streaming is behaviour which is added to a collection. A stream has state: a collection together with a position. If a collection were given stream behaviour but not used as a stream, it would be carrying around stream state that it does not need – that is, a redundant position indicator. Collections are used in many places in a system where economy of space is needed, and where they do not need to act like a stream.
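That the position belongs to the stream and not to the collection can be seen by opening two streams on the same string. This is a minimal sketch; the variable names are our own.

```smalltalk
| letters first second firstPosition secondPosition |
letters := 'abcdef'.
"Two separate streams on one and the same collection."
first := ReadStream on: letters.
second := ReadStream on: letters.
"Reading from one stream changes only that stream's position."
first next: 3.
firstPosition := first position.
secondPosition := second position.
```

After the three characters have been read, firstPosition references 3 while secondPosition still references 0, and the string itself is unchanged. Each stream carries its own position, so the collection does not have to.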

Review Question 9 Which instance method should be overridden if you want to provide a particular printString for a class other than the default one? Explain your answer.


4 Saving objects

In previous sections you saw how to create text files from Smalltalk. In this section we show how to save objects in a text file. At first sight, this seems an unlikely thing to be able to do; after all, how can an object with all its associated state and behaviour be saved as text? This quandary is circumvented by showing how objects are saved as the Smalltalk expressions needed to ‘recreate’ them, that is to create objects of the same class and state. Later we discuss how objects are saved in binary files.

4.1 Objects saved as expressions

In this section we look at how Smalltalk expressions are generated from which objects can be recreated. We illustrate how this works by assuming that we have a Frog instance called sam that has position 2 and colour Brown. Such an object could be created as follows:

sam := Frog new.
sam position: 2; colour: ColorValue brown

or more succinctly as

sam := (Frog new) position: 2; colour: ColorValue brown

The expression

(Frog new) position: 2; colour: ColorValue brown

is the crux of how objects are stored in text files because it represents, in text, a means whereby the frog can be recreated. All we have to do to recreate the object is to assign this expression to a variable name in an evaluation pane and evaluate it.


Remember that colours are created using the class ColorValue. The global variables, Green, Brown, … we use in LearningWorks are just convenient references for the objects created by ColorValue green, ColorValue brown, … .



That describes the general principle, but it hides some tricky details. First, given an object such as sam, how can we create an expression by which it can be recreated to its current state? Second, even if we can obtain such an expression, it has to be expressed as a string before it can be appended to an instance of ExternalWriteStream and thereby be stored in a file. Third, given such a string has been recovered from a file, how can it be ‘evaluated’ to recreate the object?

The solution to the first two problems is provided by the message storeOn: which all objects inherit from Object. This message both generates the string and appends it to a stream. You will see the details shortly.

The third of the problems is solved by using the class Compiler. When an expression is evaluated in an evaluation pane it goes to a class called Compiler for processing. This class has in its class protocol the message evaluate: which takes a string, corresponding to an expression, as argument and executes it. So

expressionString := '(Frog new) position: 2; colour: ColorValue brown'

followed by evaluating the expression

aFrog := Compiler evaluate: expressionString

will create a frog called aFrog whose position is 2 and colour is Brown. The compiler evaluates the string as an expression, the result of which is a Frog instance, colour Brown positioned at 2, which is then assigned to aFrog.

So, let us now bring all this together and store the object sam, that is assumed to have the state given above, in a file called frog.obj.

frogFile := Filename named: 'frog.obj'.
frogStream := frogFile writeStream.
sam storeOn: frogStream.
frogStream close

We then read this file and reconstruct the frog to a variable called theFrog:

frogStream := frogFile readStream.
expressionString := frogStream contents.
theFrog := Compiler evaluate: expressionString.
frogStream close
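The same round trip can be tried without a file and without the class Frog. The sketch below assumes an internal WriteStream in place of the file and an OrderedCollection in place of sam; it relies on the storeOn: that every object inherits and on the Compiler class message evaluate: described above. The variable names are our own.

```smalltalk
| original aStream expressionString copy |
original := OrderedCollection new.
original add: 2; add: 'Brown'.
"Generate the expression that recreates the object, held in a stream on a string."
aStream := WriteStream on: (String new: 16).
original storeOn: aStream.
expressionString := aStream contents.
"Evaluating that expression builds a new object of the same class and state."
copy := Compiler evaluate: expressionString.
```

The recreated collection is equal to the original (copy = original answers true) but it is not the same object (copy == original answers false). This is what is meant by recreating an object of the same class and state.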

The storeOn: message would not in fact create the string used in the illustration above. Because storeOn: is inherited by all classes, it has to have a very general implementation which is beyond the scope of this course to explain. However, we can give you an idea of how it achieves its task by writing an implementation that could appear in the class Frog and which would produce the string used above. This implementation would then of course override that in Object. Be warned: we do this only to show you roughly how storeOn: works – our class Frog does not override this method. You may, if you wish, skip the explanation.

The technique for creating the string is similar to that used when overriding printOn:. Mostly we use nextPutAll: messages but when we need the state of an instance variable we use storeOn:. Study the code and then read the commentary that follows.

Remember that sam will almost certainly not be in its initial state when we want to store it.

In fact evaluate: is inherited from the class SmalltalkCompiler.



storeOn: aStream
    "Append to the argument aStream a sequence of characters that is an expression whose evaluation creates a frog that is similar to the receiver."
    aStream nextPutAll: '(', self class printString, ' new) '.
    aStream nextPutAll: 'position: '.
    self position storeOn: aStream.
    aStream nextPutAll: '; colour: '.
    self colour storeOn: aStream

The nextPutAll: messages should be self-explanatory, but take care with self position storeOn: aStream. The receiver of this storeOn: message is an integer, and so it is the storeOn: method defined in Object that is referred to here and not the one we are currently defining! So a textual representation of an integer is appended to aStream.

Review Question 10 What string would be generated by sam storeOn: aStream if previously the state of the sam object had been changed by the following?

sam position: 5; colour: Red

Note that the storeOn: message not only stores the object on the stream, but also stores any object that the object references through its instance variables. It also stores any objects that are referenced by the referenced object’s instance variables, and so on.

Unfortunately, the default storeOn: message does not give us the ability to store arbitrary objects, because it cannot deal with circular references. For example, imagine an object representing a dog that has an instance variable that references its owner. Also, imagine that the owner object has an instance variable that references the dog. Now imagine trying to store the dog object on the stream using the storeOn: message which the Dog and Owner classes inherit from Object. Not only does storeOn: attempt to store the dog, but it also attempts to store all the objects referenced by the dog – that is, it tries to store the owner object on the stream as well. But, in storing the owner, it tries to store the objects referenced by owner – that is, the dog! We are caught in a loop: the dog is stored, the owner is stored, the dog is stored, the owner is stored, the … . You could write a storeOn: method for dog and owner which breaks this circularity, but there is another way of storing objects, including those with circular references. LearningWorks provides the binary object streaming service (BOSS).

4.2 BOSS

BOSS, the binary object streaming service, has other advantages over storeOn:, besides being able to break circular references. Remember that storeOn: stores an object as the Smalltalk expressions needed to create an exactly similar object. This means that the Smalltalk environment importing the object must have a compiler to translate and evaluate the expressions. BOSS provides a means of passing objects between Smalltalk environments without having to compile expressions.

Smalltalk is often used to create specialised applications such as medical information packages or financial packages. Obviously, the creators of the Smalltalk system do not want to give away the whole system when, say, a hospital is sold a medical information package. This would be like a carpenter giving away all his tools when he sells a table.

So before an application is sold, all the Smalltalk classes which are of no direct use to the customer are stripped out of the system. This includes the compiler, inspector, and debugger classes.

But this leads to a problem if, as undoubtedly it will, the medical information package requires the capability of importing objects from files on hard disk. Objects, as you have seen, can be stored as the expressions needed to ‘recreate’ them. But to recreate the object means that the Smalltalk compiler must be present in the Smalltalk system being used.

Remember, Smalltalk expressions have to be translated by the Smalltalk compiler into byte codes before they can be evaluated. If objects are saved to disk as expressions, then Smalltalk applications that do not possess a compiler cannot import them. BOSS saves objects in a binary file so that a Smalltalk environment without a compiler can import the objects. The binary files produced by BOSS are, unlike their textual counterparts, small and quick to read from and write to. Because BOSS is very efficient, it is used even if the Smalltalk environment does have a compiler.

An example of a group of objects saved to disk using BOSS is a LearningBook.

BOSS is quite complex, and we have discussed the main points of storing objects using the simpler storeOn: message, so we do not discuss BOSS in any detail. Just remember that any time you save a LearningBook you are using BOSS.

4.3 Filing out and filing in classes

BOSS is used to save objects to a file in a binary format, but classes are saved as text files which contain the full details of the class – class name, class comment, variable names, method names, and the code in the method bodies. This process is called the file out mechanism. In a similar vein, there is a file in mechanism for reading the classes into a Smalltalk environment and recompiling them. More details are given in LearningBook LB-49, Session 4.

Note that in saving a LearningBook, as well as BOSSing out objects, classes are filed out. The classes and objects are combined in one LearningBook file.


Review Question 11 Compare printOn: and storeOn:.




Review

In this chapter we have introduced you to the following concepts.

> The notion of saving information on a non-volatile medium so that it can persist unchanged.

> The way in which text files can be used to represent other non-textual information, particularly objects.

> Text and binary files and how they can be created from a Smalltalk environment.

> How to save objects and characters to hard disk from a Smalltalk environment, and read them again.

Objectives

After studying this chapter you should be able to:

> write software to create and read text files on disk;

> write software to create and read files on disk that allow you to store information that represents objects;

> understand what it means to store an object on disk;

> understand and explain in general terms the use of both internal and external streams;

> change the print-string for a class.



Solutions to Exercises

Exercise 1 (a) Executable applications are binary files: they are not meant to be read by humans and their contents have to be efficiently loaded into RAM for execution on a particular computer system.

(b) Information files are meant to be read, usually by the simplest of text editors or word processors and so (as their extension suggests) they are text files, usually plain text files.

(c) Because configuration files usually need to be easily changed and are meant to be read they are (plain) text files.

(d) As the term ‘bit map’ suggests, bit map picture files are binary files.

Exercise 2 .lw is used for LearningWorks files and so the likelihood is that the file has something to do with the Smalltalk system.

Exercise 3 The classes are only loaded when the corresponding LearningBook is opened. When LearningBook LB-22 is opened, Frog, HoverFrog and Toad inherit from the abstract superclass Amphibian. When LearningBook LB-13 is opened Frog, HoverFrog and Toad inherit from the superclass Object.

Exercise 4 The restriction is imposed to avoid the problems of attempting to have possibly two different versions of a class loaded at once and of knowing which LearningBooks to update if a class is changed.

Exercise 5 The following will display the contents of course.dfn.

|defnFile contents|
defnFile := Filename named: 'course.dfn'.
contents := defnFile contentsOfEntireFile.
Dialog warn: contents

Alternatively, without any variables:

Dialog warn: (Filename named: 'course.dfn') contentsOfEntireFile

Exercise 6 The following code checks that the file exists and deletes it if it does.

|tmaFile|
tmaFile := Filename named: 'tmas.txt'.
(tmaFile exists)
    ifTrue: [tmaFile delete. Dialog warn: 'File deleted']
    ifFalse: [Dialog warn: 'File does not exist, so nothing deleted']

Exercise 7 Again the Boolean condition block executes (because aLine is the string 'Milton Keynes') but this time we are at the end of the stream and so upTo: $* answers with everything from the current position to the end of the stream; that is, it answers with the empty string. This is displayed, the Boolean condition block now evaluates false and the iteration ceases.



Exercise 8 We have used two temporary variables to do the required calculations. The first line obtains the position of the end of the stream. The second line then calculates how many addresses there must be. Note how it is necessary to reset the stream so that we start reading from its beginning.

lastPosition := addressesReadStream setToEnd position.
numberOfAddresses := lastPosition / 50.
addressesReadStream reset.
display clear.
numberOfAddresses timesRepeat: [
    anAddress := addressesReadStream next: 50.
    display show: anAddress; cr]

Exercise 9 (a) You would need to search the hierarchy starting from ExternalReadStream working up the hierarchy until you found the method.

(b) The method compares its argument with the next character in the stream (that is, the one after the position given by position). If they match it answers true otherwise it answers false.

(c) We have used the message atEnd to detect the end of the stream. Note how, when a match is found, we have to move back one position because nextMatchFor: has moved us forward one position. When there is not a match we simply skip 49 positions, which when added to the one generated by nextMatchFor: moves us forward 50 positions.

addressesReadStream reset.
display clear.
[addressesReadStream atEnd] whileFalse: [
    (addressesReadStream nextMatchFor: $A)
        ifTrue: [
            addressesReadStream skip: -1.
            anAddress := addressesReadStream next: 50.
            display show: anAddress; cr]
        ifFalse: [addressesReadStream skip: 49]].
addressesReadStream close

Exercise 10 The version of printOn: for DietingFrog is given below. It is based on the version used by Frog. Expressions to add the string 'weight' and the value of the instance variable weight need to be added; they are the third- and second-last expressions respectively. Nothing else needs to be changed.

printOn: aStream
    "Create a description of the receiver on aStream (an
    instance of Stream). Answer the receiver."
    aStream nextPutAll: 'An instance of class '.
    aStream nextPutAll: self class printString.
    aStream nextPutAll: ' (position '.
    self position printOn: aStream.
    aStream nextPutAll: ', colour '.
    self colour printOn: aStream.
    aStream nextPutAll: ', weight '.
    self weight printOn: aStream.
    aStream nextPutAll: ')'



Solutions to Revision Questions

Revision Question 1 You will recall that when you close a LearningBook in the LearningWorks environment you are prompted by a dialogue box asking you if you want to save that book or not. You generally save a LearningBook if you have developed classes of your own, or have created objects, in that book.

Revision Question 2 OpenWord is a simple word processor from which you can save text in a plain text format, or in a special format that maintains style information. In the former case you can open the file in any other word processor, or many other applications, because it is a basic text file which has no formatting information special to OpenWord. In the latter case you can only open the file in OpenWord because only OpenWord can read files that have used the styles available in OpenWord.

Revision Question 3 A database of information about yourself could contain anything from your name to your medical record, from your address to your CV. It is to be hoped, though, that no one is contravening your data protection rights by keeping such databases on you.

Solutions to Review Questions

Review Question 1 All files consist of a sequence of bytes. Text files contain bytes encoded to represent characters and their content is intended to be read by humans.

Binary files consist of bytes which are not intended to represent characters; the fact that they can be interpreted as characters is irrelevant. Binary files are intended to be readable only by a program, or run directly on a computer. They are structured in a way that is easy for particular software or a particular computer to read.

Review Question 2 The first column is the name, the second the purpose, the third whether the file is text or binary, and the fourth whether the file is executable or not.

(a) a.im image binary document

(b) a.sou sources text document

(c) a.lw LearningBook binary document

(d) a.exe engine binary executable

Review Question 3 Only the class Filename is used. A stream class is not needed.

Review Question 4 The expression series creates a file containing the following:

Information
-----------
When you arrive please proceed to Reception.

Notice how the number of hyphens used to underline the word Information was obtained by using size to determine the length of that string and using the result as the first argument to next:put:.



Review Question 5 We have included a test to see if the file is empty; if so a message is output to say that there is no last line. The external file is assumed to have the name m206.txt.

m206File := Filename named: 'm206.txt'.
m206Stream := m206File readStream.
(m206Stream isEmpty)
    ifTrue: [lastline := 'The file is empty so there is no last line']
    ifFalse: [
        lastline := m206Stream upTo: (Character cr).
        [m206Stream atEnd] whileFalse: [lastline := m206Stream upTo: (Character cr)]].
display show: lastline

Review Question 6 (a) position, collection.

(b) ExternalReadStream, ExternalWriteStream.

Review Question 7 No. An ExternalWriteStream object uses nextPutAll: to provide strings to be sent to the hard disk, but they do not get saved as string objects. They are saved on the hard disk in text files. Text files are not composed of objects. String objects, as well as possessing state (like 'Peace in our time'), also possess a number of messages that act on that state. Neither characters nor strings in a text file contain messages. However, you might say that if you could read in a part of a text file and give it to a string object, then you can claim that the text file is as good as having a persistent string object. This is a reasonable viewpoint, which is why ExternalWriteStream is often used to create text and another class, ExternalReadStream, is often used to read it back in again and create string objects from the text.

Review Question 8 No, it is an abstract class and therefore none of its methods need to have a concrete implementation. Nor does it need to define any concrete instance variables.

Review Question 9 You must override printOn:. The message printString is understood by all objects as it is inherited from Object. The message printString sends printOn: with a stream argument to its receiver. It is printOn: that generates the print-string as a stream of characters describing the receiver, although printString answers with the contents of this stream as a string.

Review Question 10 sam position: 5; colour: Red; storeOn: aStream would write a stream whose contents would be:

(Frog new) position: 5; colour: ColorValue red

Review Question 11 The message printOn: is used by printString (from the class Object) to generate the string returned by printString. The message storeOn: is used to generate a series of expressions needed to create an object of the same class and state as the receiver.



Glossary

ASCII All characters typed on a keyboard are essentially numbers. Whenever you press a key it transmits a number to the microprocessor. Characters you can type on a keyboard that can be part of a text file are represented by ASCII (American Standard Code for Information Interchange) numbers.

binary file This consists of a sequence of bytes; not all of these bytes need represent ASCII characters. Binary files are readable only by a program, or run directly on a computer. They are compressed or structured in a way that is easy for a particular program or machine to read.

BOSS This stands for Binary Object Streaming Service; it allows you to save objects in a binary file.

external stream This consists of a character string, a position indicator and a reference to the file on disk. Messages are provided that allow characters beginning at the current position in the collection to be accessed (read) or added/replaced (written) while simultaneously moving the position indicator.

file A distinct and identifiable piece of stored information on a hard disk, or other persistent medium. All files have a name.

pattern recognition message a message that will match strings according to a pattern where the character * can represent zero or more characters; for example, 'A*B' would match 'AHOHHOHB' and 'AB'.

persistence The ability of objects to continue existence outside the programming environment.

position In relation to streams, this is the place at which an object can be extracted or placed into a stream.

stream An object used whenever streams of other objects need to be sent somewhere – to a printer, file, network, and so on.

This consists of a sequenceable collection of objects and a position indicator. Messages are provided that allow objects beginning at the current position in the stream to be accessed (read) or added/replaced (written) while simultaneously moving the position indicator.



Appendix of protocols

The parts of the protocols studied in this chapter are listed here together for your convenience.

Filename

contentsOfEntireFile – answer with a string that is the contents of the file represented by the receiver.

delete – delete the file or directory represented by the receiver.

directoryContents – answer with an array of strings that are the file names of the directory represented by the receiver.

exists – answer with true if the file represented by the receiver exists.

fileSize – answer with the number of bytes in the file represented by the receiver.

makeDirectory – create in the current directory a new directory represented by the receiver.

readStream – answer with a new stream, an instance of ExternalReadStream, connected to the receiver file, which already exists, such that bytes can be read from the file; the file is read only, it cannot be written to.

writeStream – answer with a new stream, an instance of ExternalWriteStream, connected to the receiver file, which is newly created, such that bytes can be added to the end of the file; the file is write only, it cannot be read.

ExternalWriteStream

cr – store a carriage return in the stream.

nextPutAll: – store the string argument in the stream.

nextPut: – store the character argument in the stream.

next:put: – store the second argument the number of times given by the first argument.

space – store a space character in the stream.

tab – store a tab in the stream.

ExternalReadStream

atEnd – answer true or false depending on whether or not position is at the end of the stream.

contents – answer with the entire contents of the stream.

do: aBlock – evaluate the argument for each of the remaining characters after position.

isEmpty – answer true or false depending on whether or not the stream is empty.

next – answer with the next character in the stream.

next: anInteger – answer with the next anInteger characters in the stream.

nextMatchFor: aCharacter – answer true or false depending upon whether or not the next character is equal to aCharacter. Note that if a match is found this message increments position by 1, moving beyond the match.



upTo: aCharacter – answer with a string from position + 1 to the occurrence of aCharacter. Reset position to the position of aCharacter.

reset – reset position to 0.

setToEnd – reset position to the end of the stream.

skip: anInteger – set position to position + anInteger.

skipToAll: aString – skip forward to the beginning of the next occurrence (if any) of aString.






CHAPTER 50

The Internet, Security and Network Computing

Contents

Introduction
1 Local area networks
2 Communication systems and methods
3 Local area network communications
4 The Internet
5 Other networks
5.1 A dedicated network
5.2 Servers
5.3 Commercial transactions on the Internet
6 Security
Review
Solutions to Exercises
Solutions to Questions
Glossary
Index

COMPUTING: AN OBJECT-ORIENTED APPROACH


Concepts revisited

This chapter assumes that you have practical experience of using the Internet. Ideas and concepts are revisited from Chapter 3 Using the Networks (and hence Chapter 8 of the Course Book) and from various TV programmes on network computing.

The following ideas and concepts are discussed further in this chapter.

> Protocols are languages computers use to communicate with other computers and with devices such as printers and modems.

> Electronic data can be encrypted. Public key encryption provides a secure method of sending messages over the Internet.

> A firewall is the software that keeps unauthorised users or intruders outside a network.

> A server is the software on a central computer (the ‘server’) that manages and provides services to a network of users. Client software is installed on each user’s computer for a particular network function. (The distinction between the software and hardware is often blurred in the use of the words server and client.)

> The Internet can be crucial to collaboration between companies that work on the same project. They use the Internet for daily communication and software exchange.

> Email is electronic correspondence. An email message is transmitted from the sender’s computer and stored on a network server (a computer). It is forwarded either to another computer on the same network or to a remote computer via the Internet.

> A commercial information service (also called an online service), such as Which? Online and AOL, allows access to some Internet services as well as the enterprise’s own private services.

Try answering the following questions to test your understanding.

Revision Question 1 Name some of the different ways by which you could communicate information via a network to another person. Consider the various advantages and disadvantages of each method; for example, its efficiency.

Revision Question 2 What can you do to ensure that if someone accidentally intercepts an email message from you only the intended recipient can read it?

Revision Question 3 For you to be able to send an email from your computer to someone using another computer, do the two computers need to be connected physically; for example, by a continuous piece of wire?




New concepts

The following ideas are revisited or introduced.

> The Internet works by breaking up messages into small pieces (‘packets’) and transmitting these independently via a route that is rarely known to either the sender or the receiver.

> A standard protocol suite (TCP/IP) is used by the Internet and also by many other networks.

> Malicious computing involves people intentionally breaking into computer systems to damage or steal data. It can also involve people writing programs such as viruses, Trojan horses and worms as a prank or with the aim of causing damage to systems.

> Connecting computers together requires the encoding of digital signals, transmitting them over some medium and then decoding them at the other end. The choice of medium and the ‘noise’ affect the range of frequencies and maximum volume that a communications ‘channel’ can carry.

> Creating a local area network involves connecting computers via network interface cards and cabling, and choosing a network standard (a set of communications protocols) for all the devices to use.

Planning your study

This printed text gives course-specific context for the readings in the Course Book. We expect this printed text and the associated Course Book work to take about 4 hours’ study time.

If your computer is on a network, you may want to try and find out some details about the network.

Section 4 directs you to a practical exercise available on the chapter’s Web page on the M206 Web site. Allow about 1 hour for this work – your connected time should be much less.

Introduction

The main purpose of this chapter is to draw together the various strands of network computing you have met already through practical use and through studying previous chapters. Five aspects of network computing are covered here:

> local area networks (LANs);
> communications technology;
> the Internet;
> network applications;
> security.

Throughout the course you have been taking advantage of a number of services available ‘on’ the Internet (such as email, conferencing and the Web). To do this you connect your computer temporarily to the Internet and have it send information and requests to other computers. In this chapter we examine how the Internet works, a little of its history (at least what is generally accepted), and some of the risks associated with sharing data, applications and computers with people you do not know or trust. We also consider what is often described as ‘protecting your computer’. In fact you are protecting your most valuable electronic assets: your data. You can always replace your computer, your applications and your operating system if they become damaged or corrupted, but the information that you have acquired cannot be replaced except by retyping the data or reverting to an archived copy.

Obtaining information generated by others is one of the primary reasons for using a computer, whether it takes the form of documents which you might add to and pass on, or from services which you access via the Internet such as email. However, once you connect to the rest of the world via an ‘open’ system like the Internet, you could become prey to malicious computing. Malicious computing involves people targeting you as an individual, or indiscriminately choosing a wider population, with the intention of stealing or damaging data. Just as you could protect yourself from catching a cold by becoming a hermit and so avoiding human contact, you could protect your computer – your data and software – from attack by not allowing it to be connected to any other computer either directly, or indirectly via disks and CD-ROMs. However, this would be an extreme measure and would limit the use of your computer. The power of computer systems is most evident when it is used to multiply the efforts of an individual by the efforts of many others. So connecting your computer to those of others is necessary. By understanding the risks and how to minimise them, you will be able to exchange data and connect to other computers with confidence.

On this course you have accessed the Internet and the course conferences from your office or home, perhaps using a corporate or third party Internet service provider (ISP). The experience of setting up your connection to the Internet may have been tedious and the performance may be slower than you would like. If you spend a lot of time every day working with the same group of people in the same building, then this connection process would probably not have been worthwhile and you would more likely have connected your computers permanently in your own private local area network (LAN). In the last part of this chapter we examine the use of the Internet for commercial activities where a fast and secure response is essential.

1 Local area networks

In setting up your computer, you probably had to learn about devices, such as a printer and a modem, that are attached directly to your computer. In this chapter you will learn about adding devices that you cannot see. We shall consider the hardware and software that enables you to connect a computer to a network, and the devices and services that this can make available.



You should now read Sections A and B of Chapter 7 (Local Area Networks and Network Hardware) and User Focus: E-mail in the Course Book, returning to this printed text at this point when you have finished.

Review Question 1 What is the difference between a LAN and a WAN?

Review Question 2 Name two of the services a LAN can provide.

Review Question 3 Why is it important for two users not to be able to modify the same file at the same time?

Some additional points to note from the Course Book readings are listed below.

> The Internet has added a new dimension to the way people think about LANs, since Internet technology can make any computer anywhere in the world appear as if it is on your local network.

> If the protocols of the Internet are applied to a LAN, the LAN can offer Web services. Such a LAN is called an intranet. It works just like the Internet and, as far as a user is concerned, feels just like the Internet. However, its use is restricted to within an organisation.

> An extranet is a private, secure extension of a corporate intranet that allows business partners to use standard Internet technology to set up secure data communications over the public Internet.

> In Section B of Chapter 7 of the Course Book the different types of server are described. They all provide a service, hence the name, although sometimes it is the hardware that is referred to as the server and sometimes it is the software. This was mentioned in Chapter 3 where you first met the term Web server to mean server software that responds to requests for Web pages.

> A method of connecting two computers not considered in Section B of Chapter 7 of the Course Book, but covered at the end of Chapter 8 (which you studied in Chapter 3), is dial-up access via a modem. You probably connect your modem to the remote access server of an Internet service provider (ISP) through a protocol called point to point protocol (PPP). Once the connection is established, your computer is attached to your ISP’s local network which has a connection to the Internet. Subject to security or access constraints, you can then access any Internet resource.

Exercise 1 Suppose that a large database is stored on an application server. What would be the advantage of having the application server search for information as opposed to having your workstation do the search?


Review Question 4 Name two network standards used for LANs.

Review Question 5 What is the advantage of a network with a dedicated server as opposed to one with a server having peer-to-peer capability?


2 Communication systems and methods

In the same way that files cannot simply be placed on a hard disk, but have to be encoded and organised into sectors and tracks, data cannot simply be sent along a wire or other communications medium. It must be encoded as binary code in a signal which a receiver must subsequently decode. Furthermore, because the communications medium may introduce errors into the transmitted signal, and because a receiver may need to be told what to do with the information in the signal, communications protocols must be used. Such protocols, often ‘layered’ one on top of another, are at the heart of LANs, WANs and the Internet. Indeed, later in this chapter we consider in some detail the Internet’s TCP/IP protocol suite and the involvement of layers of protocol.
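As a rough illustration of why a protocol has to add information beyond the data itself, the following Python sketch encodes a short text as bits and appends a single even-parity bit so that the receiver can detect a one-bit transmission error. Parity is used here only because it is the simplest error check; it is not claimed to be the scheme used by any particular network standard, and the function names are invented for this example.

```python
def to_bits(text):
    """Encode text as a flat list of bits, eight per byte (Latin-1 assumed)."""
    bits = []
    for byte in text.encode("latin-1"):
        # Most significant bit first.
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def add_parity(bits):
    """Append one even-parity bit so the frame holds an even number of 1s."""
    return bits + [sum(bits) % 2]


def parity_ok(frame):
    """Receiver-side check: an odd count of 1s means the frame was damaged."""
    return sum(frame) % 2 == 0


frame = add_parity(to_bits("Hi"))

# Simulate a noisy channel by flipping one bit in transit.
noisy = frame.copy()
noisy[3] ^= 1

print("clean frame accepted:", parity_ok(frame))
print("noisy frame accepted:", parity_ok(noisy))
```

A single parity bit cannot say which bit was damaged, and it misses errors that flip an even number of bits, which is one reason practical protocols use stronger checks and request retransmission.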

You should now read Sections A, B and C (up to, but not including the subsection entitled ‘Ethernet and Token Ring LANS’) of Chapter 11 in the Course Book, returning to this printed text at this point when you have finished.

A few points arising from these readings are listed below.

> Leased lines come in a variety of bandwidths starting at 64 Kbps over ordinary copper wire (telephone) cable, with 128 Kbps and above requiring fibre-optic cable. It may be possible for an ISP to lease a business multiple copper wire feed to avoid the expense of laying fibre-optic cable.

> ADSL (Asymmetric Digital Subscriber Line) is a technology for transmitting digital data at high speeds using an ADSL modem with the existing copper wire telephone lines. ADSL works by dividing the available bandwidth of the copper wire telephone lines into three different frequency ranges, known as carriers. One of these carriers is reserved for Plain Old Telephone Services (POTS), a second is used for the downloading of data, while a third is used for the uploading of data. This means that you can still make and receive phone calls while your computer is attached to the Internet. ADSL is asymmetric in that it is specifically designed to exploit the one-way nature of most multimedia communication in which large amounts of information flow towards the user and only a small amount of interactive control information is returned. Using this technology download speeds of 8Mbps and upload speeds of 1Mbps are theoretically possible.

However in the UK, ISPs typically offer download speeds of 512Kbps, 1Mbps or 2Mbps (depending on how much you pay) all with upload speeds of 256Kbps. Since the majority of traffic is downloading rather than uploading, this difference between upload and download speeds is usually insignificant. ADSL connections are only available in areas of the UK where British Telecom has enabled the local exchange and, until recently, only to homes up to around 3.5km from the exchange (the connection deteriorates progressively with distance from the exchange). A recent advance in the technology – Rate Adaptive Digital Subscriber Line (RADSL) – has increased the range of ADSL to 5.5 km.

ADSL is a contended service: this means that the connection to a particular ISP from a BT exchange is shared with a maximum number of other local customers of that ISP (called the contention ratio). Most cheaper ADSL services are contended at a ratio of 50:1 (more expensive/business packages have a contention ratio of 20:1). This means that from the local exchange, a customer of a particular ISP will be sharing the bandwidth of the connection to the ISP with up to 49 other customers. Thus the performance of an ADSL connection will vary according to time of day and day of the week, depending on how many contended users happen to be online at that moment. However, it is very unlikely that all 50 people that are contended would be online at the same time, and even then it’s likely that most would be simply downloading email and browsing the web which takes up relatively little bandwidth. Only if all 50 users were using all of their available bandwidth (for example downloading very large files) would they notice a speed difference.

Mbps is shorthand for megabits per second. In computing mega- is 1,048,576, not 1,000,000 as in mathematics. The symbol is capitalised in both contexts. There is a distinction in the symbols for kilo-, where k is used in mathematics and is 1000, whereas K is used in computing and is 1024 (2^10).

> Cable companies can now offer customers who have cable TV a high-speed connection to the Internet. This high speed access is provided by means of a cable modem at the user end which is attached to the company’s TV cable. You might think that a television channel would take up quite a bit of bandwidth on the cable, but the coaxial cable used to carry cable television signals can carry hundreds of megahertz of signals – all the channels you could want to watch and more – so there is still plenty of spare capacity to provide access to the Internet. Although this technology can theoretically provide access speeds of up to 30Mbps, UK cable companies typically offer download speeds of 512Kbps, 1Mbps or 2Mbps (depending on how much you pay). As with ADSL the service is asymmetric, upload speeds are lower – typically 128 or 256Kbps. These download and upload speeds are maximum speeds and are not guaranteed. This is because just like ADSL, cable is a contended service, but in a different manner to ADSL. Because the cable industry’s network has a bus topology, a number of households in the same area share the bandwidth of the same physical piece of cable. So the number of simultaneous users could affect actual speeds. Most cable services have a contention ratio of between 15:1 and 20:1.
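The effect of a contention ratio can be made concrete with a little arithmetic. The Python sketch below assumes, purely for illustration, that the shared connection has the same capacity as one customer's headline speed and that it is divided equally among everyone who is downloading at full rate; real services are provisioned differently, and the function name is invented for this example.

```python
def share_kbps(line_speed_kbps, active_users):
    """Equal share of a contended connection among the active users.

    Assumption for illustration only: the shared connection carries
    line_speed_kbps in total and is split evenly.
    """
    if active_users < 1:
        raise ValueError("at least one active user is needed")
    return line_speed_kbps / active_users


# Contention ratios quoted in the text: 50:1 and 20:1.
for label, speed, ratio in [
    ("cheaper ADSL", 512, 50),
    ("business ADSL", 512, 20),
    ("cable", 512, 20),
]:
    worst = share_kbps(speed, ratio)   # every contended user busy at once
    typical = share_kbps(speed, 2)     # only two users busy at once
    print(f"{label}: worst case {worst:.2f} Kbps, two busy users {typical:.2f} Kbps")
```

The worst case arises only when every contended user saturates the connection at the same moment, which the text notes is very unlikely in practice.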

Review Question 6 Name two common media found in most houses that could be used by a computer to receive data.

Review Question 7 Describe Shannon’s communication system model, and how it applies to:

(a) a voice telephone conversation;

(b) a computer connected to the Internet via a modem, where the Internet access is obtained from an Internet service provider.

Review Question 8 Describe packet switching. Can you predict through which countries data will travel, or through which countries part of a message will travel? What implications for regulation and policing of the Internet result from the way packets are routed?

Review Question 9 What is the difference between a communications channel and a communications medium? Name three common media for computer networks, and rank them from lowest to highest bandwidth.


3 Local area network communications

So far you have learnt about the media that are involved in computer communications, about some of the hardware required, and about dealing with ‘noisy’ communications channels. In this section we shall go one step further and consider the concepts needed when designing a network. We return to the subject of LANs prior to considering these ideas as they are used in the Internet.

You should now read the remaining pages of Section C of Chapter 11 in the Course Book, the subsection labelled ‘Ethernet and Token Ring LANS’, returning to this printed text at this point when you have finished.

A few points from the reading are discussed below.

> The computers of the Internet adhere to a common set of protocols (rules). The most important protocol suite that allows networks to cooperate with one another and exchange information is TCP/IP. This is a packet-switching protocol suite.

> An intranet is the name given to a network based on the TCP/IP protocol suite, accessible only by the organisation’s members and others with authorisation. The intranet’s Web sites will look and act just like any Internet Web site.

> The basic components of an extranet (an extension of an intranet) are a constant Internet connection via a router, an HTTP server, a firewall and the essential data and files.

> LANs can be based on a variety of technologies, but as long as the software layers that access the network hardware implement the TCP/IP protocol suite, a LAN can be widened to exchange information with the Internet.

Review Question 10 What hardware components are required to create a LAN? What software components are required?

Review Question 11 Name the two most popular local network methods. Describe how each works and discuss the relative advantages and disadvantages of each.

Review Question 12 Which network protocol suite is the basis for communications on the Internet?

Exercise 2 Once a LAN is operational how might the choice of TCP/IP as the network communications protocol suite affect how the network could connect to other networks?

4 The Internet

We now briefly review how packets are routed through the Internet. This is a prelude to the practical work you will carry out when you are connected to the Internet.

Two characteristics of the Internet that provide its robustness and the way it operates are listed below.

> There is no central controlling computer – each computer has essentially the same authority.

> It should be possible to deliver information between any two computers, even if some of the intervening computers have failed. In other words, there must be a number of routes between any two computers. It is therefore acceptable for data to travel by a non-direct route to avoid problematic links.

TCP stands for transmission control protocol. IP stands for Internet protocol.

Your Course Book reading in Chapter 3 included Chapter 8 (The Internet).

Figure 1 depicts a simple Internet-style network of six linked computers. Each computer is linked to at least two other computers. Each of the computers (and any other device such as a printer) is known as a node. None of the computers need be close to any other: they might be in the same building or in different cities.


Figure 1 A simple Internet-style network

The Internet itself is made of several hierarchies of network. There is a central backbone which is dedicated to moving information at very high speeds across large distances; below this level are various slower networks that serve smaller geographical areas and are themselves ‘parents’ to local networks such as those in offices and universities. However, all these networks use a standard suite of communications protocols so that information from one network can be passed across to another network.

Consider the case of email on the Internet. In Figure 2 and Figure 3, person A (bottom left) sends a message to person B (top left). The message is split up into small units called packets. These are labelled with the addresses of the sender and the recipient, and with information telling the recipient’s computer how to reassemble the message from the packets. Packet switching allows multiple users to send information through the same physical network.


Figure 2 A route through the network

See Figure 8-3 in the Course Book.

Email in FirstClass does not work this way. FirstClass is a local network system. It is a closed system in that it only knows about a specific group of users. It can be accessed from the Internet via a ‘gateway’ and has been set up at the OU so that mail can be sent to and received from users not registered on FirstClass.


Figure 3 An alternative route through the network

In Figure 2, the packets are shown following the most direct route between the two computers, but they travel through a third computer which has nothing to do with A or B. Not all the packets of the message need travel by the same route; indeed, it is quite usual for each to take a different path through the network. However, they should all eventually reach their destination where they are recombined into the original message. These are just a couple of the possible routes a packet could take through this simple network. Any route that begins with the sender and ends with the recipient could be used.
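The splitting and recombining of a message can be sketched in a few lines of Python. This is only an illustration of the idea, not of how TCP/IP is actually implemented: the `Packet` class, its field names and the packet size of eight characters are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    sender: str      # address of the sending computer
    recipient: str   # address of the receiving computer
    seq: int         # position of this packet within the message
    total: int       # how many packets make up the whole message
    payload: str     # this packet's share of the message text


def split_message(sender, recipient, message, size=8):
    """Cut a message into packets of at most `size` characters."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [Packet(sender, recipient, n, len(chunks), chunk)
            for n, chunk in enumerate(chunks)]


def reassemble(packets):
    """Rebuild the message, whatever order the packets arrived in."""
    ordered = sorted(packets, key=lambda p: p.seq)
    if not ordered or len(ordered) != ordered[0].total:
        raise ValueError("some packets are missing")
    return "".join(p.payload for p in ordered)


packets = split_message("A", "B", "Meet me at the airport at noon.")
random.shuffle(packets)   # packets may take different routes and arrive out of order
print(reassemble(packets))
```

Because each packet carries its own sequence number, the recipient does not care which route any individual packet took or in what order the packets arrived.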

The route chosen is determined by ‘name servers’ and ‘routers’. Name servers, as the term suggests, determine if the destination address of a packet is valid. The addresses used (such as www.open.ac.uk) are chosen purely for convenience. Computers use numeric IP addresses (also known as IP numbers) for communication, so the name server translates the name typed in by the user into a suitable IP address. For example, consider a packet being sent to Bob who works in the coffee bar at Big University in America. Bob’s address is [email protected]. Read from right to left, the address is ordered from the most general part to the most specific. First, the name server on the sender’s computer makes a request across the Internet to the computer that holds the addresses of all American universities (which all use the .edu domain name at the end of their addresses), asking for the IP address of big.edu. Assuming that big.edu exists, the name server for the .edu domain responds with the IP address for the name server at Big University. The sender’s machine then makes a link to the name server at big.edu and requests the IP address of the coffee shop computer used by Bob. The big.edu name server will then respond with the IP address of the coffee shop computer. The packets can then all be addressed correctly and sent onto the network.

Although this may seem complicated and inefficient, most name servers store frequently used addresses on the sender’s computer and so may not have to make requests from the .edu and big.edu name servers.
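The two-step lookup and the effect of storing addresses locally can be sketched as follows. This is a simplified model rather than real DNS software: the host name `coffee.big.edu`, the numeric addresses (taken from a range reserved for documentation) and the dictionaries standing in for remote name servers are all invented for the illustration.

```python
# Hypothetical data: the names and numeric addresses below are invented.
EDU_SERVER = {"big.edu": "192.0.2.1"}    # .edu server: domain -> its name server
DOMAIN_SERVERS = {
    "192.0.2.1": {"coffee.big.edu": "192.0.2.77"},   # big.edu server: host -> IP
}

cache = {}          # addresses the sender has already looked up
requests_made = []  # record of remote requests, kept for illustration


def resolve(host):
    """Return the IP address for `host`, asking remote servers only if needed."""
    if host in cache:
        return cache[host]                      # no remote request required
    domain = ".".join(host.split(".")[-2:])     # for example "big.edu"
    requests_made.append(".edu server")
    domain_server = EDU_SERVER.get(domain)
    if domain_server is None:
        raise LookupError(f"no such domain: {domain}")
    requests_made.append(domain + " server")
    address = DOMAIN_SERVERS[domain_server].get(host)
    if address is None:
        raise LookupError(f"no such host: {host}")
    cache[host] = address
    return address


print(resolve("coffee.big.edu"))   # needs two remote requests
print(resolve("coffee.big.edu"))   # answered from the local store
print(len(requests_made))          # 2
```

The second call returns the same address without adding to `requests_made`, which is the saving described above.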

Routers are used to direct packets across the networks. Each network has at least one router which is connected to two networks – the local one and an adjoining one. A router first determines whether or not the message is being sent to another computer on its own network and, if it is, the router sends it straight to the destination computer. If the packet is being sent to another network, the router must examine the destination address of the packet to work out what to do next.

The term name server is often written as one word – nameserver. The full name is domain name server or DNS server, where DNS is the abbreviation for domain name system. You may see the abbreviation DNS in error messages if your ISP’s name server malfunctions. The domain name system is the naming scheme that maps domain names onto ‘dotted’ decimal IP addresses.



A router stores the IP addresses of known computers in a special structure known as the routing table, which not only contains the routes to various sites on the Internet, but also information about the speed and status of the routes. The routing table is maintained by the router computer and is constantly updated to ensure that it holds the most frequently used addresses, and information about changes in the external network.

If the address of a packet appears in the routing table, the router allocates the full path – that is, all the nodes it will traverse from the source to the destination – and places the packet on the network for delivery. The routers are programmed to pass the packet to another router at a higher level in the network if the destination address is not in the table. This higher level router will then consult its routing table for the destination, and if the address is there, the packet is sent on; if it is not, the router sends the packet to an even higher level router.
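The idea of passing a packet up the hierarchy until some router recognises the destination can be sketched as below. The `Router` class, the router names and the addresses are hypothetical, and real routing tables also hold the speed and status information mentioned earlier, which this sketch omits.

```python
class Router:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent     # next router up the hierarchy, if any
        self.table = {}          # destination address -> route (list of nodes)

    def forward(self, destination, consulted=None):
        """Return (route, routers consulted) for a destination address."""
        consulted = (consulted or []) + [self.name]
        if destination in self.table:
            return self.table[destination], consulted
        if self.parent is None:
            raise LookupError(f"no route to {destination}")
        # Not in this table: hand the packet to the higher level router.
        return self.parent.forward(destination, consulted)


backbone = Router("backbone")
regional = Router("regional", parent=backbone)
office = Router("office", parent=regional)

office.table["10.0.0.5"] = ["office", "10.0.0.5"]
backbone.table["198.51.100.9"] = ["backbone", "far-regional", "198.51.100.9"]

print(office.forward("10.0.0.5"))       # found in the local table
print(office.forward("198.51.100.9"))   # passed up twice before a route is found
```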

If, for any reason, packets cannot be passed from one computer to another, they can either be stored or passed to another router by means of a different path in the network. If the packets are stored, the original router will make an attempt later to send the information.

Packets have an expiry time beyond which no further attempts are made to transmit them. When the packets expire, they are returned to the sender along with an explanation of why the message was not delivered – for example, that the destination computer had failed or that the given destination address was incorrect.

The power of the Internet lies in its ability to reroute packets. They can be rerouted either because a link has failed or to relieve congestion on a part of the path. As Figure 4 shows, it is still possible for A to send to B despite the loss of one of the computers in this network. In the case of the Internet, many of the nodes can be out of service with minimal disruption to traffic.


Figure 4 A possible route through the network when a computer has failed
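Rerouting around a failed computer can be illustrated with a small search program. The layout of links below is an assumption (it is not copied from the figures); it simply gives six nodes, each linked to at least two others. The search used here is a breadth-first search, chosen for simplicity; it is not a description of how real Internet routers choose paths.

```python
from collections import deque

# Hypothetical six-node network; every node has at least two links.
LINKS = {
    "A": {"C", "D"},
    "B": {"C", "E"},
    "C": {"A", "B", "F"},
    "D": {"A", "E", "F"},
    "E": {"B", "D", "F"},
    "F": {"C", "D", "E"},
}


def find_route(start, goal, failed=frozenset()):
    """Breadth-first search for a shortest route that avoids failed nodes."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        route = queue.popleft()
        if route[-1] == goal:
            return route
        for neighbour in sorted(LINKS[route[-1]]):
            if neighbour not in visited and neighbour not in failed:
                visited.add(neighbour)
                queue.append(route + [neighbour])
    return None   # no route survives the failures


print(find_route("A", "B"))                 # ['A', 'C', 'B']
print(find_route("A", "B", failed={"C"}))   # ['A', 'D', 'E', 'B']
```

With node C out of service the message still gets through, by a longer path. Only when both C and E fail is B cut off in this particular layout.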

Before the online practical session, we shall very briefly sketch the role of the protocol suite that makes the Internet work. Its two main protocols are so bound together that they are usually thought of as one protocol: TCP/IP. In fact, TCP and IP do different jobs: IP is responsible for packets and their routing, whereas TCP is responsible for ensuring that information gets from one point to another. In practice, though, we usually ignore the different roles they play.

There is a clear distinction between TCP/IP and HTTP. TCP/IP is the suite of communications protocols that allows packets of information to be delivered across the Internet, whether the information to be transferred is a file from an FTP site, an email message or a Web document. HTTP is the protocol for Web documents. HTTP relies on TCP/IP. Every hyperlink and some graphical images require an HTTP request to be sent to a Web server via TCP/IP and the result (typically a Web document) to be delivered to the client Web browser that issued the request.
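To make the layering concrete, the sketch below builds the text of a very simple HTTP request and reads the first line of a reply. It deliberately stops short of opening a network connection: in practice these bytes would be handed to TCP/IP for delivery. The function names, the path `/index.html` and the sample reply are assumptions made for the illustration.

```python
def build_get_request(host, path="/"):
    """Return the text of a minimal HTTP/1.0 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.0",   # request line: method, resource, version
        f"Host: {host}",          # which Web site on that server is wanted
        "",                       # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Extract the protocol version and numeric status from a response."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, status, _reason = first_line.split(" ", 2)
    return version, int(status)


request = build_get_request("www.open.ac.uk", "/index.html")
print(request.decode("ascii"))

# A made-up reply of the kind a Web server might send back over TCP.
reply = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"
print(parse_status_line(reply))   # ('HTTP/1.0', 200)
```

HTTP is concerned only with the wording of the request and the reply. Splitting them into packets, routing the packets and reassembling them are left to TCP and IP.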

You should now go to the M206 Web site and to the Web page for this chapter, returning to this printed text at this point when you have finished.

Review Question 13 List three reasons why you might retrieve a Web page from a computer in the same country at less than 1 Kbps, even though you have a 33.6 Kbps modem? Is it possible that you might be able to retrieve a similar page from a distant country at 3 Kbps during the same time period?

Review Question 14 What is the function of a name server? If none of the name servers is working, could you still connect to a Web page? If so, how?

Review Question 15 What protocol or protocols are involved when you click on a hyperlink in a Web page that leads you to another Web page?

5 Other networks

Just as not all computing problems can be solved by personal computers or even by linked personal computers, so not all networking problems can be solved by attaching computers to the Internet. Let us return to our example of the airline introduced in Chapter 48 to examine the networking requirements for such organisations and how those requirements can best be met.

The perception (and, to a large extent, the reality) of the Internet is of many individual computers linked together in LANs which are, in turn, linked into WANs. The services provided (such as email, bulletin boards and Web sites) are primarily public services. Email can be considered to be an analogue of the telephone (a one-to-one connection for communication by electronic means) or of the postal system (a one-to-one asynchronous communication). Bulletin boards, as you would guess from the name, are analogues of public notice boards for the free posting and display of information. The Web is more complex, but could be seen as the analogue of a public library with many reference works ranging from telephone directories to the kind of directory that contains advertisements, contacts and informative articles for members of a particular organisation, such as the International Air Transport Association (IATA).

5.1 A dedicated network

Our airline, though, is in the business of selling seats on flights. To do this, it has reservations agents at sites around the world. The job of each agent is to answer telephone enquiries of the nature of: ‘I need to fly myself and my business partner from London to Los Angeles on February the 3rd, returning to London on February the 11th – what have you got?’ In addition, there are reservations agents at the major airports served by that airline. There are also staff who run the check-in desks – accepting tickets, labelling baggage and issuing boarding passes. The sole focus of all these people is to process the enquiries and transactions necessary to find out about and sell seats, assign seats and board baggage. The exchange between a customer and a reservations agent should take only a few minutes, and be conducted smoothly and efficiently. Few customers would use an airline for a second time if their records were lost or they were kept waiting unnecessarily. Thus, the equipment agents use has a single purpose.

The equipment is not a computer, but a dumb terminal: a visual display unit (VDU) and keyboard connected by a ‘dedicated network’ to a host computer at a central (probably head office) site. The terminal is ‘dumb’ in that it has limited processing power; it can only process simple commands for updating the screen and cannot run any other software. The keyboard is specialised to accept short commands to initiate a particular transaction. The agent is usually trained to act as an adjunct to that part of the enquiries and reservations system he or she causes to execute – able to fill out verbally the information that is probably presented on the screen in a rather terse coded form. Why might such a system need a dedicated network?

No doubt you have had the experience of waiting for access to a Web site or for the data that makes up a Web page to be transferred to your computer. While there are many schemes for speeding up these processes, they can still be slow at times for the following reasons.

> At least one link in the interconnection of the networks is, for technical reasons, slow (many users blame their modems, but the slow link can be anywhere in the system).

> A site can accept only a finite number of simultaneous users and the site itself is popular and has many users accessing or waiting to access it. (This is a bit like getting an engaged tone on the telephone, but, instead of hearing the engaged tone, your call is queued until someone ahead of you disconnects.)

> At least one link in the interconnection of the networks has a large amount of traffic traversing it (there is congestion; this is similar to a section of road being congested with traffic).

The airline’s reservations agents and its customers cannot wait for slow links, for congestion to clear, or for a desired service to become free. So the airline eliminates these potential and often real problems by having a dedicated network: one that only it uses. Therefore an airline will lease communications lines (permanent connections) that only it can use. Within the limits of the physics of communications lines, the airline is then in complete control of its network and the speed of responses that agents will get to their transactions.

Later we use the word terminal to mean dumb terminal.

A dedicated network is a network that is private to the company using it and has leased (permanent) communications lines.


The airline’s terminals are geographically clustered: some at airports and some at reservations centres. And no matter how quick an agent is, one agent cannot keep a communications line fully occupied. So a scheme used in situations like this is to ‘concentrate’ traffic locally. Each cluster of terminals (up to a given number) is attached to a concentrator, a device that combines the transmissions from a cluster of nodes. The concentrated data is sent from the concentrator to the host computer in a single burst of traffic. Such a device has minimal processing abilities in that it collects the transmissions as they arrive and sends them without any further processing. The software for the system runs on the host computer where replies are formulated. These are sent in a single burst of traffic back to the correct concentrator, which sorts out which responses go to which terminal. This all happens over and over again within fractions of a second. Hence a transaction between one agent (and his or her terminal) and the host computer should take no more than a few seconds. The amazing thing is that this is happening with perhaps 15,000 terminals and agents concurrently. The speed of the host computer’s processor and the communications links means that each agent at a terminal works as though he or she has sole use of the host computer.
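The round trip through a concentrator can be sketched as three small steps: combine, process, distribute. Everything named here is hypothetical, including the terminal identifiers and the command strings. A real concentrator is a hardware device and does no more than the first and last steps.

```python
def concentrate(pending):
    """Combine queued terminal messages into one burst for the host.

    `pending` maps a terminal identifier to the command typed at it.
    """
    return sorted(pending.items())


def host_process(burst):
    """Stand-in for the host computer: formulate a reply to each command."""
    return [(terminal, f"reply to {command!r}") for terminal, command in burst]


def distribute(reply_burst):
    """Sort out which response goes to which terminal."""
    return {terminal: reply for terminal, reply in reply_burst}


pending = {"T2": "AVAIL LHR LAX 03FEB", "T1": "BOOK 2 SEATS"}
screens = distribute(host_process(concentrate(pending)))
print(screens["T1"])   # reply to 'BOOK 2 SEATS'
```

Note that all the real work is done in `host_process`; the other two functions only bundle and unbundle traffic, which is why the concentrator needs so little processing power.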

Review Question 16 What do you think is the most important factor in deciding to run a dedicated network?

If time is important to an airline’s reservations system, it can be literally vital in any application that is safety-critical. Any network involved in safety-critical applications must be able to transmit information without failures and extremely quickly. For example, an air-traffic control system at an airport will, for purposes of safety and backup, have two or more computers linked together in such a way that, if one fails, the other can take up the task immediately. (The connection between them must be of a nature to provide almost instant transfer of data, and support instantaneous fallback to the reserve system. The connection must also be duplicated so that if one channel of communication is disrupted – for example, by being disconnected by accident – the other will remain intact and functioning.) But air-traffic control systems in neighbouring zones may also have to pass information between themselves; for example, as a plane leaves one zone and enters another. Thus such a network must be dedicated, robust and not introduce unnecessary delays.

Exercise 3 Can you think of any other examples where a dedicated network might be important?

5.2 Servers

Airlines also sell their seats through travel agencies. The situation of a travel agency is different from the airline’s own reservations system. For one thing, most people who use a travel agency visit in person (since dealing directly with an airline usually involves using the telephone to call a reservations number, although some airlines do have reservations offices in major cities that are similar to a travel agency). Most travel agencies encourage their visitors to feel relaxed and to browse – you may go to one to enquire about and purchase a business flight, but you might also pick up a holiday brochure or buy a bargain holiday advertised in the window. There are chairs so that the customer can be seated; and the customer is not paying for any telephone calls. The travel agency deals with many airlines and many holiday companies. Therefore the computer systems that you are likely to see in a travel agency are somewhat different in look and in purpose from those that you would see in an airline’s own reservations centres.

The Course Book does not distinguish between a hub, a concentrator and a MAU. In the context of a LAN they are synonymous. Strictly, a concentrator combines multiple channels onto a single transmission medium in such a way that all the individual channels can be simultaneously active. For example, ISPs use concentrators to combine their dial-up modem connections onto faster T1 lines that connect to the Internet.

Concurrency refers to events that overlap in time. You met this idea in the context of multitasking in Chapter 48.

Again, the travel agent is likely to work with a dumb terminal, but it is likely to be connected to a server (probably located at the agency), providing a service to the terminals by sending requests for information to the appropriate central site computer (which is likely to be one of several similar computers such as one belonging to an airline and one belonging to a holiday company) and by formatting the information that comes back as responses to the requests. The server is fulfilling the same function as the servers referred to in Chapter 3, but here there are terminals connected to the server rather than personal computers. The travel agent cannot be as specialised as the reservations agent, so the system has to provide all the information without depending on the agent’s specialist knowledge. The terminal’s display is often colourful and may include simple graphics (such as the silhouette of a palm tree). The design of the user interface of the terminal is dominated by the need for ease of information location, especially as it is likely that the customer is going to look at it too.

Because the customer is seated and is encouraged to relax and browse, speed of response from the particular central site computer is somewhat less important than it is in the airline’s dedicated reservations system. So the server in the travel agency may not be connected by dedicated lines to the various holiday companies and airlines. The travel agency may choose to pay for time on a telecommunications company’s public data network (a network like the telephone network, but one that carries very little voice traffic and is intended primarily for data traffic), and send information back and forth in small packets, rather than in bursts. This type of network is not dedicated and may be used by any company that applies for access. It is public and is shared with other businesses sending and receiving digital data traffic. The network is managed by the telecommunications company and the agency will pay for the time that it uses. The main links are likely to be through fibre-optic cable, but local links can be of any type (for example, radio, fibre-optic or unshielded twisted pair). It is slower than a dedicated network, but generally the service such a network provides is not subject to unplanned congestion, site unavailability or very slow links.

Review Question 17 Why does the system that a travel agent uses differ from that used by a reservations agent?

5.3 Commercial transactions on the Internet

Recently, airlines have also made it possible for users of the Internet to make enquiries about flights and to make reservations using a credit or a debit card. The user may first use a search engine to find the appropriate URL. This is then used to connect to the airline’s Web site. The pages of many of these sites look like a form with blanks to fill in. The user fills in the two cities he or she wishes to fly between, the date on which he or she wishes to fly and a time (arrival or departure). This enquiry is then submitted to the airline’s central site computer by way of Internet connections, and a response is formulated and returned to the enquirer. If the airline allows the user to reserve a flight (rather than simply make an enquiry), the user then enters a confirmation of the acceptability of the proposed flight and the number of seats. A price should be given, at which point the user enters a credit or debit card number, an address to which the tickets are to be sent, and any other requested details.

The airline (or indeed any other company selling products or services on the Internet) is making a number of assumptions here. One is that the user is not in a great hurry to obtain tickets and to fly. Another is that the customer will not be unduly bothered by the delays inherent in using the Internet as a medium for communication with the airline. A third is that the customer can pay by a means that is acceptable for transmission over a network: at present this is usually by credit or debit card. A less obvious assumption is that the transaction, involving as it does the exchange of credit or debit card details, can be made secure. It is necessary to encrypt such sensitive data so that a snooper cannot readily obtain the number and use it in making unauthorised transactions. The techniques available for encryption and decryption can easily occupy a whole textbook, but some aspects of security are covered in the next section. Briefly, encryption is the process whereby a message is encoded into some form to make it unreadable; decryption is the process whereby an encrypted message is processed to make it readable.

Increasingly, companies like airlines are finding that offering services on the Internet, whether limited to information or also allowing selling of goods and services, is good business. However, there are some problems.

> There is the need to provide security through encryption and decryption, and perhaps through authentication (the technique by which one partner in a communication verifies that the other partner is not an impostor) before any exchange of sensitive information. With the invention of public key cryptography another process known as a ‘digital signature’ is possible. A digital signature is much like a hand signature in that it provides proof that you are the originator of the message.

> Public access to your Web site, if you are a large company, can swamp your staff with additional work. United Airlines allows anyone to email it, or to connect to its Web site and leave a message, with the result that the company deals with around 800,000 such messages every week. These have to be read, sorted, and forwarded to the appropriate department. After replies have been sent, the messages must be archived for future reference (for example, when a customer telephones to say that he or she has not received a reply to an email message sent a week ago).

> Traffic on the Internet is increasing very rapidly as more people use it and more computers are connected to it. In places in the United States it is now becoming increasingly difficult to get a line to make a telephone call because of the congestion caused by digital traffic from the Internet and particularly from multimedia sites on the Web. The infrastructure of network connections, routers and lines needs to be robust enough not to fail, and capacious enough not to suffer from congestion. The ability to cope physically with the increase in traffic is rapidly falling behind the growth in use, creating frustration in users.

Most Web browsers and ISPs support one of the major security protocols such as secure socket layer (SSL). For example, on Netscape Navigator, if the transmission between browser and server is not secured, the key icon is split in half or the padlock icon is open.

If you want to sign a message with a digital signature, you pass the message through a mathematical function (a hash function) which provides a summary (hash code) of the message. This summary is, for practical purposes, unique to each message and is much like a fingerprint. You then encrypt this hash code with your private key and attach the code to the end of your message. This attached code is known as a digital signature. The addressee can then verify that the message was sent by you by decrypting the digital signature, using your public key, to get the hash code. The addressee then passes the received message through the same hash function. If the two hash codes are the same, then the message was sent by you and was not altered.
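The signing and checking steps can be tried out with a toy version of public key arithmetic. The sketch below is an assumption-laden illustration, not a secure implementation: the key is built from two small, publicly known prime numbers, the scheme is textbook RSA with no padding, and SHA-256 is simply one widely used hash function chosen for the example. It requires Python 3.8 or later.

```python
import hashlib

# Toy key pair built from two well-known Mersenne primes. Real systems use
# far larger, secret primes and padding schemes; this is for illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                             # modulus, shared by both keys
E = 65537                             # public key exponent
D = pow(E, -1, (P - 1) * (Q - 1))     # private key exponent


def hash_code(message):
    """Summarise the message as a number smaller than N."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") % N


def sign(message):
    """Encrypt the hash code with the private key."""
    return pow(hash_code(message), D, N)


def verify(message, signature):
    """Decrypt the signature with the public key and compare hash codes."""
    return pow(signature, E, N) == hash_code(message)


message = b"Please book two seats to Los Angeles."
signature = sign(message)
print(verify(message, signature))                     # True
print(verify(b"Please book ten seats.", signature))   # an altered message
```

Anyone who knows `E` and `N` can check the signature, but only the holder of `D` could have produced it.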



A number of credit companies and banks have introduced digital money (ecash) systems. These systems purport to make online purchasing safer and easier than using credit cards, as the transactions involve the transmission of money tokens to the value of the product or service requested rather than the transmission of the buyer’s credit card details. To use digital money, both the shopper and the seller must have online accounts with a bank that issues digital money. The shopper can then download virtual notes or coins (more generally, tokens) from their electronic banker and hold them in a virtual wallet on their computer’s hard disk. To make a purchase, the shopper encrypts tokens to the correct amount and sends them electronically to the seller, who then forwards them to the issuing bank. The bank then checks that they are valid tokens and that they have not already been spent. If the tokens pass both tests, the corresponding amount is immediately credited to the seller’s account. An advantage of these systems over credit card transactions is that using ecash is just like using bank notes or coins: your identity is not necessarily revealed when you make purchases, so ecash has the same privacy as paper cash. This means that a shopper’s buying patterns and details cannot be stored in some database for later use in a direct marketing exercise by some company.
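The two checks the bank makes, that a token is genuine and that it has not already been spent, can be sketched as follows. The `Bank` class and its method names are hypothetical, and the sketch leaves out both the encryption of tokens in transit and the techniques real ecash schemes use to keep the shopper anonymous.

```python
import secrets


class Bank:
    def __init__(self):
        self.issued = {}      # token serial number -> value
        self.spent = set()    # serial numbers already redeemed
        self.accounts = {}    # account name -> balance

    def withdraw(self, shopper, value):
        """Debit the shopper's account and hand back a fresh token."""
        if self.accounts.get(shopper, 0) < value:
            raise ValueError("insufficient funds")
        self.accounts[shopper] -= value
        serial = secrets.token_hex(8)     # hard-to-guess serial number
        self.issued[serial] = value
        return serial

    def redeem(self, seller, serial):
        """Credit the seller if the token is valid and unspent."""
        if serial not in self.issued:
            return False                  # not a token this bank issued
        if serial in self.spent:
            return False                  # already spent once
        self.spent.add(serial)
        self.accounts[seller] = self.accounts.get(seller, 0) + self.issued[serial]
        return True


bank = Bank()
bank.accounts["shopper"] = 50
token = bank.withdraw("shopper", 20)
print(bank.redeem("seller", token))   # True: valid and unspent
print(bank.redeem("seller", token))   # False: the same token cannot be spent twice
print(bank.accounts)                  # {'shopper': 30, 'seller': 20}
```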

Review Question 18 What are the requirements for using the Internet for a commercial transaction?

6 Security

Many users believe that computer security involves protecting oneself only from vandals and criminals (often incorrectly and pejoratively referred to by the term hackers in the popular press, although the programming community prefers the term crackers and reserves the term hackers for skilled programmers). However, computer security extends to ordinary people as well: sometimes you need to be protected from yourself (accidentally deleting all your data), from so-called ‘acts of God’ (fire, flood and earthquake), and from other unforeseeable events such as a power failure, a power spike or a power surge when you have not saved your document.

You should now read Chapter 9 in the Course Book, returning to this printed text at this point when you have finished.

Review Question 19 What is the difference between a Trojan horse, a virus and a worm? If you are not connected to a network, which of these are you at risk of receiving?

Review Question 20 How can you protect your sensitive data from being read by others, even when they gain access to your computer?

Review Question 21 Name three ways in which you can lose data other than by the malicious behaviour of others.


Review

This section summarises the concepts introduced in this chapter and those revisited from previous chapters.

> A local area network (LAN) is a network of computers located within a relatively limited area. A network covering a large geographical area is a wide area network (WAN). For reasons of speed, security and reliability, a business may set up a private communications system within the company with dedicated lines to outside partners.

> Each device on a network requires a network interface card (NIC) of a chosen type. Connection can be via cables or be wireless. Server computers provide services to the client workstations.

> Protocols are languages (rules) computers use to communicate with other computers and with devices such as printers and modems. Protocols can be stacked (layered) on top of one another.

> The Internet is a worldwide network of networks. An intranet is a local private ‘Internet’ that uses Internet protocols.

> An Internet message is broken up into packets or segments, transmitted and finally reassembled. This process involves name servers and routers. Every computer on a network has a unique IP address. Routers store addresses in a routing table.

> Data is transmitted across the Internet using a suite of communications protocols called transmission control protocol/Internet protocol (TCP/IP).

> Hypertext transfer protocol (HTTP) is the protocol for Web pages and specifies operations strictly for the Web (such as hyperlinking) and uses TCP/IP. Notionally, below HTTP is TCP, which enables the reliable sending and receiving of blocks of data. Below TCP is IP, the language that allows computers to communicate over the Internet, addressing the small data packets so that routers know where to send them.

> Vandals may try to corrupt data and make a computer inoperable with tools that include viruses, Trojan horses and worms. Every computer user should be aware of these problems and take all possible measures to protect data and other important items of software.

> Encryption and decryption can be used for the transmission of sensitive data across a network.

Objectives

You may like to check that you can meet the objectives listed below.

> Describe how an Internet message is broken up, transmitted and then reassembled, including the roles of name servers and routers.

> Take all possible steps to prevent loss of data.

> Explain how congestion might occur on the Internet.

> Give some of the reasons why the Internet is not suitable for all commercial services, and outline the operation of commercial networks.



Solutions to Exercises

Exercise 1 If your workstation had to search the database it would have to send requests across the network to the server to retrieve each record to see if it matched the desired criteria. If the search request were sent to an applications server, then the search could be performed locally and only the result returned across the network. Since accessing a local disk is always faster than access over a network, this would result in a much faster return of the answer.

Exercise 2 Once a LAN is operational, you may want to connect it to other networks (that use the same or different protocols) or even to the Internet. If you choose TCP/IP as your network protocol suite then you can connect your local network to the Internet by attaching a ‘router’ to sit between you and the other network.

Exercise 3 Power station control, plant control and factory systems are a few of the many examples.

As the name suggests, a router decides where to send the traffic and directs the transmissions towards their destinations. Routers are discussed in detail later in the chapter.


Solutions to Revision Questions

Revision Question 1 The different ways by which one person can communicate information to another are:

> sending an email message;

> putting a Web page on the Internet;

> posting to a conference;

> posting to a mailing list.

The most efficient way to communicate information to an individual is to send the person an email message – provided the recipient has an email account.

If the person with whom you wish to communicate does not have an email account, but can access the Web, then you could put the information on a Web page and advise the person of the URL. However, unless access to the page is restricted, the information would be available for everyone to read.

A relatively inefficient way to communicate the information would be to post it to a conference. This would not be private because it would be available to everyone who had access to the conference. It would also be considered poor etiquette as other users might waste their time reading the message.

The most inconsiderate way to send the information to an individual would be to post it to a mailing list of which the person is a member, since you would be almost forcing each member to read your message. (On a conference, users have an option to read conference postings and typically skip those on subjects that do not interest them.)

Revision Question 2 If you were to encrypt your message before sending it and were to make sure that only the recipient had the key to decrypt it, then anyone intercepting it would not be able to read it. The most secure way to encrypt a message is to use public key cryptography.

Revision Question 3 No, there are many possible ways to connect two computers without having one continuous wire, including modems and telephone lines, cellular data modules, satellites, or an Internet connection where the method by which two computers are connected may not be known.

Solutions to Review Questions

Review Question 1 A LAN (local area network) is usually restricted to a small geographic area such as a building or a campus. A WAN (wide area network) covers a much larger geographic area and may encompass cities, countries or the world.

Review Question 2 A LAN can allow users to share data on a file server, run applications from a file server, share printers and other expensive devices, and use a terminal to access a host computer.

Review Question 3 If two users write to the same file at the same time, the data from one user might be combined with the data from the other user, resulting in a corrupt file.

Review Question 4 Network standards for LANs include Ethernet, Token Ring, ARCnet, FDDI and ATM.

In one sense the Internet is a WAN, but such a vast one that we think of it more as a ‘network of networks’.



Review Question 5 A dedicated server can give the users fast access to files on its hard disk(s) and does not spend time doing other data processing. A server with peer-to-peer capability may be doing other tasks besides file serving and thus result in slower serving performance.

Review Question 6 The two most common media that computers could use are telephone lines and radio waves, although television cable is another possibility.

Noise is defined as any interference that disrupts the communication process.

Review Question 7 Shannon’s model has a message source sending data to an encoder that transmits the data across the channel (noise sometimes disrupts a transmission) to a decoder. The decoder transmits the message to the receiver.

In the case of a telephone, the sender is a person, the telephone encodes the speech and sends it along the telephone line (the channel), the telephone at the other end decodes it and presents it to the earpiece.

For a computer on the Internet, the sender is the computer, the encoder is the modem, the channel is the telephone line, the decoder is a modem at the office of the person’s ISP, and the receiver is one of the ISP’s ‘remote access’ computers which will be on the ISP’s network that is attached to the Internet.

Review Question 8 Packet switching involves breaking a message into smaller pieces called packets. Each packet is sent independently to the destination, usually following different routes, the actual routes depending on congestion and available paths. Thus some parts of a message could be routed through one country and some parts through another.

Policing and regulating the Internet is not possible, since no one owns it and no one can guarantee the path data will take.
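The splitting and reassembly described in Review Question 8 can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: the function names, the packet size and the use of a sequence number are choices made for the example and are not taken from any real Internet protocol implementation, and shuffling a list merely stands in for packets taking different routes.

```python
import random

def split_into_packets(message, size):
    """Break a message into numbered packets of at most `size` characters."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Put packets back in sequence order and join their contents."""
    return "".join(data for _, data in sorted(packets))

message = "Packets may travel by different routes."
packets = split_into_packets(message, 8)

# Assumption: shuffling imitates packets arriving out of order.
arrived = packets[:]
random.shuffle(arrived)

restored = reassemble(arrived)
```

Because each packet carries its own sequence number, the order of arrival does not matter to the receiver.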

Review Question 9 A communications channel is a path along which signals can travel. A communications medium provides the link between the transmitting and receiving devices and so may contain one or more channels. For example, a coaxial TV cable is a communications medium that can carry many communications channels for separate TV transmissions, and an unshielded twisted pair telephone cable is a communications medium that can carry only one channel providing a path for audio signals.

From lowest to highest bandwidth: twisted pair and coaxial cable < radio waves and microwaves < fibre-optic cable.

Review Question 10 To create a local area network (LAN) you first require computers (both as workstations and file servers) and devices such as printers. Each of these computers and devices must have a network interface card (NIC). You then need something to connect the computers via their NICs, such as unshielded twisted pair cable (UTP) or coaxial cable. Finally, you need to ensure that each computer and device on the network uses the same network protocols.

Review Question 11 The two most popular network access methods are Ethernet and Token Ring.

The way in which a network transmits the packets depends on the network access method used.

For Ethernet, before an NIC sends a packet it checks the communications lines of the network and waits if another NIC is sending a packet. When the network is not busy, the NIC broadcasts the packet to every device on the network, but it is only accepted by the device with the same destination address. If an Ethernet network is configured with several computers sharing the same piece of wire and two NICs send a packet at the same time, a collision may occur. Each computer must wait a random interval before retransmitting the packet, thus slowing down transmission.

Twisted pair is the type of wire used by a telephone company to wire a telephone between a house and its central office. Ordinary twisted pair copper wire telephone lines are not shielded.

For Token Ring, a special message (the token) travels continuously round the network. The token carries a signal to indicate whether or not it is busy, that is, whether or not it is available to carry a packet. Thus token-passing networks do not suffer from collisions.

Ethernet interface cards are cheaper than Token Ring interface cards, but Ethernet has a maximum transmission rate of 10 Mbps, while the Token Ring maximum is 16 Mbps.
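The random wait that follows an Ethernet collision, described in Review Question 11, can be imitated with a small simulation. The sketch below is a deliberate simplification and not the scheme real Ethernet cards use: the function name, the station labels and the fixed number of time slots are all assumptions made for illustration.

```python
import random

def resolve_collision(stations, max_slots, rng):
    """Return the order in which colliding stations finally send.

    Each station waits a random number of time slots. Stations that pick
    the same slot collide again and must draw a new random wait.
    """
    order = []
    waiting = list(stations)
    while waiting:
        slots = {}
        for station in waiting:
            slots.setdefault(rng.randrange(max_slots), []).append(station)
        waiting = []
        for slot in sorted(slots):
            if len(slots[slot]) == 1:
                order.append(slots[slot][0])   # sole sender in this slot
            else:
                waiting.extend(slots[slot])    # collided again, so retry
    return order

rng = random.Random(50)  # fixed seed so that a run can be repeated
order = resolve_collision(["A", "B"], 4, rng)
```

Every station eventually sends its packet, but each repeated collision costs a further round of waiting, which is why a busy shared wire slows transmission down.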

Review Question 12 The suite of protocols transmission control protocol/Internet protocol (TCP/IP) is the most widely used and is the basis for communications on the Internet as well as on many LANs.

Review Question 13 You may get a poor transfer rate either because the server is overloaded by too many users trying to obtain a page from it, or the server is inherently slow and so retrieves and returns the Web page slowly, or routing of packets across the Internet has caused delays. Conversely, a distant server may be faster if it deals with fewer requests and the packets sent to the client may arrive by more direct or faster routes.

Review Question 14 A name server translates the human-friendly Internet name, such as www.open.ac.uk, to the IP address (number) such as 137.108.143.38. You could still connect to a Web page if you knew the IP number of a Web server (for example, using http://137.108.143.38/).
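The translation performed by a name server can be pictured as a simple table lookup. The sketch below uses the one name and address given in Review Question 14; the table, the function names and the fallback behaviour are assumptions made for illustration, and a real name server is far more elaborate.

```python
# Hypothetical table; a real name server holds many more entries.
NAME_TABLE = {"www.open.ac.uk": "137.108.143.38"}

def resolve(hostname):
    """Return the IP address for a hostname, or None if it is unknown."""
    return NAME_TABLE.get(hostname.lower())

def url_for(hostname):
    """Build a URL using the IP address when the name can be resolved."""
    address = resolve(hostname)
    return "http://%s/" % (address if address else hostname)
```

For example, `url_for("www.open.ac.uk")` gives the numeric form of the URL quoted in the answer.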

Review Question 15 When you click on a hyperlink you normally invoke the HTTP protocol. (Sometimes a hyperlink uses the FTP protocol and will initiate a file transfer, but the question asked about a link to another Web document.) The HTTP request will be sent to the server named in the hyperlink (for example, www.open.ac.uk). It will be transported to the server using the TCP/IP protocol suite.

Review Question 16 Time. The dedicated network avoids those problems that cause delays; particularly delays from sharing a communications medium with other users when congestion may occur, routing packets and accessing the host computer.

Review Question 17 The reservations agent is dealing with the specialised system for a particular airline and has been trained to use that particular system. The travel agent may deal with many types of travel request and cannot be expected to be an expert in all the different systems.

Review Question 18 You, as a user, must be prepared for a delay in carrying out the transaction. You must be able to pay electronically, and there must be a secure way for you to do this.

Review Question 19 A Trojan horse is a self-contained computer program which masquerades as a useful program, but contains malicious code. A Trojan horse spreads by users inadvertently passing the program on, believing it to be useful.

A virus is more common than a Trojan horse and is much less visible. Viruses attach themselves to programs, or to operating or file system components. They can also exist in word-processor documents, if the word processor has a macro programming language. Viruses rely on program files (or data files in the case of word-processor macros) being passed on to other computers via network connections or floppy disks. Thus you can receive a Trojan horse or a virus even if you are not connected to a network.

A worm is much rarer and exists as a self-contained program whose main goal is to find other computers connected to the network and transfer a copy of itself to these other computers. Thus you are not at risk of receiving a worm unless you are connected to a network and running programs that allow remote users access to your computer. The most famous worm to date was the Great Internet Worm of 1988 launched by Robert T. Morris, Jr.

HTTP is not used when you link HTML documents on the hard disk of your personal computer.

Microsoft Word is a word processor that has a macro language in its most recent versions.

Review Question 20 If your data is encrypted using a code that only you know, then, even if someone steals your computer, the thief will not be able to read your data. The more digits used in the encryption key, the harder it is for the code to be cracked. The strongest encryption method currently known is public key cryptography, which was largely invented by Whitfield Diffie. Although many governments and military organisations do not want strong cryptography in public hands (because they cannot crack it), there is a version available for most types of computer called pretty good privacy (PGP), written by Phil Zimmermann.
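The link between the number of digits and the difficulty of cracking a code can be shown with a little arithmetic: each extra binary digit doubles the number of keys that would have to be tried. The function names and the guessing rate in the sketch below are assumptions chosen for illustration; real attacks and real hardware vary widely.

```python
def key_count(bits):
    """Number of distinct keys that a key of `bits` binary digits allows."""
    return 2 ** bits

def years_to_try_all(bits, keys_per_second):
    """Years needed to try every key at a given (assumed) guessing rate."""
    seconds_per_year = 365 * 24 * 60 * 60
    return key_count(bits) / (keys_per_second * seconds_per_year)
```

Calling `years_to_try_all` with a larger number of bits and the same guessing rate shows how quickly an exhaustive search becomes impractical.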

Review Question 21 You could lose data through your own fault (operator error), such as by accidentally deleting a file or keying incorrect information; this is why it is important to design human–computer interfaces to minimise the possibility of data entry error. You could also lose data if the power supply fails before you save your document, although using an uninterruptible power supply (UPS) can prevent this by maintaining power for several minutes after a failure (this still does not protect you from someone accidentally pulling out the plug). A power spike or power surge could not only cause you to lose data as the power fails, but could also damage the computer itself when the very high voltage is applied for a short time. You might also suffer a hardware failure where a component fails (wears out) or malfunctions. If a component other than the hard disk fails while you are working, then the effect is similar to a power failure and you may lose only the data on which you are currently working. If the hard disk fails, you could lose all the data on the disk. Finally, you could lose data through unforeseeable events such as accidentally dropping the computer, theft or an ‘act of God’ (fire, flood and earthquake).


Glossary

This glossary should be used in conjunction with the Course Book glossary.

authenticate To verify the identity of a user or computer or person sending an email. Also used as a qualifier: an authenticated Web site requires a surfer to register his or her name and email address before entering. The related noun, authentication, refers to the technology that guarantees the recipient of an electronic message that the email came from a certain person.

backbone A part of a network that interconnects other parts of the network. In the context of the Internet, the high-speed communications links connecting the major hosts.

bandwidth A measure of the capacity of a network to carry data, usually expressed in bits per second (bps). For a communications channel it is also expressed as the difference between the highest and lowest frequencies that the channel can carry.

client/server An architecture in which the workstation (client) is the requesting machine and the server is the supplying machine, both of which are connected via a LAN or WAN. A client can share files and access data stored on the server.

concentrator A device that combines multiple channels onto a single transmission medium in such a way that all the individual channels can be simultaneously active. For example, ISPs use concentrators to combine their dial-up modem connections onto faster T1 lines that connect to the Internet. Concentrators are also used in LANs to combine transmissions from a cluster of nodes. In this case the concentrator is often called a hub or MAU.

digital signature An electronic signature that cannot be forged. It is a computed digest of the text that is encrypted and sent with the text message. The recipient decrypts the signature and recomputes the digest from the received text. If the digests match, the message is authenticated and proved intact from the sender.
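The steps in this entry (compute a digest, encrypt it, then decrypt and compare at the other end) can be sketched with a toy key pair. Everything here is an assumption made for illustration: the key is built from two tiny primes and could be broken instantly, the digest is cut down to fit it, and the function names are made up. A real digital signature scheme uses far larger keys and a tested cryptographic library.

```python
import hashlib

# Toy key pair built from tiny primes (61 and 53); illustration only.
N = 61 * 53          # modulus, 3233
PUBLIC_E = 17        # public exponent
PRIVATE_D = 2753     # private exponent: (17 * 2753) % 3120 == 1

def digest(text):
    """Compute a digest of the text, reduced to fit the toy modulus."""
    full = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(full, "big") % N

def sign(text):
    """Sender: encrypt the digest with the private key."""
    return pow(digest(text), PRIVATE_D, N)

def verify(text, signature):
    """Recipient: decrypt the signature and compare with a fresh digest."""
    return pow(signature, PUBLIC_E, N) == digest(text)

signature = sign("Pay Ali 10 pounds")
```

If the digests match, as in `verify("Pay Ali 10 pounds", signature)`, the recipient accepts the message; an altered signature no longer matches.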

domain name The term may refer to any type of domain within the computer field, since there are several types of domains. However, it often refers to the address of an Internet site.

electronic cash Many schemes are in the trial stage. ‘Digital money’ is either downloaded as ‘digital coins’ from a participating bank into the user’s personal computer or a digital money account is set up with the bank. Either the digital coins or the transactions that debit the account are transmitted to the merchant for payment. All these transactions are encrypted for security.

encryption A method of keeping networks, databases and files private and secure. Mathematical algorithms are used to scramble and unscramble digital messages so that only their intended recipients can read them. An encryption algorithm mathematically transforms the bits into a stream of digits that seem random. Performing the transformation requires a secret key – which is also a random-seeming string of 1s and 0s. The more digits in this key, the more secure the protection.
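The idea of scrambling bits with a secret key, and unscrambling them with the same key, can be shown with a very simple transformation. The sketch below combines each byte of the message with a byte of a random key using exclusive-or. It is a teaching illustration with made-up function names, not one of the algorithms used in practice, and it is only safe if the key is as long as the message and never reused.

```python
import secrets

def make_key(length):
    """Return `length` random bytes to act as the secret key."""
    return secrets.token_bytes(length)

def transform(data, key):
    """XOR each byte of data with the key; applying it twice restores data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = b"Meet at noon"
key = make_key(len(plaintext))
ciphertext = transform(plaintext, key)   # scrambled, looks random
recovered = transform(ciphertext, key)   # unscrambled with the same key
```

Only someone holding `key` can turn `ciphertext` back into the original message.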

Ethernet A high-bandwidth network standard. Maximum transmission rate is 10 Mbps. Collisions can occur, which necessitates packet resending.



extranet A private secure extension of a corporate intranet that allows an organisation to build a persistent network link with customers, vendors and business partners.

file server A computer set up so that other computers can access its hard disk as if it were their own.

file transfer protocol (FTP) A protocol in the TCP/IP family for copying files from one computer to another.

firewall The ‘wall’ of software that keeps unauthorised users or intruders outside a network. Sometimes it also keeps company users on an internal network from browsing the Web.

gateway A ‘translator’ computer that links two networks speaking different protocols. For example, email sent from an ISP (for example, U-NET) to the Internet passes through a mail gateway.

host computer Chapter 6 of the Course Book restricts this term to a computer servicing dumb terminals. A more general use of the term is for a machine on a network that provides a service or information to other computers.

hostname The logical name assigned to a computer. On the Web, most hosts are named www; for example, www.open.ac.uk. If a site is composed of several hosts, they might be given different names such as watt.open.ac.uk.

hub A common connection point for devices in a network.

hypertext transfer protocol (HTTP) The communications protocol of the Web. Web browsers use HTTP to connect to Web servers, and servers use HTTP to ‘talk among themselves’. The initials form part of a URL for a Web server; for example, http://www.open.ac.uk/.

Internet The worldwide information highway composed of thousands of interconnected computer networks. It is made up of large backbone networks and smaller networks that link to them. It functions as a gateway for electronic mail between various networks and online services. The Web is a facility on the Internet that makes possible almost instantaneous exchange of information. Internet computers use the TCP/IP protocol suite. The Internet is connected to computer networks that use various message formats and protocols; gateways convert these formats between networks so that the Internet functions as one large network. Utilities such as FTP and telnet are widely used to access the Internet.

Internet protocol (IP) Usually paired with transmission control protocol as TCP/IP. IP is the language that allows computers to communicate over the Internet, addressing the small data packets so that routers know where to send them.

Internet protocol address (IP address) The numeric address of a computer attached to a TCP/IP network. Every client and server station must have a unique IP address. Client workstations have either a permanent address or one that is dynamically assigned to them each session. IP addresses are written as four sets of numbers separated by full stops; for example, 204.171.64.2. Each set of numbers is from 0 to 255. A router uses the destination IP address to direct a packet.
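The written form of an IP address can be checked mechanically. In the sketch below, `is_dotted_quad` is a made-up helper that tests for four numbers from 0 to 255, while `ipaddress` is a module in Python's standard library; the example address is the one quoted in this entry.

```python
import ipaddress

def is_dotted_quad(text):
    """Check for four numbers, each from 0 to 255, separated by full stops."""
    parts = text.split(".")
    return (len(parts) == 4
            and all(p.isascii() and p.isdigit() and 0 <= int(p) <= 255
                    for p in parts))

address = ipaddress.IPv4Address("204.171.64.2")
as_number = int(address)   # the same address as a single 32-bit integer
```

The four sets of numbers are simply a convenient way of writing one 32-bit number, which is what `as_number` holds.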


Internet protocols For example, TCP/IP, PPP, SLIP and HTTP. Protocols can be layered. On the Web, for instance, notionally the top layer is HTTP which specifies operations strictly for the Web, such as hyperlinking. Below HTTP is TCP which ensures reliable sending and receiving of blocks of data. Below TCP is IP which provides the service of moving data packets across the Internet.

Internet service provider (ISP) The Course Book uses the term Internet access provider. An organisation that provides access to the Internet. Small ISPs provide service via modems and ISDN, while the larger ones also offer dedicated line connections. Some major commercial information services, such as Which? Online and AOL, provide Internet access but are still known as commercial information services or online services not ISPs. They generally offer databases, forums and services that they have originated, in addition to Internet access.

intranet A network that is contained within an enterprise. An intranet uses TCP/IP and other Internet protocols, and generally appears to be a private version of the Internet.

LAN A local area network (LAN) consists of two or more computers (probably including a server) or other devices (such as printers) connected (usually permanently) over some medium, all using the same protocols. A LAN may be connected to another network through a special network device, such as a gateway or router.

malicious computing The release of an intentionally harmful or misleading program, such as a worm, Trojan horse or virus. It can also include breaking into a computer to cause damage or to steal data.

media access unit (MAU) Either a hub on an Ethernet, or a device on a Token Ring network that physically connects computers in a star topology while retaining the logical ring structure. Similar to a hub.

name server Domain name system (DNS) software lets users locate computers on the Internet by hostname. A DNS server maintains a database of hostnames and their corresponding IP addresses. For example, a name server will convert the hostname www.open.ac.uk to 137.108.143.38. The domain name system is properly known as the ‘fully qualified domain name system’.

protocol A language computers use to communicate with other computers, and with devices such as printers and modems. Humans speak to their computers in programming languages. Computers talk to each other using protocols.

public data network A network established and operated by a telecommunications company for transmission of data.

public key cryptography A complex mathematical method to encrypt or secure digital communications. In contrast to single key encryption schemes, public key encryption uses two algorithmic keys: a public one to encode the data and a private one to decode it.
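The use of two different keys can be demonstrated with very small numbers. The sketch below follows the general pattern of the RSA method, which is one public key scheme among several; the primes, the function names and the message value are assumptions chosen for illustration, and numbers this small offer no security at all.

```python
def make_keys(p, q, e):
    """Build a (public, private) key pair from two primes and exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)      # modular inverse; needs Python 3.8 or later
    return (e, n), (d, n)

def apply_key(number, key):
    """Encode or decode a number (smaller than n) with the given key."""
    exponent, n = key
    return pow(number, exponent, n)

public_key, private_key = make_keys(61, 53, 17)
secret = 65
encoded = apply_key(secret, public_key)     # anyone may do this
decoded = apply_key(encoded, private_key)   # only the key's owner can
```

The public key can be given to anyone, because knowing it does not reveal the private key needed for decoding (at least when the primes are very large).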

router A device that routes data packets from one LAN or WAN to another. Routers see the network as network addresses and all the possible paths between them. They read the address in each transmitted packet and make a decision on how to send it based on the most expedient route (such as traffic load, speed and line information).
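A router's decision can be pictured as a lookup in a table of known routes. The sketch below is a simplified illustration: the table contents, the link names and the single 'cost' figure standing for traffic load, speed and line information are all assumptions, and real routers apply additional rules (such as preferring the most specific matching network) that are left out here.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop, cost).
ROUTES = [
    (ipaddress.ip_network("137.108.0.0/16"), "link-A", 5),
    (ipaddress.ip_network("137.108.0.0/16"), "link-B", 2),
    (ipaddress.ip_network("0.0.0.0/0"), "link-C", 10),   # default route
]

def choose_next_hop(destination, routes=ROUTES):
    """Read the destination address and pick the cheapest matching route."""
    address = ipaddress.ip_address(destination)
    candidates = [route for route in routes if address in route[0]]
    network, next_hop, cost = min(candidates, key=lambda route: route[2])
    return next_hop
```

With this table, a packet for 137.108.143.38 leaves on the cheaper of the two matching links, while a packet for any other address falls back to the default route.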



server A computer that ‘serves’ stored data, files and/or processing power to other machines (clients) on a network.

Token Ring A network standard that has workstations wired in a ring, where each workstation constantly passes a special message (the token) on to the next. The workstation having the token can send a message. Transmission speeds are either 4 or 16 Mbps.

universal resource locator (URL) The address that defines the route to a file on the Web or any other Internet facility. URLs are typed into the browser to access Web pages, and are embedded within the pages themselves to provide the hypertext links to other pages. The URL contains the protocol prefix, domain name, subdirectory names and file name. For example, http://www.open.ac.uk/OU/CandW.html. To access a home page on a Web site, only the protocol and domain name are required. For example, http://www.open.ac.uk/ will access the home page of the Open University.
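The parts of a URL listed in this entry can be separated using `urlsplit` from Python's standard `urllib.parse` module. The `describe` function and the dictionary keys are made up for this illustration; the two URLs are the examples given above.

```python
from urllib.parse import urlsplit

def describe(url):
    """Split a URL into protocol prefix, domain name, directories and file."""
    parts = urlsplit(url)
    *directories, file_name = parts.path.lstrip("/").split("/")
    return {
        "protocol": parts.scheme,
        "domain": parts.netloc,
        "directories": directories,
        "file": file_name,
    }

info = describe("http://www.open.ac.uk/OU/CandW.html")
home = describe("http://www.open.ac.uk/")
```

For the home page, the directory list and file name are both empty, which matches the statement that only the protocol and domain name are required.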

WAN A network covering more than one geographical site.


