Understanding web technology

You don’t need to be a genius to understand web technology; it’s simply a matter of understanding a few basic concepts.

Many businesses are forced to recruit web designers and programmers without really knowing what they want. They may know that they need someone to configure their server, but they may not know what the server does or what configuring it actually entails.

To ensure you are getting the right calibre of staff, you need to have at least a basic grasp of the subject. This solution will attempt to give you a basic understanding of various aspects of web publishing, such as HTML, ASP, JavaScript, CGI and server configuration.

To view the Internet, most people use a browser; your browser window is probably where you are viewing this page. There are many different browsers, the two most popular being Netscape Navigator and Microsoft Internet Explorer. Web browsers work by connecting over the Internet (for example, by modem or ISDN through an ISP) to remote machines, asking for a particular document (or page) and then formatting the documents they receive for viewing on a computer.

To do this, web browsers use a protocol called HTTP (HyperText Transfer Protocol). The remote machines containing the documents run HTTP servers. When an HTTP server receives a request for a page, it sends the page back to the local computer for viewing through the browser.
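The request and reply described above are plain text underneath. The following Python sketch builds a minimal HTTP GET request and reads the first line of a reply. The host name www.example.com, the path /index.html and the sample reply are illustrative assumptions, and nothing is sent over the network.

```python
def build_get_request(host: str, path: str = "/") -> str:
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, protocol version
        f"Host: {host}",          # which site on the server is wanted
        "Connection: close",      # ask the server to close the connection afterwards
        "",                       # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response: str) -> tuple[str, int, str]:
    """Split the first line of a reply into version, status code and reason."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


# Hypothetical host and path, used only for illustration.
request = build_get_request("www.example.com", "/index.html")

# A hand-written sample reply, not captured from a real server.
sample_response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"

print(request)
print(parse_status_line(sample_response))
```

A real browser sends more headers than this, but the shape of the exchange is the same: a request line naming the page, followed by a reply whose first line reports whether the request succeeded.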

Each document on the Web has a particular URL (Uniform Resource Locator). This tells the browser which server to contact to get the document. The syntax of the URL is simple to understand. In its simplest form it is protocol://host/path. The http prefix signifies that the protocol to be used is HyperText Transfer Protocol. The host name is the name of the server. For example, http://www.itnetwork.com would look for the IT Network's server. The path is the document requested from the server. This is not the same as the file system path; the server defines its root.

A more complex syntax is protocol://host/path/extra path-info?query-info. The protocol is the one used to connect to the site, so for FTP sites the protocol would be ftp and for websites it would be http. The latter part of the URL refers to optional information used by Common Gateway Interface (CGI) programs.
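To see how these parts fit together, the Python sketch below splits a URL using the standard urllib.parse module. The host is the one used in this article; the path and query string are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The path and query string here are hypothetical examples.
url = "http://www.itnetwork.com/cgi-bin/search/archive?topic=html&page=2"

parts = urlsplit(url)
print(parts.scheme)   # the protocol
print(parts.netloc)   # the host name
print(parts.path)     # the path, including any extra path-info
print(parts.query)    # the query-info after the question mark

# The query-info is a set of name=value pairs separated by "&".
query = parse_qs(parts.query)
print(query)
```

Note that the URL itself does not say where the path ends and the extra path-info begins. That split is decided by the server, which knows which part of the path names a CGI program.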

Web documents can take any form; however, the universal standard is HTML. HTML is a tag-based language that encodes the documents that make up the World Wide Web. HTML can be used to create formatted text that will retain its formatting once it is reproduced by a computer's browser. HTML documents can also include images, sound, animation and video clips. HTML weaves together all the relevant elements of the page and describes how it should be represented in your browser. It can also contain links (called hyperlinks) to other pages or sites on the Web.

At the time of writing, HTML 4.0 is the most recent and widely accepted version of HTML, and it supports what are called Cascading Style Sheets (CSS). CSS allows web developers to specify many of the repeated style characteristics (e.g. font, colour and spacing) once and attach them to a particular piece of HTML code (called a tag). This enables a quick, but consistent, look throughout the site. An organisation called the World Wide Web Consortium (W3C) develops HTML standards to ensure that they are uniform across the world.

HTML is made up of text, which is the content of any web page, and tags, which define the appearance and layout of that page. An HTML document is simply text enclosed by an outer <html> tag at the start and a matching </html> tag at the end:

<html>
<head>
<title> Very Basic HTML document </title>
</head>
<body>
This is a <i> very basic </i> HTML document.
</body>
</html>

Each document consists of a head and a body, signified by the <head> and <body> tags. You use the <head> tag to give the document a title and to indicate other stylistic parameters that the browser should use when displaying the page.

Page contents are placed within the <body> tags, including the document control markers that advise the browser how to display the text. Graphics, sounds, animation effects and hyperlinks can also be placed within the tags. HTML's simple tag structure makes it easy to understand and use. Each element consists of a tag name, which may or may not be followed by a list of attributes, all placed between opening and closing angle brackets (< >). The simplest are <head>, <body> and <i>.
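One way to see the tag structure is to let a program read it. The Python sketch below uses the standard html.parser module to list each tag name and its attributes in a small document along the lines of the example above. The class name TagLister and the added hyperlink are illustrative assumptions.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collect every opening tag and its attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))


document = """<html>
<head><title>Very Basic HTML document</title></head>
<body>This is a <i>very basic</i> HTML document.
<a href="http://www.itnetwork.com">A hyperlink</a></body>
</html>"""

lister = TagLister()
lister.feed(document)
lister.close()

for name, attributes in lister.tags:
    print(name, attributes)
```

Most of the tags here have no attributes at all; the <a> tag carries one, href, which holds the address the hyperlink points to.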

Related to HTML is XML (Extensible Mark-up Language). XML is a meta-language that allows you to develop your own document tags.

Due to the increasing demands for entertainment by the web community, several technologies have sprung up which allow users to bring animation and dynamism to their pages. These include CGI, JavaScript and PHP. The first, CGI, is used to create user-driven applications. CGI is an interface that allows the web server to communicate with other programs on the server. This enables web pages to be created 'on the fly' based on the data given by the user. This means that you can create search engines and surveys where the end result must be individually created according to the data entered by the user.
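As a rough illustration of a page created 'on the fly', here is a minimal sketch of a CGI-style program in Python. It assumes the web server passes the query-info in the QUERY_STRING environment variable, as CGI specifies; the field name q and the function name render_page are made up for this example.

```python
import html
import os
from urllib.parse import parse_qs


def render_page(query_string: str) -> str:
    """Build a complete CGI reply (header plus HTML) for one request."""
    fields = parse_qs(query_string)
    term = fields.get("q", [""])[0]   # "q" is a hypothetical form field name
    safe_term = html.escape(term)     # stop user input being read as HTML tags
    body = f"<html><body><p>You searched for: {safe_term}</p></body></html>"
    # A CGI program writes a header, a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The web server sets QUERY_STRING before running the program.
    print(render_page(os.environ.get("QUERY_STRING", "")), end="")
```

A request ending in ?q=web+servers would produce a page reading "You searched for: web servers". A real search engine would look the term up in an index at this point rather than simply echoing it back.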

JavaScript is an object-based scripting language. It is built into the latest versions of the popular browsers and allows scripts to run in the browser. This means that rather than waiting for an animated page to download from the server, the animation can run in your browser, creating dynamic HTML content. JavaScript works with your browser to detect and react to events that happen as a document is being loaded, rendered and used. Scripts are marked within HTML by the <script> tag.

Unlike JavaScript, which is a client-side language, PHP is a server-side, cross-platform scripting language. It is a way to put instructions in your HTML files to create dynamic content. Your web server then follows these instructions. This happens before the page appears in your browser. The web server replaces the PHP code with the content that the code was written to produce, so the browser only ever receives the finished HTML.
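The idea of instructions being filled in on the server can be imitated in a few lines of Python. This is not PHP and not how a PHP engine is built; it is only a sketch of server-side substitution using the standard string.Template class, with a made-up placeholder called $today.

```python
from datetime import date
from string import Template

# What the author saves on the server: HTML with an instruction in it.
# The $today placeholder stands in for a server-side instruction.
page_source = Template("<html><body><p>Today is $today.</p></body></html>")


def render(source: Template, today: date) -> str:
    """Replace the placeholder with its result before the page is sent."""
    return source.substitute(today=today.isoformat())


# A fixed date is used so the result is the same every time the sketch runs.
sent_to_browser = render(page_source, date(2001, 1, 15))
print(sent_to_browser)
```

The text that reaches the browser contains the date but no trace of the placeholder, which is the key difference from client-side scripts: those are delivered to the browser and run there.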

Other programs used to create web pages include Macromedia's Flash and various different packages which all create HTML code to save users having to do so. Flash is an animation suite that allows users to play animation through their browser. It creates low bandwidth multimedia effects with vector and bitmap graphics, motion, MP3 audio and form input.

There is also a multitude of WYSIWYG ('what you see is what you get') page editing programs that allow people who don't have programming skills to create web pages by dragging and dropping elements (such as text boxes, pictures and Java beans) onto the page. WYSIWYG page editing programs then create the HTML code and export it as pages through a publishing setting.

The most commonly used software package is Microsoft FrontPage, but there is a wide variety of other software packages that perform the same role. The advantage of using WYSIWYG page editors is that practically anyone can contribute to creating a company site. This means that businesses can involve more staff and, more importantly, users can update the site more often.

Between a PC accessing the Internet (generally referred to as the "client") and the server is the network. The network uses Transmission Control Protocol (TCP) and Internet Protocol (IP) to transmit the data and find the relevant servers and clients. Clients and servers also use HTTP.

TCP and IP are both protocols. A protocol is a set of rules that govern the way two or more computers communicate with one another. Protocols have a dual existence. First, they exist as a written specification that programmers can read and use to develop communication between computers. Secondly, they exist as code that only computers understand. Both forms have the ultimate purpose of specifying the precise interpretation of every part of every message exchanged across the Web or network.

We use protocols every time we need to communicate with another computer. If you use a networked printer, you use protocols to print a document. If you save your work on a networked drive, you are using protocols.

TCP is a connection-oriented transport protocol that sends data as an unstructured stream of bytes. By using sequence numbers and acknowledgment messages, TCP can provide a sending node with delivery information about packets transmitted to a destination node. Where data loss occurs in transit, TCP can retransmit the data until it is successfully delivered or the operation times out. TCP can also detect duplicate messages and discard them. TCP can monitor the flow of data from the sending computer and slow it down, as required, to avoid data loss.
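The role of sequence numbers and acknowledgments can be shown with a toy receiver in Python. This is a heavily simplified sketch, not real TCP: the class name Receiver is made up, segments are assumed never to overlap partially, and timers, retransmission and flow control are left out.

```python
class Receiver:
    """Toy model of the receiving side of a byte stream (hypothetical)."""

    def __init__(self):
        self.next_expected = 0       # sequence number of the next byte wanted
        self.pending = {}            # segments that arrived early, keyed by sequence number
        self.delivered = bytearray() # bytes handed to the application, in order

    def receive(self, seq: int, data: bytes) -> int:
        """Accept one segment and return the acknowledgment number."""
        # Anything below next_expected has been delivered already: a duplicate.
        if seq >= self.next_expected and seq not in self.pending:
            self.pending[seq] = data
        # Deliver every segment that now follows on without a gap.
        while self.next_expected in self.pending:
            chunk = self.pending.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        return self.next_expected


receiver = Receiver()
# Segments arrive out of order, and the first one arrives twice.
arrivals = [(7, b"web "), (0, b"Hello, "), (0, b"Hello, "), (11, b"world")]
acks = [receiver.receive(seq, data) for seq, data in arrivals]

print(acks)
print(receiver.delivered.decode())
```

Each acknowledgment tells the sender how many bytes have arrived in order so far. A sender that does not see the acknowledgment number advance knows that something is missing and can send it again.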

IP (Internet Protocol) describes how machines on the network recognise each other. IP transmits what are called datagrams over the network and reports errors in transmission. It is also responsible for fragmenting and reassembling datagrams when they cross networks with different maximum data unit sizes. Machines are identified by IP addresses, globally unique 32-bit numbers that each identify a particular machine. These addresses are assigned by the Network Information Centre. Their uniqueness ensures that any machine on an IP network can communicate with another, just by knowing its IP address.

An IP address is divided into three parts. The first part designates the network address, the second part designates the subnet address and the third part designates the host address. IP addresses are written in dotted decimal format: four decimal numbers separated by dots. When you load a web page in a browser, it is the IP address that can appear in the bottom left-hand corner of the screen, even though a URL, made of letters, was entered.
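Python's standard ipaddress module makes the link between the dotted decimal form and the underlying 32-bit number easy to see. The address 192.0.2.10 is reserved for documentation and is used here purely as an illustration, and the /24 split between network and host is an assumption for the example.

```python
import ipaddress

# An illustrative address from a range reserved for documentation.
address = ipaddress.IPv4Address("192.0.2.10")

# The same address as a single 32-bit number.
as_number = int(address)
print(as_number)
print(ipaddress.IPv4Address(as_number))   # and back to dotted decimal

# Assume the first 24 bits name the network and the last 8 name the host.
interface = ipaddress.IPv4Interface("192.0.2.10/24")
network = interface.network
host_part = int(address) & int(interface.hostmask)

print(network)     # the network this address belongs to
print(host_part)   # the host's number within that network
```

Where a network is divided into subnets, the subnet bits sit between the network bits and the host bits, and a longer prefix (such as /26 instead of /24) marks where the host part begins.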

The information to create web pages, whether they are simple HTML or more complicated CGI scripts, JavaScript or PHP, is all held on the web server. There are several different types of web server. The majority of Unix-based web servers use Apache software. Apache was developed to provide optimum compatibility with different clients. When you configure your web server, you do so to get the best performance from it. This includes maximising the number of page requests it can handle without returning error codes. The latest servers are ten times faster than their predecessors.

Internet technology is a complicated business, but by understanding the basic concepts, businesses should be able to work closely with their staff to create great web projects. Understanding what a web server is, how web pages are produced and how computers communicate can help companies visualise their challenges and work productively towards solving them.

Rachel Hodgkins
