"mirror" DATABASE FOR PAPERS IN FTP ARCHIVES (1 March 1993) Thank you to those people who replied to Charles Wells' and my request for information about FTP archives of papers. He has handed his information over to me, and everything I have is in /theory/FTP-sites at theory.doc.ic.ac.uk Here again is my own entry in the database by way of a template. The meaning of the fields will be explained further down this message. package=TaylorP comment=Papers by Paul Taylor at Imperial site=theory.doc.ic.ac.uk remote_dir=/theory/papers/Taylor local_dir+papers/TaylorP # email=<pt@doc.ic.ac.uk> # phone=+44 71 589 5111 x 5057 # fax=+44 71 581 8024 # address=Dept of Computing, Imperial College, London SW7 2BZ, UK There follow some answers (not necessarily definitive) to some of the questions people have asked me about this. MACHINE READABLE. I would like to stress first the importance of observing the syntax. It is proposed that this database should be used by a program in batch mode, so if the syntax isn't right the program will choke - at night. Even when data is to be used by me as a human, I find it increasingly hard to keep track of its volume unless people make an effort to set apprpriate "Subject:" lines on their mail, keep information on one line, etc, so that I can use "grep" to search for things in my 6Mb of email per year. WHAT IS FTP? Having a personal FTP archive for your papers is like having a personal journal with no referees and immediate publication. This means that the constraints on publication, though very loose, are like those on publishers: if you make it hard for people to subscribe (eg by allowing your filespace to become disorganised or too large) then they'll stop. Subscribers can get single issues by manual interactive use of FTP, or standing orders by using the mirror program. MY SITE DOESN'T HAVE AN ARCHIVE. Then use someone else's. There are people from Cambridge, Darmstadt, Aarhus and other places who use the IC archive. WHAT'S THE DATABASE FOR? 
Even if automatic mirroring is not used, having a database in a machine readable format will make it much easier to find files. Maybe you will want to keep track of your immediate colleagues' work: in this case you can extract their entries from the database and set just those up to be mirrored. Although the information is of no use to the mirror program, this is also a sensible place to keep address information.

A TYPICAL FTP SESSION.
Here's how you can get my Cambridge PhD thesis, for example.

    machine% ftp theory.doc.ic.ac.uk
    Connected to beauty.doc.ic.ac.uk.
    220 beauty FTP server (Version 6.14 Mon Nov 18 17:45:21 GMT 1991) ready.
    Name (theory.doc.ic.ac.uk:pt): anonymous    ["ftp" will often do]
    331 Guest login ok, send e-mail address as password.
    Password:pt@doc.ic.ac.uk
    230-<some welcome message>
    230-<put "-" at the beginning of your password>
    230-<if these messages cause problems>
    230-
    230 Guest login ok, access restrictions apply.
    ftp> cd theory/papers/Taylor     [set directory on remote machine]
    250 CWD command successful.
    ftp> bin                         [binary mode for dvi (etc) files]
    200 Type set to I.
    ftp> lcd ftp-imports/Taylor      [set directory on your machine]
    ftp> hash                        [tells you how it's going]
    Hash mark printing on (8192 bytes/hash mark).
    ftp> get thesis.dvi              [the file to fetch]
    200 PORT command successful.
    150 Opening BINARY mode data connection for thesis.dvi (907024 bytes).
    #####################            [five lines of these]
    226 Transfer complete.
    local: thesis.dvi remote: thesis.dvi
    907024 bytes received in 4.1 seconds (2.1e+02 Kbytes/s)
    ftp> quit
    221 Goodbye.
    machine%

WHAT ABOUT SOFTWARE?
A lot of mirroring of software already goes on; for example the program which I propose to use was written to maintain a large archive of general software at Imperial (not to be confused with the TeX & Computing Theory archive which I maintain). For this reason I don't really consider it my job to get into the business of archiving software.
Nevertheless, it does seem a good idea to add "papers/" to the local directory entries, so that there can also be a "software/" tree and a "conferences/" tree.

CONFERENCE ANNOUNCEMENTS?
It does seem reasonable to include those too, which is the other reason for adding "papers/" to the local directory names for papers: it leaves room for "conferences/" and "software/" alongside. Use the conference name where you would use the author's name for papers.

JOINT PAPERS?
Where do you put joint papers in your filing cabinet? I suggest choosing one of the authors, and then putting a cross reference (ie a short file which says "see BloggsJ for Bloggs & Smith") in the other directories. Personal bibliography databases should have complete entries under each individual author. If most of your work is done within a particular stable group, that can have an entry in the database as if it were a single person.

ACCENTS & NAME CLASHES?
It's not a good idea to use punctuation in filenames, or to make them too long. Clashes of both surname and initials within our community should be resolved by personal negotiation, ie agreeing to adopt distinguishing initials, fictitious ones if necessary. Choose a mixed case alphabetic version of your surname and initials and stick to it.

WHAT IS THE "mirror" PROGRAM?
It was written by Lee McLoughlin in perl and uses the FTP protocol in much the same way as you would interactively, except that it's automatic and should be run outside the normal working hours of the sites concerned. You can get it from src.doc.ic.ac.uk as /packages/mirror/. It runs under Unix and is a bit shy of non-Unix FTP archives. Nevertheless, if some Unix site near you is maintaining an FTP archive of a wide range of papers and software, this is still of benefit to you even though you can't maintain your own copy on your non-Unix machine.

HOW WILL THE DATA BE MAINTAINED?
In the first instance I shall maintain it in the directory /theory/FTP-sites at theory.doc.ic.ac.uk.
You may wish to mirror this directory, but I would advise against setting up anything too automatic for the time being. When I am satisfied that the database format is suitable and the data is correct for the mirror program, I shall ask the sites concerned to maintain their own databases, and use mirror to keep them up to date in my archive. In the long run, however, I think it would be better for each author to maintain this information in his/her own bibtex bibliography file (see /theory/bibliography/TaylorP.bib for instance) and then the mirror database can be extracted from this automatically.

WHY THIS FORMAT?
Per-author entries mean that you can mirror just the people you want rather than whole sites. As long as site managers make up their minds about directory names and stick to them, the information should only change radically when people move to other institutions (though phone extension numbers may change). It's sorted by site so that sites can maintain it and to make mirror/FTP more efficient (it only logs in to each site once). A version sorted by author would be possible if someone wrote the program. Sorting by topic is completely impractical, though if people maintained their own bibtex files it would be possible to use "grep" to search by keyword.

WHAT DO THE FIELDS MEAN?
See "mirror" itself and the "defaults" at the end of the Imperial database for further explanation.

"package" is a handle for the mirror program: you can ask it just to get a particular piece of software or, in this case, author's files.
"comment" is copied to the mirror program log.
"site" is the Internet address of the FTP archive.
"remote_dir" is the directory on that archive.
"local_dir" is the directory into which you want to copy. I made a mistake in my original message: there should be a "+" instead of an "=". This means that this field is appended to the "local_dir" setting in the defaults so that you can set the root of your own archive (copy) tree as you think appropriate.
Also, in order to allow conferences and software in the same database, I've added "papers/" to these settings.

The remaining fields are not recognised by "mirror" and so are commented out.

"# email" is your email address in Internet format. Please put the actual address in angle brackets <>; then you can add your name or whatever in front.
"# phone" is your direct office phone number including extension and, where appropriate, secretary's name and extension. Please use international and not American format, for example London, New York and Paris are +4471, +1212 and +331 respectively, not 071, (212) or (1).
"# fax" likewise. Please specify if this is in a public area.
"# address" is your postal address, including country and postal code.

Paul Taylor
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
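
Since the database is meant to be read by a program in batch mode, it may help to check an entry's syntax before submitting it. The following is only a minimal sketch in Python, not part of the mirror program. It assumes the syntax that can be inferred from the template entry: one field per line, "key=value" to set a field, "key+value" to append to a default, and "# key=value" for the fields mirror ignores. The function name parse_entries, the "extra" dictionary and the "/archive/" default root are hypothetical, chosen for illustration.

```python
import re

# Assumed field syntax (inferred from the template entry, not from mirror itself):
# a key made of letters/underscores, then "=" (set) or "+" (append), then the value.
FIELD = re.compile(r"^(?P<key>[A-Za-z_]+)(?P<op>[=+])(?P<value>.*)$")


def parse_entries(text, defaults=None):
    """Split database text into a list of per-package dictionaries."""
    defaults = defaults or {}
    entries = []
    current = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Commented-out fields (email, phone, ...) are kept separately.
            match = FIELD.match(line.lstrip("#").strip())
            if match is not None and current is not None:
                current["extra"][match.group("key")] = match.group("value")
            continue
        match = FIELD.match(line)
        if match is None:
            raise ValueError(f"line {number}: cannot parse {raw!r}")
        key, op, value = match.group("key", "op", "value")
        if key == "package":
            if op != "=":
                raise ValueError(f"line {number}: package must use '='")
            current = {"package": value, "extra": {}}
            entries.append(current)
        elif current is None:
            raise ValueError(f"line {number}: field before any package")
        elif op == "+":
            # "+" appends the value to the default setting for this key.
            current[key] = defaults.get(key, "") + value
        else:
            current[key] = value
    return entries


SAMPLE = """\
package=TaylorP
comment=Papers by Paul Taylor at Imperial
site=theory.doc.ic.ac.uk
remote_dir=/theory/papers/Taylor
local_dir+papers/TaylorP
# email=<pt@doc.ic.ac.uk>
"""

# "/archive/" is a hypothetical root for the local copy tree.
entry = parse_entries(SAMPLE, defaults={"local_dir": "/archive/"})[0]
print(entry["local_dir"])  # /archive/papers/TaylorP
```

A line that fits none of the assumed forms raises ValueError with its line number, so a malformed entry is caught when it is written rather than by the batch job at night.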