October 2022

Ordnance Survey 25 Inch Edinburgh Transcriptions

Project information page

The Edinburgh Map Transcription project took place between May and September 2022. Collectively, volunteers transcribed and categorised over 21,950 textual entries on the Ordnance Survey's 25 inch to the mile maps of Edinburgh environs (1890s).

This was a collaborative project between the Alan Turing Institute and the National Library of Scotland, as part of the Machines Reading Maps project.

The following sections provide further information on the Edinburgh Transcription Project:

Search the Transcriptions

As well as being able to search on any street names, building names, and other place names, the following provide examples of real-world features which can also be searched and viewed as geographic distributions (with the number of features in brackets): acreage (5488), asylum (5), bench mark (2313), boundary (740), brewery (25), cemetery (14), chapel (30), church (216), conduit (11), crane (33), cricket (15), curling (14), dairy (11), dovecot (11), drinking fountain (92), flagstaff (14), football (11), foot bridge (145), foot path (72), gas station/works (22), grave yard (14), guide posts (76), hall (39), hospital (32), hotel (45), icehouse (10), inn (29), iron (45), laundry (12), letter box (73), lodge (233), man hole (18), manse (30), mile post (37), mile stone (64), mill (57), mooring (43), north british (38), nursery (plants) (38), old shaft (mines) (38), parcel number (4825), pavilion (25), pillar letter box (48), pit (coal, gravel, sand) (27), police station (10), post office (54), public house (116), pump (482), quarry (103), reservoir (26), school (131), sheepfold (76), signal post (366), sluice (148), smithy (38), spring (109), station (86), statue (15), sun dial (45), sunday school (18), surface level (1870), target (14), tennis (17), timber yard (19), tramway (36), urinal (5), waterfall (15), weighing machine (45), weir (47), well (175), works (89).

Please note that many features which today may be written as one word were written as two on these maps, e.g. Bench Mark, Mile Stone, Sun Dial, Foot Path, Grave Yard, Man Hole. Searching for 'Graveyard' therefore finds nothing, whereas 'Grave Yard' does.
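If you are searching a downloaded copy of the Gazetteer yourself, one way to tolerate this difference is to compare terms with spaces removed. The following is a minimal sketch only and is not how the NLS viewer searches; the `squash` and `matches` helpers and the sample strings are hypothetical.

```python
def squash(text: str) -> str:
    """Lower-case the text and drop all whitespace."""
    return "".join(text.lower().split())


def matches(query: str, transcription: str) -> bool:
    """True if the query occurs in the transcription, ignoring spaces and case."""
    return squash(query) in squash(transcription)


# Illustrative values only, not rows taken from the dataset.
samples = ["Grave Yard", "Bench Mark", "Sun Dial"]
hits = [s for s in samples if matches("graveyard", s)]
print(hits)
```

Ignoring spaces can also produce false matches that straddle two words, so results from this approach are worth checking by eye.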

Purpose

This project aimed to transcribe all of the text on the Ordnance Survey's 25-inch to the mile mapping for Edinburgh environs (1892-94). Volunteers used a simple interface (Recogito) to draw boxes around names on the OS 25-inch mapping, transcribe the text, and tag or categorise it following a simple set of terms.

The primary aim of the project was to provide a detailed gazetteer of streets, buildings and names in Edinburgh from a century ago, to assist local and family historians. A secondary aim was to create an easy search interface of written features on the maps such as baths, drinking fountains, mills, public houses, signal posts, pumps, or wells, viewing the results as geographic distributions on a map. An important related aim has been to provide a test dataset for AI/machine learning approaches to identify text on maps, which are being actively developed. The OS 25 inch maps cover all inhabited areas of England, Scotland and Wales, and we are keen to encourage the wider extraction of text from these maps.

In order to cover the whole extent of the wider built-up area of Edinburgh today, our geographic area in this project was based on the County of the City of Edinburgh, which was in use as an administrative unit from the 1890s through to the 1970s. This area is shown clearly by the historic map coverage in our Edinburgh Transcriptions Map Viewer.

For details of how to add this OS 25 inch historic map layer into other software, please see our Download section.

Transcription Workflow

a. Initial Recogito transcription phase with volunteers

We used a customised version of Recogito to record the transcriptions. Recogito is widely used in the cultural heritage community as a collaborative platform for document annotation and transcription. The Recogito 10 minute tutorial gives a good overview of Recogito; the Annotation Guidelines show the particular interface and transcription process used in this project.

The initial 30,043 transcriptions - virtually all the textual content on the maps - were recorded by volunteers in the first two weeks.

These transcriptions were downloaded as .CSV files from Recogito, and combined into one file - the ‘raw dataset’.

Because edits in Recogito could only be made at the level of individual transcriptions, we felt that it was difficult to correct the data there in bulk, so we took the decision to revise and update the dataset outside of Recogito.

The original raw dataset as a CSV file may be more useful for certain Machine Learning tasks, as the boxes are drawn around individual words (or individual letters where the spaces between the letters are wider than the height of the letter). However, the raw data has not been reviewed or corrected. A proportion of the original transcription fields are wrong, many of the 'expanded_transcription' entries are blank or incorrect, and tags may be incorrect too.


b. Revised dataset (main Gazetteer) for the NLS Edinburgh Map Transcriptions Viewer interface

We are very grateful to Vasilis Karaiskos, who wrote a Python script to convert the .CSV exports from Recogito into GeoJSON. The script also grouped words together into one transcription where they had been grouped in Recogito (i.e. converting the separate transcriptions for 'GEORGE', 'IV' and 'BRIDGE' in Recogito into 'GEORGE IV BRIDGE').

[Figure: comparing the raw dataset to the revised dataset. The transcription for 'GEORGE IV BRIDGE' appears as three separate transcriptions in the raw dataset (left), compared to one combined transcription in the revised dataset (right).]

This process was only partially successful, generating some grouped transcriptions without spaces between the words, and also only dealing with those transcriptions which had been grouped in Recogito. The editing stages below have corrected these grouping problems.
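The grouping step can be illustrated with a minimal sketch based on the `group_id` and `group_order` fields described in the Field List. This is not the script written for the project: the `group_transcriptions` function and the sample rows are hypothetical, and it assumes that ungrouped rows have an empty `group_id` and that `group_order` holds an integer.

```python
from collections import defaultdict


def group_transcriptions(rows):
    """Join rows that share a group_id into one transcription, in group_order."""
    groups = defaultdict(list)
    singles = []
    for row in rows:
        if row.get("group_id"):
            groups[row["group_id"]].append(row)
        else:
            # Assumption: an empty group_id means the row was never grouped.
            singles.append(row["transcription"])
    merged = []
    for members in groups.values():
        members.sort(key=lambda r: int(r["group_order"]))
        # Joining with a space avoids run-together names.
        merged.append(" ".join(m["transcription"] for m in members))
    return singles + merged


# Invented rows in the shape of the raw CSV export.
rows = [
    {"transcription": "BRIDGE", "group_id": "g1", "group_order": "3"},
    {"transcription": "GEORGE", "group_id": "g1", "group_order": "1"},
    {"transcription": "IV", "group_id": "g1", "group_order": "2"},
    {"transcription": "P.H.", "group_id": "", "group_order": ""},
]
result = group_transcriptions(rows)
print(result)
```

A sketch like this only joins words that were already grouped in Recogito, which is the same limitation noted above.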

We are very grateful in turn to Richard Meats who revised and corrected this converted data:

  1. Grouped names without spaces between them had spaces added. Some double-spaces were removed.
  2. A few tiny polygons with no transcriptions (probably created in error) were deleted.
  3. Names running across sheet boundaries had new polygons traced and the text joined.
  4. Many mis-transcribed entries were corrected when spotted.
  5. Some incorrectly grouped names were split and others grouped.
  6. Some incorrect upper / lower case entries were corrected.
  7. expanded_transcription forms were standardised to allow keyword searching, including abbreviations and milestones. Names were changed to be Proper Case, not UPPER CASE.
  8. During this corrections process, 25 missed transcriptions were found and added into the Recogito and NLS datasets.
  9. Numbers were expanded to label them more specifically into Benchmarks, Surface Levels, Acreages, and Parcel numbers. As a large proportion of the Parcel Numbers and Acreages were grouped in Recogito, these were labelled as 'Parcel Number and Acreage' tags.
  10. Tags were completed and standardised.
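Some of these corrections, such as removing double spaces (step 1) and converting names to Proper Case (step 7), could be approximated in code. The sketch below is illustrative only and is not the method used for the published dataset; `standardise` is a hypothetical helper built on Python's standard `string.capwords`.

```python
import string


def standardise(text: str) -> str:
    """Collapse runs of whitespace and convert each word to Proper Case."""
    # With no separator given, capwords splits on any whitespace,
    # capitalises each word, and rejoins with single spaces.
    return string.capwords(text)


print(standardise("ST.  GILES   CATHEDRAL"))
print(standardise("GEORGE IV BRIDGE"))
```

The second call yields 'George Iv Bridge', which shows why automatic case conversion still needs manual review: Roman numerals and similar forms are lower-cased incorrectly.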

Field List

There are two datasets: a raw and a revised dataset. There is a core set of fields in both datasets, and some fields are unique to each dataset. Please read the Transcription Workflow for details of how the raw and revised datasets differ.

Raw dataset (Recogito download)

| Field Name | Description |
|---|---|
| uuid | The unique ID for the transcription, generated by Recogito |
| transcription | The text transcription from the map |
| comments | A comment or expanded form of the transcription - for example, spelling out an abbreviation |
| tags | 'area', 'street', 'building', 'natural' or 'other', using WikiData URLs - see Tags below |
| source | The map section that the transcription was from. For performance reasons in Recogito, we split our geographic area into ten separate map layers. |
| anchor | Polygon coordinates as lat/lon (EPSG:4326) in Scalable Vector Graphics (SVG) format |
| group_id | The unique ID for grouped transcriptions |
| group_order | Sequential number of the transcription within a particular group_id |

Revised dataset (main Gazetteer)

| Field Name | Description |
|---|---|
| uuid | The unique ID for the transcription, generated by Recogito. For items generated from a group of raw transcriptions, this is the Recogito group ID |
| transcription | The text transcription from the map |
| expanded_transcription | Expanded or standardised form of the transcription - for example, spelling out an abbreviation |
| tags | 'area', 'street', 'building', 'natural' or 'other' - see Tags below |
| source | The map section that the transcription was from. For performance reasons in Recogito, we split our geographic area into ten separate map layers. |
| WKT | Polygon coordinates as lat/lon (EPSG:4326) in Well Known Text (WKT) format |
| NLS_ID | The unique NLS ID in the corrected dataset |
| area | Area of the transcription polygon in square metres |
| text | 'Y' to indicate the transcription was a textual one. Blank if numeric. |
| url | Link to the transcription in the Edinburgh transcriptions map viewer |
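As an illustration of working with the WKT field, the sketch below parses a simple WKT POLYGON string into coordinate pairs and derives a bounding box, using only the Python standard library. The `parse_wkt_polygon` and `bounding_box` helpers and the sample polygon are hypothetical. The sample lists longitude before latitude, as is usual in WKT; this axis order is an assumption and should be checked against the downloaded file.

```python
import re


def parse_wkt_polygon(wkt: str):
    """Return the outer ring of a WKT POLYGON as a list of (x, y) float pairs."""
    match = re.match(r"\s*POLYGON\s*\(\((.*?)\)", wkt, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("not a WKT POLYGON")
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()[:2]
        ring.append((float(x), float(y)))
    return ring


def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) pairs."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


# Invented polygon near Edinburgh, not a value from the dataset.
sample = "POLYGON ((-3.19 55.94, -3.18 55.94, -3.18 55.95, -3.19 55.95, -3.19 55.94))"
ring = parse_wkt_polygon(sample)
print(bounding_box(ring))
```

For anything beyond simple polygons, a dedicated geometry library or QGIS is likely to be a better choice than hand-written parsing.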

Tags

We were keen to keep the tagging as simple as possible, and we used just five major categories: 'area', 'street', 'building', 'natural' and 'other'. (In the raw dataset / Recogito download, WikiData URLs were used as the tags for these categories.)

Download

Gazetteer

We have made available the Edinburgh Gazetteer for onward re-use and download. These files use UTF-8 for character encoding. There are some extended characters - for example, a few "Depôts" and some middle-dot "·" decimal points.
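When reading the CSV in code, it helps to open the file explicitly as UTF-8 so that these characters survive, and to convert middle-dot decimals before treating them as numbers. The following is a minimal sketch under those assumptions; `parse_decimal`, `read_rows` and the in-memory sample are hypothetical, and the sample values are invented.

```python
import csv
import io

MIDDLE_DOT = "\u00b7"  # the "·" character used as a decimal point on the maps


def parse_decimal(text: str) -> float:
    """Convert a value such as '1·234' to a float."""
    return float(text.strip().replace(MIDDLE_DOT, "."))


def read_rows(stream):
    """Read CSV rows into dictionaries keyed by the header line."""
    return list(csv.DictReader(stream))


# In-memory stand-in for a downloaded file. With a real file, use
# open(path, encoding="utf-8", newline="") instead of io.StringIO.
sample = io.StringIO("transcription\n1\u00b7234\nDep\u00f4t\n")
rows = read_rows(sample)
value = parse_decimal(rows[0]["transcription"])
print(value, rows[1]["transcription"])
```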

The Gazetteer data is available in CSV and GeoJSON formats:

Please read the Transcription Workflow for details of how the raw and revised datasets differ, and see the Field List for an explanation of the field names and content.

These datasets are snapshots taken on 18 October 2022.

Each of these links will open a file that can be saved locally. For the GeoJSON files, please change the file extension from *.js to *.geojson. Please see our guide on Opening Map Datasets in QGIS for advice on how to use the Revised dataset CSV and GeoJSON files within the free QGIS desktop software.
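The renaming step can also be scripted. The sketch below renames a saved *.js file to *.geojson and loads it with Python's standard `json` module. The `as_geojson` helper and the file name `example.js` are hypothetical, and the stand-in file content assumes the download is a GeoJSON FeatureCollection.

```python
import json
import tempfile
from pathlib import Path


def as_geojson(path: Path) -> Path:
    """Rename a downloaded *.js file to *.geojson and return the new path."""
    if path.suffix.lower() != ".js":
        return path
    return path.rename(path.with_suffix(".geojson"))


# Demonstration with a temporary stand-in file rather than a real download.
with tempfile.TemporaryDirectory() as tmp:
    downloaded = Path(tmp) / "example.js"
    downloaded.write_text('{"type": "FeatureCollection", "features": []}', encoding="utf-8")
    renamed = as_geojson(downloaded)
    data = json.loads(renamed.read_text(encoding="utf-8"))
    print(renamed.name, data["type"])
```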

You can also view related details of the Edinburgh dataset and its onward re-use on our Data Foundry Edinburgh OS 25 inch transcriptions page.

Background mapping

You can also bring the Ordnance Survey 25 inch mapping into other software, following our Re-using georeferenced maps guide. The tileset URL for the Ordnance Survey 25 inch to the mile 1890s Edinburgh layer is:

https://geo.nls.uk/mapdata3/os/25_inch/edinburgh_1890s/{z}/{x}/{y}.png

This tileset URL allows the zoomable mapping to be added into QGIS, ArcGIS, or geojson.io so that the transcriptions can be seen on top of a background layer of Ordnance Survey 25 inch to the mile 1890s historic mapping.
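For software that needs individual tile addresses rather than the template, the placeholders can be filled in from a latitude, longitude and zoom level. The sketch below assumes the tileset follows the common XYZ Web Mercator scheme (y counted from the top), which the {z}/{x}/{y} template suggests but which is not confirmed here; `tile_index` and `tile_url` are hypothetical helpers, and no particular zoom level is guaranteed to exist.

```python
import math

TILE_URL = "https://geo.nls.uk/mapdata3/os/25_inch/edinburgh_1890s/{z}/{x}/{y}.png"


def tile_index(lat: float, lon: float, zoom: int):
    """Return the (x, y) tile containing a point, using the XYZ Web Mercator scheme."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tile_url(lat: float, lon: float, zoom: int) -> str:
    """Fill in the tileset URL template for the tile containing a point."""
    x, y = tile_index(lat, lon, zoom)
    return TILE_URL.format(z=zoom, x=x, y=y)


# Approximate coordinates for central Edinburgh; the zoom level is an arbitrary choice.
print(tile_url(55.95, -3.19, 16))
```

In QGIS, ArcGIS or geojson.io this calculation is not needed, as those tools accept the template URL directly.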

Corrections

If you would like to suggest edits or corrections to any of the content of the Edinburgh Transcriptions dataset, please contact us at maps@nls.uk with the subject line 'Edinburgh Transcriptions Correction'. We need to know, at the very least, the uuid and the specific field content that needs to be corrected, following the Field List.