Ordnance Survey 25 Inch Edinburgh Transcriptions
Project information page

The Edinburgh Map Transcription project took place between May and September 2022. Collectively, over 21,950 textual entries on the Ordnance Survey’s 25 inch to the mile maps of Edinburgh environs (1890s) were transcribed and categorised by volunteers.
This was a collaborative project between the Alan Turing Institute and the National Library of Scotland, as part of the Machines Reading Maps project.
The following sections provide further information on the Edinburgh Transcription Project:
Search the Transcriptions
As well as being able to search on any street names, building names, and other place names, the following provide examples of real-world features which can also be searched and viewed as geographic distributions (with the number of features in brackets): acreage (5488), asylum (5), bench mark (2313), boundary (740), brewery (25), cemetery (14), chapel (30), church (216), conduit (11), crane (33), cricket (15), curling (14), dairy (11), dovecot (11), drinking fountain (92), flagstaff (14), football (11), foot bridge (145), foot path (72), gas station/works (22), grave yard (14), guide posts (76), hall (39), hospital (32), hotel (45), icehouse (10), inn (29), iron (45), laundry (12), letter box (73), lodge (233), man hole (18), manse (30), mile post (37), mile stone (64), mill (57), mooring (43), north british (38), nursery (plants) (38), old shaft (mines) (38), parcel number (4825), pavilion (25), pillar letter box (48), pit (coal, gravel, sand) (27), police station (10), post office (54), public house (116), pump (482), quarry (103), reservoir (26), school (131), sheepfold (76), signal post (366), sluice (148), smithy (38), spring (109), station (86), statue (15), sun dial (45), sunday school (18), surface level (1870), target (14), tennis (17), timber yard (19), tramway (36), urinal (5), waterfall (15), weighing machine (45), weir (47), well (175), works (89).
Please note that many features which today may be written as one word were written as two on these maps, e.g. Bench Mark, Mile Stone, Sun Dial, Foot Path, Grave Yard, Man Hole, so searching for 'Graveyard' will not find anything.
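If you are searching the downloaded data yourself rather than using the viewer, one way to get round the one-word / two-word problem is to strip spaces and hyphens from both the query and the transcription before comparing. The following is only a minimal sketch: the function names and the sample entries are ours for illustration, and this is not how the viewer's own search works.

```python
import re

def normalise(text):
    """Lower-case the text and drop spaces and hyphens,
    so that 'Grave Yard' and 'Graveyard' compare equal."""
    return re.sub(r"[\s\-]+", "", text).lower()

def matches(query, transcription):
    """True if the query occurs in the transcription once both are normalised."""
    return normalise(query) in normalise(transcription)

# Illustrative entries only, using spellings mentioned on this page.
entries = ["Grave Yard", "Bench Mark", "Sun Dial", "Drinking Fountain"]
hits = [entry for entry in entries if matches("Graveyard", entry)]
print(hits)  # ['Grave Yard']
```

Because word boundaries are ignored, this looser matching can occasionally return entries where the query spans two unrelated words, so results are worth checking by eye.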
Purpose
This project aimed to transcribe all of the text on the Ordnance Survey's 25-inch to the mile mapping for Edinburgh environs (1892-94). Volunteers used a simple interface (Recogito) to draw boxes around names on the OS 25-inch mapping, transcribe the text, and tag or categorise it following a simple set of terms.
The primary aim of the project was to provide a detailed gazetteer of streets, buildings and names in Edinburgh from a century ago, to assist local and family historians. A secondary aim was to create an easy search interface of written features on the maps such as baths, drinking fountains, mills, public houses, signal posts, pumps, or wells, viewing the results as geographic distributions on a map. An important related aim has been to provide a test dataset for AI/machine learning approaches to identify text on maps, which are being actively developed. The OS 25 inch maps cover all inhabited areas of England, Scotland and Wales, and we are keen to encourage the wider extraction of text from these maps.
In order to cover the whole extent of the wider built-up area of Edinburgh today, our geographic area in this project was based on the County of the City of Edinburgh, which was in use as an administrative unit from the 1890s through to the 1970s. This area is shown clearly by the historic map coverage in our Edinburgh Transcriptions Map Viewer.
For details of how to add this OS 25 inch historic map layer into other software, please see our Download section.
Transcription Workflow
a. Initial Recogito transcription phase with volunteers
We used a customised version of Recogito to record the transcriptions. Recogito is widely used in the cultural heritage community as a collaborative platform for document annotation / transcription. The Recogito 10 minute tutorial gives a good overview of Recogito; the Annotation Guidelines show our particular interface and transcription process in this project.
The initial 30,043 transcriptions - virtually all the textual content on the maps - were recorded by volunteers in the first two weeks.
These transcriptions were downloaded as .CSV files from Recogito, and combined into one file - the ‘raw dataset’.
As we felt that it was difficult to edit and correct the data in Recogito in bulk (edits could only be done at the individual transcription level), we took the decision to revise and update the dataset outside of Recogito.
The original raw dataset as a CSV file may be more useful for certain Machine Learning tasks, as the boxes are drawn around individual words (or individual letters where the spaces between the letters are wider than the height of the letter). However, the raw data has not been reviewed or corrected. A proportion of the original transcription fields are wrong, many of the 'expanded_transcription' entries are blank or incorrect, and tags may be incorrect too.
- The result of this stage was our raw dataset (Recogito download), available in the Download section below.
b. Revised dataset (main Gazetteer) for the NLS Edinburgh Map Transcriptions Viewer interface
We are very grateful to Vasilis Karaiskos, who wrote a Python script to convert the .CSV exports from Recogito into GeoJSON. The script also grouped words together into one transcription where they had been grouped in Recogito (i.e. converting the separate transcriptions for 'GEORGE', 'IV' and 'BRIDGE' in Recogito into 'GEORGE IV BRIDGE').

This process was only partially successful, generating some grouped transcriptions without spaces between the words, and also only dealing with those transcriptions which had been grouped in Recogito. The editing stages below have corrected these grouping problems.
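For anyone wishing to repeat the grouping step on the raw dataset, the idea can be sketched as follows. This is not the script used in the project, only an illustration of joining words by group_id in group_order sequence. The field names follow the Field List, but the sample rows are invented, and we assume that ungrouped rows have an empty group_id and that group_order holds whole numbers.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the shape of the raw dataset (see Field List).
SAMPLE = """uuid,transcription,group_id,group_order
a1,BRIDGE,g1,3
a2,GEORGE,g1,1
a3,IV,g1,2
a4,Pump,,
"""

def join_groups(rows):
    """Join grouped words into one transcription per group_id,
    ordered by group_order; ungrouped rows pass through unchanged."""
    groups = defaultdict(list)
    singles = []
    for row in rows:
        if row["group_id"]:
            groups[row["group_id"]].append(row)
        else:
            singles.append((row["uuid"], row["transcription"]))
    joined = []
    for group_id, members in groups.items():
        members.sort(key=lambda r: int(r["group_order"]))
        # Joining with a single space avoids run-together names.
        text = " ".join(m["transcription"].strip() for m in members)
        joined.append((group_id, text))
    return joined + singles

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for identifier, text in join_groups(rows):
    print(identifier, text)
```

Grouped items are keyed by their group ID here, which mirrors how the revised dataset uses the Recogito group ID as the uuid for items built from a group.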
We are very grateful in turn to Richard Meats who revised and corrected this converted data:
- Grouped names without spaces between them had spaces added. Some double-spaces were removed.
- A few tiny polygons with no transcriptions (probably created in error) were deleted.
- Names running across sheet boundaries had new polygons traced and the text joined.
- Many mis-transcribed entries were corrected when spotted.
- Some incorrectly grouped names were split and others grouped.
- Some incorrect upper / lower case entries were corrected.
- expanded_transcription forms were standardised to allow keyword searching, including abbreviations and milestones. Names were changed to be Proper Case, not UPPER CASE.
- During this corrections process, 25 missed transcriptions were found and added into the Recogito and NLS datasets.
- Numbers were expanded to label them more specifically into Benchmarks, Surface Levels, Acreages, and Parcel numbers. As a large proportion of the Parcel Numbers and Acreages were grouped in Recogito, these were labelled as 'Parcel Number and Acreage' tags.
- Tags were completed and standardised.
- The result of this stage was our revised dataset (main Gazetteer), available in the Download section below.
Field List
There are two datasets: a raw and a revised dataset. There is a core set of fields in both datasets, and some fields unique to each dataset. Please read the Transcription Workflow for details of how the raw and revised datasets differ.
Raw dataset (Recogito download)
Field Name | Description |
uuid | The unique ID for the transcription, generated by Recogito |
transcription | The text transcription from the map |
comments | A comment or expanded form of the transcription - for example, spelling out an abbreviation |
tags | 'area', 'street', 'building', 'natural' or 'other', using WikiData URLs - see Tags below |
source | The map section that the transcription was from. For performance reasons in Recogito, we split our geographic area into ten separate map layers. |
anchor | Polygon coordinates as lat/lon (EPSG:4326) in Scalable Vector Graphics (SVG) format |
group_id | The unique ID for grouped transcriptions |
group_order | Sequential number of the transcription within a particular group_id |
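The anchor field can be turned into a list of coordinate pairs with a few lines of Python. This is a rough sketch only: we assume here that the anchor text contains an SVG polygon with a points="x,y x,y ..." attribute, and the sample string is invented, so please check both the layout and the coordinate order against the actual download before relying on it.

```python
import re

def parse_anchor_points(anchor):
    """Extract coordinate pairs from the points="..." attribute of an
    SVG polygon. Only the 'x,y x,y' layout is handled (an assumption)."""
    match = re.search(r'points="([^"]+)"', anchor)
    if match is None:
        return []
    pairs = []
    for token in match.group(1).split():
        first, second = token.split(",")
        pairs.append((float(first), float(second)))
    return pairs

# Invented sample in the assumed layout, not a real transcription.
sample = '<svg><polygon points="-3.19,55.95 -3.18,55.95 -3.18,55.96"/></svg>'
print(parse_anchor_points(sample))
```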
Revised dataset (main Gazetteer)
Field Name | Description |
uuid | The unique ID for the transcription, generated by Recogito. For items generated from a group of raw transcriptions, this is the Recogito group ID |
transcription | The text transcription from the map |
expanded_transcription | Expanded or standardised form of the transcription - for example, spelling out an abbreviation |
tags | 'area', 'street', 'building', 'natural' or 'other' - see Tags below |
source | The map section that the transcription was from. For performance reasons in Recogito, we split our geographic area into ten separate map layers. |
WKT | Polygon coordinates as lat/lon (EPSG:4326) in Well Known Text (WKT) format |
NLS_ID | The unique NLS ID in the corrected dataset |
area | Area of the transcription polygon in square metres |
text | 'Y' to indicate the transcription was a textual one. Blank if numeric. |
url | Link to the transcription in the Edinburgh transcriptions map viewer |
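The WKT field can be read without any GIS software. The sketch below parses the outer ring of a simple two-dimensional POLYGON and works out its bounding box. The function names and the sample polygon are ours for illustration, and we make no assumption about which value in each pair is latitude and which is longitude, so please check this against the downloaded data.

```python
import re

def parse_wkt_polygon(wkt):
    """Return the outer ring of a simple 2D WKT POLYGON as (x, y) floats."""
    match = re.match(r"\s*POLYGON\s*\(\(([^()]*)\)", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("not a simple POLYGON: " + wkt[:40])
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        ring.append((float(x), float(y)))
    return ring

def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) for a ring of (x, y) pairs."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))

# Invented sample polygon, not a real transcription.
sample = "POLYGON ((-3.19 55.95, -3.18 55.95, -3.18 55.96, -3.19 55.96, -3.19 55.95))"
print(bounding_box(parse_wkt_polygon(sample)))
```

For anything beyond simple inspection, a dedicated geometry library or QGIS will handle the full WKT format more robustly.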
Tags
We were keen to keep the tagging as simple as possible, and we used just five major categories. (In the raw dataset / Recogito download, WikiData tags were used for these categories, as indicated below)
- #area - for all county names, district names, cities, towns and hamlets, as well as administrative divisions and populated settlements. Quite often, these names for administrative jurisdictions were written in CAPITALS - their fonts and details are shown on our OS 25 inch Characteristics Sheet. The map uses distinctive large text fonts to identify the main administrative area names, and a 'hollow' font for district names. Descriptions of jurisdiction boundaries (e.g. 'Parly & Munl Bdy' or 'Und.' for 'Undefined') were tagged as '#area', as well as boundary labels (e.g. 'Parliamentary Boundary'). However, very detailed boundary location indicators (e.g. 'Centre of Wall', 'Root of Hedge') were tagged as '#other'. Parks, gardens, and sports grounds were also tagged as '#other'.
WikiData area: https://www.wikidata.org/wiki/Q56061
- #street - for all street and road names, but also for squares, courtyards, crescents, or anything that could be used as an address. Streets with a name in them such as 'Leith Walk' or 'London Road' were also tagged as '#street'. For simplicity, '#street' was assigned to anything named Terrace (rather than '#building') - in some of these cases the terrace name was used on a later map or the modern map as the street name. There were also some related cases where rows of terrace-like small properties used some other name (e.g. XXX Row or XXX Cottages) which were categorised as '#street' rather than '#building'. Footpaths, tow paths, Rope Walks, and a few named tracks were excluded.
WikiData road: https://www.wikidata.org/wiki/Q34442
- #building - for any built structure, rural or urban: palaces, hospitals, theatres, churches, manufactories, bridges, workhouses, dovecots, kennels, icehouses but also windmills or warehouses. Named buildings such as "Mansfield Villa" or "Lyndoch Cottage" were also tagged as '#building', as were 'Mine' pit head facilities and 'XXX Works', as well as railway infrastructure such as 'Signal Boxes'. For simplicity, all the Railway Stations were tagged as '#building', even though the building part may be small for some. In subsequent editing, we adopted 'with a roof' as an additional qualification to the '#building' tag, and the 'built structure types' that weren't roof-based were categorised as '#other'. A few of the structure types not included as '#building' were bridges, canals, foot bridges, docks, jetties, harbours, monuments, piers, and yards.
WikiData building: https://www.wikidata.org/wiki/Q41176
- #natural - for all natural features, such as rivers, hills, mountains, creeks, bays, etc. Man-made features such as harbours were excluded, as well as gardens, parks, plantations, mill leads and races, reservoirs, and standing stones.
WikiData feature: https://www.wikidata.org/wiki/Q618123
- #other - for everything else, which includes:
- numeric text on the maps such as surface heights, bench marks, parcel numbers and acreages. These formed the bulk of the transcriptions, with 9600+ parcel-related items, 1880+ surface levels and 2310+ bench marks
- around 600 boundary location indicators ('Centre of Wall' etc)
- land-use cases: farmland, parks, gardens, nurseries, plantations, cemeteries, grave yards, outdoor sports venues such as 'Cricket Ground' or 'Tennis Courts', curling ponds and rifle ranges
- man-made structures not classified under #building, including: aqueducts, bridges, canals, docks, jetties, mill leads/races, monuments, mooring posts, piers, railway lines, railway junctions, reservoirs, signal posts, sluices, timber yards, tramways, valves, weighing machines, weirs, and viaducts
- various other things such as: conduits, drinking fountains, footpaths, fountains, high/low tide marks, man holes, mile posts, mile stones, named tracks, pumps, quarries, rope walks, shafts, sheepfolds, statues, stones, sun dials, taps, tow paths, troughs, wells
WikiData other: https://www.wikidata.org/wiki/Q55107540
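When working with the raw dataset, it may be convenient to convert the WikiData URLs back to the five simple category names. The lookup below uses the URLs listed above. It is a sketch that assumes each tag value holds a single URL exactly as shown; the function name is ours for illustration.

```python
# WikiData URL for each of the five categories, as listed above.
WIKIDATA_TO_TAG = {
    "https://www.wikidata.org/wiki/Q56061": "area",
    "https://www.wikidata.org/wiki/Q34442": "street",
    "https://www.wikidata.org/wiki/Q41176": "building",
    "https://www.wikidata.org/wiki/Q618123": "natural",
    "https://www.wikidata.org/wiki/Q55107540": "other",
}

def simplify_tag(raw_tag):
    """Return the simple category name for a WikiData URL.
    Values that are not recognised are returned unchanged."""
    cleaned = raw_tag.strip()
    return WIKIDATA_TO_TAG.get(cleaned, cleaned)

print(simplify_tag("https://www.wikidata.org/wiki/Q34442"))  # street
```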
Download
Gazetteer
We have made available the Edinburgh Gazetteer for onward re-use and download. These files use UTF-8 for character encoding. There are some extended characters - for example, a few "Depôts" and some middle-dot "·" decimal points.
The Gazetteer data is available in CSV and GeoJSON formats:
- Raw dataset (Recogito download) - CSV (10 MB) from Recogito.
- Revised dataset (main Gazetteer) - CSV (7 MB) and GeoJSON (12 MB) versions of the revised dataset (the GeoJSON will open in a new window in your browser, which you can then save).
Please read the Transcription Workflow for details of how these two datasets differ, and see the Field List for an explanation of the field names and content.
These datasets were taken on 18 October 2022.
Each of these links will open a file that can be saved locally. For the GeoJSON files, please change the file extension from *.js to *.geojson. Please see our guide on Opening Map Datasets in QGIS for advice on how to use the Revised dataset CSV and GeoJSON files within the free QGIS desktop software.
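If you prefer to script the change of file extension and a quick check of the contents, something along the following lines may help. It is only a sketch: the file name in the usage note is hypothetical, and we assume the saved file is a plain GeoJSON FeatureCollection.

```python
import json
from pathlib import Path

def as_geojson_path(path):
    """Return the same path with a .geojson extension (data.js -> data.geojson)."""
    return Path(path).with_suffix(".geojson")

def rename_to_geojson(path):
    """Rename a saved *.js file to *.geojson on disk and return the new path."""
    source = Path(path)
    target = as_geojson_path(source)
    source.rename(target)
    return target

def count_features(geojson_text):
    """Count the features in GeoJSON text, assuming a FeatureCollection."""
    data = json.loads(geojson_text)
    return len(data.get("features", []))
```

For example, after saving the file as edinburgh.js (a hypothetical name), calling rename_to_geojson("edinburgh.js") would leave edinburgh.geojson in the same folder, and count_features(new_path.read_text(encoding="utf-8")) would report how many transcriptions it holds.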
You can also view related details of the Edinburgh dataset and onward re-use, on our Data Foundry Edinburgh OS 25 inch transcriptions page.
Background mapping
You can also bring the Ordnance Survey 25 inch mapping into other software, following our Re-using georeferenced maps guide. The tileset URL for the Ordnance Survey 25 inch to the mile 1890s Edinburgh layer is:
This tileset URL allows the zoomable mapping to be added into QGIS, ArcGIS, or geojson.io so that the transcriptions can be seen on top of a background layer of Ordnance Survey 25 inch to the mile 1890s historic mapping.
Corrections
If you would like to suggest edits or corrections to any of the content of the Edinburgh Transcriptions dataset, please contact us at maps@nls.uk with the subject line 'Edinburgh Transcriptions Correction'. We need to know, at the very least, the uuid and the specific field content that needs to be corrected, following the Field List.