C-squares
C-squares (acronym for the Concise Spatial QUery And REpresentation System) is a system of spatially unique, location-based identifiers (geocodes) for areas on the surface of the earth, represented as cells from a latitude- and longitude-based Discrete Global Grid at a hierarchical set of resolution steps, obtained by progressively subdividing 10×10 degree World Meteorological Organization squares; the term "c-square" is also available for use to designate any component cell of the grid. Individual cell identifiers incorporate literal values of latitude and longitude in an interleaved notation (producing grid resolutions of 10, 1, 0.1 degrees, etc.), together with additional digits that support intermediate grid resolutions of 5, 0.5, 0.05 degrees, etc. The system was initially designed to represent data "footprints" or spatial extents in a more flexible manner than a standard minimum bounding rectangle, and to support "lightweight", text-based spatial querying; it can also provide a set of identifiers for grid cells used for assembly, storage and analysis of spatially organised data, in a unified notation that transcends national or jurisdictional boundaries. Dataset extents expressed in c-squares notation can be visualised using a web-based utility, the c-squares mapper, an online instance of which is currently provided by CSIRO Oceans and Atmosphere in Australia. C-squares codes and associated published software are free to use and the software is released under version 2 of the GNU General Public License (GPL), a licence of the Free Software Foundation. 
History
The c-squares method was developed by Tony Rees at CSIRO Oceans and Atmosphere in Australia (then "CSIRO Marine Research") in 2001–2, initially as a method for spatial indexing, rapid query, and compact storage and visualization of dataset spatial "footprints" in an agency-specific metadata directory (data catalogue);[1] it was first publicly announced at the "EOGEO" Technical Workshop held at Ispra, Italy in May 2002.[2] A more complete description was published in the scientific literature in 2003, together with a web-accessible mapping utility entitled the "c-squares mapper" for visualisation of data extents expressed in the c-squares notation.[3] Since that time, a number of projects and international collaborations have employed c-squares to support spatial indexing and/or map production, including FishBase (to map stored data points for any species), the Ocean Biogeographic Information System (OBIS),[4][5] AquaMaps,[6] data analysis to support the designation of marine biogeographic realms,[7] multi-national fisheries data collation by the Scientific, Technical and Economic Committee for Fisheries (STECF) of the European Commission,[8] and data reporting by ICES.[9][10] For its application in displaying and modelling global biodiversity data, c-squares was one of four components cited in the award of the Ebbe Nielsen Prize to Rees by the Global Biodiversity Information Facility (GBIF) in 2014.[11]

The concept of representing dataset "footprints" as cells of spatial data of this nature and alignment was stated to have been inspired by two sources: the data addressing method in the U.S. National Oceanographic Data Center (NODC) "World Ocean Database" product,[12][1] which uses 10 degree World Meteorological Organization squares (the starting point for c-squares hierarchical subdivision) for organising its data content, and the set of 1:100,000 topographic maps issued by the national mapping agency for Australia; each map covers a 0.5 degree square and, with its associated mapsheet labels, can notionally be used as a unit of spatial identification.[1] The method has been discussed further in texts on georeferencing, including those by Hill, 2006[13] and Guo et al., 2020.[14][a]

The system name "c-squares" was chosen because it can be represented as an acronym (for "concise spatial query and representation system") and also because it signals that this method belongs to a notional group of similarly named, latitude-longitude gridded subdivisions of the Globe that includes World Meteorological Organization Squares and Marsden squares, and contrasts with other tessellations of the Globe that use differently shaped basic units such as rectangles, triangles, diamonds, and hexagons (for examples see e.g. Sahr et al., 2003[16]). It is also intended that any individual component cell of the grid can be referred to as a "c-square" (no initial capitalization required).

Rationale
Spatial data are inherently (at least) 2-dimensional; without additional indexing, a numeric range query in 2 dimensions (e.g. x and y, or latitude and longitude) is required to retrieve data items within a particular area. Such queries are computationally expensive, so it can be beneficial to pre-process (index) the data in some manner that reduces the inherent dimensionality from two dimensions to one, for example as labelled cells of a grid. The grid labels can then be indexed by standard, one-dimensional methods for rapid search and retrieval,[17] and/or searched by simple alphanumeric text searches.
C-squares is an example of such a grid where the cell identifiers are designed to be human- as well as machine-readable, and to be concordant with recognizable and commonly used intervals of latitude and longitude.

Further areas where a grid-based approach to spatial indexing can be beneficial include the representation of data "footprints" in support of spatial search,[13] data binning to reduce complex and potentially voluminous data into "blocks" which can then be more easily compared and summarised, and the potential for a hierarchical approach wherein finer resolutions of the grid are nested into coarser ones, with a shared notation (common identifiers for the larger portions of the relevant grid cells). A jurisdiction-independent (global) grid such as c-squares can also be used to integrate data across national boundaries, in contrast to (for example) the national grids of various countries such as those of the United Kingdom, Ireland, etc., which differ in their approach and may have differences or gaps where such grids overlap, or fail to meet (for example in marine regions around two such areas).

A potential disadvantage of "equal angle" grids (the class that includes c-squares), which are based on standardised units of latitude and longitude, is that the length of the "sides" and the shape (and area) of the grid cells are not constant on the ground (the height remains approximately constant but the width varies with latitude), and some particular effects are noticeable at the poles, where the cells become 3- rather than 4-sided in practice (see illustration).
These disadvantages can be offset by the advantages that data transformation in and out of grid notation can be accomplished by relatively straightforward steps, the results are congruent with conventional maps that show intervals of latitude and longitude, and the concepts of (for example) "1-degree squares" and "0.5 degree squares" may have familiarity and meaning to human users, in a way that non-square, purely mathematically derived shapes and sizes (based upon some form of spherical trigonometry) may not.

The c-squares global grid notation
Initial 10 degree squares
10-degree c-squares are specified as being identical to the equivalent World Meteorological Organization (WMO) square codes (see illustration). These squares are aligned with 10-degree subdivisions of the global latitude–longitude grid, which for c-squares use is specified as employing the WGS84 datum. WMO (10 degree) squares are encoded with four digits, in the series 1xxx, 3xxx, 5xxx and 7xxx.[12] The leading digit indicates the "global quadrant", with 1 for north-east (latitude and longitude are both positive), 3 for south-east (latitude is negative and longitude positive), 5 for south-west (latitude and longitude are both negative) and 7 for north-west (latitude is positive and longitude negative). The next digit, 0 through 8, corresponds to the tens of latitude degrees either north or south, while the remaining 2 digits, 00 through 17, correspond to the tens of longitude degrees either east or west (by specification, 0 is treated as positive). Thus the 10 degree cell with its lower left corner at 0,0 (latitude,longitude) is encoded 1000, and acts as a bin to contain all spatial data between 0 and 10 degrees north (actually, 0 and 9.999...) and 0 and 9.999... degrees east; the 10 degree cell with its lower left corner at 80 N, 170 E is encoded 1817, and acts as a bin to contain all spatial data between 80 and 90 degrees north and 170 and 179.999... degrees east.
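The 10-degree encoding rule just described can be illustrated with a short sketch. This is not taken from the published c-squares software: the function name `wmo_square` is hypothetical, and clamping the exact pole and the ±180 meridian into the outermost cells is an assumption made here so that every valid coordinate receives a code.

```python
def wmo_square(lat: float, lon: float) -> str:
    """Return the 4-digit WMO 10-degree square code containing (lat, lon).

    Illustrative sketch only; the function name is hypothetical.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude/longitude out of range")

    # Leading digit: global quadrant (0 is treated as positive).
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7   # NE or NW
    else:
        quadrant = 3 if lon >= 0 else 5   # SE or SW

    # Tens of degrees of latitude (0-8) and longitude (00-17).
    # Assumption: lat = +/-90 and lon = +/-180 are clamped into the
    # outermost cells rather than starting a new (non-existent) band.
    lat_tens = min(int(abs(lat) // 10), 8)
    lon_tens = min(int(abs(lon) // 10), 17)

    return f"{quadrant}{lat_tens}{lon_tens:02d}"


print(wmo_square(0.0, 0.0))     # cell with lower left corner at 0,0
print(wmo_square(85.0, 175.0))  # cell with lower left corner at 80 N, 170 E
```

Only the absolute values of latitude and longitude feed the tens digits; the sign information is carried entirely by the leading quadrant digit, which is why the same digit ranges serve all four quadrants.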
Subsequent recursive subdivision
C-squares extends the initial WMO 10×10 square notation via a recursive series of "cycles", each 3 digits long (the final one may be 1 digit), separated by the colon character, the number of characters (and cycles) indicating the resolution encoded, as per these examples:
(etc.)

Cell size is typically selected to suit the nature (granularity and volume) of the data to be encoded, the overall spatial extent of the area in question (e.g. global to local), the desired spatial resolution of the resulting grid (smallest features/areas that can be differentiated from each other), and the computing resources available (the number of cells needed to cover the same area increases by a factor of either 4 or 25 with each decrease in square size, requiring either an equivalent increase in computing resources or possibly slower addressing times). For example, relatively generalised, global compilations may be best suited to aggregate (index) data by 10- or 5-degree cells, while more local gridded areas may favour 1-, 0.5- or 0.1-degree cells, as appropriate.

The nominal sizes given above reflect the fact that at the equator, 1 degree of both latitude and longitude corresponds to around 110 km, with the actual value for longitude declining between there and the poles, where it becomes zero (latitude actual: 110.567 km at the equator, 111.699 km at the poles; longitude actual: 111.320 km at the equator, 78.847 km at latitude ±45 degrees, 0 km at the poles); at a sample northern hemisphere latitude, e.g. that of London (51.5 degrees north), a 1×1 degree square measures approximately 111×69 km.[18]

To produce the 1 or 3 digits in any cycle following the initial 4-digit, 10-degree square identifier, first an "intermediate quadrant", 1 through 4, is designated (see diagram), where 1 indicates low absolute values of both latitude and longitude (regardless of sign), 2 indicates low latitude and high longitude, 3 indicates high latitude and low longitude, and 4 indicates high values for both; "low" and "high" being taken from the relevant portion of the data to be gridded (for example within the 10 degree cell extending from 10 to 20 degrees, 10 is treated as low and 19 as high).
This leading digit in a cycle is then followed simply by the next applicable digit for first latitude and then longitude: thus an input value of latitude +11.0, longitude +12.0 degrees will be encoded as the 5 degree c-square code 1101:1 and the 1 degree code 1101:112. Inspection of this code will show that the input latitude value (11) can be recovered directly from the second digit of the initial 4-digit group and the second digit of the following cycle, while the longitude (012) is given by the last two digits of the initial group and the third digit of the cycle; the sign of both is positive, as indicated by the first digit of the leading 4 (1 in this case, indicating the north-east global quadrant).

From 2002 onwards (still current at 2020), an online "latlong to c-squares conversion page" has been available at the website of CSIRO Marine Research (now CSIRO Oceans and Atmosphere), which will convert input values of latitude and longitude to the equivalent c-square code at user-selectable resolutions from 10 to 0.1 degree cell size. Alternatively, the conversion is comparatively simple to program from first principles (or to construct as, for example, a Microsoft Excel worksheet) according to the c-squares specification;[19] an example is available online.

C-squares strings, and the c-squares mapper
A set of c-squares (contiguous or non-contiguous) can be represented as a concatenated list of individual square codes, separated by the "pipe" (|) character, thus: 7500:110:3|7500:110:1|1500:110:3|1500:110:1 (etc.). This set of squares can then serve as an indication of a dataset extent, similar in function (but simpler to specify) to a MultiPolygon in the Well-known text representation of geometry, the functional difference being that the points forming the boundary of a polygon can be continuously variable, while those for the c-square boundaries are constrained to fixed intervals determined by the grid square resolution in use. If these strings are stored, for example as "long text" within a field of a conventional text storage system (e.g. spreadsheet, database, etc.), they can be used for the operation of spatial searches (see following section/s).

C-squares strings can also be used directly as input to an instance of the "c-squares mapper", a web-based utility in operation since 2002 at CSIRO in Australia (under the domain obis.org.au) and also at other global locations. To visualize the position of any set of squares on a map, the current syntax to address an installation of the "c-squares mapper" is (e.g.):

The above call to the c-squares mapper is a simple one, with only a single parameter (a single c-squares string), which produces a simple "default map"; the mapper is in fact highly customizable, capable of accepting up to seven c-squares strings concurrently and plotting them in user-specified colours, with a choice of empty or filled squares, user-selectable base map, etc.; a full list of available input parameters is provided on the mapper "technical information" page.[20] A more sophisticated map produced using a larger number of available parameters is the colour-coded example illustrated (AquaMap, i.e. modelled distribution, for the ocean sunfish). Commencing in 2006, an upgrade of the mapper incorporating the independently written Xplanet software also allows the plots of supplied c-squares to be displayed on a user-rotatable and zoomable globe, which can offer a more realistic view for either Pacific Ocean- or polar-centred data than is possible with a flat map (e.g. equirectangular) projection.[21]

The c-squares mapper is one of several options currently (2006–present) available for real time mapping of fish point data records in FishBase, for example for the species Salmo trutta (sea trout); similar options are also available for other (non-fish) marine species via SeaLifeBase.
Since 2006, the mapper has also produced in excess of 100,000 species maps for the AquaMaps project (33,500 species × 4 "standard maps" per species as at 2021, with additional user-generated maps available on demand).

Spatial searching
In a system that uses c-squares codes as units of spatial indexing, a text-based search on any of these square identifiers will retrieve data associated with the relevant square. If a wildcard search is supported (for example in the case that the wildcard character is a percent sign), a search on "7500%" will retrieve all data items in that ten degree square, a search on "7500:1%" will retrieve all data items in that five degree square, etc.

The asterisk character "*" has a special (reserved) meaning in c-squares notation, being a "compact" notation indicating that all finer cells within a higher level cell are included, to the level of resolution indicated by the number of asterisks. In the example above, "7500:*" would indicate that all 4 five-degree cells within parent ten-degree cell "7500" are filled, "7500:***" would indicate that all 100 one-degree cells within parent ten-degree cell "7500" are filled, etc. This approach enables contiguous blocks of cells to be filled with an economy of characters in many cases (a form of data compression), which is useful for efficient storage and transfer of c-squares codes as required.

Spatial data reporting, assembly, and analysis
C-squares has been employed at a range of resolutions for data reporting, assembly and analysis on scales ranging from global to local, also incorporating multi-national data compilations where a gridded data system is required that is not tied to the boundaries of any single jurisdiction. Examples include:
C-squares labelled cells were adopted as the underlying grid for analysis by the European Union-funded MINOUW project (MINimisation Of UnWanted catches in European Waters), via their web application (MINOUWApp), in support of spatial data (notably fishing effort and density patches of potential unwanted catches) supplied by project researchers across different European countries in a range of formats, in combination with layers of spatial information from external sources.[42]

Target audience/potential users
According to its design principles, the principal target audience for c-squares is data custodians who wish to organise spatial data by latitude-longitude grid squares at any of the resolutions supported by the system, namely any decimal subdivision of either 10×10 or 5×5 degree squares, to support associated data query, retrieval, analysis, representation (mapping), and potential external data exchange and aggregation. Fine resolution c-squares may also be used as a general "location encoder", selected desirable attributes of which are discussed further by the developers of the Google Open Location Code method,[43] since the c-squares method satisfies the majority of the criteria set out in that discussion document.

As evidenced by the references cited in this article, principal adopters of the method to date have been concerned with marine data in particular; this most likely stems from the fact that the oceans are trans-national in their governance, so that otherwise established local or national grids are unsuitable for analysis of ocean or fisheries data on anything other than a local scale. Although initially deployed in marine-related systems (as per its description in the journal "Oceanography"), in essence the system is terrain-agnostic (as is the latitude-longitude grid upon which it is based) and is applicable equally to both marine and terrestrial data.
An additional aspect of c-squares noted by Larsen et al., 2009, and either implicit or explicit in other equivalent "data aggregation methods", is the use of such frameworks to "allow general level analyses without exposing the precise coordinates of potentially sensitive information".[44] For example, real time data on the exact location of fishing vessels is frequently considered "commercial in confidence", to avoid release to competitors of the best fishing localities according to the nature of the resource, which may be continually moving, while for biodiversity data, the exact location of individuals or (for example) nests of rare species may likewise be undesirable to release to the public. The use of grid cells or similar methods to represent the general location of data points without revealing their more exact location, while still rendering the data available for statistical analysis, is a recognised useful approach in such situations; see e.g. Chapman, 2020.[45]

Congruence with other latitude-longitude geocoding systems
At their coarsest resolution, 10 degree c-squares are congruent with both World Meteorological Organization squares (whose identifiers are re-used within the c-squares notation) and Marsden squares, which share the same boundaries but use a different notation. Both 1 degree and 0.5 degree c-squares are partially congruent with "standard resolution" ICES Statistical Rectangles, which utilize a grid cell area of 1×0.5 degrees over a restricted portion of the Globe (north Atlantic region): 2 vertically adjacent ICES rectangles are exactly equivalent to a single 1 degree c-square, while if needed, the content of a single ICES rectangle can be apportioned between 2 horizontally adjacent 0.5 degree c-squares for data interchange at that resolution (refer note).
A separate system, QDGC or Quarter Degree Grid Cells, has been developed for interchange of some biodiversity data in Africa, and later extended to cope with data across the Equator and Prime Meridian.[44] QDGC cells, at 0.25×0.25 degrees, lie between the 0.5×0.5 and 0.1×0.1 degree resolution steps of the c-squares system, and are thus not exactly compatible with it, although the "parent" squares of the QDGC grid from which they are derived, at 1×1 and 0.5×0.5 degrees, are congruent with equivalent c-squares grid cells, albeit using a different notation. In their proposal for an "extended" QDGC system, Larsen et al. additionally describe the potential subdivision of 0.25×0.25 degree QDGC cells by a recursive factor of 2, giving cell sizes of 0.125, 0.0625, 0.03125 degrees, etc., which progressively depart further from the "decimal degrees" concept incorporated into c-squares.

Licensing and software availability
There is no licence required to use the c-squares method, which has been openly published in the scientific literature since 2003. Source code for the mapper, etc., available via the SourceForge website, is released under the GNU General Public License version 2.0 (GPLv2), which provides for free use, redistribution, and subsequent modification for any purpose so long as that licence is retained with the product and any subsequent modifications; in other words, all released improved versions will also be free software.[46]