Digestor API

digestor

Scripts and metadata for loading survey data into the Data Lab database.

digestor.base

Base class containing common functionality.

class digestor.base.Digestor(schema, table, description=None, merge=None, pixels=True, random=True, ecliptic=True, galactic=True)

Base class for FITS+SQL to FITS+SQL conversion.

Parameters:
  • schema (str) – Name of the PostgreSQL schema containing table.

  • table (str) – Name of the PostgreSQL table.

  • description (str, optional) – A short description of schema.

  • merge (str, optional) – Name of a JSON file containing existing TapSchema metadata.

  • pixels (bool, optional) – If False, don’t add HTM and HEALPix columns.

  • random (bool, optional) – If False, don’t add a random_id column.

  • ecliptic (bool, optional) – If False, don’t add ecliptic coordinates (probably because they already exist).

  • galactic (bool, optional) – If False, don’t add galactic coordinates (probably because they already exist).

_dlColumns()

Add SQL column definitions of Data Lab-added columns.

Returns:

A list suitable for appending to an existing list of columns.

Return type:

list

_getYAML(filename)

Cache reads of YAML configuration files.

Parameters:

filename (str) – Name of the YAML configuration file.

Returns:

The contents of filename, or None if there is no such file.

Return type:

dict
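
The caching behaviour described above can be sketched in a few lines. This is an illustration only, not the digestor implementation: the ConfigCache class, its parser argument and its reads counter are hypothetical, and json.load stands in for a YAML parser so that the sketch needs only the standard library.

```python
import json
import os
import tempfile


class ConfigCache:
    """Hypothetical helper: parse each configuration file at most once."""

    def __init__(self, parser=json.load):
        # json.load is a stand-in; a YAML loader would be passed here instead.
        self._parser = parser
        self._cache = {}
        self.reads = 0  # number of times a file was actually opened

    def get(self, filename):
        """Return the parsed contents of filename, or None if it does not exist."""
        if filename in self._cache:
            return self._cache[filename]
        if not os.path.exists(filename):
            return None
        with open(filename) as f:
            self.reads += 1
            self._cache[filename] = self._parser(f)
        return self._cache[filename]


# Demonstration with a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w") as f:
        json.dump({"columns": {"ra": "double"}}, f)
    cache = ConfigCache()
    first = cache.get(path)    # opens and parses the file
    second = cache.get(path)   # served from the cache
    missing = cache.get(os.path.join(tmp, "nope.json"))
```

Keying the cache on the file name means that several methods can ask for the same configuration file without re-reading it.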

_initTapSchema(description='', merge=None)

Create a dictionary compatible with TapSchema.

Parameters:
  • description (str, optional) – A short description of schema.

  • merge (str, optional) – Name of a JSON file containing existing TapSchema metadata.

Returns:

A dictionary compatible with TapSchema.

Return type:

dict

Raises:

ValueError – When merging, if the schema names don’t match, or if the table is already loaded.
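
The two merge checks can be illustrated with a small sketch. The dictionary layout (schemas, tables and columns lists with schema_name and table_name keys) is an assumption loosely modelled on the TAP_SCHEMA table names, the function init_tap_schema is hypothetical, and the schema and table names are examples. Unlike the method documented above, the sketch takes already-loaded metadata rather than the name of a JSON file.

```python
def init_tap_schema(schema, table, description="", existing=None):
    """Hypothetical sketch: start a new metadata dictionary or extend one."""
    if existing is None:
        return {
            "schemas": [{"schema_name": schema, "description": description}],
            "tables": [{"schema_name": schema, "table_name": table}],
            "columns": [],
        }
    # When merging, the existing metadata must describe the same schema ...
    names = [s["schema_name"] for s in existing["schemas"]]
    if schema not in names:
        raise ValueError(f"Schema '{schema}' not found in existing metadata.")
    # ... and must not already contain the table being loaded.
    for t in existing["tables"]:
        if t["schema_name"] == schema and t["table_name"] == table:
            raise ValueError(f"Table '{schema}.{table}' is already loaded.")
    existing["tables"].append({"schema_name": schema, "table_name": table})
    return existing


meta = init_tap_schema("sdss_dr14", "specobj", description="SDSS DR14")
meta = init_tap_schema("sdss_dr14", "photoplate", existing=meta)
```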

addDLColumns(filename, ra='ra', overwrite=False)

Add DL columns to FITS file prior to column reorganization.

Parameters:
  • filename (str) – Name of the FITS file.

  • ra (str, optional) – Look for Right Ascension in this column (default ‘ra’).

  • overwrite (bool, optional) – If True, remove any existing file.

Returns:

The name of the processed file.

Return type:

str

Raises:

ValueError – If a problem with stilts is detected.

property colNames

List of columns in the table.

columnIndex(column)

Find the index of the column in the list of columns.

Raises:

ValueError – If the column is not found.
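
A lookup of this kind might look like the following sketch. It assumes that columns are stored as a list of dictionaries with a column_name key; both that layout and the free function column_index are hypothetical.

```python
def column_index(columns, name):
    """Return the position of the column called name, or raise ValueError."""
    for i, col in enumerate(columns):
        if col["column_name"] == name:
            return i
    raise ValueError(f"Column '{name}' not found.")


columns = [{"column_name": "objid"},
           {"column_name": "ra"},
           {"column_name": "dec"}]
idx = column_index(columns, "ra")
```

Raising ValueError rather than returning a sentinel mirrors the behaviour of list.index() and keeps a missing column from being mistaken for a valid position.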

classmethod configureLog(filename, debug=False)

Set up logging for the module.

Parameters:
  • filename (str) – Name of the log file.

  • debug (bool, optional) – If True, set log level to DEBUG.
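
As a rough illustration of file-based logging with an optional DEBUG level, the sketch below uses only the standard logging module. The function configure_log, the logger name digestor_sketch and the message format are assumptions, not the package's actual configuration.

```python
import logging
import os
import tempfile


def configure_log(filename, debug=False, root="digestor_sketch"):
    """Hypothetical sketch: send records from the `root` logger to filename."""
    log = logging.getLogger(root)
    log.setLevel(logging.DEBUG if debug else logging.INFO)
    handler = logging.FileHandler(filename)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
    log.addHandler(handler)
    return log, handler


with tempfile.TemporaryDirectory() as tmp:
    logfile = os.path.join(tmp, "digestor.log")
    log, handler = configure_log(logfile, debug=True)
    log.debug("Parsing started.")
    # Release the file before the temporary directory is removed.
    handler.close()
    log.removeHandler(handler)
    with open(logfile) as f:
        contents = f.read()
```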

createSQL()

Construct a CREATE TABLE statement from the TapSchema metadata.

Returns:

A SQL table definition.

Return type:

str
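
To make the idea concrete, here is a minimal sketch of building a CREATE TABLE statement from column metadata. The column_name and datatype keys, the NOT NULL constraint on every column and the function create_sql are all assumptions made for illustration; the statement that digestor actually emits may differ.

```python
def create_sql(schema, table, columns):
    """Hypothetical sketch: build a CREATE TABLE statement from metadata."""
    definitions = []
    for col in columns:
        # One line per column: name, PostgreSQL type, constraint.
        definitions.append(f"    {col['column_name']} {col['datatype']} NOT NULL")
    body = ",\n".join(definitions)
    return f"CREATE TABLE IF NOT EXISTS {schema}.{table} (\n{body}\n);\n"


sql = create_sql("sdss_dr14", "specobj", [
    {"column_name": "specobjid", "datatype": "bigint"},
    {"column_name": "ra", "datatype": "double precision"},
])
```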

customSTILTS(filename)

Add (prepend) custom STILTS commands to the default command.

Parameters:

filename (str) – Name of the YAML configuration file.

fixColumns(filename)

Fix any table definition oddities “by hand”.

Parameters:

filename (str) – Name of the YAML configuration file.

Raises:

ValueError – If the configuration file contains an unknown column.
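
The override-and-validate pattern can be sketched as follows. The shape of the configuration (a mapping from column name to the fields that should be replaced) and the function fix_columns are hypothetical; the sketch takes the already-parsed configuration rather than a file name.

```python
def fix_columns(columns, fixes):
    """Hypothetical sketch: apply per-column overrides, rejecting unknown names."""
    known = {col["column_name"]: col for col in columns}
    for name, changes in fixes.items():
        if name not in known:
            raise ValueError(f"Unknown column '{name}' in configuration.")
        known[name].update(changes)
    return columns


columns = [{"column_name": "ra", "datatype": "real"},
           {"column_name": "dec", "datatype": "real"}]
fixed = fix_columns(columns, {"ra": {"datatype": "double precision"}})
```

Failing on an unknown column catches typos in the configuration file before they silently have no effect.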

logName(method)

Get a logger with name method.

Parameters:

method (str) – Name of the log object; it is appended to the root name.

Returns:

A configured log object.

Return type:

logging.Logger
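
Appending a method name to a root name follows the dotted hierarchy of the standard logging module. In the sketch below, the root name digestor_sketch and the function log_name are assumptions chosen for illustration.

```python
import logging

ROOT = "digestor_sketch"  # assumed root name, not necessarily the package's


def log_name(method, root=ROOT):
    """Return the logger named '<root>.<method>'."""
    return logging.getLogger(f"{root}.{method}")


log = log_name("parseFITS")
```

Because child loggers propagate records to their ancestors, handlers and levels only need to be configured once on the root logger.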

mapColumns()

Complete mapping of FITS table columns to SQL columns.

This method may need to be overridden by a subclass.

Raises:

KeyError – If an expected mapping cannot be found.

property nColumns

Number of columns in the table.

parseFITS(filename, hdu=1)

Read FITS metadata from filename.

Parameters:
  • filename (str) – Name of the FITS file.

  • hdu (int, optional) – Read data from this HDU (default 1).

processFITS(hdu=1, overwrite=False)

Convert a pre-processed FITS file into one ready for database loading.

This method may be overridden in subclasses with survey-specific requirements.

Parameters:
  • hdu (int, optional) – Read data from this HDU (default 1).

  • overwrite (bool, optional) – If True, remove any existing file.

Returns:

The name of the file written.

Return type:

str

Raises:

ValueError – If the FITS data type cannot be converted to SQL.

sortColumns()

Sort the SQL columns for best performance.
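
The documentation does not say which ordering is used. One common PostgreSQL practice, assumed here purely for illustration, is to place wider fixed-width types first and variable-length types last, which reduces alignment padding in each row. The TYPE_WIDTH table and the function sort_columns are hypothetical.

```python
# Storage width in bytes for some fixed-width PostgreSQL types.
TYPE_WIDTH = {
    "bigint": 8,
    "double precision": 8,
    "integer": 4,
    "real": 4,
    "smallint": 2,
}


def sort_columns(columns):
    """Order columns widest-first; unknown or variable-length types go last."""
    # sorted() is stable, so columns of equal width keep their original order.
    return sorted(columns,
                  key=lambda col: TYPE_WIDTH.get(col["datatype"], 0),
                  reverse=True)


columns = [
    {"column_name": "flag", "datatype": "smallint"},
    {"column_name": "objid", "datatype": "bigint"},
    {"column_name": "name", "datatype": "text"},
    {"column_name": "ra", "datatype": "double precision"},
    {"column_name": "mag", "datatype": "real"},
]
ordered = [col["column_name"] for col in sort_columns(columns)]
```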

property stable

Schema-qualified table name.

tableIndex()

Find the index of the table in the list of tables.

Raises:

ValueError – If the table is not found.

tapColumn(column, **kwargs)

Create a TapSchema-compatible column definition.

Parameters:

column (str) – Name of the column.

Returns:

A column definition in TapSchema format.

Return type:

dict

writeSQL(filename)

Write the CREATE TABLE statement to filename.

Parameters:

filename (str) – Name of the SQL file.

writeTapSchema(filename)

Write the TapSchema metadata to a JSON file.

Parameters:

filename (str) – Name of the JSON file.

digestor.sdss

Convert SDSS SQL (MS SQL Server) table definitions to Data Lab SQL (PostgreSQL).

class digestor.sdss.SDSS(*args, **kwargs)

Convert SDSS FITS+SQL files into Data Lab-compatible forms.

_photoFlag(column, table)

Handle photometric flags in SDSS data.

Parameters:
  • column (dict) – A TapSchema column definition.

  • table (astropy.table.Table) – Table containing the input data.

Returns:

The combined flags and flags2 data, or None if the column did not match.

Return type:

numpy.ndarray

Raises:

AssertionError – If the required columns are not present in the FITS file.
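
Combining two 32-bit flag words into a single 64-bit value can be sketched with plain integers. The sketch assumes that flags2 occupies the upper 32 bits and flags the lower 32 bits, and that the result is stored as a signed 64-bit integer to fit a PostgreSQL bigint; neither detail is confirmed by the documentation above. The function combine_flags is hypothetical and works on lists rather than NumPy arrays.

```python
def combine_flags(flags, flags2):
    """Pack pairs of 32-bit flag words into signed 64-bit integers."""
    combined = []
    for low, high in zip(flags, flags2):
        # Mask to 32 bits so that negative inputs are treated as bit patterns.
        value = ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
        # Reinterpret as signed: values with the top bit set become negative.
        if value >= 1 << 63:
            value -= 1 << 64
        combined.append(value)
    return combined


packed = combine_flags([1, -1, 0], [1, 0, -2147483648])
```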

fixMapping(filename)

Fix any FITS to SQL mapping problems using the YAML configuration file filename.

Parameters:

filename (str) – Name of the YAML configuration file.

fixNOFITS(filename)

Fix any missing data designated by --/F NOFITS using the YAML configuration file filename.

Parameters:

filename (str) – Name of the YAML configuration file.

mapColumns()

Complete mapping of FITS table columns to SQL columns.

Raises:

KeyError – If an expected mapping cannot be found.

parseColumnMetadata(column, data)

Parse the metadata for an individual column.

Parameters:
  • column (str) – Name of the column.

  • data (str) – Metadata string extracted from the SQL file.

Returns:

A two-item tuple: a dictionary of the parsed metadata in TapSchema format, and the FITS column name, if one was found.

Return type:

tuple
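
The following sketch shows one way such a metadata string could be split into its tagged parts. It assumes that the tags --/D, --/U and --/K introduce the description, unit and UCD, and that --/F gives the FITS column name; only --/F and --/T are mentioned in this documentation, so the other tag meanings are assumptions. The function parse_column_metadata, the output keys and the sample string are likewise hypothetical.

```python
import re

# A tag such as --/U, followed by its value up to the next tag or end of string.
TAG = re.compile(r"--/(?P<tag>[A-Z])\s+(?P<value>.*?)(?=\s*--/|$)")

# Assumed meaning of each tag letter.
KEYS = {"D": "description", "U": "unit", "K": "ucd"}


def parse_column_metadata(column, data):
    """Return (metadata dictionary, FITS column name or None)."""
    meta = {"column_name": column, "description": "", "unit": "", "ucd": ""}
    fits_name = None
    for match in TAG.finditer(data):
        tag = match.group("tag")
        value = match.group("value").strip()
        if tag in KEYS:
            meta[KEYS[tag]] = value
        elif tag == "F":
            fits_name = value
    return meta, fits_name


meta, fits_name = parse_column_metadata(
    "ra", "--/U deg --/K POS_EQ_RA_MAIN --/F RA --/D J2000 Right Ascension")
```

Tags that the sketch does not recognise, including the long description --/T, are simply ignored.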

parseLine(line)

Parse a single line from a SQL file.

Parameters:

line (str) – A single line from a SQL file.

Notes

  • Currently, the long description (--/T) is thrown out.

parseSQL(filename)

Parse an entire SQL file.

Parameters:

filename (str) – Name of the SQL file.

processFITS(hdu=1, overwrite=False)

Convert a pre-processed FITS file into one ready for database loading.

Parameters:
  • hdu (int, optional) – Read data from this HDU (default 1).

  • overwrite (bool, optional) – If True, remove any existing file.

Returns:

The name of the file written.

Return type:

str

Raises:

ValueError – If the FITS data type cannot be converted to SQL.

writePOSTSQL(filename, pkey='objid')

Write additional SQL commands needed after loading the table itself.

Parameters:
  • filename (str) – Name of the SQL file.

  • pkey (str, optional) – Name of the PRIMARY KEY column (default ‘objid’).

writeSQL(filename)

Write the CREATE TABLE statement to filename, along with any SQL commands to pre-load.

Parameters:

filename (str) – Name of the SQL file.

digestor.sdss.get_options()

Parse command-line options.

Returns:

The parsed options.

Return type:

argparse.Namespace

digestor.sdss.main()

Entry-point for command-line script.

Returns:

An integer suitable for passing to sys.exit().

Return type:

int
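
The get_options()/main() pairing follows a common pattern for command-line scripts, sketched below with the standard argparse module. The options shown (-v and a positional FILE) and the file-extension check are invented for illustration and are not the script's real interface; the argv parameter is added only so that the sketch can be exercised without a real command line.

```python
import argparse
import sys


def get_options(argv=None):
    """Parse command-line options (hypothetical option set)."""
    parser = argparse.ArgumentParser(prog="example",
                                     description="Hypothetical example script.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Print extra information.")
    parser.add_argument("sql", metavar="FILE", help="Name of the SQL file.")
    return parser.parse_args(argv)


def main(argv=None):
    """Return 0 on success and a non-zero integer on failure."""
    options = get_options(argv)
    if not options.sql.endswith(".sql"):
        print(f"{options.sql} does not look like a SQL file.", file=sys.stderr)
        return 1
    return 0


status = main(["specObj.sql"])
```

A script built this way would typically end with sys.exit(main()), so that the integer becomes the exit status of the process.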

digestor.view

Handle schema metadata for views.

digestor.view.get_options()

Parse command-line options.

Returns:

The parsed options.

Return type:

argparse.Namespace

digestor.view.main()

Entry-point for command-line script.

Returns:

An integer suitable for passing to sys.exit().

Return type:

int