bvbrc.TaxonomyClient

class bvbrc.TaxonomyClient(api_key=None)

Data Type : taxonomy

Primary Key : taxon_id

get(id: str, *, return_format: str | ReturnFormat = ReturnFormat.JSON, timeout: float | tuple[float, float] | None = None) -> BVBRCResponse

Retrieve a specific record by its unique identifier.

Fetches a single data record from the BV-BRC database using its unique ID. This is useful when you know the exact identifier of the record you want to retrieve.

Parameters

id : str

The unique identifier of the record to retrieve. The format depends on the data type (e.g., genome IDs or feature IDs); for this client it is the taxon_id.

return_format : str or ReturnFormat, default ReturnFormat.JSON

The desired format for the returned data. See the BV-BRC documentation for allowed return formats.

timeout : float or tuple, optional

Timeout for the HTTP request. Can be a single float (total timeout) or a tuple of (connect_timeout, read_timeout).

Returns

BVBRCResponse

A response object (derived from the requests.Response object) containing the retrieved data and associated response metadata.

Examples

Retrieve a genome by ID:

>>> import bvbrc as bv
>>> genome_client = bv.GenomeClient()
>>> response = genome_client.get("1313.5458")

Retrieve with custom format and timeout:

>>> response = genome_client.get(
...     "1313.5458",
...     return_format=bv.ReturnFormat.CSV,
...     timeout=30
... )
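
The examples above use GenomeClient, but the same call shape should carry over to TaxonomyClient, whose primary key is taxon_id. The following is a minimal sketch under that assumption. `fetch_taxon` and `demo` are hypothetical helpers, not part of bvbrc, and the taxon ID 562 is only an illustrative value. The client is passed in as an argument so the helper can be exercised without network access.

```python
from typing import Any, Tuple, Union

Timeout = Union[float, Tuple[float, float], None]


def fetch_taxon(client: Any, taxon_id: Union[int, str], timeout: Timeout = None) -> Any:
    """Fetch one taxonomy record through a client exposing get(id, *, timeout=...).

    Hypothetical helper: `client` is expected to be a bvbrc.TaxonomyClient.
    """
    # get() documents `id` as a str, so integer IDs are converted first.
    return client.get(str(taxon_id), timeout=timeout)


def demo() -> None:
    """Not called automatically: needs the bvbrc package and network access."""
    import bvbrc as bv

    client = bv.TaxonomyClient()
    # 562 is an illustrative taxon ID; (5, 30) is (connect_timeout, read_timeout).
    response = fetch_taxon(client, 562, timeout=(5, 30))
    # Assumption: BVBRCResponse keeps requests.Response attributes such as status_code.
    print(response.status_code)
```

Converting the ID to a string in one place keeps callers free to pass either integer or string taxon IDs.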
search(*predicates: RQLExpr, select: Iterable[str | Field] = Ellipsis, sort: Iterable[str | Field] = Ellipsis, limit: int | Literal['max'] = Ellipsis, start: int = 0, return_format: str | ReturnFormat = ReturnFormat.JSON, timeout: float | tuple[float, float] | None = None, **constraints: Any) -> BVBRCResponse

Search for records using provided parameters.

Sends a query to the BV-BRC API using Resource Query Language (RQL) expressions and field constraints. This is the primary method for finding records that match specific criteria.

Parameters

*predicates : RQL.RQLExpr

Variable number of RQL expression objects that define the search criteria. These are combined using a logical AND operation.

select : iterable of str or RQL.Field, optional

Fields to include in the query results. Can be field names as strings or RQL.Field objects. If not specified, fields are returned based on the default behavior of the BV-BRC API (which may vary by return format).

sort : iterable of str or RQL.Field, optional

Fields to sort the results by. Can be field names as strings or RQL.Field objects. Multiple fields create a multi-level sort. Sort direction must be specified with ‘+’ (ascending) or ‘-’ (descending).

limit : int or “max”, optional

Maximum number of results to return. Can be an integer or the string “max” to return all matching results (up to a maximum of 25,000).

start : int, default 0

Starting offset for pagination (0-based index).

return_format : str or ReturnFormat, default ReturnFormat.JSON

The desired format for the returned data. See the BV-BRC documentation for allowed return formats.

timeout : float or tuple, optional

Timeout for the HTTP request. Can be a single float (total timeout) or a tuple of (connect_timeout, read_timeout).

**constraints : any

Additional keyword arguments that specify field constraints or filters. Each key-value pair represents a field name and its required value.

Returns

BVBRCResponse

A response object (derived from the requests.Response object) containing the query results and associated response metadata.

Examples

Basic search with field constraints:

>>> import bvbrc as bv
>>> genome_client = bv.GenomeClient()
>>> response = genome_client.search(species="Escherichia coli", limit=10)

Advanced search with predicates:

>>> response = genome_client.search(
...     bv.fld("genome_length") > 5000000,
...     bv.fld("genome_status") == "Complete",
...     select=["genome_id", "genome_name", "genome_length"],
...     sort=["+genome_length"],
...     limit=50
... )

Or searching using the field attributes of the client object:

>>> response = genome_client.search(
...     genome_client.genome_length > 5000000,
...     genome_client.genome_status == "Complete",
...     select=[
...         genome_client.genome_id,
...         genome_client.genome_name,
...         genome_client.genome_length
...     ],
...     sort=[+genome_client.genome_length],
...     limit=50
... )
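
The `limit` and `start` parameters together allow paging through a larger result set. The sketch below shows one way this might be done. `iter_search_pages` is a hypothetical helper, not part of bvbrc, and it assumes that a JSON-format response body decodes (via the `json()` method inherited from requests.Response) to a list of record dicts.

```python
from typing import Any, Dict, Iterator, List


def iter_search_pages(
    client: Any, page_size: int, max_records: int, **constraints: Any
) -> Iterator[List[Dict[str, Any]]]:
    """Yield pages of records by advancing `start` until results run out.

    Hypothetical helper: `client` is any object with a search() method that
    accepts limit=, start= and field constraints as keyword arguments.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = 0
    while start < max_records:
        # Never request more than the caller's overall cap.
        limit = min(page_size, max_records - start)
        response = client.search(limit=limit, start=start, **constraints)
        # Assumption: the JSON body is a list of record dicts.
        records = response.json()
        if not records:
            break
        yield records
        if len(records) < limit:
            break  # a short page means there are no further matches
        start += len(records)
```

With a real client this could be called as `iter_search_pages(bv.TaxonomyClient(), 100, 1000, taxon_rank="genus")`, where the constraint value is only illustrative.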
submit_query(query: RQLQuery, *, return_format: str | ReturnFormat = ReturnFormat.JSON, timeout: float | tuple[float, float] | None = None) -> BVBRCResponse

Submit a pre-built RQL query to the BV-BRC API.

This method is useful when you have already built a query with the bvbrc.query function and want to submit it through a client.

Parameters

query : RQL.RQLQuery

A pre-built RQL query object containing the search criteria, field selections, sorting, and other query parameters.

return_format : str or ReturnFormat, default ReturnFormat.JSON

The desired format for the returned data. See the BV-BRC documentation for allowed return formats.

timeout : float or tuple, optional

Timeout for the HTTP request. Can be a single float (total timeout) or a tuple of (connect_timeout, read_timeout).

Returns

BVBRCResponse

A response object (derived from the requests.Response object) containing the query results and associated response metadata.

Examples

Build and submit a query:

>>> import bvbrc as bv
>>> q = bv.query(
...     bv.fld("genome_name") == "Escherichia coli",
...     select=["genome_id", "genome_name"],
...     limit=10
... )
>>> client = bv.GenomeClient()
>>> response = client.submit_query(q)

Specify the return format:

>>> response = client.submit_query(
...     q,
...     return_format=bv.ReturnFormat.CSV
... )
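
Because the query object is built separately from the client, the same query can be submitted more than once, for instance once per return format. The sketch below illustrates that idea for the taxonomy data type. `submit_in_formats` and `demo` are hypothetical helpers, not part of bvbrc, and the constraint value "genus" is only illustrative.

```python
from typing import Any, Iterable, List, Tuple


def submit_in_formats(
    client: Any, query: Any, formats: Iterable[Any], timeout: Any = None
) -> List[Tuple[Any, Any]]:
    """Submit one pre-built query once per return format.

    Hypothetical helper: returns (format, response) pairs in input order.
    """
    return [
        (fmt, client.submit_query(query, return_format=fmt, timeout=timeout))
        for fmt in formats
    ]


def demo() -> None:
    """Not called automatically: needs the bvbrc package and network access."""
    import bvbrc as bv

    q = bv.query(
        bv.fld("taxon_rank") == "genus",  # illustrative constraint value
        select=["taxon_id", "taxon_name"],
        limit=10,
    )
    client = bv.TaxonomyClient()
    results = submit_in_formats(client, q, [bv.ReturnFormat.JSON, bv.ReturnFormat.CSV])
    for fmt, response in results:
        # Assumption: BVBRCResponse keeps requests.Response attributes such as status_code.
        print(fmt, response.status_code)
```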
cds_mean = Field('cds_mean')

number

cds_sd = Field('cds_sd')

number

core_families = Field('core_families')

integer

core_family_ids = Field('core_family_ids')

array of strings

description = Field('description')

string

division = Field('division')

string

genetic_code = Field('genetic_code')

integer

genome_count = Field('genome_count')

integer

genome_length_mean = Field('genome_length_mean')

number

genome_length_sd = Field('genome_length_sd')

number

genomes = Field('genomes')

integer

genomes_f = Field('genomes_f')

string

hypothetical_cds_ratio_mean = Field('hypothetical_cds_ratio_mean')

number

hypothetical_cds_ratio_sd = Field('hypothetical_cds_ratio_sd')

number

lineage = Field('lineage')

string

lineage_ids = Field('lineage_ids')

array of integers

lineage_names = Field('lineage_names')

array of strings

lineage_ranks = Field('lineage_ranks')

array of strings

other_names = Field('other_names')

array of case-insensitive strings

parent_id = Field('parent_id')

integer

plfam_cds_ratio_mean = Field('plfam_cds_ratio_mean')

number

plfam_cds_ratio_sd = Field('plfam_cds_ratio_sd')

number

taxon_id = Field('taxon_id')

primary key

string

taxon_id_i = Field('taxon_id_i')

integer

taxon_name = Field('taxon_name')

case-insensitive string

taxon_rank = Field('taxon_rank')

string
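
The three array fields lineage_ids, lineage_names and lineage_ranks are often easier to read when combined. The sketch below assumes that a JSON record is a dict keyed by field name and that the three arrays are parallel, with entry i of each describing the same ancestor. Neither assumption is confirmed by this page, and `lineage_table` is a hypothetical helper, not part of bvbrc.

```python
from typing import Any, Dict, List, Tuple


def lineage_table(record: Dict[str, Any]) -> List[Tuple[int, str, str]]:
    """Combine the lineage arrays of one record into (id, rank, name) rows.

    Hypothetical helper; assumes the three arrays are parallel.
    """
    ids = record.get("lineage_ids") or []
    names = record.get("lineage_names") or []
    ranks = record.get("lineage_ranks") or []
    # Refuse to pair entries if the arrays cannot be parallel.
    if not (len(ids) == len(names) == len(ranks)):
        raise ValueError("lineage arrays have different lengths")
    return list(zip(ids, ranks, names))
```

Raising on mismatched lengths, rather than silently truncating with `zip`, makes a violated assumption visible straight away.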