Quickstart#

This guide will go through some of the basic functionality in bvbrc. If you are unfamiliar with the data in BV-BRC or the BV-BRC Data API, it may be useful to consult the BV-BRC documentation and the BV-BRC API documentation.

Installation#

bvbrc can be installed from PyPI using pip:

pip install bvbrc

If you want to be able to convert the API responses to a pandas or polars DataFrame, then you must also install the appropriate package:


pip install pandas

Basic Usage#

Note

The outputs from the code snippets below were run with BV-BRC version 3.49.1 so the outputs may differ if you run the code snippets with a different version of BV-BRC.

Import the package in your Python code:

import bvbrc as bv

bvbrc has a client object for each of the main data types in BV-BRC. You can retrieve data of a specific type by using the corresponding client. For example, to get data for genomes on BV-BRC, initialize the GenomeClient:

genome_client = bv.GenomeClient()

Each client offers three main methods for retrieving data: get, search, and submit_query. All three of these methods return a BVBRCResponse object.

Retrieving single records#

The get method is for retreiving a single record of the desired data type based on its ID. For example, with the GenomeClient, you can get a single genome entry from BV-BRC if you know its genome_id:

response = genome_client.get("1313.5458")
print(response)

Output:

<Response [200]>

The response code 200 is an ‘OK’ response meaning that your request did not encounter any errors. If you requested the data in the default ‘application/json’ format, then the retreived data can then be accessed in a dictionary by calling response.json().

genome_data = response.json() # Returns a dictionary with the retrieved data
print(genome_data.get("genome_id"))
print(genome_data.get("species"))

Output:

1313.5458
Streptococcus pneumoniae

Searching BV-BRC with queries#

Alternatively, you can use queries to search for and retrieve data on BV-BRC in a more powerful and flexible way using the search method. This method allows the user to provide constraints, select which fields are returned, and even sort the returned data. For example, using the GenomeClient, you can retrieve all genomes for E. coli where the genome_status is “Complete”:

response = genome_client.search(
      genome_client.species == "Escherichia coli",
      genome_client.genome_status == "Complete"
)
start, end, total_results = response.content_range
print(f"{end - start} out of {total_results} total results were retrieved.")

Output:

25 out of 4495 total results were retrieved.

You’ll notice that even though 4495 results met the search parameters, only 25 were returned. This is because BV-BRC has a default limit of 25 on how many results are returned for one query. Consequently, getting the remaining results requires either sending multiple requests and adjusting the starting point each time or simply increasing the limit:

Tip

Whenever you specify the starting index of the returned results, you also should specify the limit even if you want to get the default 25 results.

# Get the next 25 results
next_response = genome_client.search(
      genome_client.species == "Escherichia coli",
      genome_client.genome_status == "Complete",
      limit=25, # Set the limit to return 25 results
      start=25, # Return results starting at index 25
)
start, end, total_results = next_response.content_range
print(f"Results {start}-{end} out of {total_results} total results were retrieved.")

Output:

Results 25-50 out of 4495 total results were retrieved.

You can also specify limit="max" to set the limit to the maximum allowed which is currently 25,000.

response = genome_client.search(
      genome_client.species == "Escherichia coli",
      genome_client.genome_status == "Complete",
      limit="max"
)
start, end, total_results = response.content_range
print(f"Results {start}-{end} out of {total_results} total results were retrieved.")

Output:

Results 0-4495 out of 4495 total results were retrieved.

Accessing retrieved data#

Since searches can retreive multiple results, calling response.json() returns a list of dictionaries instead of a single dictionary. Each dictionary contains the data for one of the retrieved results (similar to the dictionary from the get method).

response = genome_client.search(
      genome_client.species == "Escherichia coli",
      genome_client.genome_status == "Complete",
      select=["genome_id", "species", "genome_status"] # Select fields to return
)
results = response.json()
print("Type of the results:", type(results))
print(len(results), "results returned")

# Print the dictionary for the first result
print("First result:", results[0])

Output:

Type of the results: <class 'list'>
25 results returned
First result: {'genome_id': '562.160986', 'species': 'Escherichia coli', 'genome_status': 'Complete'}

Alternatively, the BVBRCResponse object also provides methods for converting the retrieved results into either a pandas or polars DataFrame. This can be especially useful when trying to work with lots of results and wanting to work with these results in a table-like format.


df = response.to_pandas()
print(df.head())

Output:

    genome_id           species genome_status
0  562.160986  Escherichia coli      Complete
1  562.160987  Escherichia coli      Complete
2  562.160990  Escherichia coli      Complete
3  562.161161  Escherichia coli      Complete
4  562.161188  Escherichia coli      Complete

Need Help?#

If you encounter issues, please open an issue on GitHub or consult the API Reference.