Automated PCR Primer Design

Design PCR primers with Primer3 and BLAST on your local computer.

Published

June 24, 2025

Last month I ran my very first PCR, attempting (and eventually failing) to amplify a region flanking a specific SNP. This meant designing a set of PCR primers, a process that turned out to include a surprising amount of gene-sequence copy-and-paste and waiting for NCBI’s servers to run my job. Being a compulsive tool builder, I had to streamline this process, and see if it could all be done locally on my own computer.

Designing PCR primers

When designing PCR primer pairs, several properties should be satisfied to increase the likelihood of successfully amplifying the region of interest. Primers cannot be too short or too long. GC content should not be too low, but definitely not too high. The melting temperature of both primers should be nearly identical. The Primer3 program considers all of theese factors. You feed it a sequence including your region of interest, and it will spit out multiple primer pairs for you too choose from. To ensure that your chosen primer pair does not inadvertently align with another region, you can run a BLAST search against whatever genome you are working with.

There are multiple web-based tools that integrate Primer3 and BLAST, e.g. NCBI’s Primer-BLAST, but I found them slightly frustrating to work with. First, and this was specific to my case, where I was interested in amplifying a specific SNP, there was no way to say “design primers that include this SNP”. Instead, you had to manually find the sequence (or accession number), and figure out the position of the SNP. Second, the primer design and BLAST search runs as a job on NCBI’s server. Sometimes it is moderately fast, sometimes it is slow. This makes tweaking parameters cumbersome, since the feedback loop is prohibitively slow.

Automating primer design

The question then is: can we run Primer3 and BLAST locally, and will it be fast enough? As it turns out, the answers are yes and yes. The process involves:

Ask the user for an SNP RS number.
Download the SNP metadata.
Download the sequence flanking the SNP.
Run Primer3 against the sequence.
Run BLAST against the primer pairs.

The Biopython package can download SNP metadata and sequences from NCBI’s databases:

# SNP RS number
snp_id = "429353"

# SNP metadata
handle = Entrez.esummary(db="snp", id=snp_id, retmode="xml")
record = Entrez.read(handle)

# Accession number
acc = record["DocumentSummarySet"]["DocumentSummary"][0]["ACC"]

# Chromosome position
chrpos = int(
    record["DocumentSummarySet"]["DocumentSummary"][0]["CHRPOS_SORT"]
)

# Sequence flanking (+/- 1000 bp) the SNP
handle = Entrez.efetch(
    db="nucleotide",
    id=acc,
    seq_start=chrpos - 1000,
    seq_stop=chrpos + 1000,
    rettype="fasta",
    retmode="text",
)
seq_record = SeqIO.read(handle, "fasta")

Next we run Primer3 from Python using primer3-py:

primers = primer3.bindings.design_primers(
    seq_args={
        "SEQUENCE_ID": "PRIMER",
        "SEQUENCE_TEMPLATE": str(seq_record.seq),
        "SEQUENCE_TARGET": [1000, 1],
    },
    global_args={
        "PRIMER_OPT_SIZE": 20,
        "PRIMER_MIN_SIZE": 18,
        "PRIMER_MAX_SIZE": 25,
        "PRIMER_OPT_TM": 60.0,
        "PRIMER_MIN_TM": 57.0,
        "PRIMER_MAX_TM": 63.0,
        "PRIMER_MIN_GC": 20.0,
        "PRIMER_MAX_GC": 80.0,
        "PRIMER_MAX_POLY_X": 100,
        "PRIMER_PRODUCT_SIZE_RANGE": [250, 500]
    },
)

To run BLAST, we need to install the BLAST command-line tools, then download the human reference genome:

update_blastdb.pl --decompress human_genome

This takes a while, and will take up ~ 1.5G of disk space. The blastn command-line tool reads sequences from a FASTA file, and writes results to an XML file. From Python we can write the primer pairs to a FASTA file:

primer_seqs = []

for i in range(5):
    left_seq = SeqRecord(
        Seq(primers["PRIMER_LEFT"][i]["SEQUENCE"]),
        id=f"Primer Pair {i + 1} Left",
    )
    right_seq = SeqRecord(
        Seq(primers["PRIMER_RIGHT"][i]["SEQUENCE"]),
        id=f"Primer Pair {i + 1} Right",
    )

    primer_seqs.append(left_seq)
    primer_seqs.append(right_seq)

SeqIO.write(primer_seqs, "primer_pairs.fasta", "fasta")

Then, run blastn:

blastn \
  -query primer_seqs.fasta \
  -db GCF_000001405.39_top_level \
  -outfmt 5 \
  -out blast_results.xml \
  -task blastn-short \
  -perc_identity 95 \
  -strand both \
  -max_target_seqs 10 \
  -num_threads 4

On my MacBook Air M2 this takes approximately 5 seconds, much faster than the several minutes it may take for NCBI to run a BLAST search. The XML file written by blastn can be parsed with Biopython:

records = Blast.parse(blast_results_file)

Each individual primer corresponds to a single query. We can inspect the result of the first query:

 >>> records[0]
  Query: Query_1 (length=20)
         Primer Pair 1 Left 
   Hits: ----  -----  ----------------------------------------------------------
            #  # HSP  ID + description
         ----  -----  ----------------------------------------------------------
            0     56  gi|568815579|ref|NC_000019.10|  Homo sapiens chromosome...
            1     77  gi|568815581|ref|NC_000017.11|  Homo sapiens chromosome...
            2     52  gi|568815578|ref|NC_000020.11|  Homo sapiens chromosome...
            3    139  gi|568815587|ref|NC_000011.10|  Homo sapiens chromosome...
            4    190  gi|568815593|ref|NC_000005.10|  Homo sapiens chromosome...
            5     99  gi|568815584|ref|NC_000014.9|  Homo sapiens chromosome ...
            6     79  gi|568815580|ref|NC_000018.10|  Homo sapiens chromosome...
            7     38  gi|568815576|ref|NC_000022.11|  Homo sapiens chromosome...
            8      3  gi|568815531|ref|NT_187651.1|  Homo sapiens chromosome ...
            9    254  gi|568815596|ref|NC_000002.12|  Homo sapiens chromosome...

We can take a look at the first hit:

 >>> records[0][0]
 Query: Query_1
       Primer Pair 1 Left 
  Hit: gi|568815579|ref|NC_000019.10| (length=58617616)
       Homo sapiens chromosome 19, GRCh38.p13 Primary Assembly
 HSPs: ----  --------  ---------  ------  ---------------  ---------------------
          #   E-value  Bit score    Span      Query range              Hit range
       ----  --------  ---------  ------  ---------------  ---------------------
          0    0.0081      40.14      20           [0:20]    [44908581:44908601]
          1       7.8      30.23      15           [3:18]    [51029314:51029299]
          2        31      28.25      14           [2:16]      [9296443:9296457]
          3        31      28.25      14           [3:17]    [31178323:31178309]
          4   1.2e+02      26.26      13           [0:13]    [29087873:29087886]
          5   1.2e+02      26.26      13           [2:15]    [34134193:34134206]
          6   1.2e+02      26.26      13           [0:13]    [35972210:35972223]
          7   1.2e+02      26.26      13           [0:13]    [51178062:51178075]
          8   1.2e+02      26.26      13           [3:16]    [56957621:56957634]
          9   1.2e+02      26.26      13           [4:17]    [57327775:57327788]
         10   1.2e+02      26.26      13           [5:18]    [21716673:21716660]

The hit is on chromosome 19, as expected, since the SNP is in the APOE gene, which is located on chromosome 19.

Putting it all together

This is all very well, but you might object, that running all of this in a Python console is not that much better than using the online web interface, and you would be absolutely right. Luckily, we can put it all in a Marimo notebook. Marimo notebooks can run in an app mode, making them perfect for small, purpose-built apps such as this:

The notebook is available on GitHub.

Addendum

This was my first foray into biology-related programming. None of this was terribly difficult to do, but I was surprised at the state of the ecosystem. I had expected an array of well-integrated, coherent packages, but instead I ran into a number of somewhat abandoned libraries, and a lot of duct tape. The scverse packages for single-cell omics look (at least on the surface) like a move in the right direction. Is there any work like this for the broader ecosystem, or am I simply missing something obvious as a newcomer? If you have any insight into this, I would love to hear from you.