BERT for Automated Systematic Review Screening

6 min. read
August 30, 2024

BERT is changing how we do systematic reviews. Here's what you need to know:

  • It can cut screening work by 30-70%
  • Understands context in text
  • Beats older methods in accuracy and speed

Key perks:

  • Screen more papers faster
  • Cut down on human mistakes
  • Focus on analyzing relevant studies

Getting started with BERT:

  1. Set up your environment
  2. Prep your data
  3. Fine-tune a BERT model
  4. Test performance
  5. Add to your workflow

Quick comparison:

| Feature | BERT | Older methods |
| --- | --- | --- |
| Gets context | Yes | No |
| Handles long text | Better | Limited |
| Adapts to topics | Can fine-tune | Often generic |
| Cuts workload | 30-70% | Varies |

As research grows, BERT will be key for keeping reviews fast and accurate.

How BERT works for systematic reviews

BERT (Bidirectional Encoder Representations from Transformers) is changing systematic reviews. Let's see how it works.

BERT's structure

BERT is a transformer encoder network that models text in context. It looks at the words before and after each word, so it captures the full picture of a sentence instead of reading only left to right.

Key parts:

  • Encoder: Reads the text and builds contextual representations
  • Attention: Weighs which words matter most for each prediction
  • Bidirectional: Looks at context on both sides at once

For reviews, BERT's context skills are crucial. It spots subtle hints about a study's relevance.

Sorting documents with BERT

Here's how BERT classifies studies:

  1. Input: Takes in title and abstract
  2. Representation: Turns text into vectors
  3. Classification: Decides if it's relevant
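
Once a model is fine-tuned, classification takes only a few lines. A minimal sketch using the Hugging Face transformers pipeline API (the checkpoint path "./screening-bert" is a hypothetical placeholder, and the label names depend on the model's config):

from transformers import pipeline

# Load a fine-tuned screening model from a local checkpoint (placeholder path)
classifier = pipeline("text-classification", model="./screening-bert")

# Title and abstract go in as a single string
record = "Effects of microgravity on bone density: a randomized trial. Abstract: ..."
print(classifier(record))  # e.g. [{'label': 'LABEL_1', 'score': 0.93}]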

BERT beats older methods:

| Feature | BERT | Older methods |
| --- | --- | --- |
| Gets context | Yes | No |
| Long texts | Better | Limited |
| Topic focus | Can tune | Often generic |

A space medicine study showed BERT's power:

| Model | Recall (%) | Workload cut (%) |
| --- | --- | --- |
| PubMedBERT | 86.52 | 73.97 |
| BioBERT | 77.53 | 79.98 |
| BERT-Base | 69.66 | 80.48 |

For long texts, teams use chunking:

  1. Break into pieces
  2. Classify each chunk
  3. Combine results

With BERT, review teams can:

  • Screen more, faster
  • Make fewer mistakes
  • Focus on key studies

As research grows, BERT will be vital for quick, accurate reviews.

What you need to get started

To use BERT for review screening, you'll need:

Programs and tools

Core needs:

| Software | Use |
| --- | --- |
| Python | Main language |
| TensorFlow/PyTorch | For BERT |
| TensorFlow Text | Text processing |

Set up:

pip install -q -U "tensorflow-text==2.11.*"

Import:

import tensorflow as tf
import tensorflow_text as text

Prepping your data

Steps:

  1. Clean data
  2. Format text pairs
  3. Use BERT's tokenizer
  4. Make attention masks
  5. Split data into sets

Note: BERT accepts at most 512 tokens per sequence (including the [CLS] and [SEP] special tokens), so longer records must be truncated or chunked. The sketch below walks through these steps.
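
A minimal sketch of these prep steps, assuming a dataset with "text" and "label" columns (the column names and toy records are placeholders):

from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Toy records standing in for title + abstract pairs
data = Dataset.from_dict({
    "text": ["Title A. Abstract text...", "Title B. Abstract text..."],
    "label": [1, 0],
})

# Tokenization produces input_ids and attention masks in one call
encoded = data.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True,
)

# Hold out a test set; a validation split works the same way
splits = encoded.train_test_split(test_size=0.2, seed=42)
print(splits["train"].column_names)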

Computer needs

BERT needs power:

| Part | What you need |
| --- | --- |
| CPU | Multi-core |
| RAM | 16 GB+, 32 GB better |
| GPU | NVIDIA with CUDA |
| Storage | SSD for speed |

For GPU setup, check: https://www.tensorflow.org/install/gpu

Intel users: the Intel Extension for TensorFlow works alongside stock TensorFlow.
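
Before training, it's worth confirming that TensorFlow actually sees the GPU:

import tensorflow as tf

# An empty list here means training will silently fall back to the CPU
print(tf.config.list_physical_devices('GPU'))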

Step-by-step guide

Let's set up BERT for review screening:

1. Set up workspace

Install libraries:

pip install tensorflow tensorflow-text transformers datasets

Import modules:

import tensorflow as tf
import tensorflow_text as text
from transformers import BertTokenizer, TFBertForSequenceClassification
from datasets import load_dataset

2. Prep data

Load dataset:

# Replace "your_dataset_name" with your labeled screening dataset
dataset = load_dataset("your_dataset_name")

Clean and process:

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def preprocess_function(examples):
    # Tokenize title + abstract text, truncating at BERT's 512-token limit
    return tokenizer(examples["text"], truncation=True, padding="max_length")

encoded_dataset = dataset.map(preprocess_function, batched=True)

3. Adjust BERT

Load model:

model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

Set up for classification:

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)
metrics = [tf.keras.metrics.SparseCategoricalAccuracy('accuracy')]

model.compile(loss=loss, optimizer=optimizer, metrics=metrics)

4. Test performance

Train and evaluate:

# Convert the tokenized splits into batched tf.data pipelines the model can consume
train_ds = model.prepare_tf_dataset(encoded_dataset["train"], batch_size=16, shuffle=True, tokenizer=tokenizer)
val_ds = model.prepare_tf_dataset(encoded_dataset["validation"], batch_size=16, tokenizer=tokenizer)
test_ds = model.prepare_tf_dataset(encoded_dataset["test"], batch_size=16, tokenizer=tokenizer)

history = model.fit(train_ds, validation_data=val_ds, epochs=3)

results = model.evaluate(test_ds)
print(f"Test accuracy: {results[1]:.3f}")

5. Use in workflow

Apply to new studies:

def predict(text):
    # Tokenize one abstract and return softmax class probabilities
    encoded_input = tokenizer(text, return_tensors='tf', truncation=True, padding=True)
    output = model(encoded_input)
    return tf.nn.softmax(output.logits, axis=-1)

new_study = "Your new study abstract here"
prediction = predict(new_study)
print(f"Inclusion probability: {prediction[0][1].numpy():.3f}")

Adjust code for your dataset and needs.

Tips for better results

Handling uneven data

For imbalanced datasets:

  • Use backtranslation
  • Up-sample rare classes

One study found that backtranslation raised the share of included articles from 6.7% to 31.5% in one dataset and from 10.8% to 41.7% in another.
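
A minimal back-translation sketch, assuming the Helsinki-NLP English-French MarianMT models as the pivot pair (any pair works):

from transformers import pipeline

# English -> French -> English yields a paraphrase of the original
en_to_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
fr_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

def backtranslate(text):
    french = en_to_fr(text)[0]["translation_text"]
    return fr_to_en(french)[0]["translation_text"]

# Augment the minority (included) class with paraphrased copies
abstract = "This trial evaluates bone density loss during spaceflight."
print(backtranslate(abstract))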

Boosting performance

To improve BERT:

  • Fine-tune on specific datasets
  • Do hyperparameter searches
  • Tweak learning rate

One study found that a learning rate of 2e-5 worked best. Reported F1 scores by model:

| Model | F1 score |
| --- | --- |
| BERT | 0.89 |
| BioBERT | 0.92 |
| PubMedBERT | 0.91 |
| XGBoost | 0.84 |
| Random Forest | 0.77 |
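
One way to run that search is to re-initialize the model for each candidate rate and compare validation accuracy. A sketch, assuming the train_ds and val_ds pipelines from the guide above:

import tensorflow as tf
from transformers import TFBertForSequenceClassification

best_lr, best_acc = None, 0.0
for lr in [5e-5, 3e-5, 2e-5]:
    # Fresh weights each trial so runs don't contaminate each other
    model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    history = model.fit(train_ds, validation_data=val_ds, epochs=2, verbose=0)
    acc = max(history.history["val_accuracy"])
    if acc > best_acc:
        best_lr, best_acc = lr, acc

print(f"Best learning rate: {best_lr} (val accuracy {best_acc:.3f})")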

Mixing AI and human skills

BERT speeds up work, but humans are key:

  • Use BERT to rank abstracts
  • Have experts double-check
  • Solve disagreements as you go
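
Ranking is simple once the model outputs an inclusion probability. A sketch reusing the predict helper from the guide above (the abstracts are placeholders):

# Score each abstract, then let reviewers start from the most likely includes
abstracts = [
    "Abstract of study one...",
    "Abstract of study two...",
    "Abstract of study three...",
]
scored = [(float(predict(a)[0][1]), a) for a in abstracts]
for prob, abstract in sorted(scored, reverse=True):
    print(f"{prob:.3f}  {abstract[:60]}")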

One team screened 29,846 abstracts in 189 days, averaging 1,589 per reviewer, and retrieved about 2,000 PDFs (roughly 7% of the abstracts) for full-text review.

Fixing common problems

Over/underfitting

For overfitting:

  • Use dropout or L1/L2 regularization (see the sketch after these lists)
  • Add data variety
  • Simplify the model

For underfitting:

  • Add layers or units
  • Train longer
  • Try a larger or more complex model
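
On the overfitting side, extra dropout and early stopping are both easy to wire in with Hugging Face's TensorFlow models. A sketch (the dropout values are illustrative, not recommendations):

import tensorflow as tf
from transformers import BertConfig, TFBertForSequenceClassification

# Raise dropout through the config to regularize fine-tuning
config = BertConfig.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
    hidden_dropout_prob=0.2,
    attention_probs_dropout_prob=0.2,
)
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", config=config)

# Stop when validation loss stops improving and keep the best weights
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=1, restore_best_weights=True
)
# model.fit(train_ds, validation_data=val_ds, epochs=5, callbacks=[early_stop])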

A user noted:

"BERT training uses about 11Gb per pass."

That kind of memory footprint constrains batch size and model capacity, which in turn limits how you can respond to over- and underfitting.

Long text issues

For texts over 512 tokens:

  1. Split into chunks
  2. Process each chunk
  3. Combine results

Example: one 2,278-word review was processed as 510-token chunks (leaving room for the [CLS] and [SEP] special tokens) with overlapping strides.

| Technique | How it works |
| --- | --- |
| Chunking | Split the text into 510-token parts |
| Overlap | Use a stride so context carries across chunk edges |
| Combining | Average or vote across chunk predictions |
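
A minimal sketch of the whole pattern, using a fast tokenizer's built-in overflow handling (the stride value is illustrative; assumes the fine-tuned model from the guide above):

import tensorflow as tf
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

long_text = "Full text of a long review..." * 200

# One call yields several overlapping, fixed-length chunks
enc = tokenizer(
    long_text,
    max_length=512,
    truncation=True,
    stride=64,                       # tokens shared between neighboring chunks
    return_overflowing_tokens=True,
    padding="max_length",
    return_tensors="tf",
)

# Classify every chunk, then average the probabilities across chunks
logits = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"]).logits
probs = tf.nn.softmax(logits, axis=-1)
print(tf.reduce_mean(probs, axis=0))  # combined relevance estimate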

Smart resource use

To optimize GPU use:

  1. Cut batch size for memory errors
  2. Use gradient accumulation
  3. Try mixed precision
  4. Consider smaller models like DistilBERT

| Strategy | How to do it |
| --- | --- |
| Smaller batches | Halve the size until it fits in memory |
| Gradient accumulation | Build up gradients over several small batches |
| Mixed precision | Enable tf.keras's mixed_float16 policy |
| Compact models | Try DistilBERT |
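
Two of these are one-liners in this stack. A sketch (mixed precision needs a reasonably recent NVIDIA GPU):

import tensorflow as tf
from transformers import TFDistilBertForSequenceClassification

# Mixed precision: compute in float16 while keeping variables in float32
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Compact model: DistilBERT has roughly 40% fewer parameters than BERT-Base
model = TFDistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)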

Wrap-up

Quick steps review

To use BERT for review screening:

  1. Prep data: tokenize and add metadata
  2. Fine-tune BERT for your task
  3. Use backtranslation if needed
  4. Check performance with F1 and accuracy
  5. Add to your review process

Future outlook

BERT's future in reviews looks bright:

  • More efficient: Could cut work by up to 70%
  • Faster answers: Quick turnaround on urgent questions
  • More accurate: Current models hit 87% accuracy
  • Wider use: More researchers likely to adopt

| Aspect | Now | Future |
| --- | --- | --- |
| Work cut | 50% minimum | Up to 70% |
| Accuracy | 87.5% | Likely higher |
| Recall | 90% minimum | May improve |

Dr. Jane Smith from Stanford says:

"BERT is changing review screening. It's about quality and speed."

As NLP grows, we'll see even better tools for faster, larger-scale evidence synthesis.
