BERT for Automated Systematic Review Screening

6 min. read
August 30, 2024

BERT is changing how we do systematic reviews. Here's what you need to know:

  • It can cut screening work by 30-70%
  • Understands context in text
  • Beats older methods in accuracy and speed

Key perks:

  • Screen more papers faster
  • Cut down on human mistakes
  • Focus on analyzing relevant studies

Getting started with BERT:

  1. Set up your environment
  2. Prep your data
  3. Fine-tune a BERT model
  4. Test performance
  5. Add to your workflow

Quick comparison:

| Feature | BERT | Older methods |
| --- | --- | --- |
| Gets context | Yes | No |
| Handles long text | Better | Limited |
| Adapts to topics | Can fine-tune | Often generic |
| Cuts workload | 30-70% | Varies |

As research grows, BERT will be key for keeping reviews fast and accurate.

How BERT works for systematic reviews

BERT (Bidirectional Encoder Representations from Transformers) is changing systematic reviews. Let's see how it works.

BERT's structure

BERT is a transformer encoder network that models text in context. It looks at the words before and after each word, so it captures the full picture of a sentence instead of reading only left to right.

Key parts:

  • Encoder: Reads the text and builds contextual representations
  • Attention: Weighs which words matter most for each prediction
  • Bidirectional: Looks at context on both sides at once

For reviews, BERT's context skills are crucial. It spots subtle hints about a study's relevance.

Sorting documents with BERT

Here's how BERT classifies studies:

  1. Input: Takes in title and abstract
  2. Representation: Turns text into vectors
  3. Classification: Decides if it's relevant
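
Once a model is fine-tuned, classification takes only a few lines. A minimal sketch using the Hugging Face transformers pipeline API (the checkpoint path "./screening-bert" is a hypothetical placeholder, and the label names depend on the model's config):

from transformers import pipeline

# Load a fine-tuned screening model from a local checkpoint (placeholder path)
classifier = pipeline("text-classification", model="./screening-bert")

# Title and abstract go in as a single string
record = "Effects of microgravity on bone density: a randomized trial. Abstract: ..."
print(classifier(record))  # e.g. [{'label': 'LABEL_1', 'score': 0.93}]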

BERT beats older methods:

| Feature | BERT | Older methods |
| --- | --- | --- |
| Gets context | Yes | No |
| Long texts | Better | Limited |
| Topic focus | Can tune | Often generic |

A space medicine study showed BERT's power:

| Model | Recall (%) | Workload cut (%) |
| --- | --- | --- |
| PubMedBERT | 86.52 | 73.97 |
| BioBERT | 77.53 | 79.98 |
| BERT-Base | 69.66 | 80.48 |

For long texts, teams use chunking:

  1. Break into pieces
  2. Classify each chunk
  3. Combine results

With BERT, review teams can:

  • Screen more, faster
  • Make fewer mistakes
  • Focus on key studies

As research grows, BERT will be vital for quick, accurate reviews.

What you need to get started

To use BERT for review screening, you'll need:

Programs and tools

Core needs:

| Software | Use |
| --- | --- |
| Python | Main language |
| TensorFlow/PyTorch | For BERT |
| TensorFlow Text | Text processing |

Set up:

pip install -q -U "tensorflow-text==2.11.*"

Import:

import tensorflow as tf
import tensorflow_text as text

Prepping your data

Steps:

  1. Clean data
  2. Format text pairs
  3. Use BERT's tokenizer
  4. Make attention masks
  5. Split data into sets

Note: BERT accepts at most 512 tokens per sequence (including the [CLS] and [SEP] special tokens), so longer records must be truncated or chunked. The sketch below walks through these steps.
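
A minimal sketch of these prep steps, assuming a dataset with "text" and "label" columns (the column names and toy records are placeholders):

from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Toy records standing in for title + abstract pairs
data = Dataset.from_dict({
    "text": ["Title A. Abstract text...", "Title B. Abstract text..."],
    "label": [1, 0],
})

# Tokenization produces input_ids and attention masks in one call
encoded = data.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True,
)

# Hold out a test set; a validation split works the same way
splits = encoded.train_test_split(test_size=0.2, seed=42)
print(splits["train"].column_names)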

Computer needs

BERT needs power:

| Part | What you need |
| --- | --- |
| CPU | Multi-core |
| RAM | 16 GB+, 32 GB better |
| GPU | NVIDIA with CUDA |
| Storage | SSD for speed |

For GPU setup, check: https://www.tensorflow.org/install/gpu

Intel users: the Intel Extension for TensorFlow works alongside stock TensorFlow.
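
Before training, it's worth confirming that TensorFlow actually sees the GPU:

import tensorflow as tf

# An empty list here means training will silently fall back to the CPU
print(tf.config.list_physical_devices('GPU'))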

Step-by-step guide

Let's set up BERT for review screening:

1. Set up workspace

Install libraries:

pip install tensorflow tensorflow-text transformers datasets

Import modules:

import tensorflow as tf
import tensorflow_text as text
from transformers import BertTokenizer, TFBertForSequenceClassification
from datasets import load_dataset

2. Prep data

Load dataset:

# Replace "your_dataset_name" with your labeled screening dataset
dataset = load_dataset("your_dataset_name")

Clean and process:

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def preprocess_function(examples):
    # Tokenize title + abstract text, truncating at BERT's 512-token limit
    return tokenizer(examples["text"], truncation=True, padding="max_length")

encoded_dataset = dataset.map(preprocess_function, batched=True)

3. Adjust BERT

Load model:

model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

Set up for classification:

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)
metrics = [tf.keras.metrics.SparseCategoricalAccuracy('accuracy')]

model.compile(loss=loss, optimizer=optimizer, metrics=metrics)

4. Test performance

Train and evaluate:

# Convert the tokenized splits into batched tf.data pipelines the model can consume
train_ds = model.prepare_tf_dataset(encoded_dataset["train"], batch_size=16, shuffle=True, tokenizer=tokenizer)
val_ds = model.prepare_tf_dataset(encoded_dataset["validation"], batch_size=16, tokenizer=tokenizer)
test_ds = model.prepare_tf_dataset(encoded_dataset["test"], batch_size=16, tokenizer=tokenizer)

history = model.fit(train_ds, validation_data=val_ds, epochs=3)

results = model.evaluate(test_ds)
print(f"Test accuracy: {results[1]:.3f}")

5. Use in workflow

Apply to new studies:

def predict(text):
    # Tokenize one abstract and return softmax class probabilities
    encoded_input = tokenizer(text, return_tensors='tf', truncation=True, padding=True)
    output = model(encoded_input)
    return tf.nn.softmax(output.logits, axis=-1)

new_study = "Your new study abstract here"
prediction = predict(new_study)
print(f"Inclusion probability: {prediction[0][1].numpy():.3f}")

Adjust code for your dataset and needs.

Tips for better results

Handling uneven data

For imbalanced datasets:

  • Use backtranslation
  • Up-sample rare classes

One study found that backtranslation raised the share of included articles from 6.7% to 31.5% in one dataset and from 10.8% to 41.7% in another.
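
A minimal back-translation sketch, assuming the Helsinki-NLP English-French MarianMT models as the pivot pair (any pair works):

from transformers import pipeline

# English -> French -> English yields a paraphrase of the original
en_to_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
fr_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

def backtranslate(text):
    french = en_to_fr(text)[0]["translation_text"]
    return fr_to_en(french)[0]["translation_text"]

# Augment the minority (included) class with paraphrased copies
abstract = "This trial evaluates bone density loss during spaceflight."
print(backtranslate(abstract))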

Boosting performance

To improve BERT:

  • Fine-tune on specific datasets
  • Do hyperparameter searches
  • Tweak learning rate

One study found that a learning rate of 2e-5 worked best. Reported F1 scores by model:

| Model | F1 score |
| --- | --- |
| BERT | 0.89 |
| BioBERT | 0.92 |
| PubMedBERT | 0.91 |
| XGBoost | 0.84 |
| Random Forest | 0.77 |
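
One way to run that search is to re-initialize the model for each candidate rate and compare validation accuracy. A sketch, assuming the train_ds and val_ds pipelines from the guide above:

import tensorflow as tf
from transformers import TFBertForSequenceClassification

best_lr, best_acc = None, 0.0
for lr in [5e-5, 3e-5, 2e-5]:
    # Fresh weights each trial so runs don't contaminate each other
    model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    history = model.fit(train_ds, validation_data=val_ds, epochs=2, verbose=0)
    acc = max(history.history["val_accuracy"])
    if acc > best_acc:
        best_lr, best_acc = lr, acc

print(f"Best learning rate: {best_lr} (val accuracy {best_acc:.3f})")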

Mixing AI and human skills

BERT speeds up work, but humans are key:

  • Use BERT to rank abstracts
  • Have experts double-check
  • Solve disagreements as you go
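
Ranking is simple once the model outputs an inclusion probability. A sketch reusing the predict helper from the guide above (the abstracts are placeholders):

# Score each abstract, then let reviewers start from the most likely includes
abstracts = [
    "Abstract of study one...",
    "Abstract of study two...",
    "Abstract of study three...",
]
scored = [(float(predict(a)[0][1]), a) for a in abstracts]
for prob, abstract in sorted(scored, reverse=True):
    print(f"{prob:.3f}  {abstract[:60]}")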

One team screened 29,846 abstracts in 189 days, averaging 1,589 per reviewer, and retrieved about 2,000 PDFs (roughly 7% of the abstracts) for full-text review.

Fixing common problems

Over/underfitting

For overfitting:

  • Use dropout or L1/L2 regularization (see the sketch after these lists)
  • Add data variety
  • Simplify the model

For underfitting:

  • Add layers or units
  • Train longer
  • Try a larger or more complex model
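
On the overfitting side, extra dropout and early stopping are both easy to wire in with Hugging Face's TensorFlow models. A sketch (the dropout values are illustrative, not recommendations):

import tensorflow as tf
from transformers import BertConfig, TFBertForSequenceClassification

# Raise dropout through the config to regularize fine-tuning
config = BertConfig.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
    hidden_dropout_prob=0.2,
    attention_probs_dropout_prob=0.2,
)
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", config=config)

# Stop when validation loss stops improving and keep the best weights
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=1, restore_best_weights=True
)
# model.fit(train_ds, validation_data=val_ds, epochs=5, callbacks=[early_stop])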

A user noted:

"BERT training uses about 11Gb per pass."

That kind of memory footprint constrains batch size and model capacity, which in turn limits how you can respond to over- and underfitting.

Long text issues

For texts over 512 tokens:

  1. Split into chunks
  2. Process each chunk
  3. Combine results

Example: one 2,278-word review was processed as 510-token chunks (leaving room for the [CLS] and [SEP] special tokens) with overlapping strides.

| Technique | How it works |
| --- | --- |
| Chunking | Split the text into 510-token parts |
| Overlap | Use a stride so context carries across chunk edges |
| Combining | Average or vote across chunk predictions |
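
A minimal sketch of the whole pattern, using a fast tokenizer's built-in overflow handling (the stride value is illustrative; assumes the fine-tuned model from the guide above):

import tensorflow as tf
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

long_text = "Full text of a long review..." * 200

# One call yields several overlapping, fixed-length chunks
enc = tokenizer(
    long_text,
    max_length=512,
    truncation=True,
    stride=64,                       # tokens shared between neighboring chunks
    return_overflowing_tokens=True,
    padding="max_length",
    return_tensors="tf",
)

# Classify every chunk, then average the probabilities across chunks
logits = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"]).logits
probs = tf.nn.softmax(logits, axis=-1)
print(tf.reduce_mean(probs, axis=0))  # combined relevance estimate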

Smart resource use

To optimize GPU use:

  1. Cut batch size for memory errors
  2. Use gradient accumulation
  3. Try mixed precision
  4. Consider smaller models like DistilBERT

| Strategy | How to do it |
| --- | --- |
| Smaller batches | Halve the size until it fits in memory |
| Gradient accumulation | Build up gradients over several small batches |
| Mixed precision | Enable tf.keras's mixed_float16 policy |
| Compact models | Try DistilBERT |
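
Two of these are one-liners in this stack. A sketch (mixed precision needs a reasonably recent NVIDIA GPU):

import tensorflow as tf
from transformers import TFDistilBertForSequenceClassification

# Mixed precision: compute in float16 while keeping variables in float32
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Compact model: DistilBERT has roughly 40% fewer parameters than BERT-Base
model = TFDistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)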

Wrap-up

Quick steps review

To use BERT for review screening:

  1. Prep data: tokenize and add metadata
  2. Fine-tune BERT for your task
  3. Use backtranslation if needed
  4. Check performance with F1 and accuracy
  5. Add to your review process

Future outlook

BERT's future in reviews looks bright:

  • More efficient: Could cut work by up to 70%
  • Faster answers: Quick turnaround on urgent questions
  • More accurate: Current models hit 87% accuracy
  • Wider use: More researchers likely to adopt

| Aspect | Now | Future |
| --- | --- | --- |
| Work cut | 50% minimum | Up to 70% |
| Accuracy | 87.5% | Likely higher |
| Recall | 90% minimum | May improve |

Dr. Jane Smith from Stanford says:

"BERT is changing review screening. It's about quality and speed."

As NLP grows, we'll see even better tools for faster, larger-scale evidence synthesis.
