BERT is changing how we do systematic reviews. Here's what you need to know:
Key perks:
Getting started with BERT:
Quick comparison:
Feature | BERT | Older Methods |
---|---|---|
Gets context | Yes | No |
Handles long text | Better | Limited |
Adapts to topics | Can fine-tune | Often generic |
Cuts workload | 30-70% | Varies |
As research grows, BERT will be key for keeping reviews fast and accurate.
BERT (Bidirectional Encoder Representations from Transformers) is changing systematic reviews. Let's see how it works.
BERT uses a transformer neural network to grasp text context. It looks at the words both before and after each word, so it reads a sentence in both directions at once.
Key parts:
For reviews, BERT's context skills are crucial. It spots subtle hints about a study's relevance.
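Here's a minimal sketch of that idea, using the Hugging Face transformers library and the bert-base-uncased checkpoint that the walkthrough below relies on (the example sentences are placeholders): the same word gets a different vector depending on its neighbours.

import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

sentences = [
    "Patients were randomized to the treatment arm.",  # clinical sense of "arm"
    "The robot extended its mechanical arm.",          # physical sense of "arm"
]
inputs = tokenizer(sentences, return_tensors="tf", padding=True)
hidden = model(inputs).last_hidden_state  # shape: (2 sentences, seq_len, 768)

# Pull out the contextual vector for the token "arm" in each sentence
vectors = []
for i in range(len(sentences)):
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][i].numpy())
    vectors.append(hidden[i, tokens.index("arm")])

# cosine_similarity returns a negative value, so flip the sign to get similarity
similarity = -tf.keras.losses.cosine_similarity(vectors[0], vectors[1])
print(f"Similarity between the two 'arm' vectors: {float(similarity):.3f}")

The two vectors differ because each is built from its surrounding words, and that is the signal BERT uses to judge whether an abstract matches inclusion criteria.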
Here's how BERT classifies studies:
BERT beats older methods:
Feature | BERT | Older Methods |
---|---|---|
Gets context | Yes | No |
Long texts | Better | Limited |
Topic focus | Can tune | Often generic |
A space medicine study showed BERT's power:
Model | Recall (%) | Workload Cut (%) |
---|---|---|
PubMedBERT | 86.52 | 73.97 |
BioBERT | 77.53 | 79.98 |
BERT-Base | 69.66 | 80.48 |
For long texts, teams use chunking:
With BERT, review teams can:
As research grows, BERT will be vital for quick, accurate reviews.
To use BERT for review screening, you'll need:
Core needs:
Software | Use |
---|---|
Python | Main language |
TensorFlow/PyTorch | For BERT |
TensorFlow Text | Text processing |
Set up:
pip install -q -U "tensorflow-text==2.11.*"
Import:
import tensorflow as tf
import tensorflow_text as text
Steps:
Note: BERT has a 512 token limit per sequence.
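You can see the limit in action with the Hugging Face tokenizer installed in the walkthrough below; a quick sketch (the abstract text is a placeholder):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
long_abstract = "Background: " + "exercise therapy reduced pain scores. " * 200  # placeholder

full = tokenizer(long_abstract)["input_ids"]                                      # no limit applied
clipped = tokenizer(long_abstract, truncation=True, max_length=512)["input_ids"]  # capped at 512
print(len(full), len(clipped))

Abstracts usually fit comfortably; full texts often don't, which is why chunking (covered later) matters.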
BERT needs power:
Part | What you need |
---|---|
CPU | Multi-core |
RAM | 16GB+, 32GB better |
GPU | NVIDIA with CUDA |
Storage | SSD for speed |
For GPU setup, check: https://www.tensorflow.org/install/gpu
Intel users: the Intel® Extension for TensorFlow works with stock TensorFlow.
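Before training, it's worth confirming that TensorFlow actually sees your GPU; a minimal check:

import tensorflow as tf

# An empty list means TensorFlow will silently fall back to the (much slower) CPU
gpus = tf.config.list_physical_devices("GPU")
print("GPUs detected:", gpus)

# Optional: allocate GPU memory as needed instead of grabbing it all up front
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)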
Let's set up BERT for review screening:
Install libraries:
pip install tensorflow tensorflow-text transformers datasets
Import modules:
import tensorflow as tf
import tensorflow_text as text
from transformers import BertTokenizer, TFBertForSequenceClassification
from datasets import load_dataset
Load dataset:
dataset = load_dataset("your_dataset_name")
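If your screening data lives in local files instead of on the Hugging Face Hub, load_dataset can read CSVs directly. A sketch, assuming hypothetical train.csv, val.csv, and test.csv files with text (title plus abstract) and label (1 = include, 0 = exclude) columns:

# Hypothetical local CSV files; the "text" and "label" column names are assumptions
dataset = load_dataset(
    "csv",
    data_files={"train": "train.csv", "validation": "val.csv", "test": "test.csv"},
)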
Clean and process:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def preprocess_function(examples):
    # Tokenize the abstracts, truncating to BERT's 512-token limit and padding to a fixed length
    return tokenizer(examples["text"], truncation=True, padding="max_length")

encoded_dataset = dataset.map(preprocess_function, batched=True)
Load model:
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
Set up for classification:
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)
metrics = [tf.keras.metrics.SparseCategoricalAccuracy('accuracy')]
model.compile(loss=loss, optimizer=optimizer, metrics=metrics)
Train and evaluate:
# Convert the Hugging Face splits into tf.data pipelines that Keras can consume
train_set = model.prepare_tf_dataset(encoded_dataset["train"], shuffle=True, batch_size=16, tokenizer=tokenizer)
val_set = model.prepare_tf_dataset(encoded_dataset["validation"], shuffle=False, batch_size=16, tokenizer=tokenizer)
test_set = model.prepare_tf_dataset(encoded_dataset["test"], shuffle=False, batch_size=16, tokenizer=tokenizer)

history = model.fit(train_set, validation_data=val_set, epochs=3)

results = model.evaluate(test_set)
print(f"Test accuracy: {results[1]:.3f}")
Apply to new studies:
def predict(text):
    # Tokenize one abstract and return softmax probabilities over the two classes (index 1 = include)
    encoded_input = tokenizer(text, return_tensors='tf', truncation=True, padding=True)
    output = model(encoded_input)
    return tf.nn.softmax(output.logits, axis=-1)

new_study = "Your new study abstract here"
prediction = predict(new_study)
print(f"Inclusion probability: {prediction[0][1].numpy():.3f}")
Adjust code for your dataset and needs.
For imbalanced datasets:
A study showed backtranslation raised the share of included articles from 6.7% to 31.5% in one dataset and from 10.8% to 41.7% in another.
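Backtranslation needs a separate translation model, so a lighter first step is simply to weight the rare "include" class more heavily during training. A minimal sketch of that complementary technique (not the backtranslation approach from the study), reusing encoded_dataset, train_set, and val_set from the walkthrough above and assuming the label column is named label:

import numpy as np

# Weight each class inversely to its frequency so the rare "include" label counts more
labels = np.array(encoded_dataset["train"]["label"])
counts = np.bincount(labels)
class_weight = {i: len(labels) / (len(counts) * c) for i, c in enumerate(counts)}

history = model.fit(
    train_set,
    validation_data=val_set,
    epochs=3,
    class_weight=class_weight,  # Keras scales each example's loss by its class weight
)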
To improve BERT:
A study found that a learning rate of 2e-5 worked best.
Model | F1 Score |
---|---|
BERT | 0.89 |
BioBERT | 0.92 |
PubMedBERT | 0.91 |
XGBoost | 0.84 |
Random Forest | 0.77 |
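To put your own fine-tuned model on the same scale, you can compute its F1 on the held-out test split; a sketch using scikit-learn (an extra dependency not listed above) and the test_set pipeline from the walkthrough:

import numpy as np
from sklearn.metrics import f1_score

# Predicted classes from the model's logits
logits = model.predict(test_set)["logits"]
preds = np.argmax(logits, axis=-1)

# True labels, collected in the same (unshuffled) order as the predictions
labels = np.concatenate([y.numpy() for _, y in test_set])

print(f"F1 (include class): {f1_score(labels, preds):.3f}")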
BERT speeds up work, but humans are key:
A team screened 29,846 abstracts in 189 days, averaging 1,589 per person. They got about 2,000 (~13%) PDFs for full review.
For overfitting:
For underfitting:
A user noted:
"BERT training uses about 11Gb per pass."
This illustrates BERT's memory demands, which can lead to out-of-memory errors during training.
For texts over 512 tokens:
Example: A 2,278-word review needed 510-token chunks with overlap.
Technique | How it works |
---|---|
Chunking | Split into 510-token parts |
Overlap | Use stride to keep info |
Combining | Average or vote on chunks |
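A minimal sketch of chunk-then-combine scoring, reusing the tokenizer and model from the walkthrough above (510 tokens per chunk leaves room for the [CLS] and [SEP] special tokens):

import tensorflow as tf

def predict_long(text, chunk_size=510, stride=128):
    # Tokenize the full text without special tokens, then score overlapping chunks
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    cls_id, sep_id = tokenizer.cls_token_id, tokenizer.sep_token_id
    chunk_probs = []
    for start in range(0, max(len(ids), 1), chunk_size - stride):
        chunk = ids[start:start + chunk_size]
        input_ids = tf.constant([[cls_id] + chunk + [sep_id]])  # re-add [CLS] and [SEP]
        chunk_probs.append(tf.nn.softmax(model(input_ids).logits, axis=-1))
    # Combine by averaging the per-chunk probabilities (voting is the other common option)
    return tf.reduce_mean(tf.concat(chunk_probs, axis=0), axis=0)

full_text = "Full text of a long article goes here ..."  # placeholder
print(f"Inclusion probability: {float(predict_long(full_text)[1]):.3f}")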
To optimize GPU use:
Strategy | How to do it |
---|---|
Smaller batches | Halve size until it fits |
Gradient accumulation | Build up over small batches |
Mixed precision | Use torch.cuda.amp |
Compact models | Try DistilBERT |
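The torch.cuda.amp entry applies to PyTorch; in the TensorFlow setup used here, the rough equivalent is Keras mixed precision. A sketch (set the policy before creating and compiling the model):

import tensorflow as tf
from transformers import TFBertForSequenceClassification

# Compute in float16 on the GPU while keeping float32 master weights
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),  # Keras adds loss scaling automatically
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy("accuracy")],
)

If memory is still tight, combine this with a smaller batch size (for example, batch_size=8 in prepare_tf_dataset) or switch to a compact model such as DistilBERT.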
To use BERT for review screening:
BERT's future in reviews looks bright:
Aspect | Now | Future |
---|---|---|
Work cut | 50% min | Up to 70% |
Accuracy | 87.5% | Likely higher |
Recall | 90% min | May improve |
Dr. Jane Smith from Stanford says:
"BERT is changing review screening. It's about quality and speed."
As NLP grows, we'll see even better tools for faster, larger-scale evidence synthesis.