NLP Engineer Hiring Guide: Skills, Salary, and How to Screen Candidates
The NLP engineering market has fractured since 2023. Posting the wrong job description attracts the wrong candidates — and costs 4–6 months of hiring time.
Before 2023, "NLP engineer" had a reasonably consistent meaning: someone who built text processing systems using a combination of rule-based approaches, classical ML, and pretrained language models like BERT. The rise of GPT-4 and the LLM ecosystem fractured that definition into two distinct roles that require different skills, attract different candidates, and need different interview processes.
Companies that post a generic "NLP Engineer" job description in 2026 get CVs from both populations — and then struggle to evaluate which type they actually need. The result is either a mis-hire or a prolonged search that filters out strong candidates for the wrong reasons.
This guide covers the two NLP specialisations, what each requires, how to write the job description correctly, and how to run the interview process for each.
The Two NLP Specialisations in 2026
Classical NLP Engineer
Classical NLP engineers build systems for structured text understanding — extracting information from text at scale, classifying documents, normalising and cleaning text data, and building pipelines that are reliable and maintainable without depending on expensive LLM APIs for every operation.
Core work includes:
- Named entity recognition (NER) and relation extraction
- Document classification and intent detection at scale
- Text normalisation, deduplication, and data cleaning pipelines
- Information extraction from semi-structured text (contracts, reports, forms)
- Search and retrieval systems using sparse (BM25) and dense (embedding-based) approaches
- Multilingual text processing pipelines
Core tools: spaCy, NLTK, Hugging Face Transformers (fine-tuning encoder models like BERT, RoBERTa), Elasticsearch, custom training pipelines.
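Sparse retrieval with BM25, mentioned above, is the kind of fundamental a classical NLP candidate should be able to explain from first principles. A minimal pure-Python sketch of Okapi BM25 scoring (standard parameters k1=1.5, b=0.75; the documents and query here are illustrative toys, and a real system would use Elasticsearch or similar):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenised document against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

docs = [
    "the contract terminates in december".split(),
    "quarterly earnings rose in december".split(),
    "the report covers contract terms".split(),
]
scores = bm25_scores("contract terms".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)  # document matching both terms wins
```

A good candidate can also articulate when this beats dense retrieval (exact terminology, rare entities) and when it loses (paraphrase, synonymy).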
Salary (US, senior): $160K–$220K base
LLM Engineer (NLP-focused)
LLM engineers build systems on top of foundation models: they design the architecture that determines how language models are prompted, combined with retrieval, fine-tuned, evaluated, and deployed in production. This is the more common "NLP" hire in 2026, though it is better described as LLM engineering.

Core work includes:
- RAG system design: chunking strategies, embedding models, retrieval evaluation
- Fine-tuning foundation models on domain-specific data (LoRA, QLoRA, full fine-tuning)
- Prompt system engineering and systematic evaluation at scale
- RLHF and preference data collection pipelines
- LLM output evaluation frameworks: automated metrics, human eval, model-based eval
- Latency optimisation for generative inference (streaming, caching, batching)
Core tools: OpenAI/Anthropic APIs, LangChain, LlamaIndex, vLLM, TGI, PEFT, Weights & Biases.
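The chunking strategies listed above are a frequent interview topic. A minimal sketch of the common baseline, fixed-size windows with overlap so a fact spanning a boundary survives intact in at least one chunk (the chunk size, overlap, and token list below are illustrative assumptions, not recommendations):

```python
def chunk_tokens(tokens, chunk_size=200, overlap=50):
    """Split a token list into fixed-size windows with overlap.

    Overlap means consecutive chunks share their boundary region,
    so content straddling a cut point is retrievable from one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = [f"tok{i}" for i in range(500)]
chunks = chunk_tokens(tokens)
# Consecutive chunks share exactly `overlap` tokens at the boundary.
```

Strong candidates go beyond this baseline: structure-aware splitting (headings, paragraphs), and evaluating retrieval recall rather than guessing chunk sizes.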
Salary (US, senior): $180K–$260K base
Skills Both Types Share
Despite the split, strong NLP engineers of either type share a common foundation:
- Tokenisation and embeddings: Deep understanding of how text is represented numerically — subword tokenisation, embedding spaces, semantic similarity
- Evaluation fluency: Knowing when BLEU, ROUGE, BERTScore, or task-specific metrics apply — and crucially, knowing their limitations
- Production mindset: Awareness of latency, throughput, and cost constraints — the difference between a model that works in a notebook and one that serves 1M requests per day
- Data quality instincts: Text data is messy; strong NLP engineers spend more time on data cleaning and annotation quality than on model selection
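The embedding-space fluency listed first is easy to probe in a screen. A toy sketch of cosine similarity, the standard closeness measure in embedding space (the 4-dimensional "embeddings" here are made up for illustration; real models produce hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity: dot product of the vectors divided by their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for learned embeddings (hypothetical values).
emb = {
    "invoice": [0.9, 0.1, 0.0, 0.2],
    "receipt": [0.8, 0.2, 0.1, 0.3],
    "sunset":  [0.0, 0.9, 0.8, 0.1],
}
sim_related = cosine_similarity(emb["invoice"], emb["receipt"])    # semantically close
sim_unrelated = cosine_similarity(emb["invoice"], emb["sunset"])   # semantically distant
```

A candidate of either type should be able to explain why related terms cluster in this space and how subword tokenisation feeds into it.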
How to Write the Job Description
The most common mistake is writing a job description that lists requirements from both types: "experience with spaCy, BERT fine-tuning, RAG pipelines, RLHF, and production NLP systems." That is two different roles in one posting. It attracts few strong candidates, because specialists in either area will see requirements they do not match and self-select out.
Classical NLP JD — what to include
- What you will do: Build and maintain text extraction and classification systems; design annotation pipelines; own model performance in production
- What we look for: Proficiency with spaCy or similar; experience fine-tuning encoder models; strong data pipeline engineering; production experience at scale; evaluation methodology beyond accuracy
- What we are not looking for: LLM API experience is a bonus, not a requirement
LLM Engineering JD — what to include
- What you will do: Design and implement RAG pipelines; build evaluation frameworks for generative outputs; fine-tune models on domain data; own LLM integration architecture
- What we look for: Hands-on RAG system experience; fine-tuning with PEFT methods; LLM evaluation frameworks; production latency and cost optimisation experience
- What we are not looking for: Classical ML or rule-based NLP experience is not required
The Interview Framework
For Classical NLP Engineers
- Text pipeline design: "We need to extract all financial figures and the entities they refer to from 10,000 earnings call transcripts per day. Walk me through your approach." Evaluate: do they consider rule-based vs model-based trade-offs? Do they think about annotation budget and quality?
- Evaluation strategy: "You have built an NER model. How do you know it is good enough to ship?" Look for: span-level precision/recall, error analysis by entity type, production monitoring approach
- Production scenario: "Your document classifier performs well in offline evaluation but has high error rates on a specific document type in production. How do you debug this?"
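The span-level precision/recall mentioned in the evaluation question above is worth knowing cold as an interviewer, too. A minimal exact-match sketch (spans as `(start, end, label)` tuples; the gold and predicted spans below are invented for illustration, and libraries like seqeval handle this in practice):

```python
def span_prf(gold_spans, pred_spans):
    """Exact-match span-level precision, recall, and F1 for NER.

    A predicted span counts as correct only if start, end, AND label
    all match a gold span; partial overlaps count as errors.
    """
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

gold = [(0, 2, "ORG"), (5, 7, "MONEY"), (10, 11, "DATE")]
pred = [(0, 2, "ORG"), (5, 6, "MONEY")]  # second span has a wrong boundary
p, r, f1 = span_prf(gold, pred)
```

A strong candidate will volunteer the exact-match vs partial-match distinction unprompted, and break errors down by entity type rather than quoting one aggregate number.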
For LLM Engineers
- RAG system design: "Design a RAG system for a legal firm that needs to query 500,000 case documents. What chunking strategy, embedding model, and retrieval approach do you use, and how do you evaluate it?"
- Evaluation framework: "How do you measure whether a fine-tuned model is better than a prompted baseline for a document summarisation task?" Look for: automated metrics, human eval design, LLM-as-judge trade-offs
- Production optimisation: "Our LLM endpoint is costing $40K per month. Walk me through how you would reduce this by 50% without unacceptable quality degradation."
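One lever a candidate should reach for in the cost question above is the caching mentioned earlier. A minimal sketch of exact-match prompt caching, with a hypothetical `fake_model` standing in for a paid API call (a real system would add TTLs, eviction, and possibly semantic matching on embeddings):

```python
import hashlib

class PromptCache:
    """Exact-match cache: repeated identical prompts hit the cache
    instead of paying for a fresh model call."""
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_call(self, prompt, call_model):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = call_model(prompt)  # the expensive API call
        self._store[key] = response
        return response

# Hypothetical stand-in for an LLM API; records how often it is invoked.
calls = []
def fake_model(prompt):
    calls.append(prompt)
    return prompt.upper()

cache = PromptCache()
for prompt in ["summarise doc 1", "summarise doc 2", "summarise doc 1"]:
    cache.get_or_call(prompt, fake_model)
# Three requests, but only two paid model calls.
```

Good answers combine several levers (caching, smaller models for easy inputs, batching, shorter prompts) and quantify the quality trade-off of each rather than picking one blindly.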
Red Flags in Candidates
- Cannot discuss trade-offs between approaches. Strong NLP engineers know when a regex or a small fine-tuned model beats a GPT-4 call — and can explain why. Engineers who default to the most complex solution for every problem create expensive, fragile systems.
- Evaluation blind spots. Candidates who report accuracy without discussing class imbalance, or who evaluate generative models only on ROUGE scores, have not worked on production NLP systems, where quality judgements are rarely captured by a single automated metric.
- No production experience. NLP is full of researchers who can train models but have never dealt with inference latency, input distribution shift, or the annotation pipeline required to improve a model after deployment.
- English-only experience. For products with international users, engineers who have never worked with multilingual corpora underestimate the complexity significantly.
Where to Find NLP Engineers
- Academic conferences: ACL, EMNLP, NAACL, EACL author lists — particularly authors from industry labs or applied research teams
- Hugging Face: Model hub contributors and authors of popular models in the NLP category; frequent community forum contributors
- GitHub: Contributors to spaCy, Transformers, LlamaIndex, LangChain, Haystack
- arXiv cs.CL: Regular posters with industry affiliations
- Kaggle: Top performers in NLP competitions with detailed solution write-ups
Salary Benchmarks (2026, US market)
- Classical NLP Engineer (Senior): $160K–$220K base
- LLM Engineer / NLP-focused (Senior): $180K–$260K base
- Staff / Principal NLP Engineer: $230K–$300K+ base
- UK equivalents: Classical NLP £85K–£130K; LLM Engineering £100K–£150K
Working with VAMI on NLP Hiring
VAMI places NLP engineers and LLM engineers across fintech, legaltech, healthcare AI, and enterprise SaaS. Before sourcing, we clarify which of the two specialisations your role actually requires — so you get candidates matched to what you need. See our guides on the LLM engineer vs ML engineer comparison and generative AI engineer skills for additional context.
If you are hiring an NLP engineer and want to reach candidates outside job boards, talk to VAMI about your search.
Summary
- NLP engineering has split into classical NLP (extraction, classification, document processing) and LLM engineering (RAG, fine-tuning, generative systems) — these are different roles
- Writing a JD that mixes both attracts no one strong; be specific about which type you need
- Both types share: tokenisation/embedding knowledge, evaluation fluency, production mindset, data quality instincts
- Interview framework differs by type — classical NLP focuses on pipeline design and evaluation; LLM engineering focuses on RAG architecture and generation quality
- Best sourcing: ACL/EMNLP/NAACL proceedings, Hugging Face contributor lists, arXiv cs.CL, GitHub library contributors
- Salary: $160K–$260K+ base in the US; LLM engineering commands 10–20% premium over classical NLP