Crawling, Parsing and Semantic Matching
of Vacancies and CVs
Semantic Recruitment Technology
Jakub Zavrel, Textkernel
InGRID Workshop 11-2-2014
Textkernel:
• Spinoff from R&D in machine learning and language
technology
• Founded 2001, offices in Amsterdam (HQ), Frankfurt,
Paris, 45 employees; strong R&D focus
• Deloitte Fast 50 2007, 2010, 30% YoY growth
• Core technology:
Understanding unstructured text data. Multi-lingual
Market:
• Job boards, Recruitment Software, Staffing and recruitment, Mobility, Large
Employers
• Products:
• Multi-lingual tools (15 languages) to extract CVs and jobs
• Jobfeed: largest real time DB for job market analysis
• Search! & Match! to connect people and jobs
• Customers: UWV, Pole Emploi, Adecco, Randstad, USG, Monster, Stepstone,
XING, SAP, Unisys, Bosch, Axa, Philips, etc. (350 direct, 2000+ indirect)
• Large partner network (HR & recruitment software)
Language gap
I like programming, but I’m interested in taking on
more project management responsibility
Is there a job in our organisation that better fits
my degree?
I’d like to work on our mobile strategy. I’ve
helped a friend develop a mobile app.
I’d like to do more with my
organisational talent.
We are looking to hire:
An experienced tech team lead
The ideal candidate has:
- min. 5yr of experience
- Certified Scrum Master
- Exp. w/iOS, Android
Completed academic studies
Computer Science or related
30% travel for customer
presentations
The Job ad searches directly in a database
and identifies relevant candidates (or vice
versa) …
Extract! CV/Job
Parsing
Automatically convert each
document into a complete record
Extract! – Zero data entry job application
• Time savings coding CVs and Jobs
• If you accept noise, 100% time savings
• Structured data allows better search:
Semantic Searching and Matching
• Coding enables reporting and statistics
Occupation coding!
• Coding follows Extraction
• Customer specific or standard taxonomies
• String similarity based normalization
• Lots of synonyms per language
• Distance = confidence
• Problem cases: ambiguity, context, long tail
• More complex models can help (classifiers, multi-variate models)
• Semantic matching works better (occupation coding errors are
counterbalanced by other variables)
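The string-similarity normalization described above can be illustrated with a minimal sketch. It is not Textkernel's implementation: the mini-taxonomy, the codes ADMIN_ASSISTANT and JAVA_DEVELOPER, the Java synonyms, the function name code_occupation and the 0.6 threshold are all hypothetical, and Python's difflib ratio stands in for whatever similarity measure is actually used. The similarity of the best-matching synonym is returned as the confidence.

```python
from difflib import SequenceMatcher

# Hypothetical mini-taxonomy: occupation code -> synonyms (illustrative only).
TAXONOMY = {
    "ADMIN_ASSISTANT": [
        "administrative assistant",
        "administrative employee",
        "office support",
    ],
    "JAVA_DEVELOPER": [
        "java developer",
        "java software engineer",
        "j2ee developer",
    ],
}


def normalize(text):
    """Lowercase and collapse whitespace before comparing strings."""
    return " ".join(text.lower().split())


def code_occupation(title, taxonomy=TAXONOMY, threshold=0.6):
    """Return (code, confidence) for the closest synonym.

    The confidence is the string similarity in [0, 1]. If the best
    similarity is below the threshold, the code is None.
    """
    query = normalize(title)
    best_code, best_score = None, 0.0
    for code, synonyms in taxonomy.items():
        for synonym in synonyms:
            score = SequenceMatcher(None, query, normalize(synonym)).ratio()
            if score > best_score:
                best_code, best_score = code, score
    if best_score < threshold:
        return None, best_score
    return best_code, best_score


if __name__ == "__main__":
    print(code_occupation("Administrative Assistant"))
    print(code_occupation("Java Developr"))  # misspelled title still matches
```

A pure string measure like this cannot resolve the problem cases listed above (ambiguity, context, long tail), which is where classifiers or multi-variate models would come in.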
Search!
• Semantic search:
“Lets you find what you mean, not what you type”
Impression...
Match!
CV
Parsing
Match!
Job
Parsing
Semantic Matching Technology:
• Natural Language Processing
• Machine Learning
• Semantic Analysis
• Probabilistic Language Model
• Search Engine
• Multi-lingual taxonomies
• Recruitment knowledge-bases
Demo
Jobfeed
Search and analyse real-time
online job ads as well as historical
data
Jobfeed!
Knowledge of all demand for labour in European
job market
– Sales leads for recruitment and staffing companies
– Real time labour market analytics tools
– Largest database of jobs for matching unemployed
– Perfect data source for text mining
Jobfeed!
• Real time collection of online job ads from any
(unstructured) source
• Available in NL, DE, FR, IT
• Gradually rolling out in rest of Europe
• Richly semantically structured data
Jobfeed:
Multilingual Occupation Taxonomy
Occupations >4000 codes
4 languages
3 layer hierarchy
>50K synonyms
Link to other concepts:
- Skills
- Education level
- Sector
- O*NET
- UWV (Dutch Employment Agency)
- ROME
Example:
NL: administratief medewerker,
EN: administrative assistant,
FR: employé administratif,
DE: Verwaltungsassistent (m/w).
Group: administrative personnel
Class: Administration and Customer Service
Synonyms: administrative employee, assistant clerk, office support
Skills: ms office, excel, english language, etc
O*NET: 43-9199.00: Office and Administrative Support Workers, All Other
UWV: 1000402563: Administratief medewerker secretariaat
Based on millions of jobs, years of customer feedback and experience!
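As a rough illustration of how one taxonomy entry could be represented, the sketch below stores the example above in a small record and builds a lookup from any label or synonym to its entry. The class name OccupationEntry, its field names and the helper build_synonym_index are hypothetical and chosen for this sketch only. The values come from the example on this slide.

```python
from dataclasses import dataclass, field


@dataclass
class OccupationEntry:
    """One occupation in a multilingual taxonomy (hypothetical layout)."""
    labels: dict              # language code -> preferred label
    group: str                # middle layer of the hierarchy
    occupation_class: str     # top layer of the hierarchy
    synonyms: list = field(default_factory=list)
    skills: list = field(default_factory=list)
    external_codes: dict = field(default_factory=dict)  # scheme -> code


entry = OccupationEntry(
    labels={
        "nl": "administratief medewerker",
        "en": "administrative assistant",
        "fr": "employé administratif",
        "de": "Verwaltungsassistent (m/w)",
    },
    group="administrative personnel",
    occupation_class="Administration and Customer Service",
    synonyms=["administrative employee", "assistant clerk", "office support"],
    skills=["ms office", "excel", "english language"],
    external_codes={"O*NET": "43-9199.00", "UWV": "1000402563"},
)


def build_synonym_index(entries):
    """Map every lowercased label and synonym to its taxonomy entry."""
    index = {}
    for e in entries:
        for term in list(e.labels.values()) + e.synonyms:
            index[term.lower()] = e
    return index


index = build_synonym_index([entry])

if __name__ == "__main__":
    hit = index["office support"]
    print(hit.labels["en"], hit.external_codes["O*NET"])
```

With such an index, a term in any of the four languages resolves to the same entry, and from there to its skills and external codes.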
Demo
Jobfeed as material for Research
Frequent words for "Java developer"
en, van, de, een, je, met, in, het, Java, of, Je, op, is, voor, te,
ervaring, aan, als, and, software, om, team, zijn, kennis, bij, Ervaring,
die, the, naar, a, jaar, jij, bent, Developer, HBO, hebt, to, werken, werk
Frequent words for all professions
en, van, de, een, in, het, je, met, op, Je, voor, te, is, of, zijn, aan,
bent, naar, bij, om, als, ervaring, die, Het, hebt, deze, werken, zoek, De,
wij, functie, onze, ben, tot, over, werk, opleiding, uit, and,
werkzaamheden, dat, binnen, u, Als, Voor, zelfstandig, kennis, ook, s,
verantwoordelijk
Solution: contrast frequencies
                          Java developer jobs    All jobs
# jobs where w occurs              A                B
Total # jobs                       C                D

• Observed frequency of w: O(w) = A
• Expected frequency of w: E(w) = C * B / D
• Pick words with highest score: score(w) = (O(w) - E(w))^2 / E(w)
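The contrast-frequency score can be tried on a toy corpus. In this minimal sketch the function names and the four made-up job ads are assumptions for illustration. It also assumes whitespace tokenization and that the target jobs are a subset of all jobs. A, B, C and D follow the definitions on the slide.

```python
from collections import Counter


def document_frequencies(docs):
    """For each word, count the number of job ads that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def contrast_scores(target_docs, all_docs):
    """score(w) = (O - E)^2 / E with O = A and E = C * B / D."""
    a_counts = document_frequencies(target_docs)  # A per word
    b_counts = document_frequencies(all_docs)     # B per word
    c, d = len(target_docs), len(all_docs)        # C and D
    scores = {}
    for word, observed in a_counts.items():
        expected = c * b_counts[word] / d
        if expected == 0:
            # Only possible if target_docs is not a subset of all_docs.
            continue
        scores[word] = (observed - expected) ** 2 / expected
    return scores


def top_words(scores, n=5):
    """Highest-scoring words first, ties broken alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


# Made-up job ads, for illustration only.
all_docs = [
    "java developer met ervaring in spring",
    "java developer met kennis van hibernate",
    "verkoper met ervaring in retail",
    "chauffeur met rijbewijs",
]
java_docs = all_docs[:2]
scores = contrast_scores(java_docs, all_docs)

if __name__ == "__main__":
    print(top_words(scores))
```

On this toy data the common word "met" occurs in every ad, so its observed and expected counts coincide and its score is zero, while "java" and "developer" score highest. Note that the squared difference also rewards words that are strongly under-represented in the target jobs.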
Top words for "Java developer"
java, developer, software, spring, scrum, agile, hibernate, ontwikkelaar, u,
j2ee, development, maven, applicaties, ervaring, web, de, frameworks, jboss,
mbo, senior, wij, xml, jee, o, javascript, you, kennis, ontwikkelen, oracle,
ontwikkeling, architectuur, webservices, informatica, werkzaamheden,
technologie, developers, eclipse, bezit, het, team, wo, rijbewijs,
technieken, tomcat, the, vca, zelfstandig, architect, werklocatie, html
Building rich skills profiles for thousands of
occupations from millions of real time jobs…
… new trends and occupations…
Supply & Demand
• Have: lots of data, technology, ideas
• Want: labour market expertise, students, research
Semantic Recruitment Technology
Thanks!