Generating Recommendations in a Real World Job Market

Vivek Kaushal
6 min read · Oct 4, 2021
Photo by Clem Onojeghuo on Unsplash

When you’re a job board catering to employers and job-seekers, the recommendations you generate on both ends become critical differentiators for your product. An employer wants the best candidates for its job posts, and candidates want the most relevant jobs, where they have a strong chance of being selected. This post outlines a solution for such real-life scenarios, even with limited data. The goal is to build a scalable recommendation system that evolves as the volume of available data grows and adapts flexibly to changes and additions.

Available Data

For job posts by employers, let’s assume that we have the following data available:

  • The category and sub-category of the job post (e.g., Driver->Limo Driver)
  • Title, description and other metadata for the job post
  • Other quantified criteria (years of experience, salary, visa status etc.)
  • Employer’s record — number of hires, number of shortlists, types of candidates hired, etc.

While for every candidate, we have the following data on hand:

  • Candidate profile (including years of experience, visa status, etc.)
  • Skillset (category and sub-category of skillset selected)
  • Candidate’s qualifications (licenses held, educational qualifications)
  • Resume (PDF/image/DOC)
  • Platform interactions (jobs applied to, jobs shortlisted for, etc.)

Levels of Matching

“The best way to use machine learning is not to use it at all.” Heuristics and rule-based matching often generate better results, especially when data is scarce. That takes nothing away from the ground-breaking solutions that clever ML usage can create when conditions are suitable, and the approach here has been written with an open mind to both.

The first step is to separate the different levels at which a match is predicted:

  1. Hard qualifications (Qualifications that a candidate must have)
  2. Soft qualifications (Qualifications that a candidate should ideally have)
  3. Passive attributes (Data collected passively from candidate interactions)

Let’s understand this better through an example. Suppose a limo hire agency based in Delhi is looking for drivers with 3–5 years of experience who are willing to work on weekends within a suggested pay range. The agency does not sponsor visas, so candidates must already have the required work permits.

In this scenario, our hard qualifications would be the relevant driving license and a valid work permit. Candidates who do not meet these requirements will not be considered.

A soft qualification in this scenario would be the candidate’s chosen skillset. If they have created their profile as a “Limo Driver,” they’ll be ranked higher. If not, they’d be ranked lower but not omitted from recommendations. Another example is the distance from the agency’s offices or work location. A candidate closer to the work location will be more relevant for the job than someone based in a different city.

An example of a passive attribute here is the number of limo driver jobs that a candidate has been shortlisted for earlier. This data isn’t collected directly from candidates but can be used to generate better recommendations. A candidate who gets shortlisted more often for a job profile is likely to be a better candidate for it.

Hard Qualifications

Hard qualifications are easy to implement. A Boolean filter on your database will do the job. For example, if the requirement is for medical support staff at a local clinic, having the relevant training and certification is a must. Only candidates who have uploaded proof of the required certification need to be considered.
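As a minimal sketch of such a filter, here it is written as an in-memory Python predicate rather than a database query. The `Candidate` fields, the license name, and the sample candidates are hypothetical and only illustrate the limo driver example from earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # Hypothetical, simplified candidate record for illustration.
    name: str
    licenses: set = field(default_factory=set)
    has_work_permit: bool = False


def meets_hard_qualifications(candidate, required_licenses, needs_work_permit):
    """Return True only if every must-have requirement is satisfied."""
    has_licenses = required_licenses <= candidate.licenses  # subset check
    has_permit = candidate.has_work_permit or not needs_work_permit
    return has_licenses and has_permit


candidates = [
    Candidate("Asha", {"commercial_driving"}, has_work_permit=True),
    Candidate("Ben", {"commercial_driving"}, has_work_permit=False),
    Candidate("Chitra", set(), has_work_permit=True),
]

# Keep only candidates who pass every hard requirement of the job post.
eligible = [
    c.name
    for c in candidates
    if meets_hard_qualifications(c, {"commercial_driving"}, needs_work_permit=True)
]
print(eligible)
```

In a real system the same condition would more likely live in a `WHERE` clause or an index filter, so that ineligible candidates are never loaded for scoring.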

A challenge here is verifying the authenticity of such documents. A machine-learning-based approach can be a lifesaver, e.g., a multi-class classification model trained on a large labelled dataset of official certifications can come in handy. Such a system can be augmented by manual verifiers who weigh in to handle edge cases. The combined system will only grow more robust and accurate with time.

Soft Qualifications

For Quantifiable Attributes

Implementing soft qualifications for quantifiable attributes is relatively simple. It needs a set of variables to match, each with a defined target value, and a range over which matches are scored. For example, the variables can be salary, distance, and years of experience, with targets of Rs. 15,000 per month for salary, 0 km for distance, and 5 years of experience. Profiles can then be sorted using a weighted combination of these parameters, e.g., 10% weight for salary match, 60% for distance match, and 30% for experience match. Each weight is multiplied by a closeness measure for its variable (e.g., one that is highest when the absolute difference from the target is zero and falls as the difference grows), and the products are added to generate the final score. Candidates with the highest final score after this weighted match rank highest.
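A rough sketch of this weighted match follows. The targets and weights come from the example above. The tolerances (the difference at which closeness drops to zero), the linear closeness function, and the two sample profiles are assumptions made for illustration.

```python
def closeness(value, target, tolerance):
    """Map |value - target| to [0, 1]: 1.0 at the target, 0.0 at or beyond tolerance."""
    return max(0.0, 1.0 - abs(value - target) / tolerance)


# Targets and weights from the example in the text; tolerances are assumed.
TARGETS = {"salary": 15000, "distance_km": 0, "experience_years": 5}
WEIGHTS = {"salary": 0.10, "distance_km": 0.60, "experience_years": 0.30}
TOLERANCES = {"salary": 10000, "distance_km": 50, "experience_years": 5}


def soft_score(profile, weights=WEIGHTS):
    """Weighted sum of per-attribute closeness; higher means a better match."""
    return sum(
        weights[key] * closeness(profile[key], TARGETS[key], TOLERANCES[key])
        for key in weights
    )


# Hypothetical candidate profiles.
profiles = {
    "nearby_junior": {"salary": 15000, "distance_km": 5, "experience_years": 2},
    "distant_senior": {"salary": 15000, "distance_km": 40, "experience_years": 5},
}
ranked = sorted(profiles, key=lambda name: soft_score(profiles[name]), reverse=True)
print(ranked)
```

With distance weighted at 60%, the nearby candidate outranks the more experienced but distant one. Passing a different `weights` dictionary changes the ranking without touching the scoring logic.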

The interesting bit here is that the weights themselves can be adaptable. If a specific employer values experience more than anything else and is willing to pay for relocation, the weights can be tweaked automatically, provided that this information is captured through employer-facing forms or other interactions.

Moreover, the weights can also be learnt passively over time from the hiring and shortlisting behavior of an employer, and updated automatically. If an employer frequently hires candidates whose expected salaries differ widely from the salary in the job post, the employer is probably flexible on salary, and it shouldn’t be a major contributing factor in their recommendations.
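One simple way to sketch this idea, and only one of many possible heuristics, is to average how close each past hire was to the job post's target on every attribute, then normalize those averages into weights. Attributes on which hires routinely stray from the target end up with low weight. The function name, the `floor` parameter, and the hiring history below are hypothetical. A production system might fit a regression or ranking model instead.

```python
def learn_weights(hire_closeness, floor=0.05):
    """Estimate attribute weights from the closeness scores of past hires.

    hire_closeness: list of dicts, one per past hire, mapping an attribute to
    the closeness in [0, 1] between the hired candidate and the job target.
    """
    attributes = hire_closeness[0].keys()
    # Average closeness per attribute; the floor keeps every weight non-zero.
    averages = {
        attr: max(floor, sum(h[attr] for h in hire_closeness) / len(hire_closeness))
        for attr in attributes
    }
    total = sum(averages.values())
    # Normalize so that the weights sum to 1.
    return {attr: avg / total for attr, avg in averages.items()}


# Hypothetical history: this employer hires nearby, experienced candidates
# whose expected salaries are often far from the posted figure.
past_hires = [
    {"salary": 0.2, "distance_km": 0.9, "experience_years": 0.8},
    {"salary": 0.1, "distance_km": 0.8, "experience_years": 0.9},
    {"salary": 0.3, "distance_km": 1.0, "experience_years": 0.7},
]
weights = learn_weights(past_hires)
print(weights)
```

For this history, salary receives the smallest weight, which matches the intuition that the employer is flexible on it.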

For Non-Quantifiable Attributes

Matching soft qualifications for non-quantifiable attributes is somewhat trickier. Understanding how a “Cab Driver” is more relevant than a “Garbage Truck Driver” for our “Limo Driver” profile is not that straightforward when they both hold the required license. This is intuitive for humans, as we understand that being a cab driver entails many of the same tasks that being a limo driver will entail. To transfer this understanding to an automated system, there are two smart approaches:

  1. Semantic Matching
  2. Manual Categorization

It’s important to understand that the specific problem we are trying to solve here is breaking down an abstract human understanding into structured comparisons that can be automated. This, in general, is a good heuristic for identifying good machine learning problems.

Semantic matching means finding words that have similar meanings. E.g., “Chef” and “Cook” will have a high semantic match. You can find more details on how to implement a basic semantic matching algorithm here.
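A common way to approximate semantic matching is to compare vector representations of terms with cosine similarity. The sketch below assumes that approach. The three-dimensional vectors are invented for illustration, and in practice they would come from a pretrained word or sentence embedding model.

```python
import math

# Toy 3-dimensional vectors invented for illustration only. Real vectors
# would come from a pretrained embedding model and have many more dimensions.
TOY_EMBEDDINGS = {
    "chef": [0.9, 0.1, 0.0],
    "cook": [0.8, 0.2, 0.1],
    "driver": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_match(term_a, term_b):
    """Similarity of two known terms under the toy embeddings."""
    return cosine_similarity(TOY_EMBEDDINGS[term_a], TOY_EMBEDDINGS[term_b])


print(semantic_match("chef", "cook") > semantic_match("chef", "driver"))
```

The resulting similarity can be used as the closeness value for a skillset attribute in the weighted score, alongside the quantifiable attributes.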

While semantic matching is useful, it’s also a good idea to seed your matches with some inherent knowledge that you possess. For example, you can imagine a graph with different job categories linked to each other through weighted edges, where you’ve seeded the weights with your own understanding of how closely one job category relates to another. This creates a nice base upon which other algorithms can evolve. It is clearly a tedious process, though, and not essential if you don’t already have such a categorization on hand.
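Such a hand-seeded graph can be sketched as a nested dictionary. The categories and edge weights below are illustrative assumptions, not values from a real taxonomy.

```python
# Hand-seeded relatedness between job categories, in [0, 1].
# The categories and weights are illustrative assumptions.
CATEGORY_GRAPH = {
    "Limo Driver": {"Cab Driver": 0.8, "Garbage Truck Driver": 0.3},
    "Cab Driver": {"Delivery Driver": 0.6},
}


def relatedness(category_a, category_b, graph=CATEGORY_GRAPH):
    """Look up the seeded edge weight, treating the graph as undirected."""
    if category_a == category_b:
        return 1.0
    forward = graph.get(category_a, {}).get(category_b)
    backward = graph.get(category_b, {}).get(category_a)
    if forward is not None:
        return forward
    if backward is not None:
        return backward
    return 0.0  # no seeded edge: assume unrelated until data says otherwise


print(relatedness("Cab Driver", "Limo Driver"))
print(relatedness("Limo Driver", "Garbage Truck Driver"))
```

Because the weights live in plain data, they can later be overwritten or blended with values learnt from platform interactions.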

Richness Through Passive Data

Now that we’ve dealt with hard and soft qualifications, we have a skeletal recommendation system that would work just fine. To add some spark to it, we can tap into the vast volumes of data that a job board collects, so that the recommendation system evolves over time and gets smarter with each new interaction on the platform. In this section, we’ll explore some avenues where passive data can add richness to recommendations.

Non-Quantifiable Soft Attribute Match using History

A great way of understanding how abstract non-quantifiable terms are related to each other is to evaluate their relationship using the interaction data collected from platform users. For example, if a lot of Cab Drivers are frequently shortlisted for Limo Driver profiles, then there is a high probability that the two profiles are closely related. Relations derived from such interactions can enrich recommendations over time and make them more dynamic.
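As a minimal sketch, the relation can be estimated by counting how often candidates from one category are shortlisted for jobs in another. The event log below is hypothetical, and using the raw share of shortlists as the relatedness score is an assumption. A real system would likely smooth the counts and require a minimum sample size.

```python
from collections import Counter, defaultdict

# Hypothetical shortlist log: (candidate's profile category, job post category).
shortlist_events = [
    ("Cab Driver", "Limo Driver"),
    ("Cab Driver", "Limo Driver"),
    ("Cab Driver", "Limo Driver"),
    ("Cab Driver", "Cab Driver"),
    ("Garbage Truck Driver", "Limo Driver"),
    ("Garbage Truck Driver", "Garbage Truck Driver"),
    ("Garbage Truck Driver", "Garbage Truck Driver"),
    ("Garbage Truck Driver", "Garbage Truck Driver"),
]


def shortlist_rates(events):
    """For each candidate category, the share of its shortlists per job category."""
    counts = defaultdict(Counter)
    for candidate_category, job_category in events:
        counts[candidate_category][job_category] += 1
    return {
        cand: {job: n / sum(jobs.values()) for job, n in jobs.items()}
        for cand, jobs in counts.items()
    }


rates = shortlist_rates(shortlist_events)
print(rates["Cab Driver"]["Limo Driver"])
print(rates["Garbage Truck Driver"]["Limo Driver"])
```

In this toy log, cab drivers are shortlisted for limo driver posts three times as often as garbage truck drivers, so the learnt relation agrees with human intuition.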

Candidate Facing Recommendations

What’s an ideal job to show to a candidate? There are two answers to this question.

  1. Jobs that the candidate would like to see and apply to
  2. Jobs that the candidate has the best shot of being shortlisted and selected for

It’s likely that these two sets have a strong intersection, but that’s not guaranteed. I believe that the best approach is a weighted combination of these two goals. I ran a Twitter poll on this question. In either case, the relation can be learnt from interaction data accumulated from other similar candidates and their application and selection history. Similar candidates are likely to get selected for similar jobs.
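The weighted combination of the two goals can be sketched as a single blended score. The probabilities below are hypothetical placeholders for estimates that would be learnt from interaction data, and `alpha` is an assumed tuning parameter for how much weight goes to candidate interest.

```python
def candidate_facing_score(p_apply, p_shortlist, alpha=0.5):
    """Blend interest and success likelihood; alpha is the weight on interest."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * p_apply + (1.0 - alpha) * p_shortlist


# Hypothetical estimates for two jobs shown to the same candidate.
jobs = {
    "dream_job": {"p_apply": 0.9, "p_shortlist": 0.1},
    "realistic_job": {"p_apply": 0.5, "p_shortlist": 0.7},
}


def rank_jobs(jobs, alpha):
    """Order job names by blended score, best first."""
    return sorted(
        jobs,
        key=lambda name: candidate_facing_score(alpha=alpha, **jobs[name]),
        reverse=True,
    )


print(rank_jobs(jobs, alpha=0.9))  # favors what the candidate wants to see
print(rank_jobs(jobs, alpha=0.2))  # favors where the candidate is likely to succeed
```

Moving `alpha` between 0 and 1 shifts the ranking between the two goals, which makes the trade-off easy to tune or A/B test.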

This post outlines a rough approach for creating a recommendation system for a job board in non-ideal real-world conditions. Parts of this post have been developed into real systems that are currently deployed and serving thousands of users daily. But this post is more of a superset of the approach I’ve deployed earlier, and paints what a more complete system should look like. Feel free to reach out for help. :)

Vivek Kaushal

Product | Hacker | Engineer | Building Enterpret | ex-Samsung, IIIT-H | vivekkaushal.com