Harnessing the power of Process excellence, Analytics & NLP to design the content strategy for a data science knowledge portal
The demand for skill-sets in data science and AI has been exponentially increasing over the past few years. However, the supply of skilled data scientists is not increasing at the same pace, thereby leading to a big gap between demand and supply.
In addition to structured training in the various disciplines of data science, all-round knowledge sharing and enhancement fueled by data science knowledge portals will go a long way toward bridging the demand-supply gap for data scientists.
The success and effectiveness of a knowledge portal depend on the range of content it can deliver, at the best quality, in the shortest possible time. This in turn depends on identifying suitable content contributors and automating the overall content curation process.
Overarching approach of the content strategy
The overarching approach for designing the content strategy is based on harnessing the power of process excellence, analytics & NLP to ensure content & contributor quality by the following:
Building blocks of the Content Strategy
The following will be the building blocks of the content strategy:
Building a content matrix with proposed content types
The portal will have an even mix of various content types and a multi-dimensional content matrix will be developed. The criteria for arriving at the content types will correspond to each dimension of the multi-dimensional content matrix. The criteria proposed are:
Content hierarchy & flow of viewership in the content chain
The content hierarchy is meant to enable a seamless flow of audience from the simplest form of content, which needs a relatively short attention span, to the most complex content types, which need much more attention and interest.
- The first level of the content hierarchy would include content pieces such as a daily dose of an analytics concept, short YouTube videos on concepts, etc., which demand very little reading time from the reader. These would help draw the initial audience to the portal and could serve as leads to the more intense content types that need more reading time and a longer attention span.
- The next level in the hierarchy would be blogs, which could build on the first level and could meet any of the criteria proposed above: for various levels of audience and on various concepts of data science & ML. These will demand greater attention than the first level and will usually emerge from one of the level-one content types.
- A selected collection of blogs on a particular topic could then be combined into an e-book. The criteria for selecting articles for the collection could be either blogs on similar topics or blogs that have been well received by the audience, with a high read ratio and viewership.
- The next and highest level of content type could be extensive reference works on broad areas in data science, with extensive articles, podcasts and industry developments on that area, which could be re-purposed as training content for that area.
This way, the content hierarchy would create a chain of viewership, with one level leading the reader to the next and so on, thereby creating a seamless viewership funnel.
The content chain based on the content hierarchy
The content chain is inspired by the food chain in an ecosystem and shows the ideal flow of traffic from one level of the content hierarchy to another.
Proposed flow of traffic across the content chain
The forethought behind creating the content hierarchy is to start with content requiring the shortest attention span and gradually build up the trajectory across more complex content types needing a longer attention span.
The flowchart below illustrates the proposed flow of traffic across several levels of content types with colors representative of the level of content hierarchy.
Setting up content metrics
The objective of designing the content metrics is to evaluate the effectiveness of the content both in terms of content quality and audience preferences. This will provide a sense of direction to the editorial team on how to steer the content & contributor selection.
- Number of repeat views vs. number of views: This indicates how many users have read the content more than once, which means that the content piece is referred to frequently. A higher ratio here could mean that the content piece qualifies as reference material.
- Number of clicks & clicks-to-view ratio: This is a direct measure of content quality, since it indicates how many of the viewers who clicked on the content read it completely.
- Number of likes: This is a measure of viewership for the content.
- Number of highlights & citations: Similar to the first metric, this is a measure of how often the content piece is referred to by other authors. It also gives a measure of the other sources that directed traffic to the content piece.
- Bounce rate: This is a typical web analytics metric that indicates the percentage of viewers who navigate away from the content piece as soon as they enter. A high bounce rate is an indication of low reader engagement.
- Sentiment scores: Sentiment analysis of the reader comments will provide insights into the most liked and disliked aspects of the content.
- Impact factor: Finally, a comprehensive score could be developed for the content piece from a combination of the metrics above, as a measure of content quality and of how much it engages the reader.
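As a rough illustration of how such an impact factor might be put together, the sketch below combines the metrics above into a single weighted score between 0 and 1. The field names, the weights and the choice of a simple weighted average are all assumptions made for this example, not a prescribed formula; a real portal would tune them against its own data.

```python
from dataclasses import dataclass


@dataclass
class ContentStats:
    """Raw counters for one content piece (all field names are hypothetical)."""
    views: int
    repeat_views: int
    clicks: int
    complete_reads: int
    likes: int
    citations: int
    bounces: int
    mean_sentiment: float  # assumed to lie in [-1, 1]


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator clamped to [0, 1]; 0.0 if the denominator is zero."""
    if not denominator:
        return 0.0
    return max(0.0, min(1.0, numerator / denominator))


# Illustrative weights only (they sum to 1.0); not taken from any real portal.
WEIGHTS = {
    "repeat_ratio": 0.20,
    "read_ratio": 0.25,
    "like_ratio": 0.15,
    "citation_ratio": 0.15,
    "retention": 0.15,
    "sentiment": 0.10,
}


def impact_factor(stats: ContentStats) -> float:
    """Combine the individual metrics into one score between 0 and 1."""
    if stats.views == 0:
        return 0.0  # nothing can be said about unseen content
    components = {
        "repeat_ratio": ratio(stats.repeat_views, stats.views),
        "read_ratio": ratio(stats.complete_reads, stats.clicks),
        "like_ratio": ratio(stats.likes, stats.views),
        "citation_ratio": ratio(stats.citations, stats.views),
        "retention": 1.0 - ratio(stats.bounces, stats.views),  # inverse of bounce rate
        "sentiment": ratio(stats.mean_sentiment + 1.0, 2.0),   # rescale [-1, 1] to [0, 1]
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


example = ContentStats(views=1000, repeat_views=200, clicks=500, complete_reads=250,
                       likes=100, citations=50, bounces=300, mean_sentiment=0.5)
score = impact_factor(example)
```

A weighted average keeps every component visible to the editorial team, so a low score can be traced back to the metric that caused it.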
The key to high-quality content capable of engaging viewers is to have a set of passionate and knowledgeable contributors who can create content masterpieces. The core principle of hiring contributors is to identify candidates who have a balanced passion for both writing and data science. The hiring process should consider the following aspects while looking for prospective contributors:
- Motivational fit to write
- Alignment of contributors' passion with the vision of the knowledge portal
- Domain expertise
- The projected propensity of retention
- Good mix of contributors with diverse demography & domain expertise
Contributors are sourced through a combined automated and manual search and will typically include:
- Industry experts/celebrities in various domains of data science
- Current writers who have their blogs or write for other publications
- Data science enthusiasts who want to start writing
- Student writers
Semi-automated approach for Contributor selection & on-boarding (except Industry experts / celebrities)
The approach illustrated below will help in creating a well-synchronized and documented process flow for building a pipeline of contributors and capturing information about the contributors at every milestone of the selection process.
The semi-automated approach will also ensure that the entire selection process stays scalable as the number of contributors and the size of the publication increase, without introducing a bottleneck of dependency on people.
Work allotment for contributors
Once the contributors are selected based on the above selection process, it is important to ensure that the contributors have a constant flow of work without being under-occupied or over-occupied. As the publication grows in size, a structured and automated approach of work allotment is again crucial to maintain scalability.
The process of work allotment should be based on the following factors:
- Matching the content to be created with the domain and capacity of contributors
- Setting up an automated allotment system based on profile matching, so that content is auto-assigned to contributors without any wait time
- Creating a pipeline for contributors to work on, based on contributor availability and publication deadlines
- Making the assignment process transparent and measurable
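The factors above can be sketched as a small greedy allotment routine. This is only one possible reading: the `Contributor` and `ContentRequest` classes, the capacity counted in pieces and the "most urgent first, least-loaded contributor" rule are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Contributor:
    """Hypothetical contributor profile used for matching."""
    name: str
    domains: Set[str]
    capacity: int  # number of pieces the contributor can hold at once
    assigned: List[str] = field(default_factory=list)

    @property
    def free_slots(self) -> int:
        return self.capacity - len(self.assigned)


@dataclass
class ContentRequest:
    """Hypothetical piece of content waiting for an author."""
    title: str
    domain: str
    deadline_days: int


def allot(requests: List[ContentRequest],
          contributors: List[Contributor]) -> List[Tuple[str, Optional[str]]]:
    """Assign the most urgent requests first to the matching contributor with most free slots.

    Returns an assignment log of (title, contributor name or None), which keeps
    the process transparent and easy to measure.
    """
    log: List[Tuple[str, Optional[str]]] = []
    for request in sorted(requests, key=lambda r: r.deadline_days):
        candidates = [c for c in contributors
                      if request.domain in c.domains and c.free_slots > 0]
        if not candidates:
            log.append((request.title, None))  # left for manual follow-up
            continue
        # On a tie, max() keeps the first candidate in list order.
        chosen = max(candidates, key=lambda c: c.free_slots)
        chosen.assigned.append(request.title)
        log.append((request.title, chosen.name))
    return log


team = [Contributor("A", {"nlp"}, 1), Contributor("B", {"nlp", "vision"}, 2)]
backlog = [
    ContentRequest("r1", "nlp", 5),
    ContentRequest("r2", "vision", 2),
    ContentRequest("r3", "nlp", 3),
    ContentRequest("r4", "nlp", 9),
]
assignment_log = allot(backlog, team)
```

Requests that end up with `None` in the log show where capacity or domain coverage is missing, which can feed back into contributor hiring.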
Content created by the contributors should be evaluated on the fly using a multi-layered approach. This will be a combination of rule-based algorithms, NLP and a manual editorial layer, with a feedback mechanism to the rule-based algorithms to maintain consistent content quality. This will ensure that maintaining content quality does not become a bottleneck that hampers the publication cycle time.
The top three layers are automated NLP-based layers that maintain content quality by filtering content considerably before it is passed on to the manual filter, reducing manual effort and human dependency. Further, the feedback from the manual layer will be used to enhance the learning of the NLP layers, strengthening the automated quality layer day by day.
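A minimal sketch of the layered idea follows, assuming three simple automated checks (length, average sentence length and banned phrases) in front of the manual editorial layer. These checks are hypothetical stand-ins chosen for brevity: they are plain rules rather than trained NLP models, and the feedback loop is reduced to an editor adding a new banned phrase.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class QualityGate:
    """Automated layers placed before manual review (all rules are illustrative)."""
    min_words: int = 30
    max_avg_sentence_words: float = 30.0
    banned_phrases: List[str] = field(
        default_factory=lambda: ["lorem ipsum", "click here"])

    def failures(self, text: str) -> List[str]:
        """Run every automated layer and return the list of problems found."""
        words = re.findall(r"[A-Za-z']+", text)
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        problems: List[str] = []
        if len(words) < self.min_words:
            problems.append("too short")
        if sentences and len(words) / len(sentences) > self.max_avg_sentence_words:
            problems.append("sentences too long")
        lowered = text.lower()
        for phrase in self.banned_phrases:
            if phrase in lowered:
                problems.append(f"banned phrase: {phrase}")
        return problems

    def route(self, text: str) -> str:
        """Only drafts that pass every automated layer reach the editors."""
        return "rejected" if self.failures(text) else "manual_review"

    def learn_from_editor(self, rejected_phrase: str) -> None:
        """Feedback loop: an editor's rejection reason becomes a new automated rule."""
        phrase = rejected_phrase.lower().strip()
        if phrase and phrase not in self.banned_phrases:
            self.banned_phrases.append(phrase)


gate = QualityGate(min_words=5)
draft = ("This article explains gradient descent in plain words. "
         "It is a guaranteed shortcut to success.")
before = gate.route(draft)                   # passes the automated layers
gate.learn_from_editor("Guaranteed shortcut")
after = gate.route(draft)                    # now caught automatically
```

Each editorial rejection that is fed back this way removes one class of drafts from the manual queue, which is the effect the feedback mechanism is meant to achieve.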
Handling reader review
Any business venture is successful only if it can listen to its customers and work diligently on customer feedback. This holds for the portal too, if it is to evolve into an audience-centric portal driven by the voice of the customer. Reading reader reviews and responding to reader questions is important to sustain this aspect. Again, this could be achieved with a combination of an NLP layer and a manual layer so that it remains scalable:
1. Text mining of reader reviews to flag highly critical reviews or unhappy readers, with an automated escalation mechanism so that such reviews are brought to the attention of relevant stakeholders
2. A chatbot or Q&A layer based on an RNN, set up to answer reader questions. Questions that the chatbot cannot answer can be forwarded to a human agent, and the answer can be fed back to the chatbot for learning.
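The first item, flagging critical reviews for escalation, could look roughly like the sketch below. It assumes a tiny hand-written lexicon and a fixed threshold purely for illustration; a production system would more likely rely on a trained sentiment model, and the names `NEGATIVE_TERMS`, `criticality` and `triage` are hypothetical.

```python
import re
from typing import Dict, List

# Tiny illustrative lexicon; not a substitute for a trained sentiment model.
NEGATIVE_TERMS = {
    "wrong", "misleading", "useless", "confusing",
    "outdated", "terrible", "plagiarized",
}


def criticality(review: str) -> float:
    """Fraction of words in the review that appear in the negative lexicon."""
    words = re.findall(r"[a-z']+", review.lower())
    if not words:
        return 0.0
    return sum(word in NEGATIVE_TERMS for word in words) / len(words)


def triage(reviews: List[str], threshold: float = 0.15) -> Dict[str, List[str]]:
    """Split reviews into those escalated to stakeholders and those simply logged."""
    buckets: Dict[str, List[str]] = {"escalate": [], "log": []}
    for review in reviews:
        key = "escalate" if criticality(review) >= threshold else "log"
        buckets[key].append(review)
    return buckets


result = triage([
    "The code is wrong and the explanation is misleading.",
    "Great introduction, thanks for sharing!",
])
```

The `escalate` bucket is what an automated escalation mechanism would forward to the relevant stakeholders; the threshold controls how sensitive that escalation is.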
Proposed high level process flow
Considering all the aspects discussed in the overall content strategy, the following high-level process flow is proposed:
The content strategy that evolves from the various aspects of process optimization, NLP & data science concepts will lead to an optimized & reliable content flow for a knowledge portal. Finally, such a strategy also strengthens the knowledge portal's ideals in terms of "Practicing what is preached".