How to Build a Data Science Foundation

By Samir Sharma

About two weeks ago, I attended a data science event, which was run by Premier IT (a recruitment consultancy).  The event was titled: “Data Science and the Impact on Society” and the speakers were as follows (in order of appearance):

  1. Victor Hu – Head of Data Science at QBE
  2. Magda Piatkowska – Head of Data Science at Telegraph Media Group
  3. Giles Pavey – Chief Data Scientist
  4. Martin Goodson – Chief Scientist at Evolution AI

An excellent line-up, with a diverse set of people and companies represented, from insurance to media to retail. Everyone had interesting views on data science and the areas around it. Let’s begin with Victor Hu at QBE. What follows are some of the points that were made, along with my own take on them.

Data Science Capabilities in Insurance – Challenges of building a data science foundation from scratch

As part of his preamble, Victor provided an overview of QBE. If you don’t know, QBE are one of the world’s largest insurers. They’ve been trading for 130 years, have 17,000 people, are in 38 different countries, and have a market cap of A$18.6bn! Their objective over the next four years is to deliver $100m in benefits.

What can analytics do for QBE?

4 pillars that are key for building out analytics:

He started out with the outcomes that QBE wanted to achieve. They have four pillars, and yours may be similar. These are focused outcomes that the business has laid down as a statement of what it wants to achieve over a period of time, affecting the entire business, from sales to underwriting to claims. It’s better to understand the outcomes that you want to achieve before you start any data science activities, and those outcomes need to be part of your data strategy. From your outcomes, you can work backwards to understand the initiatives required to achieve them, as well as the inputs, outputs, activities and tasks that will support them. So, they aren’t just plucked from thin air: a lot of work has gone into them, working with the relevant parts of the business to ensure there is value and buy-in. In Victor’s case, QBE’s four pillars are:

  • Enhance the retention of customers
  • Optimize distribution process, broker management
  • Improve pricing
  • Deliver better risk selection

The focus on delivering the benefits:

To start achieving the above benefits, Victor and his team focus on building MVPs (Minimum Viable Products) that can incrementally deliver value as soon as possible, using an Agile approach that combines speed with reuse and the ability to scale.

The MVP has become quite the buzzword in data science circles. It allows data scientists to build components quickly to answer a question, which then leads on to further questions. So, always iterating in small cycles can be very powerful. It provides the business with a quick sample of what is achievable in a short space of time, and then allows the data scientists to dazzle when they come back with deeper insights. The test-and-learn approach befits the way data scientists see the world; without it, one would feel like they were fumbling around in the dark for their keyboards. That wouldn’t bode well for the business, and would earn no kudos at all for the data scientists!

Going on from there, Victor spelled out the challenges they currently face. Many are historical: an organisation that big, which has been going for that long, will no doubt have issues with change, old technology that needs replacing, and a non-agile method of delivering new systems and applications, to name just a few of the issues that many organisations deal with. So, what are the challenges for QBE?

Challenges for QBE

  • Speed – current IT system, data science stack, development cycle
  • Accountability – business support, incentives structure
  • Process – sales, new data acquisition, do they exist?
  • Culture – communication, data driven decisions, agile
  • Capability – using the hub and existing team to best effect
  • Investment – expense challenge across the business, iterations

Fundamentally, those bullet points are typically the challenges that most companies deal with.  Let’s break the points down:

  1. Speed looks at how systems and development activities are performing. Are they lagging?  Do they slow down the cycle from concept to delivery of applications? Is the current technology up to snuff and can it deliver the performance required by a data scientist to get the insights?
  2. Accountability looks at how well embedded and committed the business are in this whole area. Do they understand what is required and can they implement the solutions?  Do they have the right attitude to accepting new technologies?  Are there incentives that will drive the behaviours required to work with new outputs?  All questions that are commonly asked within programmes of all sizes.
  3. Process looks at how you validate what you are doing. Is there a need?  Can you sell it internally as well as externally?  Is the data required already available, or do you have to get it from an external source or create it yourselves?  This is something that is driven through the data strategy to ensure companies can meet their outcomes.
  4. Culture is a large part of this and companies continue to struggle with it. Are we able to work within an agile development framework rather than waterfall?  Can we understand what the business needs internally?  Are we going to use the output that is provided through data science to make data-driven decisions? Do we have a data-driven mindset?  These are typical questions that are asked everywhere, as many projects of this nature are prone to failure.
  5. Capability looks at how you can start using resources internally that already exist to support data science. Do we have analysts in the business that know the data inside out and can support garnering insights?  Do we have technical capabilities that can be tweaked to provide us with good data scientists?
  6. Investment is one of the most important factors. How do we pay for these initiatives?  Can we create a proof of concept and then put forward a business case once results are validated?  Are we able to build iteratively rather than big bang to show value?  Typically, with all data projects we want to go down the route of small development cycles, to prove that we are creating a product that can drive all the required behaviours, and not consuming all the money in one hit!  I think CFOs like the idea of quicker return on investment!

Project prioritisation principles

Victor went on to elaborate on how they prioritise projects and the typical questions they ask themselves to make sure they are making the right decisions about the value roadmap. It’s a good checklist, and something you can refer to when thinking about how you want to move forward with any data project, to keep you honest. The points map directly onto the challenges above, and are a way of keeping data teams and the business on their toes. Using these questions will make all parties think about whether they are doing the right thing for the business, and will ensure vanity projects are not an option. The list is very simple and I think Victor summed it up very nicely:

  1. Why? Must generate benefits by changing existing decisions and processes, and must be trackable. Reuse & scalability
  2. What? Does data exist in sufficient volume to support advanced models?
  3. How? Business engagement is key – sponsorship, side by side support
  4. When? Lean methodology – fail quickly, agile way of working
  5. Who? Team growth – internal mobilization
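If you want to turn the checklist above into something you can run against a list of candidate projects, here is a minimal sketch in Python. To be clear, this is my own illustration and not something Victor showed: the `ProjectProposal` class, its field names, and the `unanswered` and `prioritise` helpers are all hypothetical, and a simple yes/no per question is an assumption made to keep things short.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical structure: one yes/no answer per question in the checklist.
@dataclass
class ProjectProposal:
    name: str
    has_trackable_benefit: bool    # Why?  benefits that can be tracked
    has_sufficient_data: bool      # What? enough data to support the models
    has_business_sponsor: bool     # How?  sponsorship and side-by-side support
    can_deliver_iteratively: bool  # When? lean, fail-quickly delivery
    has_internal_team: bool        # Who?  people who can be mobilised internally


# Maps each checklist question to the field that answers it (assumed naming).
QUESTIONS = {
    "why": "has_trackable_benefit",
    "what": "has_sufficient_data",
    "how": "has_business_sponsor",
    "when": "can_deliver_iteratively",
    "who": "has_internal_team",
}


def unanswered(proposal: ProjectProposal) -> List[str]:
    """Return the checklist questions this proposal cannot yet answer 'yes' to."""
    return [q for q, field in QUESTIONS.items() if not getattr(proposal, field)]


def prioritise(proposals: List[ProjectProposal]) -> List[ProjectProposal]:
    """Order proposals so those with the fewest open questions come first."""
    return sorted(proposals, key=lambda p: len(unanswered(p)))


if __name__ == "__main__":
    candidates = [
        ProjectProposal("vanity dashboard", False, True, False, True, True),
        ProjectProposal("churn model", True, True, True, True, True),
    ]
    for p in prioritise(candidates):
        print(p.name, "open questions:", unanswered(p))
```

In practice you would probably want weighted scores rather than plain yes/no answers, but even this crude version makes it obvious when a project has no answer to “Why?”.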

In summary – Why is data science important?

Just having data science isn’t going to change everything instantly, with insights suddenly coming from every angle. It must be carefully implemented, and the business must be involved to ensure there is a benefits case applied to what is required. The data scientists should have a solid understanding of the business and how they will add value. Both the business and data scientists need to speak the same language, and this isn’t Python or R! Companies need to become more data-driven in their focus to accept and implement change quicker. Projects should be driven in smaller iterations to prove the value to those that are sceptical and have “seen it all before”!

In this case, without the business the data scientist is nothing, and vice versa. Also, the data scientist isn’t the only person that is required in the technical team: you will need data analysts, business analysts, developers, data architects etc. There is a whole ecosystem required to fuel a data science project, or any data project for that matter. Do start with a data strategy, as this will be important to drive the outcomes and initiatives. Don’t forget there are incredible benefits from working in rapid, iterative cycles to prepare, analyse, test, learn, review and start again. Fail fast and jump back into it with more knowledge and better questions.