Giving Credit Where It’s Due: Machine Learning’s Role in Lending

Machine learning allows lenders to score credit risk for millions of people who may have had no FICO score before

Unsecured consumer debt in the United States is roughly $3.8T, with student loans comprising just over $1.4T of that figure[i]. Undergraduate and graduate students in the United States typically have limited credit history and spotty FICO scores, which limits their options for affordable financing. For the last six months, I’ve served as the co-founder of an organization that negotiates student loan rates on behalf of large groups of graduate students[ii]. I’m heavily invested in the use of alternative data sources to assess the creditworthiness of students, an area in which SoFi, a leading fintech lender, has been innovating for the past seven years.

SoFi, founded in 2011, provides student loan refinancing and other financial products aimed at higher-income individuals who have yet to build material wealth. Many of these potential borrowers have limited credit histories, so figures like FICO scores provide an incomplete view of credit risk. In fact, nearly 20 percent of Americans have no FICO score at all[iii]. SoFi began using machine learning to assess the creditworthiness of borrowers by taking into account a wide array of factors that traditional lenders do not frequently consider: educational attainment, utility payments, insurance claims, and mobile phone usage, among others[iv]. The Philadelphia Federal Reserve argues that “adding alternative data into the mix may make it possible to open up more affordable credit for millions of additional consumers”[v]. This additional data is compared to actual credit payments over time to identify a large set of traits that make a borrower more or less likely to fulfill financial obligations.

SoFi uses machine learning to identify new customers and to learn which ancillary products it should recommend to existing customers. Today, SoFi’s data science team mines large datasets and runs various regression techniques to tease out the relationship, if any, between attributes in those datasets and creditworthiness.
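
To make the idea of regressing credit outcomes on an alternative attribute concrete, here is a minimal sketch. It is not SoFi’s model: the single feature (share of utility bills paid on time), the eight data points, and the function names are all hypothetical, and a one-variable logistic regression fitted by plain gradient descent stands in for the "various regression techniques" mentioned above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit P(default) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for each borrower under the current parameters.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# Hypothetical data (assumption, not from any lender):
# share of utility bills paid on time, and whether the borrower later defaulted (1) or not (0).
on_time_rate = [0.95, 0.90, 0.85, 0.80, 0.75, 0.60, 0.50, 0.40]
defaulted = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(on_time_rate, defaulted)

# Estimated default risk for two hypothetical applicants.
risk_low_rate = sigmoid(w * 0.45 + b)
risk_high_rate = sigmoid(w * 0.95 + b)
print(f"weight on on-time rate: {w:.2f}")
print(f"estimated risk at 45% on time: {risk_low_rate:.2f}, at 95% on time: {risk_high_rate:.2f}")
```

A negative fitted weight would indicate that, in this toy dataset, a better utility payment record goes with lower estimated default risk. A real model would use many attributes, far more data, and a held-out sample to guard against overfitting.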

Credit history datasets are unwieldy, comprising millions of rows and thousands of columns merged into a single file[vi]. Storing, cleaning, and analyzing that data can take considerable time, so much of SoFi’s current action plan is aimed at increasing computational capacity and reducing the throughput time of its analysis. Yan Wu, SoFi’s Head of Analytics, recently said, “the most important thing that cloud computing has done is make incredibly high-powered machines available for testing. What happens is cycle times become shorter and iterations become quicker”[vii]. With faster iterations, SoFi can more readily learn if a new data source improves its ability to predict credit risk.
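
One common way to test whether a new data source improves prediction is to score the same held-out borrowers with and without it and compare a ranking metric such as AUC. The sketch below assumes two sets of risk scores already exist. The scores, the outcomes, and the names `baseline_scores` and `augmented_scores` are hypothetical illustrations, not SoFi data or SoFi’s evaluation method.

```python
from itertools import product

def auc(scores, defaulted):
    """Chance that a random defaulter is scored riskier than a random repayer (ties count half)."""
    pos = [s for s, d in zip(scores, defaulted) if d]
    neg = [s for s, d in zip(scores, defaulted) if not d]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one repayer")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Hypothetical held-out borrowers: 1 = defaulted, 0 = repaid.
defaulted = [1, 0, 0, 1, 0, 0, 1, 0]
# Hypothetical risk scores from a model using traditional data only.
baseline_scores = [0.60, 0.55, 0.30, 0.40, 0.45, 0.20, 0.70, 0.50]
# Hypothetical risk scores from the same model after adding a new data source.
augmented_scores = [0.65, 0.35, 0.25, 0.42, 0.40, 0.15, 0.80, 0.45]

baseline_auc = auc(baseline_scores, defaulted)
augmented_auc = auc(augmented_scores, defaulted)
print(f"baseline AUC: {baseline_auc:.3f}, with new data source: {augmented_auc:.3f}")
```

Each such comparison requires refitting and rescoring the model, which is why shorter cycle times translate directly into more data sources tested.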

Once SoFi has customers in the funnel, it seeks to cross-sell new financial products and to increase its share of wallet. It is currently releasing a financial planning app, which has tens of thousands of customers on a waiting list[viii]. Through this app, SoFi provides insights to customers on how others in a similar age and income bracket are using their money. It plans to use machine learning to identify financial products that individual customers are likely to benefit from and is developing a recommendation algorithm to market those services to customers.

SoFi’s current strategy is based on collecting data later in a customer’s lifecycle, specifically after graduation. It could benefit by: 1) expanding its customer funnel and lending to students while they are in school, and 2) publishing data on expected outcomes for students based on major, school, and a variety of other factors.

If SoFi enters the direct student lending market, it could leverage its rich refinancing data to price private loans better than existing players in a highly fragmented market. It could look at its existing dataset to identify the least risky graduate programs, based on repayment rates on refinanced loans from those programs.
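
As a rough sketch of that lookup, the snippet below ranks graduate programs by the share of refinanced loans repaid on schedule. The records, the program labels, and the `min_loans` cutoff are hypothetical assumptions chosen for illustration; nothing here reflects SoFi’s actual data.

```python
from collections import defaultdict

# Hypothetical refinanced-loan records: (graduate program, repaid on schedule?)
loans = [
    ("MBA", True), ("MBA", True), ("MBA", False), ("MBA", True),
    ("JD", True), ("JD", False), ("JD", False),
    ("MD", True), ("MD", True), ("MD", True),
]

def repayment_rates(records, min_loans=3):
    """Return (program, repayment rate) pairs, highest rate first.

    Programs with fewer than min_loans records are skipped, since a rate
    computed from a handful of loans is too noisy to price on.
    """
    counts = defaultdict(lambda: [0, 0])  # program -> [repaid, total]
    for program, repaid in records:
        counts[program][0] += int(repaid)
        counts[program][1] += 1
    rates = {p: r / t for p, (r, t) in counts.items() if t >= min_loans}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

ranking = repayment_rates(loans)
for program, rate in ranking:
    print(f"{program}: {rate:.0%} repaid on schedule")
```

Borrowers who refinance are not a random sample of all students in a program, so rates like these would likely need adjustment before being used to price loans for students still in school.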

SoFi should also consider providing a public service that could build its brand awareness among students before they even choose which school to attend or which major to study. Most schools publish high-level data on median starting salary, employment percentage, etc. SoFi could begin publishing much more granular data and let prospective students enter basic assumptions such as school, intended major, graduate program (if applicable), and desired city/state to see expected financial outcomes based on historical data.

The use of machine learning presents the opportunity to democratize risk and increase access to financing for millions of people who may otherwise be denied it. A broad array of lenders, not just SoFi, are using alternative data sources to assess risk. As these techniques become more prevalent, we must be mindful that historical data is subject to historical biases. Marginalized groups who may have been systemically denied access to capital in the past may have incomplete data to draw from. As we add new data sources, we should always ask whether these new metrics are biased against certain portions of the population.
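
One simple first check on that question, sketched here under stated assumptions, is to compare approval rates across groups after a new metric is introduced. The group labels, the decisions, and the 0.8 review threshold below are hypothetical; a real fairness review would be considerably more involved and would follow applicable lending regulations.

```python
def approval_rate(decisions):
    """Share of applications approved (1 = approved, 0 = denied)."""
    return sum(decisions) / len(decisions)

def approval_ratios(decisions_by_group, reference_group):
    """Each group's approval rate divided by the reference group's rate."""
    reference = approval_rate(decisions_by_group[reference_group])
    return {g: approval_rate(d) / reference for g, d in decisions_by_group.items()}

# Hypothetical model decisions after adding a new data source.
decisions_by_group = {
    "group_a": [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],  # 8 of 10 approved
    "group_b": [1, 0, 0, 1, 0, 1, 0, 1, 0, 1],  # 5 of 10 approved
}

REVIEW_THRESHOLD = 0.8  # assumed cutoff for illustration only

ratios = approval_ratios(decisions_by_group, "group_a")
flagged = [g for g, r in ratios.items() if r < REVIEW_THRESHOLD]
print(ratios)
print("groups to review:", flagged)
```

A low ratio does not by itself prove that a metric is unfair, but it signals that the new data source deserves a closer look before it is used in lending decisions.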



End Notes

[i] New York Federal Reserve, Student Loan Data and Demographics [Excel Download], 2017, URL:, Accessed Nov 2018

[ii] LeverEdge, Student Loan Negotiation Group, 2018, URL:, Accessed Nov 2018

[iii] Consumer Financial Protection Bureau, “Data Point: Credit Invisibles”, May 2015, URL:, Accessed Nov 2018

[iv] Peter Rudegeair, “Silicon Valley: We Don’t Trust FICO Scores”, The Wall Street Journal, Jan 11, 2016, URL:, Accessed Nov 2018

[v] Julapa Jagtiani, “The Roles of Alternative Data and Machine Learning in Fintech Lending: Evidence From the LendingClub Platform”, Philadelphia Federal Reserve, 2018, Pg. 3, URL:, Accessed Nov 2018

[vi] Thomson Reuters, “SoFi’s data science head: Opening the funnel to non-traditional borrowers with machine learning”, October 17, 2018, Pg. 2, URL:, Accessed Nov 2018

[vii] Thomson Reuters, “SoFi’s data science head: Opening the funnel to non-traditional borrowers with machine learning”, October 17, 2018, Pg. 3, URL:, Accessed Nov 2018

[viii] Ainsley Harris, “Are you ready to ditch your bank? SoFi is betting its future on it”, Fast Company, June 19, 2018, URL:, Accessed Nov 2018



Student comments on Giving Credit Where It’s Due: Machine Learning’s Role in Lending

  1. This is a really interesting article.

    SoFi’s lending model was a big component of our thesis to invest in the company in 2016. Traditionally, lending firms underwrite based on very similar, outdated models (essentially only weighing FICO, salary, and net worth), which, as you mentioned, leaves a demographic underbanked. It also misprices loans for individuals with high-level similarities but very different backgrounds and credit risk profiles (for example, a recent HBS MBA graduate whose income and net worth will scale vs. someone who has been working a stable job). However, it will take time to develop a more sophisticated, more accurate credit model, since closing the feedback loop (% defaults, prepayment, etc., based on the independent variables tested) takes longer with loans that run for several years, or even decades for mortgages. Experiments are also potentially quite costly. When SoFi tried to eliminate FICO altogether from its personal loans credit models last year, the cohort data showed disproportionate losses. It is inevitable that the future of lending is going towards an automated, multi-factor model; it’s just a question of how long (and costly) it will be for us to get there.

    Your suggestions about direct lending and public service data are really interesting and also timely given that rising interest rates are making the core business of student loan refinance less profitable – will pass along.

  2. Hey Abkarians – this is a super interesting article and very relevant to what you’re doing!

    I have two topics that are interesting to explore. First, I’m curious how you think SoFi could expand to more severely underserved populations. My impression is that they target the HENRY segment (high-earning, not rich yet). An interesting dimension here is the risk that measuring based on traditional educational attainment may exclude a massive and growing part of the market as the education space undergoes its own rapid change (how would SoFi look at a high-school dropout who is finishing a coding bootcamp or other nontraditional credential?).

    I’m also curious about the prospect of them expanding down to first-time student loans. It seems that the majority of their refi models are based on later-stage consumer features (educational attainment, utility payments, insurance claims, income, repayment behavior on the original student loan), so I’m curious how they would actually go about expanding into the market that you’re targeting!

  3. I appreciated your insights on the potential of the SoFi model to make capital accessible to communities and individuals who have been historically denied access to credit and loans. I totally agree with your note at the end that SoFi and its competitors should ensure that the data inputs are free from bias. I see how this technology would work for young people pursuing higher education who are on a trajectory to build wealth, but I wonder how (if at all) it could be applied to consumers who don’t have that same potential for mobility? Can machines take a chance on individuals with “risky” or “unattractive” inputs in the same way staff at community banks or nonprofit organizations can, for example? It seems that the promise of increasing access to capital potentially still requires a human element.

    Also, nice use of “throughput time”!!
