Love in a Hopeless Place: Machine Learning at OkCupid

How one dating company can warn off the creeps and (maybe) help you find the love of your life.

In the wedding section of the New York Times, the announcements follow a fairly standard formula: wedding date, info on work and family, and lastly, usually a line or two dedicated to how the couple met [1]. While traditional meet-cutes make for more entertaining reads, more often than not the couple met through an online dating service. And this is hardly a trend confined to the glitzy young millennials whose beaming portraits are featured in the NYT. Currently, over 20% of heterosexual relationships and 70% of same-sex relationships in the US start on the internet, with this avenue quickly becoming the predominant method for meeting a significant other [2]. Online dating companies hoping to capture a share of this growing market must therefore have a competitive edge.

The Online Dating Ecosystem

In this $4B industry, a few key players dominate the market. Match Group, the owner of OkCupid (as well as Match, Tinder, and 45 other dating businesses), accounts for roughly one third of the total market [3][4]. Although OkCupid is part of the larger Match Group and its holding company IAC, it still needs to differentiate itself in order to remain relevant in a very crowded marketplace. OkCupid’s longer profiles and Q&As have allowed the company to position itself as the go-to place for users who are looking for more than a hookup, but who may not be ready to get married right away. Newer entrants such as Hinge, however, have begun incorporating elements of OkCupid’s model. What OkCupid has that isn’t easily replicable is its vast trove of customer data, courtesy of the hundreds of questions its users have answered. It has historically used this data to determine compatibility between matches, but with machine learning, OkCupid has the potential to leverage this and other sources of user information to provide even more value.

Data at OkCupid

With a reputation for being a highly data-driven company, OkCupid has relied on its machine learning algorithms to connect people. A higher match percentage means that a couple will have a higher likelihood of clicking, with different weights given to different questions [5]. But beyond simply producing matches, OkCupid also incorporates machine learning as a community improvement tool. Its support & moderation team monitors machine learning alerts that detect harmful or abusive language. With the aid of technology, OkCupid can respond quickly to instances of harassment while bringing in human moderators on an as-needed basis [6].
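The idea of a match percentage with different weights for different questions can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the question names, the weights, and the scoring rule are hypothetical and are not OkCupid's actual formula.

```python
def match_percentage(answers_a, answers_b, weights):
    """Weighted share of commonly answered questions on which two users agree.

    Hypothetical scoring rule for illustration only, not OkCupid's formula.
    """
    # Only questions that both users answered can be compared.
    shared = [q for q in weights if q in answers_a and q in answers_b]
    total = sum(weights[q] for q in shared)
    if total == 0:
        return 0.0
    agreed = sum(weights[q] for q in shared if answers_a[q] == answers_b[q])
    return 100.0 * agreed / total


# Hypothetical questions; a larger weight means the question matters more.
weights = {"wants_kids": 5.0, "smokes": 3.0, "likes_horror_movies": 1.0}
alice = {"wants_kids": "yes", "smokes": "no", "likes_horror_movies": "yes"}
bob = {"wants_kids": "yes", "smokes": "no", "likes_horror_movies": "no"}

print(round(match_percentage(alice, bob, weights), 1))
```

In this toy case the two users disagree only on the lowest-weighted question, so the score stays high; disagreeing on a heavily weighted question would pull it down much further.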

Currently, OkCupid has not defined a longer-term strategy to incorporate machine learning more deeply into its matching process. The team at OkCupid maintains that there will always be an element of chemistry that cannot be replicated online, and so their main goal is to connect people who already have a lot in common [7]. However, with the advances in machine learning, an argument can be made for more sophisticated algorithms involving even more data points that can maybe tell us what we do not even know about ourselves.

From OkCupid to AmazingCupid

To determine what OkCupid could do in the context of machine learning, we must first understand the pitfalls that current dating apps face. Jeremy Arnold, co-founder of the now defunct dating startup Launch Social, illustrates the struggles that many singles have encountered in the following graphic [8]:

Unsurprisingly, most dating apps fail to eliminate these pain points because users lie, whether intentionally or inadvertently. By relying solely on the answers to its questions, OkCupid assumes the user knows who they are and what they want. But personal biases and societal pressures can often lead people to answer questions unreliably. One way to account for this is to link the data that OkCupid already has on a user to data from other social media sources to form a more holistic picture. OkCupid may never purposely call people out for discrepancies between their dating profile and their tweets, but if it can learn which source to weigh more heavily, it can help to determine which is more likely to be true.

OkCupid’s distinguishing feature has been that it does not focus only on looks, but stated preferences about appearance are another area where machine learning can help, in a way that still aligns with the company’s values. Say that a user says she likes partners of a certain height, but routinely messages people who are shorter. The algorithm can learn that height is not actually as much of a deal breaker for her as she originally thought, and start shifting its recommendations without notice.
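One simple way to picture this kind of adjustment is sketched below in Python. It is only a hypothetical heuristic, not a description of any real OkCupid system: the function name, the parameters, and the rule of scaling a preference's weight by how often the user's own messages contradict it are all assumptions made for illustration.

```python
def adjusted_height_weight(stated_min_cm, messaged_heights_cm,
                           stated_weight=1.0, min_messages=5):
    """Down-weight a stated minimum-height preference using messaging behavior.

    Hypothetical heuristic: the more often the user messages people below
    her stated minimum, the less the stated preference counts.
    """
    # With too little behavioral evidence, trust the stated preference.
    if len(messaged_heights_cm) < min_messages:
        return stated_weight
    below = sum(1 for h in messaged_heights_cm if h < stated_min_cm)
    fraction_below = below / len(messaged_heights_cm)
    return stated_weight * (1.0 - fraction_below)


# A user states a 180 cm minimum but mostly messages shorter people.
weight = adjusted_height_weight(180, [170, 175, 182, 168, 185, 172])
print(round(weight, 2))
```

The `min_messages` guard reflects a design choice: a handful of messages is weak evidence, so the sketch leaves the stated preference untouched until there is enough behavior to compare against.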

But even as our online presence expands and more data becomes available, could machine learning algorithms ever progress to the point that we would trust their results? How far would we go to never go on a bad date again?




[1] “Binge Read Featured Couples.” 2018. Accessed Nov. 2018.

[2] “The Irresistible Rise of Online Dating.” Aug. 17, 2018. Accessed Nov. 2018.

[3] Mangalindan, JP. “How Match got away with buying 25 dating sites — and counting.” June 25, 2018. Accessed Nov. 2018.

[4] “Match Group.” Accessed Nov. 2018.

[5] “OkCupid Is Teaming Up With ACLU To Connect Like-minded Daters Looking For Love (and Justice).” June 6, 2018. Accessed Nov. 2018.

[6] Paul, Kari. “Why OkCupid wants to slow things down.” Aug. 29, 2017. Accessed Nov. 2018.

[7] Ibid.

[8] Arnold, Jeremy. “How could machine learning best be applied to online dating apps?” Dec. 29, 2017. Accessed Nov. 2018.




Student comments on Love in a Hopeless Place: Machine Learning at OkCupid

  1. Great read!

    OKCupid co-founder Christian Rudder released a book called “Dataclysm” in which he reveals trends in user data. Two points I found interesting were on race relations and attractiveness. In his analysis, Rudder found a bias against black users on dating sites. The measures that OKCupid treats as success, including how people rate black users, how often people reply to their messages, and how many messages they get, are all lower than for other races. Additionally, women are generally attracted to men of roughly their own age, whereas men tend to rate 20-year-old women the highest on the site [1].

    Though both of these trends measure people’s opinions and actions on the website and do not reflect their behavior in reality, machine learning algorithms are being trained from these actions and delivering potential matches with this input data. Are we biasing our algorithms and perpetuating the trends above?

    [1] “Online Dating Stats Reveal A ‘Dataclysm’ Of Telling Trends.” NPR, 6 Sept. 2014.

  2. Interesting article. With the mention of excessive data collection, especially in the avenue of online dating, the major concern is data protection. Recall a couple of years ago when Ashley Madison, a website that connects married individuals with others looking for a “fling,” was hacked. The hacked information was made public and led to a huge scandal that affected multiple homes.
    As OkCupid plans to collect even more data, it should ensure the data is safeguarded as it contains highly personal data.

  3. Fascinating read! Did not know the internet had become so prevalent in creating relationships! I think what is extremely interesting is the notion you mentioned of OkCupid’s algorithms being able to identify things that we like that we did not even know ourselves. I see that as a powerful competitive advantage over traditional dating, as our potential matches are less likely to be limited by our perceptions and biases, which creates a wider range of possibilities and potential matches. However, as you alluded to, the effectiveness of this model is predicated on people’s willingness to provide both sufficient and accurate data.

  4. Interesting read. One catch with the algorithm screening for people’s revealed preferences (i.e. messaging people who self-report to be shorter than the user’s stated preference) is that we also don’t know whether the other guy is lying about their height. This calls into question the whole notion of whether self-reporting data is useful, or if people should just post photos of themselves instead (which is what Tinder and the like tend to do).

    The notion that our data footprint can tell us more about who we are than we know about ourselves is extremely interesting. Leveraging that for online dating puts us in Black Mirror territory, which is a little scary but definitely not too far off in the future.

  5. Fascinating read! I had heard that OkCupid used data but had never realized the extent. However, I wonder if they are fully using their data capabilities at this point. How are they following up with couples that meet? How do they learn from good dates and bad dates, and how do they know when matches have actually worked? To me, that is the crux of their machine learning capabilities. I do have some reservations though. For example, there are likely important social implications of matching people who are similar together. By which criteria do they measure similarity (interests, geography, race, socioeconomic status)? Finally, how do they maintain their competitive edge with the network effects they have created in such a growing, fragmented market with many new entrants?

  6. Cool article! In the post, you mention that OK Cupid could use known discrepancies from other platforms to inform how the algorithm would work, but not to “call out” people for their lies. Why wouldn’t you want to use data to call these people out? In my opinion, a big problem in online dating is false statements on profiles. If OK Cupid is able to guarantee a more honest dating app experience, it may have a competitive edge in the crowded market.

  7. What a fun read! Love the idea of using ML to prevent people with nefarious intent from causing harm on social media platforms. It made me think of other applications: identifying bullying, individuals with mental health issues, possible intent to cause harm to oneself or others. To what extent should social media companies be responsible for preventing the aforementioned issues that occur on their sites? Is it their ethical duty to do so? Just some questions that came to mind. Thank you for your thoughtfulness.

  8. Thanks for posting. I think OK Cupid is an excellent application of machine learning. Your point about unintentional misinformation is especially good; if we don’t know ourselves well, we might enter inaccurate preferences, but with enough data, a machine learning algorithm should be able to figure out which stated preferences tend to be reliable. It’s also worth noting that the downside here is relatively limited, which makes dating a good application for a technology still in its early phase. I care less about a bad date than I do about a misdiagnosed disease.

  9. Awesome article on a very interesting topic! This leads me to think about how much information you can gather about a person using social media, and how much of it is actually true. We tend to show our best face on social media, and may be very different in reality. I would be interested to see how accurate the more holistic image built from social media is compared with what the person truly thinks of himself or herself (i.e., what he/she answered in the OK Cupid questions), and how the application weighs what is true and what is not. It is undoubtedly very useful for the purpose of identifying fake users/creeps and an invaluable benefit to the users – and to society as a whole.

  10. Thanks for sharing, MM! Very interesting to know that roughly 1/3 of the dating market is actually controlled by the same company and that they are essentially creating or, for lack of a better word, acquiring other apps to buy market share. A few thoughts top of mind reading your post… Firstly, I believe that most of the apps are already verifying information against other social media sites as a hurdle to join or match users. These apps include Bumble and The League. Secondly, I wonder about the feasibility of this given privacy laws; for example, unless individuals are required to link their other social media platforms (which might reduce the number of users, and we all know that would hurt the app’s network effect!), it might not be possible to access the data.

  11. Thanks MM for the article! There was an interesting Economist article a few months back highlighting the value coming from increased efficiency in the marriage/dating market. Its main point was that the more compatible matches one has, the better the odds of ending up with the best person for them in the long run [1]. Because these dating sites are in a market of sorts, there is feedback from their customers holding them accountable. If they don’t like the matches, they leave. If they do like the matches, they stay. Using data to improve the odds that high quality matches are made seems to increase the odds that any individual finds meaningful romantic relationships. Even if a site knows the preferences of its users better than the users admit to themselves because of its ability to analyze user behavior, it seems as though everyone is better off.

    [1] “Modern Love,” The Economist, August 18, 2018, accessed Nov. 2018.

  12. Very interesting article! I actually believe that the use of machine learning in such dating apps could be very useful. Given the massive number of scams and fake users on dating apps, I strongly believe that machine learning’s biggest advantage in this context would be to filter out scams and bots/fake users. That in itself is a great competitive advantage and would, in my opinion, improve dating apps drastically and increase customer adoption. However, I am not a proponent of using machine learning to increase the efficiency of matches and to try to help people “find the love of their life”. My biggest concern is data privacy and how that data would be used. We have seen how sensitive the issue of data privacy can be and how ugly it can get, given Facebook’s recent data privacy scandal. My second issue is that while I believe that machine learning is a great tool for using data and iterating to come up with new models and algorithms to refine the results, a machine, in my opinion, will never be able to process feelings and emotions. Dating apps are mostly based on these “softer aspects” that in many cases cannot be rationalized and iterated on to find better results. People don’t usually know what they are looking for in a partner or what attracts them in a person, and in my opinion, no matter how good a machine learning model is, it wouldn’t be able to capture those “random”, non-rational, varying factors.

  13. Thanks for sharing! How do you think the business model for OK Cupid (and Match Group overall) relates to their use of machine learning and algorithms? While these websites/apps ultimately want to match successful couples so they can have positive testimonials etc., aren’t they also incentivized to keep paying customers around for a while? If they’re able to improve their algorithm so significantly that they can provide customers a strong match very soon after joining, they’ve only hurt their own business prospects. Are they incentivized to create some “noise” and imperfection in their algorithms?
