This Is How An Aspiring Data Scientist Buys A Second Hand Car

Author : Ayruz

I’ve been wanting to buy a Honda City ever since I saw a modified one owned by a friend. So I started looking for a good deal and, as anyone would, went over to Google and started searching. Naturally, I ended up on those second-hand car listing sites.

Those sites are like a jungle: too many listings, too many models, and a wide range of prices. How do I decide which car to buy and what price to pay for it? Let's put to use some of the things we learned in school and college.

Step 1 – Get the raw data

I got a techie friend to write a script to scrape the Honda City listings from those sites. The script output an Excel file with the following fields:

  1. Year of manufacturing
  2. Odometer reading
  3. Price

Since it was a Honda City, I did not need to consider fuel type: only petrol was available until recently.

Step 2 – Cleaning up the data

Some entries have wrong information and some are incomplete. For example, there are 2002 models listed for 10 lakhs; such entries are not genuine. So I sorted the data, found all such entries and deleted them. That left me with an Excel sheet of around 1000 listings.
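
The same cleanup can be done in code instead of sorting the sheet by hand. Here is a minimal sketch, assuming each listing has been loaded as a dictionary. The field names (year, km, price), the sample rows and the cutoff values are all made up for illustration; pick thresholds that suit your own data.

```python
# Hypothetical sample rows; real rows would come from the scraped Excel file.
listings = [
    {"year": 2008, "km": 62000, "price": 450000},
    {"year": 2002, "km": 98000, "price": 1000000},  # 2002 model listed for 10 lakhs
    {"year": 2010, "km": None, "price": 600000},    # incomplete entry
    {"year": 2005, "km": 85000, "price": 275000},
]


def is_complete(row):
    """True when all three fields are present."""
    return all(row.get(field) is not None for field in ("year", "km", "price"))


def is_plausible(row, old_model_year=2005, old_model_ceiling=500000):
    """Reject old models with an unrealistically high asking price.

    Both cutoffs are assumptions chosen for this example.
    """
    if row["year"] <= old_model_year and row["price"] > old_model_ceiling:
        return False
    return row["km"] > 0 and row["price"] > 0


clean = [row for row in listings if is_complete(row) and is_plausible(row)]
print(len(clean), "of", len(listings), "listings kept")
```

Checking completeness first matters, because the plausibility check would fail on a row with a missing value.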

Step 3 – Visualizing and analyzing

Since it is very hard to generalize from the information in 1000 listings, I started visualizing it. The simplest visualization is a scatter plot. Let's plot year of manufacturing on the X-axis and listed price on the Y-axis.

The price of the car drops sharply during its initial years. Going back from the newest models, the sharp decline in value continues down to the 2007-2008 models and then starts to level off. New cars lose about half of their value in the first 4-5 years of ownership.
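
As a back-of-the-envelope check, "half the value in 4-5 years" can be turned into a yearly rate. This sketch assumes the car loses the same percentage every year, which is a simplification and not something the scatter plot proves.

```python
def annual_depreciation(fraction_left, years):
    """Constant yearly loss that leaves `fraction_left` of the value after `years`."""
    return 1 - fraction_left ** (1 / years)


for years in (4, 5):
    rate = annual_depreciation(0.5, years)
    print(f"Half the value gone in {years} years -> about {rate:.0%} lost per year")
```

Under that assumption the loss works out to roughly 13% to 16% of the car's remaining value each year.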

Let us look at how the price varies with the odometer reading of the car. For this, I plotted the graph and ran a regression to get the best-fit curve.

Here too, the price of the car drops steeply at first. It starts to level off at around 75,000 km and stays flat up to very high mileage.
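
For anyone who wants to try a best-fit curve without a spreadsheet, here is one possible sketch. It assumes the price decays exponentially with mileage, price = a * exp(-b * km), and fits that as a straight line through log(price). The curve type is an assumption made for this example, and the data points below are invented just to show the call.

```python
import math


def fit_exponential(kms, prices):
    """Least-squares fit of price = a * exp(-b * km) via a line through log(price)."""
    n = len(kms)
    logs = [math.log(p) for p in prices]
    mean_x = sum(kms) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in kms)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(kms, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


# Made-up (km, price in rupees) points, not taken from the real listings.
kms = [10000, 30000, 50000, 75000, 100000, 130000]
prices = [800000, 640000, 520000, 420000, 360000, 330000]

a, b = fit_exponential(kms, prices)


def predict(km):
    """Fitted price for a given odometer reading."""
    return a * math.exp(-b * km)


print(round(predict(60000)))
```

A listing priced well above the fitted value for its mileage is worth a second look before you call the seller.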

Step 4 – Drawing Inferences

Since the price of the car depreciates sharply during its initial years, it is better to find a car that is around 5 years old. In the case of the Honda City, the best deal would be a 2007 or 2008 model that has not done more than 75,000 km. You can drive that car for another 5 years and 75,000 km without it losing a lot of its value.
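
That rule of thumb is easy to apply to the cleaned sheet. A minimal sketch follows; the rows and field names are hypothetical, and only the 2007-2008 and 75,000 km limits come from the inference above.

```python
# Hypothetical cleaned listings; field names are illustrative.
cleaned = [
    {"year": 2007, "km": 70000, "price": 380000},
    {"year": 2008, "km": 52000, "price": 430000},
    {"year": 2008, "km": 91000, "price": 360000},  # too many km
    {"year": 2011, "km": 30000, "price": 650000},  # still in the steep part of the curve
]


def shortlist(rows, years=(2007, 2008), max_km=75000):
    """Keep listings in the target model years and mileage, cheapest first."""
    picks = [r for r in rows if r["year"] in years and r["km"] <= max_km]
    return sorted(picks, key=lambda r: r["price"])


for car in shortlist(cleaned):
    print(car["year"], car["km"], car["price"])
```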

Disclaimer: The inferences are purely based on data; I am not responsible if you use this technique and end up in a mess. Always get the help of a good mechanic and inspect the car. These are mechanical devices and are susceptible to a lot of issues.


This was simple, but if you are sitting on tons of data and want to analyze it with all the latest tools and visualization techniques, contact my friend, who is a proper data scientist. He can do magic with your data.
