How to Avoid Living in a Glorified Cardboard Box in NYC

The short answer: Data, context, and gainful employment… but read on for more specifics, including which neighborhood I currently believe is the best value overall.

After reading an article about how rents hit an all-time high in July 2017, I decided it was not too early to get a jump on my impending move in 2018. Inspired to get a feel for the NYC rental market on my own, I set off to see if there was any way of systematically identifying high-value apartments besides spending hours browsing the web.

Naturally (as any normal person does), I decided to start off by web scraping as many of the current rental listings from StreetEasy as I could for the neighborhoods I was most interested in (I may add more later). Honestly, this was the hardest part of the whole process, since StreetEasy is quite obviously worried about competitors scraping their data and using it for their own gain. However, I was able to get it working using Python, Selenium, and the Google Chrome driver. If you are interested, I have posted the code on my GitHub page. (I do plan on expanding my search and enriching the data more in the future.)

When it was all said and done, I had detailed information for ~1,200 one-bedroom apartment listings from 14 different neighborhoods spread across Manhattan and Brooklyn. Rather than only dealing with neighborhood summary statistics, I was able to get a real sense of the price, size, and amenity variations within each area. The best way to start digging into this data was to create a Tableau Dashboard (which I have posted on Tableau Public).

What I learned

There is a big difference between ‘Williamsburg’ and ‘East Williamsburg’

Once the hipster capital of the world (no longer the case since they have a Dunkin Donuts), Williamsburg has become one of the most desirable places to live in Brooklyn for all different types of people. I was surprised to see that its median rent of $3,100 made it only the 9th most expensive of the 14 neighborhoods I analyzed, but that number is slightly misleading. When you separate out East Williamsburg, the median rental price in Williamsburg goes up $200 (making it more expensive than SoHo). However, the silver lining is that if you still want to tell people you live in Williamsburg but are on a budget, the median price in East Williamsburg is $2,450 (that is cheaper than any of the other neighborhoods I analyzed). In fact, the maximum amount you should expect to pay for a 1BR in East Williamsburg is $3,300… the same amount as the median price in Williamsburg “proper”.
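For anyone who wants to try the same split at home, here is a minimal sketch of the grouping. The listings are invented for illustration (they are not my scraped data), and the `MERGE` mapping and `median_rent_by` helper are hypothetical names, not anything from my actual code.

```python
from collections import defaultdict
from statistics import median

# Invented sample listings (neighborhood, monthly rent in USD); not the scraped data.
listings = [
    ("Williamsburg", 3300),
    ("Williamsburg", 3500),
    ("Williamsburg", 3100),
    ("East Williamsburg", 2450),
    ("East Williamsburg", 2300),
    ("East Williamsburg", 2600),
]

# Hypothetical mapping used to lump a sub-neighborhood into its parent.
MERGE = {"East Williamsburg": "Williamsburg"}


def median_rent_by(listings, key=lambda hood: hood):
    """Group rents by key(neighborhood) and return the median of each group."""
    groups = defaultdict(list)
    for hood, rent in listings:
        groups[key(hood)].append(rent)
    return {name: median(rents) for name, rents in groups.items()}


# Lumping East Williamsburg into Williamsburg drags the combined median down...
combined = median_rent_by(listings, key=lambda hood: MERGE.get(hood, hood))
# ...while keeping them separate shows two fairly different markets.
separate = median_rent_by(listings)
```

With this toy data the combined median lands between the two separate medians, which is the same effect that made the blended Williamsburg number look cheaper than it really is.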

Be prepared to pay for that doorman

I primarily focused on 6 amenities that I personally may want in my next place: a balcony, doorman, elevator, washer/dryer in-unit, dishwasher, and rooftop access. (I purposefully left out zipline, an amenity I have not had since middle school.)

In general, listings with a doorman are about 32% more expensive than those without ($3,771 vs. $2,850). On the other end of the spectrum, a balcony generally meant the listing would be only 7% more expensive. (Note: There are confounding variables at play here, since these amenities rarely come alone, but it at least provides a ballpark estimate of their value.)
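A rough sketch of how a premium like this can be computed is below. I am assuming a comparison of median rents with and without the amenity; the listings are invented, and `amenity_premium` is a hypothetical helper name. It does nothing to untangle the confounding mentioned above.

```python
from statistics import median

# Invented sample listings: monthly rent plus the amenities each one advertises.
listings = [
    {"rent": 3600, "amenities": {"doorman", "elevator"}},
    {"rent": 4000, "amenities": {"doorman", "balcony"}},
    {"rent": 2800, "amenities": {"dishwasher"}},
    {"rent": 3200, "amenities": {"balcony"}},
    {"rent": 3000, "amenities": set()},
]


def amenity_premium(listings, amenity):
    """Percent by which the median rent with `amenity` exceeds the median without it."""
    with_it = [item["rent"] for item in listings if amenity in item["amenities"]]
    without = [item["rent"] for item in listings if amenity not in item["amenities"]]
    # The base of the percentage is the median rent WITHOUT the amenity.
    return (median(with_it) - median(without)) / median(without) * 100


doorman_premium = amenity_premium(listings, "doorman")
balcony_premium = amenity_premium(listings, "balcony")
```

Note that the base matters: dividing the difference by the with-amenity median instead answers a different question (how much cheaper it is to go without).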

In addition, Tribeca had the highest percentage of listings with a doorman (88%), while SoHo had the lowest (0%), which was interesting given their close proximity.

If you are looking to find an apartment with a balcony, I suggest you start your search in Vinegar Hill, where 40% of the apartments have one.

Square footage is consistently and conspicuously absent

In an effort to maximize value, the logical starting point was price per square foot. However, only 36% of listings show the square footage, making it difficult to trust any conclusions drawn from this metric alone.

Interestingly, the most expensive neighborhoods are the ones that report square footage the least often. West Village and Tribeca reported square footage on 14% and 28% of their listings respectively. Apparently, in these neighborhoods they want people measuring value by the experience and culture of the area (priceless) rather than how much space your money buys…

For those of you who are curious, the square footage information that I was able to scrape suggests that your money goes roughly 1.6 times as far per square foot in DUMBO as it does in the West Village ($4.60 vs. $7.28), and DUMBO is the most expensive neighborhood in Brooklyn. Additionally, of the neighborhoods I sampled, your money goes the absolute farthest in Carroll Gardens ($3.05).
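Since the missing square footage is the whole problem here, a sketch of the calculation should report coverage right next to the price. This one assumes monthly rent divided by square feet and a median per neighborhood; the listings are invented and `price_per_sqft_summary` is a hypothetical name.

```python
from statistics import median

# Invented sample listings; sqft is None when the listing omits square footage.
listings = [
    {"neighborhood": "DUMBO", "rent": 4500, "sqft": 900},
    {"neighborhood": "DUMBO", "rent": 4200, "sqft": None},
    {"neighborhood": "West Village", "rent": 3600, "sqft": 450},
    {"neighborhood": "West Village", "rent": 3900, "sqft": None},
    {"neighborhood": "West Village", "rent": 4100, "sqft": None},
]


def price_per_sqft_summary(listings):
    """Per neighborhood: share of listings reporting sqft and median rent per sqft."""
    by_hood = {}
    for item in listings:
        by_hood.setdefault(item["neighborhood"], []).append(item)

    summary = {}
    for hood, items in by_hood.items():
        # Only listings that actually report square footage can contribute a ratio.
        ratios = [i["rent"] / i["sqft"] for i in items if i["sqft"]]
        summary[hood] = {
            "coverage": len(ratios) / len(items),
            "median_ppsf": median(ratios) if ratios else None,
        }
    return summary


summary = price_per_sqft_summary(listings)
```

Keeping `coverage` in the output makes it obvious when a neighborhood's price per square foot rests on only a handful of listings.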

So where is the best value?

WARNING: the conclusion below is definitely debatable, and I am hoping to come up with a better way to quantify the value of amenities to make this conclusion less qualitative…

Brooklyn. More specifically, DUMBO/Vinegar Hill. Based upon the data I have seen thus far, you get the most amenities and space for the money. Feel free to play with the dashboard yourself and see if you come to a different conclusion.

One of the best ways to home in on the Price vs. Size tradeoff is with the chart seen below. The bluer a circle is, the less it costs per square foot, and the bigger the circle, the more amenities it has. You can even select a specific apartment and link to the StreetEasy listing!

Wishlist for other metadata (future state)

How good is the view? Beyond text analysis for keywords that would imply there is some sort of view, this gets fairly complicated.

How much counter space/closet space/cupboard space is there? This would likely require that there is at least some sort of floor plan available (which is not always the case).

How far is the nearest grocery store (or other important ? This would involve pulling some data from the Google Maps API.
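The first wishlist item is the easiest to start on. A bare-bones version of the keyword idea might look like the sketch below; the keyword list and the `mentions_view` helper are my own guesses for illustration and would need tuning against real listing descriptions.

```python
import re

# Hypothetical keyword list; a real one would need tuning against actual listing text.
VIEW_PATTERN = re.compile(
    r"\b(skyline|river|bridge|park|water|city|panoramic)\s+views?\b|\bviews?\s+of\b",
    re.IGNORECASE,
)


def mentions_view(description):
    """Return True if the listing description appears to advertise a view."""
    return bool(VIEW_PATTERN.search(description))


samples = [
    "Sunny 1BR with skyline views and a dishwasher",
    "Wake up to views of the Brooklyn Bridge",
    "Renovated kitchen, elevator building",
]
flags = [mentions_view(text) for text in samples]
```

This only tells you whether a view is mentioned, not how good it is, which is exactly where things get complicated.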

Who “Won” Tomorrowland 2017?

Some of you may have read my last article where I analyzed 59 sets from EDC Las Vegas 2017. This time, I put on my big boy pants and analyzed 236 sets from both weekends of Tomorrowland in Belgium. This means I had more than 14,000 tracks played by ~200 DJs over the course of 2 weekends to feed my analysis. Similar to last time, this data was web scraped from 1001tracklists.com, and I have made the code available on my GitHub page if you wish to do something similar yourself. In addition, I also created a summary dashboard on Tableau Public if you wish to explore the data in more detail.

*Note: I have added links to some songs along the way for your enjoyment, so please read on!

1. DJ Snake

DJ Snake had the most tracks played at Tomorrowland of any DJ by far (65), which was 23 more than the next highest artist (Axwell /\ Ingrosso). In addition, his songs were played by a wide variety of artists (38), far and away the broadest reach of any artist playing at the festival (the next highest was Calvin Harris with 23). The main drivers behind his popularity were his two biggest hits “Propaganda” and “Let Me Love You”, which were the #2 and #5 most played songs overall.
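Both numbers above (total plays and how many different DJs played an artist) fall out of a simple count over the tracklist rows. Here is a minimal sketch with invented rows and placeholder names; whether a DJ playing their own track should count toward reach is a choice this sketch does not make for you.

```python
from collections import Counter, defaultdict

# Invented sample rows: (DJ who played the track, artist credited on the track).
plays = [
    ("DJ A", "Artist X"),
    ("DJ A", "Artist X"),
    ("DJ B", "Artist X"),
    ("DJ B", "Artist Y"),
    ("DJ C", "Artist Y"),
    ("DJ C", "Artist X"),
]

# Total plays per credited artist.
play_counts = Counter(artist for _, artist in plays)

# "Reach": how many distinct DJs played at least one track by each artist.
djs_by_artist = defaultdict(set)
for dj, artist in plays:
    djs_by_artist[artist].add(dj)
reach = {artist: len(djs) for artist, djs in djs_by_artist.items()}

# Artists ranked by total plays, most played first.
ranking = play_counts.most_common()
```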

For those who may be interested in a new spin on these (already overplayed) hits, the most popular remixes were the “Propaganda (Nom de Strip & TJR Remix)” and “Let Me Love You (Don Diablo Remix)”.

2. Ed Sheeran

As amazing as it would be to see Ed Sheeran working the turntables and screaming for the crowd to “put their hands up in the air”, sadly he was not there. However, he still ended up having his songs played (in one form or another) by 22 different artists at the festival. This ties him for third alongside Axwell /\ Ingrosso and Valentino Khan. Much like Kendrick Lamar was for EDC, Ed is the most popular artist to have his tracks played without appearing in person (given that they are such similar artists this should come as no surprise *sarcasm*).

“Shape of You” led the way as the most common song played, but if anyone is looking to impress their friends with their Ed Sheeran discography, I would recommend checking out the most popular remix of “Castle on the Hill” by Gareth Emery & Ashley Wallbridge.

3. Hardstyle

While trap music may be slowly taking hold of the Americas, its older cousin hardstyle is alive and well in Europe. By using Python scikit-learn K-Means clustering and my limited knowledge of a few hardstyle artists, I was able to decipher which other artists fell into this genre. For me personally, exploring cluster 12 on the Tableau dashboard led to some entertaining artist (“Phuture Noize”) and song (“Destination”) discoveries. Worth noting: hardstyle is very high energy and is definitely not for everyone.

Note: I plan on writing a post that goes into the clustering in more detail, drawing a few more insights from the data and explaining the methodology.

4. Heads Will Roll (A-Trak Remix)

While mining this data set for new and exciting remixes I ran across this track, which was tied for 2nd as the most commonly played remix at Tomorrowland 2017 (12 plays across 11 DJs). Personally, I found this incredibly amusing since it was released 8 years ago (2009, if you cannot find your calculator). Therefore, this track is a winner for its popularity and longevity at a festival well known for revealing tracks never heard before.

It is also worth noting that Don Diablo remixes were incredibly popular (as can be seen on the left).

What is Next?

As I alluded to before, I tested out whether I could use Python to cluster various DJs based upon the tracks and artists they played. I plan on providing a more detailed analysis of this output soon.
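Until that post exists, here is a toy sketch of the general idea: turn each DJ into a vector of how often they played each artist, then group similar vectors. To keep it dependency-free I have swapped in a small hand-rolled k-means in place of scikit-learn's `KMeans`, and I only use artist counts as features; the data, the feature choice, and the `kmeans` helper are all illustrative assumptions rather than my actual method.

```python
import math

# Invented sample: how many tracks by each artist each DJ played (not scraped data).
artist_counts = {
    "DJ A": {"Artist X": 5, "Artist Y": 4},
    "DJ B": {"Artist X": 4, "Artist Y": 5},
    "DJ C": {"Artist Z": 6, "Artist W": 3},
    "DJ D": {"Artist Z": 5, "Artist W": 4},
}

# One column per artist, one row per DJ; artists a DJ never played count as 0.
artists = sorted({a for counts in artist_counts.values() for a in counts})
djs = sorted(artist_counts)
vectors = [[artist_counts[dj].get(a, 0) for a in artists] for dj in djs]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm, seeded with the first k points for repeatability."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


labels = dict(zip(djs, kmeans(vectors, k=2)))
```

On this toy data DJ A and DJ B end up in one cluster and DJ C and DJ D in the other. Seeding with the first k points is a shortcut for repeatability; a library implementation would typically choose its starting centroids more carefully.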

Spoiler Alert: The clusters are on the Tableau Dashboard already… if you agree/disagree, leave a comment!