About Me

Abdishakur Yoonis

MSc Data Scientist
MSc Software Engineer
BSc Software Engineer

Data Science

Data Analysis

Artificial Intelligence (AI)

Machine Learning

Data Engineering

Business Intelligence

Software Engineering

Software Development

Python, R, SQL, C#, Java, JavaScript and many more

Feel free to reach out or follow

A closer look into the data of Seattle’s Airbnb market

The Libraries That I Have Used
My Project Motivation
File Descriptions
Summary Of The Results
Acknowledgements

The Libraries That I Have Used

The project was built with the Anaconda Python 3.0 distribution. In addition, the following Python libraries were used:

Collections
Matplotlib
NLTK
NumPy
Pandas
Seaborn
scikit-learn (sklearn)

My Project Motivation

I was curious to look into the Airbnb dataset for Seattle. I wanted to learn more about pricing patterns, customer feedback, and price forecasting. Some of the questions I looked into are:

Question 1 - PRICE ANALYSIS

What is the peak season in Seattle and how does pricing change with the seasons?
How does pricing differ by neighbourhood, and which Seattle neighbourhoods are the most expensive?
In the most expensive neighbourhoods, how does the price differ between the most common property types?

FINDINGS

According to the chart above, the peak months are June through August, with July being the highest. With summer in full swing and little chance of rain, the chart supports my hypothesis that these are the months with the best weather in Seattle.

Furthermore, the year appears to start slowly, with the lowest average price in January. Prices begin to rise around April/May as spring and the holiday season approach, and again around November/December for the winter holidays.
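The seasonal averages described above can be reproduced with a short pandas aggregation. This is a minimal sketch rather than the notebook's actual code: it assumes calendar.csv has a date column and a price column stored as text such as "$85.00", and the function name and sample rows are made up for illustration.

```python
import pandas as pd


def average_price_by_month(calendar: pd.DataFrame) -> pd.Series:
    """Return the mean nightly price for each calendar month (1-12)."""
    # Rows without a price (assumed to be unavailable nights) are dropped.
    df = calendar.dropna(subset=["price"]).copy()
    # Assumption: prices are strings such as "$1,250.00"; strip "$" and ",".
    df["price"] = (
        df["price"].astype(str).str.replace(r"[$,]", "", regex=True).astype(float)
    )
    df["month"] = pd.to_datetime(df["date"]).dt.month
    return df.groupby("month")["price"].mean()


# Tiny made-up sample, only to show the shape of the input and output.
sample = pd.DataFrame(
    {
        "date": ["2016-01-04", "2016-01-05", "2016-07-04", "2016-07-05"],
        "price": ["$80.00", "$100.00", "$1,150.00", None],
    }
)
monthly = average_price_by_month(sample)
print(monthly)
```

With the real data, the same call would be made on the frame returned by pd.read_csv for calendar.csv, and the resulting series could be plotted directly as a bar or line chart.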

FINDINGS

According to the above analysis, prices vary clearly between neighbourhoods. With an average price of $231, Southeast Magnolia appears to be the most expensive neighbourhood of all, followed by Portage Bay at $227.

Rainier Beach appears to be the cheapest, with an average price of $68.

FINDINGS

I concentrated on the five most expensive neighbourhoods from the above analysis, and on houses and apartments, because the previous analysis showed that these two property types make up a significant portion of all listings.

Houses in Portage Bay are the most expensive, followed by houses in West Queen Anne and Westlake, as seen above. It is worth noting that in Westlake, houses and apartments cost almost the same.
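A comparison like the one above can be sketched with a pandas pivot table. The column names (neighbourhood, property_type, price), the function name, and the sample rows are assumptions for illustration; the notebook may have used different names or a different aggregation.

```python
import pandas as pd


def price_by_neighbourhood_and_type(
    listings: pd.DataFrame,
    top_n: int = 5,
    property_types: tuple = ("House", "Apartment"),
) -> pd.DataFrame:
    """Mean price per property type in the top_n most expensive neighbourhoods."""
    df = listings.copy()
    # Assumption: prices are strings such as "$1,250.00"; strip "$" and ",".
    df["price"] = (
        df["price"].astype(str).str.replace(r"[$,]", "", regex=True).astype(float)
    )
    # Rank neighbourhoods by mean price over all listings and keep the top_n.
    top = df.groupby("neighbourhood")["price"].mean().nlargest(top_n).index
    subset = df[
        df["neighbourhood"].isin(top) & df["property_type"].isin(property_types)
    ]
    return subset.pivot_table(
        index="neighbourhood",
        columns="property_type",
        values="price",
        aggfunc="mean",
    )


# Made-up rows with placeholder neighbourhood names, for illustration only.
sample = pd.DataFrame(
    {
        "neighbourhood": ["A", "A", "B", "B", "C"],
        "property_type": ["House", "Apartment", "House", "Apartment", "House"],
        "price": ["$300.00", "$200.00", "$100.00", "$100.00", "$50.00"],
    }
)
table = price_by_neighbourhood_and_type(sample, top_n=2)
print(table)
```

The resulting table has one row per neighbourhood and one column per property type, which is the shape seaborn.heatmap expects.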

Question 2 - SENTIMENT ANALYSIS OF REVIEWS

How can we classify reviews by sentiment?
Can we map positive and negative review sentiment to neighbourhoods to see which neighbourhoods attract more positive reviews and which attract more negative ones?
Is it possible to look into some of the worst reviews for additional insights?

Visualize top neighbourhoods based on reviews

Visualize bottom 10 neighbourhoods based on reviews

FINDINGS

Some of the best-rated neighbourhoods include Roxhill, Cedar Park, and Pinehurst. University District, Holly Park, and View Ridge are the neighbourhoods with the lowest rankings.

Investigate the worst reviews

FINDINGS

It’s worth noting that the majority of the reviews with low polarity ratings appear to be written in a language other than English. This may be a limitation of the Sentiment Intensity Analyzer.

The other three reviews appear to be genuine complaints, with users lamenting the lack of A/C and fans, the host’s rudeness, construction noise disrupting people’s stay, and the place’s terrible state, among other things.
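The sentiment labelling used for this question can be sketched as follows. The ±0.05 cut-off on the compound score is the threshold commonly recommended for VADER, the model behind NLTK's SentimentIntensityAnalyzer; it is an assumption here, as are the function names, and the notebook may have used different cut-offs. VADER's lexicon is English, so scores for reviews in other languages should be treated with caution.

```python
def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a compound polarity score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_reviews(comments):
    """Return the compound polarity score for each review text."""
    # Imported here so label_sentiment can be used without NLTK installed.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # lexicon download, needs network
    analyzer = SentimentIntensityAnalyzer()
    return [analyzer.polarity_scores(text)["compound"] for text in comments]


# Hand-picked compound scores, not real model output.
labels = [label_sentiment(score) for score in (0.8, 0.0, -0.6)]
print(labels)  # ['positive', 'neutral', 'negative']
```

On the real data, score_reviews would be applied to the comments column of reviews.csv, and the labels could then be counted per neighbourhood after joining with the listings.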

Question 3 - PRICE PREDICTION

Can we forecast a listing’s price? What aspects of the listing have the best correlation with price prediction?

File Descriptions

The Jupyter notebook documents the whole analysis: exploration of the dataset, data preparation and wrangling, and the prediction models built to answer the questions above. Markdown cells in the notebook describe each step and communicate the findings of each analysis.

For reference, an HTML version of the notebook is also available.

Lastly, the seattle folder contains the dataset from Kaggle (https://www.kaggle.com/airbnb/seattle).

It consists of three files:

calendar.csv: calendar availability and price of listings
listings.csv: details about all the available listings
reviews.csv: customer reviews of listings

Summary Of The Results

The following are the main findings from the analysis:

The summer months of June through August are the high season in Seattle, with July being the absolute peak.
The most expensive neighbourhood in Seattle was Southeast Magnolia, followed by Portage Bay. The cheapest was Rainier Beach.
When I looked into neighbourhoods and property types together, I found that the most expensive houses are in Portage Bay, followed by houses in West Queen Anne and Westlake.
I was able to label each review as positive, negative, or neutral using the Sentiment Intensity Analyzer. I found that 97.2% of reviews were positive, with only 1% negative and 1.8% neutral.
By looking at review sentiment by neighbourhood, I found that Roxhill, Cedar Park, and Pinehurst had the most positive reviews, while University District, Holly Park, and View Ridge had the least.
While investigating the worst reviews, I found that the Sentiment Intensity Analyzer tends to score non-English reviews as negative.
I was able to predict price from a prepared and cleaned dataset using Linear Regression, with an R² score of 0.62 on both the training and test sets.
A combination of host characteristics and descriptive information about the listing had the greatest effect on the price.
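The price prediction step can be sketched with scikit-learn as below. The data is synthetic and stands in for the cleaned listings features, so the scores it prints say nothing about the 0.62 reported above; the split ratio and random seeds are also assumptions. Similar R² values on the training and test sets, as in the project, suggest the model is not overfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: two made-up features and a noisy linear "price".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 100 + 30 * X[:, 0] - 10 * X[:, 1] + rng.normal(scale=5, size=200)

# Hold out 30% of the rows to check how well the fit generalises.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# R^2 on both splits; a large gap between them would indicate overfitting.
r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
print(f"train R^2: {r2_train:.2f}, test R^2: {r2_test:.2f}")

# Coefficients show how strongly each feature moves the predicted price.
print(model.coef_)
```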

Blog on Airbnb-Seattle-udacity-project

I have written a blog post on GitHub Pages about the project and my observations: https://abdishakury.github.io/

Acknowledgements

Kudos to Airbnb for uploading the dataset and Kaggle for hosting it; the dataset can be found here: https://www.kaggle.com/airbnb/seattle

SentimentIntensityAnalyzer Reference: https://www.nltk.org/api/nltk.sentiment.html

Heatmap Reference: https://seaborn.pydata.org/generated/seaborn.heatmap.html
