NWK-WTC | HOB-WTC | JSQ-33 | HOB-33 |
---|
These statistics were gathered using a simple Python script that scrapes the entire @PATHTrain Twitter feed at the end of each month.
The tweets were then organized by the line affected and put into a spreadsheet template. The template categorizes tweets by type of service change and ignores the rest. It then takes the time difference between tweets to calculate the length of each delay.
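As a rough illustration of the time-difference step, here is a minimal Python sketch. It is not the actual script or spreadsheet logic: the keyword lists, the function names `classify` and `delay_durations`, and the sample tweets are all assumptions made up for the example.

```python
from datetime import datetime

# Hypothetical keyword lists; the real template's categories are not given here.
RESOLVED_WORDS = ("resum", "normal")
DELAY_WORDS = ("delay", "suspend")


def classify(text):
    """Label a tweet as 'resolved', 'delay', or None (not a service change)."""
    lowered = text.lower()
    # Check for resolutions first so "resuming with residual delays" counts as resolved.
    if any(word in lowered for word in RESOLVED_WORDS):
        return "resolved"
    if any(word in lowered for word in DELAY_WORDS):
        return "delay"
    return None


def delay_durations(tweets):
    """Pair each delay tweet with the next resolution tweet.

    `tweets` is a time-sorted list of (timestamp, text) for a single line.
    Returns a list of timedelta objects, one per delay.
    """
    durations = []
    start = None
    for timestamp, text in tweets:
        kind = classify(text)
        if kind == "delay" and start is None:
            start = timestamp
        elif kind == "resolved" and start is not None:
            durations.append(timestamp - start)
            start = None
    return durations


# Made-up sample tweets for one line.
sample = [
    (datetime(2015, 1, 5, 8, 0), "NWK-WTC delayed due to signal problems"),
    (datetime(2015, 1, 5, 8, 5), "Follow us for updates"),
    (datetime(2015, 1, 5, 8, 25), "NWK-WTC resuming normal service"),
]
print(delay_durations(sample))
```

Tweets that match neither keyword list are skipped, which mirrors how the template ignores anything that is not a service change.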
There is error checking in place to prevent double-counting time and to ignore absurd delays. The Twitter feed often repeats delay announcements, so the additional tweets are ignored and the whole sequence is counted as a single delay. I've also noticed that delays are sometimes announced without a service resolution ever being tweeted. This produces apparent delays of 24 hours or more. The same kinds of errors occur when analyzing the AM and PM rush hours because of the arbitrary cutoff times.
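The two checks described above could look something like the following Python sketch. It is only an illustration under stated assumptions: the six-hour cutoff `MAX_DELAY`, the function name `clean_delays`, and the sample events are hypothetical, and the real template may apply different rules.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff: anything longer is assumed to be a missing resolution tweet.
MAX_DELAY = timedelta(hours=6)


def clean_delays(events, max_delay=MAX_DELAY):
    """Collapse repeated announcements and drop implausibly long delays.

    `events` is a time-sorted list of (timestamp, kind) for a single line,
    where kind is "delay" or "resolved". Returns a list of (start, end) pairs.
    """
    delays = []
    start = None
    for timestamp, kind in events:
        if kind == "delay":
            # A repeat announcement while a delay is already open is ignored,
            # so the delay is counted once from its first tweet.
            if start is None:
                start = timestamp
        elif kind == "resolved" and start is not None:
            # Discard delays longer than the cutoff instead of counting them.
            if timestamp - start <= max_delay:
                delays.append((start, timestamp))
            start = None
    return delays


# Made-up sample events for one line.
events = [
    (datetime(2015, 1, 5, 8, 0), "delay"),
    (datetime(2015, 1, 5, 8, 10), "delay"),      # repeat announcement
    (datetime(2015, 1, 5, 8, 30), "resolved"),
    (datetime(2015, 1, 5, 17, 0), "delay"),      # resolution never tweeted
    (datetime(2015, 1, 6, 18, 0), "resolved"),   # 25 hours later
]
for start, end in clean_delays(events):
    print(start, end, end - start)
```

Dropping over-long delays rather than capping them means the totals err on the low side, which is consistent with the choice to understate rather than overstate delays.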
This method is not perfect, but it does catch any major errors, and the overall trend is accurate. I'm sure there is a better way to go about this, but I thought it would be interesting to do a "quick and dirty" analysis to see whether what appears to be happening is supported by the data.
Based on my spot checks of the data, I believe the flaws actually understate the number of delays that occur, and I've chosen to keep it this way to be on the safe side. That is better than overstating the problems.
In addition, after service has been announced as back to "normal," it may often take another 10-15 minutes for service to actually become normal. This is evidenced by the number of replies to the Twitter feed claiming exactly this.
I am fully open to any suggestions or comments on the data or the website. Please direct them to my Twitter account.