Parking Slot Detection Deep Learning


Highly accurate parking slot detection is crucial for Automated Valet Parking (AVP) systems, which have demanding safety and functional requirements. Previous efforts have mostly focused on an algorithm's ability to detect different types of slots under varying conditions. One design question is whether to treat this as traditional object detection with the free space itself as the bounding box to predict, or instead to detect the surrounding objects and subtract them (for example: this photo shows a car park, these are the cars, so car park minus cars equals empty parking spots). As deep learning-based object detection has been actively researched, it has also been applied to the parking slot detection task [20–22], including decentralized and efficient solutions for visual parking lot occupancy detection based on convolutional neural networks (CNNs) designed to run on smart cameras, i.e. vision systems capable of extracting application-specific information from the captured images.

Abstract: A small example of OpenCV + TensorFlow.

Introduction

Do you often circle the parking lot looking for a space? Wouldn't it be cool if your phone could tell you exactly where the nearest free spot is? It turns out this problem is relatively easy to solve with deep learning and OpenCV. The GIF below highlights all available parking spaces in the lot and shows the number of free spots. The key point is that the whole process runs in real time.

You can find the code I used in the GitHub repo.

Summary of steps

There are two main steps to build this parking detection model:

1. Detect the location of every parking space;

2. Determine whether each parking space is empty or occupied.

Because the camera is fixed in place, we only need to map each parking space once with OpenCV. Once the location of every space is known, deep learning can predict whether it is empty. I have shared a high-level overview of the steps involved here; if you are interested in the detailed code, please check my blog.

Detection of all available parking spaces

The basic idea of parking space detection is that the spots are separated by horizontal lines, and the spacing between spots within a row is roughly equal. First, Canny edge detection is used to obtain an edge image, and I masked out the areas that are not part of the parking lot. The result looks like this:
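A minimal sketch of this edge detection step with OpenCV, assuming a single frame from the fixed camera; the file name, Canny thresholds, and region-of-interest polygon are placeholders:

```python
import cv2
import numpy as np

# Load one frame from the fixed camera (placeholder file name).
frame = cv2.imread("parking_lot_frame.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Canny edge detection; the thresholds are typical starting values and
# would need tuning for this particular camera view.
edges = cv2.Canny(gray, 50, 150)

# Mask out everything outside a hand-drawn polygon around the parking area
# (the vertices below are placeholders).
mask = np.zeros_like(edges)
roi = np.array([[(0, 700), (0, 350), (1280, 350), (1280, 700)]], dtype=np.int32)
cv2.fillPoly(mask, roi, 255)
masked_edges = cv2.bitwise_and(edges, mask)
```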

Then I ran a Hough line transform on the edge image and drew all the lines it could recognize, keeping only lines with slopes close to zero in order to isolate the horizontal ones. The output of the transform looks like this:
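A hedged sketch of this step, continuing from the edge image above; the Hough parameters and the slope tolerance are assumptions that would need tuning:

```python
# Probabilistic Hough transform on the masked edge image.
lines = cv2.HoughLinesP(masked_edges, rho=1, theta=np.pi / 180,
                        threshold=15, minLineLength=25, maxLineGap=4)

horizontal_lines = []
for line in lines:
    x1, y1, x2, y2 = line[0]
    # Keep only nearly horizontal segments (slope close to zero).
    if abs(y2 - y1) <= 1 and abs(x2 - x1) >= 25:
        horizontal_lines.append((x1, y1, x2, y2))
        cv2.line(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
```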

As you can see, the Hough transform did quite well at identifying parking lines, but the output is noisy: several lane lines were detected multiple times while others were missed. So how do we solve this? Starting from observation and intuition, I took the coordinates returned by the transform and clustered the x coordinates to determine the main parking lanes. The clustering logic identifies the intervals of lane-line x coordinates, which lets us identify the 12 lanes here:
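One simple way to implement that clustering is a gap-based grouping of the segments' x coordinates, continuing from the previous sketch; the 10-pixel gap threshold is an assumption:

```python
# Cluster line-segment x coordinates into lanes: start a new cluster
# whenever the gap to the previous x coordinate exceeds a small threshold.
xs = sorted(x1 for (x1, y1, x2, y2) in horizontal_lines)
clusters, current = [], [xs[0]]
for x in xs[1:]:
    if x - current[-1] <= 10:
        current.append(x)
    else:
        clusters.append(current)
        current = [x]
clusters.append(current)

# One representative x position per lane; ideally this yields the 12 lanes.
lane_x_positions = [int(sum(c) / len(c)) for c in clusters]
```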

If all this looks complicated, don't worry: I've documented the code step by step in a Jupyter notebook on GitHub. Now that I know where all the parking lanes are, I can identify each individual parking space by reasonably assuming that all spaces are the same size. I inspected the results carefully to ensure that the boundaries between spots were captured as accurately as possible. Finally, I was able to mark every parking space:

After locating each parking space, we can assign an ID to each spot, save its coordinates in a dictionary, and pickle it so the dictionary can be retrieved later. This is possible because the camera is fixed, so we don't need to recompute the position of each spot in the view over and over. For more details, see my blog.
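A minimal sketch of saving and reloading the spot dictionary with pickle; the IDs, coordinates, and file name are placeholders:

```python
import pickle

# Hypothetical mapping: spot ID -> (x1, y1, x2, y2) pixel coordinates.
spot_dict = {0: (100, 200, 160, 215), 1: (160, 200, 220, 215)}

# Save once; because the camera never moves, later runs just reload this
# file instead of re-deriving the spot positions.
with open("spot_dict.pickle", "wb") as f:
    pickle.dump(spot_dict, f)

with open("spot_dict.pickle", "rb") as f:
    spot_dict = pickle.load(f)
```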

Identify whether a parking space is occupied

Now that we have a map of each parking space, we can determine whether the parking space is occupied in the following ways:

1. Use OpenCV to check whether the pixel colors of a parking space in the video frame match those of an empty space. This is a simple method, but error-prone: changes in lighting alter the apparent color of an empty space, so the method fails as the light changes over the course of a day, and it can also mistake a gray car for an empty spot. A minimal sketch of this comparison appears after this list.


2. Use object detection to identify all cars, then check whether each car's position overlaps a parking space. I tried this and found that real-time detection models struggle with such small objects: no more than 30% of the vehicles were detected.

3. Use a CNN on each parking space crop to predict whether it is occupied. This method worked best.
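For reference, here is a minimal sketch of the color comparison from the first approach above; `empty_reference` is an assumed frame in which the spot is known to be empty, and the threshold is a placeholder that illustrates exactly why this method is fragile:

```python
import cv2

def spot_is_empty(frame, empty_reference, box, threshold=25.0):
    """Compare mean gray intensity of a spot crop against an empty reference."""
    x1, y1, x2, y2 = box
    crop = cv2.cvtColor(frame[y1:y2, x1:x2], cv2.COLOR_BGR2GRAY)
    ref = cv2.cvtColor(empty_reference[y1:y2, x1:x2], cv2.COLOR_BGR2GRAY)
    # Small intensity difference -> assume the spot still looks empty.
    return abs(float(crop.mean()) - float(ref.mean())) < threshold
```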

To build a CNN, we need images of parking spaces with and without cars. I extracted the image of each parking space, saved the crops to folders, and grouped them by class. I also shared this training folder on GitHub.

Because there are nearly 550 parking spaces in the 1280×720-pixel image, each parking space crop is only about 15×60 pixels. Below are examples of empty and occupied parking spaces:

However, because occupied parking spaces and empty parking spaces look very different, this should not be a challenging issue for CNN.

However, I only have about 550 images across the two classes, so I decided to use transfer learning: I took the first 10 layers of VGG and added a single softmax layer on top of the VGG features. You can find the code for this transfer learning model here; it reaches an accuracy of 94%. See below:
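A hedged sketch of that transfer learning setup in Keras; the exact cut point in VGG16, the input size (the ~15×60 px crops are resized), and the training hyperparameters are assumptions rather than the exact configuration used in the original notebook:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# ImageNet-pretrained VGG16 without its classification head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(48, 48, 3))

# Freeze the early layers we reuse (roughly the "first 10 layers").
for layer in base.layers[:11]:
    layer.trainable = False

# Take the features from the cut point and add a single softmax layer
# for the two classes: empty vs. occupied.
features = Flatten()(base.layers[10].output)
output = Dense(2, activation="softmax")(features)

model = Model(inputs=base.input, outputs=output)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```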

Now, combining the parking space detection with the CNN predictor gives a highly accurate parking detector.

I also included the code for running this on the video stream in the notebook.

Conclusion

I was amazed at how easy it was to combine different tools with deep learning to build a practical application; I finished this work in two afternoons.

Several other ideas for further exploration:

1. It would be great to extend the parking-spot detection logic to arbitrary parking lots using deep learning, since the OpenCV-based logic is fairly specific to each use case.

2. The VGG model used for the CNN is heavyweight; we would like to try a lighter model.




The Era of Drones

Although the idea of vehicle detection is not a groundbreaking one and has been around since the emergence of video cameras and embedded sensors, these methods were often marred by high capital and maintenance costs and a high complexity from having to integrate multiple data sources, each with a limited band of inputs. The prevalence of drones in the commercial market in recent years on the other hand, has brought about a new era of state-of-the-art aerial photogrammetry and a drastic reduction in the cost of obtaining aerial data. With this sudden increase in information, and by combining machine learning with GIS technologies, we are now capable of performing new and insightful analyses on issues of interest.

Existent business problems which stand to benefit from this include customer flow analyses and demographic modelling. This is particularly useful for those in the retail sector looking to monitor peak business hours by counting the number of parked vehicles at a given time and also extrapolate useful customer information (such as income, marital status, household size and even political inclination) by classifying the types of vehicles they own.


So, can we solve these problems using AI and GIS? The answer is yes, and a good starting point would be to come up with a workflow that tallies the number of cars per unit time as well as infers the vehicle type for every positive detection. In this article, I aim to give a comprehensive overview of such a workflow: from data acquisition and processing using Drone2Map to performing data inferencing using TensorFlow and ArcGIS Pro, and finally to creating actionable BI visualizations using the Operations Dashboard in ArcGIS Online (AGOL).

Data Collection & Exploratory Analysis using Drone2Map

To obtain some sample data, we flew a drone over a busy parking lot here at our office in Redlands, California and obtained a series of geo-tagged tiff files with geolocation corroborated by 5 ground control points (GCPs) to ensure the result would be accurate enough to identify the correct parking space for each vehicle. These images were captured along a “lawn mower” flight path with an overlap of 70% along flight lines and at least 60% between flight lines. The reason we do this is to facilitate the generation of true orthomosaics. Traditional orthos (or “frame” orthos) suffer from what is known as the “layover effect”, where tall structures such as trees or buildings seemingly “lean” toward or away from the observer as a consequence of stitching together disparate frames that do not capture objects at a true nadir perspective. This effect worsens for objects that are at the edges of a drone’s field of view.

Using Drone2Map, we can take these individual frames and resolve for an object’s true ortho by finding common views between a frame and its 8 adjacent frames in a point cloud and then keeping the views that have a high degree of overlap. The resultant orthomosaic is not only a true ortho, but one that does not reveal seamlines between images typical of frame orthos.

Of course, all of this is automated, and the actual image processing step is simple: Create a 2D mapping project in Drone2Map, pull in your data, add ground control points as needed and then hit start to generate a true orthomosaic.

From this, we obtain 3 classes of output products: A 2D orthomosaic of our parking lot, a digital surface model (DSM) layer and a digital terrain model (DTM) layer. An initial thought was to simply pass the DSM to a detection network to produce bounding boxes on distinctly “car-like” protrusions. However, on closer inspection of the dataset we identified some potential issues with this approach: in this particular parking lot, the coverage of foliage was so extensive as to affect the detection of certain cars partially or completely hidden by overhanging branches and leaves.

The overhanging vegetation affected both the DSM and orthomosaic, but since each image includes oblique view angles at its edges, some images were able to see partially or completely underneath the tree canopy. ArcGIS also enables each image from the drone to be orthorectified. Following photogrammetric processing in Drone2Map, each image could be analyzed in its proper geospatial placement, providing multiple views of each parking space.

Processing of these oblique views was beyond the scope of this initial project, but will be the subject of future testing. In addition, parts of the canopy depicted by the orthomosaic are not fully opaque, and the RGB bands may also provide additional channels that would allow correct subdivision of vehicles into categories such as trucks, sedans and SUVs. From a data collection standpoint, it is also much easier to collect pure ortho imagery than outfitting drones with LIDAR sensors for DSM/DTM detection.

Simple Approach: Building a classification model using InceptionV3

Our first solution for tackling this problem was an obvious one: Simply overlay a polygon layer from a mapped parking lot on top of the orthomosaic raster and clip out cars using the Split Raster geoprocessing tool to get our prediction set. This was very easily done.

Then comes the question of which classification model to apply atop which finetuning set. A simple off-the-shelf model that’s available from both TensorFlow Slim and TensorFlow Hub is InceptionV3. Based off the original InceptionNet (Szegedy et al.), this third revision bears much resemblance in terms of core structure to the original with similar component modules. However, it has the addition of factorization methods to reduce the representational bottleneck as well as label smoothing and batch norm operations on the auxiliary classifiers to increase regularization.

As with most TensorFlow Hub models, there is no need to train from scratch when we can apply transfer learning; Luckily, InceptionV3 was pretrained on ImageNet.
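A minimal transfer learning sketch along those lines, using the ImageNet-pretrained InceptionV3 feature vector from TensorFlow Hub and a 5-class softmax head for the COWC labels described below; the hub handle version and hyperparameters are assumptions:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Frozen InceptionV3 feature extractor pretrained on ImageNet.
feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/inception_v3/feature_vector/4",
    trainable=False)

# 5 classes: sedan, pickup, other, unknown, background.
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(299, 299, 3)),
    feature_extractor,
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```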

As for the finetuning set, the Cars Overhead with Context (COWC) dataset from LLNL was an easy pick for its richly annotated set of 32,716 vehicles as well as hard negative examples (boats, commercial vehicles etc.). Although the dataset doesn't plug straight into the classification network, the only preprocessing work here involves reading through the list of the annotated text files and cropping the associated jpegs with either OpenCV or ImageMagick. (N.B. the COWC dataset has 4 class labels: Sedan, Pickup, Others and Unknown. With an additional background class, that makes 5 classes. I had to play around with class balancing to ensure each class was sufficiently represented, and also mine some samples for the background class, which wasn't provided in COWC.)

These images have been extracted from the original COWC format, which consists of large 22000x22000 images with bounding box information contained in ancillary .txt files. One thing to note is that the original bounding box coordinates tended to crop off large sections of the vehicles in question, which ultimately led to worse overall performance in identifying the pickup class. Therefore these images have all been cropped using larger 34x34 bounding boxes.
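A hedged sketch of that cropping step with OpenCV; the annotation format assumed here (one "x y label" entry per line giving a vehicle center) is a placeholder and should be checked against the actual COWC ground-truth layout:

```python
import os
import cv2

def extract_patches(image_path, annotation_path, out_dir, size=34):
    """Crop a size x size patch around each annotated vehicle center."""
    image = cv2.imread(image_path)
    half = size // 2
    with open(annotation_path) as f:
        for i, line in enumerate(f):
            x_str, y_str, label = line.split()[:3]   # assumed field order
            x, y = int(float(x_str)), int(float(y_str))
            patch = image[max(0, y - half):y + half, max(0, x - half):x + half]
            if patch.size:
                cv2.imwrite(os.path.join(out_dir, f"{label}_{i}.jpg"), patch)
```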

By looking at the COWC dataset we can immediately tell that the resolution of the dataset is not ideal. Early training results have shown that the classification network performed exceedingly well in binary classification tasks of identifying occupied/unoccupied spaces, but performed worse in distinguishing between vehicle classes. Part of the reason for this is also due to the fact that the pickup class was severely underrepresented. Testing proved that undersampling the sedan/other/background classes yielded the best validation accuracy (as opposed to oversampling the pickup class).

Once the model was sufficiently trained (on a good GPU this takes a couple hours — for me it was a Tesla K80 on the GeoAI VM for about 1.5 hours at a training accuracy of 0.85 and a validation accuracy of 0.84), we can proceed to apply our previously extracted prediction set to the model.

Output these results into a .csv file (it might be useful to apply some smart naming conventions here to ensure your data items match the OBJECTIDs of each polygon in the parking space feature layer). From here, simply combine the two layers using Add Join and voilà, you have a polygon layer that associates a class probability with each parking space based on an aerial image you took.
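A sketch of this export step under a few assumptions: the fine-tuned classifier has been saved locally, the clipped images are named so that each file encodes its polygon OBJECTID (e.g. "objectid_42.jpg"), and the class order matches the training labels. All file names and paths are placeholders:

```python
import csv
import glob
import os

import numpy as np
import tensorflow as tf

CLASS_NAMES = ["sedan", "pickup", "other", "unknown", "background"]
model = tf.keras.models.load_model("inceptionv3_cowc_finetuned.h5")  # placeholder path

with open("predictions.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["OBJECTID", "pred_class", "confidence"])
    for path in glob.glob("prediction_set/objectid_*.jpg"):
        # File name encodes the OBJECTID of the matching parking polygon.
        objectid = os.path.basename(path).split("_")[1].split(".")[0]
        img = tf.keras.preprocessing.image.load_img(path, target_size=(299, 299))
        x = tf.keras.preprocessing.image.img_to_array(img)[np.newaxis] / 255.0
        probs = model.predict(x)[0]
        writer.writerow([objectid, CLASS_NAMES[int(np.argmax(probs))], float(np.max(probs))])
```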

Thus far, we have only created some rich geotagged layers that are not yet informative nor intuitive enough to give any kind of analytical insight. This is where ArcGIS Online offers us a path forward: we export both the orthomosaic as well as the feature layer to our ArcGIS Online Portal, then optionally, in the MapViewer, modify the symbology of our polygon layer to be attribute-driven.

You can then import individual maps and visualize the results interactively on the Operations Dashboard:

We scheduled two drone flights over the same parking lot at different times of the day. To complement this data, we have also generated some mock input to illustrate occupancy info on an hourly basis that simulates customer flow within a typical work day. The Hour Selector cycles through parking patterns over every time segment. To suit your use case, you may also decide to deploy a drone every few weeks or every few months. The Average Occupancy Heatmap is a visual representation of “hot spots” wherein vehicles aggregate. If you like it is also possible to generate a “Turnover Heatmap” that maps regions in which vehicles are likely to park for longer/shorter periods. Both of these views are potentially useful for understanding demographic behavior when crossed-referenced with vehicle types or to detect favorite stores & customer stay-times at shopping venues.

The two useful vehicle categories from COWC, sedans and pickups, are shown in the Vehicle Types visual element, each color coded and with a transparency level linked to its classification confidence. Finally, pure vehicle counting/detection is presented in the Current Vehicle Classification Analysis view with a gauge to show the current occupancy.

Shortcomings of a classification-only model

Our blind assumption for a classification-only approach is that all cars fit neatly inside each parking space polygon (failing to take into account bad drivers, double-parkers or your regular F150s so easily cropped off by the sensibly-sized parking spaces). Of course, there are other use cases for vehicle detection for which a predefined polygon layer is simply impossible to draw (think roadside parking or parking lots for which there are no guidelines). These issues, coupled with the fact that a simple classification network is simply not "smart enough", prompted us to think of another approach to this problem.


Better Approach: Building a detection model using Faster-RCNN

The fundamental idea behind a Faster-RCNN network is that it does two things at once: It detects the bounding boxes of objects of interest using a Region Proposal Network (RPN), and performs classification on those detections using a base classifier after region of interest pooling (ROI).

Like our previous attempt, we used COWC for fine-tuning. This time, there is no need to extract individual vehicles for training. To visualize what the dataset looks like, you can test out the following snippet by replacing the (x,y), width and height values in the patches.Rectangle() method with whatever value is shown in an image’s corresponding .txt file.
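The original snippet is not reproduced here, but a minimal sketch along those lines would look like the following; the tile file name and the box position/size are placeholder values to be replaced with the ones in the corresponding .txt file:

```python
import matplotlib.image as mpimg
import matplotlib.patches as patches
import matplotlib.pyplot as plt

# Show one tile and overlay a single bounding box (placeholder values).
img = mpimg.imread("cowc_tile.jpg")
fig, ax = plt.subplots(1)
ax.imshow(img)
ax.add_patch(patches.Rectangle((120, 340), 34, 34,
                               linewidth=1, edgecolor="r", facecolor="none"))
plt.show()
```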

I also had to manually convert these images into Pascal VOC format to be consumed by the Faster R-CNN model. Unlike the classification model, training this model took roughly 24 hours on the K80. This is because in Faster R-CNN, anchor generation within the RPN forms a bottleneck (potentially generating up to 6000 region proposals per image). Other detection models such as SSD or YOLO (at least the first generation) ameliorate the speed issue at the cost of lower mAP scores. However, seeing as we are not concerned with near real-time detection, it suffices to use Faster R-CNN to produce a good output at reasonable speeds (~7 seconds or so on the CPU for each image tile of size 600x600, which is rescaled from the original 5000x5000 px). Again, Faster R-CNN is provided in the TensorFlow models library. There you can also find some default configuration files for the base classifier you have chosen (ResNet101 pretrained on the MS COCO dataset in my case). There are in fact a whole slew of pretrained base models from TensorFlow's detection model zoo you can choose for yourself, each with their own speed/mAP trade-off.

This is part of the configuration file I used; note the height and width values that match the COWC "patch_size", or the size of each input image tile. This can be configured in the CreateDetectionScenes.py script from COWC's Github page. Additionally, you can choose to base your classification model on something other than ResNet101. Depending on the size of your detections, it may also be prudent to adjust your anchor scales for faster convergence, although I find it much more effective to re-tile your inputs if each object ends up comprising less than 4% of your input size.

You can of course write your own evaluation script to visualize the trained model. There is in fact a very good template on TensorFlow's Github page. I made some modifications to the following snippet to also allow you to adjust the detection threshold and the number of boxes to draw, which I find very useful for visually understanding the performance of your model early on in the training process:
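The modified snippet itself is not preserved here; the following is a hedged sketch in the same spirit, based on the TensorFlow Object Detection API's visualization utilities, where `image_np`, `output_dict`, and `category_index` are assumed to come from the standard inference tutorial:

```python
from object_detection.utils import visualization_utils as vis_util

vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    output_dict["detection_boxes"],
    output_dict["detection_classes"],
    output_dict["detection_scores"],
    category_index,
    use_normalized_coordinates=True,
    line_thickness=2,
    min_score_thresh=0.5,     # raise or lower the detection threshold
    max_boxes_to_draw=200)    # cap the number of boxes drawn per tile
```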

Alternatively, if you wish for a more straightforward approach to performing inferencing, the upcoming ArcGIS Pro 2.3 offers a convenient geoprocessing tool, "Detect Objects Using Deep Learning", which performs evaluation on any input raster given a trained model in the form of a frozen inference graph protobuf referenced by a model description JSON file, along with several evaluation hyperparameters such as padding and detection threshold. Hit run and you get a new feature layer of bounding boxes in return.


Similar to the classification approach, we can visualize these layers much more effectively by exporting them to ArcGIS Online and viewing them in the Operations Dashboard.

All the views from the classification dashboard can also be represented here with the same analyses drawn, but now vehicles parked outside of predesignated spaces (including vehicles in motion) are all detected using Faster R-CNN.

Closing thoughts and future work

This article serves as an exploratory glimpse into machine learning-driven GIS for vehicle detection, and of how important business decisions can be informed from start to finish by leveraging the powerful Esri ecosystem to create a complete BI workflow.


The models described above can of course be taken a couple steps further, by training on an input set with richer annotations or cross-referencing the COWC dataset with datasets that tie together vehicle types with income and other demographic info to produce even more powerful analyses, reveal subtler patterns and yield finer customer segmentations. Likewise, these models apply to aerial detection of all kinds (crops, utility poles, animals, forest fires, ships), as long as you have access to a decent GPU and a fine-tuning dataset.

Another potential expansion to this pipeline would be to introduce oblique imagery into the mix in a manner similar to what was done in Esri’s Oblique Viewer App that allows for multiple oblique views to be attached to the same ortho frame. For vehicle detection this effectively increases the number of images from which we can extract high fidelity data and also gives an unobstructed view of cars hidden under tree canopies.

Hopefully this has been an interesting read, please give us a clap and share this post if you enjoyed it, and let us know in the comments what other insights can be drawn from these data and whether you think there’s a better approach to be considered!

This effort was done as part of the Esri GeoAI team. For other cool GISxML projects, check out the GeoAI Medium page here. Feel free to contact Omar Maher for internship or full-time opportunities! ([email protected])

