Introduction
The post-pandemic world relies heavily on mobile apps to do its shopping. Research suggests that smartphone users spend over 3 hours on their phones every day, and one out of five users spends over 4.5 hours a day looking at their phone. Given that much daily screen time, it’s no surprise that apps are rapidly replacing eCommerce websites as the go-to shopping channel for today’s shoppers. To stay abreast of today’s changing retail landscape, Intelligence Node has introduced a cutting-edge mobile app scraping solution to complement its battle-tested, award-winning eCommerce website scraping technology. Let’s take a closer look at the technology and the process behind mobile app scraping:
There are 2 ways to scrape mobile apps:
Scenario 1
- When composite APIs are open (e.g. Amazon). In such cases, a small amount of setup is involved, but the scraping ultimately does not differ much from scraping standard websites.
Scenario 2
- When composite APIs are encrypted (e.g. HEB, Dollar General, Stop & Shop, Target, etc.). This case is much more complicated and requires specialized mobile device scraping, optical character recognition (OCR), and other machine learning techniques. Below is a brief explanation of Intelligence Node’s approach:
Methodology
Step 1) Recording a mobile session/navigation (via special visual navigation scrapers)
Technology consideration: To make the process scalable, we leverage a real-device cloud and emulate a cluster of devices that use our smart proxy network.
Step 2) Identifying product ROI (Region of Interest)
Since a single frame often contains more than one product, it is difficult to attribute scraped text to a particular product, and attempting to do so can lead to mismatched information. To tackle this, we use an object detection algorithm as the first step to get the ROI of each product, irrespective of how many products are in a single frame.
Input files are passed to YOLOv5, which is custom fine-tuned to identify product ROIs. The YOLOv5 architecture has an advantage over other models because of its fast and accurate inference.
Because a video contains many frames per second, the same product can appear in several consecutive frames, which leads to duplicate ROIs. To tackle this, we introduced a deduplication stage that removes product ROIs that are identical. This helps to process the data faster and more efficiently. The ROIs from the video are then cached as images.
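The deduplication idea can be illustrated with a minimal sketch. The article does not say how identical ROIs are detected, so the approach here (an average hash compared by Hamming distance) is an assumption chosen for illustration, and the names `average_hash`, `hamming`, `deduplicate`, and `max_distance` are hypothetical. ROIs are represented as plain grayscale pixel grids to keep the example dependency-free.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels (0-255), row-major


def average_hash(img: Image, size: int = 4) -> int:
    """Shrink the image to size x size by block averaging, then set one bit
    per cell depending on whether the cell is brighter than the overall mean.
    Assumes the image is at least size x size pixels."""
    h, w = len(img), len(img[0])
    cells = []
    for by in range(size):
        for bx in range(size):
            y0, y1 = by * h // size, (by + 1) * h // size
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for cell in cells:
        bits = (bits << 1) | (1 if cell > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def deduplicate(rois: List[Image], max_distance: int = 1) -> List[Image]:
    """Keep the first ROI of each group of near-identical ROIs.
    max_distance is a hypothetical tolerance, not a value from the article."""
    kept: List[Image] = []
    hashes: List[int] = []
    for roi in rois:
        roi_hash = average_hash(roi)
        if all(hamming(roi_hash, seen) > max_distance for seen in hashes):
            kept.append(roi)
            hashes.append(roi_hash)
    return kept
```

With this sketch, two crops of the same product taken from consecutive frames hash to (nearly) the same value and only the first is kept, while a visually different product survives. A production system would likely hash real image data and tune the tolerance per app.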
Step 3) Identifying product components
Components such as price, product information, and product image are identified in this stage (components may vary by app). A second YOLOv5 instance, separate from the one used in Step 2, is applied here because identifying components requires finer-grained detection.
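As a rough illustration of what happens after the component detector runs on one product ROI, the sketch below keeps the most confident detection per component type and crops it out for the next stage. The detector itself is not shown; the `Detection` structure, the label names, and the `min_confidence` threshold are assumptions made for this example, not details of Intelligence Node’s pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One hypothetical component detection inside a product ROI."""
    label: str                        # e.g. "price", "title", "image"
    confidence: float                 # detector score in [0, 1]
    box: Tuple[int, int, int, int]    # (x1, y1, x2, y2) in ROI pixels


def best_per_component(
    detections: List[Detection], min_confidence: float = 0.5
) -> Dict[str, Detection]:
    """Drop weak detections, then keep the highest-scoring box per label."""
    best: Dict[str, Detection] = {}
    for det in detections:
        if det.confidence < min_confidence:
            continue
        current = best.get(det.label)
        if current is None or det.confidence > current.confidence:
            best[det.label] = det
    return best


def crop(image: List[List[int]], box: Tuple[int, int, int, int]) -> List[List[int]]:
    """Cut the boxed region out of a row-major pixel grid."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]
```

Each cropped component can then be routed by label: text components such as price go to OCR, while the product image can be stored as-is.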
Step 4) Extracting components using OCR
In this stage, we complete the textual extraction, since all components of each product have already been identified. A custom-trained OCR framework is deployed to extract the text from each component.
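Raw OCR output usually needs light post-processing before it is usable as data. The sketch below shows one such step, turning the text read from a price component into a numeric value. It assumes US-dollar prices written with a `$` sign, and `parse_price` is a hypothetical helper, not part of the OCR framework described above.

```python
import re
from decimal import Decimal
from typing import Optional

# A dollar sign, optional spaces, a whole part with or without thousands
# separators, and an optional one- or two-digit fractional part.
_PRICE = re.compile(r"\$\s*(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{1,2}))?")


def parse_price(ocr_text: str) -> Optional[Decimal]:
    """Return the first dollar amount found in the text, or None."""
    match = _PRICE.search(ocr_text)
    if match is None:
        return None
    whole = match.group(1).replace(",", "")
    cents = match.group(2) or "0"
    return Decimal(f"{whole}.{cents}")
```

Using `Decimal` rather than `float` avoids rounding surprises when prices are later compared or aggregated.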
Step 5) Accessing final output
The text extraction output, along with metadata, is stored in the database. Unit tests and quality checks are applied. At this stage, the information is transformed to the desired client format and is ready to be sent/uploaded/requested via API.
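A minimal sketch of this final stage might look like the following: a simple quality check on an extracted record, followed by a transformation into a client-facing JSON payload. The field names (`title`, `price`, `product_name`, `price_usd`, `source_app`) and the specific checks are illustrative assumptions; actual client formats and quality rules will differ.

```python
import json
from typing import Dict, List

REQUIRED_FIELDS = ("title", "price")  # hypothetical required fields


def validate(record: Dict) -> List[str]:
    """Return a list of quality problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    price = record.get("price")
    if isinstance(price, (int, float)) and price <= 0:
        problems.append("non-positive price")
    return problems


def to_client_format(record: Dict, source_app: str) -> str:
    """Rename internal fields to a (hypothetical) client schema and serialize."""
    payload = {
        "product_name": record["title"],
        "price_usd": record["price"],
        "source_app": source_app,
    }
    return json.dumps(payload, sort_keys=True)
```

Records that fail validation can be held back for review, so that only clean data is transformed and delivered.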
Conclusion
As the retail economy shifts further away from traditional brick-and-mortar stores toward technologically advanced alternatives like mobile commerce, social commerce, and the up-and-coming metaverse, retailers need cutting-edge AI and analytics to pivot fast. Intelligence Node understands this and is on a mission to provide brands and retailers with the most accurate and sophisticated analytics across their retail landscape. Solutions like its advanced mobile app scraping technology further that goal. And it hasn’t stopped there. Intelligence Node has extended its proprietary technology to the metaverse, scraping retail stores across major platforms like Roblox, Meta, Decentraland, Sandbox, etc.
Mobile commerce is gaining mainstream popularity, with users spending hours on their mobile devices every day. For retail, this means increased opportunity and growing competition, making mobile app scraping critical to maintaining competitive prices, assortments, and digital shelf ranking.