Abstract

In the 21st century, transportation has brought great convenience to people, but automobile transportation is also a major source of greenhouse gas emissions and a driver of climate change. As the world moves towards a greener environment, the production and use of electric vehicles (new energy vehicles) are rising. With the continuous growth in the number of new energy vehicles, strong government support for the construction of charging piles is necessary, and real-time, effective management of these piles has become a practical problem that the relevant departments need to solve. This paper uses the information research method to fuse the large volume of heterogeneous data generated in the vehicle network by the charging piles that serve new energy electric vehicles and introduces cloud computing as the storage module to facilitate the storage and expansion of this big data. It proposes a cloud-computing-based heterogeneous data fusion scheme for the acquisition, storage, and fusion of heterogeneous data in the vehicle network. Test results show that the system is stable and effective in practical application and meets its design requirements. What is the significance of analyzing charging pile big data? From the supply side, obtaining users’ charging behaviour data helps to build a digital map of charging piles for new energy vehicles, connect the service information of vehicle enterprises and charging pile enterprises, and provide comprehensive, real-time charging information covering a wide range of vehicles, which can alleviate much of the information asymmetry in current charging information services.

1. Introduction

Green transportation matters not only because it embodies the concept of sustainable development but also because climate stability affects human health [1]. China’s new energy vehicle industry has strong momentum. It has developed rapidly in model range, technology research and development, and the consumer market and has made breakthroughs in enterprises, technology, and market. However, it also faces great challenges: weak industrial scale effects, high cost and price, short battery life and other battery problems, and inconvenient charging of pure electric vehicles [2].

The first problem to be solved for electric vehicles is the battery, specifically its weight and service life. The convenience of charging is also an important factor for mass production. A car with an ordinary 50-liter tank can be refueled in 5 or 6 minutes, whereas it is uncertain whether the charging time of electric vehicles can be brought down to a few minutes. One electric vehicle, for example, can be charged only to 70% in 10 minutes at a special charging station and takes 6-7 hours to charge fully from a household 220 V socket. In China, many cars are parked in underground parking lots or in community parking spaces, few of which have charging sockets. Setting up charging stations at gas stations is a good option, but the charging time must be guaranteed, and the service life of the battery after repeated charging is still unknown. Therefore, the construction of charging piles will play an important role in the normal operation of new energy electric vehicles, and how to build charging piles reasonably has become an urgent problem for the government to solve [3, 4].

In the long run, big data is expected to change the competitive ecology of the new energy vehicle market. On the one hand, big data is conducive to understanding customers’ consumption preferences and realizing customized product services. In the future, automobile manufacturing is expected to present a new format of “hardware + software + services”; on the other hand, it is expected to connect big data with the power battery traceability and retirement platform, and the full life cycle management of the power battery is also expected to achieve a perfect combination of digitalization and informatization [5].

Establishing a long-term mechanism for the healthy development of new energy vehicles is inseparable from integrating industrial chain resources. The real value of big data lies in analyzing data, mining its value in depth, providing high-quality digital services, and producing results that benefit the industry, so as to achieve the long-term development goal. Data acquisition is the premise of mining the value of new energy vehicle big data. Only by promoting data sharing for new energy vehicles can greater market value be realized and better services be provided for users. However, interconnection between industries is difficult to realize at this stage. Charging difficulty is one of the main problems restricting the development of new energy vehicles; it depends not only on the number of charging piles but also on the charging service of the charging pile enterprises. Many enterprises now provide charging services, and because charging data are their core data, the relevant data of the pile enterprises are difficult to integrate [6].

Big data applications interconnect cars, people, and piles, integrate charging pile data into mobile phones and in-vehicle head units, and provide new energy vehicle owners with attentive data services at all times. A more enjoyable user experience will also help promote new energy vehicles. Driving behavior is a typical application scenario of big data. Driving behavior not only affects energy consumption but also shapes the driver’s driving portrait, which covers driving habits, charging habits, emergency response, and other scenarios [7, 8].

At present, many enterprises are fully mining the potential value of big data in the field of new energy vehicles, for example, by monitoring operation data to improve users’ usage behaviour and realize intelligent health management of the whole vehicle and the power battery; upgrading the battery management system (BMS) over the air (OTA) to ensure the charging safety of the vehicle; building an intelligent interactive system between the BMS on the edge and the cloud to make the edge BMS functions more accurate; building a high-precision calculation model that eliminates estimation deviation as far as possible to achieve high-precision battery capacity calculation; building system models of battery capacity and battery safety to realize early safety warning; and analyzing power battery attenuation data to classify batteries rapidly for echelon utilization and optimize the whole life cycle management of the power battery [9, 10].

The integration of massive heterogeneous data in the Internet of Vehicles is an important technical means to build a green city and green transportation. With the support of Internet technology, valuable information can be quickly and accurately extracted from massive traffic data, and the foresight, initiative, timeliness, and coordination of traffic management can be greatly improved. Under the background of the rapid development of the Internet of Vehicles, heterogeneous data fusion in the Internet of Vehicles based on cloud computing will certainly play an important role in the improvement of road traffic management, so as to make green traffic more “sustainable” [5].

2. Big Data Foundation of New Energy Vehicles

Traditional data analysis is mainly based on structured data such as tables, whose form is relatively fixed. With the rise of computers, the Internet of Things, and other technologies, large amounts of unstructured data, such as images, sounds, and videos, began to emerge, and the scale of data showed explosive growth. Fully understanding the basis of big data and accurately grasping its characteristics will help to mine the intrinsic value of massive data [11].

2.1. Big Data Features

In 1890, Hollerith, an American statistician, invented an electric machine to tabulate American census data and completed in one year work that had been expected to take eight years, which is considered the earliest application of the big data method. At the end of the 20th century, human society entered the era of the computer and the Internet, and data entered a period of explosive growth. In 2008, the global data volume was only 0.49 ZB, while in 2017, it was 21.6 ZB, 44 times that of 2008. Researchers predict that, by 2020, the global data volume will reach 35 ZB. Taking today’s well-known Internet enterprises as examples, Google needs to process nearly 100 PB of data every month, and Taobao’s daily online transaction data reach 10 TB [3].

However, big data is not merely a large-scale data set, and defining it by quantity alone is one-sided. Laney [12], an analyst at Meta Group, proposed that big data management would face three major challenges: volume, velocity, and variety. On this basis, some researchers added the two concepts of veracity and value, forming the “5V” characteristics of big data. The “5V” characteristics are mainly reflected in the processing, calculation, and storage of big data. Traditional technology is not capable of big data analysis and processing, nor can it perform real-time online calculation on big data. Moreover, traditional data processing technology is mostly designed for structured data and cannot deal with unstructured data such as text, pictures, and media. Developing big data processing technology is an effective way to meet current data processing needs [12].

2.2. Big Data Processing Technology

Before the emergence of modern big data processing architectures, technicians used the MPI (Message Passing Interface) programming model to process large-scale data. MPI is a high-performance parallel message passing interface that was the main vehicle for data programming and computing at that time. It can make full use of hardware resources for parallel computing and is widely used in physics, meteorology, and other fields [13].

Because MPI lacked good architectural support, offered little automation, and required complex programming that placed a heavy burden on programmers, researchers developed the Hadoop MapReduce processing system. MapReduce is designed mainly for parallel processing of large-scale data. It was first developed by Google’s research team for internal data processing. The technical team of Apache Nutch then extended MapReduce into Hadoop MapReduce, an open-source parallel computing framework written in Java. With its strong task scheduling, data recovery, and system optimization functions, it has become a mainstream big data processing system that is widely used in academia and industry [14].
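The map, shuffle, and reduce stages behind MapReduce can be illustrated with a minimal single-process sketch. It is not Hadoop code: the function names and the charging-session records are hypothetical, and a real framework would distribute each stage across many machines.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record and yield its (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group all values that share a key, as the framework does between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Collapse each key's list of values into one result."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical charging sessions: (pile_id, energy delivered in kWh).
sessions = [("pile-A", 12.5), ("pile-B", 30.0), ("pile-A", 7.5)]

def emit_energy(record):
    pile_id, kwh = record
    yield pile_id, kwh

def total_energy(pile_id, values):
    return sum(values)

totals = reduce_phase(shuffle(map_phase(sessions, emit_energy)), total_energy)
```

Here `totals` holds the energy delivered per pile. Because the mapper and the reducer only see one record or one key at a time, the same two functions could in principle be run in parallel on separate partitions of the data.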

MapReduce is designed for offline batch processing of data, and its efficiency is low when fast online processing is required. The Spark big data processing system, developed in 2013, absorbed the advantages of Hadoop MapReduce, greatly improved parallel computing performance, made up for the latter’s shortcomings in real-time computing, and made modern big data analysis technology more complete. Early Spark used Scala, a specialized functional programming language, as its development language, which restricted its use and promotion. Support for many common programming languages (such as Python and R) was later added, along with updates to its dataset data structures, and Spark has gradually been accepted by the majority of data researchers [15].

In addition to Spark, Flink from Europe is also a commonly used parallel big data processing system. Flink supports both streaming and batch computing and has rich data conversion interfaces. Different from Spark, Flink has a unique storage management mechanism, which can save a lot of computing space. At the same time, it can automatically optimize the program to avoid redundant result cache. It provides a variety of programming language interfaces such as Java, Scala, and Python to further facilitate the use of users; Flink also provides table computing, complex event processing, and other big data computing libraries, which can be integrated with other mainstream processing systems. Users can choose corresponding processing systems flexibly and easily according to their actual needs [16].

The aforementioned big data processing systems can be divided into four categories according to the processing objects and processing forms: batch processing system, streaming real-time processing system, real-time interactive query system, and graph data processing system.

2.3. National Monitoring and Management Platform for New Energy Vehicles

The premise of combining big data technology with new energy vehicles is a big data platform that efficiently collects massive data resources. To solve the safety problems of new energy vehicles in China, improve the supervision of the new energy vehicle industry, and promote its development, the Ministry of Industry and Information Technology established the national monitoring and management platform for new energy vehicles (hereinafter referred to as the national platform) in Beijing in 2016. By 2019, the number of vehicles connected to the national platform had exceeded 2.2 million. It is estimated that 7 million vehicles will be connected in 2020 and 80 million in 2025. The national platform plays an important supporting role for the government in strengthening the safety supervision of new energy enterprises and vehicles [17].

The national platform architecture is mainly based on the Linux system and the Java programming language and is built with Hadoop. Hadoop is currently the mainstream big data processing architecture. There are many precedents at home and abroad for building big data platforms on Hadoop, covering medical, banking, rail transit, power system, and other fields, and the approach is very mature [14, 15].

The existing data types of the national platform are mainly divided into static data and dynamic data. Static data, also known as file data, consist of basic vehicle information, such as license plate number, vehicle VIN number, vehicle manufacturer, vehicle type, and sales area. The data types of dynamic data are divided into online real-time running data and offline storage historical data. The difference between the two types of data lies in different stages and storage locations. The real-time operation data are the current transfer data, which are constantly updated and replaced and stored in the real-time cache so that the staff can monitor the safety of vehicle operation. The replaced data will be converted into historical data and stored in a dedicated server for researchers to call and check [15].
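The relationship between real-time and historical data described above can be sketched as follows. This is an in-memory illustration under stated assumptions, not the national platform's implementation: the class name, the VIN, and the frame fields are hypothetical.

```python
from collections import defaultdict

class VehicleDataStore:
    """In-memory stand-in for the real-time cache and the historical store."""

    def __init__(self):
        self.realtime = {}                # VIN -> latest frame (real-time cache)
        self.history = defaultdict(list)  # VIN -> replaced frames, oldest first

    def update(self, vin, frame):
        """Store a new frame; the frame it replaces becomes historical data."""
        previous = self.realtime.get(vin)
        if previous is not None:
            self.history[vin].append(previous)
        self.realtime[vin] = frame

store = VehicleDataStore()
store.update("VIN001", {"t": 0, "speed_kmh": 40})
store.update("VIN001", {"t": 10, "speed_kmh": 55})
```

After the second update, the cache holds only the frame with `t = 10`, and the earlier frame has moved to the history list, which mirrors the "constantly updated and replaced" behaviour in the text.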

There are three kinds of data frame intervals of real-time operation data: 1 s, 10 s, and 30 s. According to the requirements of GB/T 32960, the data items are mainly collected from the following systems: power battery system, motor drive system, vehicle control system, and other parts. The data of the power battery system mainly include the total voltage and current of the battery system, SOC, cell voltage, and characteristic point temperature of the battery system. The data of the motor drive system mainly include motor voltage and current, speed, torque, and temperature. Vehicle control system data mainly include vehicle speed, gear information, accelerator pedal travel, and GPS position. In addition, there are air conditioning information, tire pressure status, and other information data [6].
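One way to picture a real-time frame is as a record grouped by subsystem. The sketch below is only an illustration: all class and field names and units are assumptions chosen to match the data items listed above, and it does not reproduce the message format defined in GB/T 32960.

```python
from dataclasses import dataclass, field
from typing import List

ALLOWED_INTERVALS_S = (1, 10, 30)  # frame intervals named in the text

@dataclass
class BatteryData:
    total_voltage_v: float
    total_current_a: float
    soc_percent: float
    cell_voltages_v: List[float] = field(default_factory=list)
    probe_temperatures_c: List[float] = field(default_factory=list)

@dataclass
class MotorData:
    voltage_v: float
    current_a: float
    speed_rpm: float
    torque_nm: float
    temperature_c: float

@dataclass
class VehicleControlData:
    speed_kmh: float
    gear: str
    accelerator_pedal_percent: float
    longitude: float
    latitude: float

@dataclass
class RealTimeFrame:
    vin: str
    interval_s: int
    battery: BatteryData
    motor: MotorData
    control: VehicleControlData

    def __post_init__(self):
        # Reject frames whose reporting interval is not one of the three kinds.
        if self.interval_s not in ALLOWED_INTERVALS_S:
            raise ValueError(f"unsupported frame interval: {self.interval_s} s")
```

Grouping the fields by subsystem keeps battery, motor, and vehicle control data separable, so an analysis of, say, charging behaviour can read only the battery part of each frame.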

Based on the principle of privacy protection, new energy vehicles in the private sector only transmit complete monitoring data in the event of failure warning. In the field of public transport, new energy buses, taxis, and logistics vehicles all transfer complete data around the clock to ensure the safety of public transport. The national platform mainly performs the industry regulatory responsibilities, while the researchers use the operation data to analyze and study the battery system, driving behaviour, vehicle energy consumption, charging behaviour, etc., so as to promote the overall development of the new energy automobile industry [18].
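The transmission rule in this paragraph reduces to a small decision function. The sketch below is a simplified reading of the text; the category labels and the function name are hypothetical.

```python
# Vehicle uses that the text treats as public transport (labels are assumptions).
PUBLIC_SECTOR = {"bus", "taxi", "logistics"}

def sends_complete_data(vehicle_use: str, fault_warning: bool) -> bool:
    """Return True if the vehicle should transmit complete monitoring data."""
    if vehicle_use in PUBLIC_SECTOR:
        return True           # public transport reports around the clock
    return fault_warning      # private vehicles report only on a failure warning
```

For example, `sends_complete_data("private", False)` is `False`, while a private vehicle with an active failure warning, or any bus, taxi, or logistics vehicle, returns `True`.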

2.4. Foreign Development Status

At present, global energy and environmental systems face huge challenges. The automobile industry, a major contributor to oil consumption and carbon dioxide emissions, needs revolutionary change. A global consensus on the development of new energy vehicles has been reached: in the long run, pure electric drive, including pure electric and fuel cell technology, will be the main technical direction, and in the short term, hybrid electric and plug-in hybrid power will be an important transition route. The development of new energy vehicles worldwide still faces some common problems, such as breakthroughs in key technologies, transformation of the automobile industry, construction of infrastructure, and consumer acceptance [19]. The main leaders in the development of new energy vehicles are the United States, Japan, and some European countries. These countries started much earlier than China, and each has its own development focus [19].

The United States has long focused on reducing oil dependence and ensuring energy security. It has treated the development of new energy vehicles as an important measure for fundamentally ending oil dependence in transportation and has fixed the strategic position of new energy vehicles in laws and regulations. As early as the Clinton period, the United States proposed plans to improve fuel economy, with hybrids as the main technical solution at the time. In the Bush era, the goal became zero emissions and zero oil dependence, and the main technical solution was hydrogen fuel-cell vehicles. Later, a plan was put forward to achieve 20% oil replacement and savings in ten years, mainly through biomass fuel. After the international financial crisis, the Obama administration made the vigorous development of electric vehicles an important part of its new energy strategy. It proposed a total of 4 billion US dollars for power batteries and for the development and industrialization of electric vehicles, with the focus on electrically powered vehicles [20].

Compared with the United States and Japan, Europe focuses more on greenhouse gas reduction strategies. Meeting increasingly stringent carbon dioxide emission limits has become a major driving force for the development of new energy vehicles in Europe. In the early days, this development was mainly based on biomass fuels, natural gas, and hydrogen fuels. At the beginning of this century, a target of 23% oil replacement by 2020 was proposed. Recently, Europe has paid great attention to electric vehicles. For example, Germany attaches great importance to pure electric vehicles and has put forward industrialization and marketization goals for 2012, 2016, and 2020 [21].

3. Overall System Design

With the data currently obtained by new energy electric vehicle management systems, finding a charging pile still takes a lot of manpower and financial resources [22]. This paper builds a system hardware platform that covers heterogeneous data fusion in the vehicle network and the application of massive heterogeneous data in cloud storage [16]. The system has three layers: data acquisition, data storage, and data fusion display. The data acquisition end is responsible for collecting and uploading data; the data storage end uses cloud storage and is responsible for the classified storage of heterogeneous vehicle network data; and the data fusion display layer communicates with the data storage layer to display the fused heterogeneous data. The complete architecture is shown in Figure 1.

The mobile terminal serves as the data acquisition end for the Internet of Vehicles. It is a smart device with built-in sensors and the Android operating system, such as a smartphone or a smart rear-view mirror. The server and database side is a distributed system of three hosts that runs the distributed file system (HDFS), the nonrelational database MongoDB, and the relational database MySQL. The hardware for the data fusion display is a laptop or desktop computer, which realizes the data fusion applications by remotely calling the three kinds of heterogeneous data (text, picture, and video) stored on the server.

4. Data Acquisition Module

One advantage of the Android operating system is its rich support for integrating web applications, so users can easily develop applications according to their own needs. This module is the bottom layer of the entire IoT heterogeneous data fusion system [16]. As the sensing layer of the three-tier IoT architecture, it is the data source of the entire system and is mainly responsible for collecting real-time GPS data, picture information, and video information of vehicles. The vehicle data collected by the Android data module lay the foundation for the subsequent implementation of vehicle management [23].

In the whole system, a smart device with the Android operating system is used as the data acquisition end. The smart device comes with a variety of sensors and peripherals, such as a GPS positioning system, cameras, and a wireless network card. The real-time GPS position of the vehicle is obtained from the device’s own sensor, and photos and videos are recorded with the camera. The heterogeneous data collected throughout the process can be uploaded to the cloud storage database server over a wireless network or mobile data. The architecture of the Android data acquisition end is shown in Figure 2.

The Android data acquisition end has four main functional modules: a user registration module for rights management; a positioning information module, which obtains real-time GPS text information and uploads it to the MySQL relational database; an image upload module, which selects a picture from the device’s photo storage and uploads it in real time to the MongoDB nonrelational database; and a video upload module, which captures video in real time and uploads it to the HDFS for storage [24].

5. Cloud Storage Data Module

This module saves, in real time, the data uploaded by the Android data acquisition module. The cloud storage model in Figure 3 shows how the data are stored. The main purpose is to classify the three kinds of heterogeneous data (text, picture, and video) and store each in the storage system that suits its characteristics. The relational database MySQL was selected for text data and the nonrelational database MongoDB for image data. Video data are stored in the distributed file system (HDFS) because they take up the most storage space.
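The classification rule above amounts to a lookup from data type to storage system. The following sketch uses in-memory lists as stand-ins for MySQL, MongoDB, and HDFS; the class name, the type labels, and the payloads are hypothetical.

```python
# Data type -> storage system, as selected in the text.
STORAGE_BY_TYPE = {"text": "mysql", "picture": "mongodb", "video": "hdfs"}

class CloudStorageRouter:
    """Routes each upload to the backend chosen for its data type."""

    def __init__(self):
        # Each backend is modelled as a plain list for illustration only.
        self.backends = {name: [] for name in STORAGE_BY_TYPE.values()}

    def store(self, data_type, payload):
        """Save the payload and return the name of the backend that took it."""
        try:
            backend = STORAGE_BY_TYPE[data_type]
        except KeyError:
            raise ValueError(f"unsupported data type: {data_type}") from None
        self.backends[backend].append(payload)
        return backend
```

A call such as `CloudStorageRouter().store("video", b"...")` returns `"hdfs"`. Keeping the mapping in one table means a new data type can be supported by adding one entry rather than changing the upload logic.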

5.1. MySQL Text Data Storage

The text data uploaded by the data acquisition end are stored in the MySQL database; the main uploaded text information is real-time GPS location information. The uploaded data have 7 fields, which are listed and described in the GPS data fields of Table 1. A sample of the data stored in MySQL is shown in Table 2.

The data obtained by the Android client are first uploaded to the Tomcat server and then transferred from the Tomcat server to the MySQL database. The MySQL table stores the following fields: ID (auto-increment ID), Longitude, Latitude, Time, Serial number, Mac (IMSI code), and Remark (IMEI code).
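The seven fields can be pictured as a single table. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the table name, column types, and sample values are assumptions, since Table 1 is not reproduced here.

```python
import sqlite3

# Seven columns matching the fields listed in the text (types are assumed).
SCHEMA = """
CREATE TABLE gps_record (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    longitude     REAL NOT NULL,
    latitude      REAL NOT NULL,
    time          TEXT NOT NULL,
    serial_number TEXT NOT NULL,
    mac           TEXT,   -- holds the IMSI code, per the text
    remark        TEXT    -- holds the IMEI code, per the text
)
"""

def open_store():
    """Create an in-memory database with the GPS table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn

def insert_fix(conn, longitude, latitude, time, serial_number, mac, remark):
    """Insert one GPS fix and return its auto-generated ID."""
    cur = conn.execute(
        "INSERT INTO gps_record "
        "(longitude, latitude, time, serial_number, mac, remark) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (longitude, latitude, time, serial_number, mac, remark),
    )
    conn.commit()
    return cur.lastrowid
```

In the system described here, the insert would be issued by the server-side code behind Tomcat rather than by the Android client directly; parameterized queries, as used above, keep uploaded values from being interpreted as SQL.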

The upload process runs in real time. Once the Android client application is started, the acquired GPS text data are uploaded to the Tomcat server in real time, and the data received by the Tomcat server are in turn written to the MySQL relational database in real time. The data stored in the MySQL database can be used for path backtracking as well as for related applications such as positioning.

5.2. MongoDB Image Data Storage

The image storage module uses a distributed MongoDB nonrelational database cluster built from three hosts. The cluster takes the form of Sharding + Replica Sets. Sharding allows machines to be added so that large files can be sliced and stored across them. Replica Sets ensure that each shard node has automatic backup and automatic failover capabilities [24].
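The division of labour between slicing and replication can be sketched conceptually. The code below is plain Python and does not call MongoDB or reproduce its chunk balancing or election protocol; the function names, the round-robin placement, and the "first live member" rule are simplifying assumptions.

```python
def split_into_chunks(data: bytes, chunk_size: int):
    """Slice a large payload into fixed-size pieces (the last may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def assign_chunks(num_chunks: int, shard_names):
    """Spread chunk indexes across shards in round-robin order."""
    return {i: shard_names[i % len(shard_names)] for i in range(num_chunks)}

def elect_primary(replica_set, alive):
    """Return the first listed member that is still alive, or None if all failed."""
    for member in replica_set:
        if member in alive:
            return member
    return None
```

For instance, `split_into_chunks(b"abcdefghij", 4)` gives three chunks, and `elect_primary(["A", "B", "C"], {"B", "C"})` picks `"B"` after `"A"` fails. Sharding answers "where does each slice go", while the replica set answers "who serves a shard when a host is down".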

When the cloud storage system was built, all nodes were installed from the CentOS-7-x86_64-DVD-1161.iso image. The MongoDB cluster consists of three servers: Server A: 192.168.118.100, Server B: 192.168.118.101, and Server C: 192.168.118.102. The MongoDB architecture is shown in Table 3 [24].

5.3. HDFS Video Data Storage

A fully distributed Hadoop cluster is installed to store the video data uploaded by the data acquisition end. HDFS is a distributed file system designed to run on commodity hardware; it is highly fault-tolerant and can be deployed on low-cost hardware. It provides high-throughput access to application data and suits applications with very large data sets [25]. The video storage module therefore uses a three-host, fully distributed Hadoop cluster. The allocation of nodes in the cluster is shown in Figure 4.

The Hadoop (2.5.2) cluster is configured in two steps. First, the configuration is done on Hadoop252. Second, the scp command is used to copy the configuration files to the slave01 and slave02 subnodes.

Three virtual hosts are created with VMware Workstation Pro for platform testing. In the Hadoop cluster, Hadoop252 is the management node of HDFS and slave01 is the management node of YARN; all three hosts (Hadoop252, slave01, and slave02) run the data storage node and YARN NodeManager processes, as shown in Figure 4.

To upload video data to the HDFS in real time, the HDFS is mounted through NFS (Network File System). NFS provides file sharing across a network: it lets computers share resources over TCP/IP, and a local NFS client application can transparently read and write files on a remote NFS server as if they were local files. This paper uses NFS to mount the HDFS to a local directory, so the local directory and the HDFS directory share files and show the same content. Video data uploaded by the Android client to the local directory therefore appear in the HDFS in real time [26].
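Once the HDFS is mounted to a local directory, storing a video reduces to an ordinary file copy. The sketch below illustrates this; the function name is hypothetical, and a temporary directory stands in for the real mount point, whose path is not given in the text.

```python
import shutil
import tempfile
from pathlib import Path

def upload_video(source, mount_dir):
    """Copy a received video into the directory where HDFS is mounted via NFS."""
    source = Path(source)
    mount_dir = Path(mount_dir)
    if not mount_dir.is_dir():
        raise FileNotFoundError(f"mount point not available: {mount_dir}")
    target = mount_dir / source.name
    shutil.copy2(source, target)  # an ordinary local write; NFS forwards it
    return target

# Demo: a temporary directory plays the role of the NFS mount point.
with tempfile.TemporaryDirectory() as tmp:
    mount = Path(tmp) / "hdfs_mount"
    mount.mkdir()
    clip = Path(tmp) / "clip.mp4"
    clip.write_bytes(b"demo")
    stored = upload_video(clip, mount)
    copied_ok = stored.read_bytes() == b"demo"
```

The server code needs no HDFS client library in this arrangement, which is the practical benefit of the mount: any program that can write a local file can store video in the cluster.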

6. Data Fusion Display Module

With the rapid development of the Internet, many information systems related to the Internet of Vehicles have gradually shifted from the C/S architecture to web applications based on the B/S architecture. This module is developed mainly with the B/S architecture: the functions are implemented on the server, and the browser side invokes them through the corresponding calls. The main data fusion modules are login and registration, real-time location, and path backtracking [16].

6.1. Node.js Real-Time Location

The real-time location display application is developed in JavaScript on the Node.js platform. It uses Baidu Map as the page that displays the system results, collects user information, and interacts with the user. JavaScript calls Baidu Map through the Baidu Map API to add custom function components that meet users’ needs. The Baidu Map API is a set of application interfaces, based on the Baidu Map service, provided free to developers. Developers load the Baidu Map API into a page with a <script> tag and can then call it from JavaScript code. The resulting data are presented in a graphical interface on the map.

When the Android client wirelessly transmits location, picture, and video information to the server, the server extracts, analyzes, and processes it. Each vehicle is monitored in real time according to the location information uploaded by the client and is displayed on Baidu Map together with the uploaded real-time image. Clicking a vehicle displays its real-time location information and the corresponding picture information.
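Showing each vehicle at its current position requires selecting the newest uploaded fix per device. The sketch below shows that selection step only; it is written in Python rather than the JavaScript used by the system, and the row keys are hypothetical. Timestamps are assumed to be strings in one sortable format, so string comparison matches chronological order.

```python
def latest_positions(rows):
    """Return the most recent row for each device serial number."""
    latest = {}
    for row in rows:
        serial = row["serial_number"]
        # Keep the row only if it is newer than the one already held.
        if serial not in latest or row["time"] > latest[serial]["time"]:
            latest[serial] = row
    return latest
```

The resulting dictionary has one entry per device, which maps directly onto one marker per vehicle on the map page.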

6.2. Java Web Path Backtracking

Dynamic page display inevitably relies on Ajax, a web development technique for creating interactive web applications that can refresh part of a page without reloading the whole page. The path backtracking results are displayed to the user dynamically through Ajax.

The application implementation process is mainly carried out from the following three aspects, namely, data reading, loading data, and map page.

6.2.1. Data Reading

This step queries the device serial number, time, and GPS latitude and longitude that are dynamically stored in the MySQL database, which provides the data needed to draw the vehicle trajectory.

6.2.2. Loading Data

This application is developed with Java Web. The query conditions received from the page are processed and used as the filter conditions of the SQL query; the queried records are encapsulated into a Java class, processed, and transmitted to the page for use in map development.

6.2.3. Map Page

Through calls to the Baidu Map API, Baidu Map is extended by secondary development to draw the vehicle’s historical track from the queried data. The page handles the layout in the browser, collects the query conditions, sends the query request, and processes the response data: it filters the vehicle location information and the device serial number out of the returned JSON object and finally implements the path backtracking function.
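The three steps of path backtracking (read the records, filter them by the query conditions, return JSON for the map page) can be sketched end to end. The code is a Python illustration rather than the Java Web implementation, `sqlite3` stands in for MySQL, and the table name, column names, JSON keys, and sample rows are all assumptions.

```python
import json
import sqlite3

def build_demo_db():
    """Create an in-memory table with a few hypothetical GPS fixes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gps_record "
        "(serial_number TEXT, time TEXT, longitude REAL, latitude REAL)"
    )
    conn.executemany(
        "INSERT INTO gps_record VALUES (?, ?, ?, ?)",
        [
            ("dev-1", "2020-01-01 08:10:00", 116.41, 39.91),
            ("dev-1", "2020-01-01 08:00:00", 116.40, 39.90),
            ("dev-2", "2020-01-01 08:05:00", 121.47, 31.23),
            ("dev-1", "2020-01-01 09:30:00", 116.45, 39.95),
        ],
    )
    return conn

def track_json(conn, serial_number, start, end):
    """Return one device's fixes within [start, end], ordered by time, as JSON."""
    rows = conn.execute(
        "SELECT time, longitude, latitude FROM gps_record "
        "WHERE serial_number = ? AND time BETWEEN ? AND ? ORDER BY time",
        (serial_number, start, end),
    ).fetchall()
    points = [{"time": t, "lng": lng, "lat": lat} for t, lng, lat in rows]
    return json.dumps({"serial_number": serial_number, "points": points})
```

The page would request this JSON through Ajax and pass the ordered points to the map API to draw the track as a polyline. Ordering by time in the query matters, because the track is drawn by connecting consecutive points.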

7. Conclusion

In this paper, the massive heterogeneous data generated by the charging piles that serve new energy electric vehicles in the Internet of Vehicles are fused, and cloud computing is introduced as the storage module, which facilitates the storage and expansion of massive data. To address the acquisition, storage, and fusion of heterogeneous data in the Internet of Vehicles, a cloud-computing-based heterogeneous data fusion scheme is proposed. Testing shows that the system runs stably and effectively in practical application and meets its design requirements.

In the future, if taxi demand data, charging pile data, and vehicle operation data can be fully connected, vehicle operation, charging pile information, vehicle residual power, order distribution, and so on can be comprehensively optimized and upgraded, which will greatly benefit the whole travel ecosystem. Data will drive continuous innovation in R&D, products, manufacturing, supply chains, and business models, and the automobile industry will build a new industrial ecosystem around big data. At the same time, massive data are driving the rise of computing and analysis platforms, and vehicle enterprises urgently need to build up their computing capabilities to win future differentiated competition.

Data Availability

The authors confirm that the data supporting the findings of this study are available within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors’ Contributions

Li Qin Hu, Amit Yadav, and Asif Khan were responsible for conceptualization, methodology, and writing and preparing the original draft. Hong Liu was responsible for writing and preparing the original draft, validation, formal analysis, and supervision of the study. Amin Ul Haq worked on facts visualization.