Infrastructure

From OpenCellID wiki

Latest revision as of 19:02, 3 January 2015

[Figure: OpenCellID server structure]

Servers

Server | Software | Operating system | Resources (vCPU, RAM)
prod-ocid-web-01.colt.enaikoon.de | Apache + Tomcat + MongoS | Ubuntu 12.04 LTS | 2 vCPU, 4 GB
prod-ocid-web-02.colt.enaikoon.de | Apache + Tomcat + MongoS | Ubuntu 12.04 LTS | 2 vCPU, 4 GB
prod-ocid-cfgsrv-01.colt.enaikoon.de | MongoDB ConfigServer | Ubuntu 12.04 LTS | 1 vCPU, 2 GB
prod-ocid-cfgsrv-02.colt.enaikoon.de | MongoDB ConfigServer | Ubuntu 12.04 LTS | 1 vCPU, 2 GB
prod-ocid-cfgsrv-03.colt.enaikoon.de | MongoDB ConfigServer | Ubuntu 12.04 LTS | 1 vCPU, 2 GB
prod-ocid-db-01.colt.enaikoon.de | MongoDB Replica Set | Ubuntu 12.04 LTS | 4 vCPU, 48 GB
prod-ocid-db-02.colt.enaikoon.de | MongoDB Replica Set | Ubuntu 12.04 LTS | 4 vCPU, 48 GB
prod-ocid-db-03.colt.enaikoon.de | MongoDB Replica Set | Ubuntu 12.04 LTS | 4 vCPU, 48 GB

Software stack

Operating System

All OpenCellID servers run Ubuntu Linux 12.04 LTS.

Frontend

  • the web frontend uses the Apache web server as a reverse proxy that forwards web requests to Tomcat
  • the OpenCellID web application runs on Tomcat and reads and writes cell measurement data from/to the MongoDB database backend
  • jQuery Mobile provides a cross-platform user interface
  • the map is displayed using OpenStreetMap together with the Leaflet library

Database Backend

The OpenCellID backend uses the Kafka queuing system to handle periodic load peaks. Kafka producers embedded in the web application send all incoming data to the Kafka brokers; Kafka consumers pull the data from the brokers, process the measurements and store them in MongoDB.
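
The decoupling described above can be illustrated with a small Java sketch. It uses a JDK `LinkedBlockingQueue` as a stand-in for the Kafka brokers and is not the Kafka client API: several producer threads (playing the web application) push measurements without waiting, and a single consumer thread drains the buffer at its own pace and "stores" them in a list that stands in for MongoDB. The `Measurement` record, its fields and the `run` method are hypothetical, and unlike Kafka this buffer lives in memory only, so it is neither durable nor distributed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Produce -> buffer -> consume, with a JDK queue standing in for the Kafka brokers.
 * This is NOT the Kafka client API: the queue is in-memory, not durable and not distributed.
 */
final class IngestPipelineSketch {

    /** Hypothetical measurement record; the fields are illustrative only. */
    record Measurement(long cellId, int signalDbm) {}

    /** Sentinel instance that tells the consumer no more data will arrive (compared by identity). */
    private static final Measurement END = new Measurement(-1L, 0);

    /**
     * Each inner list is the data received by one producer (e.g. one web-app request handler).
     * Returns everything the single consumer "stored", in the order it was taken from the buffer.
     */
    static List<Measurement> run(List<List<Measurement>> perProducer) {
        // Unbounded buffer: producers never wait for the consumer, so bursts are absorbed.
        BlockingQueue<Measurement> buffer = new LinkedBlockingQueue<>();
        List<Measurement> store = new ArrayList<>(); // stand-in for MongoDB; written only by the consumer

        Thread consumer = new Thread(() -> {
            try {
                for (Measurement m = buffer.take(); m != END; m = buffer.take()) {
                    store.add(m); // a real consumer would process and write to MongoDB here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        List<Thread> producers = new ArrayList<>();
        for (List<Measurement> batch : perProducer) {
            Thread p = new Thread(() -> batch.forEach(buffer::add)); // never blocks: the queue is unbounded
            producers.add(p);
            p.start();
        }

        try {
            for (Thread p : producers) p.join();
            buffer.add(END);  // every producer is done; tell the consumer to stop
            consumer.join();  // join() also makes the consumer's writes to store visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        }
        return store;
    }
}
```

Because producers never block, a burst of incoming data only lengthens the queue and the consumer catches up afterwards, which is the peak-smoothing effect the backend uses Kafka for.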

The database backend, which currently holds 7 million cell towers and about 1.2 billion measurements (as of 1 January 2015), is a MongoDB cluster running on six servers:

  • three servers act as MongoDB configuration servers and also host the ZooKeeper instances
  • the other three servers hold the data as one replica set spread across all three of them, and also run the Kafka brokers
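
Why three servers for the replica set? A MongoDB replica set can elect a primary only while a strict majority of its voting members are reachable, so the number of server failures it survives follows from simple arithmetic. The Java sketch below assumes that all three database servers are voting members, which this page does not state explicitly; the same majority rule also governs the three-node ZooKeeper ensemble. The class and method names are hypothetical.

```java
/**
 * Majority arithmetic behind a MongoDB replica set: a primary can only be
 * elected while a strict majority of the voting members can reach each other.
 */
final class ReplicaSetQuorum {

    /** Smallest number of voting members that forms a strict majority. */
    static int majority(int votingMembers) {
        if (votingMembers < 1) throw new IllegalArgumentException("need at least one member");
        return votingMembers / 2 + 1;
    }

    /** How many members may fail while a primary can still be elected. */
    static int toleratedFailures(int votingMembers) {
        return votingMembers - majority(votingMembers);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf("%d members: majority %d, tolerates %d failure(s)%n",
                    n, majority(n), toleratedFailures(n));
        }
    }
}
```

For three members the majority is two, so the set can still elect a primary with one database server down; a fourth member would raise the majority to three without tolerating any additional failure.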

Challenges and solutions

The OpenCellID community is very strong and continuously contributes a large number of measurements.
This poses several challenges:

  • High Volume
    data arrives from many different sources and is growing rapidly
  • Scale
    data growth should come with predictable, incremental costs, and adding server resources should require no downtime
  • Data Processing
    analysis and processing of the rapidly growing data must remain efficient

The current solutions are based on the Kafka queuing system and on MongoDB and its features:

  • Native Analytics
    the integrated aggregation framework and Map/Reduce calculate aggregates and analyses in place, without first exporting the data to other systems
  • Advanced Geo Queries
    MongoDB's geospatial support is used to execute complex location-based queries
  • Horizontal Scaling
    sharding lets the application scale horizontally on commodity hardware to accommodate constantly increasing throughput
  • Reduced Total Cost of Ownership (TCO)
    as open-source software, MongoDB and the Kafka queuing system are a very cost-effective solution
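
A typical geo query against cell data asks for every cell within a given distance of a point. MongoDB expresses such queries with operators like `$geoWithin` (with `$centerSphere`) or `$near`, and can use a `2dsphere` index to avoid scanning every document. The plain-Java sketch below only shows what such a query computes, using a linear scan and the haversine great-circle formula; the `Cell` record, the method names and the mean Earth radius of 6371 km are assumptions made for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Brute-force radius query: return every cell whose stored position lies
 * within a given great-circle distance of a point.
 */
final class GeoRadiusQuery {

    /** Mean Earth radius in kilometres (an assumption; other radii give slightly different results). */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Hypothetical cell record; the field names are illustrative only. */
    record Cell(long cellId, double lat, double lon) {}

    /** Great-circle distance in km between two points given in degrees (haversine formula). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // min() guards against rounding pushing sqrt(a) slightly above 1
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Linear scan over all cells, keeping their original order; an index would avoid this. */
    static List<Cell> within(List<Cell> cells, double lat, double lon, double radiusKm) {
        return cells.stream()
                .filter(c -> haversineKm(lat, lon, c.lat(), c.lon()) <= radiusKm)
                .collect(Collectors.toList());
    }
}
```

The scan costs one distance computation per stored cell, which is exactly the work a geospatial index lets the database skip on a collection of millions of cells.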

The brain

Krzysztof Ociepa designed the big-data infrastructure as well as the new OpenCellID server software based on Java, the Kafka queuing system and MongoDB, and also implemented most of the current features after two other developers had failed to do so.

Details about the implemented software and infrastructure can be found above.

There are plans to publish the entire server software as open source to encourage other members of the OpenCellID community to contribute features. This will most likely happen before the end of 2015.