Assessing Brainspace Server Performance
Brainspace Application Server
To achieve fast performance with advanced analytics, Brainspace loads much of the relevant dataset information into system memory. Active datasets, users, search queries, and communication analysis all drive Application server memory needs. To estimate the amount of memory required, add up the Brainspace build components on disk that will be loaded into memory.
Approach
For each of your build folders (/data/brainspace/[hashed build directories]), add up the sizes of the components in the output/brains, output/clusters and output/graph-data folders, then add 40% to account for other memory demands such as user activity, concept searches, Cluster Wheel and classifier activity. The result is a good estimate of the total memory needed for the datasets in your system (assuming all can be kept Active/Running). This total should amount to no more than 70% of the total system/server RAM. Extrapolate this number to cover your future growth needs.
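The manual approach above can be sketched in a few lines of Python. This is a minimal illustration, not the Brainspace script: the function names are hypothetical, and it assumes every directory directly under the data root is a build folder containing the three output subfolders named above.

```python
import os

# Subfolders named in the article whose contents are loaded into memory.
COMPONENT_DIRS = ("output/brains", "output/clusters", "output/graph-data")
# Extra 40% for user activity, concept searches, etc. (figure from the article).
OVERHEAD = 0.40


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def estimate_dataset_memory_bytes(data_root):
    """Return (on_disk_bytes, estimated_memory_bytes) for all build folders.

    Assumption: each directory directly under data_root is a build folder.
    """
    on_disk = 0
    for entry in sorted(os.listdir(data_root)):
        build_dir = os.path.join(data_root, entry)
        if not os.path.isdir(build_dir):
            continue
        for sub in COMPONENT_DIRS:
            on_disk += dir_size_bytes(os.path.join(build_dir, *sub.split("/")))
    return on_disk, on_disk * (1 + OVERHEAD)
```

Calling estimate_dataset_memory_bytes("/data/brainspace") on the Application server would return the raw on-disk total and the total with the 40% allowance added, both in bytes.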
Brainspace has a script that adds up the components of your datasets and estimates memory needs for you. The script, named dataset-ram.sh, can be downloaded as a zip file from the Article Attachments link at the bottom of this article. The script creates a list of all of your datasets and build directories, adds up the relevant file sizes in those directories, and presents a total, which is the memory needed by the Brainspace Java application. Because Java is allocated 70% of total system memory by default, divide this total by 0.7 to get the total system memory requirement for the Application server. As the user root, copy the script to the /data/brainspace directory and run it by typing bash dataset-ram.sh at the command prompt.
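The divide-by-0.7 step can be illustrated with a short calculation. This is only a sketch of the arithmetic described above; the function name is hypothetical, and the 0.70 default should be changed if your Java allocation differs.

```python
# Default share of system memory allocated to Java (figure from the article).
JAVA_SHARE = 0.70


def total_server_ram_gb(java_memory_gb, java_share=JAVA_SHARE):
    """Convert the Java application's memory need into total server RAM."""
    if not 0 < java_share <= 1:
        raise ValueError("java_share must be in the range (0, 1]")
    return java_memory_gb / java_share
```

For example, a script total of 84 GB works out to about 120 GB of total server RAM at the default 70% allocation.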
Brainspace Analytics Server
The heart of your Brainspace instance is the Analytics server, which builds the Brain, a multi-dimensional array (index) that represents the relationships (semantic distance) between a set of documents and terms. It is the Brain that enables our advanced analytics, search and machine learning capabilities. Building the Brain is a deep-analytics process and is very resource intensive.
With adequate server resources (Application and Analytics servers), most Brainspace customers are able to achieve a Brain build rate of 250,000 to 450,000 records per hour. This excludes streaming time, which depends on your data source resources and the network throughput between the data source and Brainspace. Brain build time is also highly dependent on the average document size (more words require more analysis) and on the makeup and complexity of your data. For instance, a large percentage of emails that have the same (or blank) subject will require additional processing during the email threading (EMT) phase of the build.
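As a rough planning aid, the quoted rate range can be turned into a build-time window. The sketch below uses only the 250,000 to 450,000 records-per-hour figures above; the function name is hypothetical, and the result excludes streaming time and ignores document size and data complexity.

```python
# Typical Brain build rates in records per hour (figures from the article).
LOW_RATE = 250_000
HIGH_RATE = 450_000


def brain_build_hours(record_count):
    """Return (fastest_hours, slowest_hours) for a dataset of record_count."""
    if record_count < 0:
        raise ValueError("record_count must not be negative")
    return record_count / HIGH_RATE, record_count / LOW_RATE
```

For a 4,500,000-record dataset this gives a window of 10 to 18 hours. Treat it as a starting expectation to compare against your observed build times, not a guarantee.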
The Analytics server requires a balance of processors and memory for optimal performance. The application is tuned for a 1:4 ratio of processors to memory (GB), but certain portions of the build call for variations on this formula. Phases with a high volume of computations require more processors, such as creation of the Term Document Matrix, where every term in every document is compared to all other terms in the collection.
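A quick way to check a server against the 1:4 guideline is to compute its memory per processor. The sketch below is illustrative only: the function name, the labels, and the 25% tolerance band are assumptions made for this example, not Brainspace-defined thresholds.

```python
# Tuned ratio from the article: 1 processor to 4 GB of memory.
TARGET_GB_PER_PROCESSOR = 4.0


def describe_balance(processors, memory_gb, tolerance=0.25):
    """Classify a server relative to the 1:4 processor-to-memory ratio.

    tolerance is an assumed band around the target, not an official value.
    """
    if processors <= 0:
        raise ValueError("processors must be positive")
    gb_per_processor = memory_gb / processors
    if gb_per_processor < TARGET_GB_PER_PROCESSOR * (1 - tolerance):
        return "memory-light"      # less than about 4 GB per processor
    if gb_per_processor > TARGET_GB_PER_PROCESSOR * (1 + tolerance):
        return "processor-light"   # more than about 4 GB per processor
    return "balanced"
```

For example, 16 processors with 64 GB is balanced, while 32 processors with 64 GB is memory-light relative to the guideline.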
Brainspace has a script to help you assess the build performance of your Analytics server. Download the proc-usage.sh script from the Article Attachments link at the bottom of this article. Unzip the file and, as the user root, copy the script to the /data/brainspace directory on your Brainspace Analytics (Build) server. Run it by typing bash proc-usage.sh at the command prompt.
Review system performance regularly, as often as after every full build, to ensure your Analytics server is optimized for your needs.