Submission Instructions
Create a folder named elec_1520_hw7_your_name. Substitute your first and last name for “your_name” in the folder name.
Place the source code files in that folder. Source code files are .cpp and .h files.
Compress the elec_1520_hw7_your_name folder as a .zip file. Right-click the folder and select Send to -> Compressed (zipped) folder, then upload the .zip file to Canvas.
Problem Statement
Create a program that displays frequency and statistical information for a data set. The format of program output should closely resemble the example output shown below. Output is, of course, dependent on the data set.
Figure 1: Group size 20, Asterisks per count 3
Input
Data sets will be read from files. The program asks the user to input the data file name, opens the file, and obtains the following information from it: group size, asterisks per count, total number of elements in the data set, and the data set contents. All numbers in the file are integers.
Input File Format
Line 1: group size
Line 2: asterisks per count
Line 3: total number of elements in the data set
Lines 4 until end of file: data values (guaranteed to be in the range [0, 255])
You may assume group size, asterisks per count, and total number of elements in the data set will all be positive, non-zero integers.
Two example data files will be provided to aid in testing your solution. It is recommended that you create your own input file, using a simple data set to test the accuracy of your solution. After verifying your solution works as expected, then test it with the larger data sets provided.
Process
Compute a frequency count of the values in the data set. A frequency count is the number of times each value occurs: the number of 0’s, 1’s, 2’s, 3’s, …, 254’s, 255’s.
Calculate the mean, standard deviation, and median values
Output
Output should be written to both a text file and the screen. The text file should have an extension of .txt. Use the same function to write to the file and the screen. Use an ostream parameter to direct the output.
Frequency Data Output
Groups
The group size variable controls the output group count. Figure 1 shows the output for a group size of 20, which results in the groups [0, 19], [20, 39], [40, 59], …, [220, 239], [240, 255]. Note that the last group contains only the remaining values [240, 255], because 256 is not evenly divisible by 20.
The count column for a group is the sum of the frequency counts for the numbers in that group. For example, in Figure 1, the count value of 192 for group [0, 19] is a count of the number of 0’s, 1’s, 2’s, …, 17’s, 18’s, and 19’s.
The histogram depicts each group's count, using the asterisks per count value to determine the number of asterisks to print. In Figure 1, group [0, 19] has a count of 192 and the asterisks per count value is 3, so each asterisk represents a count of 3 and 64 asterisks are printed (192 / 3 = 64).
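The grouping described above might be computed as in this sketch. The function name groupCount is hypothetical, and the code assumes a frequency array with 256 counters, one for each value from 0 to 255.

```cpp
#include <algorithm>

// Hypothetical helper (name and signature are assumptions).
// Sums the frequency counts for the 0-based group number `group`.
// `frequency` is assumed to hold 256 counters, indexed by data value.
int groupCount(const int frequency[], int groupSize, int group)
{
    const int maxValue = 255;
    int first = group * groupSize;
    // The last group may be shorter than groupSize, so clamp its upper bound.
    int last = std::min(first + groupSize - 1, maxValue);

    int total = 0;
    for (int value = first; value <= last; ++value)
    {
        total += frequency[value];
    }
    return total;
}
```

With a group size of 20 there are 13 groups, numbered 0 through 12, and group 12 covers only [240, 255].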
We cannot print partial asterisks, so we do not print an asterisk for any remainder values. The [20, 39] group count 190 is not evenly divisible by 3. 190/3 = 63 1/3. Only 63 asterisks are printed.
There is not room to print 192 asterisks, so we compress the data to provide a visualization that fits in the console window.
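The asterisk arithmetic above relies on integer division, which discards the remainder. A minimal sketch, with histogramBar as a hypothetical function name:

```cpp
#include <string>

// Hypothetical helper (name is an assumption).
// Builds the bar for one group. Integer division drops any remainder,
// so no partial asterisk is ever produced.
std::string histogramBar(int count, int asterisksPerCount)
{
    int asterisks = count / asterisksPerCount;
    return std::string(static_cast<std::string::size_type>(asterisks), '*');
}
```

For the Figure 1 values, a count of 192 with 3 asterisks per count gives a bar of 64 asterisks, and a count of 190 gives 63.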
Figure 2 provides an output example for group size 15, asterisks per count 5.
Figure 2: Group Size 15, Asterisks per count 5
Statistics Output
The mean, standard deviation, and median should all be rounded to 1 decimal place.
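Rounding to one decimal place can be done with std::round from the math library, as in this sketch. The function name roundToTenth is an assumption.

```cpp
#include <cmath>

// Hypothetical helper (name is an assumption).
// Scales by 10, rounds to the nearest whole number, then scales back.
double roundToTenth(double value)
{
    return std::round(value * 10.0) / 10.0;
}
```

When printing, std::fixed and std::setprecision(1) from <iomanip> keep exactly one digit after the decimal point, so a value such as 3.0 is not shown as 3.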
Calculations
Mean – average value of data
Median – the middle value in a sorted list of numbers. If the data set has an even number of elements, the median is the value halfway between the middle pair of numbers, found by adding them together and dividing by two.
Examples:
Median value of the data set { 1, 2, 3, 4, 5 } is 3.
Median value of the data set { 1, 2, 3, 4, 5, 6 } is 3.5, the average of the middle pair: (3 + 4) / 2.
Use the bubble sort algorithm to sort the array. You do not have to maintain separate copies of the unsorted and sorted data arrays.
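The sort and median steps could look like the sketch below. The function names bubbleSort and medianOfSorted are hypothetical, and the early exit when a pass makes no swaps is an optional refinement rather than a requirement.

```cpp
// Hypothetical helper: bubble sort, ascending, in place.
void bubbleSort(int data[], int count)
{
    for (int pass = 0; pass < count - 1; ++pass)
    {
        bool swapped = false;
        // After each pass the largest remaining value has moved to the end.
        for (int i = 0; i < count - 1 - pass; ++i)
        {
            if (data[i] > data[i + 1])
            {
                int temp = data[i];
                data[i] = data[i + 1];
                data[i + 1] = temp;
                swapped = true;
            }
        }
        if (!swapped)
        {
            break;   // no swaps means the array is already sorted
        }
    }
}

// Hypothetical helper: median of an array that is already sorted.
// Assumes count is at least 1.
double medianOfSorted(const int sorted[], int count)
{
    int middle = count / 2;
    if (count % 2 == 1)
    {
        return sorted[middle];
    }
    // Even count: average the middle pair; 2.0 forces floating-point division.
    return (sorted[middle - 1] + sorted[middle]) / 2.0;
}
```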
Standard Deviation
The standard deviation is a measure of the spread of numbers in a data set. It is the square root of the sum of the squared differences between each value and the mean, divided by the number of elements in the set.
Symbols:
x̄ – mean
s – standard deviation
N – number of elements in set
$$ s = \sqrt{\sum_{i=1}^{N} \frac{(x_i - \bar{x})^2}{N}} $$
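The formula translates to code fairly directly. The sketch below divides by N, as the formula does; the function names computeMean and standardDeviation are assumptions.

```cpp
#include <cmath>

// Hypothetical helper: average of the data values. Assumes count >= 1.
double computeMean(const int data[], int count)
{
    double sum = 0.0;
    for (int i = 0; i < count; ++i)
    {
        sum += data[i];
    }
    return sum / count;
}

// Hypothetical helper: square root of the mean squared difference from the mean.
double standardDeviation(const int data[], int count, double meanValue)
{
    double sumOfSquares = 0.0;
    for (int i = 0; i < count; ++i)
    {
        double difference = data[i] - meanValue;
        sumOfSquares += difference * difference;
    }
    return std::sqrt(sumOfSquares / count);
}
```

As a check, the data set { 2, 4, 4, 4, 5, 5, 7, 9 } has a mean of 5 and a standard deviation of 2.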
Frequency Count
When counting numbers, take advantage of C++’s array indexing scheme.
How many elements does the frequency count array require? (Hint: what is the range of the data values?)
Where is a good place to store the count of 0’s? 1’s? 2’s?
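One possible answer to these questions is sketched below: use each data value as the index of its own counter. The function name countFrequencies is hypothetical, and the array size of 256 follows from the guaranteed value range [0, 255].

```cpp
// Hypothetical helper (name and signature are assumptions).
// `frequency` must have room for 256 counters, one per value in [0, 255].
void countFrequencies(const int data[], int count, int frequency[])
{
    for (int value = 0; value < 256; ++value)
    {
        frequency[value] = 0;
    }
    for (int i = 0; i < count; ++i)
    {
        // Defensive check: skip anything outside the guaranteed range.
        if (data[i] >= 0 && data[i] <= 255)
        {
            ++frequency[data[i]];   // the value itself selects the counter
        }
    }
}
```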
Program Files
The main function should be in the main.cpp file. Other functions should be defined in additional .cpp files and declared in header (.h) files.
Error Checking, Memory Allocation and Deallocation
The program should perform adequate error checking and handling. Terminate the program when appropriate and display appropriate error messages using the cerr stream. Don’t forget to close files and free memory.
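As an illustration of this kind of checking, the sketch below opens the input file and reports a failure on cerr. The function name openInputFile and the message text are assumptions.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper (name and message wording are assumptions).
// Returns true if the file was opened; otherwise prints an error to cerr.
bool openInputFile(std::ifstream& inFile, const std::string& fileName)
{
    inFile.open(fileName);
    if (!inFile.is_open())
    {
        std::cerr << "Error: unable to open input file \"" << fileName << "\"\n";
        return false;
    }
    return true;
}
```

The caller can terminate when the function returns false. On success, the caller is still responsible for calling close() on the stream and for releasing any array it allocated with delete[].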
Global Constants and Global Variable Usage
Do not use global constants or global variables in your programming solution. All information must be passed to and from functions. Using global variables or global constants will result in a 30% grade reduction; the globally defined struct type is the only exception.
Global Structure
Globally define a struct in a header file and use that data type in your programming solution. Use it to store the calculated mean, median, and standard deviation values.
Example:
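struct stats_t
{
    double mean;    // average value of the data set
    double median;  // middle value of the sorted data set
    double stdDev;  // standard deviation of the data set
};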
Comments
Each file should contain the required header comments. Function prototypes must also be well-commented. Comment standards have been described in earlier assignments.
Grading
Grading Criteria Points Possible
Input File handling (stream declaration, open, close) 10
Output File handling and formatting (stream declaration, open, close) 10
Console output neatness and readability 10
Dynamic Memory Allocation and Deallocation 10
Frequency Data (grouping, count, histogram) 15
Statistics (mean, median, standard deviation functions) 25
Main program control flow, appropriate use of functions, appropriate creation of related header and cpp files. 30
Comments & Style 10
Total 120
Learning Outcomes (for instructor use)
Learning Outcome
LO1 – Understand and use the basic programming principles of C++: Multiple functions, multiple files.
LO2 – Build programs that designate specific requirements of data types, data structures, and numeric representations of data: Requires a dynamically allocated 1D integer array. Requires creation of a struct to store statistical values. Bubble sort implementation. Frequency count and histogram display.
LO3 – Identify and correct syntax, compilation, and logic errors in programs: Requires students to perform all of these when developing a working program.
LO5 – Construct program data for use in various memory spaces (global, stack, heap), pointers and memory allocation: Requires a dynamically allocated 1D integer array. Global declaration of struct in a header file.
LO6 – Deploy standard library APIs for string manipulation and math utilities: Statistical calculation of standard deviation requires math library function square root. Round function usage required.
LO7 – Design code functionalities that perform input/output programming: Requires input and output file stream handling.