Introduction to Data Structures
What are Data Structures?
Data structures are constructs that define a particular arrangement of data so that certain operations can be performed on it efficiently.
These operations include:
1. Insert: Inserting a new item.
2. Delete: Removing an item.
3. Access: Reading an item.
4. Traversal: Iterating through all the items in the data structure.
5. Searching: Finding a particular item in the data structure.
6. Sorting: Arranging the items in a certain order.
7. Merging: Combining two instances into one.
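These operations can be sketched with a plain Python list standing in for the data structure. The list is only an illustration; each data structure implements the same operations with different costs.

```python
items = [40, 10, 30]

items.append(20)                 # Insert: add a new item
items.remove(10)                 # Delete: remove an item
first = items[0]                 # Access: read an item by position
total = sum(x for x in items)    # Traversal: visit every item once
found = 30 in items              # Searching: locate a particular item
items.sort()                     # Sorting: arrange items in order
merged = items + [50, 60]        # Merging: combine two instances into one
```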
Data structures are designed to perform the above operations efficiently, but no single data structure achieves the best performance for all of them. Some are good for fast access and searching but poor for traversal or sorting; others are good for traversal but not optimized for insertion and deletion. Some perform well at many operations but take a lot of space.
How to Select the Best Data Structure?
Based on the problem we are trying to solve and the mix of operations we have to perform, we can select or design different data structures. To select the best one, we need some kind of framework to evaluate performance.
It cannot be improved if it cannot be measured!
Frameworks to measure the performance of an algorithm
To measure how our algorithm or data structure performs, we first need to define a computation model that states our assumptions about the cost of basic operations.
Measuring the Time of Computation
Asymptotic Time complexity
For the sake of simplicity and generality, we don't measure the exact time to perform a task. We assume that every basic operation has a constant cost and count how many times we perform such constant-cost operations with respect to the input size N.
Asymptotic complexity analyzes the cost of a particular operation in isolation, without considering its real-world usage pattern.
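The idea of counting constant-cost operations can be sketched with a linear search, where each comparison is treated as the basic operation. In the worst case (the target is absent) we perform exactly N comparisons, which is what the O(N) bound expresses.

```python
def linear_search(items, target):
    """Return (index, comparisons); index is -1 if target is absent.

    Each comparison is counted as one constant-cost basic operation.
    """
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return -1, comparisons

# Worst case: target not present, so we pay exactly N comparisons -> O(N).
index, worst = linear_search(list(range(1000)), -1)
```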
Amortized Analysis
Amortized analysis looks at a sequence of operations, as in real-world usage patterns, and measures the average cost per operation. This gives a tight upper bound on the actual cost of the whole sequence.
For example, take two data structures:
DS1: search cost = very high one-time construction cost + low cost for every subsequent search
DS2: search cost = medium cost for every search
For the first few searches DS2 looks better, but over a long run the average time taken by DS1 is far lower.
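The DS1/DS2 comparison can be simulated with made-up cost units (the specific numbers below are illustrative assumptions, not measurements): DS2 wins for a handful of searches, but DS1's amortized cost per search shrinks toward its cheap per-search cost as the one-time construction cost is spread over more searches.

```python
def ds1_total_cost(num_searches, build_cost=1000, per_search=1):
    # Heavy one-time construction, then cheap searches.
    return build_cost + per_search * num_searches

def ds2_total_cost(num_searches, per_search=50):
    # No construction cost, but every search pays a medium cost.
    return per_search * num_searches

few, many = 5, 1000
# Few searches: DS2's total cost is lower.
# Many searches: DS1's amortized (average) cost per search approaches 1.
ds1_amortized = ds1_total_cost(many) / many
```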
Accounting for Memory Access Cost
We also need assumptions about the cost of getting data into CPU registers for computation. For simplicity, asymptotic analysis treats memory access as a zero/constant cost. In practice, memory systems are designed so that not all accesses cost the same: data in the cache is fast to access, while data in main memory (RAM) is slow.
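One way to glimpse this effect is to traverse the same two-dimensional data row by row (touching memory mostly sequentially) versus column by column (jumping between rows). The sketch below times both orders; note that Python's interpreter overhead often masks the cache effect, which is far more visible in languages like C or with NumPy arrays, so treat the timing gap as indicative only.

```python
import time

N = 1000
matrix = [[1] * N for _ in range(N)]  # N x N grid of ones

start = time.perf_counter()
row_order_sum = sum(matrix[i][j] for i in range(N) for j in range(N))
row_time = time.perf_counter() - start

start = time.perf_counter()
col_order_sum = sum(matrix[i][j] for j in range(N) for i in range(N))
col_time = time.perf_counter() - start

# Both orders compute the same result; the column-order walk touches
# memory non-sequentially, which is typically the cache-unfriendly order.
```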
The following models are used for the analysis of algorithms.
Random Access Machine (RAM) Model
The cost of a computation is measured by the number of instructions executed by the machine and is referred to as time. One problem with the RAM model is that it assumes accessing any memory location has the same cost. As discussed earlier, there can be a factor of over 100 between the time to access a word of memory from the first-level cache and from main memory.
The second problem is that the RAM model is sequential: it does not account for operations that can be done faster using parallel processing.
Parallel RAM Model
The PRAM model accounts for the fact that operations done in parallel take less time. It assumes a shared memory that many workers can access in parallel in unit time. It cannot model the memory access of modern systems, where shared memory access cannot be done in parallel at unit cost due to synchronization and thread-safety overheads, although it can be done in some VLSI prototypes.
Fortunately, most algorithms that perform better in more accurate models also perform better in the RAM model, so for understanding and choosing data structures the RAM model can be used, with some attention to cache friendliness in the analysis.