
Introduction to Data Structures

What are Data Structures?

Data structures are constructs that define a particular arrangement of data so that certain operations and processing can be performed on it effectively.

These operations can be (see the sketch after this list):

1. Insert: Inserting a new item.

2. Delete: Removing an item.

3. Access: Reading an item.

4. Traversal: Iterating through all the items in the Data structure.

5. Searching: Finding a particular item in the data structure.

6. Sorting: Arranging the items in a certain order.

7. Merging: Combining two instances into one instance.
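
As a minimal sketch, here is what these operations can look like using a plain Python list as the data structure (the values and names are illustrative only):

```python
# A minimal sketch of the basic operations, using a plain Python list
# as the underlying data structure. Values and names are illustrative only.

items = [30, 10, 20]

items.append(40)            # Insert: add a new item
items.remove(10)            # Delete: remove an item
third = items[2]            # Access: read the item at index 2

for value in items:         # Traversal: visit every item
    print(value)

found = 20 in items         # Searching: check whether an item is present
items.sort()                # Sorting: arrange items in ascending order

other = [5, 15]
merged = sorted(items + other)  # Merging: combine two instances into one
```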

Data structures are designed to perform the above operations efficiently. No single data structure achieves the best performance for every operation. Some are good for fast access and searching but poor for traversal or sorting; others are good for traversal but not optimized for insertion and deletion. Some perform well across many operations but take a lot of space.


How to Select the Best Data Structure?

Based on the problem we are trying to solve and the volume of operations we have to perform, we can select or design different data structures. To select the best one, we need some kind of framework to evaluate performance.

It cannot be improved if it cannot be measured!

Frameworks to Measure the Performance of an Algorithm

To measure how an algorithm or data structure performs, we first need to define a computation model that states our assumptions about the cost of basic operations.

Measuring the Time of Computation

Asymptotic Time Complexity

For the sake of simplicity and generality, we don't measure the exact time to perform a task. Instead, we assume that every basic operation has a constant cost and count how many of these constant-cost operations are performed with respect to the input size N.

Asymptotic analysis measures the complexity of a particular operation in isolation, without considering its real-world usage pattern.
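
As a rough sketch of this counting approach, the functions below count the number of comparisons made by linear search (roughly N in the worst case) and binary search on sorted data (roughly log2 N). The step counters are only for illustration:

```python
def linear_search_steps(data, target):
    """Count comparisons made by linear search: roughly N in the worst case."""
    steps = 0
    for value in data:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(data, target):
    """Count comparisons made by binary search on sorted data: roughly log2(N)."""
    steps, lo, hi = 0, 0, len(data) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if data[mid] == target:
            break
        elif data[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


for n in (1_000, 1_000_000):
    data = list(range(n))
    # Worst case for linear search: the target is the last element.
    print(n, linear_search_steps(data, n - 1), binary_search_steps(data, n - 1))
```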


Amortized Analysis

Amortized analysis looks at a sequence of operations, as in a real-world usage pattern, and measures the average cost per operation. This gives a tighter upper bound on the actual cost than analyzing each operation in isolation.

For example, take two data structures:

DS1: search cost = very high construction cost for the first search + low cost for all subsequent searches

DS2: search cost = medium for every search


For the first few searches, DS2 will look better, but over a long run the average time taken by DS1 will be much lower.
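
A classic illustration of amortized cost is a dynamic array that doubles its capacity when full: an occasional expensive resize is averaged over many cheap appends. Below is a toy sketch (a hypothetical class, not any library's real implementation):

```python
class DynamicArray:
    """Toy dynamic array that doubles its capacity when full."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity
        self.copies = 0  # total elements copied during resizes

    def append(self, value):
        if self.size == self.capacity:
            # Expensive step: allocate a bigger array and copy everything.
            self.capacity *= 2
            new_data = [None] * self.capacity
            for i in range(self.size):
                new_data[i] = self.data[i]
                self.copies += 1
            self.data = new_data
        self.data[self.size] = value
        self.size += 1


arr = DynamicArray()
n = 100_000
for i in range(n):
    arr.append(i)

# Total work is n writes plus fewer than 2n copies, so the amortized cost per
# append is a small constant even though some individual appends cost O(n).
print(arr.copies / n)  # roughly 1.3, bounded by 2
```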




Accounting for Memory Access Cost

We also need to make assumptions about the cost of getting data into CPU registers for computation. For simplicity, asymptotic analysis treats memory access cost as zero or constant. But in practice, memory systems are designed in such a way that not all memory accesses cost the same: data in the cache is fast to access, while data in RAM is slow to access.
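
A rough way to see this is to traverse the same 2-D data in two different orders. In lower-level languages the cache-friendly order is dramatically faster; in Python, interpreter overhead and the extra indexing in the column-major version blur the gap, so treat the sketch below as an illustration rather than a precise cache benchmark:

```python
import time

N = 2_000
matrix = [[i * N + j for j in range(N)] for i in range(N)]


def sum_row_major(m):
    # Visits each row's elements consecutively, in the order they are stored,
    # which tends to reuse the same cache lines.
    total = 0
    for row in m:
        for value in row:
            total += value
    return total


def sum_column_major(m):
    # Jumps to a different row on every access, which is less cache friendly.
    # (In Python, part of the slowdown also comes from the explicit indexing.)
    total = 0
    for j in range(N):
        for i in range(N):
            total += m[i][j]
    return total


for fn in (sum_row_major, sum_column_major):
    start = time.perf_counter()
    fn(matrix)
    print(fn.__name__, round(time.perf_counter() - start, 3), "seconds")
```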

The following models are used for the analysis of algorithms.

Random Access Machine (RAM) Model

The cost of computation is measured as the number of instructions executed by the machine and is referred to as time. One problem with the RAM model is that it assumes accessing any memory location has the same cost. As discussed earlier, there can be a factor of over 100 between the time to access a word of memory from the first-level cache and the time to access it from main memory.

The second problem is that the RAM model is a sequential model: it does not account for operations that can be done faster using parallel processing.


Parallel RAM Model

The PRAM model accounts for the fact that operations done in parallel take less time. It assumes a shared memory that many workers can access in parallel in unit time. It cannot model memory access on modern systems, where shared memory access cannot be done in parallel at unit cost due to synchronization and thread safety, although it can be approximated in some VLSI prototypes.
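
As a sketch of the PRAM idea that independent workers sharing the data can reduce wall-clock time, here is a chunked parallel sum using Python processes. In practice, process start-up and communication costs mean the speedup falls well short of the ideal unit-cost assumption:

```python
from concurrent.futures import ProcessPoolExecutor


def chunk_sum(bounds):
    # Each worker sums one contiguous chunk of the range independently.
    lo, hi = bounds
    return sum(range(lo, hi))


def parallel_sum(n, workers=4):
    # Split [0, n) into equal chunks and let each worker handle one chunk,
    # mimicking PRAM's idea of multiple processors sharing the work.
    step = n // workers
    bounds = [(i * step, (i + 1) * step if i < workers - 1 else n)
              for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, bounds))


if __name__ == "__main__":
    n = 10_000_000
    assert parallel_sum(n) == n * (n - 1) // 2
    print("ok")
```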


Fortunately, most algorithms that perform better in more accurate models also perform better in the RAM model, so for understanding and choosing data structures the RAM model can be used, with some attention to cache friendliness in the analysis.


