Idle curiosity again. Nerdiness for history again. And again, a newfound fascination for Gen AI tools. This time I tried my hand at evaluating leadership. Not business leaders or CEOs, but emperors, leaders of countries, reformers, and dictators.
Across centuries and across continents.
What makes a leader truly impactful? And what helps leaders leave a legacy?
Can we build a diagnostic model based on observable traits?
And how can Gen AI help?
Objectives
– Define a leader’s short-term impact and long-term legacy.
– Identify measurable inputs that contribute to leadership outcomes.
– Normalize across era, geography, governance type, and ideology.
– Build a predictive and diagnostic framework.
– Test the model on a cohort of diverse historical leaders.
Challenges
The task was complex. Leadership is not a universal currency. What counts as success for a 13th-century nomadic warlord may be the opposite of success for a 20th-century democrat. We had to avoid moral contamination, category bleed, and modern-day bias—while retaining judgment grounded in observable consequence. We refined, retested, and repeatedly stress-tested our framework using a mix of logic, historical context, and rigorous modeling.
Our Methodology
We eventually defined two outcome variables:
– Impact: Tangible, near-to-mid-term results achieved during the leader’s lifetime. Value-neutral.
– Legacy: Long-term influence, moral judgment, and remembered significance. More value-laden.
Then, we defined six input variables—Vision, Executional Efficacy, Resilience, Charisma & Influence, Ethical Governance, and Human Development Focus—each with structured subcategories. We scored each leader in the dataset across all categories.
This is what each of the input variables means:
Vision – the ability to articulate a coherent vision. This is not necessarily about a ‘grand’ vision or ‘big’ ideas. It is about coherence, the ability to predict future trends, long-term planning and, importantly, the ability to communicate these ideas.
Executional Efficacy – the ability to deliver results despite structural issues and to overcome hurdles. Of course, the clearer the vision, the easier the execution.
Resilience – the will and personal endurance to carry on despite setbacks.
Charisma & Influence – the emotional impact on followers, including propaganda and mythmaking.
Ethical Governance – now this is interesting. This is essentially tolerance for dissent, ‘consultativeness’, and protection of vulnerable groups. In many ways, it indicates the leader’s willingness or ability to build lasting institutions.
Human Development Focus – as the name suggests, how far the leader improved standards of living and, by extension, how he or she ‘developed’ their people.
It was important to overcome modern-day biases while scoring leaders on the input variables. I could not let today’s moral values define how we view leaders of an earlier time. For example, we did not penalize Churchill for being an imperialist; it was a fault of his times.
Similarly, I had to overcome ‘hindsight’ bias. So we rewarded or penalized leaders for their decision-making based on the best information available in their time.
I also had to normalize for monarchies vs modern states. I did not penalize a monarch for ‘centralization’ but rewarded him if he was ‘consultative’. Similarly, I did not reward leaders in a democratic setup for being consultative but penalized them if they trampled on democratic norms.
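The regime normalization above can be written down as a small lookup. This is only a sketch: the `Regime` enum, the behaviour labels, and the point values of +1 and -1 are my own assumptions for illustration, since the rules were applied by judgment rather than by code.

```python
from enum import Enum


class Regime(Enum):
    MONARCHY = "monarchy"
    DEMOCRACY = "democracy"


# Hypothetical point values: the rules come from the text, the magnitudes do not.
REWARD = 1
PENALTY = -1

# (regime, behaviour) -> score adjustment. Behaviour labels are illustrative.
RULES = {
    (Regime.MONARCHY, "centralizing"): 0,           # expected of a monarch: no penalty
    (Regime.MONARCHY, "consultative"): REWARD,      # goes beyond the norm: rewarded
    (Regime.DEMOCRACY, "consultative"): 0,          # expected of a democrat: no reward
    (Regime.DEMOCRACY, "tramples_norms"): PENALTY,  # falls below the norm: penalized
}


def governance_adjustment(regime: Regime, behaviour: str) -> int:
    """Return the score change for one observed behaviour under a given regime."""
    # Behaviours without an explicit rule leave the score unchanged.
    return RULES.get((regime, behaviour), 0)


print(governance_adjustment(Regime.MONARCHY, "consultative"))
print(governance_adjustment(Regime.DEMOCRACY, "consultative"))
```

The point of the table is that the same behaviour is scored against what was normal for the regime, not against an absolute standard.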
Each input variable was scored on a point system of rewards and penalties. For example, in Vision, Churchill was rewarded for anticipating the Hitler threat but penalized for his inability to predict the post-war world order.
In Ethical Governance, Nehru was rewarded for his institution building but penalized for the first emergency in a state in India (Kerala) and his indifference to corruption in the Congress party.
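One way to keep such a reward-and-penalty ledger auditable is to store every entry with its reason. The sketch below is illustrative only: the class names are hypothetical and the point values (+2, -1) are placeholders, not the scores actually assigned in the model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScoreEntry:
    reason: str
    points: int  # positive = reward, negative = penalty


@dataclass
class VariableScore:
    leader: str
    variable: str
    entries: List[ScoreEntry] = field(default_factory=list)

    def add(self, reason: str, points: int) -> None:
        """Record one reward or penalty together with its justification."""
        self.entries.append(ScoreEntry(reason, points))

    def total(self) -> int:
        """Net score for this variable: rewards minus penalties."""
        return sum(entry.points for entry in self.entries)


# The reasons are from the Churchill example above; the magnitudes are placeholders.
churchill_vision = VariableScore("Churchill", "Vision")
churchill_vision.add("Anticipated the Hitler threat", 2)
churchill_vision.add("Did not predict the post-war world order", -1)

print(churchill_vision.total())  # 1
```

Keeping the reasons next to the points makes it easier to revisit a score later and check it for hindsight or modern-day bias.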
Testing the Model
We then tested the model with a fixed cohort of 11 leaders, including Churchill, Nehru, Deng Xiaoping, Genghis Khan, Akbar, Atatürk, and others. We added more leaders for stress tests—like Lincoln, Catherine the Great, and Napoleon—using a ‘reverse test’: predicting outcome scores from inputs alone, then comparing against actual outcome evaluations.
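The ‘reverse test’ can be sketched as follows. Everything numeric here is an assumption: the weights, the 0-10 scale, the tolerance, and the example scores are placeholders, and a weighted average is just one simple way to turn inputs into a predicted outcome.

```python
# Hypothetical weights for predicting Impact from the six inputs (they sum to 1).
# These are NOT the model's actual weights.
IMPACT_WEIGHTS = {
    "vision": 0.20,
    "executional_efficacy": 0.30,
    "resilience": 0.20,
    "charisma_influence": 0.10,
    "ethical_governance": 0.10,
    "human_development": 0.10,
}


def predict_outcome(inputs, weights):
    """Predict an outcome score as a weighted average of the input scores."""
    missing = set(weights) - set(inputs)
    if missing:
        raise ValueError(f"missing input scores: {sorted(missing)}")
    return sum(weights[name] * inputs[name] for name in weights)


def reverse_test(inputs, actual, weights, tolerance=1.0):
    """Compare the predicted outcome with the independently assessed one."""
    predicted = predict_outcome(inputs, weights)
    error = predicted - actual
    return {
        "predicted": predicted,
        "actual": actual,
        "error": error,
        "within_tolerance": abs(error) <= tolerance,
    }


# Placeholder scores on a 0-10 scale for an unnamed leader (not real data).
example_inputs = {
    "vision": 8,
    "executional_efficacy": 9,
    "resilience": 7,
    "charisma_influence": 6,
    "ethical_governance": 4,
    "human_development": 5,
}

result = reverse_test(example_inputs, actual=7.5, weights=IMPACT_WEIGHTS)
print(result)
```

A large error for a stress-test leader would suggest that either the weights or one of the input scores needs another look.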
We then computed the correlation of each input variable with Impact and with Legacy separately. Here is how they look:
| Input Category | Correlation with Impact |
| --- | --- |
| Executional Efficacy | 0.73 |
| Vision & Clarity | 0.65 |
| Resilience | 0.61 |
| Human Development | 0.47 |
| Charisma & Influence | 0.43 |
| Ethical Governance | -0.22 |
Interestingly, for short-term impact, Executional Efficacy seems the most important, with clarity of vision and Resilience close behind. Ethical Governance does not seem to matter much for short-term impact; if anything, its correlation is weakly negative (-0.22).
The correlations for Legacy however were very different:
| Input Category | Correlation with Legacy |
| --- | --- |
| Human Development Focus | 0.76 |
| Vision and Clarity of Thought | 0.72 |
| Executional Efficacy | 0.68 |
| Ethical Governance | 0.63 |
| Resilience | 0.54 |
| Charisma and Influence | 0.51 |
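For readers who want to reproduce this kind of table on their own scores, here is a minimal sketch. It assumes the Pearson coefficient, which is my assumption about what the numbers above represent, and the two score lists are placeholders for five unnamed leaders, not the cohort's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Root sums of squared deviations for each list.
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (spread_x * spread_y)


# Placeholder scores (0-10) for five unnamed leaders; NOT the cohort's data.
executional_efficacy = [9, 7, 8, 5, 6]
impact = [9, 6, 8, 4, 7]

print(round(pearson(executional_efficacy, impact), 2))
```

Running `pearson` once per input variable against Impact, and again against Legacy, gives one row of each table. If only the ordering of leaders is trusted, a rank correlation such as Spearman's would be a reasonable alternative.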
Key Insights
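The most predictive input for Impact was Executional Efficacy (0.73). For Legacy, Human Development Focus was the strongest (0.76), and Ethical Governance moved from a weak negative correlation with Impact (-0.22) to a clearly positive one with Legacy (0.63).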
We learned that:
– Vision matters, but only if it’s translated into reality.
– Ethical leaders don’t always have the highest short-term impact, but they often shape lasting legacies.
– People tend to forgive ‘strongman’ flaws in pursuit of results, but legacies demand fairness and systems.
Hitler, for example, had a very powerful impact during his time, but his legacy has been dark. He had a coherent vision. We may not like it, yet it was a well-articulated, coherent vision. He had fantastic charisma, resilience and great execution abilities. All of this helped him create a great impact; in fact, he is among the top in Impact. However, he ranks last in Legacy, being last in both Ethical Governance and Human Development.
Abraham Lincoln seems to emerge as the all-round star, with consistently high rankings on both input and output.
Deng Xiaoping also does well in most categories, both input and output, except in Charisma and Ethical Governance.
Genghis Khan has the highest score in Impact and the highest score in Execution, while his vision was limited: he has middling scores in Vision. It was not a grand vision, yet it was coherent enough for tribal leaders to follow him.
Nehru scores high on Vision but has middling scores in all the other input and output variables.
Below is the table of ranks:
Note that the rankings are NOT comparisons between leaders, who operated in widely different contexts. They are just a way to aggregate and rank leaders based on their scores.

The Role of GenAI
This project was co-created with Generative AI. The human provided direction, historical insight, and iterative corrections. The AI enabled rigorous structure, memory, and hypothesis testing at scale. Neither alone would have sufficed. Together, we built something both rigorous and, perhaps, useful. However, it was a long chat, and by the end ChatGPT had started hallucinating massively.
Final Reflections
This is not an academic exercise. Nor is it rigorous. It is interesting though. I don’t claim to have answered the question of leadership fully. But I have offered a way to structure it—one that blends history, analytics, and AI to illuminate what makes certain leaders resonate across time. This is a living model, and I invite comments and critiques.